How Much of an Ancestor’s DNA Do You Have?

Or, more interestingly, how much will that vary and how can you easily increase the number and quality of your DNA matches?

Brit Nicholson
6 min read · Jun 28, 2018

The model used to produce these numbers has been updated a few times. More accurate numbers can be found here.

The DNA testing industry is booming, often as people attempt to find close relatives or ancestors whom they couldn’t previously identify. If a person knows how to use the tools available for analyzing the DNA segments they share with matches, these goals can be achieved fairly easily. However, many people would be surprised to learn how small a piece of the puzzle they actually have within their own genome. It would be hard to overstate how important it is to get other relatives to test their DNA if you really want to find answers about your ancestors.

I’ve made a model, described in detail here, that not only predicts the percentage of an ancestor’s DNA that you could reproduce, which is a trivial calculation that you can often perform in your head, but also gives the range over which that percentage could vary (with 95% confidence).

These mean percentages are well known and can be found in many places, such as in the chart below.

Chart modified from currach.johnjtierney.com. Consanguinity Chart Now with More DNA Flavor! by John J. Tierney is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

As you can see in the bottom right corner of each block, you’ll share a fairly predictable percentage of DNA with certain relatives. With parents and siblings, that will be about 50% (exactly 50% for parents). With half-siblings, uncles, aunts, nieces, nephews, and grandparents, you’ll share about 25%. The numbers in the blocks above are averages, and the actual values can vary by several percentage points. They also hardly give a clue as to what percentage of an ancestor’s genome you and a relative reproduce when both of you get your DNA tested.

To show the range of expected percentages, I developed a very simple model that calculates the percentage of reproduced DNA, and I ran it 20,000 times. Using the bootstrapping method, percentages that fall within the middle 95% of the simulated values can be said to occur with 95% confidence. While the mean values are trivial (except as a check against errors in the model), the minimum and maximum values show the range outside of which you would not expect percentages to fall. Below are model results for various combinations of relatives and what percentage of DNA they could reproduce for ancestors up to great-grandparents.
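As a rough illustration of the simulate-and-take-the-middle-95% idea, here is a minimal Python sketch for the simplest case: one parent and X tested children. It is not the author's model. The function names (`simulate_parent_coverage`, `middle_interval`), the choice of 60 independently inherited blocks per genome copy, and the 2,000 trials are all assumptions made for illustration, and the block count in particular controls how wide the interval comes out.

```python
import random

def simulate_parent_coverage(n_children, n_blocks=60, trials=2000, seed=0):
    """Simulate the fraction of one parent's DNA found in n_children.

    Assumption: each of the parent's two genome copies is split into
    n_blocks blocks, and at every block a child independently receives
    copy A or copy B with probability 1/2.
    """
    rng = random.Random(seed)
    full = (1 << n_blocks) - 1  # bitmask with one bit per block
    results = []
    for _ in range(trials):
        seen_a = 0  # blocks of copy A carried by at least one child
        seen_b = 0  # blocks of copy B carried by at least one child
        for _ in range(n_children):
            mask = rng.getrandbits(n_blocks)  # bit = 1 means copy A inherited
            seen_a |= mask
            seen_b |= ~mask & full
        covered = bin(seen_a).count("1") + bin(seen_b).count("1")
        results.append(covered / (2 * n_blocks))
    return results

def middle_interval(values, level=0.95):
    """Return the bounds of the middle `level` share of the simulated values."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int((1 - tail) * last)]

values = simulate_parent_coverage(3)
low, high = middle_interval(values)
mean = sum(values) / len(values)
print(f"min {min(values):.3f}, mean {mean:.3f}, max {max(values):.3f}")
print(f"middle 95%: {low:.3f} to {high:.3f}")
```

Under these assumptions the mean for three children should land near 87.5%. Raising `n_blocks` narrows the interval and lowering it widens the interval, so the width here should not be read as a prediction.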

The percentage of reproduced DNA for one parent if X number of children get their DNA tested. The minimum, mean, and maximum are shown for 20,000 trials. Intervals are given with 95% confidence. For example, three siblings can be expected to reproduce between 82.7% and 91.8% of a parent’s DNA 95% of the time. Of course, it’s better to just get your parent to test their DNA if you can.
The percentage of reproduced DNA for one grandparent if you get your DNA sequenced along with X aunts and/or uncles (who must, of course, be children of that grandparent). You can expect to find about 95.3% of the available DNA matches out there for a grandparent if you can convince 4 aunts or uncles to test their DNA. Note that you still won’t know exactly what part of your genomes came from the grandparent of interest, but you can narrow down certain chromosome segments based on the family trees of other people who match with you or your relatives there.
The percentage of reproduced DNA for one grandparent if you get your DNA tested along with X number of first cousins. It doesn’t matter if they’re half or full first cousins, so long as each is a grandchild of your grandparent. And, although potentially not practical, each one would have to be from a different aunt or uncle; otherwise, the percentage of reproduced DNA would be much lower.
A combination of the above two tables, this one shows the percentage of DNA reproduced for one grandparent based on your DNA plus several combinations of your aunts/uncles and first cousins. Note that the cousins cannot be children of the tested aunts or uncles, as that would make a cousin’s DNA no longer helpful for contributing to the percentage. Generally, an aunt’s or uncle’s DNA is twice as valuable a contribution. Note, however, that one aunt or uncle plus four cousins is better than two aunts or uncles plus two cousins, although that would require that your grandparent had at least five children.
Percentages of reproduced DNA for a couple of examples with aunts or uncles, siblings, and first cousins.
Moving on to great-grandparents (the model can handle any number of generations back that the user specifies), here’s the percentage of DNA reproduced by you plus X great-aunts and/or great-uncles. These would be siblings of one of your grandparents, all of whom are children of the great-grandparent of interest. You can no longer expect to reproduce much more than 70% of these ancestors’ DNA, and that’s if you’re lucky enough to find four children of a great-grandparent who are still alive and are willing to get their DNA sequenced.
The percentage of reproduced DNA for one great-grandparent if you and X number of your second cousins get your DNA sequenced. It doesn’t matter if they’re half or full second cousins, so long as each is a great-grandchild of the great-grandparent of interest. And, as with first cousins, each second cousin here would have to be from a different great-aunt or great-uncle.
Percentages of reproduced DNA for a great-grandparent based on several combinations of second cousins plus great-aunts and/or great-uncles and your own DNA.

The model relies on three rules, the latter two of which increase the variability of shared DNA between relatives. The first is that parents randomly pass half of their genome to their children. The second is that those parents pass their parents’ DNA somewhat randomly, but on average, half from each. The third is that relatives can expect their similar genomes to overlap by about half of what they have of their ancestors. For example, siblings, who each share 50% of a parent’s DNA, should expect about 25 of those percentage points to overlap.
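The sibling example above (two shares of 50% with 25 percentage points of overlap) can be generalized with a short calculation. The sketch below assumes that each tested relative descends from the ancestor through a different child, so that what each one inherited can be treated as independent. Under that assumption the expected combined share is one minus the product of the missing shares. The function name `expected_coverage` is hypothetical, and this is a simplification rather than the author's model, so it should not be expected to reproduce every table in the article.

```python
from math import prod

def expected_coverage(shares):
    """Expected fraction of an ancestor's DNA carried by at least one relative.

    Assumption: the relatives' inherited portions are independent, which
    requires each relative to descend through a different child of the
    ancestor (or to be such a child).
    """
    return 1 - prod(1 - share for share in shares)

two_siblings = expected_coverage([0.5, 0.5])
overlap = 0.5 + 0.5 - two_siblings
print(two_siblings)  # 0.75 of the parent's DNA
print(overlap)       # 0.25, the 25 percentage points of overlap

print(expected_coverage([0.5, 0.5, 0.5]))  # 0.875 for three siblings
```

The value for three siblings, 87.5%, falls inside the 82.7% to 91.8% interval quoted for the first table.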

This model does not differentiate between male and female ancestors, although it would be more accurate to do so. It happens that recombination from mothers to children is greater than that from fathers, resulting in more variability in lines that are majority male and less variability in lines that are majority female. Since this simple model doesn’t include differences in recombination, the results here are more like averages, or what you would expect if the numbers of your ancestors in a particular line were pretty close to half male and half female.

What can you do with a higher percentage of an ancestor’s DNA? Most websites don’t let you analyze relationships between mutual matches, but GEDmatch.com does. When you think that a chromosome segment of your genome came from an ancestor of interest, you can make a list of DNA relatives who share that segment with you. If you still don’t have enough information to prove which ancestor it’s from, you can compare those DNA relatives with each other on GEDmatch, excluding your own DNA this time. What you’ll find, if enough of them have well populated family trees, is that they share certain segments with each other that came from your ancestor, but that you didn’t inherit. Of course, if you manage your relatives’ kits, you can just analyze their matches at any website to which you’ve uploaded the data.

One final thing that I thought was interesting about these model results is that an adjustment can be applied based on already known percentages. For example, I already know that I have 29% of my maternal grandfather’s DNA and only 21% of my maternal grandmother’s DNA. If I’m wondering what percentage of DNA I share with my maternal grandfather’s father, I should be able to multiply the model results by 29/25. Based on the simulation results, I would’ve expected to share 7.3–17% of my DNA with my maternal grandfather’s father (with 95% confidence), but now I would expect to have inherited about 8.4–20% of that great-grandfather’s DNA. That ratio could be built into the model as a special function for calculating percentages adjusted by known ratios.
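The adjustment described here is just a rescaling, which can be written out in a few lines of Python. The function name `adjust_interval` and its parameters are hypothetical. The numbers are the ones from the paragraph above.

```python
def adjust_interval(interval, known_share, expected_share):
    """Scale a simulated interval by the ratio of a known share to its average.

    interval:       (low, high) percentages from the simulation
    known_share:    measured percentage shared with the intermediate ancestor
    expected_share: average percentage for that relationship
    """
    ratio = known_share / expected_share
    low, high = interval
    return low * ratio, high * ratio

# Great-grandfather interval of 7.3-17%, with a known 29% share of the
# maternal grandfather's DNA instead of the average 25%.
low, high = adjust_interval((7.3, 17.0), known_share=29.0, expected_share=25.0)
print(f"{low:.1f}% to {high:.1f}%")  # roughly 8.5% to 19.7%
```

This is consistent with the rounded range of about 8.4–20% given above.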

As a next step, I would like to treat recombination from mothers and fathers differently. However, that would require a dataset of grandparent-grandchild relationships, with the additional constraint that the sex of the parent would need to be known. Because recombination occurs more in a mother’s genome than in a father’s, the shared DNA for maternal grandparent-grandchild relationships would have a lower standard deviation. The shared percentage of DNA for paternal grandparents would vary more from the expected 25%. I thought it would be a great idea to get standard deviations for sex-specific relationships in order to train a future model on those values, so I sent messages to just about everyone who has a dataset of shared DNA.

Update: I am so very grateful that nobody provided me with the simple aggregated statistics that I requested. In 2019, Carl Veller et al. finally released the standard deviations I had been waiting for. These are calculated from mathematical formulas and are therefore much more accurate than what I would’ve gotten from empirical data. (Empirical data are very accurate in some fields, but they’re wildly inaccurate in genetic genealogy. It’s a messy field for data.) This means that I was ready to make my model, but hadn’t started training it, when the peer-reviewed statistics came out. I was checking the literature very frequently. When the standard deviations were finally available, I was able to make the most accurate shared DNA data that have ever existed.

Cover photo by Sharon McCutcheon. Feel free to ask me about modeling & simulation, genetic genealogy, or genealogical research. And make sure to check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match peer-reviewed standard deviations. That model was also used to make a very accurate relationship prediction tool. Or, try a calculator that lets you find the amount of an ancestor’s DNA you have when combining multiple kits.
