Why does 23andMe show that I share an unusually high amount of DNA (50%) with my full-sibling?
Alternate title/misleading answer: 23andMe counts FIR twice
Scientists who aren’t familiar with genetic genealogy will be very confused by this question. After all, 50% is the expected amount, or average, that two siblings share. People asking this question have likely been trapped inside of an AncestryDNA bubble.
AncestryDNA doesn’t report the amount of fully-identical regions (FIR) that two people share with each other. To avoid confusion, I’ll note that they do count and use FIR in identifying and labeling full-siblings. And here’s one thing that some people won’t believe when you say it: AncestryDNA counts FIR as half-identical regions (HIR). They’ll riposte that AncestryDNA ignores FIR, but that isn’t true. When you count the number of centiMorgans (cM) from HIR and FIR, but you count FIR as if it’s HIR, a population of full-siblings will share 37.5% DNA, on average. This has led a large percentage (perhaps a majority) of genetic testing consumers to believe that 37.5% is the normal average for full-siblings.
Here’s the part that people get right, kind of. When asked why 23andMe reports an average of 50% between siblings, they answer that 23andMe counts FIR twice. While not technically wrong, I find this answer to be very misleading. Why? Well, FIR is a match on both copies of a chromosome, the maternal one and the paternal one, so it’s a double match, i.e. it should be counted twice. In fact, in order for a population of full-siblings to have 50% identical by descent (IBD) sharing, on average, which is what geneticists know to be a fact, FIR has to be counted twice. When I see someone answer the question with “23andMe counts FIR twice,” I see a lot of other people say that it’s a misleading way to do it and that it over-counts or double-counts. But it isn’t misleading; it’s correct. The best answer to the original question is that “AncestryDNA counts FIR as if it’s HIR.” Not that AncestryDNA ignores FIR, because that wouldn’t be a true statement.
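The arithmetic behind these averages can be sketched in a few lines of Java. The sketch assumes the standard expectation for full-siblings: about 25% of the genome fully identical, 50% half-identical, and 25% not shared. The class and method names are hypothetical, chosen only for illustration.

```java
/** Hypothetical helper: expected percent shared under three counting rules. */
final class SiblingSharingArithmetic {

    /** FIR weighted like HIR: one matching copy out of two. */
    static double firCountedAsHir(double fir, double hir) {
        return 100.0 * (0.5 * hir + 0.5 * fir);
    }

    /** FIR weighted twice, because both copies match: true IBD sharing. */
    static double firCountedTwice(double fir, double hir) {
        return 100.0 * (0.5 * hir + 1.0 * fir);
    }

    /** FIR dropped entirely, so only half-identical regions contribute. */
    static double firIgnored(double hir) {
        return 100.0 * (0.5 * hir);
    }

    public static void main(String[] args) {
        double fir = 0.25; // assumed expected FIR fraction for full-siblings
        double hir = 0.50; // assumed expected HIR fraction for full-siblings
        System.out.println(firCountedAsHir(fir, hir)); // 37.5
        System.out.println(firCountedTwice(fir, hir)); // 50.0
        System.out.println(firIgnored(hir));           // 25.0
    }
}
```

Only the weight given to FIR changes between the three rules, which is why the same pair of siblings can be reported at 37.5% or 50%.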
If for some reason you don’t believe me, consider this Java code I wrote and use daily. The following code counts HIR between two individuals. I’ve never run this code on full-siblings and gotten an average of anything close to 50%. It’s always 37.5%. Please note that this code only works on genomes in which the only shared base-pairs are indeed IBD. This can be done in simulated data when the farthest back shared ancestors have uniquely labeled DNA (distinct from each other). In these simulated genomes, people don’t share over 99% of their DNA the way real humans do, which makes it very easy to know the exact amount of IBD sharing between two individuals.
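A minimal sketch of an HIR counter along these lines, not the original listing. It assumes each person is stored as two equal-length `int` arrays (maternal and paternal copies) whose values are founder labels, so equal labels at a position mean IBD. The class name `HirCounter`, its methods, and the four-position toy genome are all hypothetical.

```java
/** Hypothetical sketch of an HIR counter; not the author's original code. */
final class HirCounter {

    /**
     * Counts positions where two people match on the maternal copy, the
     * paternal copy, or both. A fully identical position still adds only 1,
     * so FIR is counted as if it were HIR.
     */
    static long countHir(int[] mat1, int[] pat1, int[] mat2, int[] pat2) {
        if (mat1.length != pat1.length || mat1.length != mat2.length
                || mat1.length != pat2.length) {
            throw new IllegalArgumentException("all arrays must have equal length");
        }
        long count = 0;
        for (int i = 0; i < mat1.length; i++) {
            boolean maternalMatch = mat1[i] == mat2[i];
            boolean paternalMatch = pat1[i] == pat2[i];
            if (maternalMatch || paternalMatch) {
                count++;
            }
        }
        return count;
    }

    /** Percent shared: each counted position covers one of the two copies. */
    static double percentShared(long count, int positions) {
        return 100.0 * count / (2.0 * positions);
    }

    public static void main(String[] args) {
        // Toy siblings: labels 1 and 2 come from the mother, 3 and 4 from the father.
        // Position 0 is FIR, position 1 is unshared, positions 2 and 3 are HIR.
        int[] mat1 = {1, 1, 2, 2};
        int[] pat1 = {3, 4, 3, 4};
        int[] mat2 = {1, 2, 2, 1};
        int[] pat2 = {3, 3, 4, 4};
        long count = countHir(mat1, pat1, mat2, pat2);
        System.out.println(percentShared(count, mat1.length)); // 37.5
    }
}
```

The toy genome is built to have the expected full-sibling proportions (one quarter FIR, one half HIR, one quarter unshared), so the result lands on 37.5% rather than 50%.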
Anyone with the most basic programming skills will see that there’s no part of the above code that checks if two individuals share FIR, and therefore there’s no way for the code to completely exclude FIR sharing. It simply checks every base-pair of the genome and adds a 1 to the count if people match on either their maternal or paternal chromosomes. Please note that this code won’t work for relationships in which person 1, via their paternal side, matches the maternal side of person 2, or vice versa.
Conversely, here’s an algorithm that I never thought of writing until a social media admin “corrected” me for saying that AncestryDNA counts FIR as HIR. They said that it doesn’t count FIR at all; it completely ignores it for reporting purposes. I politely explained in detail how my original statement was correct. I mentioned that if AncestryDNA completely ignored FIR for reporting purposes, full-siblings would only share 25% by that metric. And that seemed to end the conversation. Then I wrote the following code, running it only once. I always want to be told when I’m wrong, and I wrote this code to tell me so if that was the case. It doesn’t count any sharing on DNA regions that are fully-identical.
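A minimal sketch of a counter that skips fully-identical positions, not the original listing. It assumes each person is stored as two equal-length `int` arrays (maternal and paternal copies) of founder labels, where equal labels at a position mean IBD. The class name `HalfIdenticalOnlyCounter`, its methods, and the toy genome are hypothetical.

```java
/** Hypothetical sketch of a counter that ignores FIR; not the author's original code. */
final class HalfIdenticalOnlyCounter {

    /** Counts positions where exactly one of the two copies matches. */
    static long countExcludingFir(int[] mat1, int[] pat1, int[] mat2, int[] pat2) {
        long count = 0;
        for (int i = 0; i < mat1.length; i++) {
            boolean maternalMatch = mat1[i] == mat2[i];
            boolean paternalMatch = pat1[i] == pat2[i];
            // XOR is true only when one copy matches and the other does not,
            // so fully identical positions are skipped.
            if (maternalMatch ^ paternalMatch) {
                count++;
            }
        }
        return count;
    }

    /** Percent shared: each counted position covers one of the two copies. */
    static double percentShared(long count, int positions) {
        return 100.0 * count / (2.0 * positions);
    }

    public static void main(String[] args) {
        // Toy siblings: position 0 is FIR, position 1 is unshared,
        // positions 2 and 3 are HIR.
        int[] mat1 = {1, 1, 2, 2};
        int[] pat1 = {3, 4, 3, 4};
        int[] mat2 = {1, 2, 2, 1};
        int[] pat2 = {3, 3, 4, 4};
        long count = countExcludingFir(mat1, pat1, mat2, pat2);
        System.out.println(percentShared(count, mat1.length)); // 25.0
    }
}
```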
The code doesn’t reproduce the 37.5% average that AncestryDNA reports, so it isn’t the way they count HIR sharing. Ignoring FIR, it resulted in 25% shared DNA between full-siblings. Clearly, nobody should use this method, as there would be no difference in the average reported DNA sharing between half-siblings and full-siblings. I haven’t used the code for anything since. But it showed me that AncestryDNA doesn’t completely ignore FIR in its cM or percentage reporting; otherwise it would report 25% shared DNA, on average, between full-siblings.
I’ve shown you the code for HIR sharing and some useless code for ignoring FIR. I might as well show you my code for IBD sharing (HIR + FIR) because someone will inevitably ask. Again, this code can only be used on genomic data in which the only shared base-pairs are IBD. This produces the 50% average between full-siblings that one sees at 23andMe, and which accurately describes true IBD sharing. Here it is:
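A minimal sketch of such an IBD counter, not the original listing. It assumes each person is stored as two equal-length `int` arrays (maternal and paternal copies) of founder labels, where equal labels at a position mean IBD. The class name `IbdCounter`, its methods, and the toy genome are hypothetical.

```java
/** Hypothetical sketch of an IBD (HIR + FIR) counter; not the author's original code. */
final class IbdCounter {

    /**
     * Counts matching copies rather than matching positions: a half-identical
     * position adds 1 and a fully identical position adds 2.
     */
    static long countIbd(int[] mat1, int[] pat1, int[] mat2, int[] pat2) {
        long count = 0;
        for (int i = 0; i < mat1.length; i++) {
            if (mat1[i] == mat2[i]) {
                count++; // maternal copies match
            }
            if (pat1[i] == pat2[i]) {
                count++; // paternal copies match
            }
        }
        return count;
    }

    /** Percent shared out of two copies per position. */
    static double percentShared(long count, int positions) {
        return 100.0 * count / (2.0 * positions);
    }

    public static void main(String[] args) {
        // Toy siblings: position 0 is FIR, position 1 is unshared,
        // positions 2 and 3 are HIR.
        int[] mat1 = {1, 1, 2, 2};
        int[] pat1 = {3, 4, 3, 4};
        int[] mat2 = {1, 2, 2, 1};
        int[] pat2 = {3, 3, 4, 4};
        long count = countIbd(mat1, pat1, mat2, pat2);
        System.out.println(percentShared(count, mat1.length)); // 50.0
    }
}
```

Like the counters described earlier, this sketch only compares maternal with maternal and paternal with paternal, so it would miss sharing that crosses from one person's paternal side to the other's maternal side.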
I hope you’ve found this information helpful. And I really hope that I start hearing people answer the above question with “AncestryDNA counts FIR as if it’s HIR” and not that other typical answer, which insinuates that 23andMe reports misleading information by double-counting shared DNA.
Feel free to ask me about modeling & simulation, genetic genealogy, or genealogical research. And make sure to check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match peer-reviewed standard deviations. That model was also used to make a very accurate relationship prediction tool. Or, try a calculator that lets you find the amount of an ancestor’s DNA you have when combining multiple kits.
Originally published at http://www.dna-sci.com.