DNA companies’ database size
One of the most important goals of DNA testing companies is to have the largest possible database. If the database is small, the tests lack the data needed for an accurate analysis. It is also crucial to have a diverse database that covers all corners of the world (the more countries, the better). A large number of samples from one country is of little use if the company has no data from other countries.
For this reason, I contacted the 5 biggest DNA testing companies and asked them to reveal the size of their database (click here to read the emails).
The following table presents the database sizes based on the companies’ replies to me by email:
|LivingDNA||No data provided|
However, being a suspicious person, I wasn’t satisfied with the companies’ feedback.
With the help of Google’s Keyword Planner tool, I checked the search volume for various DNA testing companies and compared them. We can assume that if a company is rarely searched on Google, it is less well known; consequently, it has fewer users, so its database cannot be large. Conversely, if a company has a high search volume, it is most probably popular among consumers. I am aware that this is not a perfect method, but it still allows us to draw a useful conclusion.
Let’s see the results:
Most interestingly, the database sizes and the Google search volumes correlate with each other: the more frequently a brand was searched, the larger the database it reported.
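If you want to put a number on a claim like this, a rank correlation is a natural fit, because it only asks whether the two orderings agree. Below is a minimal sketch of Spearman’s rank correlation in plain Python. The figures in it are hypothetical placeholders, not the real search volumes or the companies’ reported database sizes. With only five companies, even a perfect score is a sanity check rather than proof.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# HYPOTHETICAL placeholder values, one entry per company (not real data).
search_volume = [500_000, 400_000, 150_000, 60_000, 20_000]
database_size = [18_000_000, 12_000_000, 4_000_000, 1_500_000, 200_000]

rho = spearman(search_volume, database_size)
print(f"Spearman rho = {rho:.2f}")
```

A rho near +1 means the company ranked first by search volume is also first by database size, and so on down the list.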
It also turned out that the two most frequently searched companies (AncestryDNA and 23andMe) do have the largest databases. They are followed by MyHeritage, then FamilyTreeDNA, and finally LivingDNA.
This is why I believe that the companies provided valid data.
Genealogy and my family history research
In essence, DNA testing companies base their ethnicity estimates on reference groups assembled for each region (country), and they compare our DNA with these groups. Reference group members are mostly people who know their family trees, whose families have lived in a certain region (country) for several generations, and whose ancestors have a fairly homogeneous ethnic background.
Since I know my ancestors seven generations back, I could, in principle, be an ideal reference group member. However, let’s see what my family tree actually says: I am 70% Hungarian and 25% Croatian, and I also have some Slovakian ancestors.
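Family-tree percentages like these are expected values. In a complete generation of 2^g ancestors (4 grandparents, 8 great-grandparents, and so on), each ancestor contributes an expected 1/2^g of our DNA, so at seven generations each of the 128 ancestors stands for about 0.78%. The sketch below shows the arithmetic for one generation. The pedigree in it is made up for illustration.

```python
from collections import Counter


def expected_ancestry(ancestors):
    """Expected ethnicity shares (percent) from one full generation of ancestors.

    `ancestors` lists the origin of every ancestor in a single generation
    (4 grandparents, 8 great-grandparents, ..., 2**g ancestors at generation g).
    Each one contributes an expected 1 / 2**g of our DNA, i.e. an equal share.
    """
    n = len(ancestors)
    # A complete generation always has a power-of-two number of ancestors.
    if n == 0 or n & (n - 1):
        raise ValueError("a full generation has a power-of-two number of ancestors")
    counts = Counter(ancestors)
    return {origin: 100 * c / n for origin, c in counts.items()}


# HYPOTHETICAL pedigree of 8 great-grandparents, made up for illustration.
great_grandparents = ["Hungarian"] * 5 + ["Croatian"] * 2 + ["Slovakian"]
print(expected_ancestry(great_grandparents))
# {'Hungarian': 62.5, 'Croatian': 25.0, 'Slovakian': 12.5}
```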
Note: Before proceeding to the in-depth analysis, it is important to mention that we inherit 50% of our DNA from each parent. This does not, however, mean that we inherit exactly 25% from each grandparent.
In an ideal case, we would inherit 25% from each of our four grandparents. In practice, we can easily inherit 30% from one grandparent, 20% from another, 15% from the third, and 35% from the fourth, while the two grandparents on each side still add up to 50%.
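Why does the grandparent split vary? When a parent passes on a chromosome, crossovers stitch it together from segments of that parent’s father and mother. The toy simulation below models this under loudly simplifying assumptions: 22 autosomes of equal genetic length (1.6 Morgans each, a rough placeholder), crossovers placed as a Poisson process with no interference, and no difference between male and female recombination. It illustrates the mechanism. It is not a calibrated model.

```python
import random

# SIMPLIFYING ASSUMPTIONS (illustration only): 22 autosomes of equal
# genetic length, crossovers as a Poisson process (no interference).
CHROMOSOME_LENGTHS_MORGANS = [1.6] * 22


def transmitted_share(rng, lengths=CHROMOSOME_LENGTHS_MORGANS):
    """Fraction of one transmitted (haploid) genome that comes from the
    parent's father rather than the parent's mother."""
    from_father = 0.0
    total = sum(lengths)
    for length in lengths:
        position = 0.0
        source_is_father = rng.random() < 0.5  # which copy the chromosome starts on
        while True:
            # Distance to the next crossover: exponential, rate 1 per Morgan.
            end = min(position + rng.expovariate(1.0), length)
            if source_is_father:
                from_father += end - position
            if end >= length:
                break
            position = end
            source_is_father = not source_is_father  # a crossover switches copy
    return from_father / total


def grandparent_shares(rng):
    """Percent of a child's genome from each grandparent.
    Each parent transmits half of the child's genome."""
    pat = transmitted_share(rng)
    mat = transmitted_share(rng)
    return {
        "paternal grandfather": 50 * pat,
        "paternal grandmother": 50 * (1 - pat),
        "maternal grandfather": 50 * mat,
        "maternal grandmother": 50 * (1 - mat),
    }


rng = random.Random(2020)
for _ in range(3):
    shares = grandparent_shares(rng)
    print({k: round(v, 1) for k, v in shares.items()})
```

Each simulated child gets a different split, but the two grandparents on the same side always add up to 50%, which is exactly the pattern in the example above.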
Nevertheless, we can expect the DNA results to broadly agree with the family tree. Let’s see which DNA testing companies managed to show that!
AncestryDNA test result
According to Ancestry’s results, I’m 97% Eastern European, which is true.
It also identifies exactly which Eastern European nations my roots come from: Hungary, Croatia, and Slovakia. The map visualization of these roots is correct as well.
According to this DNA test, I’m 83% European, more specifically 77% Eastern European (which includes Hungary), but also 7% Middle Eastern (Turkey) and 6% Sephardic Jewish. This is rather strange, because I have found no evidence of such ancestors in the last 300 years.
Overall, I think this company’s DNA test report is rather general, and I have no roots in Turkey and no Jewish ancestors.
This company’s results seem a little more accurate: according to its calculations, I’m 50% Balkan and 25% Eastern European (which includes Hungary). However, these proportions are the reverse of what my genealogy research shows.
In my opinion, the Greek, Southern Italian, and Central Asian roots are not correct.
According to this company’s results, I’m mostly Eastern European (52%), which is somewhat low compared with my genealogy research.
The 25% Croatian ancestry it reports seems accurate. However, I have no French or German ancestors.
This company’s report is the most general, probably because of its small database. To be fair, it did pick up my Hungarian ancestry by placing me in the Eastern European part of the map. However, none of its other ethnicity estimates are accurate.
From a genealogy perspective, I can declare AncestryDNA the winner.
My own ethnicity calculation
Besides the database size, there is another important factor: data analysis and ethnicity calculation. How is our DNA analyzed and compared with the reference groups? What method is used, and how sophisticated is it?
This is a very complex mathematical and programming task, which you can read about here.
In short, the DNA comparison is based on Principal Component Analysis (PCA). But what is PCA? It is a technique that reduces the dimensionality of a data set in which a large number of variables are interrelated, while retaining as much of the variation in the data as possible. (Source)
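To make the idea concrete, here is a minimal PCA sketch using NumPy. Genotypes are coded as allele counts (0, 1, or 2) per SNP. Each SNP column is centered, and the singular value decomposition gives the principal axes and each sample’s coordinates. The two "populations" are simulated toy data. This illustrates PCA in general, not the specific pipeline of any company or researcher; real pipelines typically add steps such as per-SNP scaling and quality filtering.

```python
import numpy as np


def pca(genotypes, n_components=2):
    """Project samples onto their top principal components.

    genotypes: (n_samples, n_snps) array of allele counts (0, 1 or 2).
    Returns (scores, explained_variance_ratio).
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column
    # SVD of the centered matrix: rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    explained = S**2 / np.sum(S**2)  # share of total variance per component
    return scores, explained[:n_components]


# Toy data (HYPOTHETICAL): two simulated populations with different
# allele frequencies at 500 SNPs, 20 individuals each.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=500)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, size=500), 0.01, 0.99)
pop_a = rng.binomial(2, freq_a, size=(20, 500))
pop_b = rng.binomial(2, freq_b, size=(20, 500))

scores, explained = pca(np.vstack([pop_a, pop_b]), n_components=2)
print(scores[:3])
print(explained)
```

On data like this, the first component separates the two simulated groups, turning 500 interrelated variables into one or two coordinates per person.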
An amateur genetics researcher, David Wesolowski, designed a PCA-based algorithm that calculates a sample’s position relative to other samples in 25 dimensions of genetic variation. Based on these positions, we can model ancestry and create genetic maps.
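Once every sample has coordinates, "modeling ancestors" can be framed as finding a mix of reference populations whose weighted average lies closest to the target. The sketch below is a deliberately simple version of that idea: a grid search over two-source mixtures using Euclidean distance. It is not Wesolowski’s method or any specific tool. The coordinates are hypothetical and use 3 dimensions instead of 25 to keep them readable.

```python
import math


def best_two_way_mix(target, source_a, source_b, step=0.01):
    """Grid-search the weight w in [0, 1] that minimizes the Euclidean
    distance between `target` and w * source_a + (1 - w) * source_b."""
    best_w, best_d = 0.0, float("inf")
    n_steps = round(1 / step)
    for i in range(n_steps + 1):
        w = i / n_steps
        mix = [w * a + (1 - w) * b for a, b in zip(source_a, source_b)]
        d = math.dist(target, mix)
        if d < best_d:
            best_w, best_d = w, d
    return best_w, best_d


# HYPOTHETICAL 3-D coordinates (real ones would have 25 dimensions).
pop_x = [0.10, -0.02, 0.05]
pop_y = [0.02, 0.04, -0.01]
target = [0.08, -0.005, 0.035]  # constructed as 75% pop_x + 25% pop_y

w, d = best_two_way_mix(target, pop_x, pop_y)
print(f"{w:.0%} pop_x, {1 - w:.0%} pop_y, distance {d:.4f}")
```

A grid search is easy to follow but does not scale to many sources; with more reference populations, a constrained least-squares fit would be the more practical choice.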
But where can we “borrow” DNA samples and data? The answer is Reich’s Harvard DNA database, a public database containing 5,637 anonymized present-day DNA samples together with an important piece of metadata: each sample’s country of origin.
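Since each sample comes with its country of origin, one simple way to turn thousands of individual samples into reference points is to average the coordinates of all samples from the same country. The helper below does that. Both the approach and the data are assumptions for illustration: the coordinates and country labels are made up, and the real data set’s file format is not shown here.

```python
from collections import defaultdict


def population_centroids(samples):
    """Average the coordinates of all samples sharing a country label.

    samples: iterable of (country, coordinates) pairs.
    Returns {country: mean coordinate vector}.
    """
    sums = {}
    counts = defaultdict(int)
    for country, coords in samples:
        if country not in sums:
            sums[country] = [0.0] * len(coords)
        for i, value in enumerate(coords):
            sums[country][i] += value
        counts[country] += 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


# HYPOTHETICAL 2-D coordinates and placeholder labels, for illustration only.
samples = [
    ("Country A", [0.010, -0.020]),
    ("Country A", [0.014, -0.016]),
    ("Country B", [0.020, -0.030]),
    ("Country B", [0.024, -0.026]),
]
print(population_centroids(samples))
```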
So, I compared my DNA with Harvard’s DNA samples, and here is the result:
According to the comparison, the two nearest ethnicities are Hungarian and Croatian (at around -1.0 on the y-axis). By the way, according to my genealogy research, this is 100% true!
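The "nearest ethnicities" in a result like this can be found by ranking reference populations by their Euclidean distance to a person’s coordinates. Here is a minimal sketch. The population labels and coordinates are hypothetical placeholders (3 dimensions instead of 25), and the actual comparison may have computed distances differently.

```python
import math


def nearest_populations(target, references, k=2):
    """Rank reference populations by Euclidean distance to `target`
    and return the k closest as (name, distance) pairs."""
    ranked = sorted(references.items(), key=lambda item: math.dist(target, item[1]))
    return [(name, math.dist(target, coords)) for name, coords in ranked[:k]]


# HYPOTHETICAL coordinates; real ones would have 25 dimensions.
references = {
    "pop_A": [0.012, -0.018, 0.004],
    "pop_B": [0.022, -0.028, 0.006],
    "pop_C": [0.040, 0.050, -0.020],
    "pop_D": [-0.030, -0.060, 0.030],
}
me = [0.015, -0.020, 0.005]

for name, dist in nearest_populations(me, references):
    print(f"{name}: {dist:.4f}")
```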
Here is a zoomed-in screenshot (Andras is me):
Based on my ethnicity calculation, my family tree research, and my database size investigation, the best DNA testing company at the moment (2020) is AncestryDNA!