The raw data wall
NOTE: This blog’s recommendation of GEDmatch has been withdrawn due to privacy issues. See “Withdrawing a recommendation,” The Legal Genealogist (https://www.legalgenealogist.com/blog : posted 15 May 2019).
Okay, so it’s not the 12th of June 1987. This isn’t the Brandenburg Gate. I’m not the President of the United States, and you’re not the Premier of the Soviet Union. But hey… c’mon, AncestryDNA…
Tear down this wall!
It’s been just about two months now since I got the results of my AncestryDNA autosomal DNA testing. That’s the kind of test that works across gender lines and helps identify cousins with whom you can share genealogical data to try to find your common ancestors.1
And, I have to say, I’d be a lot happier with what I can do with the results of this test if I actually had the results of this test.
The real underlying raw data results.
You see, there’s only so much you can do with a system that’s built around matching family trees that people have uploaded.
In my case, for example, I have 24 matches in the 4th-6th cousin range with a 95% or 96% confidence level. Five of those 24 haven’t uploaded any family tree at all. Seven have private trees. One has a family tree with exactly three people in it. So for more than half of my matches, I can’t find out anything that’s useful just by opening the link to that match.
I have surnames in common with some of the others. You know, those rare surnames like Jones or Johnson or Robertson or Baker. And we share some locations… like the State of Texas or the State of North Carolina.
These are not things that are moving me very far down the road very quickly.
So let me tell you, AncestryDNA, that I need more from you. I really want my raw data. And, even more, I want all your other tested folks to have theirs.
At Family Tree DNA, my Family Finder raw data is 7.4 MB of compressed data that, when extracted, becomes a 23.9 MB plain text file in CSV format that can be loaded into a text editor or a spreadsheet. The image above is a very small snippet of my own raw data loaded into a spreadsheet program: just a handful of results from a few of the thousands and thousands of spots in my autosomal DNA that were sampled and recorded in this test.
At 23andMe, my Relative Finder raw data is 7.8 MB of compressed data that decompresses into 24.7 MB of plain text: line after line after line of identifiers (the RS ID is a reference SNP ID number2), chromosome numbers, positions and just which two of the four possible results (A, C, G and T) were found at each position.
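For anyone who wants to see what working with such a file looks like, here is a minimal Python sketch of reading it. It assumes four columns per line (RS ID, chromosome, position, and the two-letter result), tab-separated by default, with comment lines starting with `#`. The names `SnpCall` and `read_raw_data` are made up for this illustration, and the sample rows are invented, not real data. Passing `delimiter=","` should handle a CSV file laid out the same way, though the exact layout of any given company's download is an assumption here.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class SnpCall(NamedTuple):
    """One line of raw data: a single sampled spot in the genome."""
    rsid: str        # reference SNP ID, e.g. "rs0000001"
    chromosome: str  # kept as text because of values like "X"
    position: int
    genotype: str    # the two results found, e.g. "CT"


def read_raw_data(lines: Iterable[str], delimiter: str = "\t") -> Iterator[SnpCall]:
    """Yield one SnpCall per data line (hypothetical helper, assumed layout)."""
    # Drop blank lines and '#' comment lines before handing off to csv.reader.
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(data_lines, delimiter=delimiter):
        # Skip a header row or anything that is not four fields with a numeric position.
        if len(row) != 4 or not row[2].strip().isdigit():
            continue
        rsid, chromosome, position, genotype = (field.strip() for field in row)
        yield SnpCall(rsid, chromosome, int(position), genotype)


# Invented sample in the assumed layout; not real results.
SAMPLE = (
    "# made-up sample for illustration\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tCT\n"
    "rs0000003\tX\t3000\tGG\n"
)

calls = list(read_raw_data(io.StringIO(SAMPLE)))
for call in calls:
    print(call.rsid, call.chromosome, call.position, call.genotype)
```

To read a real download instead of the sample, open the extracted file and pass the file object in place of `io.StringIO(SAMPLE)`.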
Now because I have both of those files, I can play around with my results in all kinds of different ways. Most particularly, I can upload them to third-party utility sites and get much more benefit out of having tested my autosomal DNA.
One of my favorite sites — one I’ve written about before3 — is GedMatch.com. It’s become so popular that its hosting company is giving it running fits and it’s trying to raise enough funds through donations to keep all its features available. (I’ve donated — how about you?) But you know what, AncestryDNA? There’s a reason that GedMatch is so popular — and being able to look at the raw data in lots of different ways — including, but not limited to, comparing it to data of people who’ve tested with other companies — is right there at the top of the list.
I understand this isn’t a high priority for you, AncestryDNA. But it is important to those of us who are your customers who are interested in genetic genealogy. And I’m not the only one who says so. Read more from CeCe Moore of Your Genetic Genealogist,6 Razib Khan of Gene Expression,7 Debbie Cruwys Kennett of Cruwys news8 and Blaine Bettinger of The Genetic Genealogist.9 And those are just for starters.
So how ’bout it, AncestryDNA?
We know you can do it.
C’mon now… tear down that wall.
1. See generally Judy G. Russell, “Autosomal DNA testing,” National Genealogical Society Magazine, October-December 2011, 38-43.
2. See Wikipedia (http://www.wikipedia.com), “dbSNP,” rev. 18 Sep 2012.
3. Judy G. Russell, “Gedmatch: a DNA geek’s dream site,” The Legal Genealogist, posted 12 Aug 2012 (https://www.legalgenealogist.com/blog : accessed 18 Sep 2012).
4. Lindsay M. Greenawalt, “Top 10 things to do with your FTDNA raw data,” Confessions of a Cryokid, posted 16 Jun 2011 (http://cryokidconfessions.blogspot.com : accessed 18 Sep 2012).
5. ISOGG Wiki (http://www.isogg.org/wiki), “Autosomal DNA tools,” rev. 2 Dec 2011.
6. CeCe Moore, “Follow Up: Lab Error Responsible for Adoptee’s Confusing Match at AncestryDNA,” Your Genetic Genealogist, posted 24 Aug 2012 (http://www.yourgeneticgenealogist.com : accessed 18 Sep 2012).
7. Razib Khan, “Ancestry.com’s AncestryDNA won’t give you your raw data,” Gene Expression, posted 16 Sep 2012 (http://blogs.discovermagazine.com/gnxp/ : accessed 18 Sep 2012).
8. Debbie Cruwys Kennett, “AncestryDNA’s response to my request for my raw genetic data,” Cruwys news, posted 30 Aug 2012 (http://cruwys.blogspot.com : accessed 18 Sep 2012).
9. Blaine Bettinger, “Problems with AncestryDNA’s Genetic Ethnicity Prediction?,” The Genetic Genealogist, posted 19 Jun 2012 (http://www.thegeneticgenealogist.com : accessed 18 Sep 2012).