I was pretty excited when 23andMe first appeared on the scene, offering genomic scanning to the masses. In the years since, beyond distributing several 23andMe kits as gifts and introducing people to Promethease, nothing much happened. The whole FDA debacle didn't help. But after we decided that we wanted to have children, my interest surged again, and right about the same time, Genos Research began (briefly) to offer "whole exome scanning". The exome is the part of the genome that codes for proteins; it constitutes about 50M base pairs, whereas the microarray-based scans from most other services cover <1M.

Naturally, I signed us up.

After an interminable three months, we received our results, and I reached out to cariaso, one of SNPedia's founders, to see if Promethease could help with telling us anything interesting about our potential progeny. To my surprise, he replied that he'd removed that very function, because of challenges around handling "compound heterozygosity and intron boundaries." I realized that I had some reading to do.

But because code is easier than research, I started there instead.

Genos offers the option to download your data in a Promethease-compatible format, but it turned out to contain surprisingly little data. There are fewer than 50k locations reported, compared to the current 23andMe scan that assays >600k. I assume they've filtered out just the locations that Promethease can report useful data about, but it still seems small, given that SNPedia has data on nearly 100k SNPs. Could the rest really all be introns?

Since that was again threatening to turn into research, I decided to instead try combining the Genos data with my existing 23andMe data.

Oy vey.

I don't, it turns out, have two formats of data to deal with. I have three, because my 23andMe raw data has scans from the first three versions of their microarray, whereas Eden's is from their fourth. And look:

# rsid chromosome position genotype
i6059704 1 2541269 AA
rs10797440 1 2541269 AG

They changed the name! Now, you can understand that they might have started with an internal name (i...) and then switched to the official name (rs...). But then:

# rsid chromosome position genotype
i6015169 1 11855171 AA
i5003529 1 11855171 AA
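Spotting these by eye doesn't scale, so here is a minimal Python sketch of one way to find them: parse the rows, group them by chromosome and position, and report every location that carries more than one identifier. The names `Call`, `parse_23andme`, and `aliases_by_position` are made up for illustration, and the parser assumes nothing about the raw data beyond the four whitespace-separated columns shown above.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Call(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_23andme(lines: Iterable[str]) -> list[Call]:
    """Parse four-column rows, skipping blank lines and '#' comments."""
    calls = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split()
        calls.append(Call(rsid, chromosome, int(position), genotype))
    return calls


def aliases_by_position(calls: Iterable[Call]) -> dict[tuple[str, int], list[Call]]:
    """Return only the locations that appear under more than one identifier."""
    by_position = defaultdict(list)
    for call in calls:
        by_position[(call.chromosome, call.position)].append(call)
    return {
        where: group
        for where, group in by_position.items()
        if len({call.rsid for call in group}) > 1
    }


# The rows quoted above, concatenated as if the scans had been merged.
sample = """\
# rsid chromosome position genotype
i6059704 1 2541269 AA
rs10797440 1 2541269 AG
i6015169 1 11855171 AA
i5003529 1 11855171 AA
"""

aliases = aliases_by_position(parse_23andme(sample.splitlines()))
for (chromosome, position), group in aliases.items():
    names = ", ".join(f"{call.rsid}={call.genotype}" for call in group)
    print(f"chr{chromosome}:{position} -> {names}")
```

Printing the genotype next to each identifier matters, because two names at one position do not necessarily come with the same call.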

And merging Genos and 23andMe isn't always trivial, since they often disagree about *which* SNP is at which position.

# rsid chromosome position genotype
rs2066472 1 11862971 CC

chr1 11862971 11862972 rs763539350 G/G

Maybe this could be explained if they were using a different location reference for the human genome, but AFAICT they don't seem to be. Such confusion. Luckily, Genos also offered a raw data download totalling 13GiB in formats that I don't yet understand, but maybe there's some hope for working with those in future.
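Besides the reference build, one more thing worth ruling out is the coordinate convention. The Genos row has separate start and end columns, which resembles the BED layout, and BED as UCSC defines it counts the start from 0 with an exclusive end, while 23andMe positions count from 1. Whether the Genos export really follows BED is an assumption, not something the file states, so this Python sketch makes the convention an explicit flag and compares the two rows both ways. The names `Variant`, `parse_23andme_row`, `parse_genos_row`, and `compare` are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chromosome: str  # stored without a "chr" prefix
    position: int    # 1-based
    name: str
    genotype: str    # alleles sorted and joined, e.g. "AG"


def parse_23andme_row(line: str) -> Variant:
    """Parse 'rsid chromosome position genotype'."""
    rsid, chromosome, position, genotype = line.split()
    return Variant(chromosome, int(position), rsid, "".join(sorted(genotype)))


def parse_genos_row(line: str, zero_based_start: bool) -> Variant:
    """Parse 'chrN start end name A/B'.

    ASSUMPTION: if zero_based_start is True the row is treated as a BED-style
    half-open interval, so the 1-based position is start + 1. Otherwise the
    start column is taken to be 1-based already.
    """
    chromosome, start, _end, name, genotype = line.split()
    if chromosome.startswith("chr"):
        chromosome = chromosome[3:]
    position = int(start) + 1 if zero_based_start else int(start)
    alleles = genotype.split("/")
    return Variant(chromosome, position, name, "".join(sorted(alleles)))


def compare(a: Variant, b: Variant) -> str:
    """Describe how two records relate by location and by name."""
    if (a.chromosome, a.position) != (b.chromosome, b.position):
        return "different position"
    if a.name != b.name:
        return "same position, different name"
    return "same position, same name"


mine = parse_23andme_row("rs2066472 1 11862971 CC")
for zero_based in (False, True):
    theirs = parse_genos_row("chr1 11862971 11862972 rs763539350 G/G", zero_based)
    print(f"zero_based_start={zero_based}: {theirs.position} -> {compare(mine, theirs)}")
```

Read at face value, the two rows share a position and disagree on the name. Read as a BED interval, the Genos row lands one base further along and the two records no longer describe the same location. The sketch cannot say which reading is right, only that the answer changes what "disagree" means.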
At any rate, after much banging and cursing, I do have something that generates superficially interesting data for potential progeny, given a pair of genome scans from Genos or 23andMe. Try it out?
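For the curious, the simplest version of such a prediction is a per-SNP Punnett square. The Python sketch below is not the tool itself, only an illustration under strong assumptions: autosomal SNPs, two-letter unphased genotypes, each parent passing on either allele with equal probability, and every SNP treated independently. That last assumption is exactly what breaks down for compound heterozygosity. The function names and the `rs_example_*` identifiers are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BASES = set("ACGT")


def is_diploid_call(genotype: str) -> bool:
    """True for two-letter calls such as 'AG'; False for no-calls like '--'."""
    return len(genotype) == 2 and set(genotype) <= BASES


def offspring_distribution(parent_a: str, parent_b: str) -> dict[str, Fraction]:
    """Cross two unphased diploid genotypes and return child genotype odds."""
    if not (is_diploid_call(parent_a) and is_diploid_call(parent_b)):
        raise ValueError("expected two-letter genotypes over A, C, G, T")
    # Each of the four allele pairings is equally likely; sorting the pair
    # makes 'GA' and 'AG' count as the same genotype.
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    return {genotype: Fraction(n, 4) for genotype, n in sorted(counts.items())}


def cross_scans(scan_a: dict[str, str], scan_b: dict[str, str]) -> dict[str, dict[str, Fraction]]:
    """Cross every SNP that both scans report with a usable call."""
    result = {}
    for rsid in sorted(scan_a.keys() & scan_b.keys()):
        a, b = scan_a[rsid], scan_b[rsid]
        if is_diploid_call(a) and is_diploid_call(b):
            result[rsid] = offspring_distribution(a, b)
    return result


# Made-up genotypes, keyed by made-up identifiers, purely for illustration.
parent_one = {"rs_example_1": "AG", "rs_example_2": "CC", "rs_example_3": "--"}
parent_two = {"rs_example_1": "AG", "rs_example_2": "CT", "rs_example_3": "TT"}

for rsid, odds in cross_scans(parent_one, parent_two).items():
    summary = ", ".join(f"{genotype}: {p}" for genotype, p in odds.items())
    print(f"{rsid} -> {summary}")
```

Keying the scans by identifier is the weak point, given the naming mess described above; keying by chromosome and position instead would trade one set of mismatches for another.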
Miki Habryn

April 2017


