Sunday, April 26, 2015

Blogging My Genome, episode 7: sifting for bad news

This is the seventh in a series of blog posts about my genome, which I had sequenced through Illumina's Understand Your Genome program.

Hey, it's been a while! We've been unbelievably busy at my company, but I've been slowly plugging away at my genome analysis. When I last blogged, I'd finished identifying the small variants in my genome (those affecting just one or a few DNA nucleotides). That takes us into an interesting new phase of analysis: interpreting the consequences of those variants in light of existing knowledge of human genetics. I previously went into depth on one particular variant I'd known to look for, but now we'll sift through the others in my VCF file, nearly four million of them!

I began with Ensembl's Variant Effect Predictor (VEP), one of several available tools that annotate VCF variants with their likely consequences for known genes and other genomic features. VEP produces a new VCF file with this additional information crammed into each entry, like so:

1       871215  .       C       G       1357.81 .       AB=0;ABP=0;AC=2;AF=1;AN=2;AO=45;CIGAR=1X;DP=
0261008|0.00232558|||||||||||||||G:0.0629|||||0.03|0.08|0.16|0.0026|        GT:DP:RO:QR:AO:QA:GL    
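The jumble of pipes is less opaque than it looks. VEP writes its annotations into the INFO column under a key (CSQ by default), with one comma-separated annotation per affected transcript or feature, and the pipe-separated subfields in each annotation follow the order declared in a "Format:" list in the corresponding ##INFO header line. Here is a minimal Python sketch of unpacking that, assuming the default CSQ key and that header layout; the header and record below are made-up, shortened examples, not lines from my actual file.

```python
import re

def csq_fields(header_line):
    """Extract the ordered CSQ subfield names from a VEP ##INFO header line."""
    # Assumes a Description of the form "... Format: Allele|Consequence|...">
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' list found in header line")
    return match.group(1).split("|")

def parse_csq(record_line, fields, key="CSQ"):
    """Return one dict per annotation found in a record's CSQ INFO entry."""
    columns = record_line.rstrip("\n").split("\t")
    info = columns[7]  # INFO is the 8th fixed VCF column
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            # Several annotations (e.g. one per transcript) are comma-separated;
            # empty subfields simply come back as empty strings.
            return [dict(zip(fields, ann.split("|"))) for ann in value.split(",")]
    return []

# Hypothetical, heavily shortened header and record for illustration only.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
record = ("1\t100\t.\tC\tG\t50\t.\tDP=20;CSQ=G|missense_variant|MODERATE|GENE1,"
          "G|intron_variant|MODIFIER|GENE2\tGT\t1/1")

fields = csq_fields(header)
annotations = parse_csq(record, fields)
for ann in annotations:
    print(ann["SYMBOL"], ann["Consequence"])
# GENE1 missense_variant
# GENE2 intron_variant
```

Real VEP headers list many more subfields, but because the sketch reads the order from the header rather than hard-coding it, the same approach should carry over.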

That's really not pretty, but VEP also produces a nice series of summary charts. For example, it breaks down putative consequences of my variants in protein-coding sequences.
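To get a rough feel for how a breakdown like that is put together, one can tally the Consequence terms directly from the annotated VCF. The Python sketch below assumes the default CSQ key, a "Format:" list in its ##INFO header, and VEP's habit of joining multiple terms for one annotation with "&". It counts once per annotation (transcript) rather than once per variant, so its numbers are not guaranteed to match VEP's own summary, which may count differently. The tiny VCF is made up for illustration.

```python
import re
from collections import Counter

def consequence_counts(vcf_lines, key="CSQ"):
    """Tally VEP consequence terms across all annotations in a VCF.

    Assumes an ##INFO header for `key` whose Description contains
    'Format: field1|field2|...' and a 'Consequence' subfield whose
    terms are joined with '&'. Counts one per annotation, not per variant.
    """
    counts = Counter()
    fields = None
    for line in vcf_lines:
        if line.startswith(f"##INFO=<ID={key},"):
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                fields = match.group(1).split("|")
            continue
        if line.startswith("#") or not line.strip():
            continue  # other header lines, column header, blank lines
        if fields is None:
            raise ValueError(f"no {key} header with a Format list before the first record")
        col = fields.index("Consequence")
        info = line.rstrip("\n").split("\t")[7]  # 8th column is INFO
        for entry in info.split(";"):
            name, _, value = entry.partition("=")
            if name != key:
                continue
            for ann in value.split(","):
                parts = ann.split("|")
                if col < len(parts) and parts[col]:
                    counts.update(parts[col].split("&"))
    return counts

# Hypothetical miniature VCF for illustration only.
vcf = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tC\tG\t50\t.\tCSQ=G|missense_variant|GENE1,G|intron_variant|GENE2",
    "1\t200\t.\tA\tT\t50\t.\tCSQ=T|splice_region_variant&intron_variant|GENE2",
]

counts = consequence_counts(vcf)
for term, n in counts.most_common():
    print(term, n)
```

On a real file, passing an open file handle works too, since iterating it yields lines; for a bgzipped VCF, something like `gzip.open(path, "rt")` would be needed first (the path being whatever your annotated output is called).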