Converting from 23andMe to VCF
The raw 23andMe results can be downloaded as a tab-delimited file with four columns: the marker ID, chromosome name, position, and genotype:
rs6139074	20	63244	AA
rs1418258	20	63799	CC
rs6086616	20	68749	TT
rs6039403	20	69094	AG
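Before converting, it can help to confirm that the file really has the four-column layout described above. The following is only a minimal sketch: check_23andme_tsv is a hypothetical helper (not part of bcftools), it assumes an uncompressed copy of the file, and it assumes that lines starting with "#" are comments to be ignored. The demo rows are the ones shown above, with one genotype changed to "--" and one malformed row added for illustration.

```shell
# check_23andme_tsv: hypothetical helper, not part of bcftools.
# Counts well-formed rows, malformed rows, and rows with a missing genotype.
check_23andme_tsv() {
    awk -F'\t' '
        /^#/       { next }          # assumption: "#" lines are comments
        NF != 4    { bad++; next }   # not exactly four tab-separated columns
        $4 == "--" { missing++ }     # missing genotype
                   { rows++ }        # every well-formed row, missing or not
        END { printf "rows=%d malformed=%d missing=%d\n", rows, bad, missing }
    ' "$1"
}

# Demo on a small temporary file (illustrative data only).
tmp=$(mktemp)
printf 'rs6139074\t20\t63244\tAA\n'  > "$tmp"
printf 'rs1418258\t20\t63799\tCC\n' >> "$tmp"
printf 'rs6086616\t20\t68749\t--\n' >> "$tmp"   # genotype altered for the demo
printf 'badrow 20 69094\n'          >> "$tmp"   # spaces instead of tabs
check_23andme_tsv "$tmp"    # prints: rows=3 malformed=1 missing=1
rm -f "$tmp"
```

A non-zero malformed count usually means the file is space-delimited or truncated and should be fixed before running the conversion.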
A dot (“.”) can be used as the marker ID if it is not known. The conversion command is then:
bcftools convert --tsv2vcf input.tab.gz -f ref.fa -s SampleName -Ob -o sample.bcf
It is important to check the output printed on the screen, which may look like this, for example:
Rows total: 612647
Rows skipped: 4751
Missing GTs: 20525
Hom RR: 318339
Het RA: 165598
Hom AA: 103420
Het AA: 14
Here the program converted more than 95% of the rows. In this example, 20525 genotypes are missing (“--”) and 4751 sites were skipped because the tool considers only SNPs and ignores insertions and deletions.
One should also check the number of non-reference heterozygous genotypes, which
was 14 in this example. It should be small like this, because a large number of
non-reference heterozygous genotypes (Het AA) indicates that the input alleles
are not on the forward strand. This can be checked explicitly using the fixref plugin.
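The two checks described above are simple arithmetic on the printed counts. As a rough sketch, the counts can be tallied with awk; summarize_tsv2vcf is a hypothetical helper name, and the input is assumed to be the summary text in the "Label: count" form shown in the example.

```shell
# summarize_tsv2vcf: hypothetical helper, not part of bcftools.
# Reads the conversion summary on stdin and reports how many rows
# received a genotype and how many of those are Het AA.
summarize_tsv2vcf() {
    awk -F':' '
        { n[$1] = $2 + 0 }    # label -> count
        END {
            total  = n["Rows total"]
            called = n["Hom RR"] + n["Het RA"] + n["Hom AA"] + n["Het AA"]
            printf "called=%d (%.2f%% of rows)\n", called, 100 * called / total
            printf "het_AA=%d (%.4f%% of called)\n", n["Het AA"], 100 * n["Het AA"] / called
        }'
}

# Counts copied from the example output above.
summary='Rows total: 612647
Rows skipped: 4751
Missing GTs: 20525
Hom RR: 318339
Het RA: 165598
Hom AA: 103420
Het AA: 14'

printf '%s\n' "$summary" | summarize_tsv2vcf
# prints:
#   called=587371 (95.87% of rows)
#   het_AA=14 (0.0024% of called)
```

The four genotype classes, the missing genotypes and the skipped rows add up to the total (587371 + 20525 + 4751 = 612647), which is a quick way to confirm that the summary was read correctly.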
Merging into multi-sample VCFs
Single-sample VCFs created in the previous step can be merged into one multi-sample VCF using the following commands. The input files must be indexed first; they can then be merged:
bcftools index sampleA.bcf
bcftools index sampleB.bcf
bcftools merge -Ob -o output.bcf sampleA.bcf sampleB.bcf
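With more than a couple of samples, the same two steps can be wrapped in a small shell function. This is only a sketch: merge_samples is a hypothetical name, and the -f option to bcftools index is added here so that an existing index is overwritten rather than causing an error.

```shell
# merge_samples: hypothetical wrapper around the index and merge steps.
# Usage: merge_samples output.bcf sampleA.bcf sampleB.bcf [...]
merge_samples() {
    out=$1
    shift
    for f in "$@"; do
        # Every input needs an index; -f overwrites an existing one.
        bcftools index -f "$f" || return 1
    done
    # Merge all indexed inputs into one compressed BCF.
    bcftools merge -Ob -o "$out" "$@"
}
```

For example, merge_samples output.bcf sampleA.bcf sampleB.bcf runs the same three commands as above. At least two input files should be given, as in the original example.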