Plugin fixref
Warning
Do not use the program blindly; make an effort to understand which strand convention your data uses! Make sure the reason for mismatching REF alleles is not a different reference build! Also, do NOT use bcftools norm --check-ref s for this purpose, as it will result in nonsense genotypes!
This tool helps determine and fix strand orientation. Currently it can collect and print numbers useful in determining the strand convention (the stats mode), swap REF/ALT alleles based on the dbSNP reference SNP ID, i.e. the rs number in the ID column (the id mode), flip or swap non-ambiguous SNPs (the flip mode), or convert from the Illumina TOP strand convention to the forward strand (the top mode).
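The flip and swap operations are easy to confuse. Flipping moves a SNP to the other strand by complementing its alleles (A to T, C to G), while swapping exchanges REF and ALT. A/T and C/G SNPs are ambiguous because flipping them gives back the same allele pair, so their strand cannot be inferred from the alleles alone. The POSIX shell sketch below illustrates these two concepts; the function names are hypothetical and not part of bcftools.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical, not part of bcftools.

# complement_allele: print the complement of a single-base allele
# (A<->T, C<->G). Flipping a SNP to the other strand complements both alleles.
complement_allele() {
    printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca'
}

# is_ambiguous_snp: succeed if the two alleles are complements of each
# other (A/T or C/G). Flipping such a SNP yields the same allele pair,
# so its strand cannot be inferred from the alleles alone.
is_ambiguous_snp() {
    [ "$(complement_allele "$1")" = "$2" ]
}

# Example: classify a few allele pairs
for pair in "A G" "A T" "C G" "C T"; do
    set -- $pair
    if is_ambiguous_snp "$1" "$2"; then
        echo "$1/$2: ambiguous"
    else
        echo "$1/$2: flips to $(complement_allele "$1")/$(complement_allele "$2")"
    fi
done
# A/G: flips to T/C
# A/T: ambiguous
# C/G: ambiguous
# C/T: flips to G/A
```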
Run the stats mode (the default when no -m option is given) to learn the number of REF allele mismatches and the number of non-biallelic sites:
bcftools +fixref test.bcf -- -f ref.fa
Alternatively, REF allele mismatches can be checked with bcftools norm; with --check-ref e it exits with an error at the first mismatching site:
bcftools norm --check-ref e -f /path/to/reference.fasta input.vcf.gz -Ou -o /dev/null
If there are no REF mismatches and the number of multi-allelic sites is small, we are done. If the stats output shows that the VCF is TOP-compatible, the following command converts it from the TOP strand to the forward strand:
bcftools +fixref test.bcf -Ob -o output.bcf -- -f ref.fa -m top
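The TOP strand convention comes from Illumina, which designates unambiguous SNPs by their allele pair: A/C and A/G are TOP, T/C and T/G are BOT, while A/T and C/G SNPs can only be designated by looking at the flanking sequence. A minimal sketch of that classification, assuming single-base alleles; the helper name is hypothetical, and this illustrates the convention rather than the plugin's implementation (ambiguous pairs are only reported, not resolved).

```shell
#!/bin/sh
# illumina_strand (hypothetical helper): classify a SNP allele pair by
# Illumina's TOP/BOT rule. Prints TOP, BOT, AMBIGUOUS or INVALID.
illumina_strand() {
    # Upper-case and sort the two alleles so that their order does not matter
    _sorted=$(printf '%s\n%s\n' "$1" "$2" | tr 'acgt' 'ACGT' | LC_ALL=C sort | tr -d '\n')
    case "$_sorted" in
        AC|AG) echo TOP ;;        # A paired with C or G
        CT|GT) echo BOT ;;        # T paired with C or G
        AT|CG) echo AMBIGUOUS ;;  # needs flanking sequence to decide
        *)     echo INVALID ;;    # identical alleles, non-ACGT bases, or indels
    esac
}

# Example
for pair in "A C" "G A" "T C" "G T" "A T"; do
    set -- $pair
    echo "$1/$2: $(illumina_strand "$1" "$2")"
done
# A/C: TOP
# G/A: TOP
# T/C: BOT
# G/T: BOT
# A/T: AMBIGUOUS
```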
If the file contains dbSNP reference identifiers (rsXXX in the ID column), the following commands can be used to swap the reference and alternate alleles:
# Get the dbSNP annotation file. Make sure the correct reference build is used (e.g. b37)
# https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/
wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/VCF/All_20151104.vcf.gz
wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/VCF/All_20151104.vcf.gz.tbi
# Swap the alleles
bcftools +fixref broken.bcf -Ob -o fixref.bcf -- -d -f /path/to/reference.fasta -i All_20151104.vcf.gz
# The above command might have changed the coordinates, so the VCF must be sorted
bcftools sort fixref.bcf -Ob -o fixref.sorted.bcf
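The ID-based approach only helps if most records actually carry rs identifiers. One quick check is to print the ID column with bcftools query and count how many entries look like dbSNP IDs. The count_rs_ids function below is a hypothetical helper; it assumes an ID is dbSNP-style when it starts with "rs" followed by digits.

```shell
#!/bin/sh
# count_rs_ids (hypothetical helper): read one VCF ID per line on stdin and
# print "<IDs starting with an rs number> <total records>".
# A multi-ID entry such as "rs1;rs2" counts once if it starts with an rs ID.
count_rs_ids() {
    awk '{ total++ } /^rs[0-9]+/ { rs++ } END { printf "%d %d\n", rs, total }'
}

# Intended use on a real file (requires bcftools):
#   bcftools query -f '%ID\n' broken.bcf | count_rs_ids

# Self-contained example with made-up IDs
printf 'rs123\n.\nrs9;rs10\nchr1:100\n' | count_rs_ids
# 2 4
```

If only a small fraction of records have rs IDs, the id mode can fix only that fraction of sites.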
In the most extreme case, when nothing else works, one can force the unambiguous SNPs onto the forward strand and drop the ambiguous (A/T and C/G) sites:
bcftools +fixref test.bcf -Ob -o output.bcf -- -f ref.fa -m flip -d
Note that this is an extremely unsafe operation and will most likely result in nonsense genotypes. If you decide to use it anyway, make sure to check the sanity of the result with the af-dist plugin!!
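The per-site decision behind forcing SNPs onto the forward strand can be sketched as follows, assuming the reference base at each site is already known (for example, looked up in ref.fa). This illustrates the logic described above and is not the plugin's source code; the function name and the action labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the flip-mode decision; not bcftools code.
_up()   { printf '%s\n' "$1" | tr 'acgtn' 'ACGTN'; }
_comp() { printf '%s\n' "$1" | tr 'ACGT' 'TGCA'; }

# flip_decision REF_BASE REF ALT
# REF_BASE is the reference base at the site (e.g. looked up in ref.fa);
# REF and ALT are the single-base alleles as written in the VCF.
# Prints one of:
#   drop        A/T or C/G SNP, strand cannot be determined from alleles
#   keep        REF already matches the reference
#   swap        ALT matches the reference: exchange REF and ALT
#   flip        complemented REF matches: complement both alleles
#   flip+swap   complemented ALT matches: complement, then exchange
#   unresolved  none of the above (e.g. an N in the reference)
flip_decision() {
    fd_base=$(_up "$1") fd_ref=$(_up "$2") fd_alt=$(_up "$3")
    if [ "$(_comp "$fd_ref")" = "$fd_alt" ]; then
        echo drop
    elif [ "$fd_ref" = "$fd_base" ]; then
        echo keep
    elif [ "$fd_alt" = "$fd_base" ]; then
        echo swap
    elif [ "$(_comp "$fd_ref")" = "$fd_base" ]; then
        echo flip
    elif [ "$(_comp "$fd_alt")" = "$fd_base" ]; then
        echo flip+swap
    else
        echo unresolved
    fi
}

# Example: reference base A, alleles as found in the VCF
for alleles in "A G" "G A" "T C" "C T" "A T"; do
    set -- $alleles
    echo "REF=$1 ALT=$2: $(flip_decision A "$1" "$2")"
done
# REF=A ALT=G: keep
# REF=G ALT=A: swap
# REF=T ALT=C: flip
# REF=C ALT=T: flip+swap
# REF=A ALT=T: drop
```

Ambiguous SNPs are tested first because, for them, a REF allele that matches the reference does not prove the site is on the forward strand. This is exactly why the operation is unsafe: any strand error among the ambiguous sites cannot be detected from the alleles.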