Frequently Asked Questions
This error is triggered when the number of values in the data line does not match
its definition in the header.
A common error is to define a tag with a variable number of fields (such as Number=G, Number=A, or Number=R in the header) and output an incorrect number of values at multiallelic data lines. The number of values must correspond to the number of alleles, as explained in section 1.4.2 of the VCF specification.
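As a rough illustration of the counting rule, the sketch below prints how many values a tag should carry at a single record. The function name expected_values is hypothetical and not part of bcftools. It assumes the usual reading of the specification: Number=A has one value per ALT allele, Number=R has one per allele including REF, and Number=G has one per possible genotype.

```shell
#!/bin/sh
# expected_values KIND N_ALT [PLOIDY]
# Hypothetical helper (not a bcftools command): prints the number of values
# a Number=A/R/G tag is expected to have at a record with N_ALT ALT alleles.
expected_values() {
    kind=$1; n_alt=$2; ploidy=${3:-2}
    n_alleles=$((n_alt + 1))                 # REF plus all ALT alleles
    case $kind in
        A) echo "$n_alt" ;;                  # one value per ALT allele
        R) echo "$n_alleles" ;;              # one value per allele, REF included
        G)
            case $ploidy in
                1) echo "$n_alleles" ;;      # haploid: one genotype per allele
                2) echo $(( n_alleles * (n_alleles + 1) / 2 )) ;;  # diploid: unordered allele pairs
                *) echo "unsupported ploidy: $ploidy" >&2; return 1 ;;
            esac ;;
        *) echo "unknown Number type: $kind" >&2; return 1 ;;
    esac
}

# A diploid site with two ALT alleles (for example REF=A, ALT=C,T)
expected_values A 2     # prints 2
expected_values R 2     # prints 3
expected_values G 2     # prints 6
```

A biallelic site gives 1, 2 and 3 values respectively, which is why a tag that looks correct at biallelic sites can still be wrong at multiallelic ones.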
How to verify and fix:
Look up the tag definition in the header (bcftools view -h file.vcf.gz | grep TAG) to check the expected number of values, and then check the number of alleles and values in the data line (bcftools view -H file.vcf.gz -r chr1:1234567).
Note that the program only works with ploidy 1 or 2, so if the tag is defined as Number=G and the ploidy is bigger, the program will fail.
If the tag is not important for your analysis, a quick and dirty workaround is to remove the tag from the VCF completely (bcftools annotate -x TAG).
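The manual check described above can be sketched with awk. This is only an illustration: the function name check_number_r is hypothetical, the tag is assumed to be an INFO tag declared as Number=R, and the toy data line is made up. Records with no ALT allele (ALT set to ".") are not handled.

```shell
#!/bin/sh
# check_number_r TAG
# Hypothetical helper: reads header-less, tab-separated VCF data lines on stdin
# and compares the number of values of INFO/TAG with the number of alleles.
check_number_r() {
    awk -F'\t' -v tag="$1" '
    {
        n_alleles = split($5, alt, ",") + 1      # ALT alleles plus REF
        n_info = split($8, info, ";")            # INFO is the 8th column
        for (i = 1; i <= n_info; i++) {
            if (index(info[i], tag "=") == 1) {  # entry starts with "TAG="
                n_values = split(substr(info[i], length(tag) + 2), vals, ",")
                status = (n_values == n_alleles) ? "ok" : "mismatch"
                print $1 ":" $2, tag, "expected=" n_alleles, "found=" n_values, status
            }
        }
    }'
}

# Toy line: two ALT alleles but only two values for a Number=R tag
printf '1\t67893\t.\tA\tC,T\t.\t.\tTAG=10,5\n' | check_number_r TAG
# prints: 1:67893 TAG expected=3 found=2 mismatch
```

On real data, the output of bcftools view -H file.vcf.gz -r chr1:1234567 could be piped into the function in place of the printf line.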
As described in the manual page, the -R option takes into account overlapping records. If a strict subset by position is required, add (or replace with) the -T option.
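The difference can be pictured with two small predicates. This is a simplified model of the semantics described above, not bcftools code: both function names are hypothetical, and the record is assumed to span from POS to POS plus the length of REF minus one. The manual page remains the authority on the exact behaviour of a given version.

```shell
#!/bin/sh
# overlaps_region POS REF BEG END
# -R style: succeeds if any base of the record falls inside [BEG, END].
overlaps_region() {
    pos=$1; ref=$2; beg=$3; end=$4
    rec_end=$((pos + ${#ref} - 1))           # last reference base covered by the record
    [ "$pos" -le "$end" ] && [ "$rec_end" -ge "$beg" ]
}

# pos_in_region POS BEG END
# -T style: succeeds only if POS itself lies inside [BEG, END].
pos_in_region() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# A deletion starting at position 100 with REF=ACGTA, tested against region 102-103
overlaps_region 100 ACGTA 102 103 && echo "overlap test: kept"
pos_in_region 100 102 103 || echo "position test: dropped"
```

The record starts before the region but extends into it, so the overlap test keeps it while the position test drops it.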
view and query. What is going on?
Say you want to print a list of samples with non-reference genotypes. This can be done using the following command:
$ bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]' -i 'GT="alt"' file.vcf
1:67893 sample3 0/1
However, you may also want to print genotypes of ALL samples at variant sites with at least one non-reference genotype. In order for this to work, first select the desired rows with the view command, then let query format the output:
$ bcftools view -i 'GT="alt"' file.vcf -Ou | bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]'
1:67893 sample1 0/0
1:67893 sample2 0/0
1:67893 sample3 0/1
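To make the two behaviours concrete without a VCF at hand, the sketch below mimics them on a toy genotype table with awk. It is an illustration only: the function names per_sample and per_site are hypothetical, the second site (1:70000) is invented, and a genotype is treated as non-reference when it contains a digit from 1 to 9.

```shell
#!/bin/sh
# Toy table: site, sample, genotype (the second site is made up for contrast)
genotypes='1:67893 sample1 0/0
1:67893 sample2 0/0
1:67893 sample3 0/1
1:70000 sample1 0/0
1:70000 sample2 0/0
1:70000 sample3 0/0'

# Sample-level filter: keep only the rows whose own genotype is non-reference
per_sample() {
    awk '$3 ~ /[1-9]/'
}

# Site-level filter: keep every row of a site that has at least one
# non-reference genotype, preserving the input order
per_site() {
    awk '{ rows[NR] = $0; site[NR] = $1; if ($3 ~ /[1-9]/) keep[$1] = 1 }
         END { for (i = 1; i <= NR; i++) if (site[i] in keep) print rows[i] }'
}

echo "sample-level:"; printf '%s\n' "$genotypes" | per_sample
echo "site-level:";   printf '%s\n' "$genotypes" | per_site
```

The sample-level filter prints only the sample3 row, while the site-level filter prints all three rows of site 1:67893 and nothing for the all-reference site.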
Please see here.
Feedback
We welcome your feedback. Please help us improve this page by either opening an issue on GitHub or editing it directly and sending a pull request.