Frequently Asked Questions
This error is triggered when the number of values in the data line does not match
its definition in the header.
A common error is to define a tag with a variable number of fields
(for example Number=R in the header) and output an incorrect
number of values at multiallelic sites in the data lines. The number of values
must correspond to the number of alleles, as explained in section 1.4.2 of the VCF specification.
How to verify and fix:
Look up the tag definition in the header (
bcftools view -h file.vcf.gz | grep TAG) to check the expected number
of values, then check the number of alleles and values in the data line (
bcftools view -H file.vcf.gz -r chr1:1234567).
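The manual check above can also be approximated for a whole file. The following is a minimal sketch, not part of bcftools: `check_number_r` is a hypothetical helper name, it assumes a plain-text (uncompressed) VCF on standard input, and it assumes the tag is an INFO tag declared as Number=R, so one value is expected for REF plus one per ALT allele.

```shell
# Hypothetical helper (not a bcftools command): report records where an INFO
# tag assumed to be declared Number=R does not have one value per allele.
# Usage: check_number_r TAG < plain_text.vcf
check_number_r() {
    awk -F'\t' -v tag="$1" '
        /^#/ { next }                                  # skip header lines
        {
            # REF counts as one allele; ALT "." means no alternate allele
            n_alleles = ($5 == ".") ? 1 : split($5, alt, ",") + 1
            n_info = split($8, info, ";")
            for (i = 1; i <= n_info; i++) {
                if (index(info[i], tag "=") == 1) {    # entry starts with TAG=
                    n_values = split(substr(info[i], length(tag) + 2), vals, ",")
                    if (n_values != n_alleles)
                        printf "%s:%s %s has %d values, expected %d\n", $1, $2, tag, n_values, n_alleles
                }
            }
        }'
}

# Tiny made-up example: the second record is multiallelic but carries only two values
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tC\t.\t.\tAD=10,5\n1\t200\t.\tA\tC,G\t.\t.\tAD=10,5\n' | check_number_r AD
```

On a real file the input could come from `bcftools view file.vcf.gz | check_number_r TAG`. FORMAT tags and Number=A or Number=G definitions would need a different expected count and are not handled in this sketch.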
Note that the program only works with ploidy 1 or 2, so if the tag is defined as
Number=G and the ploidy is higher,
the program will fail.
If the tag is not important for your analysis, a quick and dirty workaround is to remove the
tag from the VCF completely (
bcftools annotate -x TAG).
As described in the manual page, the
-R option takes into account overlapping records.
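To make the distinction concrete, here is a small sketch of the difference between a record that overlaps a region and a record whose position lies strictly inside it. It is an illustration only, not bcftools code: `classify_region` is a hypothetical helper, and it assumes that a record spans POS to POS + length(REF) - 1, which ignores symbolic alleles and INFO/END.

```shell
# Hypothetical helper (not a bcftools command): classify each record against a
# region START-END (1-based, inclusive).
#   overlap = the span POS..POS+len(REF)-1 intersects the region
#   strict  = POS itself lies inside the region
# Usage: classify_region START END < plain_text.vcf
classify_region() {
    awk -F'\t' -v region_start="$1" -v region_end="$2" '
        /^#/ { next }                                  # skip header lines
        {
            rec_end = $2 + length($4) - 1              # last base covered by REF
            overlap = ($2 <= region_end && rec_end >= region_start) ? "yes" : "no"
            strict  = ($2 >= region_start && $2 <= region_end) ? "yes" : "no"
            printf "%s:%s overlap=%s strict=%s\n", $1, $2, overlap, strict
        }'
}

# Made-up records: a deletion starting before the region, and a SNP inside it
printf '1\t98\t.\tACGTA\tA\t.\t.\t.\n1\t101\t.\tG\tT\t.\t.\t.\n' | classify_region 100 102
```

In this example the deletion at position 98 reaches into the region 100-102, so it counts as overlapping even though its POS is outside the region; only the SNP at 101 passes the strict test.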
If a strict subset by position is required, add (or replace with) the
-T option.
query. What is going on?
Say you want to print a list of samples with non-reference genotypes. This can be done using the following command:
$ bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]' -i 'GT="alt"' file.vcf
1:67893 sample3 0/1
However, you may also want to print genotypes of ALL samples at variant sites with at least one
non-reference genotype. In order for this to work, first select the desired rows
with the view command, then let
query format the output:
$ bcftools view -i 'GT="alt"' file.vcf -Ou | bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]'
1:67893 sample1 0/0
1:67893 sample2 0/0
1:67893 sample3 0/1