Annotating VCF/BCF files

Remove annotations

Set the ID column to "." and remove the INFO/DP and FORMAT/DP annotations

bcftools annotate -x ID,INFO/DP,FORMAT/DP file.vcf.gz

Remove all INFO fields and all FORMAT fields except for GT and PL

bcftools annotate -x INFO,^FORMAT/GT,FORMAT/PL file.vcf

Transfer annotations from one VCF file to another

Populate the columns ID, QUAL and the INFO/TAG annotation

# do not replace TAG if already present
bcftools annotate -a src.bcf -c ID,QUAL,+TAG dst.bcf

# overwrite existing TAG annotations
bcftools annotate -a src.bcf -c ID,QUAL,TAG dst.bcf

Carry over all INFO and FORMAT annotations except FORMAT/GT

bcftools annotate -a src.bcf -c INFO,^FORMAT/GT dst.bcf

Transfer annotations from a tab-delimited text file to a VCF

The following commands transfer values from a tab-delimited file into a new INFO/TAG annotation. If TAG is not defined in the VCF header, a header fragment with its definition must be provided via the -h option.

# Annotate from a tab-delimited file with six columns (the fifth is ignored),
# first indexing with tabix. The coordinates in the text file are 1-based, same
# as the coordinates in the VCF
tabix -s1 -b2 -e2 annots.tab.gz
bcftools annotate -a annots.tab.gz -h annots.hdr -c CHROM,POS,REF,ALT,-,TAG file.vcf


# Annotate from a tab-delimited file with regions (1-based coordinates, inclusive)
tabix -s1 -b2 -e3 annots.tab.gz
bcftools annotate -a annots.tab.gz -h annots.hdr -c CHROM,FROM,TO,TAG input.vcf


# Annotate from a BED file (0-based, half-open intervals: the start is
# inclusive, the end is exclusive), first indexing with tabix
tabix -p bed annots.bed.gz
bcftools annotate -a annots.bed.gz -h annots.hdr -c CHROM,FROM,TO,TAG input.vcf
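
The commands above assume that a header fragment and an annotation file already exist. The following is a minimal sketch of what such inputs might look like. The tag definition (Number, Type, Description), the data rows, and the file name regions.tab are illustrative assumptions, not values taken from a real data set. The awk step only illustrates how the 0-based, half-open BED convention maps onto 1-based inclusive coordinates.

```shell
# Header fragment for -h: defines the new INFO tag.
# Number, Type and Description are illustrative assumptions.
printf '##INFO=<ID=TAG,Number=1,Type=Integer,Description="Example annotation">\n' > annots.hdr

# Six-column annotation file matching -c CHROM,POS,REF,ALT,-,TAG
# (the fifth column is skipped by "-"); rows are made-up placeholder data.
printf '1\t16648016\tG\tA\tignored\t42\n' >  annots.tab
printf '1\t16648100\tC\tT\tignored\t7\n'  >> annots.tab

# BED intervals are 0-based and half-open; adding 1 to the start gives the
# equivalent 1-based inclusive region (the end coordinate is unchanged).
printf '1\t99\t200\t5\n' > annots.bed
awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 + 1, $3, $4 }' annots.bed > regions.tab

# Compress and index only if the htslib tools are installed.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip -c annots.tab > annots.tab.gz
    tabix -f -s1 -b2 -e2 annots.tab.gz
fi
```

Here the BED interval 99-200 becomes the 1-based inclusive region 100-200. The conversion is not required in practice: as far as the bcftools manual describes it, files with a .bed or .bed.gz suffix are read as BED, and other tab-delimited files are assumed to be 1-based and inclusive.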

Overwriting / not overwriting existing tags and the handling of missing values

Modifiers that control overwriting and the handling of missing source values:

-c TAG

Add TAG if the source value is not missing (“.”). If TAG exists in the target file, it will be overwritten.

-c +TAG

Add TAG if the source value is not missing and TAG is not present in the target file.

-c .TAG

Add TAG even if the source value is missing. This can overwrite non-missing values with a missing value and can create empty VCF fields (TAG=.).

-c .+TAG

Add TAG even if the source value is missing but only if TAG does not exist in the target file; existing tags will not be overwritten.
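
The four rules can be summarized as a small decision function. The sketch below is only an illustration of the rules as stated above. resolve_tag is a hypothetical helper, not part of bcftools, and it does not reflect how bcftools implements the logic internally. A source value of "." stands for missing, and an empty target value stands for a TAG that is absent from the target file.

```shell
# Hypothetical helper, not part of bcftools: mirrors the four rules above.
#   $1 = modifier form (TAG, +TAG, .TAG, .+TAG)
#   $2 = source value ("." means missing)
#   $3 = target value (empty string means TAG is absent from the target)
# Prints the value TAG would have afterwards (empty output: TAG stays absent).
resolve_tag() {
    mode=$1
    src=$2
    dst=$3
    case "$mode" in
        "TAG")    # overwrite unless the source is missing
            if [ "$src" != "." ]; then echo "$src"; else echo "$dst"; fi ;;
        "+TAG")   # add only if the source is not missing and the target has no TAG
            if [ "$src" != "." ] && [ -z "$dst" ]; then echo "$src"; else echo "$dst"; fi ;;
        ".TAG")   # always take the source value, even when it is missing
            echo "$src" ;;
        ".+TAG")  # take the source value, even if missing, only when the target has no TAG
            if [ -z "$dst" ]; then echo "$src"; else echo "$dst"; fi ;;
        *)
            echo "unknown modifier: $mode" >&2
            return 1 ;;
    esac
}

resolve_tag TAG   5 7    # prints 5 (existing value overwritten)
resolve_tag TAG   . 7    # prints 7 (missing source leaves the target alone)
resolve_tag +TAG  5 7    # prints 7 (existing value kept)
resolve_tag .TAG  . 7    # prints . (non-missing value replaced by missing)
resolve_tag .+TAG . 7    # prints 7 (existing value kept)
```

The third and fourth calls show the practical difference between the modifiers: +TAG protects existing values, while .TAG can replace a real value with a missing one.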

Transfer annotation from INFO column to FORMAT

Suppose you need to transfer the INFO/DP annotation to FORMAT/DP. This is currently not possible with a single bcftools annotate command, but it can be done in a few steps. The following complete example can be copied and pasted as is:

# Create a test VCF
echo -e '##fileformat=VCFv4.3' > test.vcf
echo -e '##INFO=<ID=DP,Number=1,Type=Integer,Description="Read depth">' >> test.vcf
echo -e '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">' >> test.vcf
echo -e '##contig=<ID=1,length=248956422,assembly=hg38>' >> test.vcf
echo -e '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsmpl1\tsmpl2' >> test.vcf
echo -e '1\t16648016\t.\tG\t.\t.\t.\tDP=10\tGT\t0/0\t0/0' >> test.vcf

# Extract INFO/DP into a tab-delimited annotation file
bcftools query -f '%CHROM\t%POS\t%DP\n' test.vcf | bgzip -c > annot.txt.gz

# Index the file with tabix
tabix -s1 -b2 -e2 annot.txt.gz

# Create a header line for the new annotation
echo -e '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">' > hdr.txt

# Transfer the annotation to sample 'smpl1'
bcftools annotate -s smpl1 -a annot.txt.gz -h hdr.txt -c CHROM,POS,FORMAT/DP test.vcf
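
To check the result, the annotated VCF can be written to a file and the per-sample depth printed with bcftools query. This is a sketch that assumes test.vcf, annot.txt.gz (with its tabix index) and hdr.txt were created by the steps above. The output name out.vcf is a hypothetical choice, and the block skips the check if bcftools or the inputs are not available.

```shell
# Run the check only when bcftools and the files from the previous steps exist.
if command -v bcftools >/dev/null 2>&1 \
        && [ -f test.vcf ] && [ -f annot.txt.gz ] && [ -f hdr.txt ]; then
    # Same annotate command as above, but written to a file (out.vcf is an assumed name)
    bcftools annotate -s smpl1 -a annot.txt.gz -h hdr.txt \
        -c CHROM,POS,FORMAT/DP -o out.vcf test.vcf
    # Inside [], %DP refers to the per-sample FORMAT/DP field
    bcftools query -f '[%SAMPLE\t%DP\n]' out.vcf
else
    echo "bcftools or the input files are not available; skipping check"
fi
```

Because the annotation was restricted to smpl1 with -s, only that sample should receive the depth value taken from INFO/DP; smpl2 is expected to remain without a value.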
