Plugin trio-dnm3
This plugin can be used to screen variants for possible de-novo mutations in trios (i.e. in samples with parental data available).
The program adds the following annotations:
- FORMAT/DNM: posterior probability of the variant being a DNM (see the --dnm-tag option)
- FORMAT/VA: the variant allele given as a 0-based index to REF,ALT alleles (see the --va option)
- FORMAT/VAF: the fraction of reads supporting the de novo allele (see the --vaf option)
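To make the relationship between FORMAT/VA and FORMAT/VAF concrete, here is a small sketch. It assumes that the fraction is the FMT/AD count of the allele indexed by VA divided by the total of all AD counts; the plugin's exact calculation may differ. The helper name vaf_from_ad is hypothetical and not part of bcftools.

```shell
# Illustration only: assumes VAF = AD[VA] / sum(AD), which may not match the
# plugin's exact calculation. vaf_from_ad is a hypothetical helper.
# Arguments: comma-separated AD values (REF,ALT1,...), 0-based VA index.
vaf_from_ad() {
    awk -v ad="$1" -v va="$2" 'BEGIN {
        n = split(ad, a, ",")                  # a[1] is REF, a[2] is the first ALT, ...
        for (i = 1; i <= n; i++) total += a[i]
        if (total > 0) printf "%.3f\n", a[va + 1] / total
        else print "."                         # no reads, fraction undefined
    }'
}

# 5 of 20 reads support allele index 1 (the first ALT)
vaf_from_ad 15,5 1
```

With the values above the sketch prints 0.250.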
Several calling models are available:
- Naive model
This simply looks at sample genotypes (FORMAT/GT) and identifies sites that violate Mendelian inheritance, taking into account sex inheritance patterns on sex chromosomes and in pseudo-autosomal regions. This model is activated as
bcftools +trio-dnm3 -P samples.ped --use-NAIVE
- DNG, DeNovoGear model
The original DeNovoGear model (--use-DNG), or the same model with some problems fixed (--use-ALM --with-pPL). This model is activated as
bcftools +trio-dnm3 -P samples.ped --use-DNG
bcftools +trio-dnm3 -P samples.ped --use-ALM --with-pPL
- ALM, allele-likelihood model
A newer calling model that produces a cleaner call set and addresses the main limitation of DNG, namely its insensitivity to parental emission of the de novo allele. This model is activated as
bcftools +trio-dnm3 -P samples.ped --use-ALM
- DMM, Dirichlet-multinomial model
The newest calling model that addresses the main limitation of ALM - its overconfidence at low parental depth - which can lead to misclassification of inherited variants as de novo due to binomial sampling. This model is the default and can be explicitly activated as
bcftools +trio-dnm3 -P samples.ped --use-DMM
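The binomial sampling effect mentioned for ALM can be illustrated with a back-of-the-envelope calculation. The sketch below assumes an idealised model in which each read from a heterozygous parent carries the variant allele independently with probability 0.5, so the chance of seeing no variant reads at depth n is 0.5^n. This is not the plugin's likelihood, and the function name het_miss_table is hypothetical.

```shell
# Idealised illustration, not the plugin's model: probability that a
# heterozygous parent shows zero reads of the variant allele at depth n,
# assuming each read samples either allele with probability 0.5.
het_miss_table() {
    awk 'BEGIN {
        for (n = 5; n <= 20; n += 5)
            printf "%d\t%.6f\n", n, 0.5 ^ n    # columns: depth, probability
    }'
}

het_miss_table
```

Under this assumption an inherited allele goes entirely unobserved in the parent in about 3% of cases at depth 5 and in about 0.1% of cases at depth 10, which is why parental depth matters when interpreting a candidate de novo call.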
For more information and math notes see PAPER.
The list of plugin-specific options can be obtained by running bcftools +trio-dnm3, which will print the following usage page:
Usage: bcftools +trio-dnm3 [OPTIONS]
Common options:
-e, --exclude EXPR Exclude trios for which the expression is true (one matching sample invalidates a trio)
-i, --include EXPR Include trios for which the expression is true (one failing sample invalidates a trio)
-o, --output FILE Output file name [stdout]
-O, --output-type u|b|v|z[0-9] u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
-r, --regions REG Restrict to comma-separated list of regions
-R, --regions-file FILE Restrict to regions listed in a file
--regions-overlap 0|1|2 Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
-t, --targets REG Similar to -r but streams rather than index-jumps
-T, --targets-file FILE Similar to -R but streams rather than index-jumps
--targets-overlap 0|1|2 Include if POS in the region (0), record overlaps (1), variant overlaps (2) [0]
--no-version Do not append version and command line to the header
-v, --verbosity INT Verbosity level
-W, --write-index[=FMT] Automatically index the output files [off]
General options:
-m, --min-score NUM Do not add FMT/DNM annotation if the log score is smaller than NUM
-p, --pfm [1X:|2X:]P,F,M Sample names of child (the proband), father, mother; "1X:" for male pattern of chrX inheritance [2X:]
-P, --ped FILE PED file with the columns: <ignored>,proband,father,mother,sex(1:male,2:female)
-X, --chrX LIST List of regions with chrX inheritance pattern or one of the presets: [GRCh37]
GRCh37 .. X:1-60000,chrX:1-60000,X:2699521-154931043,chrX:2699521-154931043
GRCh38 .. X:1-9999,chrX:1-9999,X:2781480-155701381,chrX:2781480-155701381
--dnm-tag TAG[:type] Output tag with DNM quality score and its type [DNM:log]
log .. log-scaled quality (-inf,0; float)
flag .. is a DNM, implies --use-NAIVE (1; int)
phred .. phred quality (0-255; int)
prob .. probability (0-1; float)
--force-AD Calculate VAF even if the number of FMT/AD fields is incorrect. Use at your own risk!
--va TAG Output tag name for the variant allele [VA]
--vaf TAG Output tag name for variant allele fraction [VAF]
Models:
--use-NAIVE v0, Naive calling model which uses only FMT/GT to determine DNMs
--use-DNG v1, Original DeNovoGear model, implies --dng-priors
--use-ALM v2, Basic allele-likelihood model (the default until v1.24)
--use-DMM v3, Dirichlet-multinomial model with site noise awareness (the default since v1.24)
Model options:
--dng-priors Use the original DeNovoGear priors (including bugs in prior assignment, but with chrX bugs fixed)
--mrate NUM Mutation rate [1e-8]
-n, --strictly-novel When Mendelian inheritance is violated, score highly only novel alleles (e.g. in LoH regions)
--with-pAD Do not use FMT/QS but parental FMT/AD
Model options specific to --use-ALM:
--ad, --allele-dropout NUM Mixture weight for missed inherited alleles due to low read depth [0]
--min-vaf NUM Baseline variant allele fraction for mosaic scoring, by default off for ALM [0]
--np, --noise-prior NUM Prior probability of site-level noise; negative disables multiallelic penalty in the child [1e-3]
--phi NUM Dirichlet-multinomial overdispersion for modelling genotypes [1e3]
--pns FRAC[,NUM][:TYPE] Maximum allowed parental noise, fraction or number of reads; TYPE is snv, indel, or both;
--pn FRAC[,NUM][:TYPE] --pn is the same as --pns applied for alleles observed in both parents, defaults:
--pns 0.045,0:snv --pn 0.011,0:snv --pns 0:indel --pn 0:indel
--sb, --strand-bias NUM Strand bias mixture coefficient; requires FMT/SP [0]
--with-pPL Do not use FMT/QS but parental FMT/PL (inflates FDR)
Model options specific to --use-DMM:
--max-QM NUM Maximum QM value (phred); negative value to ignore FORMAT/QM annotation [30]
--min-vaf NUM Baseline variant allele fraction for mosaic scoring [0.2]
--np, --noise-prior NUM Prior probability of site-level noise; negative disables multiallelic penalty in the child [1e-3]
--phi NUM Dirichlet-multinomial overdispersion for modelling genotypes [1e3]
--pns FRAC[,NUM][:TYPE] See above [--pns 0.045,0:snv --pns 0:indel]
--pn FRAC[,NUM][:TYPE] See above [--pn 0.011,0:snv --pn 0:indel]
--sb, --strand-bias NUM Strand bias mixture coefficient; requires FMT/SP [1e-2]
Model options specific to --use-DNG:
--sb, --strand-bias NUM Strand bias mixture coefficient; requires FMT/SP [0]
Example:
# Annotate VCF with FORMAT/DNM, run for a single trio
bcftools +trio-dnm3 -p proband,father,mother file.bcf
# Same as above, but read the trio(s) from a PED file
bcftools +trio-dnm3 -P file.ped file.bcf
# Same as above plus extract a list of significant DNMs using the bcftools/query command
bcftools +trio-dnm3 -P file.ped file.bcf -Ou | bcftools query -i'DNM>10' -f'[%CHROM:%POS %SAMPLE %DNM\n]'
# A complete example with a variant calling step. Note that this is one long
# command and should be on a single line. Also note that a filtering step is
# recommended, e.g. by depth and VAF (not shown here):
bcftools mpileup -a AD,QM,SP -f ref.fa -Ou proband.bam father.bam mother.bam |
bcftools call -mv -Ou |
bcftools +trio-dnm3 -p proband,father,mother -Oz -o output.vcf.gz
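A PED file for the -P, --ped option could be prepared along the lines sketched below, following the column order given in the usage page (<ignored>, proband, father, mother, sex). The family and sample names are hypothetical placeholders for the sample names in your VCF, and whitespace-separated columns are assumed.

```shell
# Hypothetical family and sample names; replace them with the sample names
# from your own VCF. Assumes whitespace-separated columns in this order:
#   <ignored>  proband  father  mother  sex (1: male, 2: female)
printf '%s\t%s\t%s\t%s\t%s\n' \
    fam1 child1 father1 mother1 1 \
    fam2 child2 father2 mother2 2 > samples.ped

# Sanity check: every row has five columns and sex is coded as 1 or 2
awk 'NF != 5 || ($5 != 1 && $5 != 2) { bad = 1 } END { exit bad }' samples.ped \
    && echo "samples.ped: OK"
```

The resulting file has one trio per row and can then be passed as -P samples.ped in the commands shown earlier.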