
GATK filtering: filtering VCF files

左丘成天
2023-12-01

1: Reference:

Li H. Towards better understanding of artifacts in variant calling from high-coverage samples[J]. Bioinformatics, 2014: btu356.

2: GATK offers two tools for calling SNPs, UnifiedGenotyper and HaplotypeCaller. HaplotypeCaller can now essentially replace UnifiedGenotyper. The reasons are quoted below:

The HaplotypeCaller is a more recent and sophisticated tool than the UnifiedGenotyper. Its ability to call SNPs is equivalent to that of the UnifiedGenotyper, its ability to call indels is far superior, and it is now capable of calling non-diploid samples. It also comprises several unique functionalities such as the reference confidence model (which enables efficient and incremental variant discovery on ridiculously large cohorts) and special settings for RNAseq data.

As of GATK version 3.3, we recommend using HaplotypeCaller in all cases, with no exceptions. (Quoted from an official GATK reply.)

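3: Recommended parameters for filtering VCF files: https://software.broadinstitute.org/gatk/guide/article?id=3225. These hard-filter thresholds are mainly intended for cases where VQSR cannot be used. A variant is filtered out if any one of the conditions holds: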

For SNPs:

QD < 2.0

MQ < 40.0

FS > 60.0

SOR > 3.0

MQRankSum < -12.5

ReadPosRankSum < -8.0

If your callset was generated with UnifiedGenotyper for legacy reasons, you can add HaplotypeScore > 13.0.
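As an illustration only, the SNP thresholds listed above can be applied to a variant's INFO annotations with a few lines of Python. This is a minimal sketch, not GATK's own implementation: the function name `failed_snp_rules` and the example values are hypothetical, and it assumes that a missing annotation is skipped rather than treated as a failure.

```python
from typing import Dict, List

# Thresholds copied from the SNP list above; a variant fails when any one holds.
SNP_RULES = [
    ("QD", "<", 2.0),
    ("MQ", "<", 40.0),
    ("FS", ">", 60.0),
    ("SOR", ">", 3.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
]


def failed_snp_rules(info: Dict[str, float]) -> List[str]:
    """Return the names of the annotations that trip a threshold."""
    failed = []
    for name, op, cutoff in SNP_RULES:
        if name not in info:
            # Assumption: a missing annotation is skipped, not counted as a failure.
            continue
        value = info[name]
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed.append(name)
    return failed


# Made-up annotation values, for illustration only.
good = {"QD": 25.1, "MQ": 60.0, "FS": 1.2, "SOR": 0.8}
bad = {"QD": 1.5, "MQ": 35.0, "FS": 72.0, "SOR": 0.8}
print(failed_snp_rules(good))  # []
print(failed_snp_rules(bad))   # ['QD', 'MQ', 'FS']
```

A variant would be kept only when the returned list is empty.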

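--clusterWindowSize 5 --clusterSize 2

I also add these two parameters: if SNPs appear densely clustered at one location, the site may actually be a deletion or an insertion.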
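To make the idea behind the two cluster parameters concrete, here is a rough sketch that flags SNP positions on a single chromosome when at least `size` of them fit inside `window` bases. It is a simplified approximation and not GATK's exact algorithm; the function name, the way the window boundary is counted, and the example positions are all assumptions.

```python
from typing import List, Set


def clustered_positions(positions: List[int], window: int = 5, size: int = 2) -> Set[int]:
    """Positions that belong to a group of `size` SNPs fitting in `window` bases."""
    ordered = sorted(positions)
    flagged: Set[int] = set()
    for i in range(len(ordered) - size + 1):
        group = ordered[i:i + size]
        # Assumption: a group "fits" when its first and last SNP lie in one window.
        if group[-1] - group[0] + 1 <= window:
            flagged.update(group)
    return flagged


# Made-up SNP coordinates on one chromosome.
snps = [100, 103, 250, 400, 404, 405, 900]
print(sorted(clustered_positions(snps)))  # [100, 103, 400, 404, 405]
```

Isolated SNPs such as 250 and 900 are left alone, while the tightly packed ones are flagged for closer inspection.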

For indels:

QD < 2.0

ReadPosRankSum < -20.0

InbreedingCoeff < -0.8

FS > 200.0

SOR > 10.0
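GATK's VariantFiltration tool takes its conditions as a single filter expression (`--filterExpression` in GATK 3.x), in which `||` means "or". The sketch below only builds such a string from the indel thresholds above; the helper name `build_filter_expression` is hypothetical, and how the expression is passed to GATK depends on the version in use.

```python
# Thresholds copied from the indel list above.
INDEL_RULES = [
    ("QD", "<", 2.0),
    ("ReadPosRankSum", "<", -20.0),
    ("InbreedingCoeff", "<", -0.8),
    ("FS", ">", 200.0),
    ("SOR", ">", 10.0),
]


def build_filter_expression(rules) -> str:
    """Join the rules with '||' so that any single failure flags the variant."""
    return " || ".join(f"{name} {op} {cutoff}" for name, op, cutoff in rules)


expression = build_filter_expression(INDEL_RULES)
print(expression)
# QD < 2.0 || ReadPosRankSum < -20.0 || InbreedingCoeff < -0.8 || FS > 200.0 || SOR > 10.0
```

Keeping the thresholds in one table makes it easy to swap in the SNP values without retyping the expression.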

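4: In the reference, the parameters used for calling SNPs with GATK are -stand_call_conf 30 -stand_emit_conf 10. The stand_emit_conf parameter no longer exists in GATK v3.7, the version I use.

I also recommend adding: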

-minPruning  Minimum support to not prune paths in the graph

-mbq  Minimum base quality required to consider a base for calling

-nct  Number of CPU threads to allocate per data thread

The parameters I use are "-stand_call_conf 30 -mbq 20 --minPruning 2 -nct 10".
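For reference, a GATK 3.x HaplotypeCaller call with these options might be assembled as follows. This is a sketch under stated assumptions: the jar name and the input and output file names (`ref.fasta`, `sample.bam`, `raw_variants.vcf`) are placeholders, and the code only prints the command rather than running it.

```python
import shlex
from typing import List


def haplotype_caller_command(reference: str, bam: str, out_vcf: str,
                             jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Assemble a GATK 3.x HaplotypeCaller command with the options listed above."""
    return [
        "java", "-jar", jar,        # jar path is a placeholder
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-o", out_vcf,
        "-stand_call_conf", "30",   # minimum confidence for calling
        "-mbq", "20",               # minimum base quality
        "--minPruning", "2",        # minimum support to keep a graph path
        "-nct", "10",               # CPU threads per data thread
    ]


# Placeholder file names, for illustration only.
cmd = haplotype_caller_command("ref.fasta", "sample.bam", "raw_variants.vcf")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run` without shell quoting problems.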

5: In addition, the term indel generally refers to insertions and deletions of up to about 50 bp. For a reference, see:

Tattini L, D’Aurizio R, Magi A. Detection of genomic structural variants from next-generation sequencing data[J]. Frontiers in bioengineering and biotechnology, 2015, 3: 92.
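The 50 bp convention can be expressed as a simple length check on the REF and ALT alleles of a VCF record. The sketch below is illustrative only: the function name is hypothetical, it assumes explicit sequence alleles (not symbolic ones such as `<DEL>`), and it assumes that a length difference of exactly 50 bp still counts as an indel.

```python
def classify_variant(ref: str, alt: str, max_indel_len: int = 50) -> str:
    """Label a simple REF/ALT pair as SNP, MNP, indel, or SV by allele length."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    size = abs(len(ref) - len(alt))
    # Assumption: a difference of exactly max_indel_len is still an indel.
    return "indel" if size <= max_indel_len else "SV"


print(classify_variant("A", "G"))             # SNP
print(classify_variant("A", "ATG"))           # indel
print(classify_variant("A" + "T" * 51, "A"))  # SV
```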
