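The WES variant records downloaded from the gnomAD database carry a large amount of redundant information. If the file is used as is for germline filtering in GATK Mutect2, it consumes a great deal of memory. GATK only needs the AF (allele frequency) value at this step, so the other INFO annotations can be removed. Below, the first record is an original entry (truncated), and the five records after it show what entries look like once only AF is kept.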
1 12237 rs1324090652 G A 81.96 AC0 AC=0;AN=0;rf_tp_probability=4.32548e-01;FS=0.00000e+00;InbreedingCoeff=-1.43700e-01;MQ=2.20000e+01;MQRankSum=-3.58000e-01;QD=4.31000e+00;ReadPosRankSum=3.58000e-01;SOR=1.29200e+00;BaseQRankSum=7.36000e-01;ClippingRankSum=-3.58000e-01;DP=16096;VQSLOD=1.57000e+00;VQSR_culprit=QD;segdup;rf_negative_label;rf_label=FP;rf_train;variant_type=snv;allele_type=snv;n_alt_alleles=1;pab_max=1.00000e+00;gq_hist_alt_bin_freq=0|1|0|1|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0;gq_hist_all_bin_freq=3635|1739|178|101|14|1|0|0|1|0|0|0|0|0|0|0|0|0|0|0;dp_hist_alt_bin_freq=2|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_alt_n_larger=0;dp_hist_all_bin_freq=125622|126|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_all_n_larger=0;ab_hist_alt_bin_freq=0|0|0|0|0|0|0|0|0|0|0|0|2|1|0|0|0|0|0|0;AC_nfe_seu=0;AN_nfe_seu=0;nhomalt_nfe_seu=0;controls_AC_afr_male=0;controls_AN_afr_male=0;controls_nhomalt_afr_male=0;non_neuro_AC_eas_kor=0;non_neuro_AN_eas_kor=0;non_neuro_nhomalt_eas_kor=0;non_topmed_AC_amr=0;non_topmed_AN_amr=0;non_topmed_nhomalt_amr=0;non_cancer_AC_asj_female=0;non_cancer_AN_asj_female=0;non_cancer_nhomalt_asj_female=0;AC_raw=5;AN_raw=11338;AF_raw=4.40995e-04;nhomalt_raw=1 ...
1 865486 rs748060990 C T 13693 PASS AF=0.000267348
1 865489 rs1262270747 C G 837 PASS AF=7.31465e-06
1 865495 rs776481309 TCTC T 1701.02 PASS AF=2.12911e-05
1 865499 rs758377285 C T 672.78 PASS AF=1.38185e-05
1 865506 rs1247056920 C T 1851.41 PASS AF=6.72007e-06
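The trimming is done with bcftools annotate. The -x '^INFO/AF' option removes every INFO tag except AF, and the result is written to standard output unless an output file is specified:
bcftools annotate -x '^INFO/AF' gnomad.exomes.r2.1.1.sites.1.vcf.bgz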
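The command above can be wrapped into a small script that also writes a compressed, indexed file and checks the result. This is only a sketch under stated assumptions: the output file name and the helper names trim_to_af and count_extra_info are hypothetical, and the script assumes that bcftools and awk are on PATH. The index step is included because GATK generally expects resource VCFs to be indexed.

```shell
# Hypothetical file names; adjust them to your own paths.
in_vcf="gnomad.exomes.r2.1.1.sites.1.vcf.bgz"
out_vcf="gnomad.exomes.r2.1.1.sites.1.AFonly.vcf.gz"

# trim_to_af INPUT OUTPUT
# Remove every INFO tag except AF, write bgzip-compressed VCF,
# then build a tabix (.tbi) index next to the output file.
trim_to_af() {
    bcftools annotate -x '^INFO/AF' -O z -o "$2" "$1" &&
    bcftools index -t "$2"
}

# count_extra_info
# Read VCF text on stdin and print the number of data records whose
# INFO column (column 8) is neither a single AF tag nor "." (no INFO left).
# Header lines starting with '#' are skipped.
count_extra_info() {
    awk -F'\t' '!/^#/ && $8 !~ /^(AF=[^;]*|\.)$/ { n++ } END { print n + 0 }'
}
```

A possible usage, again with the hypothetical names above: run trim_to_af "$in_vcf" "$out_vcf", then bcftools view -H "$out_vcf" | count_extra_info. A printed 0 indicates that no record kept any INFO tag other than AF.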
References:
https://github.com/samtools/bcftools/issues/158