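I recently downloaded a batch of BAM files and a reference genome from a public database. While re-running an exome analysis pipeline on them, I hit this error: "Exception in thread "main" picard.PicardException: New reference sequence does not contain a matching contig for NC_007605". In other words, the BAM header lists a contig named NC_007605 that the reference I supplied does not contain.
The command I ran was: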
java -jar picard.jar ReorderSam \
I=original.bam \
O=reordered.bam \
R=reference.fasta \
CREATE_INDEX=TRUE
A search turned up a solution on the GATK forum: https://gatkforums.broadinstitute.org/gatk/discussion/10071/question-about-picard-reordersam-new-reference-sequence-does-not-contain-a-matching-contig-for
The explanation given there is: By default the tool requires an exact match -- to relax that requirement, use ALLOW_INCOMPLETE_DICT_CONCORDANCE
ALLOW_INCOMPLETE_DICT_CONCORDANCE=Boolean
If true, then allows only a partial overlap of the BAM contigs with the new reference
sequence contigs. By default, this tool requires a corresponding contig in the new
reference for each read contig Default value: false. This option can be set to 'null' to
clear the default value. Possible values: {true, false}
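If ALLOW_INCOMPLETE_DICT_CONCORDANCE is not specified, it defaults to false, which means every contig in the BAM must have a matching contig in the new reference. To relax that requirement, add ALLOW_INCOMPLETE_DICT_CONCORDANCE=TRUE to the command line, and the error no longer occurs.
The modified command: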
java -jar picard.jar ReorderSam \
I=original.bam \
O=reordered.bam \
R=reference.fasta \
CREATE_INDEX=TRUE \
ALLOW_INCOMPLETE_DICT_CONCORDANCE=true
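
Since this option only relaxes the check, it can be useful to see exactly which contigs in the BAM are missing from the new reference. The following is a minimal sketch of one way to do that, not part of the original fix. It assumes the BAM header has been saved to a text file with `samtools view -H original.bam > header.sam` and that the reference has been indexed with `samtools faidx reference.fasta`, which writes `reference.fasta.fai`; the function name `missing_contigs` and the file name `header.sam` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list contigs named in a SAM/BAM header that are absent from a
# reference index. Assumptions (not from the original post):
#   header.sam           created by: samtools view -H original.bam > header.sam
#   reference.fasta.fai  created by: samtools faidx reference.fasta

# Print every contig name that appears in an @SQ line of the header file
# but not in the first column of the .fai file.
missing_contigs() {
    local header_file=$1 fai_file=$2
    awk -F'\t' '
        # First input (.fai): column 1 holds the contig name.
        FILENAME == ARGV[1] { in_ref[$1] = 1; next }
        # Second input (SAM header): read the SN: tag of each @SQ line.
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) {
                    name = substr($i, 4)
                    if (!(name in in_ref)) print name
                }
            }
        }
    ' "$fai_file" "$header_file"
}
```

Running `missing_contigs header.sam reference.fasta.fai` prints one contig name per line. If the list contains only contigs you do not need (for example a decoy or viral sequence), relaxing the check is probably reasonable; if it contains primary chromosomes, the BAM and the reference more likely come from different builds or naming schemes, and switching to the matching reference would be the safer fix.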