
[E::fai_build_core] Different line length in sequence 'kraken:taxid|436|NZ_CP062147.1'

尚嘉庆
2023-12-01

Error record:
After downloading all bacterial .fna files, I merged them into a single .fna file (99 GB) and tried to index it:
samtools faidx library.fna

Error: [E::fai_build_core] Different line length in sequence 'kraken:taxid|436|NZ_CP062147.1'


What a Google search turned up:

Did you take a look at that sequence in question? It may be just a
case of a broken fasta record.

The error looks pretty clear: the lines of your sequence may be of unequal length. Why does an indexer not auto-normalize (or at least provide an option for it)?
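To make the "unequal line length" condition concrete, here is a minimal sketch of a check done with awk. It is a simplification of what the indexer enforces, not samtools' actual code: within one record, every sequence line except the last is expected to have the same length. The demo file and its record names are made up for illustration.

```shell
# Hypothetical demo input: the second record has a header glued onto a sequence line.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf '>seq1\nACGTACGT\nACGTACGT\nACG\n>seq2\nACGTACGT\nACGT>seq3 glued header\nACGTACGT\n' > "$demo"

report=$(awk '
  /^>/ { name = substr($0, 2); width = 0; short = 0; next }   # new record: reset state
  {
    len = length($0)
    if (width == 0) { width = len; next }    # first sequence line sets the expected width
    if (short || len > width)                # line after a short line, or an over-long line
      print "line " FNR ": uneven line length in record " name
    if (len < width) short = 1               # only the final line of a record may be shorter
  }
' "$demo")
echo "$report"
```

On a real file, replace "$demo" with the FASTA path. The reported line number points at the offending line itself, not just at the record that contains it.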

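The output of picard NormalizeFasta --INPUT 1.fa --OUTPUT normalized.fa still could not be indexed.
seqkit seq -w 70 s.fa > s2.fa only re-wraps the sequence lines to a fixed number of bases per line; it does nothing to correct the broken record.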


Line where the failing sequence starts: 250946642
Total number of lines: 1497262490
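One way such numbers might be obtained is sketched below, assuming the header name is searched as a fixed string. The demo file is a made-up stand-in for the real FASTA.

```shell
# Hypothetical demo input with the failing record starting on line 3.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf '>kraken:taxid|1|A\nACGT\n>kraken:taxid|436|NZ_CP062147.1 example\nACGT\nAC\n' > "$demo"

# -F: treat the pattern as a fixed string (the name contains "|" and ".")
# -m 1: stop at the first match instead of scanning the rest of a huge file
start=$(grep -n -m 1 -F '>kraken:taxid|436|NZ_CP062147.1' "$demo" | cut -d: -f1)

# Total line count; tr strips the padding some wc implementations add.
total=$(wc -l < "$demo" | tr -d ' ')

echo "record starts at line $start of $total"
```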

Extract the 50,000 lines following line 250946642 and look for the bad line:
sed -n '250946642,250996642'p normalized_library.fa > index.error.50000

A new sequence header appears at the end of 'kraken:taxid|436|NZ_CP062147.1', glued onto its last sequence line:

44099 CTCCGCCCCATCCGGCCCCGCCACACGGAGCTGCCCCGCCGCGTCCCAGCCCAGCCAGCGATGCC>krak
44100 en:taxid|1513|NZ_CP035785.1 Clostridium tetani strain Harvard 49205 ch
44101 romosomeNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
44102 NNNNNNNNNNNNNNNNNNNNNCAACAACGTATTTCATTTTAACACATTTAAATTTACCTATTGAGTATTA

Use grep '[A-Z]>' normalized_library.fa to find how many lines of the FASTA file are broken in this way:

CTCCGCCCCATCCGGCCCCGCCACACGGAGCTGCCCCGCCGCGTCCCAGCCCAGCCAGCGATGCC>krak
TTATGTGGGATTAAACTTGAAATTTCATT>kraken:taxid|290847|NC_017382.1 Helicoba

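Find the actual line numbers of the errors (the pattern has to be '[A-Z]>' rather than 'CC>krak', since the second hit ends in TT>): grep -n '[A-Z]>' normalized_library.fa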

250990740:CTCCGCCCCATCCGGCCCCGCCACACGGAGCTGCCCCGCCGCGTCCCAGCCCAGCCAGCGATGCC>kra
659005136:TTATGTGGGATTAAACTTGAAATTTCATT>kraken:taxid|290847|NC_017382.1 Helicoba
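The search can also be written with a slightly stricter pattern, sketched here as an assumption rather than what was actually run: match lines that do not start with '>' but contain one later on. That way ordinary header lines are never counted. The demo file is made up.

```shell
# Hypothetical demo input: lines 3 and 6 have a header glued onto sequence data.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf '>r1\nACGT\nACGT>r2 glued\nACGT\n>r3\nTTAT>r4 glued\nAC\n' > "$demo"

# Count the broken lines.
count=$(grep -c '^[^>].*>' "$demo")

# List their line numbers, space-separated.
lines=$(grep -n '^[^>].*>' "$demo" | cut -d: -f1 | tr '\n' ' ')

echo "$count broken line(s) at: $lines"
```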

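Try deleting those lines first. Here row1 and row2 stand for the two line numbers found above, and the output must be written to a different file from the input:
sed 'row1d;row2d' INPUT.fa > OUTPUT.fa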
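Deleting lines discards sequence data, so an alternative worth considering is to split each glued line instead, putting the header back on a line of its own. The sketch below rests on one assumption: that it is run on the merged file before normalization, where each glued header still sits on a single line. In the normalized output the header text has already been re-wrapped as if it were sequence. The demo file is made up.

```shell
# Hypothetical demo input: line 3 carries the tail of r1 plus the header of r2.
demo=$(mktemp); fixed=$(mktemp)
trap 'rm -f "$demo" "$fixed"' EXIT
printf '>r1\nACGTACGT\nACGT>r2 glued header\nNNNNACGT\nAC\n' > "$demo"

awk '
  /^>/ { print; next }                 # proper header lines pass through untouched
  {
    i = index($0, ">")                 # position of a header glued onto sequence data
    if (i > 1) {
      print substr($0, 1, i - 1)       # sequence part stays with the previous record
      print substr($0, i)              # header part starts a new record
    } else {
      print
    }
  }
' "$demo" > "$fixed"
cat "$fixed"
```

After splitting, the file can be re-wrapped (for example with the seqkit or picard commands mentioned earlier) and indexed again.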
