当前位置: 首页 > 工具软件 > Lima > 使用案例 >

使用SMRTlink分析转录组三代测序(PacBio Iso-seq)(未完成)

冯飞鹏
2023-12-01

摘要

测序三个样品:

sample1.subreads.bam			# 测序结果,一般都有几百GB的大小
sample1.subreads.bam.pbi		# 由pbindex根据bam文件生成的索引文件,与samtools index生成的索引文件不同
sample2.subreads.bam
sample2.subreads.bam.pbi
sample3.subreads.bam
sample3.subreads.bam.pbi

安装

流程

1、生成CCS序列,这一步比较耗时,可以适当分配多一些CPU:

ccs --skip-polish --min-passes 1 --min-rq 0.99 -j 40 sample1.subreads.bam ccs/samples1.bam
ccs --skip-polish --min-passes 1 --min-rq 0.99 -j 40 sample2.subreads.bam ccs/samples2.bam
ccs --skip-polish --min-passes 1 --min-rq 0.99 -j 40 sample3.subreads.bam ccs/samples3.bam

2、去除barcode序列:

lima --isoseq -j 20 ccs/sample1.bam primers.fasta demux/sample1.bam
lima --isoseq -j 20 ccs/sample2.bam primers.fasta demux/sample2.bam
lima --isoseq -j 20 ccs/sample3.bam primers.fasta demux/sample3.bam

其中,primer.fasta的内容为:

$ cat primers.fasta 
>primer_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>primer_3p
AAGCAGTGGTATCAACGCAGAGTAC

对于sample1,该步骤将输出以下文件:

demux/sample1.json
demux/sample1.lima.clips
demux/sample1.lima.counts
demux/sample1.lima.report
demux/sample1.lima.summary
demux/sample1.primer_5p--primer_3p.bam
demux/sample1.primer_5p--primer_3p.bam.pbi
demux/sample1.primer_5p--primer_3p.subreadset.xml

值得注意的是,理论上应该输出demux/sample1.bam文件,实际上却输出demux/sample1.primer_5p--primer_3p.bam,而多出来的primer_5p--primer_3p则是primers.fasta中的序列名称。

3、去除polyA(refine)

isoseq3 refine --require-polya --min-polya-length 20 -j 20 demux/sample1.primer_5p--primer_3p.bam primers.fasta flnc/sample1.bam
isoseq3 refine --require-polya --min-polya-length 20 -j 20 demux/sample2.primer_5p--primer_3p.bam primers.fasta flnc/sample2.bam
isoseq3 refine --require-polya --min-polya-length 20 -j 20 demux/sample3.primer_5p--primer_3p.bam primers.fasta flnc/sample3.bam

sample1为例,输出以下文件:

flnc/sample1.bam
flnc/sample1.bam.pbi
flnc/sample1.consensusreadset.xml
flnc/sample1.filter_summary.json
flnc/sample1.report.csv

4、聚类(Cluster)

isoseq3 cluster -j 20 flnc/sample1.bam clustered/sample1.bam
isoseq3 cluster -j 20 flnc/sample2.bam clustered/sample2.bam
isoseq3 cluster -j 20 flnc/sample3.bam clustered/sample3.bam

sample1为例,输出以下文件:

clustered/sample1.bam
clustered/sample1.bam.pbi
clustered/sample1.cluster
clustered/sample1.cluster_report.csv
clustered/sample1.fasta.gz
clustered/sample1.transcriptset.xml

5、抛光(Polishment)

isoseq3 polish -j 20 clustered/sample1.bam sample1.subreads.bam polished/sample1.bam
isoseq3 polish -j 20 clustered/sample2.bam sample2.subreads.bam polished/sample2.bam
isoseq3 polish -j 20 clustered/sample3.bam sample3.subreads.bam polished/sample3.bam

以sample1为例,输出以下文件:

polished/sample1.bam
polished/sample1.bam.pbi
polished/sample1.cluster_report.csv
polished/sample1.hq.fasta.gz
polished/sample1.hq.fastq.gz
polished/sample1.lq.fasta.gz
polished/sample1.lq.fastq.gz
polished/sample1.transcriptset.xml

6、比对(Alignment)

# Generate index
pbmm2 index -j 20 --preset ISOSEQ genome.fasta genome.isoseq.mmi
# Alignment
pbmm2 align -j 20 --preset ISOSEQ --sort genome.isoseq.mmi polished/sample1.bam aligned/sample1.bam
pbmm2 align -j 20 --preset ISOSEQ --sort genome.isoseq.mmi polished/sample2.bam aligned/sample2.bam
pbmm2 align -j 20 --preset ISOSEQ --sort genome.isoseq.mmi polished/sample3.bam aligned/sample3.bam

sample1为例,生成以下文件:

aligned/sample1.bam
aligned/sample1.bam.pbi

7、折叠(Collapse)

isoseq3 collapse -j 20 aligned/sample1.bam ccs/sample1.bam collapsed/sample1.gff
isoseq3 collapse -j 20 aligned/sample2.bam ccs/sample2.bam collapsed/sample2.gff
isoseq3 collapse -j 20 aligned/sample3.bam ccs/sample3.bam collapsed/sample3.gff

sample1为例,输出以下文件:

collapsed/sample1.abundance.txt
collapsed/sample1.fasta
collapsed/sample1.fastq
collapsed/sample1.gff
collapsed/sample1.group.txt
collapsed/sample1.read_stat.txt
collapsed/sample1.report.json

输出的GFF文件,实际上是GTF文件(GFF2.5,含有gene_idtranscript_id),但该命令的输出文件名又必须是.gff后缀。且是先transcript,后接该transcript下的exon,因此并没有按照坐标排序好,为了能够在IGV中可视化组装效果,需要做一些转换:

bedtools sort -i collapsed/sample1.gff | bgzip -c > collapsed/sample1.sorted.gtf.gz
tabix -p gff collapsed/sample1.sorted.gtf.gz

生成以下文件:

collapsed/sample1.sorted.gtf.gz
collapsed/sample1.sorted.gtf.gz.tbi
 类似资料: