bwa (Burrows-Wheeler Aligner)
# Download the software
git clone https://github.com/lh3/bwa.git
cd bwa; make
# Build a bwa index
bwa index [options] <in.fasta>
-a STR BWT construction algorithm: bwtsw, is or rb2 [auto]
-p STR prefix of the index [same as fasta name] # by default the index files take the FASTA file name as their prefix
-b INT block size for the bwtsw algorithm (effective with -a bwtsw) [10000000]
-6 index files named as <in.fasta>.64.* instead of <in.fasta>.*
# Example
bwa index rRNA.homo_sapiens.fa
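Re-running `bwa index` on a large reference is slow, so it can help to check first whether the index is already there. The sketch below assumes the default naming (no `-p` prefix and no `-6`), where the index consists of `<in.fasta>.amb`, `.ann`, `.bwt`, `.pac` and `.sa`. The function names `bwa_index_missing` and `ensure_bwa_index` are hypothetical helpers, not part of bwa.

```shell
# List the bwa index files that are missing for a reference FASTA.
# Prints one missing file name per line; prints nothing when the index is complete.
bwa_index_missing() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        [ -e "$ref.$ext" ] || printf '%s\n' "$ref.$ext"
    done
}

# Build the index only when at least one index file is missing.
ensure_bwa_index() {
    if [ -n "$(bwa_index_missing "$1")" ]; then
        bwa index "$1"
    fi
}
```

For the example above, `ensure_bwa_index rRNA.homo_sapiens.fa` would only call `bwa index` on the first run.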
# bwa-mem: align sequencing reads longer than 70 bp, or assembled contigs, to a closely related reference genome
bwa mem [options] <idxbase> <in1.fq> [in2.fq]
bwa mem [-aCHMpP] [-t nThreads] [-k minSeedLen] [-w bandWidth] [-d zDropoff] [-r seedSplitRatio] [-c maxOcc] [-A matchScore] [-B mmPenalty] [-O gapOpenPen] [-E gapExtPen] [-L clipPen] [-U unpairPen] [-R RGline] [-v verboseLevel] db.prefix reads.fq [mates.fq]
Algorithm options:
-t INT number of threads [1] # threads to use
-k INT minimum seed length [19] # matches shorter than this are not used as seeds
-w INT band width for banded alignment [100]
-d INT off-diagonal X-dropoff [100]
-r FLOAT look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]
-y INT seed occurrence for the 3rd round seeding [20]
-c INT skip seeds with more than INT occurrences [500]
-D FLOAT drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
-W INT discard a chain if seeded bases shorter than INT [0]
-m INT perform at most INT rounds of mate rescues for each read [50]
-S skip mate rescue
-P skip pairing; mate rescue performed unless -S also in use
Scoring options:
-A INT score for a sequence match, which scales options -TdBOELU unless overridden [1]
-B INT penalty for a mismatch [4]
-O INT[,INT] gap open penalties for deletions and insertions [6,6]
-E INT[,INT] gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]
-L INT[,INT] penalty for 5'- and 3'-end clipping [5,5]
-U INT penalty for an unpaired read pair [17]
-x STR read type. Setting -x changes multiple parameters unless overridden [null]
pacbio: -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0 (PacBio reads to ref)
ont2d: -k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0 (Oxford Nanopore 2D-reads to ref)
intractg: -B9 -O16 -L5 (intra-species contigs to ref)
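To see how the scoring options interact, the following sketch computes the score of an alignment with a given number of matches, mismatches and a single gap, using the `{-O} + {-E}*k` gap cost quoted above. It is a simplification: clipping penalties (`-L`), the unpaired penalty (`-U`) and separate deletion/insertion penalties are ignored, and `aln_score` is a hypothetical helper name.

```shell
# Score of an alignment under affine gap penalties.
# Arguments: matches mismatches gap_length [A B O E]
# Only one gap is modeled; the defaults are the bwa mem defaults (-A1 -B4 -O6 -E1).
aln_score() {
    local matches="$1" mismatches="$2" gap_len="$3"
    local A="${4:-1}" B="${5:-4}" O="${6:-6}" E="${7:-1}"
    local gap_cost=0
    # A gap of length k costs O + E*k; no gap means no cost.
    if [ "$gap_len" -gt 0 ]; then
        gap_cost=$(( $O + $E * $gap_len ))
    fi
    echo $(( $matches * $A - $mismatches * $B - $gap_cost ))
}
```

With the defaults, `aln_score 95 3 2` gives 75 (95 - 12 - 8), which is above the default `-T 30` output threshold. With the `-x pacbio` penalties, `aln_score 95 3 2 1 1 1 1` gives 89.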
Input/output options:
-p smart pairing (ignoring in2.fq)
-R STR read group header line such as '@RG\tID:foo\tSM:bar' [null]
-H STR/FILE insert STR to header if it starts with @; or insert lines in FILE [null]
-o FILE sam file to output results to [stdout]
-j treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)
-5 for split alignment, take the alignment with the smallest coordinate as primary
-q don't modify mapQ of supplementary alignments
-K INT process INT input bases in each batch regardless of nThreads (for reproducibility) []
-v INT verbosity level: 1=error, 2=warning, 3=message, 4+=debugging [3]
-T INT minimum score to output [30]
-h INT[,INT] if there are <INT hits with score >80% of the max score, output all in XA [5,200]
-a output all alignments for SE or unpaired PE
-C append FASTA/FASTQ comment to SAM output
-V output the reference FASTA header in the XR tag
-Y use soft clipping for supplementary alignments
-M mark shorter split hits as secondary (for Picard compatibility)
-I FLOAT[,FLOAT[,INT[,INT]]] specify the mean, standard deviation (10% of the mean if absent), max (4 sigma from the mean if absent) and min of the insert size distribution. FR orientation only. [inferred]
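Two of the options above take structured values that are easy to get wrong on the command line: the read-group line for `-R` and the insert-size list for the `-I` option of `bwa mem`. The sketch below builds both. `make_rg` and `insert_size_defaults` are hypothetical helper names, and the insert-size helper only reproduces the defaults stated in the option text (sd = 10% of the mean, max = mean + 4 sd) using integer arithmetic.

```shell
# Build a read-group header line for `bwa mem -R`.
# The output holds a literal backslash followed by "t"; bwa converts it to a tab itself.
make_rg() {
    printf '@RG\\tID:%s\\tSM:%s\n' "$1" "$2"
}

# Derive "mean,sd,max" from the mean insert size alone:
# sd = 10% of the mean, max = mean + 4*sd (integer arithmetic, rounded down).
insert_size_defaults() {
    local mean="$1"
    local sd=$(( $mean / 10 ))
    echo "$mean,$sd,$(( $mean + 4 * $sd ))"
}
```

A call could then look like `bwa mem -R "$(make_rg foo bar)" -I "$(insert_size_defaults 300)" ref.fa read1.fq read2.fq > aln-pe.sam`, where `-I` receives `300,30,420`.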
# Second-generation (short-read) sequencing platforms
bwa mem ref.fa reads.fq > aln-se.sam
bwa mem ref.fa read1.fq read2.fq > aln-pe.sam
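# Example: 10 threads, -M for Picard compatibility; alignments go to aln.sam, messages are appended to bwa.log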
bwa mem -t 10 -M ref.fa read1.fq read2.fq 1> aln.sam 2>>bwa.log
# PacBio / Oxford Nanopore reads
bwa mem -x pacbio ref.fa reads.fq > aln.sam
bwa mem -x ont2d ref.fa reads.fq > aln.sam
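# bwa-backtrack: suited to short reads of around 70 bp (the bwa manual describes it as designed for Illumina reads up to 100 bp)
# Single-end alignment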
bwa aln ref.fa short_read.fq > aln_sa.sai
bwa samse ref.fa aln_sa.sai short_read.fq > aln-se.sam
# Paired-end alignment
bwa aln ref.fa read1.fq > aln_sa1.sai
bwa aln ref.fa read2.fq > aln_sa2.sai
bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln-pe.sam
bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual] <in.db.fasta> <in.query.fq> > <out.sai>
bwa samse [-n maxOcc] <in.db.fasta> <in.sai> <in.fq> > <out.sam>
bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] [-N maxHitDis] [-P] <in.db.fasta> <in1.sai> <in2.sai> <in1.fq> <in2.fq> > <out.sam>
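The read-length guidance in these notes (bwa mem for reads longer than 70 bp, bwa aln/samse/sampe for shorter reads) can be turned into a small check. This is only a sketch: it assumes an uncompressed FASTQ with exactly four lines per record, treats 70 bp as a hard cutoff, and the names `mean_read_len` and `pick_bwa_algo` are hypothetical.

```shell
# Mean read length of an uncompressed 4-line-per-record FASTQ file, rounded down.
mean_read_len() {
    awk 'NR % 4 == 2 { total += length($0); n++ }
         END { if (n > 0) print int(total / n); else print 0 }' "$1"
}

# Pick a bwa algorithm from a read length, using the 70 bp threshold from these notes.
pick_bwa_algo() {
    if [ "$1" -ge 70 ]; then
        echo mem
    else
        echo aln
    fi
}
```

For example, `pick_bwa_algo "$(mean_read_len reads.fq)"` prints `mem` or `aln`, which a wrapper script could use to choose between the two command sequences above.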
# bwa-sw
bwa bwasw ref.fa long_read.fq > aln.sam
bwa bwasw [-a matchScore] [-b mmPen] [-q gapOpenPen] [-r gapExtPen] [-t nThreads] [-w bandWidth] [-T thres] [-s hspIntv] [-z zBest] [-N nHspRev] [-c thresCoef] <in.db.fasta> <in.fq> [mate.fq] > aln.sam
# Install bowtie2
# 1. Install with conda
conda install -c bioconda bowtie2
# 2. Install from git
git clone https://github.com/BenLangmead/bowtie2.git
# bowtie2 usage notes
bowtie2-build ref.fa ref
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | --sra-acc <acc> | -b <bam>} [-S <sam>]
-f Reads (specified with <m1>, <m2>, <r>) are FASTA files. FASTA files usually have extension .fa, .fasta, .mfa, .fna or similar. FASTA files do not have a way of specifying quality values, so when -f is set, the result is as if --ignore-quals is also set.
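--end-to-end Require the entire read to align from one end to the other, without trimming (soft clipping) characters from either end. This is the default mode.
-X/--maxins <int> Maximum fragment length for valid paired-end alignments (default: 500)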
# SAM FLAGS bits set by bowtie2
1 The read is one of a pair
2 The alignment is one end of a proper paired-end alignment
4 The read has no reported alignments
8 The read is one of a pair and its mate has no reported alignments
16 The alignment is to the reverse reference strand
32 The other mate in the paired-end alignment is aligned to the reverse reference strand
64 The read is mate 1 in a pair
128 The read is mate 2 in a pair
Thus, an unpaired read that aligns to the reverse reference strand will have flag 16. A paired-end read that is the first mate of a proper pair and aligns to the reverse reference strand will have flag 83 (= 64 + 16 + 2 + 1).
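Because the FLAGS value is a sum of the bits listed above, it can be decoded with a bitwise AND against each bit. The sketch below does this in plain shell. The short labels (`paired`, `proper_pair`, and so on) are my own shorthand for the table rows, and `decode_flag` is a hypothetical helper, not a bowtie2 or samtools command.

```shell
# Print the names of the SAM FLAGS bits that are set in a flag value.
decode_flag() {
    local flag="$1" out="" pair bit name
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:mate1 128:mate2; do
        bit=${pair%%:*}
        name=${pair#*:}
        # A bit is set when the bitwise AND with the flag is non-zero.
        if [ $(( $flag & $bit )) -ne 0 ]; then
            out="$out${out:+ }$name"
        fi
    done
    echo "$out"
}
```

`decode_flag 83` prints `paired proper_pair reverse mate1`, matching the 64 + 16 + 2 + 1 breakdown, and `decode_flag 16` prints `reverse`.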
# bowtie2-inspect: look up the reference names and sequences stored in an index
bowtie2-inspect [options]* <bt2_base>
--large-index force inspection of the 'large' index, even if a 'small' one is present.
-a/--across <int> Number of characters across in FASTA output (default: 60)
-n/--names Print reference sequence names only
-s/--summary Print summary incl. ref names, lengths, index properties
-e/--bt2-ref Reconstruct reference from .bt2 (slow, preserves colors)
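# Example: reconstruct the reference from the index at 80 characters per line; the output goes to a new file (illustrative name) instead of being appended to the original FASTA
/opt/biotools/bowtie2-2.2.9/bowtie2-inspect -a 80 rRNA.homo_sapiens.fa > rRNA.homo_sapiens.inspect.fa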
# Software downloads
# Download Boost (needed to build tophat2 from source)
wget https://dl.bintray.com/boostorg/release/1.71.0/source/boost_1_71_0.tar.gz
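tar -zxvf boost_1_71_0.tar.gz
cd boost_1_71_0
./bootstrap.sh # build the b2 build tool (formerly called bjam)
./b2 --prefix=<YOUR_BOOST_INSTALL_DIRECTORY> link=static runtime-link=static stage install # build and install to a custom path; the default install path is /usr/local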
# Download and install tophat2
# Option 1: build from source
wget -c http://ccb.jhu.edu/software/tophat/downloads/tophat-2.1.1.tar.gz
tar zxvf tophat-2.1.1.tar.gz
cd tophat-2.1.1/
./configure --prefix=<install_prefix> --with-boost=<boost_install_prefix> --with-bam=<samtools_install_prefix>
make
make install
# Option 2: prebuilt Linux binary (no configure or make needed)
wget -c http://ccb.jhu.edu/software/tophat/downloads/tophat-2.1.1.Linux_x86_64.tar.gz
tar zxvf tophat-2.1.1.Linux_x86_64.tar.gz
# Download hisat2
# wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.1.0-Linux_x86_64.zip
wget -c http://ccb.jhu.edu/software/hisat2/downloads/hisat2-2.0.0-beta-source.zip
unzip hisat2-2.0.0-beta-source.zip
cd hisat2-2.0.0-beta
make
# Add the hisat2 executables to a directory on PATH
cp /install/path/hisat2* ~/bin/
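After copying executables around, a quick way to confirm that the shell can actually find them is `command -v`. The sketch below reports every tool that is not on `PATH`. `missing_tools` is a hypothetical helper name, and the tool list in the usage note is only an example.

```shell
# Print every named tool that cannot be found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

`missing_tools bwa bowtie2 tophat2 hisat2` prints nothing when all four are installed. If hisat2 is still reported after the copy, `~/bin` is probably not on `PATH`; `export PATH="$HOME/bin:$PATH"` adds it for the current session.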
hisat2 [options]* -x <hisat2-idx> {-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>} [-S <hit>]
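The braces in the synopsis mean that exactly one input mode is chosen: `-1`/`-2` for paired-end files or `-U` for unpaired reads. A wrapper script can pick the mode from the number of read files it receives, as in this sketch. `read_args` is a hypothetical helper, and it assumes file names without spaces.

```shell
# Print the read-input arguments for hisat2:
# two files mean paired-end (-1/-2), one file means unpaired (-U).
read_args() {
    if [ "$#" -eq 2 ]; then
        printf '%s\n' "-1 $1 -2 $2"
    elif [ "$#" -eq 1 ]; then
        printf '%s\n' "-U $1"
    else
        echo "usage: read_args <r1.fq> [r2.fq]" >&2
        return 1
    fi
}
```

Used as `hisat2 -x <hisat2-idx> $(read_args read1.fq read2.fq) -S out.sam`, the unquoted substitution expands to `-1 read1.fq -2 read2.fq`. The same `-1`/`-2`/`-U` options exist in bowtie2, so the helper would work there too.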