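The egs/aishell/v1 folder in the Kaldi tree is a small speaker recognition example that uses the i-vector + PLDA approach on the AISHELL-1 dataset.
It helps to have the overall flow in mind first: extract MFCC features and VAD decisions, train a diagonal UBM and then a full-covariance UBM, train the i-vector extractor, extract i-vectors, train PLDA, then score the enroll/eval trials and compute the EER.
Below I read the code statement by statement, starting from run.sh. I am a beginner and learned Linux shell as I went, so please bear with me.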
#!/bin/bash
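: <<'EOF'
Linux shell tutorial: https://www.runoob.com/linux/linux-shell.html
In everyday use people rarely distinguish the Bourne Shell from the Bourne Again Shell, so a script that starts with #!/bin/sh can usually start with #!/bin/bash instead.
The #! line is a conventional marker: it tells the system that the program at the path that follows is the interpreter for this script, that is, which shell to run it with.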
EOF
# Copyright 2017 Beijing Shell Shell Tech. Co. Ltd. (Authors: Hui Bu)
# 2017 Jiayu Du
# 2017 Chao Li
# 2017 Xingyu Na
# 2017 Bengu Wu
# 2017 Hao Zheng
# Apache 2.0
# This is a shell script that we demonstrate speech recognition using AIShell-1 data.
# it's recommended that you run the commands one by one by copying and pasting into the shell.
# See README.txt for more info on data required.
# Results (EER) are inline in comments below
data=/export/a05/xna/data
data_url=www.openslr.org/resources/33
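: <<'EOF'
https://www.runoob.com/linux/linux-shell-variable.html
The two lines above are variable assignments. Quotes are optional here because the values contain no spaces or special characters.
Note that there must be no space between the variable name and the equals sign, which may differ from every other programming language you know.
To use a variable that has been defined, put a dollar sign in front of its name.
Braces around the variable name are optional; they help the interpreter see where the name ends, as in ${data}_backup.
Putting braces around every variable is a good habit.
The readonly command makes a variable read-only, so its value can no longer be changed.
The unset command deletes a variable.
Strings are the most common and most useful data type in shell programming (apart from numbers and strings there is little else to work with). A string can be written in single quotes, in double quotes, or without quotes.
Everything inside single quotes is output literally, so variables are not expanded there.
Double quotes allow variable expansion and escape characters.
The tutorial also covers extracting substrings and searching within a string.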
EOF
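: <<'EOF'
The variable and string rules above can be tried in a few lines. This is a minimal sketch; the corpus variable and the archive path are made up for illustration and do not appear in run.sh.

```shell
#!/bin/bash
# Values mirroring the assignments in run.sh; "corpus" is a hypothetical extra.
data=/export/a05/xna/data
corpus=data_aishell

# Braces mark where each variable name ends.
archive="${data}/${corpus}.tgz"
echo "$archive"

# Single quotes keep the text literal; double quotes expand variables.
literal='$data'
expanded="$data"
echo "$literal vs $expanded"

# Substring: ${var:offset:length}. String length: ${#var}.
prefix="${corpus:0:4}"
length="${#corpus}"
echo "$prefix $length"

# unset removes the variable; the :- form supplies a fallback for printing.
unset corpus
echo "${corpus:-<unset>}"
```

Quoting every expansion, as in "$archive", also protects against paths that contain spaces.
EOF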
. ./cmd.sh
. ./path.sh
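: <<'EOF'
This is similar to import in Python. The first dot is the source command, which reads the file and runs it in the current shell.
The second dot, in ./cmd.sh, means "look in the current directory". There must be a space between the two dots.
Prefer ./cmd.sh over a bare cmd.sh. For a name without a slash, the shell searches the directories in PATH (typically /bin, /sbin, /usr/bin, /usr/sbin and so on), and the current directory is usually not among them. Bash outside POSIX mode falls back to the current directory when sourcing, but other shells may not, and a file of the same name found in PATH would be picked first. When you run a program, as opposed to sourcing a file, a bare name in the current directory is not found at all, so write ./name there too.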
EOF
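: <<'EOF'
The difference between sourcing a file and running it can be seen with a throwaway file. This is a sketch under assumptions: the temporary cmd.sh below is a hypothetical stand-in, and its run.pl value is only an example, not the content of the real cmd.sh.

```shell
#!/bin/bash
# Create a throwaway settings file (hypothetical stand-in for cmd.sh).
tmpdir=$(mktemp -d)
printf '%s\n' 'export train_cmd="run.pl"' > "$tmpdir/cmd.sh"
unset train_cmd

# Sourcing runs the file inside the calling shell, so the variable is visible
# right after the ". ./cmd.sh" line.
sourced_value=$(cd "$tmpdir" && . ./cmd.sh && echo "$train_cmd")

# Running the file as a child process sets the variable only in the child.
child_value=$(cd "$tmpdir" && bash ./cmd.sh && echo "${train_cmd:-unset}")

echo "sourced: $sourced_value"
echo "child:   $child_value"
rm -r "$tmpdir"
```

This is why run.sh sources cmd.sh and path.sh: later lines rely on variables such as $train_cmd being set in the same shell.
EOF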
set -e # exit on error
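: <<'EOF'
https://blog.csdn.net/todd911/article/details/9954961
Consider putting set -e near the top of every script you write. It tells bash to exit as soon as a command returns a non-zero status (commands tested by if, while, && or || are exempt).
This keeps an early error from snowballing into a fatal one that should have been handled sooner.
For readability you can write set -o errexit, which does the same thing as set -e.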
EOF
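: <<'EOF'
A small experiment shows what set -e changes. The one-line scripts below are made up for the demonstration and run in child shells, so a failure does not stop the shell you are typing in.

```shell
#!/bin/bash
# Without set -e the script keeps going after the failing command.
without_e=$(bash -c 'false; echo "still running"')

# With set -e the script stops at "false", so nothing is printed.
with_e=$(bash -c 'set -e; false; echo "still running"' || true)

# Commands tested by if or followed by || do not trigger the exit.
guarded=$(bash -c 'set -e; if false; then :; fi; false || echo "handled"; echo "done"')

echo "without set -e: '${without_e}'"
echo "with set -e:    '${with_e}'"
echo "$guarded"
```

In run.sh this means a failed download or a failed training stage stops the whole recipe instead of feeding bad output to the next stage.
EOF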
local/download_and_untar.sh $data $data_url data_aishell
local/download_and_untar.sh $data $data_url resource_aishell
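: <<'EOF'
The v1 folder has a subfolder named local. These two lines run local/download_and_untar.sh twice, with different arguments. A dollar sign means a variable reference, and the variable must have been defined earlier.
Why two calls? Opening download_and_untar.sh shows the reason: the first call downloads and extracts data_aishell.tgz, the second does the same for resource_aishell.tgz. A .sh file on Linux is plain text, much like a .py file in Python, and can be viewed and edited in a text editor such as Notepad++.
download_and_untar.sh takes one optional flag followed by three positional arguments:
--remove-archive (optional): if given, the archive is deleted after extraction and only the extracted files are kept.
data-base: the data directory. run.sh sets data=/export/a05/xna/data, an absolute path that most likely does not exist on your machine, so create the directory yourself or point data somewhere else. The script checks for it and exits with an error if it is missing.
url-base: the URL to download from. Downloads are usually slow and may need a proxy, so it is better to prepare the data in advance.
corpus-part: either data_aishell or resource_aishell.
The flag may be omitted, but the three positional arguments are required. run.sh supplies all three: $data, $data_url and the corpus part.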
EOF
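: <<'EOF'
The calling convention described above (one optional flag, then three positional arguments) can be sketched as follows. This is a hypothetical re-creation for illustration, not the code of download_and_untar.sh; the function name and the printed messages are assumptions.

```shell
#!/bin/bash
# Hypothetical sketch of "one optional flag + three positional arguments".
parse_download_args() {
  local remove_archive=false
  if [ "$1" = "--remove-archive" ]; then
    remove_archive=true
    shift   # drop the flag so $1..$3 are the positional arguments
  fi
  if [ $# -ne 3 ]; then
    echo "Usage: [--remove-archive] <data-base> <url-base> <corpus-part>" >&2
    return 1
  fi
  local data=$1 url=$2 part=$3
  echo "remove_archive=$remove_archive"
  echo "archive url: $url/$part.tgz"
  echo "extract into: $data"
}

# Same three arguments that run.sh passes in its first call.
parse_download_args /export/a05/xna/data www.openslr.org/resources/33 data_aishell
```

Calling it with only two arguments prints the usage line and returns a non-zero status, which under set -e would stop the calling script.
EOF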
# Data Preparation
local/aishell_data_prep.sh $data/data_aishell/wav $data/data_aishell/transcript
# Runs aishell_data_prep.sh from the local folder with two arguments.
# The step is done once you see "AISHELL data preparation succeeded".
# Now make MFCC features.
# mfccdir should be some place with a largish disk where you
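# want to store MFCC features.
# Each wav is first split into many frames, and each frame is represented by a vector of numbers; those numbers are the MFCCs.
mfccdir=mfcc
# set the directory that will hold the MFCC features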
for x in train test; do
steps/make_mfcc.sh --cmd "$train_cmd" --nj 10 data/$x exp/make_mfcc/$x $mfccdir
sid/compute_vad_decision.sh --nj 10 --cmd "$train_cmd" data/$x exp/make_mfcc/$x $mfccdir
utils/fix_data_dir.sh data/$x
done
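: <<'EOF'
make_mfcc.sh mainly takes three arguments: the data directory, the log directory, and the directory that stores the MFCC features. The items that start with -- are options (here the job command and the number of parallel jobs).
compute_vad_decision.sh accepts up to three arguments, "Usage: $0 [options] <data-dir> [<log-dir> [<vad-dir>]]", but only the first is required.
fix_data_dir.sh takes a single argument. Its usage notes are: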
# This script makes sure that only the segments present in
# all of "feats.scp", "wav.scp" [if present], segments [if present]
# text, and utt2spk are present in any of them.
# It puts the original contents of data-dir into
# data-dir/.backup
echo "Usage: utils/data/fix_data_dir.sh <data-dir>"
echo "e.g.: utils/data/fix_data_dir.sh data/train"
echo "This script helps ensure that the various files in a data directory"
echo "are correctly sorted and filtered, for example removing utterances"
echo "that have no features (if feats.scp is present)"
EOF
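: <<'EOF'
The earlier comment says that each wav is split into many frames. A rough frame count can be worked out with shell arithmetic. This sketch assumes a 25 ms window and a 10 ms shift (Kaldi's default frame settings, which a recipe's MFCC config may override), 16 kHz audio, and that only whole windows are kept; the function name is made up.

```shell
#!/bin/bash
# Estimate the number of frames for a wav with the given number of samples.
# Assumptions: 25 ms window, 10 ms shift, incomplete trailing windows dropped.
num_frames() {
  local samples=$1 rate=${2:-16000}
  local win=$(( rate * 25 / 1000 ))         # samples per window
  local shift_len=$(( rate * 10 / 1000 ))   # samples between window starts
  if [ "$samples" -lt "$win" ]; then
    echo 0
    return
  fi
  echo $(( 1 + (samples - win) / shift_len ))
}

# A 3-second utterance at 16 kHz has 48000 samples.
num_frames 48000
```

With these settings a 3-second utterance gives 298 frames, so one feature vector is produced roughly every 10 ms.
EOF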
# train diag ubm
sid/train_diag_ubm.sh --nj 10 --cmd "$train_cmd" --num-threads 16 \
data/train 1024 exp/diag_ubm_1024
: <<'EOF'
The description inside train_diag_ubm.sh:
# This is a modified version of steps/train_diag_ubm.sh, specialized for
# speaker-id, that does not require to start with a trained model, that applies
# sliding-window CMVN, and that expects voice activity detection (vad.scp) in
# the data directory. We initialize the GMM using gmm-global-init-from-feats,
# which sets the means to random data points and then does some iterations of
# E-M in memory. After the in-memory initialization we train for a few
# iterations in parallel.
echo "Usage: $0 <data> <num-gauss> <output-dir>"
echo " e.g.: $0 data/train 1024 exp/diag_ubm"
echo "Options: "
echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
echo " --nj <num-jobs|4> # number of parallel jobs to run."
echo " --num-iters <niter|20> # number of iterations of parallel "
echo " # training (default: $num_iters)"
echo " --stage <stage|-2> # stage to do partial re-run from."
echo " --num-gselect <n|30> # Number of Gaussians per frame to"
echo " # limit computation to, for speed"
echo " --subsample <n|5> # In main E-M phase, use every n"
echo " # frames (a speedup)"
echo " --num-frames <n|500000> # Maximum num-frames to keep in memory"
echo " # for model initialization"
echo " --num-iters-init <n|20> # Number of E-M iterations for model"
echo " # initialization"
echo " --initial-gauss-proportion <proportion|0.5> # Proportion of Gaussians to start with"
echo " # in initialization phase (then split)"
echo " --num-threads <n|32> # number of threads to use in initialization"
echo " # phase (must match with parallel-opts option)"
echo " --parallel-opts <string|'--num-threads 32'> # Option should match number of threads in"
echo " # --num-threads option above"
echo " --min-gaussian-weight <weight|0.0001> # min Gaussian weight allowed in GMM"
echo " # initialization (this relatively high"
echo " # value keeps counts fairly even)"
echo " --delta-window <n|3> # number of frames of context used to"
echo " # calculate delta"
echo " --delta-order <n|2> # number of delta features"
echo " --apply-cmn <true,false|true> # if true, apply sliding window cepstral mean"
echo " # normalization to features"
EOF
#train full ubm
sid/train_full_ubm.sh --nj 10 --cmd "$train_cmd" data/train \
exp/diag_ubm_1024 exp/full_ubm_1024
: <<'EOF'
# This trains a full-covariance UBM from an existing (diagonal or full) UBM,
# for a specified number of iterations. This is for speaker-id systems
# (we use features specialized for that, and vad).
echo "Usage: steps/train_full_ubm.sh <data> <old-ubm-dir> <new-ubm-dir>"
echo "Trains a full-covariance UBM starting from an existing diagonal or"
echo "full-covariance UBM system."
echo " e.g.: steps/train_full_ubm.sh --num-iters 8 data/train exp/diag_ubm exp/full_ubm"
echo "main options (for others, see top of script file)"
echo " --config <config-file> # config containing options"
echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
echo " --nj <n|16> # number of parallel training jobs"
echo " --num-gselect <n|20> # Number of Gaussians to select using"
echo " # initial model (diagonalized if needed)"
echo " --subsample <n|5> # Take every n'th sample, for efficiency"
echo " --num-iters <n|4> # Number of iterations of E-M"
echo " --min-gaussian-weight <weight|1.0e-05> # Minimum Gaussian weight (below this,"
echo " # we won't update, and will remove Gaussians"
echo " # if --remove-low-count-gaussians is true"
echo " --remove-low-count-gaussians <true,false|true> # If true, remove Gaussians below min-weight"
echo " # (will only happen on last iteration, in any case"
echo " --cleanup <true,false|true> # If true, clean up accumulators, intermediate"
echo " # models and gselect info"
exit 1;
echo " --apply-cmn <true,false|true> # if true, apply sliding window cepstral mean"
echo " # normalization to features"
EOF
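: <<'EOF'
For orientation, the two UBM stages can be written down as one mixture model. This is the textbook form of a Gaussian mixture, given here as a sketch; the symbols are my own notation, and C = 1024 is the number of Gaussians passed in run.sh.

```latex
% UBM density over a D-dimensional feature vector x, with C = 1024 components
p(x) = \sum_{c=1}^{C} w_c \, \mathcal{N}(x;\, \mu_c, \Sigma_c),
\qquad \sum_{c=1}^{C} w_c = 1

% Diagonal UBM (sid/train_diag_ubm.sh): each covariance is diagonal
\Sigma_c = \mathrm{diag}\left(\sigma_{c,1}^2, \dots, \sigma_{c,D}^2\right)

% Full UBM (sid/train_full_ubm.sh): each \Sigma_c is an unrestricted
% symmetric positive-definite D x D matrix, initialized from the diagonal model
```

The diagonal model is cheaper to train, which is why the recipe trains it first and uses it as the starting point for the full-covariance model.
EOF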
# train the i-vector extractor
# the i-vector extractor is essentially the total variability matrix T
sid/train_ivector_extractor.sh --cmd "$train_cmd --mem 10G" \
--num-iters 5 exp/full_ubm_1024/final.ubm data/train \
exp/extractor_1024
: <<'EOF'
echo "Usage: $0 <fgmm-model> <data> <extractor-dir>"
echo " e.g.: $0 exp/ubm_2048_male/final.ubm data/train_male exp/extractor_male"
echo "main options (for others, see top of script file)"
echo " --config <config-file> # config containing options"
echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
echo " --num-iters <#iters|10> # Number of iterations of E-M"
echo " --nj <n|10> # Number of jobs (also see num-processes and num-threads)"
echo " --num-processes <n|4> # Number of processes for each queue job (relates"
echo " # to summing accs in memory)"
echo " --num-threads <n|4> # Number of threads for each process (can't be usefully"
echo " # increased much above 4)"
echo " --stage <stage|-4> # To control partial reruns"
echo " --num-gselect <n|20> # Number of Gaussians to select using"
echo " # diagonal model."
echo " --sum-accs-opt <option|''> # Option e.g. '-l hostname=a15' to localize"
echo " # sum-accs process to nfs server."
echo " --apply-cmn <true,false|true> # if true, apply sliding window cepstral mean"
echo " # normalization to features"
EOF
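: <<'EOF'
The remark that the extractor is the T matrix refers to the usual total variability model. The equation below is the standard textbook form, given as a sketch in my own notation; Kaldi's implementation differs in some details.

```latex
% M : utterance-dependent supervector of GMM means
% m : supervector of the UBM means (speaker- and channel-independent)
% T : low-rank total variability matrix learned by train_ivector_extractor.sh
% w : the i-vector of the utterance, with a standard normal prior
M = m + T w, \qquad w \sim \mathcal{N}(0, I)
```

Training estimates T from data/train; extracting an i-vector then means estimating w for one utterance with T held fixed.
EOF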
#extract ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
exp/extractor_1024 data/train exp/ivector_train_1024
: <<'EOF'
# This script extracts iVectors for a set of utterances, given
# features and a trained iVector extractor.
echo "Usage: $0 <extractor-dir> <data> <ivector-dir>"
echo " e.g.: $0 exp/extractor_2048_male data/train_male exp/ivectors_male"
echo "main options (for others, see top of script file)"
echo " --config <config-file> # config containing options"
echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
echo " --nj <n|10> # Number of jobs (also see num-threads)"
echo " --num-threads <n|1> # Number of threads for each job"
echo " --stage <stage|0> # To control partial reruns"
echo " --num-gselect <n|20> # Number of Gaussians to select using"
echo " # diagonal model."
echo " --min-post <min-post|0.025> # Pruning threshold for posteriors"
echo " --apply-cmn <true,false|true> # if true, apply sliding window cepstral mean"
echo " # normalization to features"
EOF
#train plda
$train_cmd exp/ivector_train_1024/log/plda.log \
ivector-compute-plda ark:data/train/spk2utt \
'ark:ivector-normalize-length scp:exp/ivector_train_1024/ivector.scp ark:- |' \
exp/ivector_train_1024/plda
: <<'EOF'
~kaldi/src/ivectorbin/ivector-compute-plda.cc
"Computes a Plda object (for Probabilistic Linear Discriminant Analysis)\n"
"from a set of iVectors. Uses speaker information from a spk2utt file\n"
"to compute within and between class variances.\n"
"\n"
"Usage: ivector-compute-plda [options] <spk2utt-rspecifier> <ivector-rspecifier> "
"<plda-out>\n"
"e.g.: \n"
" ivector-compute-plda ark:spk2utt ark,s,cs:ivectors.ark plda\n";
EOF
#split the test to enroll and eval
mkdir -p data/test/enroll data/test/eval
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/enroll
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/eval
local/split_data_enroll_eval.py data/test/utt2spk data/test/enroll/utt2spk data/test/eval/utt2spk
trials=data/test/aishell_speaker_ver.lst
local/produce_trials.py data/test/eval/utt2spk $trials
utils/fix_data_dir.sh data/test/enroll
utils/fix_data_dir.sh data/test/eval
: <<'EOF'
local/split_data_enroll_eval.py
# This script splits the test set utt2spk into enroll set and eval set
# For each speaker, 3 utterances are randomly selected as enroll samples,
# and the others are used as eval samples for evaluation
# input: test utt2spk
# output: enroll utt2spk, eval utt2spk
local/produce_trials.py
# This script generate trials file.
# Trial file is formatted as:
# uttid spkid target|nontarget
# If uttid belong to spkid, it is marked 'target',
# otherwise is 'nontarget'.
# input: eval set uttspk file
# output: trial file
utils/fix_data_dir.sh
# This script makes sure that only the segments present in
# all of "feats.scp", "wav.scp" [if present], segments [if present]
# text, and utt2spk are present in any of them.
# It puts the original contents of data-dir into
# data-dir/.backup
EOF
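: <<'EOF'
The trial file format described above ("uttid spkid target|nontarget") can be reproduced with a few lines of awk. This is a hypothetical sketch, not the code of local/produce_trials.py: it pairs every utterance with every speaker, and the real script may order or select trials differently. The ids are made up.

```shell
#!/bin/bash
# Hypothetical sketch: pair every eval utterance with every speaker and
# label the pair "target" when the utterance belongs to that speaker.
make_trials() {
  awk '{ utt[NR] = $1; spk_of[$1] = $2; spks[$2] = 1 }
       END {
         for (i = 1; i <= NR; i++)
           for (s in spks)
             print utt[i], s, (spk_of[utt[i]] == s ? "target" : "nontarget")
       }' "$1" | sort
}

# Made-up eval utt2spk with two speakers.
demo_eval=$(mktemp)
printf '%s\n' 'uttA1 spkA' 'uttB1 spkB' > "$demo_eval"
make_trials "$demo_eval"
```

With 2 utterances and 2 speakers this prints 4 trials, of which 2 are target trials.
EOF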
#extract enroll ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
exp/extractor_1024 data/test/enroll exp/ivector_enroll_1024
#extract eval ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
exp/extractor_1024 data/test/eval exp/ivector_eval_1024
: <<'EOF'
# This script extracts iVectors for a set of utterances, given
# features and a trained iVector extractor.
echo "Usage: $0 <extractor-dir> <data> <ivector-dir>"
echo " e.g.: $0 exp/extractor_2048_male data/train_male exp/ivectors_male"
echo "main options (for others, see top of script file)"
echo " --config <config-file> # config containing options"
echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
echo " --nj <n|10> # Number of jobs (also see num-threads)"
echo " --num-threads <n|1> # Number of threads for each job"
echo " --stage <stage|0> # To control partial reruns"
echo " --num-gselect <n|20> # Number of Gaussians to select using"
echo " # diagonal model."
echo " --min-post <min-post|0.025> # Pruning threshold for posteriors"
echo " --apply-cmn <true,false|true> # if true, apply sliding window cepstral mean"
echo " # normalization to features"
EOF
#compute plda score
$train_cmd exp/ivector_eval_1024/log/plda_score.log \
ivector-plda-scoring --num-utts=ark:exp/ivector_enroll_1024/num_utts.ark \
exp/ivector_train_1024/plda \
ark:exp/ivector_enroll_1024/spk_ivector.ark \
"ark:ivector-normalize-length scp:exp/ivector_eval_1024/ivector.scp ark:- |" \
"cat '$trials' | awk '{print \\\$2, \\\$1}' |" exp/trials_out
: <<'EOF'
~kaldi/src/ivectorbin/ivector-plda-scoring.cc
"Computes log-likelihood ratios for trials using PLDA model\n"
"Note: the 'trials-file' has lines of the form\n"
"<key1> <key2>\n"
"and the output will have the form\n"
"<key1> <key2> [<dot-product>]\n"
"(if either key could not be found, the dot-product field in the output\n"
"will be absent, and this program will print a warning)\n"
"For training examples, the input is the iVectors averaged over speakers;\n"
"a separate archive containing the number of utterances per speaker may be\n"
"optionally supplied using the --num-utts option; this affects the PLDA\n"
"scoring (if not supplied, it defaults to 1 per speaker).\n"
"\n"
"Usage: ivector-plda-scoring <plda> <train-ivector-rspecifier> <test-ivector-rspecifier>\n"
" <trials-rxfilename> <scores-wxfilename>\n"
"\n"
"e.g.: ivector-plda-scoring --num-utts=ark:exp/train/num_utts.ark plda "
"ark:exp/train/spk_ivectors.ark ark:exp/test/ivectors.ark trials scores\n"
"See also: ivector-compute-dot-products, ivector-compute-plda\n";
EOF
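: <<'EOF'
The scoring command pipes the trial file through awk to swap its first two columns. The trial file is written as "uttid spkid label", while the scorer expects the enrollment key first and the test key second. The sketch below shows the swap on made-up data. The extra backslashes before $2 and $1 in run.sh are, as far as I can tell, there because that string passes through more than one layer of shell quoting before awk sees it.

```shell
#!/bin/bash
# Made-up trial file in the format "<utt-id> <spk-id> target|nontarget".
demo_trials=$(mktemp)
printf '%s\n' 'uttA1 spkA target' 'uttA1 spkB nontarget' > "$demo_trials"

# Same reordering as in run.sh, without the extra escaping:
# output lines are "<spk-id> <utt-id>", i.e. <key1> <key2> for the scorer.
swapped=$(cat "$demo_trials" | awk '{print $2, $1}')
echo "$swapped"
```

Here key1 indexes the enrollment speaker i-vectors (spk_ivector.ark) and key2 indexes the evaluation utterance i-vectors.
EOF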
#compute eer
awk '{print $3}' exp/trials_out | paste - $trials | awk '{print $1, $4}' | compute-eer -
# Result
# Scoring against data/test/aishell_speaker_ver.lst
# Equal error rate is 0.140528%, at threshold -12.018
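: <<'EOF'
The EER command above glues each score to the label of the matching trial. The sketch below replays that pipeline on made-up data, without the final compute-eer step; the scores are invented, and the alignment relies on both files listing the trials in the same order.

```shell
#!/bin/bash
# Stand-in for exp/trials_out: "<spk-id> <utt-id> <score>" (made-up scores).
demo_scores=$(mktemp)
printf '%s\n' 'spkA uttA1 12.5' 'spkB uttA1 -30.1' > "$demo_scores"

# Stand-in for the trial file: "<utt-id> <spk-id> target|nontarget".
demo_trials=$(mktemp)
printf '%s\n' 'uttA1 spkA target' 'uttA1 spkB nontarget' > "$demo_trials"

# Take the score column, paste it next to the trial line, keep score and label.
eer_input=$(awk '{print $3}' "$demo_scores" | paste - "$demo_trials" | awk '{print $1, $4}')
echo "$eer_input"
```

Each output line has the form "<score> target|nontarget", which is what compute-eer reads from standard input.
EOF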
exit 0