Learning Kaldi from the speaker recognition demo (1): run.sh

那正初
2023-12-01
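The egs/aishell/v1 directory in Kaldi is a small speaker recognition example that uses the i-vector + PLDA method on the AISHELL-1 dataset.
Before reading the code, make sure you understand the overall pipeline: feature extraction (MFCC and VAD), UBM training, i-vector extractor training, i-vector extraction, PLDA training, and scoring (EER).
Below we start from run.sh and read the code line by line. I am a beginner and am learning Linux shell as I go, so please bear with me.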
#!/bin/bash
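: <<'EOF'
Linux shell tutorial: https://www.runoob.com/linux/linux-shell.html
Tutorials often do not distinguish between the Bourne Shell and the Bourne Again Shell, so #!/bin/sh is commonly treated as interchangeable with #!/bin/bash. On some systems, however, /bin/sh is a different shell (dash on Debian and Ubuntu, for example), so a script that relies on bash features, as this one does, should keep #!/bin/bash.
#! is a conventional marker that tells the system which interpreter, that is, which shell, should run this script: the program at the path that follows it.
EOF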

# Copyright 2017 Beijing Shell Shell Tech. Co. Ltd. (Authors: Hui Bu)
#           2017 Jiayu Du
#           2017 Chao Li
#           2017 Xingyu Na
#           2017 Bengu Wu
#           2017 Hao Zheng
# Apache 2.0

# This is a shell script that we demonstrate speech recognition using AIShell-1 data.
# it's recommended that you run the commands one by one by copying and pasting into the shell.
# See README.txt for more info on data required.
# Results (EER) are inline in comments below

data=/export/a05/xna/data
data_url=www.openslr.org/resources/33
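: <<'EOF'
https://www.runoob.com/linux/linux-shell-variable.html
The two lines above are variable assignments. Quotes are optional here because the values contain no spaces or special characters.
Note that there must be no space between the variable name and the equals sign (or after it), which is probably different from every programming language you know.
To use a variable that has been defined, put a dollar sign in front of its name.
Curly braces around the variable name are optional; they help the interpreter see where the name ends.
Putting braces around every variable is a recommended habit.
The readonly command makes a variable read-only; its value can no longer be changed.
The unset command deletes a variable.
Strings are the most common and most useful data type in shell programming (apart from numbers and strings there is little else worth using). A string can be written in single quotes, in double quotes, or without quotes.
Everything inside single quotes is output literally; variables inside a single-quoted string are not expanded.
Double quotes allow variables and escape characters.
The tutorial also covers how to extract and search for substrings.
EOF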
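A few lines are enough to try these rules out. The sketch below is only an illustration: the variable names (corpus, part, single, double, fixed) are made up and do not appear in run.sh.

```shell
#!/bin/bash
# Hypothetical variables for illustration only; none of them appear in run.sh.
corpus=aishell                          # no spaces around '='
part="${corpus}_data"                   # braces mark where the variable name ends
single='$corpus stays literal'          # single quotes: no expansion
double="corpus is $corpus"              # double quotes: expansion happens

echo "$part"                # aishell_data
echo "$single"
echo "$double"

echo "${#corpus}"           # string length: 7
echo "${corpus:2:5}"        # substring from offset 2, length 5: shell

readonly fixed=1            # any later assignment to fixed is an error
unset corpus                # corpus is now undefined
echo "${corpus:-<unset>}"   # prints the fallback text
```

Without the braces, "$corpus_data" would be read as a variable named corpus_data, which is why the braces matter in the second assignment.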


. ./cmd.sh
. ./path.sh
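: <<'EOF'
This is similar to import in Python: the first dot is the source command, which runs the file inside the current shell,
and the second dot (in ./) means the current directory. There must be a space between the two dots.

Write ./cmd.sh rather than cmd.sh. When you run a program by a bare name such as cmd.sh, Linux looks for it only in the directories listed in PATH (/bin, /sbin, /usr/bin, /usr/sbin and so on), and the current directory is usually not among them, so the command is not found. The source command also consults PATH first for a name without a slash, so ./cmd.sh states explicitly that the file in the current directory is the one to load.
EOF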
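To see why run.sh sources cmd.sh and path.sh instead of executing them, here is a small experiment. cmd_demo.sh is a made-up stand-in for cmd.sh (in Kaldi recipes cmd.sh is where train_cmd gets defined), and the value run.pl is just an example.

```shell
#!/bin/bash
# Hypothetical demo file standing in for cmd.sh.
tmpdir=$(mktemp -d)
cd "$tmpdir"
echo 'train_cmd="run.pl"' > cmd_demo.sh

bash ./cmd_demo.sh                          # runs in a child process
echo "after executing: '${train_cmd:-}'"    # still empty in this shell

. ./cmd_demo.sh                             # runs inside the current shell
echo "after sourcing:  '$train_cmd'"        # now set to run.pl
```

Only the sourced version leaves train_cmd behind in the calling shell, which is what the later "$train_cmd" references in run.sh depend on.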

set -e # exit on error
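: <<'EOF'
https://blog.csdn.net/todd911/article/details/9954961
It is good practice to put set -e at the top of every script you write. It tells bash to exit as soon as a command returns a non-zero (failure) status.
This keeps an early error from snowballing into a fatal one later on, when it should have been dealt with where it first occurred.
For better readability you can write set -o errexit, which does the same as set -e.
EOF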
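The effect of set -e is easy to check on a toy command sequence. This is a minimal sketch that uses bash -c so the early exit does not kill your own shell; false simply stands for any failing command.

```shell
#!/bin/bash
# Run the same two commands in a child shell, without and with set -e.
without=$(bash -c 'false; echo reached')
with=$(bash -c 'set -e; false; echo reached' || true)

echo "without set -e: '$without'"   # reached
echo "with set -e:    '$with'"      # empty: the child shell stopped at false
```

A failing command that is the condition of an if, or the left-hand side of ||, does not trigger the exit, which is why the trailing `|| true` above is safe.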



local/download_and_untar.sh $data $data_url data_aishell
local/download_and_untar.sh $data $data_url resource_aishell
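: <<'EOF'
The v1 directory has a subdirectory named local. These two lines run local/download_and_untar.sh twice, once per archive. Whenever you see a dollar sign, think of a variable reference, and the variable must have been defined earlier. Why two calls? Opening download_and_untar.sh shows that each call downloads and unpacks one archive: the first handles data_aishell.tgz and the second handles the resource_aishell archive. A file with the .sh suffix on Linux is like a .py file in Python: plain text that you can open and edit in an editor such as Notepad++.
download_and_untar.sh accepts up to four arguments:
The first is remove_archive (false or true): whether to delete the archive after unpacking and keep only the extracted files.
The second is data-base, the data directory. run.sh sets data=/export/a05/xna/data, which is an absolute path (presumably from the original authors' machine), not a directory under v1. Create it yourself or point data at an existing directory, because download_and_untar.sh checks for it and exits with an error if it is missing.
The third is url-base, the URL to download from. The download is usually very slow and may need a proxy, so it is better to prepare the data in advance.
The fourth is corpus-part, which can be data_aishell or resource_aishell.
The first argument is optional and the other three are required; run.sh passes exactly those three: $data, $data_url and the corpus part.

EOF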
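The pattern of one optional flag followed by three required positional arguments can be sketched as below. This is a hypothetical re-implementation for illustration, not the actual contents of local/download_and_untar.sh; the function name parse_args and the /tmp/data path are made up.

```shell
#!/bin/bash
# Hypothetical sketch of "optional flag + three positional arguments".
parse_args() {
  remove_archive=false
  if [ "$1" = "--remove-archive" ]; then
    remove_archive=true
    shift                      # drop the flag so $1..$3 are the positionals
  fi
  if [ $# -ne 3 ]; then
    echo "Usage: parse_args [--remove-archive] <data-base> <url-base> <corpus-part>"
    return 1
  fi
  data=$1; url=$2; part=$3
  echo "remove_archive=$remove_archive data=$data url=$url part=$part"
}

parse_args /tmp/data www.openslr.org/resources/33 data_aishell
parse_args --remove-archive /tmp/data www.openslr.org/resources/33 resource_aishell
```

Both calls succeed; a call with fewer than three positional arguments prints the usage line and returns a non-zero status.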


# Data Preparation
local/aishell_data_prep.sh $data/data_aishell/wav $data/data_aishell/transcript
# Runs aishell_data_prep.sh from the local directory with two arguments.
# When you see "AISHELL data preparation succeeded", this step has worked.

# Now make MFCC  features.
# mfccdir should be some place with a largish disk where you
# want to store MFCC features.
# Each wav is first cut into many short frames, and each frame is represented
# by a vector of numbers; the MFCCs are those numbers.

mfccdir=mfcc
# initialize the variable
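To get a feel for how many frames an utterance turns into, here is a back-of-the-envelope calculation. The numbers are assumptions: 16 kHz audio (the AISHELL-1 sampling rate), a 25 ms window with a 10 ms shift (which I believe are Kaldi's MFCC defaults), incomplete frames at the end dropped, and a hypothetical 3-second utterance.

```shell
#!/bin/bash
# Assumed values: 16 kHz audio, 25 ms window, 10 ms shift, edge frames dropped.
sample_rate=16000
frame_length_ms=25
frame_shift_ms=10
duration_s=3                                            # hypothetical utterance length

samples=$((sample_rate * duration_s))                   # 48000
window=$((sample_rate * frame_length_ms / 1000))        # 400 samples per frame
shift_samples=$((sample_rate * frame_shift_ms / 1000))  # 160 samples between frames

num_frames=$((1 + (samples - window) / shift_samples))
echo "$num_frames frames"                               # 298 frames
```

So even a short utterance becomes a few hundred feature vectors, which is why the features go to a directory with plenty of disk space.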

for x in train test; do
  steps/make_mfcc.sh --cmd "$train_cmd" --nj 10 data/$x exp/make_mfcc/$x $mfccdir
  sid/compute_vad_decision.sh --nj 10 --cmd "$train_cmd" data/$x exp/make_mfcc/$x $mfccdir
  utils/fix_data_dir.sh data/$x
done
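: <<'EOF'
make_mfcc.sh mainly takes three arguments: the data directory, the log directory, and the directory where the MFCC features are stored. Everything introduced by -- is an option.
compute_vad_decision.sh also takes up to three arguments, "Usage: $0 [options] <data-dir> [<log-dir> [<vad-dir>]]", but only the first is required.

fix_data_dir.sh takes a single argument. Its description and usage message are: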
# This script makes sure that only the segments present in
# all of "feats.scp", "wav.scp" [if present], segments [if present]
# text, and utt2spk are present in any of them.
# It puts the original contents of data-dir into
# data-dir/.backup

  echo "Usage: utils/data/fix_data_dir.sh <data-dir>"
  echo "e.g.: utils/data/fix_data_dir.sh data/train"
  echo "This script helps ensure that the various files in a data directory"
  echo "are correctly sorted and filtered, for example removing utterances"
  echo "that have no features (if feats.scp is present)"

EOF
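The way options such as --nj 10 end up as shell variables can be sketched as follows. Kaldi scripts do this through utils/parse_options.sh; the function below is a simplified, hypothetical imitation (parse_options_demo is a made-up name), not that script's actual code.

```shell
#!/bin/bash
# Hypothetical, simplified option parser for illustration only.
nj=4                     # defaults, overridden by --nj / --cmd
cmd=run.pl

parse_options_demo() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --*)
        name=${1#--}                   # strip the leading --
        name=${name//-/_}              # num-threads -> num_threads
        printf -v "$name" '%s' "$2"    # assign the value to that variable
        shift 2 ;;
      *) break ;;
    esac
  done
  positional=("$@")                    # what is left: data dir, log dir, ...
}

parse_options_demo --cmd "run.pl --mem 4G" --nj 10 data/train exp/make_mfcc/train mfcc
echo "nj=$nj cmd=$cmd"
echo "positional: ${positional[*]}"
```

After the call, nj and cmd hold the values given on the command line and the three remaining words are the positional arguments, mirroring the "options first, then directories" layout of the calls in run.sh.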

# train diag ubm
sid/train_diag_ubm.sh --nj 10 --cmd "$train_cmd" --num-threads 16 \
  data/train 1024 exp/diag_ubm_1024
: <<'EOF'
From the header of train_diag_ubm.sh:
# This is a modified version of steps/train_diag_ubm.sh, specialized for
# speaker-id, that does not require to start with a trained model, that applies
# sliding-window CMVN, and that expects voice activity detection (vad.scp) in
# the data directory.  We initialize the GMM using gmm-global-init-from-feats,
# which sets the means to random data points and then does some iterations of
# E-M in memory.  After the in-memory initialization we train for a few
# iterations in parallel.

  echo "Usage: $0  <data> <num-gauss> <output-dir>"
  echo " e.g.: $0 data/train 1024 exp/diag_ubm"
  echo "Options: "
  echo "  --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
  echo "  --nj <num-jobs|4>                                # number of parallel jobs to run."
  echo "  --num-iters <niter|20>                           # number of iterations of parallel "
  echo "                                                   # training (default: $num_iters)"
  echo "  --stage <stage|-2>                               # stage to do partial re-run from."
  echo "  --num-gselect <n|30>                             # Number of Gaussians per frame to"
  echo "                                                   # limit computation to, for speed"
  echo " --subsample <n|5>                                 # In main E-M phase, use every n"
  echo "                                                   # frames (a speedup)"
  echo "  --num-frames <n|500000>                          # Maximum num-frames to keep in memory"
  echo "                                                   # for model initialization"
  echo "  --num-iters-init <n|20>                          # Number of E-M iterations for model"
  echo "                                                   # initialization"
  echo " --initial-gauss-proportion <proportion|0.5>       # Proportion of Gaussians to start with"
  echo "                                                   # in initialization phase (then split)"
  echo " --num-threads <n|32>                              # number of threads to use in initialization"
  echo "                                                   # phase (must match with parallel-opts option)"
  echo " --parallel-opts <string|'--num-threads 32'>       # Option should match number of threads in"
  echo "                                                   # --num-threads option above"
  echo " --min-gaussian-weight <weight|0.0001>             # min Gaussian weight allowed in GMM"
  echo "                                                   # initialization (this relatively high"
  echo "                                                   # value keeps counts fairly even)"
  echo " --delta-window <n|3>                              # number of frames of context used to"
  echo "                                                   # calculate delta"
  echo " --delta-order <n|2>                               # number of delta features"
  echo " --apply-cmn <true,false|true>                     # if true, apply sliding window cepstral mean"
  echo "                                                   # normalization to features"
  
EOF


#train full ubm
sid/train_full_ubm.sh --nj 10 --cmd "$train_cmd" data/train \
  exp/diag_ubm_1024 exp/full_ubm_1024
: <<'EOF'
# This trains a full-covariance UBM from an existing (diagonal or full) UBM,
# for a specified number of iterations.  This is for speaker-id systems
# (we use features specialized for that, and vad).

  echo "Usage: steps/train_full_ubm.sh <data> <old-ubm-dir> <new-ubm-dir>"
  echo "Trains a full-covariance UBM starting from an existing diagonal or"
  echo "full-covariance UBM system."
  echo " e.g.: steps/train_full_ubm.sh --num-iters 8 data/train exp/diag_ubm exp/full_ubm"
  echo "main options (for others, see top of script file)"
  echo "  --config <config-file>                           # config containing options"
  echo "  --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
  echo "  --nj <n|16>                                      # number of parallel training jobs"
  echo "  --num-gselect <n|20>                             # Number of Gaussians to select using"
  echo "                                                   # initial model (diagonalized if needed)"
  echo "  --subsample <n|5>                                # Take every n'th sample, for efficiency"
  echo "  --num-iters <n|4>                                # Number of iterations of E-M"
  echo "  --min-gaussian-weight <weight|1.0e-05>           # Minimum Gaussian weight (below this,"
  echo "                                                   # we won't update, and will remove Gaussians"
  echo "                                                   # if --remove-low-count-gaussians is true"
  echo "  --remove-low-count-gaussians <true,false|true>   # If true, remove Gaussians below min-weight"
  echo "                                                   # (will only happen on last iteration, in any case"
  echo "  --cleanup <true,false|true>                      # If true, clean up accumulators, intermediate"
  echo "                                                   # models and gselect info"
  exit 1;
  echo " --apply-cmn <true,false|true>                     # if true, apply sliding window cepstral mean"
  echo "                                                   # normalization to features"
  
EOF


# train the i-vector extractor
# the ivector extractor is the T matrix (total variability matrix)
sid/train_ivector_extractor.sh --cmd "$train_cmd --mem 10G" \
  --num-iters 5 exp/full_ubm_1024/final.ubm data/train \
  exp/extractor_1024
: <<'EOF'
  echo "Usage: $0 <fgmm-model> <data> <extractor-dir>"
  echo " e.g.: $0 exp/ubm_2048_male/final.ubm data/train_male exp/extractor_male"
  echo "main options (for others, see top of script file)"
  echo "  --config <config-file>                           # config containing options"
  echo "  --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
  echo "  --num-iters <#iters|10>                          # Number of iterations of E-M"
  echo "  --nj <n|10>                                      # Number of jobs (also see num-processes and num-threads)"
  echo "  --num-processes <n|4>                            # Number of processes for each queue job (relates"
  echo "                                                   # to summing accs in memory)"
  echo "  --num-threads <n|4>                              # Number of threads for each process (can't be usefully"
  echo "                                                   # increased much above 4)"
  echo "  --stage <stage|-4>                               # To control partial reruns"
  echo "  --num-gselect <n|20>                             # Number of Gaussians to select using"
  echo "                                                   # diagonal model."
  echo "  --sum-accs-opt <option|''>                       # Option e.g. '-l hostname=a15' to localize"
  echo "                                                   # sum-accs process to nfs server."
  echo " --apply-cmn <true,false|true>                     # if true, apply sliding window cepstral mean"
  echo "                                                   # normalization to features"
EOF



#extract ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
  exp/extractor_1024 data/train exp/ivector_train_1024
: <<'EOF'
# This script extracts iVectors for a set of utterances, given
# features and a trained iVector extractor.

  echo "Usage: $0 <extractor-dir> <data> <ivector-dir>"
  echo " e.g.: $0 exp/extractor_2048_male data/train_male exp/ivectors_male"
  echo "main options (for others, see top of script file)"
  echo "  --config <config-file>                           # config containing options"
  echo "  --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
  echo "  --nj <n|10>                                      # Number of jobs (also see num-threads)"
  echo "  --num-threads <n|1>                              # Number of threads for each job"
  echo "  --stage <stage|0>                                # To control partial reruns"
  echo "  --num-gselect <n|20>                             # Number of Gaussians to select using"
  echo "                                                   # diagonal model."
  echo "  --min-post <min-post|0.025>                      # Pruning threshold for posteriors"
  echo " --apply-cmn <true,false|true>                     # if true, apply sliding window cepstral mean"
  echo "                                                   # normalization to features"

EOF


#train plda
$train_cmd exp/ivector_train_1024/log/plda.log \
  ivector-compute-plda ark:data/train/spk2utt \
  'ark:ivector-normalize-length scp:exp/ivector_train_1024/ivector.scp  ark:- |' \
  exp/ivector_train_1024/plda
: <<'EOF'
~kaldi/src/ivectorbin/ivector-compute-plda.cc

        "Computes a Plda object (for Probabilistic Linear Discriminant Analysis)\n"
        "from a set of iVectors.  Uses speaker information from a spk2utt file\n"
        "to compute within and between class variances.\n"
        "\n"
        "Usage:  ivector-compute-plda [options] <spk2utt-rspecifier> <ivector-rspecifier> "
        "<plda-out>\n"
        "e.g.: \n"
        " ivector-compute-plda ark:spk2utt ark,s,cs:ivectors.ark plda\n";
EOF
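The spk2utt file passed to ivector-compute-plda is simply the inverse of utt2spk. Here is a minimal sketch of that relationship with awk, using made-up IDs (Kaldi ships utils/utt2spk_to_spk2utt.pl for the real conversion).

```shell
#!/bin/bash
# Toy utt2spk with made-up IDs: "<utt-id> <spk-id>" per line.
utt2spk=$(mktemp)
printf '%s\n' 'S0001-utt1 S0001' 'S0001-utt2 S0001' 'S0002-utt1 S0002' > "$utt2spk"

# spk2utt is the inverse map: "<spk-id> <utt-id> <utt-id> ..." per line.
spk2utt=$(awk '{ utts[$2] = utts[$2] " " $1 }
               END { for (s in utts) print s utts[s] }' "$utt2spk" | sort)
echo "$spk2utt"
# S0001 S0001-utt1 S0001-utt2
# S0002 S0002-utt1
```

This grouping by speaker is what lets PLDA training separate within-speaker variation from between-speaker variation.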


#split the test to enroll and eval
mkdir -p data/test/enroll data/test/eval
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/enroll
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/eval
local/split_data_enroll_eval.py data/test/utt2spk  data/test/enroll/utt2spk  data/test/eval/utt2spk
trials=data/test/aishell_speaker_ver.lst
local/produce_trials.py data/test/eval/utt2spk $trials
utils/fix_data_dir.sh data/test/enroll
utils/fix_data_dir.sh data/test/eval

: <<'EOF'
local/split_data_enroll_eval.py
# This script splits the test set utt2spk into enroll set and eval set
# For each speaker, 3 utterances are randomly selected as enroll samples,
# and the others are used as eval samples for evaluation
# input: test utt2spk
# output: enroll utt2spk, eval utt2spk

local/produce_trials.py 
# This script generate trials file.
# Trial file is formatted as:
# uttid spkid target|nontarget

# If uttid belong to spkid, it is marked 'target',
# otherwise is 'nontarget'.
# input: eval set uttspk file
# output: trial file

utils/fix_data_dir.sh
# This script makes sure that only the segments present in
# all of "feats.scp", "wav.scp" [if present], segments [if present]
# text, and utt2spk are present in any of them.
# It puts the original contents of data-dir into
# data-dir/.backup

EOF
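The trial list described above can be imitated in a few lines of awk. This is a sketch under the assumption that every eval utterance is paired with every speaker; the IDs are made up and the real logic lives in local/produce_trials.py.

```shell
#!/bin/bash
# Toy eval utt2spk with made-up IDs.
utt2spk=$(mktemp)
printf '%s\n' 'A-utt1 A' 'B-utt1 B' > "$utt2spk"

# Pair every utterance with every speaker and label the pair.
trials=$(awk '{ utt[NR] = $1; spk_of[$1] = $2; spks[$2] = 1 }
  END {
    for (i = 1; i <= NR; i++)
      for (s in spks)
        print utt[i], s, (spk_of[utt[i]] == s ? "target" : "nontarget")
  }' "$utt2spk" | sort)
echo "$trials"
```

With two utterances and two speakers this yields four trials, two of them marked target.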




#extract enroll ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
  exp/extractor_1024 data/test/enroll  exp/ivector_enroll_1024
#extract eval ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
  exp/extractor_1024 data/test/eval  exp/ivector_eval_1024


#compute plda score
$train_cmd exp/ivector_eval_1024/log/plda_score.log \
  ivector-plda-scoring --num-utts=ark:exp/ivector_enroll_1024/num_utts.ark \
  exp/ivector_train_1024/plda \
  ark:exp/ivector_enroll_1024/spk_ivector.ark \
  "ark:ivector-normalize-length scp:exp/ivector_eval_1024/ivector.scp ark:- |" \
  "cat '$trials' | awk '{print \\\$2, \\\$1}' |" exp/trials_out
: <<'EOF'
~kaldi/src/ivectorbin/ivector-plda-scoring.cc

        "Computes log-likelihood ratios for trials using PLDA model\n"
        "Note: the 'trials-file' has lines of the form\n"
        "<key1> <key2>\n"
        "and the output will have the form\n"
        "<key1> <key2> [<dot-product>]\n"
        "(if either key could not be found, the dot-product field in the output\n"
        "will be absent, and this program will print a warning)\n"
        "For training examples, the input is the iVectors averaged over speakers;\n"
        "a separate archive containing the number of utterances per speaker may be\n"
        "optionally supplied using the --num-utts option; this affects the PLDA\n"
        "scoring (if not supplied, it defaults to 1 per speaker).\n"
        "\n"
        "Usage: ivector-plda-scoring <plda> <train-ivector-rspecifier> <test-ivector-rspecifier>\n"
        " <trials-rxfilename> <scores-wxfilename>\n"
        "\n"
        "e.g.: ivector-plda-scoring --num-utts=ark:exp/train/num_utts.ark plda "
        "ark:exp/train/spk_ivectors.ark ark:exp/test/ivectors.ark trials scores\n"
        "See also: ivector-compute-dot-products, ivector-compute-plda\n";
		
EOF
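The last input of the scoring command is a pipe that rewrites the trials file into the <key1> <key2> form, with the speaker first and the utterance second. A toy illustration with made-up IDs follows. The second half only shows what the current shell leaves of the escaped awk program; my understanding is that $train_cmd then passes the string through one more shell, which removes the remaining backslashes, but that part is an assumption.

```shell
#!/bin/bash
# Toy trials lines in the "<utt-id> <spk-id> <label>" layout, made-up IDs.
trials=$(mktemp)
printf '%s\n' 'A-utt1 A target' 'A-utt1 B nontarget' > "$trials"

# Swap the first two columns so each line reads "<spk-id> <utt-id>".
swapped=$(awk '{print $2, $1}' "$trials")
echo "$swapped"
# A A-utt1
# B A-utt1

# What the current shell leaves of the escaped form used in run.sh:
escaped="awk '{print \\\$2, \\\$1}'"
echo "$escaped"        # awk '{print \$2, \$1}'
```

Inside double quotes, `\\` becomes a single backslash and `\$` becomes a literal dollar sign, so each `\\\$2` reaches the next stage as `\$2` instead of being expanded as a positional parameter here.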



#compute eer
awk '{print $3}' exp/trials_out | paste - $trials | awk '{print $1, $4}' | compute-eer -
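To see what compute-eer actually receives, the pipeline can be run on toy data. The IDs and scores below are made up; the file layouts follow the ones used in run.sh.

```shell
#!/bin/bash
# Toy files with made-up IDs and scores.
trials=$(mktemp); trials_out=$(mktemp)
printf '%s\n' 'A-utt1 A target' 'A-utt1 B nontarget' > "$trials"
printf '%s\n' 'A A-utt1 12.5' 'B A-utt1 -30.1' > "$trials_out"

# Same pipeline as in run.sh, minus the final compute-eer.
pairs=$(awk '{print $3}' "$trials_out" | paste - "$trials" | awk '{print $1, $4}')
echo "$pairs"
# 12.5 target
# -30.1 nontarget
```

Each line is a score followed by its label. The pipeline relies on the scores file and the trials file listing the trials in the same order, since paste joins them line by line.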



# Result
# Scoring against data/test/aishell_speaker_ver.lst
# Equal error rate is 0.140528%, at threshold -12.018

exit 0