[Speech Recognition] DNN-Based Recognition with Julius (v4.4), a Japanese Speech Recognition System (May 8: recognition results updated)

阳英朗

2023-12-01

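I found that very little material on Julius is available in China, so here is a supplement. Julius was last updated in September 2016, when DNN-based recognition was added, but in practice many prerequisites turn out not to be stated on the homepage. Below is a translation of 00readme-DNN. The English written by the Japanese authors has many grammatical problems, so the original text is included, with my rendering of each passage after `//`.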

A. Julius and DNN-HMM
======================

From 4.4, Julius can perform DNN-HMM based recognition in two ways:

1. standalone: directly compute DNN for HMM inside Julius (>= 4.4) // 1. Standalone: compute the DNN for the HMM directly inside Julius (version >= 4.4) [this post translates only this part]

2. network: receive state probabilities calculated by other process
via socket (<= 4.3.1)

Both are described below.

A.1. Standalone mode
=====================

From version 4.4, Julius is capable of performing DNN-HMM based recognition by itself. It can read a DNN definition along with a HMM, and can compute the network against input (spliced) feature vectors and output the node scores of output layer for each frame, which will be used as output probabilities of corresponding HMM states in the
HMM. All computation will be done in a single process.

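// From version 4.4, Julius can perform DNN-HMM recognition on its own. It reads a DNN definition together with an HMM, evaluates the network on the input (spliced) feature vectors, and outputs the output-layer node scores for each frame. These scores are used as the output probabilities of the corresponding HMM states. All computation is done in a single process.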
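To make the data flow concrete, here is a minimal NumPy sketch of frame splicing followed by a frame-wise forward pass. It is not taken from Julius: the sigmoid hidden activation, the softmax output, the edge-repeating padding, the weight layout (output nodes by input nodes) and all function names are assumptions made for illustration. See "libsent/src/phmm/calc_dnn.c" for what Julius actually does.

```python
import numpy as np

def splice(frames, context):
    """Stack each frame with `context` neighbours on both sides.

    Edge frames are repeated to fill the window (an assumption).
    """
    n = frames.shape[0]
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    # Row t of the result is frames t-context .. t+context, concatenated.
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

def forward(x, hidden, output):
    """Run one spliced frame through the network; return log posteriors.

    hidden: list of (W, b) pairs; output: a single (W, b) pair.
    """
    h = x
    for w, b in hidden:
        h = 1.0 / (1.0 + np.exp(-(w @ h + b)))  # sigmoid (assumed)
    z = output[0] @ h + output[1]
    z = z - z.max()                              # stable log-softmax
    return z - np.log(np.exp(z).sum())

rng = np.random.default_rng(0)
frames = rng.standard_normal((5, 3)).astype(np.float32)  # 5 frames, 3 dims
spliced = splice(frames, context=1)                      # shape (5, 9)

# Two hidden layers with the same number of nodes (4), then 6 output nodes.
hidden = [(rng.standard_normal((4, 9)), np.zeros(4)),
          (rng.standard_normal((4, 4)), np.zeros(4))]
output = (rng.standard_normal((6, 4)), np.zeros(6))

# No batching: one frame at a time, as the README describes.
scores = np.array([forward(x, hidden, output) for x in spliced])
print(scores.shape)
```

Each row of `scores` holds one value per output node, which is the per-frame quantity the README says is used as the HMM state output probability.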

Note that the current implementation is very simple and limited. Only basic functions are implemented for NN. Any number of hidden layers can be defined, but the number of the nodes in the hidden layers should be the same. No batch computation is performed: all frame-wise. SIMD instruction (Intel AVX) is used to speed up the computation. Only tested on Windows and Ubuntu on Intel PC. See "libsent/src/phmm/calc_dnn.c" for the actual implementation.

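// Note that the current implementation is simple and limited: only the basic NN functions are implemented. Any number of hidden layers may be defined, but every hidden layer must have the same number of nodes. There is no batch computation; everything is computed frame by frame. SIMD instructions (Intel AVX) are used to speed up the computation. It has only been tested on Windows and Ubuntu on Intel PCs. See "libsent/src/phmm/calc_dnn.c" for the actual implementation.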

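To run, you need // What you need to run it:

1) an HMM AM (GMM defs are ignored, only its structure is used) // an HMM acoustic model (its GMM definitions are ignored; only the structure is used)
2) a DNN definition that corresponds to 1) // a DNN definition that matches the model in 1)
3) ".dnnconf" configuration file (text) // a ".dnnconf" configuration file (plain text)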

The .dnnconf file specifies the parameters, options, DNN definition files, and other parameters all relating to DNN computation. A sample file is located in the top directory of Julius archive as "Sample.dnnconf".

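// The ".dnnconf" file specifies the parameters, options, DNN definition files, and other settings related to DNN computation. A sample is provided as "Sample.dnnconf" in the top directory of the Julius archive.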

The matrix/vector definitions should be given in ".npy" format (i.e. python's "NumPy.save" format). Only 32bit-float little endian datatype is acceptable.

// Matrices and vectors must be supplied in ".npy" format (that is, the format written by Python's numpy.save). Only 32-bit float, little-endian data is accepted.
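A mismatched dtype is easy to produce, because NumPy arrays default to 64-bit floats. The sketch below shows one way to write a matrix as little-endian float32 and to check a file before handing it to Julius. The helper names and the file name "W1.npy" are made up for this example, and the source does not say which row/column orientation Julius expects, so that is left untouched here.

```python
import os
import tempfile
import numpy as np

def save_le_float32(path, array):
    """Write `array` to `path` as a little-endian 32-bit float .npy file."""
    np.save(path, np.asarray(array).astype("<f4"))

def is_le_float32(path):
    """Return True if the .npy file holds little-endian float32 data."""
    return np.load(path).dtype == np.dtype("<f4")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "W1.npy")           # hypothetical file name
    weights = np.random.rand(4, 9)               # float64 by default
    save_le_float32(path, weights)               # converted on save
    print(is_le_float32(path))
```

Running `is_le_float32` over every file referenced by the .dnnconf is a cheap sanity check before starting a long recognition run.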

To prepare a model for DNN-HMM, note that the orders are important. The order of the output nodes in the DNN should be the order of HMM state definition id. If not, Julius won't work properly.

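// When preparing a DNN-HMM model, ordering matters. The order of the DNN output nodes must match the order of the HMM state definition ids; otherwise Julius will not work properly.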
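If the training toolkit numbered the output nodes differently from the HMM state ids, the output layer can be permuted before saving. The following is a sketch under assumptions: the output weights are laid out as (output nodes, hidden nodes), state ids are zero-based and contiguous, and the mapping `state_id_of_node` is something you obtain yourself from your training setup. None of these names come from Julius.

```python
import numpy as np

def reorder_output_layer(weights, bias, state_id_of_node):
    """Permute output-layer rows so that row i belongs to HMM state id i.

    weights: (n_out, n_hidden) array, bias: (n_out,) array (assumed layout).
    state_id_of_node[k] is the HMM state id that output node k models.
    """
    ids = np.asarray(state_id_of_node)
    if sorted(ids.tolist()) != list(range(len(ids))):
        raise ValueError("state ids must be a permutation of 0..n_out-1")
    perm = np.argsort(ids)  # perm[i] = index of the node that models state i
    return weights[perm], bias[perm]

# Node 0 models state 2, node 1 models state 0, node 2 models state 1.
w = np.array([[20.0, 21.0], [0.0, 1.0], [10.0, 11.0]])
b = np.array([2.0, 0.0, 1.0])
w_sorted, b_sorted = reorder_output_layer(w, b, [2, 0, 1])
print(b_sorted)
```

After the call, row i of `w_sorted` and element i of `b_sorted` correspond to state id i.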

Julius uses SIMD instruction for internal DNN computation. For Intel CPU, dispatch function for several Intel SIMD instruction sets (SSE, AVX and FMA) are implemented. You need gcc-4.7 or later to compile all the codes. They are all compiled and built-in into Julius, and will be determined which one to use at run time. Run "julius -setting" and see which code will be used on your cpu. AVX can be run on Sandy Bridge, and FMA on Haswell, later one will run faster. And for ARM architecture, you can enable NEON SIMD codes by adding "--enable-neon" to configure.

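// Julius uses SIMD instructions for its internal DNN computation. For Intel CPUs, dispatch code for several SIMD instruction sets (SSE, AVX, and FMA) is implemented. You need gcc 4.7 or later to compile all of it. All variants are compiled into Julius, and the one to use is chosen at run time. Run "julius -setting" to see which code will be used on your CPU. AVX runs on Sandy Bridge and FMA on Haswell; the newer instruction set runs faster. On ARM, you can enable the NEON SIMD code by adding "--enable-neon" to configure.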
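"julius -setting" is the authoritative way to see which code path is selected. As a quick independent check on a Linux x86 machine, the CPU flags can also be read from /proc/cpuinfo. The helper below is only an illustration with a made-up name; it says what the CPU advertises, not what Julius will choose.

```python
import os

def simd_flags(cpuinfo_text, wanted=("sse", "avx", "fma")):
    """Return the names in `wanted` listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [name for name in wanted if name in present]
    return []  # no 'flags' line found (for example, on a non-x86 system)

# Only attempt the real file where it exists (Linux).
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(simd_flags(f.read()))
```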

--------------------------------

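My own impression is that this update added fairly limited functionality. I hit errors when I tried it and could not find the cause; only after reading these documentation files closely did I discover how many constraints there are. Please watch out for them.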

--------------------------------

Update, May 8:

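[Important] When using the DNN-HMM acoustic model (.SID) with version 4.4, note that the old version (4.3) did not have the following file:

julius.dnnconf DNN(Julius単体)版の特徴量変換設定ファイル (feature conversion configuration file for the DNN (Julius standalone) version)

With 4.4 you must make sure to use it; otherwise Julius will keep reporting that your input features are wrong.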

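The run on a 32-bit server has finished: about 20,000 utterances took roughly 35 hours. Comparing against the 4.3 results shows that they differ, and in the few utterances I picked out by hand the recognition results look better. I will post the recognition rate once it has been calculated.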
