Environment: Ubuntu 18.04, Python 3.6
Install DeepSpeech (by default the latest version is installed automatically):
pip install deepspeech
Alternatively, you can pin a specific version:
pip install deepspeech~=0.9.3
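Either way, you can check that the install worked by printing the client version (the --version flag appears in the --help output at the end of this post):
deepspeech --version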
● First, use wget to fetch the DeepSpeech model; here we take the latest release, 0.9.3:
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
● Next, use wget to fetch the sample audio data; here we download the 0.4.1 release.
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/audio-0.4.1.tar.gz
● Then extract the audio archive:
tar -xvf audio-0.4.1.tar.gz
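The archive unpacks into an audio/ directory containing a few sample WAV clips; the 4507-16021-0012.wav file used in the next step should be among them:
ls audio/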
● Then run the following command:
deepspeech --model deepspeech-0.9.3-models.pbmm --audio audio/4507-16021-0012.wav
The result is as follows:
Loading model from file deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2022-10-05 23:10:21.707689: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.0969s.
Running inference.
why should one hald on the way
Inference took 4.557s for 2.735s audio file.
The transcribed English is: why should one hald on the way
The same steps can be used to transcribe the other audio files.
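If you prefer to call the model from Python rather than the command line, the deepspeech pip package exposes a Model class. Below is a minimal sketch using the files downloaded above; reading the WAV with the standard-library wave module and numpy is my own choice here, not something the release prescribes.

import wave
import numpy as np
from deepspeech import Model

MODEL_PATH = "deepspeech-0.9.3-models.pbmm"
AUDIO_PATH = "audio/4507-16021-0012.wav"

model = Model(MODEL_PATH)

# The bundled sample clips are already 16 kHz mono 16-bit PCM, which matches
# model.sampleRate(); other recordings would need to be resampled first.
with wave.open(AUDIO_PATH, "rb") as wav:
    frames = wav.readframes(wav.getnframes())
audio = np.frombuffer(frames, dtype=np.int16)

print(model.stt(audio))  # should print the same transcript as the CLI run above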
Side note:
You can list the commands and options the deepspeech executable accepts:
deepspeech --help
usage: deepspeech [-h] --model MODEL [--scorer SCORER] --audio AUDIO [--beam_width BEAM_WIDTH] [--lm_alpha LM_ALPHA]
[--lm_beta LM_BETA] [--version] [--extended] [--json]
[--candidate_transcripts CANDIDATE_TRANSCRIPTS] [--hot_words HOT_WORDS]
Running DeepSpeech inference.
optional arguments:
-h, --help show this help message and exit
--model MODEL Path to the model (protocol buffer binary file)
--scorer SCORER Path to the external scorer file
--audio AUDIO Path to the audio file to run (WAV format)
--beam_width BEAM_WIDTH
Beam width for the CTC decoder
--lm_alpha LM_ALPHA Language model weight (lm_alpha). If not specified, use default from the scorer package.
--lm_beta LM_BETA Word insertion bonus (lm_beta). If not specified, use default from the scorer package.
--version Print version and exits
--extended Output string from extended metadata
--json Output json from metadata with timestamp of each word
--candidate_transcripts CANDIDATE_TRANSCRIPTS
Number of candidate transcripts to include in JSON output
--hot_words HOT_WORDS
Hot-words and their boosts.
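As a hedged example of the --scorer and --json options listed above: the 0.9.3 release page also ships an external scorer (I am assuming the asset is named deepspeech-0.9.3-models.scorer, following the same naming pattern as the model file). Using it generally improves accuracy, and --json adds per-word timestamps to the output:
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio audio/4507-16021-0012.wav --json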