Contents
Part 1: Silero speech-to-text models
1. Environment setup
Python 3.7 and the following packages:
pip install torch torchaudio omegaconf
2. Download and load a pretrained speech-to-text model
device = torch.device('cpu')
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',  # also available: 'de', 'es'
                                       device=device)
3. Download a sample audio file to test with.
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
Full code
import torch
from glob import glob

device = torch.device('cpu')  # GPU also works, but these models are fast enough on CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',  # also available: 'de', 'es'
                                       device=device)
(read_batch, split_into_batches, read_audio, prepare_model_input) = utils  # see function signatures for details

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]), device=device)

output = model(model_input)
for example in output:
    print(decoder(example.cpu()))
Output
the boch canoe slit on the smooth planks blew the sheet to the dark blue background it's easy to tell a depth of a well four hours of steady work faced us
Initial test summary
Speaking speed noticeably affects the quality of the output. You can also record a clip of your own English and try it.
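If you record your own clip, note that speech-to-text models commonly expect 16 kHz, mono, 16-bit PCM WAV input. A quick standard-library check before feeding a file to the model (the helper name `wav_info` is my own, not part of any library):

```python
import wave

def wav_info(path):
    """Return (sample_rate, channels, sample_width_bytes, duration_seconds)."""
    with wave.open(path, 'rb') as w:
        rate = w.getframerate()
        return (rate, w.getnchannels(), w.getsampwidth(),
                w.getnframes() / rate)

# Demo: write one second of 16 kHz mono silence and inspect it.
with wave.open('check_me.wav', 'wb') as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(16000)
    w.writeframes(b'\x00\x00' * 16000)

print(wav_info('check_me.wav'))  # (16000, 1, 2, 1.0)
```

If the sample rate or channel count is off, resample the clip first (e.g. with torchaudio or ffmpeg) before running recognition.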
Part 2: Mozilla DeepSpeech
1. Requirements
Windows, Python 3.7
2. Download the model files and test audio below, placing them in the model and audio folders respectively.
https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/audio-0.9.3.tar.gz
3. Create a new virtual environment (either Conda or virtualenv works).
4. Install DeepSpeech
pip install deepspeech
5. Run the following command
deepspeech --model model/deepspeech-0.9.3-models.pbmm --scorer model/deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
Output
(speed2text_tf) PS ...speech2text_tf> deepspeech --model model/deepspeech-0.9.3-models.pbmm --scorer model/deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
Loading model from file model/deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
2022-12-24 13:24:09.108387: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.0119s.
Loading scorer from files model/deepspeech-0.9.3-models.scorer
Loaded scorer in 0.0107s.
Running inference.
experience proves this
Inference took 0.670s for 1.975s audio file.
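To transcribe more than one file, the same CLI call can be wrapped in a short script. This is my own convenience sketch (the helper names are not part of DeepSpeech); it assumes the model/ and audio/ layout from step 2 and that the deepspeech command is on your PATH:

```python
import glob
import subprocess

def build_cmd(model, scorer, wav):
    """Assemble the same deepspeech CLI invocation as in step 5."""
    return ["deepspeech", "--model", model, "--scorer", scorer, "--audio", wav]

def transcribe_folder(model, scorer, folder):
    """Run deepspeech over every WAV in a folder; returns {path: transcript}."""
    results = {}
    for wav in sorted(glob.glob(folder + "/*.wav")):
        proc = subprocess.run(build_cmd(model, scorer, wav),
                              capture_output=True, text=True)
        results[wav] = proc.stdout.strip()
    return results

print(build_cmd("model/deepspeech-0.9.3-models.pbmm",
                "model/deepspeech-0.9.3-models.scorer",
                "audio/2830-3980-0043.wav"))
```

Shelling out per file is slow (the model reloads each time); for batch work the deepspeech Python API is the better route, but the wrapper above needs no extra code.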
6. Real-time speech-to-text (microphone to text)
DeepSpeech-examples/mic_vad_streaming at r0.9 · mozilla/DeepSpeech-examples · GitHub
Download mic_vad_streaming.py and requirements.txt from the GitHub repository above.
Install the required packages with:
pip install -r requirements.txt
Then run:
python mic_vad_streaming/mic_vad_streaming.py -m model/deepspeech-0.9.3-models.pbmm -s model/deepspeech-0.9.3-models.scorer
The output looks like this:
(speed2text_tf) PS ...speech2text_tf> python mic_vad_streaming/mic_vad_streaming.py -m model/deepspeech-0.9.3-models.pbmm -s model/deepspeech-0.9.3-models.scorer
Initializing model...
INFO:root:ARGS.model: model/deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
2022-12-24 13:51:58.003296: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:root:ARGS.scorer: model/deepspeech-0.9.3-models.scorer
Listening (ctrl-C to exit)...
Recognized: no
Recognized: he
Recognized: hear me
Recognized: hear me
Recognized: to
Recognized: for i think seven
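The script uses webrtcvad to decide which audio frames contain speech before handing them to the model. As a rough illustration of the idea only (not the actual webrtcvad algorithm, which is far more sophisticated), here is a minimal energy-based voice activity detector over 16-bit PCM frames:

```python
import math
import struct

def frame_rms(frame_bytes):
    """Root-mean-square amplitude of a frame of 16-bit little-endian PCM."""
    samples = struct.unpack('<%dh' % (len(frame_bytes) // 2), frame_bytes)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(frame_bytes, threshold=500):
    """Crude stand-in for webrtcvad.Vad.is_speech(): simple energy thresholding."""
    return frame_rms(frame_bytes) > threshold

# 30 ms frames at 16 kHz = 480 samples each
silence = struct.pack('<480h', *([0] * 480))
tone = struct.pack('<480h', *(int(3000 * math.sin(i / 5)) for i in range(480)))
print(is_speech(silence), is_speech(tone))  # False True
```

In the real script, consecutive speech frames are buffered into an utterance and only that segment is sent to DeepSpeech, which is what makes streaming recognition responsive.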
To be honest, though, the results were mediocre. Perhaps it is down to my accent; that is the only explanation I can offer.
References
Welcome to DeepSpeech’s documentation! — Mozilla DeepSpeech 0.9.3 documentation
Part 3: The SpeechRecognition library
1. Install the package
pip install SpeechRecognition
2. Example code
Import the speech_recognition package and create a Recognizer instance:
import speech_recognition as sr
r = sr.Recognizer()
Every Recognizer instance has seven methods for recognizing speech from an audio source, each backed by a different API:

recognize_bing(): Microsoft Bing Speech
recognize_google(): Google Web Speech API
recognize_google_cloud(): Google Cloud Speech (requires the google-cloud-speech package)
recognize_houndify(): Houndify by SoundHound
recognize_ibm(): IBM Speech to Text
recognize_sphinx(): CMU Sphinx (requires installing PocketSphinx)
recognize_wit(): Wit.ai

Of these seven, only recognize_sphinx() works offline, via the CMU Sphinx engine; the other six require an internet connection.
Listening to your microphone
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
Other APIs used in the code are covered in the link below:
The Ultimate Guide To Speech Recognition With Python – Real Python
The full code example:
#!/usr/bin/env python3
# NOTE: this example requires PyAudio because it uses the Microphone class
import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

# recognize speech using Google Speech Recognition
try:
    # for testing purposes, we're just using the default API key
    # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
    # instead of `r.recognize_google(audio)`
    print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

# recognize speech using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
try:
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
except sr.UnknownValueError:
    print("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Cloud Speech service; {0}".format(e))

# recognize speech using Wit.ai
WIT_AI_KEY = "INSERT WIT.AI API KEY HERE"  # Wit.ai keys are 32-character uppercase alphanumeric strings
try:
    print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY))
except sr.UnknownValueError:
    print("Wit.ai could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Wit.ai service; {0}".format(e))

# recognize speech using Microsoft Bing Voice Recognition
BING_KEY = "INSERT BING API KEY HERE"  # Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings
try:
    print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY))
except sr.UnknownValueError:
    print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))

# recognize speech using Microsoft Azure Speech
AZURE_SPEECH_KEY = "INSERT AZURE SPEECH API KEY HERE"  # Microsoft Speech API keys are 32-character lowercase hexadecimal strings
try:
    print("Microsoft Azure Speech thinks you said " + r.recognize_azure(audio, key=AZURE_SPEECH_KEY))
except sr.UnknownValueError:
    print("Microsoft Azure Speech could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Azure Speech service; {0}".format(e))

# recognize speech using Houndify
HOUNDIFY_CLIENT_ID = "INSERT HOUNDIFY CLIENT ID HERE"  # Houndify client IDs are Base64-encoded strings
HOUNDIFY_CLIENT_KEY = "INSERT HOUNDIFY CLIENT KEY HERE"  # Houndify client keys are Base64-encoded strings
try:
    print("Houndify thinks you said " + r.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY))
except sr.UnknownValueError:
    print("Houndify could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Houndify service; {0}".format(e))

# recognize speech using IBM Speech to Text
IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
try:
    print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
except sr.UnknownValueError:
    print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
    print("Could not request results from IBM Speech to Text service; {0}".format(e))

# recognize speech using Whisper
try:
    print("Whisper thinks you said " + r.recognize_whisper(audio, language="english"))
except sr.UnknownValueError:
    print("Whisper could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Whisper")
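The long example repeats the same try/except pattern for every backend. Here is my own small helper (the name first_successful is not part of SpeechRecognition) that walks a list of recognizer callables and returns the first transcript:

```python
def first_successful(recognizers, audio):
    """Try each (name, recognize_fn) pair in order.

    Returns (name, transcript) for the first backend that succeeds, or
    (None, None) if all of them fail. In real use the callables would wrap
    a Recognizer, e.g. lambda a: r.recognize_sphinx(a), and the exceptions
    caught would be sr.UnknownValueError / sr.RequestError.
    """
    for name, recognize in recognizers:
        try:
            return name, recognize(audio)
        except Exception:
            continue
    return None, None

# Usage with stand-in callables so the helper can be demonstrated offline:
def fail(_audio):
    raise ValueError("simulated UnknownValueError")

backends = [
    ("sphinx", fail),
    ("google", lambda a: "hello world"),
]
print(first_successful(backends, audio=None))  # ('google', 'hello world')
```

This keeps the fallback order in one place, which is handy when you want the offline Sphinx engine as a last resort after the online services.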
3. Fixing PyAudio installation problems
If you are on an Apple M1 Mac and run into problems installing PyAudio, the blog post below may help.
References