
Speech recognition with FreeSWITCH + UniMRCP + Google ASR

全鸿晖
2023-12-01

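Prerequisites:

  1. A Google Cloud account (opening one requires an international credit card), with a project created and the ASR (Speech-to-Text) service enabled
  2. A FreeSWITCH server with the unimrcp.so module compiled
  3. A CentOS 7 server to run UniMRCP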

 

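Getting started:

I. Installing and configuring the UniMRCP server

Official documentation: http://www.unimrcp.org/manuals/html/GoogleSRRPMInstallationManual.html

Reading the official documentation is strongly recommended; it is thorough and well organized.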

 

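Downloading and installing UniMRCP requires an account on the UniMRCP website. Without one you cannot download the packages, and the free trial license requested later also needs it. Registration is free, so sign up first.

Registration page: https://www.unimrcp.org/profile-registration

After registering I had my account: michael_cctv, password: michael_password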

 

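  1. Install with YUM, but first add the UniMRCP yum repository

Create the file /etc/yum.repos.d/unimrcp.repo with the contents below, replacing username and password with the account registered on the official site.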

 

[unimrcp]

name=UniMRCP Packages for Red Hat / Cent OS-$releasever $basearch

baseurl=https://username:password@unimrcp.org/repo/yum/main/rhel$releasever/$basearch/

enabled=1

sslverify=1

gpgcheck=1

gpgkey=https://unimrcp.org/keys/unimrcp-gpg-key.public

 

[unimrcp-noarch]

name=UniMRCP Packages for Red Hat / Cent OS-$releasever noarch

baseurl=https://username:password@unimrcp.org/repo/yum/main/rhel$releasever/noarch/

enabled=1

sslverify=1

gpgcheck=1

gpgkey=https://unimrcp.org/keys/unimrcp-gpg-key.public
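A password containing characters such as @ or / would make the baseurl ambiguous if pasted in literally. The following is a minimal Python sketch that renders the two repository sections with percent-encoded credentials. It assumes yum accepts percent-encoded credentials in baseurl (worth confirming with yum repolist); the function name render_repo and the sample password are illustrative only.

```python
from urllib.parse import quote

# Template mirroring the repository sections shown above.
REPO_TEMPLATE = """[{repo_id}]
name=UniMRCP Packages for Red Hat / Cent OS-$releasever {arch}
baseurl=https://{user}:{password}@unimrcp.org/repo/yum/main/rhel$releasever/{arch}/
enabled=1
sslverify=1
gpgcheck=1
gpgkey=https://unimrcp.org/keys/unimrcp-gpg-key.public
"""


def render_repo(username: str, password: str) -> str:
    """Return the text of unimrcp.repo with URL-safe credentials."""
    # safe="" forces every reserved character (including / and @) to be encoded.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    sections = [
        REPO_TEMPLATE.format(repo_id="unimrcp", arch="$basearch",
                             user=user, password=pwd),
        REPO_TEMPLATE.format(repo_id="unimrcp-noarch", arch="noarch",
                             user=user, password=pwd),
    ]
    return "\n".join(sections)


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(render_repo("michael_cctv", "p@ss/word"))
```

The output can be redirected to /etc/yum.repos.d/unimrcp.repo and then checked with the repolist commands.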

 

Verify that the repositories are reachable:

yum repolist unimrcp

yum repolist unimrcp-noarch

List the available packages:

yum --disablerepo="*" --enablerepo="unimrcp" list available

yum --disablerepo="*" --enablerepo="unimrcp-noarch" list available

 

  2. Install the Google ASR plugin:

yum install unimrcp-gsr

To install the additional data files for the sample client application umc (optional, but I installed them as well):

yum install umc-addons

(The official site also describes a manual RPM installation. It involves many more steps; try it if you are interested.)

 

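  3. Request a license and install it.

This guide uses a trial license, which allows only two concurrent channels and is valid for one month.

A. The UniMRCP installation includes a tool that collects information about the server: /opt/unimrcp/bin/unilicnodegen. Running it once produces a text file named uninode.info.

B. On the trial request page, send uninode.info to UniMRCP as an attachment. The trial license arrives by email the next day at the latest. The one I received was named umsgsr_2469f3d1-f33d-4c58-9d83-f037ed1416dd.lic

C. Copy the license into the data directory: cp umsgsr_*.lic /opt/unimrcp/data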

 

 

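II. Obtaining Google Cloud credentials

Because the plugin calls the Google Cloud Speech-to-Text API, the UniMRCP server needs a credentials file to authorize its requests. This assumes your international credit card and Google Cloud account are already set up.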

 

1. Open the Cloud Platform Console.

https://console.cloud.google.com

2. In the drop-down menu at the top, select the project My First Project, which is created by default, or create a new project.

3. Enable billing for the project (new accounts receive a 300 USD credit, which is plenty for testing).

4. Enable the Speech-to-Text API.

5. Generate the credentials:

5.1.      In the Google Cloud Platform Console, navigate to API & Services > Credentials > Create credentials > Service account key

5.2.      Under Service account, select New service account.

5.3.      Under Service account name, enter a service account name of your choice. For example, accessor.

5.4.      Under Role, select Project > Owner.

To better understand the Cloud IAM roles that you can grant to your service account to access Cloud Platform resources, check out the following page.

https://cloud.google.com/iam/docs/understanding-roles

5.5.      Under Key type, leave JSON selected.

5.6.      Click Create to create a new service account and download the json credentials file.

6. Install the credentials. The Google credentials are a single JSON file; copy it into the UniMRCP data directory:
    cp *.json /opt/unimrcp/data
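Before copying the file, it can help to confirm that the download really is a service-account key and not some other JSON file. A minimal Python sketch follows; the field names checked are the ones normally present in a Google service-account key, while the function name and the sample values are hypothetical.

```python
import json

# Fields normally present in a Google service-account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text: str) -> list:
    """Return a list of problems found in the key file's text (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if not data.get(name)]
    kind = data.get("type")
    if kind and kind != "service_account":
        problems.append(f"unexpected type: {kind!r}")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration; a real key file has more fields.
    sample = {
        "type": "service_account",
        "project_id": "example-project",
        "private_key": "-----BEGIN PRIVATE KEY-----\n(placeholder)\n",
        "client_email": "accessor@example-project.iam.gserviceaccount.com",
    }
    print(check_service_account_key(json.dumps(sample)))
```

An empty list means the basic shape looks right; it does not prove that the key is valid or that the account has access to the Speech-to-Text API.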

 

 

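III. Configuring the UniMRCP server plugin

To make the Google ASR plugin available on the UniMRCP server, edit /opt/unimrcp/conf/unimrcpserver.xml.

Edit the <plugin-factory> section and enable the GSR-1 engine. The other plugins are largely unnecessary here and can be disabled.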

<!-- Factory of plugins (MRCP engines) -->

  <plugin-factory>

      <engine id="Demo-Synth-1" name="demosynth" enable="true"/>

      <engine id="Demo-Recog-1" name="demorecog" enable="false"/>

      <engine id="Demo-Verifier-1" name="demoverifier" enable="true"/>

      <engine id="Recorder-1" name="mrcprecorder" enable="true"/>

      <engine id="GSR-1" name="umsgsr" enable="true"/>

  </plugin-factory>
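After editing the file by hand, a quick check that the GSR engine is actually enabled can save a restart cycle. The sketch below parses a plugin-factory fragment with Python's standard library. The helper name enabled_engines is hypothetical, and treating a missing enable attribute as disabled is an assumption of this sketch, not documented UniMRCP behaviour.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the plugin-factory section shown above.
SAMPLE = """<plugin-factory>
  <engine id="Demo-Synth-1" name="demosynth" enable="true"/>
  <engine id="Demo-Recog-1" name="demorecog" enable="false"/>
  <engine id="Demo-Verifier-1" name="demoverifier" enable="true"/>
  <engine id="Recorder-1" name="mrcprecorder" enable="true"/>
  <engine id="GSR-1" name="umsgsr" enable="true"/>
</plugin-factory>"""


def enabled_engines(xml_text: str) -> dict:
    """Map each engine id in <plugin-factory> to True if enable="true"."""
    root = ET.fromstring(xml_text)
    result = {}
    # iter() also matches the root element, so this works on the fragment
    # above as well as on a document that wraps <plugin-factory>.
    for factory in root.iter("plugin-factory"):
        for engine in factory.findall("engine"):
            # Assumption: a missing attribute is reported as disabled.
            result[engine.get("id")] = engine.get("enable") == "true"
    return result


if __name__ == "__main__":
    for engine_id, enabled in enabled_engines(SAMPLE).items():
        print(engine_id, "enabled" if enabled else "disabled")
```

To check the real file, read /opt/unimrcp/conf/unimrcpserver.xml and pass its text to the same function.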

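To enable logging for the plugin, edit /opt/unimrcp/conf/logger.xml

and add <source name="GSR-PLUGIN" priority="INFO" masking="NONE"/>

The Google ASR plugin itself is configured in /opt/unimrcp/conf/umsgsr.xml. The defaults generally work, but whether you want single-word or full-sentence recognition, or different recognition durations and timeouts, this is the file to change. The parameters are documented inside the file; experiment until the settings suit your use case.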

 

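IV. Verifying the installation

  1. Start the UniMRCP server

cd /opt/unimrcp/bin

./unimrcpserver

Once it is running, the output should include this line:

 [INFO]   Load Plugin [GSR-1] [/opt/unimrcp/plugin/umsgsr.so]

followed by the license details for the Google ASR plugin: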

[NOTICE] UniMRCP GSR License

 

-product name:    umsgsr

-product version: 1.0.0

-license owner:   Name

-license type:    trial

-issue date:      2017-05-11

-exp date:        2017-06-10

-channel count:   2

-feature set:     0

[NOTICE] Set Google App Credentials /opt/unimrcp/data/My First Project-a78…c15.json
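Since the trial license lasts only a month, it is easy to lose track of when it expires. A small Python sketch that extracts the expiry date from the startup output is shown here. It assumes the "-exp date:" line keeps the format shown in the log above; the function names are illustrative.

```python
import re
from datetime import date

# License lines copied from the server output shown above.
SAMPLE_LOG = """[NOTICE] UniMRCP GSR License
-product name:    umsgsr
-license type:    trial
-issue date:      2017-05-11
-exp date:        2017-06-10
-channel count:   2
"""


def license_expiry(log_text: str):
    """Return the expiry date found in the log text, or None if absent."""
    match = re.search(r"-exp date:\s*(\d{4})-(\d{2})-(\d{2})", log_text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def days_left(log_text: str, today: date):
    """Return the number of days from today until expiry (negative if past)."""
    expiry = license_expiry(log_text)
    return None if expiry is None else (expiry - today).days


if __name__ == "__main__":
    print(days_left(SAMPLE_LOG, date.today()))
```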

 

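  2. Run the client to test UniMRCP

A. Start the UniMRCP client

    cd /opt/unimrcp/bin

./umc

B. At the umc prompt, enter run gsr1 (gsr1 refers to the scenario file /opt/unimrcp/conf/umc-scenarios/gsr1.xml, which sends a PCM recording of the phrase "call Steve" for recognition)

Console output: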

<?xml version="1.0"?>

<result>

  <interpretation grammar="command" confidence="0.92">

    <instance>call Steve</instance>

    <input mode="speech">call Steve</input>

  </interpretation>

</result>
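The same result document is what a calling application has to parse to get at the recognized text. A minimal Python sketch using only the standard library is shown here. It assumes the result carries no XML namespace, as in the output above; the function name parse_result is illustrative.

```python
import xml.etree.ElementTree as ET

# Result copied from the console output shown above.
SAMPLE_RESULT = """<?xml version="1.0"?>
<result>
  <interpretation grammar="command" confidence="0.92">
    <instance>call Steve</instance>
    <input mode="speech">call Steve</input>
  </interpretation>
</result>"""


def parse_result(xml_text: str) -> list:
    """Return a list of (recognized text, confidence) pairs."""
    root = ET.fromstring(xml_text)
    pairs = []
    for interpretation in root.iter("interpretation"):
        # <input> holds the recognized utterance.
        text = (interpretation.findtext("input") or "").strip()
        confidence = interpretation.get("confidence")
        pairs.append((text, float(confidence) if confidence is not None else None))
    return pairs


if __name__ == "__main__":
    for text, confidence in parse_result(SAMPLE_RESULT):
        print(text, confidence)
```

If a server returns the result with a namespace on the elements, the tag names in the lookups would need the namespace prefix added.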

 

 

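V. FreeSWITCH configuration

The FreeSWITCH network interface has the IP 192.168.252.100, a DMZ address, which is also mapped to the LAN address 10.10.3.100. The UniMRCP server is on the LAN at 10.10.80.173 (so the setup simulates a NAT environment).

/usr/local/freeswitch/conf/autoload_configs/unimrcp.conf.xml is configured as follows.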

<configuration name="unimrcp.conf" description="UniMRCP Client">

  <settings>

    <!-- UniMRCP profile to use for TTS -->

    <param name="default-tts-profile" value="unimrcpserver-mrcp-v2"/>

    <!-- UniMRCP profile to use for ASR -->

    <param name="default-asr-profile" value="uni2"/>

    <!-- UniMRCP logging level to appear in freeswitch.log.  Options are:

         EMERGENCY|ALERT|CRITICAL|ERROR|WARNING|NOTICE|INFO|DEBUG -->

    <param name="log-level" value="DEBUG"/>

    <!-- Enable events for profile creation, open, and close -->

    <param name="enable-profile-events" value="false"/>

 

    <param name="max-connection-count" value="100"/>

    <param name="offer-new-connection" value="1"/>

    <param name="request-timeout" value="60000"/>

  </settings>

 

  <profiles>

    <X-PRE-PROCESS cmd="include" data="../mrcp_profiles/*.xml"/>

  </profiles>

 

</configuration>

 

 

 

The /usr/local/freeswitch/conf/mrcp_profiles directory contains many profile files, each corresponding to a different UniMRCP engine; the unused ones can be deleted. I created one file, unimrcpserver-mrcp-v2.xml:

 

<include>

 

  <!-- UniMRCP Server MRCPv2 -->

  <profile name="uni2" version="2">

    <param name="client-ext-ip" value="10.10.3.100"/>

    <param name="client-ip" value="192.168.252.100"/>

    <param name="client-port" value="16090"/>

    <param name="server-ip" value="10.10.80.173"/>

    <param name="server-port" value="8060"/>

    <!--param name="force-destination" value="1"/-->

    <param name="sip-transport" value="udp"/>

    <!--param name="ua-name" value="FreeSWITCH"/-->

    <!--param name="sdp-origin" value="FreeSWITCH"/-->

    <param name="rtp-ext-ip" value="10.10.3.100"/>

    <param name="rtp-ip" value="192.168.252.100"/>

    <param name="rtp-port-min" value="14000"/>

    <param name="rtp-port-max" value="15000"/>

    <!-- enable/disable rtcp support -->

    <param name="rtcp" value="0"/>

    <!-- rtcp bye policies (rtcp must be enabled first)

 

             0 - disable rtcp bye

             1 - send rtcp bye at the end of session

             2 - send rtcp bye also at the end of each talkspurt (input)

    -->

    <param name="rtcp-bye" value="2"/>

    <!-- rtcp transmission interval in msec (set 0 to disable) -->

    <param name="rtcp-tx-interval" value="5000"/>

    <!-- period (timeout) to check for new rtcp messages in msec (set 0 to disable) -->

    <param name="rtcp-rx-resolution" value="1000"/>

    <!--param name="playout-delay" value="50"/-->

    <!--param name="max-playout-delay" value="200"/-->

    <!--param name="ptime" value="20"/-->

    <param name="codecs" value="PCMU PCMA L16/96/8000"/>

 

    <!-- Add any default MRCP params for SPEAK requests here -->

    <synthparams>

    </synthparams>

    <!-- Add any default MRCP params for RECOGNIZE requests here -->

    <recogparams>

      <!--param name="start-input-timers" value="false"/-->

    </recogparams>

  </profile>

</include>
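The default profile names in unimrcp.conf.xml have to match the name attribute of a loaded profile, not the file name. A minimal Python sketch that cross-checks the two files follows; the function name and the abbreviated sample documents are illustrative. Run on the configuration shown in this article, with only this one profile file kept, it would report that default-tts-profile does not resolve, which matters only if TTS is used.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the two configuration files shown above.
SETTINGS = """<configuration name="unimrcp.conf" description="UniMRCP Client">
  <settings>
    <param name="default-tts-profile" value="unimrcpserver-mrcp-v2"/>
    <param name="default-asr-profile" value="uni2"/>
  </settings>
</configuration>"""

PROFILES = """<include>
  <profile name="uni2" version="2">
    <param name="rtp-port-min" value="14000"/>
    <param name="rtp-port-max" value="15000"/>
  </profile>
</include>"""

DEFAULT_KEYS = ("default-tts-profile", "default-asr-profile")


def profile_issues(settings_xml: str, profiles_xml: str) -> list:
    """Return human-readable inconsistencies between the two documents."""
    settings = ET.fromstring(settings_xml)
    defaults = {p.get("name"): p.get("value")
                for p in settings.iter("param") if p.get("name") in DEFAULT_KEYS}
    issues = []
    names = set()
    for profile in ET.fromstring(profiles_xml).iter("profile"):
        names.add(profile.get("name"))
        params = {p.get("name"): p.get("value") for p in profile.findall("param")}
        low, high = params.get("rtp-port-min"), params.get("rtp-port-max")
        if low is not None and high is not None and int(low) >= int(high):
            issues.append(f"profile {profile.get('name')!r}: empty RTP port range")
    for key, value in sorted(defaults.items()):
        if value not in names:
            issues.append(f"{key} points to unknown profile {value!r}")
    return issues


if __name__ == "__main__":
    for issue in profile_issues(SETTINGS, PROFILES):
        print(issue)
```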

 

 

 

Using recognition in the dialplan

<action application="play_and_detect_speech" data="ivr/ivr-welcome_to_freeswitch.wav detect:unimrcp:uni2 {start-input-timers=false}builtin:speech/transcribe"/>
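The data attribute packs four things into one string: the prompt file, the engine and profile, an optional {name=value,...} parameter block, and the grammar. A small Python sketch that assembles the string from its parts is shown here, based only on the layout visible in the line above; the function names are hypothetical.

```python
from xml.sax.saxutils import quoteattr


def detect_speech_data(prompt, profile, grammar, params=None, engine="unimrcp"):
    """Build the data string for play_and_detect_speech."""
    options = ",".join(f"{name}={value}" for name, value in (params or {}).items())
    # The parameter block sits directly in front of the grammar, with no space.
    block = "{" + options + "}" if options else ""
    return f"{prompt} detect:{engine}:{profile} {block}{grammar}"


def action_xml(data: str) -> str:
    """Wrap the data string in a dialplan <action> element."""
    # quoteattr escapes the value and adds the surrounding quotes.
    return f"<action application=\"play_and_detect_speech\" data={quoteattr(data)}/>"


if __name__ == "__main__":
    data = detect_speech_data(
        "ivr/ivr-welcome_to_freeswitch.wav",
        "uni2",
        "builtin:speech/transcribe?language=zh-CN",
        {"start-input-timers": "false", "input-timeout": 60000},
    )
    print(action_xml(data))
```

Generating the string this way keeps the profile name and timeouts in one place when several extensions share them.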

 

 

<extension name="3566660">

      <condition field="destination_number" expression="^(3566660)$">

        <action application="answer"/>

<!--      <action application="detect_speech" data="unimrcp:uni2 {start-input-timers=false,input-timeout=60000,recognition-timeout=60000}builtin:speech/transcribe uni2"/>

        <action application="sleep" data="15000"/>

      <action application="play_and_detect_speech" data="ivr/ivr-welcome_to_freeswitch.wav detect:unimrcp:uni2 {start-input-timers=false,input-timeout=60000,recognition-timeout=60000}builtin:speech/transcribe?language=zh-CN"/>

-->

      <action application="play_and_detect_speech" data="ivr/ivr-welcome_to_freeswitch.wav detect:unimrcp:uni2 {start-input-timers=false,input-timeout=60000,recognition-timeout=60000}builtin:speech/transcribe?language=zh-CN"/>

         <action application="log" data="CRIT ${detect_speech_result}"/>

        <action application="hangup"/>

      </condition>

 </extension>

 

 <extension name="3566661">

      <condition field="destination_number" expression="^(3566661)$">

        <action application="answer"/>

      <action application="set" data="bind_meta_key=#"/>

      <action application="bind_meta_app" data="1 a s detect_speech::unimrcp:uni2 builtin:speech/transcribe uni2"/>

      <action application="bind_meta_app" data="2 a s lua::catch-event.lua"/>

        <action application="conference" data="$1"/>

        <action application="hangup"/>

      </condition>

 </extension>

 

 

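Key point: the conference calls this extension, which joins as a separate client to recognize long-form speech; the recognized text is then passed through Baidu Translate to produce output in multiple languages.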

    <extension name="MEETING">

      <condition field="destination_number" expression="^(3588888)$">

        <action application="set" data="call_timeout=130"/>

        <action application="lua" data="unimcrp.lua $1"/>

      </condition>

    </extension>
