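The iFlytek WebAPI streaming voice dictation interface converts speech of up to one minute into text on the fly. It returns recognition results in real time, so text arrives while the audio is still being uploaded. Enabling dynamic correction improves recognition accuracy.
The official console offers an online test at https://www.xfyun.cn/services/voicedictation. Comparing the output of that page with results obtained without dynamic correction shows a large gap: without dynamic correction, the recognized text differs considerably from what the web page produces. The page also visibly rewrites text it has already displayed, so it is clearly running with dynamic correction enabled.
In the code, the dynamic correction parameter (dwa=wpgs) is added to the first frame sent after the handshake succeeds. The same audio was then spoken to both the console and the code, and the two recognition results were compared: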
fun firstFrame(
    audio: String,
    @LanguageCode
    language: String
): RecognizeRequest {
    return RecognizeRequest(
        common = Common(APP_ID),
        business = Business(
            language = when (language) {
                LanguageCode.JP -> LANGUAGE_JP
                LanguageCode.EN -> LANGUAGE_EN
                LanguageCode.CN -> LANGUAGE_CN
                else -> error("Unsupported language.")
            },
            // Enable dynamic correction
            dwa = "wpgs"
        ),
        data = RequestData(
            status = STATUS_FIRST,
            audio = audio
        )
    )
}
Source text (what was spoken): 语音分析,自然语言,处理内容审核图像识别,人脸识别,文字识别,语音硬件,医疗服务,基础服务。
Console output: 语音分析,自然语言处理内容审核图像识别,人脸识别,文字识别,语音硬件,医疗服务,技术服务。
Recognition responses printed by the code:
2021-03-01 15:56:16.026 21006-21385/co.logre V/dynamicResult: dynamicResult: 语音
2021-03-01 15:56:16.478 21006-21396/co.logre V/dynamicResult: dynamicResult: 语音分析
2021-03-01 15:56:17.123 21006-21396/co.logre V/dynamicResult: dynamicResult: 语音分析自然
2021-03-01 15:56:17.281 21006-21108/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言
2021-03-01 15:56:17.450 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理
2021-03-01 15:56:18.085 21006-21394/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀
2021-03-01 15:56:18.404 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容
2021-03-01 15:56:18.568 21006-21387/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容是
2021-03-01 15:56:18.879 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核
2021-03-01 15:56:19.521 21006-21107/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像
2021-03-01 15:56:19.839 21006-21387/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别
2021-03-01 15:56:20.526 21006-21387/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别人脸
2021-03-01 15:56:20.640 21006-21410/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别人脸识别
2021-03-01 15:56:21.440 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别人脸识别文字
2021-03-01 15:56:21.602 21006-21396/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别人脸识别文字的
2021-03-01 15:56:22.562 21006-21386/co.logre V/dynamicResult: dynamicResult: 语音分析,自然语言处理,内容审核,图像识别,人脸识别文字识别
2021-03-01 15:56:22.901 21006-21396/co.logre V/dynamicResult: dynamicResult: 语音
2021-03-01 15:56:23.060 21006-21404/co.logre V/dynamicResult: dynamicResult: 语音的
2021-03-01 15:56:23.396 21006-21404/co.logre V/dynamicResult: dynamicResult: 语音
2021-03-01 15:56:23.545 21006-21387/co.logre V/dynamicResult: dynamicResult: 语音邮件
2021-03-01 15:56:24.029 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音邮件你不要
2021-03-01 15:56:24.187 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音邮件你不要服务员
2021-03-01 15:56:24.506 21006-21386/co.logre V/dynamicResult: dynamicResult: 语音邮件你不要扶
2021-03-01 15:56:24.972 21006-21405/co.logre V/dynamicResult: dynamicResult: 语音邮件你不要扶自己祝福
2021-03-01 15:56:25.915 21006-21109/co.logre V/dynamicResult: dynamicResult: ,语音硬件医疗服务及服务
2021-03-01 15:56:31.698 21006-21387/co.logre V/dynamicResult: dynamicResult:
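The log shows that dynamic correction works at a much finer granularity and keeps revising the text as recognition proceeds. The concrete JSON values are listed below for closer analysis. (Note: parsing uses the DTOs written for the vad version of the JSON, i.e. the non-dynamic result. Compared with the non-dynamic JSON, the dynamic correction JSON has two extra fields, "pgs" and "rg", and lacks "vad". The approach taken here does not rely on these two extra fields, so parsing with the classes below does not affect the recognized text. The JSON printed after each log line appears to be the parsed object serialized again, so its keys are the Kotlin property names, for example "offset" for "bg".)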
package co.logre.service.dto

import android.util.Log
import com.squareup.moshi.Json
import com.squareup.moshi.JsonClass

@JsonClass(generateAdapter = true)
data class RecognizeResponse(
    // Session id; only returned for the first frame after a successful handshake
    @Json(name = "sid") val sid: String?,
    // Return code: 0 means success, anything else is an error
    @Json(name = "code") val code: Int,
    // Error description
    @Json(name = "message") val message: String?,
    // Dictation result
    @Json(name = "data") val data: ResponseData?
)

@JsonClass(generateAdapter = true)
data class ResponseData(
    // Flag indicating whether recognition has finished
    @Json(name = "status") val status: Int,
    // Dictation recognition result
    @Json(name = "result") val result: ResponseResult?
)

// Concatenates every recognized word of this response into one string.
fun ResponseData?.linkResults(): String {
    this ?: return "<NoData>"
    result ?: return "<NoResult>"
    return result.words.flatMap { it.unitList }.joinToString(separator = "") { it.text }
}

@JsonClass(generateAdapter = true)
data class ResponseResult(
    // Start endpoint frame offset
    @Json(name = "bg") val beginInFrame: Int,
    // End endpoint frame offset
    @Json(name = "ed") val endInFrame: Int,
    // Sequence number of the returned result
    @Json(name = "sn") val sequenceNumber: Int,
    // Whether this is the last slice of the result
    @Json(name = "ls") val lastSection: Boolean,
    // Dictation result
    @Json(name = "ws") val words: List<Word>,
    // VAD info; only populated when vinfo = 1
    @Json(name = "vad") val vad: VadResult?,
    // Only present when dwa=wpgs is enabled, hence nullable
    @Json(name = "pgs") val pgs: String?
)

// Returns the single VAD entry of this response, or null (with a warning) if absent.
fun ResponseData.vadInfo(): VadInfo? {
    val result = this.result?.vad?.results?.singleOrNull()
    if (result == null) {
        Log.w("RecognizeResponse", "vinfo not set.")
    }
    return result
}

@JsonClass(generateAdapter = true)
data class VadResult(
    @Json(name = "ws") val results: List<VadInfo>
)

@JsonClass(generateAdapter = true)
data class VadInfo(
    // Start endpoint frame offset
    @Json(name = "bg") val beginInFrame: Int,
    // End endpoint frame offset
    @Json(name = "ed") val endInFrame: Int
) {
    companion object {
        // 1 frame = 10 ms
        const val MILLIS_PER_FRAME = 10
    }

    val beginMillis get() = beginInFrame.toLong().times(MILLIS_PER_FRAME)
    val endMillis get() = endInFrame.toLong().times(MILLIS_PER_FRAME)
}

@JsonClass(generateAdapter = true)
data class Word(
    // Start endpoint frame offset
    @Json(name = "bg") val offset: Int,
    // Chinese word segmentation
    @Json(name = "cw") val unitList: List<WordUnit>
)

@JsonClass(generateAdapter = true)
data class WordUnit(
    // The word itself
    @Json(name = "w") val text: String
)
2021-03-01 15:56:16.026 21006-21385/co.logre V/dynamicResult: dynamicResult: 语音
{"result":{"lastSection":false,"sequenceNumber":1,"words":[{"offset":0,"unitList":[{"text":"语音"}]}]},"status":0}
2021-03-01 15:56:16.478 21006-21396/co.logre V/dynamicResult: dynamicResult: 语音分析
{"result":{"lastSection":false,"sequenceNumber":2,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"分析"}]}]},"status":1}
2021-03-01 15:56:17.123 21006-21396/co.logre V/dynamicResult: dynamicResult: 语音分析自然
{"result":{"lastSection":false,"sequenceNumber":3,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"分析"}]},{"offset":0,"unitList":[{"text":"自然"}]}]},"status":1}
2021-03-01 15:56:17.281 21006-21108/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言
{"result":{"lastSection":false,"sequenceNumber":4,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"分析"}]},{"offset":0,"unitList":[{"text":"自然"}]},{"offset":0,"unitList":[{"text":"语言"}]}]},"status":1}
2021-03-01 15:56:17.450 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理
{"result":{"lastSection":false,"sequenceNumber":5,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"分析"}]},{"offset":0,"unitList":[{"text":"自然"}]},{"offset":0,"unitList":[{"text":"语言"}]},{"offset":0,"unitList":[{"text":"处理"}]}]},"status":1}
2021-03-01 15:56:18.085 21006-21394/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀
{"result":{"lastSection":false,"sequenceNumber":6,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"分析"}]},{"offset":0,"unitList":[{"text":"自然"}]},{"offset":0,"unitList":[{"text":"语言"}]},{"offset":0,"unitList":[{"text":"处理"}]},{"offset":0,"unitList":[{"text":"呀"}]}]},"status":1}
2021-03-01 15:56:18.404 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容
{"result":{"lastSection":false,"sequenceNumber":7,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"分析"}]},{"offset":0,"unitList":[{"text":"自然"}]},{"offset":0,"unitList":[{"text":"语言"}]},{"offset":0,"unitList":[{"text":"处理"}]},{"offset":0,"unitList":[{"text":"呀"}]},{"offset":0,"unitList":[{"text":"内容"}]}]},"status":1}
2021-03-01 15:56:18.568 21006-21387/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容是
2021-03-01 15:56:18.879 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核
2021-03-01 15:56:19.521 21006-21107/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像
2021-03-01 15:56:19.839 21006-21387/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别
2021-03-01 15:56:20.526 21006-21387/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别人脸
2021-03-01 15:56:20.640 21006-21410/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别人脸识别
{"result":{"lastSection":false,"sequenceNumber":11,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"分析"}]},{"offset":0,"unitList":[{"text":"自然"}]},{"offset":0,"unitList":[{"text":"语言"}]},{"offset":0,"unitList":[{"text":"处理"}]},{"offset":0,"unitList":[{"text":"呀"}]},{"offset":0,"unitList":[{"text":"内容"}]},{"offset":0,"unitList":[{"text":"审核"}]},{"offset":0,"unitList":[{"text":"图像"}]},{"offset":0,"unitList":[{"text":"识别"}]}]},"status":1}
2021-03-01 15:56:21.440 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别人脸识别文字
2021-03-01 15:56:21.602 21006-21396/co.logre V/dynamicResult: dynamicResult: 语音分析自然语言处理呀内容审核图像识别人脸识别文字的
2021-03-01 15:56:22.562 21006-21386/co.logre V/dynamicResult: dynamicResult: 语音分析,自然语言处理,内容审核,图像识别,人脸识别文字识别
{"result":{"lastSection":false,"sequenceNumber":16,"words":[{"offset":126,"unitList":[{"text":"语音"}]},{"offset":170,"unitList":[{"text":"分析"}]},{"offset":246,"unitList":[{"text":","}]},{"offset":246,"unitList":[{"text":"自然"}]},{"offset":286,"unitList":[{"text":"语言"}]},{"offset":318,"unitList":[{"text":"处理"}]},{"offset":386,"unitList":[{"text":","}]},{"offset":386,"unitList":[{"text":"内容"}]},{"offset":430,"unitList":[{"text":"审核"}]},{"offset":498,"unitList":[{"text":","}]},{"offset":498,"unitList":[{"text":"图像"}]},{"offset":546,"unitList":[{"text":"识别"}]},{"offset":602,"unitList":[{"text":","}]},{"offset":602,"unitList":[{"text":"人脸识别"}]},{"offset":694,"unitList":[{"text":"文字"}]},{"offset":734,"unitList":[{"text":"识别"}]}]},"status":1}
2021-03-01 15:56:22.901 21006-21396/co.logre V/dynamicResult: dynamicResult: 语音
2021-03-01 15:56:23.060 21006-21404/co.logre V/dynamicResult: dynamicResult: 语音的
{"result":{"lastSection":false,"sequenceNumber":18,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"的"}]}]},"status":1}
2021-03-01 15:56:23.396 21006-21404/co.logre V/dynamicResult: dynamicResult: 语音
{"result":{"lastSection":false,"sequenceNumber":19,"words":[{"offset":0,"unitList":[{"text":"语音"}]}]},"status":1}
2021-03-01 15:56:23.545 21006-21387/co.logre V/dynamicResult: dynamicResult: 语音邮件
{"result":{"lastSection":false,"sequenceNumber":20,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"邮件"}]}]},"status":1}
2021-03-01 15:56:24.029 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音邮件你不要
{"result":{"lastSection":false,"sequenceNumber":21,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"邮件"}]},{"offset":0,"unitList":[{"text":"你"}]},{"offset":0,"unitList":[{"text":"不要"}]}]},"status":1}
2021-03-01 15:56:24.187 21006-21109/co.logre V/dynamicResult: dynamicResult: 语音邮件你不要服务员
{"result":{"lastSection":false,"sequenceNumber":22,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"邮件"}]},{"offset":0,"unitList":[{"text":"你"}]},{"offset":0,"unitList":[{"text":"不要"}]},{"offset":0,"unitList":[{"text":"服务员"}]}]},"status":1}
2021-03-01 15:56:24.506 21006-21386/co.logre V/dynamicResult: dynamicResult: 语音邮件你不要扶
{"result":{"lastSection":false,"sequenceNumber":23,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"邮件"}]},{"offset":0,"unitList":[{"text":"你"}]},{"offset":0,"unitList":[{"text":"不要"}]},{"offset":0,"unitList":[{"text":"扶"}]}]},"status":1}
2021-03-01 15:56:24.972 21006-21405/co.logre V/dynamicResult: dynamicResult: 语音邮件你不要扶自己祝福
{"result":{"lastSection":false,"sequenceNumber":24,"words":[{"offset":0,"unitList":[{"text":"语音"}]},{"offset":0,"unitList":[{"text":"邮件"}]},{"offset":0,"unitList":[{"text":"你"}]},{"offset":0,"unitList":[{"text":"不要"}]},{"offset":0,"unitList":[{"text":"扶"}]},{"offset":0,"unitList":[{"text":"自己"}]},{"offset":0,"unitList":[{"text":"祝福"}]}]},"status":1}
2021-03-01 15:56:25.915 21006-21109/co.logre V/dynamicResult: dynamicResult: ,语音硬件医疗服务及服务
{"result":{"lastSection":false,"sequenceNumber":25,"words":[{"offset":834,"unitList":[{"text":","}]},{"offset":834,"unitList":[{"text":"语音"}]},{"offset":874,"unitList":[{"text":"硬件"}]},{"offset":914,"unitList":[{"text":"医疗"}]},{"offset":970,"unitList":[{"text":"服务"}]},{"offset":1030,"unitList":[{"text":"及"}]},{"offset":1066,"unitList":[{"text":"服务"}]}]},"status":1}
2021-03-01 15:56:27.037 21006-21107/co.logre V/dynamicResult: dynamicResult: 。
{"result":{"lastSection":true,"sequenceNumber":26,"words":[{"offset":0,"unitList":[{"text":"。"}]}]},"status":2}
2021-03-01 15:56:31.698 21006-21387/co.logre V/dynamicResult: dynamicResult:
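As background on the two extra fields mentioned earlier: iFlytek's documentation describes pgs as taking the value "apd" (append this result) or "rpl" (replace earlier results), with rg giving the range of sequence numbers to replace. That would explain why the text in the log shrinks back to 语音 after a corrected sentence arrives. The following is only a minimal sketch under that assumption; DynamicSegment and DynamicTranscript are hypothetical names that do not appear in the project code above.

```kotlin
import java.util.TreeMap

// Hypothetical, simplified view of one dynamic-correction result.
data class DynamicSegment(
    val sn: Int,         // data.result.sn
    val pgs: String,     // assumed to be "apd" (append) or "rpl" (replace)
    val rg: List<Int>?,  // assumed [firstSn, lastSn] to replace when pgs == "rpl"
    val text: String     // concatenation of every ws[].cw[].w
)

class DynamicTranscript {
    // Results keyed by sequence number, kept in ascending order.
    private val segments = TreeMap<Int, String>()

    fun accept(segment: DynamicSegment) {
        val range = segment.rg
        if (segment.pgs == "rpl" && range != null && range.size == 2 && range[0] <= range[1]) {
            // Drop every earlier result whose sn lies in [rg[0], rg[1]].
            segments.subMap(range[0], true, range[1], true).clear()
        }
        segments[segment.sn] = segment.text
    }

    // Current best transcript: the remaining results joined in sn order.
    fun text(): String = segments.values.joinToString(separator = "")
}
```

Feeding every response into accept() and reading text() at the end would give the corrected transcript without relying on the offset values discussed next.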
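The official documentation states:
data.result.ws.bg | int | Start endpoint frame offset, in frames (1 frame = 10 ms). Note: bg = 0 carries no meaning in the following two cases: 1) the returned result is punctuation or empty; 2) the returned result is too long. |
As the log shows, once dynamic correction has settled on its most accurate version of a sentence, the "offset" field in the data is non-zero (i.e. data.result.ws.bg != 0). Filtering for results whose offset is non-zero and assembling them is therefore enough.
Source code: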
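// Log only results whose first word carries a non-zero frame offset.
// firstOrNull() avoids the NullPointerException that `!!` would throw when result is null.
val firstWord = responseData.result?.words?.firstOrNull()
if (firstWord != null && firstWord.offset != 0) {
    Log.v("responseData.result", "responseData: " + responseData.linkResults())
}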
Data after filtering:
2021-03-01 17:03:05.188 23915-24001/co.logre V/responseData.result: responseData: 语音分析,自然语言处理,内容审核,图像识别,人脸识别,文字识别
2021-03-01 17:03:08.702 23915-24001/co.logre V/responseData.result: responseData: ,语音硬件医疗服务协助服务
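The snippet above only logs each filtered sentence. To actually assemble them into one transcript, something like the following could be used. It is a self-contained sketch: Word and WordUnit here are simplified stand-ins for the DTOs without the Moshi annotations, and finalizedTextOrNull and assembleTranscript are hypothetical helper names.

```kotlin
// Simplified stand-ins for the Word / WordUnit DTOs (no Moshi annotations).
data class WordUnit(val text: String)
data class Word(val offset: Int, val unitList: List<WordUnit>)

// Returns the text of one result only if it looks finalized, i.e. its first
// word carries a non-zero frame offset (bg); returns null otherwise.
fun finalizedTextOrNull(words: List<Word>): String? {
    val first = words.firstOrNull() ?: return null
    if (first.offset == 0) return null
    return words.flatMap { it.unitList }.joinToString(separator = "") { it.text }
}

// Joins the finalized sentences of a whole session in arrival order.
fun assembleTranscript(results: List<List<Word>>): String =
    results.mapNotNull { finalizedTextOrNull(it) }.joinToString(separator = "")
```

One caveat of this filter: a result that consists only of punctuation, such as the closing 。 in the log, has offset 0 and is dropped, so the final punctuation mark would need separate handling if it matters.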
----------------------
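To obtain the duration of each spoken sentence, the official documentation describes the following:
vinfo response parameters (these apply to non-dynamic mode; the official excerpt quoted earlier applies to dynamic mode.)
If vinfo=1 is set, the following fields are also returned (if dwa=wpgs is enabled and set at the same time, vinfo has no effect):
Parameter | Type | Description |
---|---|---|
data.result.vad | object | Endpoint frame offset information |
data.result.vad.ws | array | Endpoint frame offset results |
data.result.vad.bg | int | Start endpoint frame offset, in frames (1 frame = 10 ms) |
data.result.vad.ed | int | End endpoint frame offset, in frames (1 frame = 10 ms) |
data.result.vad.eg | number | Can be ignored |
So:
In non-dynamic mode, with vinfo=1 sent:
Take bg and ed directly from vad, subtract them to get the frame difference, and multiply by 10 to get the duration of the speech in milliseconds (1 frame = 10 ms).
In dynamic correction mode:
Use data.result.ws.bg of the first word of each sentence as the start frame and data.result.ws.bg of the last word as the end frame, then multiply the frame difference by 10 to get the duration of the speech in milliseconds (1 frame = 10 ms).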
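The two calculations can be sketched as follows. This is an illustration only; vadDurationMillis and dynamicDurationMillis are hypothetical helper names, and the inputs are plain frame offsets rather than the DTOs.

```kotlin
// 1 frame = 10 ms, as stated in the documentation excerpt.
const val MILLIS_PER_FRAME = 10L

// Non-dynamic mode with vinfo=1: duration from the bg/ed pair found in vad.
fun vadDurationMillis(beginInFrame: Int, endInFrame: Int): Long =
    (endInFrame - beginInFrame) * MILLIS_PER_FRAME

// Dynamic correction mode: duration from the bg of the first and the last word
// of a finalized sentence. Returns null when the sentence has no words.
fun dynamicDurationMillis(wordOffsets: List<Int>): Long? {
    if (wordOffsets.isEmpty()) return null
    return (wordOffsets.last() - wordOffsets.first()) * MILLIS_PER_FRAME
}
```

For the first finalized sentence in the log, the first word has offset 126 and the last word has offset 734, so the frame difference is 608 and the result is 6080 ms. Since the last word's bg marks where that word starts, this value leaves out the length of the last word itself and should be read as an approximation.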