iOS System Speech Recognition

王轶
2023-12-01

iOS 10 introduced the Speech framework for speech recognition. A project of mine needed speech recognition, so this post wraps the system API in a simple class that roughly covers the system speech-to-text feature. It has not been tested thoroughly yet, so there are probably plenty of pitfalls.

Wrapping the speech recognition feature

The system speech recognizer is combined with external (microphone) audio input to implement speech-to-text. Project: https://github.com/X-Morris/MMVoiceEngine
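
Before looking at the wrapper, here is a condensed sketch of the pipeline it is built around: microphone buffers from AVAudioEngine are appended to an SFSpeechAudioBufferRecognitionRequest, and the recognition task reports transcriptions back. The function name MMSpeechPipelineSketch is hypothetical, and audio-session setup and error handling are omitted here; the wrapper below handles them.

#import <AVFoundation/AVFoundation.h>
#import <Speech/Speech.h>

static void MMSpeechPipelineSketch(void)
{
    // In real code these three objects must stay alive (the wrapper keeps them as properties);
    // they are locals here only for brevity.
    SFSpeechRecognizer *recognizer = [[SFSpeechRecognizer alloc] init];
    SFSpeechAudioBufferRecognitionRequest *request = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    AVAudioEngine *engine = [[AVAudioEngine alloc] init];

    // Feed microphone audio into the recognition request.
    AVAudioInputNode *inputNode = engine.inputNode;
    [inputNode installTapOnBus:0
                    bufferSize:1024
                        format:[inputNode outputFormatForBus:0]
                         block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
        [request appendAudioPCMBuffer:buffer];
    }];

    // Read partial and final transcriptions back.
    [recognizer recognitionTaskWithRequest:request
                             resultHandler:^(SFSpeechRecognitionResult *result, NSError *error) {
        NSLog(@"%@", result.bestTranscription.formattedString);
    }];

    [engine prepare];
    [engine startAndReturnError:NULL];
}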

  • MMSpeechRecognizer

This class is a thin wrapper around the system speech recognition API, so that we can speak to the device and get the speech back as text.

MMSpeechRecognizer.h

//
//  MMSpeechRecognizer.h
//  MMVoiceEngine
//
//  Created by Morris_ on 2020/11/10.
//

#import <Foundation/Foundation.h>
#import <Speech/Speech.h>

@class MMSpeechRecognizerConfig;

NS_ASSUME_NONNULL_BEGIN

@protocol MMSpeechRecognizerDelegate <NSObject>

@optional

/*!
 *  Called when recording starts.
 *
 */
- (void)onStart;

/*!
 *  Called when recording stops.
 *
 *  @param error nil for a normal stop, otherwise the error that ended recognition.
 */
- (void)onStop:(NSError *)error;

/*!
 *  Called with a recognition result (partial or final).
 *
 */
- (void)result:(SFSpeechRecognitionResult * _Nullable)result;

/*!
 *  Called when the volume changes.
 *  Reports the audio volume while recording is in progress.
 *
 *  @param volume The current volume level.
 */
- (void)onVolumeChanged:(int)volume;

@end

/*!
 *  Speech recognition class, implemented as a singleton.
 */
@interface MMSpeechRecognizer : NSObject

/*!
 *  Returns the shared singleton instance.
 */
+ (instancetype)sharedInstance;

/*!
 *  Destroys the singleton instance.
 */
- (void)destroy;

/*!
 *  Speech recognition configuration.
 */
@property (nonatomic, strong) MMSpeechRecognizerConfig *config;

/*!
 *  Start speech recognition.
 *  Begins capturing audio and recognizing it; results are delivered as they become available.
 */
- (void)start;

/*!
 *  Stop speech recognition.
 *  Stops audio capture and ends the recognition session.
 */
- (void)stop;

/*!
 *  Delegate that receives the callbacks.
 *
 */
@property (nonatomic, weak) id <MMSpeechRecognizerDelegate> delegate;

/*!
 *  Authorization.
 *
 */
+ (SFSpeechRecognizerAuthorizationStatus)authorizationStatus;
+ (void)requestAuthorization:(void(^)(SFSpeechRecognizerAuthorizationStatus status))handler;

@end

NS_ASSUME_NONNULL_END

MMSpeechRecognizer.m

//
//  MMSpeechRecognizer.m
//  MMVoiceEngine
//
//  Created by Morris_ on 2020/11/10.
//

#import "MMSpeechRecognizer.h"
#import "MMSpeechRecognizerConfig.h"
#import <AVFoundation/AVFoundation.h>
#import <UIKit/UIKit.h>

@interface MMSpeechRecognizer ()<SFSpeechRecognizerDelegate>

@property (nonatomic, strong) SFSpeechRecognizer *speechRecognizer;
@property (nonatomic, strong) SFSpeechAudioBufferRecognitionRequest *recognitionRequest;
@property (nonatomic, strong) SFSpeechRecognitionTask *recognitionTask;
@property (nonatomic, strong) AVAudioEngine *audioEngine;

@property (nonatomic, assign) BOOL starting;

@end

@implementation MMSpeechRecognizer

static MMSpeechRecognizer *_sharedInstance = nil;
static dispatch_once_t onceToken;

- (void)dealloc {
    // Remove monitor
    [self removeMonitor];
}

+ (instancetype)sharedInstance {
    dispatch_once(&onceToken, ^{
        // Init
        _sharedInstance = [[MMSpeechRecognizer alloc] init];
        // Add monitor
        [_sharedInstance addMonitor];
    });
    return _sharedInstance;
}

- (void)destroy {
    onceToken = 0;
    _sharedInstance = nil;
}

- (void)setConfig:(MMSpeechRecognizerConfig *)config {
    if (!_config) {
        _config = config;
    }
}

// Start. Private function.
- (void)reStart
{
    // Checking the authorization Status
    [MMSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        if (status == SFSpeechRecognizerAuthorizationStatusAuthorized)
        {
            [self resetRecognitionTask];
            [self startAudioEngine];
        }
        else
        {
            [self stopWithError:[NSError errorWithDomain:[NSString stringWithFormat:@"Authorization :%ld",(long)status] code:-1 userInfo:nil]];
        }
    }];
}

// Start event. Public function.
- (void)start
{
    [self startCallback];
    
    [self reStart];
}

// Stop audioEngine. Private function.
- (void)stopAudioEngine
{
    if (self.audioEngine && self.audioEngine.isRunning) {
        [self.audioEngine stop];
        [self.audioEngine.inputNode removeTapOnBus:0];
        [self.recognitionRequest endAudio];
    }
}
// Start audioEngine. Private function.
- (void)startAudioEngine
{
    [self stopAudioEngine];
    
    // Configure the microphone input.
    AVAudioInputNode *inputNode = self.audioEngine.inputNode;
    [inputNode removeTapOnBus:0];
    AVAudioFormat *recordingFormat = [inputNode outputFormatForBus:0];
    __weak typeof(self)weakSelf = self;
    NSError *error = nil;
    [inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        if (weakSelf.recognitionRequest) {
            [weakSelf.recognitionRequest appendAudioPCMBuffer:buffer];
        }
    }];
    [self.audioEngine prepare];
    [self.audioEngine startAndReturnError:&error];
    if (error)
    {
        [self stopWithError:error];
        return;
    }
}

// Stop event. Private function.
- (void)stopWithError:(NSError *)error
{
    [self stopAudioEngine];
    [self stopCallback:error];
}

// Stop event. Public function.
- (void)stop
{
    [self stopAudioEngine];
    
    [self stopCallback:nil];
}

// Authorization
+ (SFSpeechRecognizerAuthorizationStatus)authorizationStatus
{
    return [SFSpeechRecognizer authorizationStatus];
}
+ (void)requestAuthorization:(void(^)(SFSpeechRecognizerAuthorizationStatus status))handler {
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        dispatch_async(dispatch_get_main_queue(), ^{
            if (handler) {
                handler(status);
            }
        });
    }];
}

#pragma mark - SFSpeechRecognizerDelegate

- (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer availabilityDidChange:(BOOL)available
{
    // Availability changes (e.g. the recognizer losing network connectivity) could be forwarded to the delegate here.
}

#pragma mark - call back

- (void)startCallback
{
    if ([self.delegate respondsToSelector:@selector(onStart)]) {
        [self.delegate onStart];
    }
    self.starting = YES;
}

- (void)stopCallback:(NSError *)error
{
    if ([self.delegate respondsToSelector:@selector(onStop:)]) {
        [self.delegate onStop:error];
    }
    self.starting = NO;
}

- (void)resultCallback:(SFSpeechRecognitionResult * _Nullable)result
{
    if ([self.delegate respondsToSelector:@selector(result:)]) {
        [self.delegate result:result];
    }
}

#pragma mark - private

- (void)resetRecognitionTask
{
    // Cancel the previous task if it's running.
    if (self.recognitionTask) {
        //[self.recognitionTask cancel]; // Will cause the system error and memory problems.
        [self.recognitionTask finish];
    }
    self.recognitionTask = nil;
    
    // Configure the audio session for the app.
    NSError *error = nil;
    [AVAudioSession.sharedInstance setCategory:AVAudioSessionCategoryRecord withOptions:AVAudioSessionCategoryOptionDuckOthers error:&error];
    if (error)
    {
        [self stopWithError:error];
        return;
    }
    [AVAudioSession.sharedInstance setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error];
    if (error)
    {
        [self stopWithError:error];
        return;
    }

    // Create and configure the speech recognition request.
    self.recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    self.recognitionRequest.taskHint = SFSpeechRecognitionTaskHintConfirmation;
    
    // Allow server-based recognition (set requiresOnDeviceRecognition to YES to keep speech data on the device).
    if (@available(iOS 13, *)) {
        self.recognitionRequest.requiresOnDeviceRecognition = NO;
    }

    // Create a recognition task for the speech recognition session.
    // Keep a reference to the task so that it can be canceled.
    __weak typeof(self)weakSelf = self;
    self.recognitionTask = [self.speechRecognizer recognitionTaskWithRequest:self.recognitionRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        __strong typeof(self)strongSelf = weakSelf;
        //NSLog(@"Recognized voice: %@",result.bestTranscription.formattedString);
        //NSLog(@"Recognized error: %@",error);
        //NSLog(@"Recognized finishing: %d",weakSelf.recognitionTask.isFinishing);
        
        [strongSelf resultCallback:result];
        
        if (error != nil || result.final)
        {
            // Stop recognizing speech if there is a problem.
            [strongSelf stopAudioEngine];
                        
            // Restart
            // [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
            // [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=203 "SessionId=com.siri.cortex.ace.speech.session.event.SpeechSessionId@599be0be, Message=No audio data received." UserInfo={NSLocalizedDescription=SessionId=com.siri.cortex.ace.speech.session.event.SpeechSessionId@599be0be, Message=No audio data received., NSUnderlyingError=0x600002630090 {Error Domain=SiriSpeechErrorDomain Code=102 "(null)"}}
            //[strongSelf reStart];
            
            strongSelf.speechRecognizer = nil;
            [strongSelf performSelector:@selector(reStart) withObject:nil afterDelay:1];
        }
    }];
}

#pragma mark - Monitor

- (void)addMonitor
{
    [[NSNotificationCenter defaultCenter] addObserver:self selector:@selector(appDidBecomeActive) name:UIApplicationDidBecomeActiveNotification object:nil];
    [[NSNotificationCenter defaultCenter] addObserver:self selector:@selector(appDidEnterBackground) name:UIApplicationDidEnterBackgroundNotification object:nil];
}

- (void)removeMonitor
{
    [[NSNotificationCenter defaultCenter] removeObserver:self];
}

- (void)appDidBecomeActive
{
    if (self.starting) {
        [self startAudioEngine];
    }
}

- (void)appDidEnterBackground
{
    if (self.starting) {
        [self stopAudioEngine];
    }
}

#pragma mark - get

- (SFSpeechRecognizer *)speechRecognizer {
    if (!_speechRecognizer) {
        if (!_config) {
            _config = [MMSpeechRecognizerConfig defaultConfig];
        }
        _speechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:self.config.locale];
        _speechRecognizer.delegate = self;
    }
    return _speechRecognizer;
}

- (AVAudioEngine *)audioEngine {
    if (!_audioEngine) {
        _audioEngine = [[AVAudioEngine alloc] init];
    }
    return _audioEngine;
}

@end



//AVAudioSession    // Audio session configuration API
//AVAudioEngine     // Drives the audio graph and records audio
//AVAudioFormat     // Describes an audio data format
//AVAudioInputNode  // Audio input node (the microphone)
//AVAudioPCMBuffer  // PCM buffer holding the captured audio samples


/// Error codes
// 203/201: the recognition task finished without detecting any speech.
// 216: reported after several 203/201 errors; when 216 occurs, recognitionTask.isFinishing is NO and the recognizer stops responding.
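
The 203/201 and 216 codes come from the private kAFAssistantErrorDomain and are not documented. The sketch below is a hypothetical helper (not part of MMSpeechRecognizer) that only illustrates how one might branch on them before deciding whether to restart; the domain string and codes are taken from the logs above and may change between iOS versions.

#import <Foundation/Foundation.h>

static BOOL MMShouldRestartAfterRecognitionError(NSError * _Nullable error)
{
    if (error == nil) {
        return YES; // Normal completion: restart to keep listening.
    }
    if ([error.domain isEqualToString:@"kAFAssistantErrorDomain"]) {
        if (error.code == 201 || error.code == 203) {
            return YES; // Task finished without detecting speech: safe to restart.
        }
        if (error.code == 216) {
            return NO;  // Recognizer appears stuck; tear it down and recreate it instead.
        }
    }
    return YES;
}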

MMSpeechRecognizerConfig.h

//
//  MMSpeechRecognizerConfig.h
//  MMVoiceEngine
//
//  Created by Morris_ on 2020/11/11.
//

#import <Foundation/Foundation.h>
#import <Speech/SFSpeechRecognitionTaskHint.h>

NS_ASSUME_NONNULL_BEGIN

@interface MMSpeechRecognizerConfig : NSObject

+ (MMSpeechRecognizerConfig *)defaultConfig;
- (instancetype)initWithLocale:(NSLocale *)locale;

/*!
 *  Locale (recognition language) setting.
 *
 */
@property (nonatomic, strong) NSLocale *locale;

/*!
 *  Task hint describing the type of speech to recognize.
 *
 */
@property (nonatomic) SFSpeechRecognitionTaskHint defaultTaskHint;

/*!
 *  Contextual strings: phrases that should be recognized, such as domain-specific vocabulary.
 *
 */
@property (nonatomic, copy) NSArray<NSString *> *contextualStrings;

/*!
 *  Extended parameters.
 *
 */
@property (nonatomic, strong, nonnull) NSDictionary *params;

@end

NS_ASSUME_NONNULL_END

MMSpeechRecognizerConfig.m

//
//  MMSpeechRecognizerConfig.m
//  MMVoiceEngine
//
//  Created by Morris_ on 2020/11/11.
//

#import "MMSpeechRecognizerConfig.h"

@implementation MMSpeechRecognizerConfig

- (instancetype)init {
    if (self = [super init]) {
        self.defaultTaskHint = SFSpeechRecognitionTaskHintUnspecified;
        self.locale = [NSLocale currentLocale];
    }
    return self;
}

+ (MMSpeechRecognizerConfig *)defaultConfig
{
    return [[MMSpeechRecognizerConfig alloc] init];
}

- (instancetype)initWithLocale:(NSLocale *)locale {
    if (self = [super init]) {
        self.locale = locale;
    }
    return self;
}

@end

Using the speech recognition feature

ViewController.m

//
//  ViewController.m
//  MMVoiceEngine-Demo
//
//  Created by Morris_ on 2020/11/10.
//

#import "ViewController.h"
#import <MMVoiceEngine/MMVoiceEngine.h>
#import "SpeechAnalyzer.h"

@interface ViewController ()<MMSpeechRecognizerDelegate, SpeechAnalyzerDelegate>

@property (nonatomic, strong) UIButton *startBtn;
@property (nonatomic, strong) UITextView *textView;

@property (nonatomic, strong) SpeechAnalyzer *speechAnalyzer;

@end

@implementation ViewController

- (void)viewDidLayoutSubviews {
    [super viewDidLayoutSubviews];
    
    self.startBtn.frame = CGRectMake(10, CGRectGetHeight(self.view.frame)-40-10, CGRectGetWidth(self.view.frame)-20, 40);
    self.textView.frame = CGRectMake(10, 64, CGRectGetWidth(self.view.frame)-20, CGRectGetHeight(self.view.frame)*0.5-64);
}

- (void)viewDidAppear:(BOOL)animated {
    [super viewDidAppear:animated];
    
    // Check authorization
    [MMSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        switch (status) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                self.startBtn.enabled = YES;
                [self.startBtn setTitle:@"Start Recording" forState:UIControlStateNormal];
                break;
            case SFSpeechRecognizerAuthorizationStatusDenied:
                self.startBtn.enabled = NO;
                [self.startBtn setTitle:@"User denied access to speech recognition" forState:UIControlStateNormal];
                break;
            case SFSpeechRecognizerAuthorizationStatusRestricted:
                self.startBtn.enabled = NO;
                [self.startBtn setTitle:@"Speech recognition restricted on this device" forState:UIControlStateNormal];
                break;
            default:
                self.startBtn.enabled = NO;
                break;
        }
    }];
}

- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    
    self.startBtn = [UIButton buttonWithType:UIButtonTypeCustom];
    [self.startBtn setTitle:@"Start Recording" forState:UIControlStateNormal];
    [self.startBtn setTitleColor:[UIColor redColor] forState:UIControlStateNormal];
    [self.startBtn addTarget:self action:@selector(startBtnClick:) forControlEvents:UIControlEventTouchUpInside];
    [self.view addSubview:self.startBtn];
    
    self.textView = [[UITextView alloc] init];
    self.textView.textColor = [UIColor darkGrayColor];
    self.textView.backgroundColor = [UIColor whiteColor];
    self.textView.userInteractionEnabled = NO;
    [self.view addSubview:self.textView];
}


// MARK:- Event

- (void)startBtnClick:(UIButton *)sender
{
    sender.selected = !sender.selected;
    if (sender.selected)
    {
        MMSpeechRecognizerConfig *config = [[MMSpeechRecognizerConfig alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en"]];
        [[MMSpeechRecognizer sharedInstance] setConfig:config];
        [MMSpeechRecognizer sharedInstance].delegate = self;
        [[MMSpeechRecognizer sharedInstance] start];
    }
    else
    {
        [[MMSpeechRecognizer sharedInstance] stop];
    }
}

// MARK:- MMSpeechRecognizerDelegate

- (void)onStart
{
    NSLog(@"%s",__func__);
    self.startBtn.enabled = YES;
    [self.startBtn setTitle:@"Stop Recording" forState:UIControlStateNormal];
}

- (void)onStop:(NSError *)error
{
    NSLog(@"%s",__func__);
    self.startBtn.enabled = YES;
    [self.startBtn setTitle:@"Start Recording" forState:UIControlStateNormal];
}

- (void)result:(SFSpeechRecognitionResult * _Nullable)result
{
    [self.speechAnalyzer recognize:result];
}


// MARK:- SpeechAnalyzerDelegate

- (void)speechAnalyzer:(SpeechAnalyzer *)analyzer recognizedCommand:(NSString *)command
{
    NSLog(@"recognizedCommand: %@",command);
    NSLog(@"%@",[NSThread currentThread]);
    self.textView.text = command;
}


// MARK:- lazy load

- (SpeechAnalyzer *)speechAnalyzer {
    if (!_speechAnalyzer) {
        _speechAnalyzer = [[SpeechAnalyzer alloc] init];
        _speechAnalyzer.delegate = self;
    }
    return _speechAnalyzer;
}


@end
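
SpeechAnalyzer is used by the demo but its source is not listed in this post. A hypothetical minimal interface, inferred only from how ViewController calls it above (the real project file may differ), could look like this:

#import <Foundation/Foundation.h>
#import <Speech/Speech.h>

@class SpeechAnalyzer;

@protocol SpeechAnalyzerDelegate <NSObject>
// Called when a command/phrase has been extracted from a recognition result.
- (void)speechAnalyzer:(SpeechAnalyzer *)analyzer recognizedCommand:(NSString *)command;
@end

@interface SpeechAnalyzer : NSObject

@property (nonatomic, weak) id<SpeechAnalyzerDelegate> delegate;

// Feed a recognition result; the delegate is notified with the extracted text.
- (void)recognize:(SFSpeechRecognitionResult *)result;

@end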

Notes

  • Speech recognition uses both the microphone and Apple's speech recognition service, so the corresponding privacy usage descriptions must be added to Info.plist (a runtime sketch follows the key names below):

Privacy - Microphone Usage Description

Privacy - Speech Recognition Usage Description
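
Once both keys are present, both permission prompts can be triggered at runtime. A minimal sketch (the helper name MMRequestSpeechPermissions is hypothetical), using the raw key names NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription:

#import <AVFoundation/AVFoundation.h>
#import <Speech/Speech.h>

// Requires NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription in Info.plist.
static void MMRequestSpeechPermissions(void)
{
    // Microphone permission prompt.
    [[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
        NSLog(@"Microphone permission granted: %d", granted);
    }];
    // Speech recognition permission prompt.
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        NSLog(@"Speech recognition authorization status: %ld", (long)status);
    }];
}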

  • About the system crash

With Xcode 12.0 beta (12A6159), accessing self.audioEngine.inputNode crashes because of a system bug. I filed a Feedback with Apple; the problem is fixed in Xcode 12.0 beta 2 (12A6163b).

For details, see "AVAudioEngine gets inputNode property crash in iOS14".

  • About background speech recognition (Apple Developer Support's reply)

We explicitly don’t support speech recognition in the background currently.

We consider this issue closed. If you have any questions or concern regarding this issue, please update your report directly (http://bugreport.apple.com).

Thank you for taking the time to notify us of this issue.

Best Regards,

Apple Developer Support

Worldwide Developer Relations
