Question:

How to detect the start of speech with the iOS Speech API

杨凯旋
2023-03-14

I have an iOS app developed in Xcode/Objective-C. It uses the iOS Speech API to do continuous speech recognition. It works, but I want to turn the microphone icon red when speech starts, and I also want to detect when the speech has ended.

I implemented the SFSpeechRecognitionTaskDelegate protocol, which provides the speechRecognitionDidDetectSpeech: and speechRecognitionTask:didHypothesizeTranscription: callbacks, but these don't fire until the end of the first word has been processed, not at the very start of speech.

I want to detect the start of speech (or any noise). I think this should be possible from the installTapOnBus: block (using the AVAudioPCMBuffer), but I'm not sure how to tell whether the buffer is silence or noise that might be speech.

Also, the Speech API provides no event when the user stops talking, i.e. no silence detection; it just records until it times out. I have a hack that detects silence by checking the time since the last event fired, but I'm not sure whether there is a better way.

The code is here:

    NSError * outError;
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory: AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker error:&outError];
    [audioSession setMode: AVAudioSessionModeMeasurement error:&outError];
    [audioSession setActive: true withOptions: AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];

    SFSpeechAudioBufferRecognitionRequest* speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];

    if (speechRequest == nil) {
        NSLog(@"Unable to create SFSpeechAudioBufferRecognitionRequest.");
        return;
    }

    audioEngine = [[AVAudioEngine alloc] init];
    AVAudioInputNode* inputNode = [audioEngine inputNode];

    speechRequest.shouldReportPartialResults = true;

    // iOS speech does not detect end of speech, so must track silence.
    lastSpeechDetected = -1;

    speechTask = [speechRecognizer recognitionTaskWithRequest: speechRequest delegate: self];

    // Tap the microphone input; each buffer both feeds the recognizer and
    // drives the silence timeout below.
    [inputNode installTapOnBus:0 bufferSize: 4096 format: [inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer* buffer, AVAudioTime* when) {
        long long millis = [[NSDate date] timeIntervalSince1970] * 1000;
        if (lastSpeechDetected != -1 && ((millis - lastSpeechDetected) > 1000)) {
            lastSpeechDetected = -1;
            [speechTask finish];
            return;
        }
        [speechRequest appendAudioPCMBuffer: buffer];
    }];

    [audioEngine prepare];
    [audioEngine startAndReturnError: &outError];
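
For context, the silence hack needs lastSpeechDetected to be refreshed somewhere. Here is a minimal sketch of that piece, assuming the timestamp is updated on every partial transcription (this delegate method is not shown in the snippet above):

    // Assumed (not shown above): refresh the silence timer whenever the
    // recognizer reports a partial transcription.
    - (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
      didHypothesizeTranscription:(SFTranscription *)transcription {
        lastSpeechDetected = [[NSDate date] timeIntervalSince1970] * 1000;
    }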

3 Answers

安坚诚
2023-03-14

Have you tried using AVCaptureAudioChannel? Here is a link to the documentation.

It has a volume property that provides the channel's current volume (gain).
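
A minimal sketch of reading a channel's level this way, assuming a capture session with an AVCaptureAudioDataOutput (hypothetical name audioOutput) is already running; averagePowerLevel reports the channel level in decibels:

    // Hypothetical helper: log the level of each audio channel on an
    // already-configured AVCaptureAudioDataOutput (assumed to exist).
    - (void)logAudioLevels:(AVCaptureAudioDataOutput *)audioOutput {
        for (AVCaptureConnection *connection in audioOutput.connections) {
            for (AVCaptureAudioChannel *channel in connection.audioChannels) {
                // 0 dB is full scale; more negative means quieter.
                NSLog(@"average power: %f dB", channel.averagePowerLevel);
            }
        }
    }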

马淳
2023-03-14

Here is the working code we ended up with.

The key was installTapOnBus:, followed by the magic line that reads the volume:

    float volume = fabsf(*buffer.floatChannelData[0]);

-(void) doActualRecording {
    NSLog(@"doActualRecording");

    @try {
    //if (!recording) {
        if (audioEngine != nil) {
            [audioEngine stop];
            [speechTask cancel];
            AVAudioInputNode* inputNode = [audioEngine inputNode];
            [inputNode removeTapOnBus: 0];
        }

        recording = YES;
        micButton.selected = YES;

        //NSLog(@"Starting recording...   SFSpeechRecognizer Available? %d", [speechRecognizer isAvailable]);
        NSError * outError;
        //NSLog(@"AUDIO SESSION CATEGORY0: %@", [[AVAudioSession sharedInstance] category]);
        AVAudioSession* audioSession = [AVAudioSession sharedInstance];
        [audioSession setCategory: AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker error:&outError];
        [audioSession setMode: AVAudioSessionModeMeasurement error:&outError];
        [audioSession setActive: true withOptions: AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];

        SFSpeechAudioBufferRecognitionRequest* speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
        //NSLog(@"AUDIO SESSION CATEGORY1: %@", [[AVAudioSession sharedInstance] category]);
        if (speechRequest == nil) {
            NSLog(@"Unable to create SFSpeechAudioBufferRecognitionRequest.");
            return;
        }

        speechDetectionSamples = 0;

        // Somehow this fixes a crash on iPhone 7; it looks like an ARC
        // object-lifetime issue, so keep the old engine alive while it
        // is being replaced.
        AVAudioEngine* temp = audioEngine;
        audioEngine = [[AVAudioEngine alloc] init];
        AVAudioInputNode* inputNode = [audioEngine inputNode];

        speechRequest.shouldReportPartialResults = true;

        // iOS speech does not detect end of speech, so must track silence.
        lastSpeechDetected = -1;

        speechTask = [speechRecognizer recognitionTaskWithRequest: speechRequest delegate: self];

        [inputNode installTapOnBus:0 bufferSize: 4096 format: [inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer* buffer, AVAudioTime* when) {
            @try {
                long long millis = [[NSDate date] timeIntervalSince1970] * 1000;
                if (lastSpeechDetected != -1 && ((millis - lastSpeechDetected) > 1000)) {
                    lastSpeechDetected = -1;
                    [speechTask finish];
                    return;
                }
                [speechRequest appendAudioPCMBuffer: buffer];

                //Calculate volume level
                if ([buffer floatChannelData] != nil) {
                    float volume = fabsf(*buffer.floatChannelData[0]);

                    if (volume >= speechDetectionThreshold) {
                        speechDetectionSamples++;

                        if (speechDetectionSamples >= speechDetectionSamplesNeeded) {

                            //Need to change mic button image in main thread
                            [[NSOperationQueue mainQueue] addOperationWithBlock:^ {

                                [micButton setImage: [UIImage imageNamed: @"micRecording"] forState: UIControlStateSelected];

                            }];
                        }
                    } else {
                        speechDetectionSamples = 0;
                    }
                }
            }
            @catch (NSException * e) {
                NSLog(@"Exception: %@", e);
            }
        }];

        [audioEngine prepare];
        [audioEngine startAndReturnError: &outError];
        NSLog(@"Error %@", outError);
    //}
    }
    @catch (NSException * e) {
        NSLog(@"Exception: %@", e);
    }
}
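
Note that fabsf(*buffer.floatChannelData[0]) only looks at the first sample of each buffer. A variant that averages over the whole buffer should be steadier against single-sample spikes; here is a sketch, not part of the original code:

    // Sketch (assumption, not in the answer above): RMS over the whole
    // buffer instead of just the first sample.
    float const *samples = buffer.floatChannelData[0];
    AVAudioFrameCount frames = buffer.frameLength;
    float sum = 0.0f;
    for (AVAudioFrameCount i = 0; i < frames; i++) {
        sum += samples[i] * samples[i];
    }
    float rms = (frames > 0) ? sqrtf(sum / frames) : 0.0f;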

宰坚
2023-03-14

I suggest using an AVAudioRecorder with an NSTimer callback and low-pass filtering the power signal. That way you can detect when the recorder's readings reach a certain threshold, and the low-pass filtering helps reject noise.

In the .h file:

#import <UIKit/UIKit.h>
#import <AVFoundation/AVFoundation.h>
#import <CoreAudio/CoreAudioTypes.h>

@interface ViewController : UIViewController{
    AVAudioRecorder *recorder;
    NSTimer *levelTimer;
    double lowPassResults;
}

- (void)levelTimerCallback:(NSTimer *)timer;
@end

In the .m file:

#import "ViewController.h"

@interface ViewController ()

@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];

    // AVAudioSession already set in your code, so no need for these 2 lines.
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
    [[AVAudioSession sharedInstance] setActive:YES error:nil];

    NSURL *url = [NSURL fileURLWithPath:@"/dev/null"];

    NSDictionary *settings = [NSDictionary dictionaryWithObjectsAndKeys:
                              [NSNumber numberWithFloat: 44100.0],                 AVSampleRateKey,
                              [NSNumber numberWithInt: kAudioFormatAppleLossless], AVFormatIDKey,
                              [NSNumber numberWithInt: 1],                         AVNumberOfChannelsKey,
                              [NSNumber numberWithInt: AVAudioQualityMax],         AVEncoderAudioQualityKey,
                              nil];

    NSError *error;

    lowPassResults = 0;

    recorder = [[AVAudioRecorder alloc] initWithURL:url settings:settings error:&error];

    if (recorder) {
        [recorder prepareToRecord];
        recorder.meteringEnabled = YES;
        [recorder record];
        levelTimer = [NSTimer scheduledTimerWithTimeInterval: 0.05 target: self selector: @selector(levelTimerCallback:) userInfo: nil repeats: YES];
    } else
        NSLog(@"%@", [error description]);
}


- (void)levelTimerCallback:(NSTimer *)timer {
    [recorder updateMeters];

    const double ALPHA = 0.05;
    // Convert the metered peak power from dB to a linear 0..1 amplitude.
    double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));
    // One-pole low-pass filter (exponential moving average):
    // lowPass = ALPHA * newSample + (1 - ALPHA) * lowPass
    lowPassResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * lowPassResults;

    NSLog(@"lowPassResults: %f",lowPassResults);

    // Use a threshold here to establish whether there is silence or speech
    if (lowPassResults < 0.1) {
        NSLog(@"Silence");
    } else if(lowPassResults > 0.5){
        NSLog(@"Speech");
    }

}


- (void)didReceiveMemoryWarning {
    [super didReceiveMemoryWarning];
    // Dispose of any resources that can be recreated.
}


@end
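
For intuition (an addition, not from the original answer): with the timer firing every 0.05 s and ALPHA = 0.05, the filter's smoothing time constant is roughly interval / ALPHA = 0.05 / 0.05 = 1 s, so lowPassResults takes about a second to settle after a sustained change in level. Raising ALPHA gives a faster but noisier response.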