Monday, March 12, 2012

Quick Start for Kinect: Audio Fundamentals

The previous article is here: Quick Start for Kinect: Skeletal Tracking Fundamentals

This video covers the basics of reading audio data from the Kinect microphone array, with a demo adapted from the built-in audio recorder sample. The video also covers speech recognition using Kinect. You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides, which have been updated for Beta 2 (November 2011).
  • [00:35] Kinect microphone information
  • [01:10] Audio data
  • [02:15] Speech recognition information
  • [05:08] Recording audio
  • [08:17] Speech recognition demo

Capturing Audio Data

From here, this sample and the built-in sample are pretty much the same. We make only three changes: a FinishedRecording event, a dynamic recording time, and a dynamic file name. The WriteWavHeader function is exactly the same as the one in the built-in demo (a sketch of it and the supporting members follows the method below). Since we work with several kinds of streams, we also pull in the System.IO namespace:

C#



private void RecordAudio()
{
    using (var source = new KinectAudioSource())
    {
        var recordingLength = (int) _amountOfTimeToRecord * 2 * 16000; //seconds * 2 bytes per sample * 16,000 samples/sec (16-bit, 16 kHz mono)
        var buffer = new byte[1024];
        source.SystemMode = SystemMode.OptibeamArrayOnly;
        using (var fileStream = new FileStream(_lastRecordedFileName, FileMode.Create))
        {
            WriteWavHeader(fileStream, recordingLength);

            //Start capturing audio                               
            using (var audioStream = source.Start())
            {
                //Simply copy the data from the stream down to the file
                int count, totalCount = 0;
                while ((count = audioStream.Read(buffer, 0, buffer.Length)) > 0 && totalCount < recordingLength)
                {
                    fileStream.Write(buffer, 0, count);
                    totalCount += count;
                }
            }
        }

        //Let any listeners know the capture is complete
        if (FinishedRecording != null)
            FinishedRecording(this, EventArgs.Empty);
    }
}
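
The method above assumes a few class members that the article doesn't show: the FinishedRecording event, the _amountOfTimeToRecord and _lastRecordedFileName fields, and the WriteWavHeader helper taken from the built-in recorder demo. Here is a minimal sketch of what they might look like; the field types and the header writer are assumptions (a standard 44-byte WAV header for the 16 kHz, 16-bit mono PCM stream the Kinect produces), not the article's exact code.

C#

using System;
using System.IO;
using System.Text;

//Members assumed by RecordAudio above (types are a guess based on how they are used)
public event EventHandler FinishedRecording;   //raised when the capture loop ends
private double _amountOfTimeToRecord;          //seconds of audio to capture
private string _lastRecordedFileName;          //target .wav path

//Minimal 44-byte canonical WAV header for 16 kHz, 16-bit, mono PCM.
//The built-in recorder demo ships an equivalent helper; this is only a sketch of it.
private static void WriteWavHeader(Stream stream, int dataLength)
{
    //BinaryWriter is deliberately not disposed so the underlying FileStream stays open
    var writer = new BinaryWriter(stream);
    writer.Write(Encoding.ASCII.GetBytes("RIFF"));
    writer.Write(36 + dataLength);                  //RIFF chunk size
    writer.Write(Encoding.ASCII.GetBytes("WAVE"));
    writer.Write(Encoding.ASCII.GetBytes("fmt "));
    writer.Write(16);                               //fmt chunk size for PCM
    writer.Write((short)1);                         //audio format: PCM
    writer.Write((short)1);                         //channels: mono
    writer.Write(16000);                            //sample rate
    writer.Write(32000);                            //byte rate = 16000 * 2 bytes * 1 channel
    writer.Write((short)2);                         //block align = channels * bytes per sample
    writer.Write((short)16);                        //bits per sample
    writer.Write(Encoding.ASCII.GetBytes("data"));
    writer.Write(dataLength);                       //data chunk size
    writer.Flush();
}

In the demo, RecordAudio would typically be kicked off on a background thread (it blocks while copying samples), with the UI listening for FinishedRecording to know when the file is ready.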


Speech Recognition


To do speech recognition, we bring in the Microsoft.Speech namespaces from the Speech SDK and set up the KinectAudioSource for recognition:

C#


using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;


using (var source = new KinectAudioSource())
{
    source.FeatureMode = true;
    source.AutomaticGainControl = false; //Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample
}


Next, we can initialize the SpeechRecognitionEngine to use the Kinect recognizer and set up a grammar for speech recognition:


C#



private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";
RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();



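//FirstOrDefault returns null if the Kinect recognizer isn't installed; this guard is an
//addition to the article's snippet, not part of the original sample.
if (ri == null)
{
    Console.WriteLine("Could not find the Kinect speech recognizer: {0}", RecognizerId);
    return;
}
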
using (var sre = new SpeechRecognitionEngine(ri.Id))
{                
    var colors = new Choices();
    colors.Add("red");
    colors.Add("green");
    colors.Add("blue");
    var gb = new GrammarBuilder();
    //Specify the culture to match the recognizer in case we are running in a different culture.                                 
    gb.Culture = ri.Culture;
    gb.Append(colors);
   
    // Create the actual Grammar instance, and then load it into the speech recognizer.
    var g = new Grammar(gb);                  
    sre.LoadGrammar(g);
}





Then we hook up the recognition events, and finally the Kinect audio stream is fed into the speech recognition engine:


C#



sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;


using (Stream s = source.Start())
{
    //The Kinect audio stream is 16 kHz, 16-bit mono PCM: 32,000 bytes/sec, 2-byte block align
    sre.SetInputToAudioStream(s,
                              new SpeechAudioFormatInfo(
                                  EncodingFormat.Pcm, 16000, 16, 1,
                                  32000, 2, null));
    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
    sre.RecognizeAsync(RecognizeMode.Multiple);
    Console.ReadLine();
    Console.WriteLine("Stopping recognizer ...");
    sre.RecognizeAsyncStop();                       
}



static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    Console.WriteLine("\nSpeech Rejected");
    if (e.Result != null)
        DumpRecordedAudio(e.Result.Audio);
}

static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    Console.Write("\rSpeech Hypothesized: \t{0}\tConf:\t{1}", e.Result.Text, e.Result.Confidence);
}

static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}

private static void DumpRecordedAudio(RecognizedAudio audio)
{
    if (audio == null)
        return;

    int fileId = 0;
    string filename;
    while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
        fileId++;

    Console.WriteLine("\nWriting file: {0}", filename);
    using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
        audio.WriteToWaveStream(file);
}

