
Monday, March 12, 2012

Quick Start for Kinect: Audio Fundamentals

The previous article is here: Quick Start for Kinect: Skeletal Tracking Fundamentals






This video covers the basics of reading audio data from the Kinect microphone array, with a demo adapted from the built-in audio recorder sample. The video also covers speech recognition using Kinect. You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides, which have been updated for Beta 2 (Nov 2011).
  • [00:35] Kinect microphone information
  • [01:10] Audio data
  • [02:15] Speech recognition information
  • [05:08] Recording audio
  • [08:17] Speech recognition demo

Capturing Audio Data

From here on, this sample is largely the same as the built-in one; we add only three things: a FinishedRecording event, a dynamic recording length, and a dynamic file name. Note that the WriteWavHeader function is identical to the one in the built-in demo. Since we work with several kinds of streams, we also pull in the System.IO namespace:

C#



private void RecordAudio()
{
    using (var source = new KinectAudioSource())
    {
        // 16,000 samples/sec * 2 bytes per sample, for the requested number of seconds
        var recordingLength = (int) _amountOfTimeToRecord * 2 * 16000;
        var buffer = new byte[1024];
        source.SystemMode = SystemMode.OptibeamArrayOnly;
        using (var fileStream = new FileStream(_lastRecordedFileName, FileMode.Create))
        {
            WriteWavHeader(fileStream, recordingLength);

            //Start capturing audio                               
            using (var audioStream = source.Start())
            {
                //Simply copy the data from the stream down to the file
                int count, totalCount = 0;
                while ((count = audioStream.Read(buffer, 0, buffer.Length)) > 0 && totalCount < recordingLength)
                {
                    fileStream.Write(buffer, 0, count);
                    totalCount += count;
                }
            }
        }

        // Notify any listeners that the recording is complete
        if (FinishedRecording != null)
            FinishedRecording(this, EventArgs.Empty);
    }
}
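
The snippet above uses a few members that the video defines elsewhere in the sample; a minimal sketch of how they might be declared (the declarations and default values here are assumptions, not the video's exact code):

C#

// Raised once the requested amount of audio has been written to disk.
public event EventHandler FinishedRecording;

// Hypothetical defaults; in the demo these are set from the UI.
private double _amountOfTimeToRecord = 10;                 // seconds
private string _lastRecordedFileName = "kinect_audio.wav"; // output file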


Speech Recognition


To do speech recognition, we bring in the namespaces from the Microsoft Speech SDK and configure the KinectAudioSource for speech recognition (the fragments that follow all live inside this using block in the full sample):

C#


using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;


using (var source = new KinectAudioSource())
{
    source.FeatureMode = true;
    source.AutomaticGainControl = false; //Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample
}


Next, we can initialize the SpeechRecognitionEngine to use the Kinect recognizer and set up a grammar for speech recognition:


C#



private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";
// FirstOrDefault returns null if the Kinect recognizer is not installed,
// so check ri before using it in production code.
RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();



using (var sre = new SpeechRecognitionEngine(ri.Id))
{                
    var colors = new Choices();
    colors.Add("red");
    colors.Add("green");
    colors.Add("blue");
    var gb = new GrammarBuilder();
    //Specify the culture to match the recognizer in case we are running in a different culture.                                 
    gb.Culture = ri.Culture;
    gb.Append(colors);
   
    // Create the actual Grammar instance, and then load it into the speech recognizer.
    var g = new Grammar(gb);                  
    sre.LoadGrammar(g);
}





Then we hook up the recognition events, and finally feed the Kinect audio stream into the speech recognition engine:


C#



sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;


using (Stream s = source.Start())
{
    // 16 kHz, 16-bit, mono PCM: the format the Kinect audio stream delivers
    sre.SetInputToAudioStream(s,
                              new SpeechAudioFormatInfo(
                                  EncodingFormat.Pcm, 16000, 16, 1,
                                  32000, 2, null));
    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
    sre.RecognizeAsync(RecognizeMode.Multiple);
    Console.ReadLine();
    Console.WriteLine("Stopping recognizer ...");
    sre.RecognizeAsyncStop();                       
}



static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    Console.WriteLine("\nSpeech Rejected");
    if (e.Result != null)
        DumpRecordedAudio(e.Result.Audio);
}

static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    Console.Write("\rSpeech Hypothesized: \t{0}\tConf:\t{1}", e.Result.Text, e.Result.Confidence);
}

static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}

private static void DumpRecordedAudio(RecognizedAudio audio)
{
    if (audio == null)
        return;

    int fileId = 0;
    string filename;
    while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
        fileId++;

    Console.WriteLine("\nWriting file: {0}", filename);
    using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
        audio.WriteToWaveStream(file);
}


Quick Start for Kinect: Skeletal Tracking Fundamentals

The previous article is here: Quick Start for Kinect: Working with Depth Data



This video covers the basics of skeletal tracking using the Kinect sensor.  You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides that have been updated for Beta 2 (Nov 2011).
  • [00:31] Skeleton Tracking API
  • [01:24] Understanding Skeleton Quality and Joint data
  • [03:27] Setup skeleton tracking
  • [03:44] Adding a basic hand tracked cursor
  • [09:12] Using TransformSmoothing to remove “skeletal jitter”

C++


First you need to initialize a Kinect sensor interface:


INuiSensor *            m_pNuiSensor;
HRESULT hr = NuiCreateSensorById( instanceName, &m_pNuiSensor );


Then we grab a skeleton frame and check whether it contains a tracked skeleton. The underlying per-pixel body-part recognition runs on depth data; the details are in the CVPR 2011 best paper "Real-time Human Pose Recognition in Parts from Single Depth Images", available here: http://research.microsoft.com/apps/pubs/default.aspx?id=145347


The C++ code is as follows:



void CSkeletalViewerApp::Nui_GotSkeletonAlert( )
{
    NUI_SKELETON_FRAME SkeletonFrame = {0};


    bool bFoundSkeleton = false;


    if ( SUCCEEDED(m_pNuiSensor->NuiSkeletonGetNextFrame( 0, &SkeletonFrame )) )
    {
        for ( int i = 0 ; i < NUI_SKELETON_COUNT ; i++ )
        {
            if( SkeletonFrame.SkeletonData[i].eTrackingState == NUI_SKELETON_TRACKED ||
                (SkeletonFrame.SkeletonData[i].eTrackingState == NUI_SKELETON_POSITION_ONLY && m_bAppTracking))
            {
                bFoundSkeleton = true;
            }
        }
    }


    // no skeletons!
    if( !bFoundSkeleton )
    {
        return;
    }


    // smooth out the skeleton data
    HRESULT hr = m_pNuiSensor->NuiTransformSmooth(&SkeletonFrame,NULL);
    if ( FAILED(hr) )
    {
        return;
    }


    // we found a skeleton, re-start the skeletal timer
    m_bScreenBlanked = false;
    m_LastSkeletonFoundTime = timeGetTime( );


    // draw each skeleton in a color according to the slot in which it was found.
    Nui_BlankSkeletonScreen( GetDlgItem( m_hWnd, IDC_SKELETALVIEW ), false );


    bool bSkeletonIdsChanged = false;
    for ( int i = 0 ; i < NUI_SKELETON_COUNT; i++ )
    {
        if ( m_SkeletonIds[i] != SkeletonFrame.SkeletonData[i].dwTrackingID )
        {
            m_SkeletonIds[i] = SkeletonFrame.SkeletonData[i].dwTrackingID;
            bSkeletonIdsChanged = true;
        }


        // Show skeleton only if it is tracked, and the center-shoulder joint is at least inferred.
        if ( SkeletonFrame.SkeletonData[i].eTrackingState == NUI_SKELETON_TRACKED &&
            SkeletonFrame.SkeletonData[i].eSkeletonPositionTrackingState[NUI_SKELETON_POSITION_SHOULDER_CENTER] != NUI_SKELETON_POSITION_NOT_TRACKED)
        {
            Nui_DrawSkeleton( &SkeletonFrame.SkeletonData[i], GetDlgItem( m_hWnd, IDC_SKELETALVIEW ), i );
        }
        else if ( m_bAppTracking && SkeletonFrame.SkeletonData[i].eTrackingState == NUI_SKELETON_POSITION_ONLY )
        {
            Nui_DrawSkeletonId( &SkeletonFrame.SkeletonData[i], GetDlgItem( m_hWnd, IDC_SKELETALVIEW ), i );
        }
    }


    if ( bSkeletonIdsChanged )
    {
        UpdateTrackingComboBoxes();
    }


    Nui_DoDoubleBuffer(GetDlgItem(m_hWnd,IDC_SKELETALVIEW), m_SkeletonDC);
}
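
For reference, the managed API wraps the same smoothing step that NuiTransformSmooth performs above. A minimal C# sketch, assuming a Runtime initialized for skeletal tracking (the smoothing values are illustrative, not tuned):

C#

nui.Initialize(RuntimeOptions.UseSkeletalTracking);

// Enable the smoothing pass (the managed counterpart of NuiTransformSmooth)
nui.SkeletonEngine.TransformSmooth = true;
nui.SkeletonEngine.SmoothParameters = new TransformSmoothParameters
{
    Smoothing = 0.75f,          // higher = smoother but more latency
    Correction = 0.0f,
    Prediction = 0.0f,
    JitterRadius = 0.05f,       // meters
    MaxDeviationRadius = 0.04f  // meters
};

nui.SkeletonFrameReady += (s, e) =>
{
    foreach (SkeletonData data in e.SkeletonFrame.Skeletons)
    {
        if (data.TrackingState == SkeletonTrackingState.Tracked)
        {
            // data.Joints[JointID.HandRight].Position holds x, y, z
        }
    }
};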


Sunday, March 11, 2012

Quick Start for Kinect: Working with Depth Data

The previous article is here: Quick Start for Kinect: Camera Fundamentals




This video covers the basics of reading camera data from the Kinect sensor.  You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides that have been updated for Beta 2 (Nov 2011).
  • [00:25] Camera data information
  • [03:30] Creating the UI
  • [04:48] Initializing the Kinect runtime
  • [07:18] Reading values from the RGB camera
  • [11:26] Reading values from the Depth camera
  • [13:06] Adjusting camera tilt 

C++ Initialization

INuiSensor *            m_pNuiSensor;
HRESULT hr = NuiCreateSensorById( instanceName, &m_pNuiSensor );

and retrieving RGB data in C++:

void CSkeletalViewerApp::Nui_GotColorAlert( )
{
    NUI_IMAGE_FRAME imageFrame;

    HRESULT hr = m_pNuiSensor->NuiImageStreamGetNextFrame( m_pVideoStreamHandle, 0, &imageFrame );

    if ( FAILED( hr ) )
    {
        return;
    }

    INuiFrameTexture * pTexture = imageFrame.pFrameTexture;
    NUI_LOCKED_RECT LockedRect;
    pTexture->LockRect( 0, &LockedRect, NULL, 0 );
    if ( LockedRect.Pitch != 0 )
    {
        m_pDrawColor->Draw( static_cast<BYTE *>(LockedRect.pBits), LockedRect.size );
    }
    else
    {
        OutputDebugString( L"Buffer length of received texture is bogus\r\n" );
    }

    pTexture->UnlockRect( 0 );

    m_pNuiSensor->NuiImageStreamReleaseFrame( m_pVideoStreamHandle, &imageFrame );
}
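
The managed quickstart handles depth frames the same way it handles color frames. A minimal C# sketch, assuming the runtime was initialized with RuntimeOptions.UseDepth and the depth stream opened (the handler name is an assumption):

C#

nui.DepthFrameReady += nui_DepthFrameReady;

void nui_DepthFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    PlanarImage imageData = e.ImageFrame.Image;
    // With RuntimeOptions.UseDepth (no player index), each pixel is two
    // bytes, low byte first; combining them gives distance in millimeters.
    int firstPixelDistance = imageData.Bits[0] | (imageData.Bits[1] << 8);
}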

Saturday, March 10, 2012

Quick Start for Kinect: Camera Fundamentals

The previous article is here: http://magic-soap-vision.blogspot.com/2012/03/quick-start-for-kinect-setting-up-your.html




This video covers the basics of reading camera data from the Kinect sensor.  You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides that have been updated for Beta 2 (Nov 2011).
  • [00:25] Camera data information
  • [03:30] Creating the UI
  • [04:48] Initializing the Kinect runtime
  • [07:18] Reading values from the RGB camera
  • [11:26] Reading values from the Depth camera
  • [13:06] Adjusting camera tilt (see the sketch after this list)
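
Tilting the camera is a one-liner in the managed API. A minimal sketch, assuming nui is an initialized Runtime (set up in the next section); the angle value here is illustrative:

C#

// ElevationAngle is in degrees; the motor supports roughly -27 to +27.
nui.NuiCamera.ElevationAngle = 10;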

Initializing the runtime

In the Window_Loaded event, initialize the runtime with the options you want to use. For this example, combine RuntimeOptions.UseColor and RuntimeOptions.UseDepth to use both the RGB and depth cameras:
C#
//Kinect Runtime
Runtime nui;
private void Window_Loaded(object sender, RoutedEventArgs e)
{
    SetupKinect();
}
private void SetupKinect()
{
    if (Runtime.Kinects.Count == 0)
    {
        this.Title = "No Kinect connected";
    }
    else
    {
        //use first Kinect
        nui = Runtime.Kinects[0];
 
        //Initialize to return both Color & Depth images
        nui.Initialize(RuntimeOptions.UseColor | RuntimeOptions.UseDepth);
    }
}
Visual Basic
'Kinect Runtime
Private nui As Runtime
Private Sub Window_Loaded(ByVal sender As Object, ByVal e As RoutedEventArgs)
    SetupKinect()
End Sub
Private Sub SetupKinect()
    If Runtime.Kinects.Count = 0 Then
        Me.Title = "No Kinect connected"
    Else
        'use first Kinect
        nui = Runtime.Kinects(0)
        'Initialize to return both Color & Depth images
        nui.Initialize(RuntimeOptions.UseColor Or RuntimeOptions.UseDepth)
    End If
End Sub
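
Before any frames arrive, the streams must be opened and the frame-ready events hooked up. A minimal C# sketch (the buffer count and resolutions follow the SDK samples, but are assumptions here):

C#

//Open the video and depth streams after calling Initialize
nui.VideoStream.Open(ImageStreamType.Video, 2, ImageResolution.Resolution640x480, ImageType.Color);
nui.DepthStream.Open(ImageStreamType.Depth, 2, ImageResolution.Resolution320x240, ImageType.Depth);

nui.VideoFrameReady += nui_VideoFrameReady;

void nui_VideoFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    // e.ImageFrame carries the metadata and pixel bits described below
}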
Understanding the video frame ready returned values

The VideoFrameReady event returns an ImageFrameReadyEventArgs that contains an ImageFrame class. The ImageFrame class contains two things:
  • Metadata about the image, such as ImageType to know if it’s a depth or color image, and resolution to know the size of the image
  • Image – the actual image data, which is stored in the Bits byte[] array

Converting the byte[] array to an image

To convert the byte[] array that represents the camera image so it can be displayed in an Image control (e.g., image1 below), call the BitmapSource.Create method as shown below. The last parameter is the stride: the number of bytes from one row of pixels in memory to the next. Stride is also called pitch. For more information, see http://msdn.microsoft.com/en-us/library/aa473780(VS.85).aspx:
C#
PlanarImage imageData = e.ImageFrame.Image;
image1.Source = BitmapSource.Create(imageData.Width, imageData.Height, 96, 96, 
                PixelFormats.Bgr32, null, imageData.Bits, imageData.Width * imageData.BytesPerPixel);
Visual Basic
Dim imageData As PlanarImage = e.ImageFrame.Image
image1.Source = BitmapSource.Create(imageData.Width, imageData.Height, 96, 96, _
    PixelFormats.Bgr32, Nothing, imageData.Bits, imageData.Width * imageData.BytesPerPixel)
The Coding4Fun Kinect Toolkit has an extension method built into the ImageFrame class that simplifies creating the bitmap:
C#
image1.Source = e.ImageFrame.ToBitmapSource();
Visual Basic
image1.Source = e.ImageFrame.ToBitmapSource()