MVVM and Speech using the Kinect-Pt. II

In the last post I talked about what I had recently done with speech recognition and how I tied it in with MVVM’s concept of commands.

In this post, I want to walk through, step by step, how I set things up. To get everything installed I just followed the directions for setting up the Kinect SDK, which also include directions for setting up the Speech API. Google that and you’ll be well on your way.

After getting it set up, I recommend you give the Kinect SDK samples a try to make sure everything installed correctly. From there I took a look at what the Kinect speech sample was doing and modified it to work with the default audio source instead of the Kinect. That was mostly because my Kinect needs to pull double duty between my hacking and actually letting me play on the Xbox. Not sure how I can convince the wife we need a second one just yet.

Note that some of the code examples use extension methods from a little library of mine, so you might not be able to copy/paste them directly.

Speech Command

Nothing groundbreaking here. This is basically a note-for-note reimplementation of the DelegateCommand found in Prism and other MVVM frameworks. The main purpose here is to give the Phrase attached property somewhere to live. Looking back, I could very easily have done this entire thing with one class, but I didn’t want to take the time to set up an entire MVVM framework just for my little proof of concept. I still needed something that implemented ICommand, so it just made sense to put it there.
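For anyone who hasn’t seen the pattern, a DelegateCommand-style class can be sketched roughly like this. Treat it as an illustration rather than the actual SpeechCommand source: the constructor shape, field names and the RaiseCanExecuteChanged helper are assumptions, and the Phrase attached property is left out to keep it short.

```csharp
using System;
using System.Windows.Input;

// Illustrative DelegateCommand-style ICommand.
// Names and constructor shape are assumptions, not the original class.
public class SpeechCommand : ICommand
{
    private readonly Action<object> _execute;
    private readonly Func<object, bool> _canExecute;

    public SpeechCommand(Action<object> execute, Func<object, bool> canExecute = null)
    {
        if (execute == null)
            throw new ArgumentNullException("execute");

        _execute = execute;
        _canExecute = canExecute;
    }

    public event EventHandler CanExecuteChanged;

    // With no predicate supplied, the command is always executable.
    public bool CanExecute(object parameter)
    {
        return _canExecute == null || _canExecute(parameter);
    }

    public void Execute(object parameter)
    {
        _execute(parameter);
    }

    // Lets the view model tell bound controls to re-query CanExecute.
    public void RaiseCanExecuteChanged()
    {
        var handler = CanExecuteChanged;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}
```

A view model would expose an instance of this as a property and bind a button’s Command to it as usual.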

The attached property does one bit of magic.  I provide a callback for when the value changes and in there we do this:

    public static void PhraseChanged(DependencyObject d,
        DependencyPropertyChangedEventArgs e)
    {
        if (DesignerProperties.GetIsInDesignMode(d))
            return;

        if (!d.Is<ICommandSource>())
            throw new InvalidCastException(
                "Can only use objects that implement the ICommandSource interface");

        SpeechFactory.AddPhrase(e.NewValue.ToString(), d.As<ICommandSource>());
    }

So we basically pass the phrase and the object that has the attached property set on it.  We do a quick check to make sure it implements the ICommandSource interface.  Objects that do this provide a Command property and a CommandParameter that we will use to execute the command later on.
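
To make that concrete, executing the command from an ICommandSource might look roughly like the sketch below. The TryExecute extension method and its containing class are made-up names for illustration; only ICommandSource, Command, CommandParameter, CanExecute and Execute come from WPF.

```csharp
using System.Windows.Input;

// Hypothetical helper, not part of WPF or of the original project.
public static class CommandSourceExtensions
{
    // Runs the source's command with its own parameter.
    // Returns false when there is no command or CanExecute says no.
    public static bool TryExecute(this ICommandSource source)
    {
        var command = source.Command;
        if (command == null)
            return false;

        var parameter = source.CommandParameter;
        if (!command.CanExecute(parameter))
            return false;

        command.Execute(parameter);
        return true;
    }
}
```

The idea is that whatever handles a recognized phrase could call something like this on the stored source. Depending on which thread the recognizer raises its events on, that call may need to be marshalled to the UI thread first.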

Speech Factory

This is where 95% of the work is done. Basically, it’s a static class that does all the instantiation and setup of the speech recognition engine. It also maintains the list of phrases the developer has entered in the XAML.

The first thing we do is set up all the recognition engine code:

    private static bool _commandMode;
    private static Timer _timer = new Timer(3000);
    private static SpeechRecognitionEngine _engine;
    private static Dictionary<string, ICommandSource> _phrases =
        new Dictionary<string, ICommandSource>();
    private static string _commandWord = "computer";

    static SpeechFactory()
    {
        _engine = new SpeechRecognitionEngine();
        _engine.SpeechRecognized += SpeechRecognized;
        _engine.SpeechHypothesized += SpeechHypothesized;
        _engine.SpeechRecognitionRejected += SpeechRecognitionRejected;
        _engine.SetInputToDefaultAudioDevice();

        _engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices(_commandWord))));
        _engine.RecognizeAsync(RecognizeMode.Multiple);

        _timer.Elapsed += new ElapsedEventHandler(_timer_Elapsed);
    }

We set up all the recognition events and kick off recognition. Whenever a new phrase is added, we rebuild the list of choices and reload the grammar the engine uses to determine what has been spoken.

    public static void AddPhrase(string phrase, ICommandSource source)
    {
        _phrases.Add(phrase.Trim().ToLower(), source);

        // TODO: There are culture considerations to take into effect
        var choices = _phrases.Keys.Union(new string[]{_commandWord}).ToArray();
        var gb = new GrammarBuilder(new Choices(choices));

        _engine.UnloadAllGrammars();
        _engine.LoadGrammar(new Grammar(gb));
    }

One final little thing I do is the concept of a command word.  So it

One thought on “MVVM and Speech using the Kinect-Pt. II”

  1. Ray

    Hi, I was wondering if you had an example of this in full that I could have a look at. I have been roaming the internet for hours trying to figure out how to get the sound detection working in my WPF application, but the problem is that all the examples use a console and I’m having trouble mapping them onto my application… I would be grateful.

    Kind Regards

