I have always been fascinated by interacting with a computer in ways beyond the traditional keyboard and mouse. The trouble, as I learned while getting my degree in computer science, is that some of those problems are really hard.
I didn’t want to dedicate years to obtaining a PhD in computer vision or spend my life doing statistics. But, like any other good developer, I’ll just take the toolkit someone else built to solve the hard problems.
Enter the Kinect.
In reality, Microsoft has had a Speech API for quite some time and I’ve played with it in the past, but with the Kinect they’ve produced a beautiful piece of hardware that can do 3D depth sensing, speech recognition and directional detection, live skeletal tracking and more, all in a package that doesn’t cost much more than a high-end webcam. This first example doesn’t actually require the Kinect; in fact, the code just uses the default audio input, but it can easily be changed to use the Kinect audio stream. Later projects I’m working on will use the Kinect cameras to do some hopefully neat things.
Since the Kinect hacking started, my brain has been churning with ideas. One of the most pragmatic was the thought of tying speech recognition into an MVVM application. The nice thing about a well-implemented screen using MVVM is that your UI is described separately in XAML on the front end, while a class (the ViewModel) contains your library of commands that can be executed. Using a Command object, you can tie a specific element, like a button, to a specific command, like Save, very cleanly and easily.
This clean separation of concerns means you don’t really care how a command is invoked; whether it’s a button press, a keyboard shortcut, or a voice command, it all works the same. Your ViewModel executes the command and the UI happily updates through the powerful data binding of XAML.
Aside from the obvious sci-fi references this brings to mind, it could also make programs more accessible to the vision or mobility impaired. And in some scenarios, it could be just plain more efficient.
Most of the work is done for us by the commanding infrastructure in WPF. So first I’d like to take a look at how this implementation will be used. Below is a standard button declaration with a command attached.
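A minimal sketch of such a declaration might look like this (the `SaveCommand` property name is just an illustration; it would be an `ICommand` exposed by the ViewModel):

```xml
<!-- Standard WPF commanding: the button invokes SaveCommand on the bound ViewModel -->
<Button Content="Save" Command="{Binding SaveCommand}" />
```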
The other great thing about XAML is the extensibility, so by the time we’ve implemented this speech API the only thing that will change is this:
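As a sketch of what that change might look like (the `Phrase` attached property and the `s:` namespace prefix are assumptions based on the description below, not confirmed syntax):

```xml
<!-- Same button as before, plus one attached property naming the voice phrase -->
<Button Content="Save"
        Command="{Binding SaveCommand}"
        s:SpeechCommand.Phrase="Save" />
```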
One simple property added and that’s pretty much all the end developer needs to do. The only other thing we need for using the speech recognition is something I call a SpeechCommand, which is basically just an implementation of the standard DelegateCommand found in MVVM frameworks. The SpeechCommand acts exactly like the standard commands, but it is also the place for the Phrase AttachedProperty to live and is the glue that bridges the application to my wrapper around the Speech API.
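A rough sketch of what such a class could look like, assuming the usual DelegateCommand shape plus an attached property; the names (`SpeechCommand`, `Phrase`) come from the description above, but the actual wiring to the Speech API wrapper is only hinted at with a placeholder:

```csharp
using System;
using System.Windows;
using System.Windows.Input;

// Sketch only: a DelegateCommand-style ICommand that also hosts the
// Phrase attached property described in the post.
public class SpeechCommand : ICommand
{
    private readonly Action _execute;
    private readonly Func<bool> _canExecute;

    public SpeechCommand(Action execute, Func<bool> canExecute = null)
    {
        _execute = execute ?? throw new ArgumentNullException(nameof(execute));
        _canExecute = canExecute;
    }

    public bool CanExecute(object parameter) => _canExecute == null || _canExecute();
    public void Execute(object parameter) => _execute();
    public event EventHandler CanExecuteChanged;

    // Attached property so XAML can declare which spoken phrase triggers
    // the element's command.
    public static readonly DependencyProperty PhraseProperty =
        DependencyProperty.RegisterAttached(
            "Phrase", typeof(string), typeof(SpeechCommand),
            new PropertyMetadata(null, OnPhraseChanged));

    public static string GetPhrase(DependencyObject obj) =>
        (string)obj.GetValue(PhraseProperty);

    public static void SetPhrase(DependencyObject obj, string value) =>
        obj.SetValue(PhraseProperty, value);

    private static void OnPhraseChanged(
        DependencyObject d, DependencyPropertyChangedEventArgs e)
    {
        // Placeholder: here the wrapper around the Speech API would register
        // the phrase and invoke the element's Command when it is recognized.
    }
}
```

The attached property is what lets the plain XAML button participate in speech recognition without the view knowing anything about the Speech API.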
In the next post I’ll walk through how I built the app and post some source code. Until then, I leave you with a screenshot. Please note that no mice or keyboards were harmed (or used) in the taking of this screenshot.