Friday, April 10, 2015

Microsoft Windows Speech Recognition

I've been a longtime user of Dragon NaturallySpeaking, going all the way back to 2006. I've also fooled around with the speech recognition provided by Microsoft, which used to be a part of Microsoft Office but is now built into Windows (starting with Windows Vista, if I recall correctly). I've upgraded computers a few times over the past several years, and in the process I got away from using Dragon NaturallySpeaking. Recently, I decided maybe I should give speech recognition another shot, and since I haven't tried Microsoft's speech recognition for a while, I thought I would start there.

So, here I sit, staring at my computer screen wearing a microphone headset, trying to dictate the complete article without touching the keyboard. It's perhaps not the most effective way of using speech recognition, especially if you're physically able to use a mouse and keyboard, but it does give a good feel for the accuracy and usability of the software. Once I'm finished here, I will go back to using Dragon NaturallySpeaking to see what has changed in the past couple of years.


Truth be told, after just two paragraphs dictating using Windows' speech recognition, I'm feeling both impressed and a little frustrated. Many of the phrases that I've said above came out perfectly, but every time I make a mistake it can be a little bit cumbersome trying to fix it. There's an extra step or two that's required every time you want to correct something, requiring a bit more time and effort. Certainly the speech recognition built into Windows 7 is adequate for getting started with speech recognition, and I don't have any real performance complaints (though admittedly I'm on a fast computer). I also haven't checked if anything has really changed with Windows 8.1, though I did play around with it briefly on a laptop and it doesn't seem to be any different than the speech recognition in Windows 7.

Before I get into any further details, it's also worth mentioning that often the biggest bottleneck in writing an article or paper isn't even your typing speed. Rather, I regularly find myself staring at the screen trying to think up precisely what I want to say next. If you have a good idea of what you want to say, speech recognition can be much faster than writing or typing. In fact, if you speak in complete sentences, the accuracy of speech recognition is extremely good. The real problem is when you pause halfway through a sentence or phrase, as then the software has to figure out what exactly you said without as much context around it.

If you're curious, the preceding four paragraphs took me about 15 minutes to dictate. If I were to go back and reread all of the above text, I expect it would take me no more than 4 minutes. I normally type at around 70 words per minute if I'm going fast, so if I could dictate flat-out, I would be closer to 100 or 120 words per minute. Sadly, it seems my brain simply doesn't function that fast most of the time. Which is probably a good thing, considering how frequently we seem to hear stories of people posting information to the Internet without thinking. There's a very real possibility that in another 10 years, fast typing speeds may become far less useful as computers will be able to do the work for us.
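
To put rough numbers on that, here is a minimal Python sketch comparing the effective dictation rate with the typing and flat-out dictation speeds mentioned above. The 15 minutes, 70 words per minute, and 100 to 120 words per minute come from this post; the figure of about 450 words for the first four paragraphs is an approximate count and should be treated as an assumption, as should the function and variable names.

```python
def words_per_minute(word_count: float, minutes: float) -> float:
    """Return the average throughput in words per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return word_count / minutes


# Assumption: the four opening paragraphs are roughly 450 words (approximate count).
APPROX_WORDS = 450
DICTATION_MINUTES = 15            # time reported in the post
TYPING_WPM = 70                   # fast typing speed reported in the post
FLAT_OUT_DICTATION_WPM = (100, 120)  # estimated flat-out dictation range from the post

# Effective rate actually achieved while dictating (thinking time included).
effective = words_per_minute(APPROX_WORDS, DICTATION_MINUTES)
print(f"Effective dictation rate: {effective:.0f} words per minute")

# How long the same text would take at each sustained speed, with no pauses.
print(f"Typing at {TYPING_WPM} wpm: {APPROX_WORDS / TYPING_WPM:.1f} minutes")
for wpm in FLAT_OUT_DICTATION_WPM:
    print(f"Dictating at {wpm} wpm: {APPROX_WORDS / wpm:.1f} minutes")
```

Under that assumed word count, the effective rate is well below even the typing speed, which is consistent with the point that thinking time, not input speed, is the real bottleneck.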

Getting Started with Windows Speech Recognition

I could have started here, but I thought the above paragraphs would be a better introduction. Let's assume at this point that you actually want to get started using Windows' speech recognition; what exactly do you need to do? The first thing you need to do might not seem quite so obvious: you need to buy a decent microphone. If you're using a laptop, let me just say that you shouldn't expect speech recognition to work with the built-in microphone. Something else you need to be aware of is that you need a relatively quiet place in order to dictate properly, although a better microphone with noise-canceling technology might help in some environments.

Assuming you've got all the hardware you need, the fastest way to get going is to simply open your Start Menu (or the Windows 8 Start Screen) and type “speech recognition”. On both Windows 7 and Windows 8.1, one of the options that shows up should say “Windows Speech Recognition”. Choose that option and you should see a screen that says “Welcome to Speech Recognition”. At this point, Windows will help you configure your hardware and walk you through the training process. The initial training will take about 6 minutes; if you want, you can do additional training that will take about 5 minutes. When you're finished with the training process, you can begin dictating or controlling the computer.


Looking specifically at the way Microsoft handles speech recognition training, I find it a bit interesting that they give you disjointed phrases. As I mentioned earlier, speaking in complete sentences or at least complete phrases is helpful for improving accuracy, as the computer can recognize context. Dragon NaturallySpeaking usually gives you a complete phrase to read during the training sequence, and that in turn can help you become accustomed to speaking in complete phrases. The Microsoft training, on the other hand, will often show about 10 words at a time, breaking things up in the middle of a sentence or phrase. Since the computer knows exactly what it's listening for, this doesn't inherently affect the training process or accuracy, but it can inadvertently teach the user to speak in incomplete phrases.

Training Complete

If you're brand new to speech recognition, once the training is complete it might seem a little daunting – what are you supposed to do now? Probably the best thing you can do at that point is ask the computer, “What can I say?” That will open up the speech recognition tutorial screen, and from there you can see most of the common phrases that you can use. At this point, I do want to mention that Windows 7 and Windows 8.1 don't show identical information, though all of the important stuff is covered in either operating system.

Probably the biggest problem you will now encounter with Microsoft's speech recognition is this: it doesn't work with a lot of things very well. And when I say “a lot of things”, what I really mean is that it doesn't work with web browsers, generic text boxes, or other “unknown” areas where you might want to input text. I've done all of this dictating using Word 2013, and other than a few idiosyncrasies I really don't have many complaints. But if you want to try dictating into a text box on a message forum, good luck. Or rather, let me clarify: if you want to use Google Chrome or Mozilla Firefox, Microsoft's speech recognition is practically worthless; on the other hand, it seems to work okay in Internet Explorer.

I haven't tried testing a ton of other applications, but it does appear that any application that isn't explicitly supported will at best get a more cumbersome speech recognition interface. Firefox, for example, allows you to dictate into text boxes, but it will show you your dictated text and then you have to say “insert” to accept the text. You have to make any edits to the text before inserting it; otherwise you lose the ability to edit the text after inserting. Other than Microsoft Office, WordPad, Notepad, and Internet Explorer, I'm not entirely sure what other applications are fully supported by Microsoft's speech recognition. In a pinch, you can always dictate into WordPad and then copy/paste into other programs as needed, but that's hardly convenient, and in most cases the point of speech recognition is convenience.

[Screenshot: MS speech recognition with Chrome fails]
[Screenshot: Firefox requires inserting text after dictating/editing]

As with many things, you get what you pay for. The speech recognition built into Microsoft Windows certainly functions better than nothing at all, and this post alone is proof that you can dictate a fairly complex article without a whole lot of training. But if you want speech recognition that can work with a larger selection of applications, you're almost forced to go with a retail program like Dragon NaturallySpeaking. The good news is that Dragon NaturallySpeaking is relatively affordable these days; Dragon NaturallySpeaking Home 13.0 is available for $50, while the Premium 13.0 version doubles the price to $100. What's the major difference between the two? I'll cover that later when I get to Dragon NaturallySpeaking, but the short summary is that Home 13.0 is more for beginning users and Premium 13.0 is for more advanced users; you can check out the feature matrix for additional details.

In summary then, if you've reached the point where typing on a computer isn't something you want to do and the idea of speech recognition sounds appealing, give it a shot. Any modern computer should have everything you need, other than a microphone. While better microphones can certainly improve the accuracy, I've found that you don't need to spend a lot of money to get good results. My personal microphone is a relatively expensive Sennheiser ME3 with an Andrea USB adapter. I like this microphone because it doesn't mess up your hair and the USB adapter means you can use it with any computer – the microphone input ports on some computers just don't have the quality necessary for speech recognition. However, I've used a couple of different gaming headsets for speech recognition before, and other than the fact that they weren't particularly comfortable, I didn't notice that they were any less accurate. Your mileage may vary.

But comfort matters, especially if you're going to be wearing it for hours a day, so my advice is to try to get something that doesn't cover your ears. You can check out these headset microphones with gooseneck boom or earhook microphones – many of those can be had for $50 or so, which is a lot easier to stomach than nearly $200. Also note that if you buy Dragon NaturallySpeaking, there are packages that include a basic microphone for about $15 extra. Again, you get what you pay for, but until you're sure that you're going to use speech recognition a lot I wouldn't recommend investing a ton of money into a quality microphone.
