Apress - Smart Home Automation with Linux (2010)

CHAPTER 5 ■ COMMUNICATION

The Software for Voice Recognition

This part of the problem is currently rather poorly supported by Linux, which is not surprising. To understand even the simplest phrases, you need an acoustic model to generate statistical representations of the sounds themselves (often as part of the initial training with a specific speaker) and a language model to estimate which words and sounds are likely to follow one another (to limit the processing necessary when analyzing speech), both of which are language-specific. Most of the native Linux software is either old and incomplete, impossible to compile, or commercial. Even the high-grade solutions, such as Sphinx (http://cmusphinx.org), require so many levels of installation and training that no one is really sure if they work! The commercial offerings suffer from scarcity, with few, if any, of the supposedly available packages sporting a “buy here” page. This absence even includes ViaVoice from IBM, which was once available for free but was withdrawn in 2002. Some software that once existed as a commercial Linux product has since become Windows-only.

It is indeed a strange state of affairs when the easiest method of processing vocal commands under Linux is through Windows! You can either run the Windows software on the Linux machine (through the Wine compatibility layer or a VMware Server virtual machine) or use a native Windows machine. The virtualized approach has a few problems because of mismatches between the emulated and real sound cards, but software such as ViaVoice or Dragon NaturallySpeaking can often be coaxed into working after a while. If the software is to run on your server, which it usually is, then you also add a dependency on the X Window System, increasing the server's processing load. Consequently, the most efficient approach is to employ a separate Windows machine running the previously mentioned software.
Or, since you’ve already paid the “Windows tax,” use the software built into Vista, and download the Windows Speech Recognition Macros module. With tablet machines and subnotebooks beginning to include voice recognition software in their later versions, it may soon be possible to find a (closed source) recognition library on Linux machines.

Although it’s important to have a good recognition algorithm, it is more important to have access to its results. In most Windows software, this is never a high priority. Such software usually adopts the “We’ll give you all the functionality we think you’ll need in one package” approach, whereas Linux uses the “Here are lots of tools we think you’ll need; you can work out how to produce the functionality” method. Consequently, you will need to experiment with any package before purchasing it.

The solution given here covers the use of the software built into Windows Vista. Begin by training the speech recognition system in Vista, then work through the tutorial, and install Windows Speech Recognition Macros, downloadable from the Microsoft web site (http://www.microsoft.com/downloads/details.aspx?FamilyID=fad62198-220c-4717-b044-829ae4f7c125&displaylang=en). Next, program a series of macros for the commands you want to use, such as “lights on” and “lights off.” Each macro will trigger a command; in our case, this will be a wget call that tricks Apache into running the necessary code on our server. Figure 5-1 shows the macro configuration panel.

Figure 5-1. Preparing a voice macro under Vista. (Used with permission from Microsoft.)

Naturally, the auth keyword is a misnomer, since anyone (from anywhere) could request the same page and trigger the command. However, by using the machine’s local IP address, the request will never leave your intranet, and by locking the Windows machine down, you ensure that no one else can discover the secret key.
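If you want to test the server side without involving Vista at all, you can send the same kind of request by hand from any Linux machine on your network. The following is a minimal sketch, not the macro shown in Figure 5-1: the LAN address 192.168.1.10, the script path /voice.php, and the auth parameter with its value changeme are hypothetical placeholders for whatever your own macro sends. The PHP script in this section reads only the cmd parameter, so auth here simply mirrors the secret key embedded in the macro’s URL.

```shell
#!/bin/bash
# Hypothetical placeholders: override them with your own server's LAN
# address, PHP script path, and secret key before use.
SERVER=${SERVER:-192.168.1.10}
SCRIPT_PATH=${SCRIPT_PATH:-/voice.php}
AUTH_KEY=${AUTH_KEY:-changeme}

# voice_url NAME: print the URL that requests the command NAME.
voice_url() {
  printf 'http://%s%s?cmd=%s&auth=%s\n' "$SERVER" "$SCRIPT_PATH" "$1" "$AUTH_KEY"
}

# voice_cmd NAME: send the request quietly and discard the returned page,
# much as a voice macro's wget call would.
voice_cmd() {
  wget -q -O /dev/null "$(voice_url "$1")"
}
```

For example, voice_cmd lightson should switch the bedroom light on once the server script is in place at that address. Keeping the URL construction in its own function lets you print and inspect a request before anything is sent.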
So, once again, you’re vulnerable only to those with physical access to the machines[7] (also known as your family, who also have access to the light switch itself)! From here, the server code is trivial and expected:

<?php
// Read the requested command from the query string (e.g., ?cmd=lightson).
// $_GET replaces the deprecated $HTTP_GET_VARS array.
$cmd = isset($_GET['cmd']) ? $_GET['cmd'] : '';

if ($cmd == "lightson") {
    system("heyu turn bedroom_light on");
} else if ($cmd == "lightsoff") {
    system("heyu turn bedroom_light off");
}
?>

[7] You can also set up a virtual host to respond only to machines on your intranet, so any requests from outside would be unable to access this file.

You can then work on abstracting and extending this at will. In Chapter 7 you’ll integrate this into a general-purpose message system.

■ Note Before investing heavily in voice recognition software, ensure that it can understand every voice that needs to control the system, because a lot of software can listen only to a single, preselected voice, since its primary purpose is dictation rather than voice recognition.

Note that most software of this type doesn’t provide access to the words you’ve actually spoken, only an indication that the computer considers one of your predefined phrases more probable than another. Although this gives you fewer opportunities for error, it also prevents the use of any analog or scaled commands, such as “dim to 72%.”

Remote Voice Control

Being able to use your voice in several different rooms of the house is a definite advantage. However, this adds new complexity, since you must do one of the following:

• Run a microphone from every room in the house back to the computer: You can purchase small audio mixers that will combine the inputs from multiple microphones quite cheaply. The most natural place for the microphones is near light sockets and bulbs, since there’s already a cable running nearby. However, you will need to shield their cables to avoid mains hum.
• Have a separate computer in each room, and process the data locally: This gives you the highest level of control, since multiple people can talk to the server simultaneously and the server processes only request data, not audio data. This is more expensive, however, and requires that you’re able to hide a (small) PC in each room.

In each case, the acoustics of each room will differ, so you might need to record your voice from different places in the room. In old films, before the days of boom mics, the microphones were hidden inside large props such as radios or telephones so they could be positioned close enough to the actors to pick up their voices without extraneous noise. You can do the same on a smaller scale by mounting microphones (or even PCs!) inside a chair or under a table. The main consideration is then how to run the cables (for power and data) back to the voice machine. If you’re starting a home automation project from scratch or are redecorating, then you have the option of pulling up floorboards and running cables underneath. Such decisions are not to be taken lightly, however, particularly because maintenance is very costly!

■ Note Old Bluetooth headsets and hands-free units were both expensive and bulky. They are now, however, much cheaper and can provide a sneaky way of adding wireless remote microphones throughout the house for capturing voice commands or security monitoring.

For me, however, the second option is preferable, because having a separate voice recognition machine isn’t as bad as it sounds. OK, so there’s a high cost involved and extra power issues, but since the machine has nothing else to do, it can exist without keyboard, mouse, or monitor and sit quietly, untouched, for many years without maintenance.
Also, with the low-cost notebooks available, you can place (read: hide) one in each of two or more rooms, each with its own microphone, thereby eliminating most of the problems of room acoustics you would otherwise encounter, along with the question of how to wire microphones and their preamplifiers between rooms. The cost of a low-end machine preinstalled with Vista, which includes voice recognition software, is now not much more than the cost of a software license for some of the other packages. I hope those developers will realize this, and the market they’re missing, before this book’s second edition!

Speech Synthesis

This is the easy part of the problem, since the hard work has already been done for us through a package called Festival (http://www.cstr.ed.ac.uk/projects/festival/). Festival began in the 1990s at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh, where it still resides, although later functionality has been contributed by many sources, including Carnegie Mellon University, thanks to its open source license. It generates words through a complex system of phonemes and prosody and is able to handle the nuances of different languages by manipulating these dynamically with language-specific code, handled by Festival’s built-in Scheme interpreter.

The basic install of Festival is available with most distributions, albeit with a limited set of voices. A quick look in /usr/share/festival will show you how many. These can be sampled by running Festival and using the interactive prompt:

$ festival
Festival Speech Synthesis System 1.96:beta July 2004
Copyright (C) University of Edinburgh, 1996-2004. All rights reserved.
For details type `(festival_warranty)'
festival> (SayText "Hello automation")
#<Utterance 0xb6a8eff8>
festival> (voice_lp_diphone)
lp_diphone
festival> (SayText "Hello automation")
#<Utterance 0xb6c56ec8>
festival> (quit)

The bracket notation comes from the Scheme interpreter that processes the commands, and lp_diphone is an alternative Italian female “voice” that’s often supplied by default.

Before you go any further, write a short script to simplify the speaking process (apologies for the obvious English bias):

#!/bin/bash
# Usage: say VOICE TEXT...
# Select VOICE only if it is installed; otherwise Festival's default is used.
SPEAKER=/usr/share/festival/voices/english/$1
VOX=""
if [ -d "$SPEAKER" ]; then
  VOX="(voice_$1)"
fi
shift
echo "$VOX (SayText \"$*\")" | festival --pipe

You can then call the following:

say default Hello automation

or the following to more easily switch to an alternate voice:

say kal_diphone Hello automation

For better voices, you need to look further afield, at MBROLA. MBROLA is a (currently) binary-only back end to Festival that provides alternate voices without needing to upgrade the Festival package itself. The install for the base MBROLA code, through Debian on an Intel-based system, is as follows:

wget http://tcts.fpms.ac.be/synthesis/mbrola/bin/pclinux/mbrola3.0.1h_i386.deb
sudo dpkg -i mbrola3.0.1h_i386.deb

You then need to download new voice data to make use of this code. Several voices are available, but the three main U.S.-centric ones (us1, us2, and us3) are of primary interest here.
I’ll demonstrate an install of us1, with us2 and us3 requiring the obvious changes to the URL:[8]

wget -c http://tcts.fpms.ac.be/synthesis/mbrola/dba/us1/us1-980512.zip
wget -c http://www.festvox.org/packed/festival/latest/festvox_us1.tar.gz
unzip us1-980512.zip
tar xzvf festvox_us1.tar.gz

The data can then be copied into the appropriate place, according to your distribution:

# these require root privileges
mkdir -p /usr/share/festival/voices/english/us1_mbrola/
mv us1 /usr/share/festival/voices/english/us1_mbrola/
mv festival/lib/voices/english/us1_mbrola/* /usr/share/festival/voices/english/us1_mbrola/

Of course, other distributions may package this for you, saving you the work.

[8] Detailed in full at http://ubuntuforums.org/showthread.php?t=751169 if you’d rather copy and paste.
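Once the files are in place, it’s worth confirming that Festival can see the new voice. The following is a minimal sketch, assuming the layout used by the say script and the install above (voices in /usr/share/festival/voices/<language>/<voice>/); the helper names list_voices and festival_cmd and the VOICE_ROOT variable are hypothetical. festival_cmd builds the same Scheme expression as say, but first deletes double quotes and backslashes from the text, since either could otherwise break the string passed to SayText.

```shell
#!/bin/bash
# Hypothetical helpers for checking and driving Festival voices.
# Assumption: voices live in $VOICE_ROOT/<language>/<voice>/.
VOICE_ROOT=${VOICE_ROOT:-/usr/share/festival/voices}

# list_voices: print the name of every installed voice directory.
list_voices() {
  local dir
  for dir in "$VOICE_ROOT"/*/*/; do
    [ -d "$dir" ] || continue   # skip the unexpanded glob if nothing matched
    basename "$dir"
  done
}

# festival_cmd VOICE TEXT...: print the Scheme expression for festival,
# selecting VOICE only if it is installed under english/ (as say does).
festival_cmd() {
  local voice=$1 vox="" text
  shift
  if [ -d "$VOICE_ROOT/english/$voice" ]; then
    vox="(voice_$voice) "
  fi
  # Delete double quotes and backslashes so the text cannot end the
  # Scheme string early.
  text=$(printf '%s' "$*" | tr -d '"\\')
  printf '%s(SayText "%s")\n' "$vox" "$text"
}
```

After the us1 install, list_voices should include us1_mbrola, and festival_cmd us1_mbrola Hello automation | festival --pipe should speak with the new voice. Separating command construction from playback makes the expression easy to inspect before Festival sees it.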
