Apress - Smart Home Automation with Linux (2010)- P41 pptx

CHAPTER 5 ■ COMMUNICATION 183

Post: Array
(
    [sender] => 012345678
    [content] => Null Wow this might work, you know!
    [inNumber] => 447786202240
    [submit] => Submit
    [network] => UNKNOWN
    [email] => none
    [keyword] => NULL
    [comments] => Wow this might work, you know!
)

Both contain enough information to let you switch your lights on with a text message. The code is trivial:

    // Compare as strings (===) so the leading zero in the number is significant.
    if ($_POST['from'] === "012345678") {
        if ($_POST['text'] === "bedroom on") {
            system("/usr/local/bin/heyu turn bedroom_light on");
        } else if ($_POST['text'] === "bedroom off") {
            system("/usr/local/bin/heyu turn bedroom_light off");
        }
    }

To eliminate the sending of fiddly text messages (and perhaps save money), you can test future permutations of this script with a simple web page. Using the simpler format, you can write code such as the following:

    <form action="echo.php" method="POST">
        <input name="from" value="phone num">
        <input name="text" value="your message here">
        <input name="msgid" value="" type="hidden">
        <input name="type" value="1" type="hidden">
        <input type="submit" value="Send Fake SMS">
    </form>

Because the script sits on an open web server, there are some security issues. You eliminate one by having the phone number verified by a piece of code on the server (never validate credentials on the client). You can limit another issue (although not eliminate it) by changing your simply named echo.php script to iuytvaevew.php, employing security through obscurity so that it is not accidentally found. Some providers will call your web page using HTTPS, which is the best solution and worth the extra time in setting up a specific username and password for them.

You can rebalance the concepts of security and accessibility by allowing multiple phones to access the house, by creating a white list of mobile phone numbers, and by adding to this list explicitly. Or you could ban any access to your page from an IP that isn’t similarly approved and known to be your gateway provider.
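The white list and the gateway IP check described above can be combined into one server-side test. The following is only a minimal sketch in Python rather than the PHP used earlier, and every value in it is an assumption made for illustration: the phone number is the placeholder from the example, the network 203.0.113.0/24 is a reserved documentation range standing in for your provider's gateway, and the names ALLOWED_NUMBERS, ALLOWED_GATEWAYS, COMMANDS, is_authorised, and command_for are hypothetical.

```python
import ipaddress

# Hypothetical values for illustration only; substitute your own.
ALLOWED_NUMBERS = {"012345678"}
ALLOWED_GATEWAYS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range

# Message text -> heyu arguments, mirroring the PHP example.
COMMANDS = {
    "bedroom on": ["turn", "bedroom_light", "on"],
    "bedroom off": ["turn", "bedroom_light", "off"],
}


def is_authorised(sender, remote_ip):
    """Return True only if both the phone number and the calling IP are approved."""
    # Numbers are compared as strings, so a leading zero is significant.
    if sender not in ALLOWED_NUMBERS:
        return False
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Anything that is not a valid IP address is rejected outright.
        return False
    return any(address in network for network in ALLOWED_GATEWAYS)


def command_for(sender, remote_ip, text):
    """Return the heyu argument list for an approved message, otherwise None."""
    if not is_authorised(sender, remote_ip):
        return None
    # Ignore case and surrounding spaces, since text messages are typed by hand.
    return COMMANDS.get(text.strip().lower())
```

Returning an argument list rather than a command string means the result can be handed to something like subprocess.run(["/usr/local/bin/heyu", *args]) without any part of the incoming text message ever reaching a shell.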
If you are likely to communicate a lot through SMS, you could automatically add new phone numbers to a pending list of preapproved devices, which in turn sends a notification message to the SMS administrator, who can then issue a special command to add them to the list.

If your facilities allow, having a physical mobile phone connected through Gnokii may be useful in emergencies when you have no Internet connectivity and you want to be informed that the automatic power cycling of the router (with an AW12 perhaps, as mentioned in Chapter 1) is in progress.

Conclusion

With so many ways of communicating into and out of a system, you must begin with a solid framework. My method is to separate the input systems from the processing, allowing any input mechanism (mobile phone, e-mails, or web interface) to generate a command in a known common format that can be processed by a single script. In a similar way, all outgoing messages are sent to a single script that formats each one to suit the given communication channel. You can also add an automatic process upon receipt of any, or all, of these messages. So, once you have code to control a video, light switch, or alarm clock, you can process them in any order to either e-mail your video, SMS your light switch, speak to your alarm clock, or do any combination thereof.

CHAPTER 6 ■ ■ ■

Data Sources

Making Homes Smart

Although being able to e-mail your light switch is very interesting and infinitely cooler than programming yet another version of “Hello, World,” it never feels like an automatic house. After all, you as a human are controlling it. By providing your house with information about the real world, you enable it to make decisions for itself. This is the distinction between an automated home and a smart home.

Why Data Is Important

For years, the mantra “Content is king” has been repeated in every field of technology.
Although most of the data in your home automation environment so far has been generated from your own private living patterns, there is still a small (but significant) amount of data that you haven’t generated, such as TV schedules. I’ll now cover this data to see what’s available and how you can (legally) make use of it.

Legalities

All data is copyrighted. Whether it is a table of rainfall over the past 20 years or the listing for tonight’s TV, any information that has been compiled by a human is afforded a copyright. The exception is where data has been generated by a computer program, in which case the source data is copyrighted by the individual who created it, and the copyright to the compiled version is held by the person who arranged for the computer to generate it, usually the person who paid for the machine. Unfortunately, all useful data falls into the first category.

Even when the data is made publicly available, such as on a web site, or when it appears to be self-evident (such as the top ten music singles), the data still has a copyright attached to it, which requires you to have permission to use it.¹ Depending on jurisdiction, copyright will traditionally lapse 50 or 70 years after the death of the last surviving author. However, with the introduction of new laws, such as the Sonny Bono Copyright Term Extension Act, even these lengthy periods may be extended. In this field, the data becomes useless long before its copyright lapses, which is unfortunate.

¹ IANAL: I am not a lawyer, and all standard disclaimers apply here!

Fortunately, there are provisions for private use and study in most countries that allow you to process this data for your own personal use. Unfortunately, this does not include redistributing the data to others or manipulating the data into another format. This, from a purely technical and legal point of view, means that you can’t do the following:

• Provide the data to others in your household.
They have to download it themselves. This includes reproducing the information on a home page or distributing a TV or radio signal to other machines.
• Improve the format of the data and provide it to others who are technically unable to do the same. This includes parsing the data from one web site to show it in a more compact format at home.

In some areas it is even legally questionable whether you are allowed to provide tools that improve or change the format of existing (copyrighted) data. Fortunately, most companies turn a blind eye to this area, as they do for the internal distribution of data to members of your household (not that they’d know, or be able to prove it, if you did).

The larger issue has to do with improvements to the data, since most data is either too raw or too complex to be useful. Let’s take a web site containing the weather forecast as an example; the raw data might include only the string “rain, 25,” which would need to be parsed into a nice icon and a temperature bar to be user-friendly. A complex report could include a friendly set of graphics on the original site but make the original data set unavailable to anyone else who either tries to load the report from another site through deep linking or tries to reference the source table data used to build the image.

Screen Scraping

This is the process whereby a web page is downloaded by a command-line tool, such as wget or cURL, and then processed by an HTML parser so that individual elements can be read and extracted from it. This is the most legally suspect and most troublesome method of processing information. It is the most suspect because you are downloading copyrighted content from a site in a manner that is against the site’s terms and conditions; so much so that, until fairly recently, one famous weather site labeled its images as please_dont_scrape_this_use_the_api.gif!

Scraping is troublesome because it is very difficult to accurately parse a web page for content.
It is very easy to parse the page on a technical level because the language is computer-based, and parsers already exist. It is also very easy for a user to parse the rendered page for the data, because the human eye will naturally seek out the information it desires. But knowing that the information is in the top-left corner of the screen is a very difficult thing for a machine to assess. Instead, most scrapers will work on a principle of blocking. This is where the information is known to exist in a particular block, determined beforehand by a programmer, and the parser blindly copies data from that block. For example, it will go to the web page, find the third table, look in the fifth column and second row, and read the data from the first paragraph tag. This is time-consuming to determine but easy to parse. It is troublesome because any breakages in the HTML format itself (either introduced intentionally by the developers or introduced accidentally because of changes in advertising²) will require the script to be modified or rewritten.

Because of the number of different languages and libraries available to the would-be screen-scraper and the infinite number of (as yet undetermined) formats into which you’d like to convert the data, there isn’t really a database of known web sites with matching scraping code. To create one would be a massive undertaking. However, if you’re unable to program suitable scraping code, it might be best to seek out local groups or those communities based around the web site in question, such as TV fan pages. Any home will generally have a large number of data sources, and trying to maintain scrapers for each source will be time-consuming if you attempt it alone.

The mechanics of scraping are best explained with an example. In this case, I’ll use Perl and the WWW::Mechanize and HTML::TokeParser modules. Begin by installing them in any way suitable for your distribution.
I personally use the CPAN module, which generally autoconfigures itself upon invocation of the cpan command. Additional mirrors can be added by adding to the URL list like this:

    o conf urllist push ftp://ftp-mirror.internap.com/pub/CPAN/
    o conf commit

This is then followed by the installation of the modules themselves:

    perl -MCPAN -e 'install WWW::Mechanize'
    perl -MCPAN -e 'install HTML::TokeParser'

Lest I advocate scraping a page of a litigious company, I will provide an example using my own Minerva site to retrieve the most recent story from the news page at http://www.minervahome.net/news.htm.

Begin by loading the page in a web browser to get a feel for the page layout and to see where the target information is located. Also, review other pages to see whether there’s any commonality that can be exploited. You can do this by reviewing the source (as either a whole page or with a “view source selection” option) or enlisting the help of Firebug³ to highlight the tables and subcomponents within the table.

Then look for any “low-hanging fruit.” These are the easily solved parts of a problem, so you might find the desired text inside a specially named div element or included inside a table with a particular id attribute. Many professionally designed web sites do this to make redesigns quicker and unwittingly help the scraper.

If there are no distinguishing features around the text, look to the elements surrounding it. And the elements surrounding those. Work outward until you find something unique enough to be of interest or you reach the root html node. If you’ve found nothing unique, then you will have to describe the data with code like “in the first row and second column of the third table.”

² And although the Web exists as a free resource for information, someone will be paying for advertising space to offset the production costs.
³ Firebug is an extension to Firefox that allows web developers (and curious geeks) full access to the inner workings of the web pages that appear in the browser.
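A purely positional description such as “in the first row and second column of the third table” can be sketched in a few lines. This is not the Perl code built on WWW::Mechanize and HTML::TokeParser that the text is about to use; it swaps in Python's standard html.parser module to show the principle of blocking on its own. The names CellGrabber, grab_cell, and SAMPLE are hypothetical, and the HTML is a made-up fragment, not the real layout of the Minerva news page.

```python
from html.parser import HTMLParser


class CellGrabber(HTMLParser):
    """Collect the text of one table cell chosen purely by position (1-based).

    Assumption: tables are not nested; nested tables would confuse the counters.
    """

    def __init__(self, table, row, column):
        super().__init__()
        self.target = (table, row, column)
        self.table = self.row = self.column = 0
        self.in_cell = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table += 1
            self.row = self.column = 0
        elif tag == "tr":
            self.row += 1
            self.column = 0
        elif tag in ("td", "th"):
            self.column += 1
            self.in_cell = (self.table, self.row, self.column) == self.target

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.text.append(data)


def grab_cell(html, table, row, column):
    """Return the whitespace-normalized text of the addressed cell, or ''."""
    parser = CellGrabber(table, row, column)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.text).split())


# Made-up page: two layout tables followed by the table holding the news.
SAMPLE = """
<table><tr><td>menu</td></tr></table>
<table><tr><td>advert</td></tr></table>
<table>
  <tr><td>Date</td><td><p>Latest   story headline</p></td></tr>
  <tr><td>Older</td><td>Previous story</td></tr>
</table>
"""

headline = grab_cell(SAMPLE, 3, 1, 2)
```

The fragility is easy to reproduce: put one more advertising table at the top of SAMPLE and the same call returns an empty string, because the news has moved to the fourth table.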
