CHAPTER 6 ■ DATA SOURCES 203

if (stristr($rss->items[$i]['title'], "nasa"))
    system("sendsms myphone ".$rss->items[$i]['description']);

This can be particularly useful for receiving up-to-the-minute sports results, lottery numbers, or voting information from the glut of reality TV shows still doing the rounds on TV stations the world over. Even if it requires a little intelligent pruning to reduce the pertinent information to 140 octets (in the United States) or 160 characters (in Europe, RSA, and Oceania), which is the maximum length of a single unconcatenated text message, it will generally be cheaper than signing up for the paid-for services that provide the same results.

Retrieving Data: Pull

This encompasses any data that is purposefully requested when it is needed. One typical example is the weather or financial information that you might present at the end of the news bulletin. In these cases, although the information can be kept up-to-date in real time by simulating a push technology, few people need this level of granularity; once a day is enough.

For this example, you will use the data retrieved from an online API to produce your own currency reports. This can later be extended to generate currency conversion tables to aid your holiday financing. The data involved in exchange rates is fairly minimal and consists of a list of currencies and the ratio of conversion between each of them. One good API for this is at Xurrency.com. It provides a SOAP-based API that offers up-to-date reports of various currencies. The specific currencies available can vary over time, so Xurrency.com has thoughtfully provided an enumeration function as well.
If you’re using PHP and PHP-SOAP, then all the packing and unpacking of the XML data is done automatically for you, so the initialization of the client and the code to query the currency list is simply as follows:

$client = new SoapClient("http://xurrency.com/api.wsdl");
$currencies = $client->getCurrencies();

The getCurrencies method is detailed by the Web Services Description Language (WSDL). This is an XML file that describes the abstract properties of the API. The binding from this description to actual data structures takes place at each end of the transfer. Both humans and machines can use the WSDL to determine how to utilize the API, but most providers also include a human-friendly version with documentation and examples, such as the one at http://xurrency.com/api.

The getCurrencies method returns an array of currency identifiers (eur for Euro, usd for U.S. dollars, and so on) that can then be used to find the exchange rates.

$fromCurrency = "eur";
$toCurrency = "usd";
$toTarget = $client->getValue(1, $fromCurrency, $toCurrency);
$fromTarget = $client->getValue(1, $toCurrency, $fromCurrency);

Remember that the conversion process, in the real world, is not symmetrical, so two explicit calls have to be made. You can then generate a table with a loop such as the following:

$fromName = $client->getName($fromCurrency);
$toName = $client->getName($toCurrency);
for ($i = 1; $i <= 20; ++$i) {
    print "$i $fromName = ".round($i * $toTarget, 2)." $toName\n";
}

Or you can store the rates in a file for comparison on successive days. (Note the PHP use of @ in the following example to ignore errors that might be generated by an inaccessible or nonexistent file.)
$currencyDir = "/var/log/myhouse/currency";
$exchangeRate = $toTarget; // assumed: the rate fetched earlier with getValue()
$yesterdayRate = @file_get_contents("$currencyDir/$toCurrency");
$message = "The $fromName has ";
if ($exchangeRate > $yesterdayRate) {
    $message .= "strengthened against the $toName reaching ".$exchangeRate;
} else if ($exchangeRate < $yesterdayRate) {
    $message .= "lost against the $toName dropping to ".$exchangeRate;
} else {
    $message .= "remained steady at ".$exchangeRate;
}
@file_put_contents("$currencyDir/$toCurrency", $exchangeRate);

In all cases, you write the current data into a regularly updating log file, as you did with the weather status, and for the same reason: to prevent continually requerying it. However, with the financial markets changing more rapidly, you might want to update this file several times a day.

Private Data

Most of us have personal data on computers that are not owned or controlled by us. Even though the more concerned of us [10] try to minimize this at every turn, it is often not possible or convenient to do so. Furthermore, there are (now) many casual Linux users who are solely desktop-based, aren’t interested in running their own remote servers, and will gladly store their contact information, diary, and e-mail on another computer. The convenience is undeniable: having your data available from any machine in the world (with a network connection) provides a truly location-less digital lifestyle. But your home is not, generally, location-less. Therefore, you need to consider what type of useful information about yourself is held on other computers and how to access it.

Calendar

Groupware applications are one of the areas in which Linux desktop software has been particularly weak. Google has entered this arena with its own solution, Google Calendar, which links into your e-mail, allowing daily reminders to be sent to your inbox as well as to the calendars of other people and groups.
[10] “Concerned” is the politically correct way of saying “paranoid.”

Calendar events that occur within the next 24 hours can also be queried by SMS, and new ones can be added by sending a message to GVENT (48368). Currently, this functionality is available only to U.S. users, but it is a free HA feature for those who can use it. The information within the calendar is yours and is available in several different ways. First, and most simply, it can be embedded into any web page as an iframe:

<iframe src="http://www.google.com/calendar/embed?src=my_email_address%40gmail.com&ctz=Europe/London" style="border: 0" width="800" height="600" frameborder="0" scrolling="no"></iframe>

This shows the current calendar and allows you to edit existing events. However, you will need to manually refresh the page for edits to become visible, and new events cannot be added without venturing into the Google Calendar page.

The apparent security hole that this public URL opens is avoided, since you must already be signed into your Google account for this to work; otherwise, the login page is shown. Alternatively, if you want your calendar to be visible without signing into your Google account, then you can generate a private key that makes your calendar data available to anyone who knows this key. The key is presented as a secret URL.

To discover this URL, go to the Settings link at the top right of your Google Calendar account, and choose Calendars. This will open a list of calendars that you can edit and those you can’t. Naturally, you can’t choose to expose the details of the read-only variants. So, select your own personal calendar, and scroll down to the section entitled Private Address. The three icons on the right side, labeled XML, ICAL, and HTML, provide a URL to retrieve the data for your calendar in the format specified.
A typical HTML link looks like this:

http://www.google.com/calendar/embed?src=my_email_address%40gmail.com&ctz=Europe/London&pvttk=5f93e4d926ce3dd2a91669da470e98c5

The XML version is as follows:

http://www.google.com/calendar/feeds/my_email_address%40gmail.com/private-5f93e4d926ce3dd2a91669da470e98c5/basic

The ICAL version uses a slightly different format:

http://www.google.com/calendar/ical/my_email_address%40gmail.com/private-5f93e4d926ce3dd2a91669da470e98c5/basic.ics

The latter two are of greater use to us, since they can be viewed (but not edited) in whatever software you choose. If you’re not comfortable with the XML processing language XSLT, then a simple PHP loop can be written to parse the ICAL file, like this:

$regex = "/BEGIN:VEVENT.*?DTSTART:[^:]*:([^\s]*).*?SUMMARY:([^\n]*).*?END:VEVENT/is";
preg_match_all($regex, $contents, $matches, PREG_SET_ORDER);
for ($i = 0; $i < sizeof($matches); ++$i) {
    // $matches[$i][0] holds the entire ICAL event
    // $matches[$i][1] holds the time
    // $matches[$i][2] holds the summary
}

Dates in ICAL can be stored in one of three formats:

• Local time
• Local time with time zone
• UTC time

You need not worry about which version is used, since you can use the existing PHP library functions, such as this:

$prettyDate = strftime("%A %d %b %Y.", strtotime($matches[$i][1]));

■ Note Be warned that the XML version of your data includes back references to your calendar, which include your private key.

Naturally, other online calendar applications exist, offering similar functionality. This version is included as a guide. But having gotten your data onto your own machine, you can trigger your own e-mail notifications, send SMS messages to countries currently unsupported by Google, or automatically load the local florist’s web page when the words grandma and birthday appear.

Webmail

Most of today’s workforce considers e-mail on the move as a standard feature of office life.
But for the home user, e-mail falls into one of two categories:

• It is something that is sent to their machine and collected by their local client (often an old version of Outlook Express); consequently, it’s unavailable elsewhere.
• It is a web-based facility, provided by Yahoo!, Hotmail, or Google, and can be accessed only through a web browser.

Although both statements are (partially) correct, they hide extra functionality that can be provided very cheaply. In the first case, you can provide your own e-mail server (as I covered in Chapter 5) and add a webmail component using software such as AtMail. This allows your home machine to continue being in charge of all your mail, except that you don’t need to be at home to use it.

Alternatively, you can use getmail to receive your webmail messages through an alternate (that is, non-web) protocol. First, you need to ensure that your webmail provider supports POP3 access. This isn’t always easy to find or determine, since the use of POP3 means you will no longer see the ads on their web pages. But when it is available, it is usually found in the settings part of the service. All the major companies provide this service, although not all are free.

• Hotmail provides POP3 access by default, making it unnecessary to switch on, and after many years of including this only on its subscription service, Hotmail now provides it for free. The server is currently at pop3.live.com.
• Google Mail was the first to provide free POP3 access to e-mail, from pop.gmail.com. Although most accounts are now enabled by default, some older ones aren’t. You therefore need to select Settings and then Forwarding and POP/IMAP. From here you can enable it for all mail or only for newly received mail.
• Yahoo! provides POP3 access and forwarding to its e-mail only through its Yahoo! Plus paid-for service.

A cheat is available on some services (although not Yahoo!)
where you forward all your mail to another service (such as Hotmail or Gmail) where free POP services are available! Previously, there was a project to process HTML mail directly, eliminating the need to pay for POP3 services. This included the now defunct http://httpmail.sourceforge.net. Such measures are (fortunately) no longer necessary.

Once you know the server on which your e-mail lives, you can download it. This can be for reading locally, for backup purposes, or for processing commands sent in e-mails. Although most e-mail software can process POP3 servers, I use getmail.

apt-get install getmail4

I have this configured so that each e-mail account is downloaded to a separate file. I’ll demonstrate with an example, beginning with the directory structure:

mkdir ~/.getmail
mkdir ~/externalmail
touch ~/externalmail/gmail.mbox
touch ~/externalmail/hotmail.mbox
touch ~/externalmail/yahoo.mbox

Then a separate configuration file is created for each server. The one for Google Mail is called ~/.getmail/getmail.gmail and reads as follows:

[retriever]
type = SimplePOP3SSLRetriever
server = pop.gmail.com
username = my_email_address@gmail.com
password = my_password

[destination]
type = Mboxrd
path = ~/externalmail/gmail.mbox

[options]
verbose = 2
message_log = ~/.getmail/error.log

If you’d prefer the messages to go into your traditional Linux mailbox, then you can change the path to the following:

path = /var/mail/steev

You can then retrieve them like this and watch the system download the e-mails:

getmail -r getmail.gmail

Some services, notably Google Mail, do not allow you to download all your e-mails at once if there are a lot of them. Therefore, you need to reinvoke the command until everything has been retrieved. This limits the bandwidth load on both machines.

■ Tip If you have only one external mail account, then calling your configuration file getmailrc allows you to omit the filename arguments.

You can then view these mails in the client of your choice.
Here’s an example:

mutt -f ~/externalmail/gmail.mbox

Make sure you let getmail finish retrieving the e-mails; otherwise, you will get two copies of each mail in your inbox.

If you are intending to process these e-mails with procmail, as you saw in Chapter 5, then you need to write the incoming e-mail not to the inbox but to procmail itself. This is done by configuring the destination as follows:

[destination]
type = MDA_external
path = /usr/bin/procmail
unixfrom = True

Twitter

The phenomenon that is Twitter has allowed the general public to morph into self-styled microcelebrities as they embrace a mechanism of simple broadcast communication from one individual to a set of many “followers.” Although communications generally remain public, it is possible to create a list of users so that members of the same family can follow each other in private.

One thing that Twitter has succeeded in doing better than most social sites is that it has not deviated from its original microblogging ideals, meaning that the APIs to query and control the feeds have remained consistent. This makes it easy for you (or your house) to tweet information to your feeds or for the house to process incoming tweets and take some sort of action based upon them. In all cases, however, you will have to manually sign up for an account on behalf of your house.

Posting Tweets with cURL

The Twitter API uses an HTTP request to upload a new tweet, with the most efficient implementation being through cURL, the transfer library for most Internet-based protocols, including HTTP.
$host = "http://twitter.com/statuses/update.xml?status=";
$host .= urlencode(stripslashes(urldecode($message)));
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $host);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERPWD, "$username:$password");
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Expect:'));
curl_setopt($ch, CURLOPT_POST, 1);
$result = curl_exec($ch);
curl_close($ch);

This example uses PHP (with php5-curl), but any language with a binding for libcurl works in the same way. You need only fill in your login credentials, and you can tweet from the command line.

Reading Tweets with cURL

In the same way that tweets can be written with a simple HTTP request, so can they be read. For example:

$host = "http://twitter.com/statuses/friends_timeline.xml?count=5";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $host);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERPWD, "$username:$password");
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
$result = curl_exec($ch);
curl_close($ch);

This returns all the information available regarding the most recent tweets (including your own), with full information on the user (such as their name, image, and followers count), the message, and the in-reply-to data (featuring status, user, and screen name). This is more than you’ll generally need, but it’s a good idea in API design to never lose information if possible, because it’s easier to filter out than it is to add back in.

You can use this code to follow tweets when offline by using the computer to intercept suitably formatted tweets and sending them on with the SMS transmit code.

Reading Tweets with RSS

The very nature of Twitter lends itself to existing RSS technology, making customized parsers unnecessary.
The URL for the user 1234 would be as follows:

http://twitter.com/statuses/user_timeline/1234.rss

This could be retrieved and processed with XSLT or combined with the feeds from each family member into one for display on a house notice board. The results here are less verbose than their cURL counterparts, making them easier to process, at the expense of less contextual information.

Facebook

Although Twitter has adopted a broadcast mechanism, Facebook has continued to focus on the facilitation of a personal network with whom you share data. For HA, you are probably more interested in sharing information with friends than strangers, so this can be the better solution. However, writing an app that uses Facebook has a higher barrier to entry with comparatively little gain. It does, by way of compensation, provide a preexisting login mechanism and is a web site that many people check more often than their e-mail, so information can be disseminated faster. However, Facebook does change its API periodically, so what works one day might not work the next, and you have to keep on top of it.

If you are using Facebook as a means of allowing several people to control or view the status of your home, it is probably easier to use your own home page, with a set of access rights, as you saw in Chapter 5.

If you’re still sold on the idea of using Facebook, then you should install the Developer application and create your own app key with it. This will enable your application to authenticate the users who will use it, either from within Facebook or on sites other than Facebook through Facebook Connect. (A good basic tutorial is available at www.scribd.com/doc/22257416/Building-with-Facebook-Social-Dev-Camp-Chicago-2009.) To keep it private amongst your family, simply add their IDs as developers.

If you want to share information with your children, getting them to accept you as a Facebook friend can be more difficult, however!
In this case, you might have to convince them to create a second account, used solely for your benefit. Facebook doesn’t allow you to send messages to users unless they have installed the app (or are included in the list of developers), so this requires careful management.

The technical component is much simpler, by comparison, because Facebook provides standard code that can be copied to a directory on your web server and used whenever your app is invoked from within Facebook. It is then up to you to check the ID of the user working with your app to determine what functionality they are entitled to and generate web pages accordingly. You can find a lot of useful beginning information on Facebook’s own page at http://developers.facebook.com/get_started.php.

Automation

With this information, you have to consider how it will be used by the house. This requires development of a most personal nature. After all, if you are working shifts, then my code to control the lights according to the times of sunrise and sunset will be of little use to you. Instead, I will present various possibilities and let you decide on how best to combine them.

Timed Events

Life is controlled by time. So, having a mechanism to affect the house at certain times is very desirable. Since a computer’s life is also controlled by time, there are procedures already in place to make this task trivial for us.

Periodic Control with Cron Jobs

These take their name from the chronological job scheduler of Unix-like operating systems, which automatically executes a command at given times throughout the year. There is a file, known as the crontab, which gives fine-grained control over these jobs, and a separate file exists for each user. You can edit the file belonging to the current user (calling export EDITOR=vi first if necessary) with the following:

crontab -e

There is also a -u option that allows root to edit the crontab of other users.
A typical file might begin with the following:

# m h dom mon dow command
00 7 * * 1-5 /usr/local/minerva/etc/alarm 1
10,15 7 * * 1-5 /usr/local/minerva/etc/alarm 2
*/5 * * * * /usr/local/bin/getmail --quiet

The # line is a comment and acts as a reminder of the columns: minutes, hours, day of month (from 1 to 31), month (1 to 12, or named by abbreviation), day of week (0 to 7, with Sunday being both 0 and 7), and the command to be executed. Each column supports the use of wildcards (* means any), inclusive ranges (1-5), comma-delimited sequences (occurring at 10 and 15 only), and periodic steps (*/5 indicates every five minutes in this example). The cron program will invoke the command if, and only if, all conditions are met.

Typical uses might be as follows:

• An alarm clock, triggering messages, weather reports, or news when waking up
• Retrieving e-mail for one or more accounts, at different rates
• Initiating backups of local data, e-mail, or projects
• Controlling lights while on holiday
• Controlling lights to switch on, gradually, when waking up
• Real-life reminders for birthdays, anniversaries, Mother’s Day, and so on

Since these occur under the auspices of the user (that is, owner) of the crontab, suitable permissions must exist for the commands in question.

■ Note Many users try to avoid running anything as root, if it is at all possible. Therefore, when adding timed tasks to your home, it is recommended you add them to the crontab for a special myhouse user and assign it only the specific rights it needs.

The crontab, as provided, is accurate to within one minute. If you’re one of the very few people who need per-second accuracy, then there are two ways of doing it. Both involve triggering the event on the preceding minute and waiting for the required number of seconds.
The first variation involves changing the crontab to read as follows:

00 7 * * 1-5 sleep 30; /usr/local/minerva/etc/alarm 1

The second involves adding the same sleep instruction to the command that’s run. This can be useful when controlling light switches in a humanistic way, since it is rare to take exactly 60 seconds to climb the stairs before turning the upstairs light on. For randomized timing, you can sleep for a random amount of time (sleep `echo $((RANDOM%60))s`) before continuing with the command, as you saw in Chapter 1.

There will also be occasions where you want to ignore the cron jobs for a short while, such as disabling the alarm clock while you’re on holiday. You can always comment out the lines in the crontab to do this or change the command from this:

/usr/local/minerva/etc/alarm 1

to the following:

[ -f ~/i_am_on_holiday ] || /usr/local/minerva/etc/alarm 1

The first expression checks for the existence of the given file and skips the alarm call if it exists. Since this can be any file, located anywhere, it doesn’t need to belong to the crontab owner for it to affect the task. One possible scenario would be to use Bluetooth to watch for approaching mobile devices, creating a file in a specific directory for each user (and deleting it again when they go out of range, that is, have left the house). Once everyone was home, a cron job set to check this directory every minute could send an e-mail reminding you to leave the computer and be sociable!

For more complex timing scenarios, you can use cron to periodically run a separate script, say every minute. If you return to the “next train” script from earlier, you could gain every last possible minute at home by retrieving the first suitable train from here:

NEXT_TRAIN=`whattrain.pl 30 35 | head -n 1`

In this scenario, a suitable train is one that leaves in 30 to 35 minutes, which gives you time to get ready.
If this command produces any output, then you can use the speech synthesizer to report it. Test the variable for emptiness directly, because piping it through echo and wc -l always counts one line, even when the variable is empty:

if [ -n "$NEXT_TRAIN" ]; then
    say default $NEXT_TRAIN
fi

The same script could be used to automatically vary the wake-up time of your alarm clock!
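The holiday flag file and the empty-output check can be combined into one small guard that a per-minute cron script calls before speaking. What follows is only a sketch under stated assumptions: the function name should_announce is hypothetical (it is not part of Minerva), and the flag file path, whattrain.pl, and say are simply the names used earlier in this section.

```shell
#!/bin/sh
# Sketch of a guard for a per-minute cron script.
# should_announce is a hypothetical helper, not a Minerva command.

# should_announce FLAG_FILE TEXT
# Succeeds (exit status 0) only when FLAG_FILE does not exist
# and TEXT is a non-empty string.
should_announce() {
    flag_file=$1
    text=$2
    if [ -f "$flag_file" ]; then
        return 1        # flag present (for example, on holiday): stay silent
    fi
    [ -n "$text" ]      # announce only when there is something to say
}

# Example wiring, left commented out because whattrain.pl and say
# are specific to the setup described in the text:
# NEXT_TRAIN=$(whattrain.pl 30 35 | head -n 1)
# if should_announce "$HOME/i_am_on_holiday" "$NEXT_TRAIN"; then
#     say default $NEXT_TRAIN
# fi
```

Keeping the decision in one function means the same guard can sit in front of any timed task, so creating or deleting a single file silences or re-enables all of them at once.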