Configuring Mail Services

Using the Postfix Mail Server

After you read this section, you might decide that Sendmail is far too complex and baffling, or at least more complicated than Postfix. While Sendmail might surpass Postfix in terms of configurability and features, this is true only of corner cases, that is, of extremely obscure or unusual mail system configurations. Postfix is used every day at sites that handle thousands and tens of thousands of messages per day, so Postfix probably provides all of the functionality you’ll need with a fraction of the frustration and aggravation that accompanies learning arcane configuration hieroglyphics.

The best part is that Postfix is fully compatible with Sendmail at the command level. For example, the command to regenerate the Postfix alias database is newaliases, the same name that Sendmail uses, and Postfix’s mail submission command is named sendmail, just like Sendmail’s (the daemon that supervises Postfix’s other processes is called master). The similarity is deliberate, for Postfix was designed to be a high-performance, easier-to-use replacement for Sendmail.

A single example might illustrate why it is easier to configure and use. As you learned in the previous section, the (unintuitive) Sendmail syntax for defining a mail relay host is:

    DSrelay.example.com

Postfix’s syntax is eminently clearer:

    relayhost = relay.example.com

If this admittedly simple example doesn’t convince you that Postfix is easier to configure and use, consider this: it is Sendmail, not Postfix, that needs a meta-configuration language (the m4 macros described earlier) to generate or modify configuration files.

Switching to Postfix

By default, Fedora Core and RHEL use Sendmail. Switching to Postfix is simple, but before doing so, stop Sendmail:

    # service sendmail stop

You want to stop Sendmail before changing anything so that no incoming mail gets stuck in Sendmail’s queues before it is delivered. Of course, you should make sure that Postfix is installed:

    $ rpmquery postfix
    postfix-2.2.2-2

If rpmquery reports that the package is not installed, install Postfix before
proceeding. Otherwise, to make Postfix your MTA, click Main Menu ➪ Preferences ➪ More Preferences ➪ Mail Transport Agent Switcher or execute the command system-switch-mail at a command prompt. Either way, you should see the dialog box shown in Figure 21-1.

Figure 21-1 The Mail Transport Agent Switcher

Click the Postfix radio button and then click OK to save your change and close the dialog box. After the change is applied, you will see the confirmation message shown in Figure 21-2. Click OK to proceed.

Figure 21-2 Successfully changing the MTA

The Mail Transport Agent Switcher does most of the heavy lifting for you, so most of what you need to do is tweak and fine-tune the Postfix configuration, as described in the next section.

Configuring Postfix

Postfix’s primary configuration file is /etc/postfix/main.cf. You will need to check or edit at least the following variables:

■■ The mydomain variable specifies your domain name:

    mydomain = example.com

■■ The myhostname variable identifies the local machine’s fully qualified domain name:

    myhostname = coondog.example.com

■■ The myorigin variable identifies the domain name appended to unqualified addresses (that is, usernames without the @example.com goober attached):

    myorigin = $mydomain

This causes all mail going out to have your domain name appended. Thus, if the value of mydomain is possum_holler.com and your username is bubba, then your outgoing mail will appear to come from bubba@possum_holler.com.

■■ The mydestination variable tells Postfix what addresses it should deliver locally. For a standalone workstation, which is a system that is connected directly to the Internet and that has some sort of domain name resolution running, you want mail to that machine and to localhost (and/or localhost.$mydomain and/or localhost.localdomain) delivered locally, so the following entry should suffice:

    mydestination = $myhostname, localhost, localhost.$mydomain

Postfix supports many more configuration variables than the four just listed, but these are the mandatory changes you have to make.

Create or modify /etc/aliases. At the very least, you need aliases for postfix, postmaster, and root in order for mail sent to those addresses to get to a real person. Here are the contents of my initial /etc/aliases file:

    postfix: root
    postmaster: root
    root: bubba

After creating or modifying the aliases file, regenerate the alias database using Postfix’s newaliases command:

    # /usr/sbin/newaliases

You are finally ready to start Postfix:

    # service postfix start
    Starting postfix: [ OK ]

Make sure that Postfix will start when you boot the system. This should be taken care of by the MTA switching tool, but it never hurts to double-check. You can use the chkconfig commands shown in the following example:

    # chkconfig --level 0123456 sendmail off
    # chkconfig --level 0123456 postfix off
    # chkconfig --level 2345 postfix on

Finally, modify your syslog configuration to handle Postfix log messages appropriately. Many system administrators, including the authors, prefer that mail log messages go to their own files to avoid cluttering up the primary system log. So, we use the following entries in /etc/syslog.conf, which controls the system log:

    *.info;*.!warn;authpriv.none;cron.none;mail.none    -/var/log/messages
    *.warn;authpriv.none;cron.none;mail.none            -/var/log/syslog
    mail.*;mail.!err                                    -/var/log/mail.log
    mail.err                                            -/var/log/mail.err

The first two lines keep any mail-related messages from being logged to /var/log/messages and /var/log/syslog. The third line logs everything but errors to /var/log/mail.log. The last line drops all mail error messages, including those from Postfix, into /var/log/mail.err. The - character before each filename tells the system logging daemon, syslogd, to use asynchronous writes, which means that the logging daemon does not force log messages out to the specified file before returning control to the system. This measure helps Postfix run somewhat faster, especially on a heavily loaded
system, but can lead to data loss if the machine crashes before buffered data is flushed to disk. Naturally, you have to restart syslogd to cause these changes to take effect:

    # service syslog restart

At this point, you have a basic, functional Postfix installation. There is a great deal more customization that you can and might want to do, but what has been covered here should get you started and offer some insight into the simplicity of Postfix installation and configuration.

Running Postfix behind a Firewall or Gateway

If the system on which you run Postfix is behind a firewall, uses a mail host, or otherwise lacks a direct or constant Internet connection, you probably want to define a relay host that handles your system’s outbound email. In this case, Postfix will simply hand off locally generated email to the relay host, which must be configured to relay for you. For a system that sits on an internal network and that doesn’t have a direct connection to the Internet, add the following entries to /etc/postfix/main.cf:

    relayhost = mailhost.$mydomain
    disable_dns_lookups = yes

mailhost.$mydomain (replace mailhost with the actual name of the relay host) handles actual mail delivery. If you don’t run DNS on your internal network, the second line prevents Postfix’s SMTP client from performing DNS lookups and instead causes Postfix to retrieve the IP address for the relay host from /etc/hosts, so make sure that /etc/hosts contains the fully qualified domain name, IP address, and alias (if one exists) for the relay host you specify.

TIP You can also specify the relay host’s IP address in /etc/postfix/main.cf using the syntax:

    relayhost = [192.168.0.1]

Notice that the IP address is enclosed in square brackets. The square brackets tell Postfix not to perform MX lookups for the relay host, and a literal IP address needs no name lookup at all. For this one host, then, the effect is much the same as setting disable_dns_lookups = yes.

If you make these (or other) changes to the Postfix configuration file, you have to tell Postfix about them. Use the following
command to do so:

    # service postfix reload
    Reloading postfix: [ OK ]

The next section, “Running Postfix on a Mail Host,” shows you how to create a mail host that handles incoming mail for the systems on your network.

Running Postfix on a Mail Host

At the end of the previous section, you configured Postfix to use a mail host, sometimes called a smart host, mail hub, or mail relay, for delivering outbound mail. In this section, you configure the mail host to process outbound mail for such client systems. This configuration assumes that the relay host, named mailbeast (just an example), is the sole point of entry and exit for all email traffic entering the network from the Internet and exiting the network from client systems. As you did on the client systems, you need to set the following configuration variables on mailbeast:

■■ $myhostname
■■ $mydomain
■■ $myorigin
■■ $mydestination

In addition, mailbeast needs to be told for which systems it can relay mail. Doing so involves setting two additional configuration variables, $mynetworks and $relay_domains. $mynetworks defines a list of trusted SMTP clients, that is, the list of clients that Postfix will allow to relay mail. $relay_domains defines the destinations to which Postfix will relay mail. Define $mynetworks using an explicit list of network/netmask patterns. Consider the following $mynetworks setting:

    mynetworks = 192.168.0.0/24, 127.0.0.0/8

TIP If you have trouble deriving the appropriate netmask to use, remember the ipcalc tool introduced in Chapter 12.

This directive states that any system with an IP address in the range 192.168.0.1 to 192.168.0.254 or in the loopback network can relay mail through the Postfix server. You need to use values that reflect your internal network, of course.

Where $mynetworks defines who is permitted to relay using the Postfix server, $relay_domains identifies to where email can be relayed. By default, Postfix relays mail to any address that matches $mynetworks and
$mydestination (the default value of $relay_domains is $mydestination). To add relay destinations, specify a comma- or space-delimited list of hostnames or domains. For example, the following directive allows relays to $mydestination, the domain example.com (and any subdomain of example.com), and the host mailbeast.otherexample.com:

    relay_domains = $mydestination, example.com,
        mailbeast.otherexample.com

Notice how the long line is continued using white space at the beginning of the next line. After making these changes, use the reload command shown earlier (service postfix reload).

Serving Email with POP3 and IMAP

The mail system configurations discussed so far assumed that all systems on your network run some sort of MTA. Obviously, this is an unwarranted assumption. For example, Windows systems used as desktop network clients ordinarily do not have an MTA of their own. Such systems require email access using IMAP or POP3 (or Web-based mail, discussed in Chapter 24). This section shows you how to configure IMAP and POP3 servers.

Worth noting is that you can provide both IMAP and POP3 services, but that clients usually need to use one or the other or chaos will ensue. Bear in mind also that IMAP, while more feature-rich than POP3, imposes a significantly higher disk space penalty on the server, especially if users decide to store all of their email on the server. POP3 is slimmer than IMAP, but heavy POP3 usage can dramatically bog down a mail server due to the overhead involved in clients polling the server for new mail.

Setting up an IMAP Server

The IMAP implementation configured in this section is the Dovecot IMAP server. As an extra bonus, the Dovecot IMAP server also speaks POP3. We’ve selected Dovecot for several reasons. First, it supports POP3 and IMAP, simplifying initial setup and ongoing maintenance. So, if you configure the IMAP server using the procedures described in this section, you get a POP3 server for free unless you specifically disable
POP3 services. Second, Dovecot also supports POP3S and IMAPS (Secure POP3 and Secure IMAP, respectively), which wrap the authentication and data exchange processes in SSL-based encryption (using OpenSSL). Finally, Dovecot is ready to run after the necessary packages have been installed, modulo the steps described in the following paragraphs.

First, make sure that the dovecot package is installed. The following rpmquery command shows you whether this package is installed. If not, install the dovecot package before proceeding:

    # rpmquery dovecot
    dovecot-0.99.14-4.fc4

The version number you see might be slightly different.

Configuring Dovecot

If the necessary packages are installed, configure the dovecot service to start when the system boots. If you don’t intend to provide an IMAP server, you can disable the IMAP services as described shortly. Use the following commands to start dovecot at boot time:

    # chkconfig --level 0123456 dovecot off
    # chkconfig --level 345 dovecot on

Testing Dovecot

To test the server, connect to the POP3 server as a mortal user using telnet:

    $ telnet localhost pop3
    Trying 127.0.0.1...
    Connected to localhost.localdomain (127.0.0.1).
    Escape character is '^]'.
    +OK dovecot ready
    quit
    +OK Logging out
    Connection closed by foreign host.

To close the connection, type quit (or QUIT) and press Enter. This example used telnet to connect to the POP3 port (port 110). If you see anything other than +OK mumble displayed, check your configuration. If you can connect to the POP3 server, you will be able to retrieve messages using the POP3 protocol.

Next up, connect to the IMAP server, again using telnet:

    $ telnet localhost imap
    Trying 127.0.0.1...
    Connected to localhost.localdomain (127.0.0.1).
    Escape character is '^]'.
    * OK dovecot ready
    . logout
    * BYE Logging out
    . OK Logout completed
    Connection closed by foreign host.

To close the connection, type . logout (or . LOGOUT) and press Enter. The leading period is the tag that IMAP requires in front of every client command; the server repeats it in its final reply. This example used telnet to connect to the IMAP port (port 143). If you see anything
other than * OK mumble displayed, check your configuration. At this point, your IMAP and POP servers are up and running and ready to service IMAP and POP clients.

Maintaining Email Security

Do you think you have nothing to hide? Maybe you don’t, but email security is always a privacy issue even if you aren’t mailing credit card numbers or corporate secrets. Using S/MIME (secure MIME) for security is only one of many steps to take to protect the integrity of your own and your users’ email.

NOTE This section briefly covers some of the most common vulnerabilities that affect email security. For more information about email security, see the Sendmail Web site at http://www.sendmail.org/ and the Postfix Web site at http://www.postfix.org/.

Protecting against Eavesdropping

Your mail message goes through more computers than just yours and your recipient’s because of store-and-forward techniques. All a cracker has to do to snoop through your mail is use a packet sniffer program to intercept passing mail messages. A packet sniffer is intended to be a tool that a network administrator uses to record and analyze network traffic, but the bad guys use them too. Dozens of free packet sniffing programs are available on the Internet.

Using Encryption

Cryptography isn’t just for secret agents. Many email products enable your messages to be encrypted (coded in a secret pattern) so that only you and your recipient can read them. Lotus Notes provides email encryption, for example. One common method is to sign your messages using digital signatures, which makes it possible for people to confirm that a message purporting to come from you did in fact come from you. Another typical approach, which can be used with digital signatures, is to encrypt the email itself. Combining digital signatures with encryption protects both the confidentiality of your email and its authenticity. Fedora Core and RHEL ship with GNU Privacy Guard, or GPG, which provides a full suite of digital signature
and encryption services.

Using a Firewall

If you receive mail from people outside your network, you should set up a firewall to protect your network. The firewall is a computer that prevents unauthorized data from reaching your network. For example, if you don’t want anything from ispy.com to penetrate your net, put your net behind a firewall. The firewall blocks out all ispy.com messages. If you work on one computer dialed in to an ISP, you can still install a firewall. Several vendors provide personal firewalls, and some of them are free if you don’t want a lot of bells and whistles.

Don’t Get Bombed, Spammed, or Spoofed

Bombing happens when someone continually sends the same message to an email address either accidentally or maliciously. If you reside in the United States and you receive 200 or more copies of the same message from the same person, you can report the bomber to the FBI. The U.S. Federal Bureau of Investigation has a National Computer Crimes Squad in Washington, DC, telephone +1-202-325-9164.

Spamming is a variation of bombing. A spammer sends unsolicited email to many users (hundreds, thousands, and even tens of thousands). You can easily be an accidental spammer. If you choose your email’s Reply All function, and you send a reply to a worldwide distribution list, you might be perceived by some of the recipients as a spammer.

Spoofing happens when someone sends you email from a fake address. If spoofing doesn’t seem like it could be a major problem for you, consider this: you get email from a system administrator telling you to use a specific password for security reasons. Many people comply because the system administrator knows best. Imagine the consequences if a spoofer sends an email faking the system administrator’s email address to all the users on a computer. All of a sudden, the spoofer knows everyone’s passwords and has access to private and possibly sensitive or secret data. Spoofing is possible because plain SMTP does not have
authentication capabilities. Without authentication features, SMTP can’t be sure that incoming mail is really from the address it says it is. If your mail server enables connections to the SMTP port, anyone with a little knowledge of the internal workings of SMTP can connect to that port and send you email from a spoofed address. Besides connecting to the SMTP port of a site, a user can send spoofed email by modifying his or her Web browser interfaces.

TIP You can protect your data and configure your mail system to make mail fraud more difficult. If someone invades your mail system, you should report the intrusion to the Computer Emergency Response Team (CERT). You can find the reporting form on the Internet at ftp://info.cert.org/pub/incident_reporting_form.

Be Careful with SMTP

Use dedicated mail servers. First of all, keep the number of computers vulnerable to SMTP-based attacks to a minimum. Have only one or a few centralized email servers, depending on the size of your organization. Allow only SMTP connections that come from outside your firewall to go to those few central email servers. This policy protects the other computers on ...

Providing Web Services

Figure 24-8 Testing SquirrelMail configuration changes

SquirrelMail’s interface provides all the features you would expect in a browser-based email client and should keep your mobile users happy. If you need more information about SquirrelMail, visit its project page on the Internet, www.squirrelmail.org.

Configuring an RSS Feed

What’s an RSS feed?
RSS is an acronym for Really Simple Syndication, Rich Site Summary, or RDF Site Summary, depending on which version of the RSS specification you follow. Regardless of the version you use, RSS defines and implements an XML format for distributing news headlines over the Web, a process known as syndication. To express it more simply and generally, RSS makes it possible to distribute a variety of summary information across the Web in a news-headline style format. The headline information includes a URL that links to more information. That URL, naturally, brings people to your Web site. By way of example, Figure 24-9 shows the BBC’s front page RSS news.

Figure 24-9 Viewing one of the BBC’s RSS feeds

The canonical use of RSS is to provide news headlines in a compact format. Most major news sites provide this type of summary information. Some open-source software projects use RSS to inform subscribers of significant events occurring in the project, such as releases, updates, and meetings. Popular blog sites use RSS to notify people of the latest blog entries. If you or your users have any sort of activity to publicize or frequently updated information to distribute, one way to do so is to provide an RSS feed on your Web site. This section shows you how to set up a simple, no-frills RSS feed that follows the 0.91 RSS specification. (See the sidebar “Sorting out RSS Versions” for a discussion of the competing RSS versions.)
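To make the idea of a headline-plus-URL format concrete, here is a minimal sketch, not taken from the book, that builds a small RSS 0.91 document with Python’s standard xml.etree.ElementTree module. The function name build_feed, the channel and item text, and the localhost URLs are all assumptions made up for illustration; treat the output as one plausible shape for a feed rather than a definitive template.

```python
import xml.etree.ElementTree as ET


def build_feed(channel_title, channel_link, channel_desc, items):
    """Return an RSS 0.91 document as a string.

    items is a list of (title, link, description) tuples.
    build_feed is a hypothetical helper, not part of any RSS library.
    """
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_desc
    ET.SubElement(channel, "language").text = "en-us"

    # Each item carries a headline, the URL with the full story,
    # and a short summary that a reader may or may not display.
    for title, link, desc in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = desc

    body = ET.tostring(rss, encoding="unicode")
    # Put the XML declaration on the first line of the file.
    return '<?xml version="1.0"?>\n' + body


# Example values below are placeholders, not real URLs.
feed = build_feed(
    "Example Channel",
    "http://localhost/",
    "Updates for an example site",
    [("Example headline", "http://localhost/news/1.html", "A one-line summary.")],
)
print(feed)
```

Generating the file from a script like this, rather than editing XML by hand, keeps the tags balanced and lets you append a new item each time you publish something.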
NOTE For more information about RSS and RDF, see the home pages for the original RSS specification, http://purl.org/rss/, and the W3C RDF activity pages at http://www.w3.org/RDF/.

If you’re providing an RSS feed, you might be curious how your Web site visitors might use it. Many people track news and other RSS feeds using an RSS aggregator. An aggregator is an application or browser extension that collects (or aggregates) RSS feeds from a list of sites that you specify and presents all of them in a single interface. Most aggregators can understand both plain vanilla RSS feeds and the more feature-rich Atom feeds. One of our favorite feed aggregators is the Firefox extension called Sage (see Figure 24-10).

SORTING OUT RSS VERSIONS

At this point in time, there are three versions of the RSS specification: 0.9x, 1.0, and 2.0. The original version, RSS 0.91, was designed by Netscape and UserLand Software’s Dave Winer. The current iteration of the 0.9x specification is 0.94. The 0.9x spec is the simplest to understand and the easiest to use, so it is generally referred to as Really Simple Syndication. Dave Winer maintains control of this version of the RSS specification.

RSS 1.0, referred to as RDF Site Summary, where RDF stands for Resource Description Framework, is a version of RSS promoted by the W3C. It is not necessarily an improvement over RSS 0.9x. Rather, it is a version of RSS that can be parsed by any reader that understands RDF. Accordingly, any RDF-capable reader can handle an RSS 1.0 feed without having to understand anything about RSS itself. Unfortunately, proponents of the simpler 0.9x specification and the more standardized 1.0 specification were unable to come to a compromise, which resulted in the 1.0 branch morphing into a version known as Atom.

Meanwhile, in reaction to the emergence of Atom, proponents of the 0.9x branch started working on RSS 2.0. RSS 2.0 is the successor to RSS 0.9x. Like 0.9x, RSS 2.0 development is led
by Dave Winer but, partially in response to criticism that he owned the copyright on RSS 0.9x, Winer donated copyright on 2.0 to Harvard University and removed himself as the final judge of RSS 2.0 extensions or usage.

As the matter stands, then, you can write Atom-compliant RSS feeds or 0.9x/2.0-compliant feeds. Choosing which one is likely to come down to a matter of what your users want and whether you prefer the simplicity of the 0.9x/2.0 branch or the alleged “standards compliance” of the Atom branch.

Figure 24-10 Using the Sage RSS aggregator in Firefox

On the right side of the browser screen, Sage shows article summaries. You can click these summaries to view the entire article. Notice that the left side of the screen contains the Sage sidebar. The sidebar is always present (unless you close it), which makes it easy to jump to the news or other RSS feed item that interests you. The upper portion of the sidebar lists each individual feed that you track. The lower portion of the Sage sidebar lists each individual feed item available from the feed that is currently selected in the upper portion of the sidebar. For example, in Figure 24-10, the selected feed is from The Register, which had 15 different feed headlines. Clicking a feed headline in the list loads it into the browser window on the right.

Selecting Content for an RSS Feed

What kind of content might be appropriate to include in an RSS feed?
Structurally, any sort of list-oriented information, that is, information that can be organized into a list of hyperlinks and that people will likely find useful, is a potential candidate for inclusion in an RSS feed. In terms of content, you might include the following types of information:

■■ News and announcements about products, events, press releases, or whitepapers
■■ If your Web site (rather, the Web site you maintain) frequently updates documents, you might consider providing an RSS feed that lists new or updated documents (or individual pages)
■■ Calendars of events, such as company appearances at trade shows, user group meetings, or listings of training sessions
■■ Listings of available jobs

As a final suggestion, RSS feeds can be even more useful on a company intranet than they are on an extranet. For example, a human relations department might use an RSS feed to notify people of new or updated personnel forms. A payroll team might use an RSS feed to let people know when paychecks can be picked up or to remind employees to fill out important paperwork.

Creating the Feed File

Listing 24-1 shows a minimal RSS feed file. You can use this as a model for your own file, type in the listing yourself, or use the included feed.rss file from this chapter’s code directory on the CD-ROM.

    <?xml version="1.0"?>
    <rss version="0.91">
      <channel>
        <title>RHLNSA3 Channel</title>
        <link>http://localhost/</link>
        <description>Updates for RHLNSA3</description>
        <language>en-us</language>
        <image>
          <title>RHLNSA3 Channel</title>
          <url>http://localhost/favicon.png</url>
          <link>http://localhost/rhlnsa3/</link>
        </image>
        <item>
          <title>RHLNSA3 News: April 5, 2005</title>
          <link>http://localhost/rhlnsa3/20050405.html</link>
          <description>RSS feeds material nearly complete!</description>
        </item>
      </channel>
    </rss>

Listing 24-1 A bare-bones RSS feed

This file contains the key tags you need to create an RSS feed, which Table 24-2 describes. The <?xml version="1.0"?> line is the XML declaration, and it must be the first line in the file.

Table 24-2 Minimum Required Elements in an RSS Feed

    TAG          DESCRIPTION
    channel      Delimits the contents of a single channel
    description  Describes the channel or lists the headline for the syndication item
    image        Describes an icon or image that represents the channel
    item         Delimits a single syndication item
    language     Informs readers of the language in which the feed is written
    link         Contains a link to the channel home page or an individual syndication item
    rss          Defines the content as RSS data and specifies the version (0.91)
    title        Identifies the channel or individual syndication item

Required tags are shown in boldface. As an XML file, all of the tags in an RSS file must be terminated with matching tags (such as <title> and </title>), and the tags have to be lowercase. The version attribute of the <rss> tag is required because it enables RSS readers (usually called feed aggregators) to know which version of RSS to support and how to interpret the contents of the RSS file.

The meat of an RSS feed appears in <item> tags. The information in a feed item’s <title> tag serves as the headline, so it should be catchier than the ho-hum headline shown in Listing 24-1. Each item’s <link> contains the URL of the document containing the full scoop or other content you are trying to publicize using RSS. The text in the <description> tag might be shown as text under the headline, as pop-up text that appears if a mouse cursor hovers over the headline link, or it might be totally ignored, depending on the RSS reader in use.

Turning on an RSS Feed

Naturally, Web browsers and feed aggregators need to know that your Web site has an RSS feed and where to find it. To do this, you need to add some metadata to the headers of your Web pages. Use the HTML <link> tag to do so. The following code snippet shows a template you can use for HTML pages:

    <link rel="alternate" type="application/rss+xml" href="rssfile" title="descriptive text">

If your Web
pages are XHTML, the tag must use the implicit end tag marker, as shown in the following snippet:

    <link rel="alternate" type="application/rss+xml" href="rssfile" title="descriptive text" />

Replace rssfile and descriptive text with the name of your RSS feed file and an appropriate title, respectively. For the RSS feed file shown in Listing 24-1, and for HTML-based pages, you could use the same tag with href set to feed.rss and title set to a short description of the channel.

After you have added this text, RSS-capable applications will be aware that you provide an RSS feed. For example, if you load the page containing this text in an RSS-capable Web browser, such as Firefox, you’ll see a small icon in the lower-right corner of the window that signals an RSS feed is available. (See Figure 24-11.)

Figure 24-11 Viewing Firefox’s icon indicating a Web page has an RSS feed

Interested readers can see a slightly modified example of this feed in action at http://www.kurtwerks.com/pubs/rhlnsa3/.

Creating a simple RSS feed like the one in this section is a relatively low-impact activity. It would quickly grow to become a labor-intensive undertaking if you had to do it manually. Fortunately, there are a variety of tools that automate the creation of RSS feeds, and some content management systems even include tools to create RSS feeds automatically. Other tools exist that you can use to validate your RSS feeds for correct format. This section ends with a list of RSS creation and validation tools that you might find useful:

■■ Online RSS 0.9x Validator (http://aggregator.userland.com/validator/) checks 0.9x feeds
■■ Online RSS 1.0 Validator (ldodds.com/rss_validator/1.0/validator.html) checks 1.0 RSS feeds
■■ Orchard RSS (http://orchard.sourceforge.net/) creates feeds using Python, Perl, or C
■■ RSS Editor (http://rsseditor.mozdev.org/) is a Firefox extension for creating and updating RSS feeds
■■ RSS.py (mnot.net/python/RSS.py) uses the Python scripting language to generate and parse RSS
■■ XML::RSS (http://search.cpan.org/author/EISEN/XMLRSS/) is a Perl module for creating and parsing RSS
■■ xpath2rss (mnot.net/xpath2rss/) uses XPath expressions to “scrape” Web sites and create RSS feeds

If you would like additional tutorial information about RSS, see Reuven Lerner’s tutorial on RSS syndication, “At the Forge - Syndication with RSS,” which appeared in the print version of Linux Journal in September 2004 and is also available on the Web at Linux Journal’s Web site at www.linuxjournal.com/article/7670. Another excellent tutorial is Mark Nottingham’s “RSS Tutorial for Content Publishers and Webmasters,” available on the Web at mnot.net/rss/tutorial/. An excellent book on the subject is Hacking RSS and Atom, written by Leslie Orchard (Wiley, ISBN 0-7645-9758-2).

Adding Search Functionality

If you have more than a few pages of content on your Web site, you will need some sort of search capability to help people find the information for which they’re looking. While a site map might suffice for small sites, anything larger than a dozen pages or so needs to be searchable. Fedora Core and RHEL ship with the ht://Dig search engine installed and ready to go. This section describes how to get it going.

Getting Started with ht://Dig

ht://Dig is a complete document searching and indexing system designed for a single domain or an intranet. It is not meant to replace the big global search engines like Google, Yahoo!, or Excite. Rather, it is intended for use on single sites and domains and is especially well suited for intranets, primarily because ht://Dig was initially developed for campus use at San Diego State University. Although ht://Dig is intended for use on a small scale, the word “small” is relative; it is quite capable of searching sites or domains that comprise multiple servers and thousands of documents. ht://Dig can handle sites or domains that consist of multiple servers because it has a built-in Web spider that can traverse a site and index all the documents it encounters. ht://Dig handles thousands of documents because it uses a static search index that
is very fast. Other ht://Dig features include the following:

■ Character set collation: SGML entities such as &eacute; and ISO-Latin-1 characters can be indexed and searched.
■ Content exclusion: Content can be excluded from indexing using a standard robots.txt file, which defines files and filename patterns to exclude from searches.
■ Depth limiting: Queries can be limited to match only those documents that are a given number of links or clicks away from the initial search document.
■ Expiration notification: Maintainers of documents can be notified when a document expires by placing special meta-information inside an HTML document (using the HTML <meta> tag) that ht://Dig notices and uses to generate document expiration notices.
■ Fuzzy searching: ht://Dig can perform searches using a number of well-known search algorithms, and algorithms can be combined. The currently supported search methods include the following:
  ■ Accent stripping: Removes diacritical marks from ISO-Latin-1 characters so that, for example, e and its accented forms are considered the same letter (e) for search purposes.
  ■ Exact match: Returns results containing exact matches for the query term entered.
  ■ Metaphones: Searches for terms that sound like the query term, based on the rules of English pronunciation.
  ■ Prefixes: Searches for terms that have a matching prefix, so, for example, searching for the prefix dia matches diameter, diacritical, dialogue, diabolical, and diadem.
  ■ Soundex: Searches for terms that sound like the query term.
  ■ Stem searches: Searches for variants of a search term that use the same root word but different stems.
  ■ Substrings: Searches for terms that begin with a specified substring, so searching for phon* will match phone, phonetic, and phonics but not telephone.
  ■ Synonyms: Searches for words that mean the same thing as the query term, causing ht://Dig to return results that include synonyms.
■ Keyword optimization: You can add keywords to HTML documents to assist the search engine using the HTML <meta> tag.
■ Output customization: Search results can be tailored and customized using HTML templates.
■ Pattern matching: You can limit a search to specific parts of the search database by creating a query that returns only those documents whose URLs match a given pattern.
■ Privacy protection: A protected server can be indexed by instructing ht://Dig to use a given username and password when indexing protected servers or protected areas of public servers.

As you can see, ht://Dig is a full-featured search engine. It is also easy to use and maintain, so try it and see if it will meet your needs. To get started, you need to create the initial search database and then create some customized indexes to facilitate fast searching. Fortunately, you do not have to do this manually. ht://Dig uses a script named rundig to automate database creation and index maintenance. So, as root, execute the following command:

# /usr/bin/rundig

rundig works by reading the ht://Dig configuration file, /etc/htdig/htdig.conf, and spidering the site specified by the start_url variable. In the stock installation, start_url is http://localhost. rundig finds and follows each hyperlink in HTML files (that is, files with the extensions html, htm, or shtml) that points to documents in the domain it is indexing, creating a database of words in each document and another database of the URLs. Additional steps massage these databases into a searchable format and, optionally, create various indexes for so-called fuzzy searches, such as soundex and metaphone searches (searches for words that sound like other words) and prefix matches (searches for words that contain specified prefixes).

If you have a lot of content in files that are not straight HTML, such as text files, files created with SSI, and so forth, start_url can also point to a file that contains a list of URLs to check. For example,
consider the following start_url statement in /etc/htdig/htdig.conf:

start_url: http://www.example.com/digme.html

This directive tells ht://Dig to start indexing at the URL http://www.example.com/digme.html. digme.html looks like this:

Issue Tracker Source Files 5007 5041 5042 5043 5044 5045

ht://Dig will index the contents of each file linked in digme.html.

If you have a lot of content in your Web document trees, the database creation and indexing process can take a while. When the command prompt returns, you can test the search engine. To do so, point your Web browser at http://localhost/htdig/ (replace localhost with the name of your server if you are not accessing it locally). You should see a screen that resembles Figure 24-12. Type in a search term (try documentroot) and press Enter or click the Search button. The search results should resemble Figure 24-13.

Figure 24-12 Viewing the ht://Dig search page

Figure 24-13 ht://Dig search results for the word “documentroot.”

The search results are shown in a short format so you can see the number of matches. Notice that the search results are case-insensitive.

After you satisfy yourself that the search engine is working, you will want or need to update the search databases as content is added to the server. The easiest way to accomplish this is to execute rundig on a periodic basis using cron. How often you update the indexes is up to you, but it should reflect the frequency with which content on the server changes. Listing 24-2 shows a sample script you can use, rundig.cron.

#!/bin/sh
# rundig.cron - Update ht://Dig search database
/usr/bin/rundig -s

Listing 24-2 Cron script to execute rundig

The -s option causes rundig to display some runtime and database statistics when it is finished. The output might resemble the following:

htdig: Run complete
htdig: server seen:
htdig: localhost:80 documents

HTTP statistics
===============
Persistent connections : Yes
HEAD call before GET : Yes
Connections opened :
Connections closed :
Changes of server :
HTTP Requests :
HTTP KBytes requested : 5.68164

Make the script executable (chmod 755 rundig.cron) and place it in /etc/cron.daily if you want to run it every day, /etc/cron.weekly if you want to run it once a week, or /etc/cron.monthly if you want to run it on a monthly basis.

After you have a cron job in place to update ht://Dig’s database and search indexes, you are done. Check the output of the cron job periodically (it will be mailed to the root user) to make sure that the index is being updated properly. Beyond that, ht://Dig takes care of itself, which is just the arrangement a busy system administrator likes.

NOTE For more information about ht://Dig, visit its project Web page, http://www.htdig.org/.

Summary

The days in which you could just slap together a simple Web server and cross the “Install Web server” task off your project list are long gone. You will likely be asked to add features that build on the capabilities provided by your Web server, such as providing mailing list services or creating a browser-based interface for email. GNU Mailman makes it child’s play to provide mailing list services, and SquirrelMail is a popular Fedora Core- and RHEL-ready browser-based email solution. Creating an RSS feed for your Web site is simple to do if you follow the instructions in this chapter. Fedora Core and RHEL also come with ready-to-run Web site search engine features; you just need to know what they are, where to find them, and how to enable them.

CHAPTER 25
Optimizing Internet Services

IN THIS CHAPTER
■ Optimizing LDAP Services
■ Optimizing DNS Services
■ Optimizing Mail Services
■ Optimizing FTP Services
■ Optimizing Web Services

This chapter offers some optimization techniques you can apply to the servers and services described in the previous chapters. Alas, we can’t offer sure-fire, foolproof methods for turning your server into, say, a
mail-serving speed daemon. We’re fresh out of pixie dust and magic potions. Indeed, listen with a healthy dose of skepticism to anyone who claims to have the One True Optimization Method. Server optimization requires analysis to narrow the problem domain, diagnosis to identify the performance problem, and experimentation to evaluate the effectiveness of your optimization.

Naturally, though, it helps to know what kinds of tweaks and changes are most appropriate for a given application or a particular situation. In the case of LDAP, for example, the directory layout can have a dramatic impact on overall LDAP performance. DNS can be optimized by having LAN clients run local caching servers, by configuring multiple slave servers, by directing local lookups to internal servers first, by zone file tweaks, and so on. Mail servers tend to be I/O bound, and Web servers respond especially well to additional memory.

In general, anything that improves overall system performance will improve Internet services performance as a side effect. Disabling unnecessary services is a standard technique and hopefully one that you have already implemented. Centralizing Internet services on a machine that is not used by regular users is another general performance tweak for servers. Also, getting a fatter pipe or simply ...

... standalone workstation, which is a system that is connected directly to the Internet and that has some sort of domain name resolution running, you want mail to that machine and to localhost (and/or ...

... unencumbered by required directory structures and system binaries
■ Support for hidden files and directories
■ Self-contained and does not need to use system binaries or libraries, reducing ...

... sftp rather than ftp, and the set of supported commands is more limited than the set of standard FTP commands. One important difference between clear-text FTP and secure FTP is that sftp does not