8.1.1 SEO Strengths and Weaknesses

Out of the box, WordPress provides great flexibility in terms of organizing and managing your blog's content. Much of this flexibility comes by way of WordPress' category and tag architecture. Each and every post created on your blog may be assigned to any number of both categories and tags. Categories are meant to classify content according to broad definitions, while tags are used to classify content more specifically. For example, a post about your favorite movies may be placed in the "Favorite Movies" category, while being tagged for some of the movies featured in the article: "Star Wars," "The Matrix," and "Blade Runner."

Beyond this central organizing principle, WordPress brings with it many strengths and weaknesses in terms of how content is organized and made available to both users and the search engines. Let's examine some of these SEO factors before digging into the fine art of optimizing your WordPress-powered site for the search engines.

8.1.2 Strong Focus on Content

Content, as they say, is king. The Web exists because of it. Users are searching for it. Search engines are built on it. In order to succeed on the Web, your site should be focused on delivering useful content above all else. Awesomely, one of the main goals of WordPress is to make publishing content as easy as possible. Once WordPress is set up, getting your content online happens as fast as you can create it. On the front end, there are hundreds of top-quality themes available, each focused on organizing and presenting your content with both users and search engines in mind.

8.1.3 Built-In "nofollow" Comment Links

Perhaps not as useful as originally conceived, nofollow attributes placed on commentator links have long been thought of as an effective method of improving the SEO-quality of WordPress-powered sites. For those of you who may be unfamiliar with the whole "nofollow" thing, for now suffice it to say that nofollow attributes are placed on links to prevent search engines from following those links to their targets. Originally, this was intended to serve as a way to conserve valuable page rank, but after it was revealed that this method no longer works, nofollow commentator links may be a moot point. We'll discuss this more in depth later on in the chapter.

WordPress + nofollow
Check out Chapter 7.7.3 for more information on nofollow, WordPress, and the search engines.

8.1.4 Duplicate Content Issues

While the organizational flexibility of WordPress is great for managing content, it also comes with a price: duplicate content. Duplicate content is essentially identical content appearing in more than one place on the Web. Search engines such as Google are reported to penalize pages or sites associated with too much duplicate content. Returning to our movie example for a moment, our WordPress-powered site may suffer in the search rankings because identical copies of our movie article are appearing at each of the following URLs:

• original article -> http://example.com/blog/my-favorite-movies/
• category view -> http://example.com/blog/category/favorite-movies/
• star-wars tag view -> http://example.com/blog/tag/star-wars/
• the-matrix tag view -> http://example.com/blog/tag/the-matrix/
• blade-runner tag view -> http://example.com/blog/tag/blade-runner/

Yikes!
And if that weren't bad enough, we also see the exact same post content appearing at these URLs:

• daily archive view -> http://example.com/blog/2009/02/02/
• monthly archive view -> http://example.com/blog/2009/02/
• yearly archive view -> http://example.com/blog/2009/
• author archive view -> http://example.com/blog/author/your-name/

Depending on your particular WordPress theme, this situation could be even worse. By default, all of your posts are available in identical form at each of the previous types of URLs. Definitely not good from a search-engine point of view. Especially if you are the type of blogger to make heavy use of tags, the number of duplicated posts could be staggering.

8.2.1 Controlling Duplicate Content

Fortunately, WordPress' poor handling of duplicate content is easily fixed. In fact, there are several ways of doing so. In a nutshell, we have plenty of tools and techniques at our disposal for winning the war on duplicate content:

• meta nofollow tags
• meta noindex tags
• nofollow attributes
• robots.txt directives
• canonical link elements
• excerpts for posts

So what does each of these sculpting tools accomplish, and how does it help us eliminate duplicate content? Let's take a look at each of them.

8.2.2 Meta noindex and nofollow Tags

Meta noindex and nofollow tags are <meta> elements located in the <head> section of your WordPress pages. For example, in your blog's "header.php" file, you may find something like this:

<meta name="googlebot" content="index,archive,follow" />
<meta name="msnbot" content="all,index,follow" />
<meta name="robots" content="all,index,follow" />

This code tells search engines – specifically, Google, MSN/Bing, and any other compliant search engine – that the entire page should be indexed, followed, and archived. This is great for single post pages (i.e., the actual "My Favorite Movies" article posted in our example), but we can use different parameters within these elements to tell the search engines not to index, follow, or archive our web pages. Ideally, most bloggers want their main articles to appear in the search results. The duplicate content appearing on the other types of pages may be controlled with this set of meta tags:

<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />

Here, we are telling the search engines not to include the page in the search results, while at the same time telling them that it is okay to crawl the page and follow the links included on the page. This prevents the page from appearing as duplicate content while allowing link equity to be distributed throughout the linked pages. Incidentally, we may also tell the search engines to neither index nor follow anything on the page by changing our code to this:

<meta name="googlebot" content="noindex,noarchive,nofollow" />
<meta name="robots" content="noindex,nofollow" />
<meta name="msnbot" content="noindex,nofollow" />

Love Juice
Anyone dipping into the murky waters of SEO will inevitably discover that there are many ways to refer to the SEO value of web pages. PR, page rank, link equity, link juice, page juice, link love, love juice, rank juice, and just about any other combination of these terms is known to refer to the same thing: the success of a web page in the search engines.

So, given these meta tags, what is the best way to use this method to control duplicate content on your WordPress-powered site? We're glad you asked.
By using a strategic set of conditional tags in the "header.php" file for your theme, it is possible to address search-engine behavior for virtually all types of pages, thereby enabling you to fine-tune the indexing and crawling of your site's content. To see how this is done, consider the following code:

<?php if((is_home() && (!$paged || $paged == 1)) || is_single()) { ?>
<meta name="googlebot" content="index,archive,follow,noodp" />
<meta name="robots" content="all,index,follow" />
<meta name="msnbot" content="all,index,follow" />
<?php } else { ?>
<meta name="googlebot" content="noindex,noarchive,follow,noodp" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>

The conditional PHP tags used in this example are effectively saying: "If the current page is the home page or a single post page, then allow the search engines to both index and follow the content of the page; otherwise, since the page is neither the home page nor a single post page, it is probably a tag page, category page, or other archive page and thus serves as duplicate content; therefore tell the search engines to follow all links but not index the content."

Of course, it isn't always a bad thing to have some peripheral pages indexed in the search engines. In addition to having their home page and single pages indexed (as in our example above), many people prefer to have either tag pages or category pages (or both!) indexed in the search engines as well. In reality, the types of pages that you want indexed are completely up to you and your personal SEO strategy.

Tell Search Engines not to Index a Specific Post
In this section we see how to disable search-engine indexing for different categories, archives, pages, and other page views, but what if we want to prevent indexing of only one specific post? There are several SEO plugins that enable this functionality, but you don't really need one to do it. All you need to do is get the ID of the post for which you would like to disable indexing. Then, open your theme's header.php file and place this snippet within the <head> section:

<?php if ($post->ID == 77) { // your post ID number
	echo '<meta name="robots" content="noindex,noarchive" />';
} ?>

Change the ID from "77" to the ID of your post and you're done! With this in place, compliant search engines such as Google and MSN/Bing will neither index nor archive the specified post (ID #77 in this example).

Prevent Duplicate Content Caused by Paginated Comments
Since WordPress 2.7, comments may be paginated, such that "x" number of comments appear on each page. While this is a step in the right direction, there may be a duplicate content issue resulting from the fact that your post content will appear on every page of your paginated comments. To resolve this issue, place the following code into your functions.php file:

// prevent duplicate content for comments
function noDuplicateContentforComments() {
	global $cpage, $post;
	if($cpage > 1) {
		echo "\n".'<link rel="canonical" href="'.get_permalink($post->ID).'" />'."\n";
	}
}
add_action('wp_head', 'noDuplicateContentforComments');

This code will generate canonical <head> links for all of your paginated comments. The search engines will then use this information to ensure that the original post permalink is attributed as the actual article.

Further Information
For more information and techniques on paged comments and duplicate content, check out Chapter 7.3.3.
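Prefer to keep your template files free of SEO logic? The same conditional strategy can be hooked into wp_head from your theme's functions.php file, much like the paginated-comments function above. Here is a minimal sketch under that assumption; the function name is our own invention, and the conditions should be adjusted to match your personal indexing strategy:

// hypothetical example: print a robots meta tag via the wp_head hook
// instead of hardcoding it in header.php
function myRobotsMetaTags() {
	if ((is_home() && !is_paged()) || is_single()) {
		// first page of the blog index, or a single-post view: index it
		echo '<meta name="robots" content="all,index,follow" />'."\n";
	} else {
		// tag, category, and archive views: crawl the links, skip the index
		echo '<meta name="robots" content="noindex,follow" />'."\n";
	}
}
add_action('wp_head', 'myRobotsMetaTags');

Note that is_paged() replaces the global $paged check here; it returns true on page two and beyond of a multi-page listing, which is all the original condition was testing for. If you go this route, remove any hardcoded robots meta tags from header.php so the two approaches don't emit conflicting directives.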
So, if you would like to include both tag and category pages in the search results, you would simply modify the first line of our previous example like so:

<?php if((is_home() && (!$paged || $paged == 1)) || is_category() || is_tag() || is_single()) { ?>

Exclude Admin Pages from Search Engines
You may also replace or add other types of pages to your meta-tag strategy by using any of the following template tags:

• is_home()
• is_page()
• is_admin()
• is_author()
• is_date()
• is_search()
• is_404()
• is_paged()
• is_category()
• is_tag()

To target any archive page that is being displayed, use is_archive(). Remember, there are more than just date-based archives: other types of archive pages include sequential displays of category, tag, and author posts, and is_archive() will target all of these page types. And of course there are many more of these conditional tags available to WordPress. See the WordPress Codex for more information: http://digwp.com/u/3

8.2.3 Nofollow Attributes

Another useful tool in the fight against duplicate WordPress content is the controversial "nofollow" attribute. The nofollow attribute is placed into hyperlinks like this:

<a href="http://domain.tld/path/target/" rel="nofollow">This is a "nofollow" hyperlink</a>

Links containing the nofollow attribute will not be "followed" by Google, but may still be indexed in the search results if linked to from another source. Because such links are not followed, the nofollow attribute is an effective tool for reducing and preventing duplicate content.

For an example of how nofollow can be used to help eliminate duplicate content, let's look at a typical author archive. In the author-archive page view, you will find exact replicas of your original posts (unless you are using excerpts). This duplicate content is easily avoided by simply "nofollowing" any links in your theme that point to the author-archive page view. Here is how the nofollow link would appear in your theme files:

<a href="http://domain.tld/author/author-name/" rel="nofollow">This link will not be followed to the author archive</a>
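If your theme builds the author link with template tags rather than a hardcoded URL, you can assemble the same nofollow link dynamically. A minimal sketch for use inside the Loop – the surrounding markup is our own, since core tags such as the_author_posts_link() provide no parameter for adding a rel attribute:

<?php
// print a nofollow link to the current post's author archive
$author_id  = get_the_author_meta('ID');
$author_url = get_author_posts_url($author_id);
?>
<a href="<?php echo $author_url; ?>" rel="nofollow"><?php the_author(); ?></a>

For the technique to be effective, every theme link pointing at the author archive – bylines, sidebar widgets, and so on – needs the same treatment.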
Granted, using nofollow links to control duplicate content is not 100% foolproof. If the author-archive URL is linked to from any "followed" links, that page may still be indexed in the search engines. But for pages such as the author archives that probably aren't linked to from anywhere else, nofollow may help prevent a potential duplicate-content issue.

Exclude Specific Pages from Search Engines
Ever wanted to keep a few specific pages out of the search engines? Here's how to do it using WordPress' excellent conditional tag functionality. Just place your choice of these snippets into the <head> section of your header.php file and all compliant search engines (e.g., Google, MSN/Bing, Yahoo!, et al) will avoid the specified page(s) like the plague. This menu of snippets provides many specific-case scenarios that may be easily modified to suit your needs.

Exclude a specific post:

<?php if(is_single('17')) { // your post ID number ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>

Exclude a specific page:

<?php if(is_page('17')) { // your page ID number ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>

Exclude a specific category:

<?php if(is_category('17')) { // your category ID number ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>

Exclude a specific tag:

<?php if(is_tag('personal')) { // your tag name ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>

Exclude multiple tags:

<?php if(is_tag(array('personal','family','photos'))) { ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>

Exclude posts tagged with certain tag(s):

<?php if(has_tag(array('personal','family','photos'))) { ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>
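One more for the menu: as the is_tag() example above demonstrates, these conditional tags also accept arrays, so several posts (or pages, or categories) may be excluded with a single test. A sketch using made-up ID numbers:

<?php if(is_single(array(17,19,23))) { // your post ID numbers ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>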
8.2.4 Robots.txt Directives

Another useful and often overlooked method of controlling duplicate content involves the implementation of a robots.txt file for your site. Robots.txt files are plain text files generally placed within the root directory of your domain:

http://domain.tld/robots.txt

Robots.txt files contain individual lines of well-established "robots" directives that serve to control the crawling and indexing of various directories and pages. Search engines such as Google and MSN that "obey" robots directives periodically read the robots.txt file before crawling your site. During the subsequent crawl of your site, any URLs forbidden in the robots.txt file will not be crawled or indexed.

Keep in mind, however, that pages prohibited via robots directives continue to consume page rank. So, duplicate-content pages removed via robots.txt may still be devaluing your key pages by accepting any link equity that is passed via incoming links. Even so, with other measures in place, taking advantage of robots.txt directives is an excellent way to provide another layer of protection against duplicate content and unwanted indexing by the search engines.

Yahoo! Disobeys
Sadly, when it comes to search engines that comply with robots.txt directives, Yahoo! falls far short: http://digwp.com/u/218

Not Foolproof
Pages blocked by robots.txt directives may still appear within the index if linked to by "trusted, third-party sources." http://digwp.com/u/219

Let's look at an example of how to make a useful robots.txt file. First, consider the default directory structure of a WordPress installation: virtually everything in the root directory either begins with "wp-" or ends in ".php". For a typical WordPress installation located in the root directory, there is no reason for search engines to index URLs containing any of the core WordPress files. Note that a robots.txt file must declare which crawlers its rules apply to, so we begin ours with a User-agent line, followed by our first two directives:

User-agent: *
Disallow: /wp-*
Disallow: *.php

These two Disallow lines tell compliant search engines to ignore any URL beginning with "http://domain.tld/wp-" or ending with ".php". Thus, all of our core WordPress files are restricted and will not be crawled by compliant search engines. Now, consider some of the types of WordPress-generated URLs that we don't want the search engines to follow or index:

http://domain.tld/feed/ - your site's main feed
http://domain.tld/comments/feed/ - your site's comments feed
http://domain.tld/other/feeds/ - every other type of feed
http://domain.tld/post/trackback/ - every trackback URL on your site
http://domain.tld/2008/08/08/ - archive views for every day
http://domain.tld/2008/08/ - archive views for every month
http://domain.tld/2008/ - archive views for every year

Of course, there are other types of pages which we may also wish to exclude from the search engines, such as category and tag archives, but you get the idea. To prohibit robots-compliant search engines from accessing and indexing the miscellaneous pages listed above, we add these directives to our robots.txt file:

Disallow: */feed*
Disallow: */trackback*
Disallow: /20*
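Putting the pieces together, the robots.txt file sketched throughout this section would look something like the following. Treat it as a starting point rather than a definitive solution: the wildcard (*) syntax is a nonstandard extension honored by Google and MSN/Bing but ignored by some lesser crawlers, the "/20*" pattern assumes date-based permalinks beginning with the year, and "/wp-*" also blocks "/wp-content/", which contains your uploaded images and theme files.

User-agent: *
Disallow: /wp-*
Disallow: *.php
Disallow: */feed*
Disallow: */trackback*
Disallow: /20*

As always, adjust the directives to suit your own permalink structure and indexing strategy.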