
Beginning PHP and MySQL E-Commerce: From Novice to Professional, part 4


CHAPTER 7 ■ SEARCH ENGINE OPTIMIZATION

    # Specify the folder in which the application resides.
    # Use / if the application is in the root.
    RewriteBase /tshirtshop

    # Rewrite to correct domain to avoid canonicalization problems
    # RewriteCond %{HTTP_HOST} !^www\.example\.com
    # RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

    # Rewrite URLs ending in /index.php or /index.html to /
    RewriteCond %{THE_REQUEST} ^GET\ .*/index\.(php|html?)\ HTTP
    RewriteRule ^(.*)index\.(php|html?)$ $1 [R=301,L]

    # Rewrite category pages
    RewriteRule ^.*-d([0-9]+)/.*-c([0-9]+)/page-([0-9]+)/?$ index.php?DepartmentId=$1&CategoryId=$2&Page=$3 [L]
    RewriteRule ^.*-d([0-9]+)/.*-c([0-9]+)/?$ index.php?DepartmentId=$1&CategoryId=$2 [L]

    # Rewrite department pages
    RewriteRule ^.*-d([0-9]+)/page-([0-9]+)/?$ index.php?DepartmentId=$1&Page=$2 [L]
    RewriteRule ^.*-d([0-9]+)/?$ index.php?DepartmentId=$1 [L]

    # Rewrite subpages of the home page
    RewriteRule ^page-([0-9]+)/?$ index.php?Page=$1 [L]

    # Rewrite product details pages
    RewriteRule ^.*-p([0-9]+)/?$ index.php?ProductId=$1 [L]
    </IfModule>

■ Tip If you don’t have a friendly code editor, creating a file that doesn’t have a name but just an extension, such as .htaccess, can prove to be problematic in Windows. The easiest way to create this file is to open Notepad, type the contents, go to Save As, and type ".htaccess" for the file name, including the quotes. The quotes prevent the editor from automatically appending the default file extension, such as .txt for Notepad.

3. At this moment, your web site should correctly support keyword-rich URLs, in the form described prior to starting this exercise. For example, try loading http://localhost/tshirtshop/nature-d2/. The result should resemble the page shown in Figure 7-2.

Figure 7-2. Testing keyword-rich URLs

How It Works: Supporting Keyword-Rich URLs

At this moment, you can test all kinds of keyword-rich URLs that are currently known by your web site: department pages and subpages, category pages and subpages, the front page and its subpages, and product details links. Note, however, that the links currently generated by your web site are still old, dynamic URLs. Updating the links in your site will be the subject of the next exercise.

The core of the functionality you’ve just implemented lies in the .htaccess file. We’ve used this Apache folder-based configuration file to store the rewriting rules for mod_rewrite. The httpd.conf Apache configuration file can also be used, but we’ve chosen .htaccess because many web hosting scenarios will not allow you to modify the httpd.conf file. Also, modifying .htaccess doesn’t require you to restart the web server for the new settings to take effect, because the file is parsed on every request, which makes it ideal for development purposes.

The first command in .htaccess is the one that enables the rewriting engine. If you didn’t configure mod_rewrite correctly, this line will cause an error:

    RewriteEngine On

Next, we used the RewriteBase command to specify the name of the tshirtshop folder. Note that if you keep your application in the root folder, you should replace /tshirtshop with /.

    RewriteBase /tshirtshop

Then, the real fun begins. A number of RewriteRule commands follow, which describe what URLs should be rewritten and to what they should be rewritten. Sometimes, the RewriteRule commands are accompanied by RewriteCond, which specifies a condition that must be met in order for the following RewriteRule command to be executed.

A RewriteRule command contains at least two parameters.
The first string that follows RewriteRule is a regular expression that describes the structure of the matching incoming URLs. The second describes what the URL should be rewritten to.

mod_rewrite and Regular Expressions

Regular expressions are one of those topics that programmers tend to either love or hate. A regular expression, commonly referred to as regex, is a text string that uses a special format to describe a text pattern. Regular expressions are used to define rules that match or transform groups of strings, and they represent one of the most powerful text manipulation tools available today. Find a few details about them at the Wikipedia page at http://en.wikipedia.org/wiki/Regular_expression.

Regular expressions are particularly useful in circumstances when you need to manipulate strings that don’t have a well-defined format (as XML documents have, for example) and cannot be parsed or modified using more specialized techniques. For example, regular expressions can be used to extract or validate e-mail addresses, find valid dates in strings, remove duplicate lines of text, find the number of times a word or a letter appears in a phrase, find or validate IP addresses, and so on.

In the previous exercise, you used mod_rewrite rules, using regular expressions, to match incoming keyword-rich URLs and obtain their rewritten, dynamic versions. A bit later in this chapter, we’ll use a regular expression that prepares a string for inclusion in the URL, by replacing unsupported characters with dashes and eliminating duplicate separation characters.

Regular expressions are supported by many languages and tools, including the PHP language and the mod_rewrite Apache module, and the implementations are similar. A regular expression that works in PHP will work in Java or C# without modifications most of the time.
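
As a rough illustration of how the same pattern syntax carries over from mod_rewrite to PHP, the sketch below applies the product details pattern from the .htaccess file with PHP’s preg_match(). The sample path and the variable names are assumptions made for this example; mod_rewrite does its own matching inside Apache, so this only imitates the idea.

```php
<?php
// Illustrative only: the pattern of the product details RewriteRule,
// wrapped in # delimiters as PHP's PCRE functions require.
$pattern = '#^.*-p([0-9]+)/?$#';

// Hypothetical request path, relative to the RewriteBase folder.
$path = 'visit-the-zoo-p36/';

if (preg_match($pattern, $path, $matches)) {
  // $matches[0] holds the whole match; $matches[1] holds the first
  // captured group, which mod_rewrite would expose as $1.
  $productId = (int) $matches[1];
  echo 'index.php?ProductId=' . $productId . "\n";
} else {
  echo "No match\n";
}
```

With the sample path above, the script prints index.php?ProductId=36.
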
When you want to do an operation based on regular expressions, you usually must provide at least three key elements:

• The source string that needs to be parsed or manipulated
• The regular expression to be applied on the source string
• The kind of operation to be performed, which can be either obtaining the matching substrings or replacing them with something else

Regular expressions use a special syntax based on regular characters, which are interpreted literally, and metacharacters, which have special matching properties. A regular character in a regular expression matches the same character in the source string, and a sequence of such characters matches the same sequence in the source string. This is similar to searching for substrings in a string. For example, if you match “or” in “favorite color”, you’ll find two matches for it.

A regular expression can contain metacharacters, which have special properties, and it’s their power and flexibility that makes regular expressions so useful. For example, the question mark (?) metacharacter specifies that the preceding character is optional. So if you want to match “color” and “colour”, your regular expression would be colou?r.

As pointed out earlier, regular expressions can become extremely complex when you get into their more subtle details. In this section, you’ll find explanations for the regular expressions we’re using, and we suggest that you continue your regex training using a specialized book or tutorial. Table 7-2 contains the description of the most common regular expression metacharacters. You can use this table as a reference for understanding the rewrite rules.

Table 7-2. Metacharacters Commonly Used in Regular Expressions

Metacharacter  Description
^              Matches the beginning of the line. In our case, it will always match the beginning of the URL. The domain name isn’t considered part of the URL, as far as RewriteRule is concerned. It is useful to think of ^ as anchoring the characters that follow to the beginning of the string, that is, asserting that they are the first part.
.              Matches any single character.
*              Specifies that the preceding character or expression can be repeated zero or more times, that is, not at all to infinity.
+              Specifies that the preceding character or expression can be repeated one or more times. In other words, the preceding character or expression must match at least once.
?              Specifies that the preceding character or expression can be repeated zero or one time. In other words, the preceding character or expression is optional.
{m,n}          Specifies that the preceding character or expression can be repeated between m and n times; m and n are integers, and m must not be greater than n.
( )            The parentheses are used to define a captured expression. The string matching the expression between parentheses can then be read as a variable. The parentheses can also be used to group the contents therein, as in mathematics, and operators such as *, +, or ? can then be applied to the resulting expression.
[ ]            Used to define a character class. For example, [abc] will match any of the characters a, b, or c. The hyphen character (-) can be used to define a range of characters. For example, [a-z] matches any lowercase letter. If the hyphen is meant to be interpreted literally, it should be the last character before the closing bracket, ], or the first one in the class. Many metacharacters lose their special function when enclosed between brackets and are interpreted literally.
[^ ]           Similar to [ ], except it matches everything except the mentioned character class. For example, [^a-c] matches all characters except a, b, and c.
$              Matches the end of the line. In our case, it will always match the end of the URL. It is useful to think of it as anchoring the previous characters to the end of the string, that is, asserting that they are the last part.
\              The backslash is used to escape the character that follows. It is used to escape metacharacters when you need them to be taken for their literal value, rather than their special meaning. For example, \. will match a dot, rather than any character (the typical meaning of the dot in a regular expression). The backslash can also escape itself, so if you want to match C:\Windows, you’ll need to refer to it as C:\\Windows.

To understand how these metacharacters work in practice, let’s analyze one of the rewrite rules in TShirtShop: the one that rewrites category page URLs. For rewriting category pages, we have two rules: one that handles paged categories and one that handles nonpaged categories. The following rule rewrites categories with pages; the regular expression is the first argument of RewriteRule:

    # Rewrite category pages
    RewriteRule ^.*-d([0-9]+)/.*-c([0-9]+)/page-([0-9]+)/?$ index.php?DepartmentId=$1&CategoryId=$2&Page=$3 [L]

This regular expression is intended to match URLs such as http://localhost/tshirtshop/regional-d1/french-c1/page-2 and extract the ID of the department, the ID of the category, and the page number from these URLs. In plain English, the rule searches for strings that start with some characters followed by -d and a number (which is the department ID), followed by a forward slash, some other characters, -c and another number (which is the category ID), followed by /page- and a number, which is the page number.

Using Table 7-2 as a reference, let’s analyze the regular expression technically. The expression starts with the ^ character, matching the beginning of the requested URL (the URL doesn’t include the domain name).
The characters .* match any string of zero or more characters, because the dot means any character, and the asterisk means that the preceding character or expression (which is the dot) can be repeated zero or more times.

The next characters, -d([0-9]+), extract the ID of the department. The [0-9] bit matches any character between 0 and 9 (that is, any digit), and the + that follows indicates that the pattern can repeat one or more times, so you can have a multidigit number rather than just a single digit. The enclosing parentheses around [0-9]+ indicate that the regular expression engine should store the matching string (which will be the department ID) inside a variable called $1. You’ll need this variable to compose the rewritten URL.

The same principle is used to save the category ID and the page number into the $2 and $3 variables. Finally, you have /?, which specifies that the URL can end with a slash, but the slash is optional. The regular expression ends with $, which matches the end of the string.

■ Note When you need to use symbols that have metacharacter significance as their literal values, you need to escape them with a backslash. For example, if you want to match index.php, the regular expression should read index\.php. The \ is the escaping character, which indicates that the dot should be taken as a literal dot, not as any character (which is the significance of the dot metacharacter).

The second argument of RewriteRule, index.php?DepartmentId=$1&CategoryId=$2&Page=$3, plugs in the variables that you extracted using the regular expression into the rewritten URL. The $1, $2, and $3 variables are replaced by the values supplied by the regular expression, and the URL is loaded by our application.

A rewrite rule can also contain a third argument, which is formed of special flags that affect how the rewrite is handled. These arguments are specific to the RewriteRule command and aren’t related to regular expressions.
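
To make the role of $1, $2, and $3 more concrete, here is a minimal PHP sketch that imitates the category and department rules with preg_replace(), which expands the same $n references in its replacement string. The REWRITE_RULES constant and the rewritePath() helper are hypothetical names invented for this example, and returning at the first matching rule is only an approximation of what the [L] flag on each rule does; none of this is how Apache implements mod_rewrite.

```php
<?php
// A sketch, not Apache's implementation: each entry pairs a pattern from
// the .htaccess file with its substitution string. Single quotes keep
// $1, $2 and $3 literal so that preg_replace() can expand them.
const REWRITE_RULES = [
  '#^.*-d([0-9]+)/.*-c([0-9]+)/page-([0-9]+)/?$#' =>
    'index.php?DepartmentId=$1&CategoryId=$2&Page=$3',
  '#^.*-d([0-9]+)/.*-c([0-9]+)/?$#' =>
    'index.php?DepartmentId=$1&CategoryId=$2',
  '#^.*-d([0-9]+)/page-([0-9]+)/?$#' =>
    'index.php?DepartmentId=$1&Page=$2',
  '#^.*-d([0-9]+)/?$#' =>
    'index.php?DepartmentId=$1',
];

// Hypothetical helper: returns the rewritten URL for the first rule that
// matches, or null when no rule applies.
function rewritePath(string $path): ?string
{
  foreach (REWRITE_RULES as $pattern => $substitution) {
    if (preg_match($pattern, $path)) {
      return preg_replace($pattern, $substitution, $path);
    }
  }
  return null;
}

echo rewritePath('regional-d1/french-c1/page-2'), "\n";
echo rewritePath('regional-d1/french-c1/'), "\n";
echo rewritePath('nature-d2/'), "\n";
```

The three calls print index.php?DepartmentId=1&CategoryId=1&Page=2, index.php?DepartmentId=1&CategoryId=1, and index.php?DepartmentId=2. The order of the rules matters in this sketch just as it does in the configuration file, because the more specific paged rule has to be tried before the nonpaged one.
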
Table 7-3 lists the possible RewriteRule flags. These rewrite flags must always be placed in square brackets at the end of an individual rule.

Table 7-3. RewriteRule Options

RewriteRule Option  Significance  Description
R                   Redirect      Sends an HTTP redirect.
F                   Forbidden     Forbids access to the URL.
G                   Gone          Marks the URL as gone.
P                   Proxy         Passes the URL to mod_proxy.
L                   Last          Stops processing further rules.
N                   Next          Starts processing again from the first rule, but using the current rewritten URL.
C                   Chain         Links the current rule with the following one.
T                   Type          Forces the mentioned MIME type.
NS                  Nosubreq      Applies only if no internal subrequest is performed.
NC                  Nocase        URL matching is case insensitive.
QSA                 Qsappend      Appends a query string part to the new URL instead of replacing it.
PT                  Passthrough   Passes the rewritten URL to another Apache module for further processing.
S                   Skip          Skips the next rule.
E                   Env           Sets an environment variable.

RewriteRule commands are processed in sequential order as they are written in the configuration file. If you want to make sure that a rule is the last one processed in case a match is found for it, you need to use the [L] flag. This flag is particularly useful if you have a long list of RewriteRule commands, because using [L] improves performance and prevents mod_rewrite from processing all the RewriteRule commands that follow once a match is found. This is usually what you want regardless.

Our final note on the .htaccess rules regards the following code:

    # Rewrite to correct domain to avoid canonicalization problems
    # RewriteCond %{HTTP_HOST} !^www\.example\.com
    # RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

As you can see, the RewriteCond and RewriteRule commands are commented out using the # character.
We commented these lines out because you should change www.example.com to the location of your web site before uncommenting them (while working on localhost, leave these rules commented out).

RewriteCond is a mod_rewrite command that places a condition on the rule that follows. In this case, you’re interested in verifying that the site has been accessed through www.example.com. If it hasn’t, you do a 301 redirect to www.example.com. This technique implements domain name canonicalization. If your site can be accessed through multiple domain names (such as www.example.com and example.com), establish one of them as the main domain and redirect all the others to it, avoiding duplicate content penalties from the search engines. You’ll learn more about 301 redirects a bit later in this chapter.

Building Keyword-Rich URLs

In the previous exercise, you achieved a great thing: you’ve started supporting keyword-rich URLs in TShirtShop! However, note that

• Your site supports dynamic URLs as well.
• All links in your web site use the dynamic versions of the URLs.

With these two drawbacks, the mere fact that we do support keyword-rich URLs doesn’t bring any significant benefits. This leads us to a second exercise related to our URLs. This time, we’ll change the dynamic links in our site to keyword-rich URLs.

In the earlier chapters, we’ve been wise enough to use a centralized class named Link that generates all of the site’s links. This means that, now, updating all the links in our site is just a matter of updating that Link class. We’ll also need to build some data tier and business tier infrastructure to support the new functionality, which consists of methods that return the name of a department, category, or product if we supply the ID.

Exercise: Generating Keyword-Rich URLs

1. Use phpMyAdmin to connect to your tshirtshop database, and execute the following code, which creates three stored procedures. These are simple procedures that return the name of a department, a category, or a product given its ID. Don’t forget to set $$ as the delimiter before executing the code.

    -- Create catalog_get_department_name stored procedure
    CREATE PROCEDURE catalog_get_department_name(IN inDepartmentId INT)
    BEGIN
      SELECT name FROM department WHERE department_id = inDepartmentId;
    END$$

    -- Create catalog_get_category_name stored procedure
    CREATE PROCEDURE catalog_get_category_name(IN inCategoryId INT)
    BEGIN
      SELECT name FROM category WHERE category_id = inCategoryId;
    END$$

    -- Create catalog_get_product_name stored procedure
    CREATE PROCEDURE catalog_get_product_name(IN inProductId INT)
    BEGIN
      SELECT name FROM product WHERE product_id = inProductId;
    END$$

2. We’ll now add the business tier code that accesses the stored procedures created earlier. Add the following code to the Catalog class in business/catalog.php:

    // Retrieves department name
    public static function GetDepartmentName($departmentId)
    {
      // Build SQL query
      $sql = 'CALL catalog_get_department_name(:department_id)';

      // Build the parameters array
      $params = array (':department_id' => $departmentId);

      // Execute the query and return the results
      return DatabaseHandler::GetOne($sql, $params);
    }

    // Retrieves category name
    public static function GetCategoryName($categoryId)
    {
      // Build SQL query
      $sql = 'CALL catalog_get_category_name(:category_id)';

      // Build the parameters array
      $params = array (':category_id' => $categoryId);

      // Execute the query and return the results
      return DatabaseHandler::GetOne($sql, $params);
    }

    // Retrieves product name
    public static function GetProductName($productId)
    {
      // Build SQL query
      $sql = 'CALL catalog_get_product_name(:product_id)';

      // Build the parameters array
      $params = array (':product_id' => $productId);

      // Execute the query and return the results
      return DatabaseHandler::GetOne($sql, $params);
    }

3. Open presentation/link.php, and modify its code like this:

    public static function ToDepartment($departmentId, $page = 1)
    {
      $link = self::CleanUrlText(Catalog::GetDepartmentName($departmentId)) .
              '-d' . $departmentId . '/';

      if ($page > 1)
        $link .= 'page-' . $page . '/';

      return self::Build($link);
    }

    public static function ToCategory($departmentId, $categoryId, $page = 1)
    {
      $link = self::CleanUrlText(Catalog::GetDepartmentName($departmentId)) .
              '-d' . $departmentId . '/' .
              self::CleanUrlText(Catalog::GetCategoryName($categoryId)) .
              '-c' . $categoryId . '/';

      if ($page > 1)
        $link .= 'page-' . $page . '/';

      return self::Build($link);
    }

    public static function ToProduct($productId)
    {
      $link = self::CleanUrlText(Catalog::GetProductName($productId)) .
              '-p' . $productId . '/';

      return self::Build($link);
    }

    public static function ToIndex($page = 1)
    {
      $link = '';

      if ($page > 1)
        $link .= 'page-' . $page . '/';

      return self::Build($link);
    }

4. Continue working on the Link class by adding the following method, CleanUrlText(), which is called by the methods you’ve updated earlier to remove bad characters from the links:

    // Prepares a string to be included in a URL
    public static function CleanUrlText($string)
    {
      // Remove all characters that aren't a-z, 0-9, dash, underscore or space
      $not_acceptable_characters_regex = '#[^-a-zA-Z0-9_ ]#';
      $string = preg_replace($not_acceptable_characters_regex, '', $string);

      // Remove all leading and trailing spaces
      $string = trim($string);

      // Change all dashes, underscores and spaces to dashes
      $string = preg_replace('#[-_ ]+#', '-', $string);

      // Return the modified string
      return strtolower($string);
    }

5. Load TShirtShop, and notice the new links. In Figure 7-3, the link to the Visit the Zoo product, http://localhost/tshirtshop/visit-the-zoo-p36/, is visible in Internet Explorer’s status bar.

Figure 7-3. Testing dynamically generated keyword-rich URLs

How It Works: Generating Keyword-Rich URLs

In this exercise, you modified the ToIndex(), ToDepartment(), ToCategory(), and ToProduct() methods of the Link class to build keyword-rich URLs instead of dynamic URLs. To support this functionality you created infrastructure code (business tier methods and database stored procedures) that retrieves the names of departments, products, and categories from the database.

You also implemented a method named CleanUrlText(), which uses regular expressions to remove the characters that we don’t want to include in URLs and to collapse runs of spaces, underscores, and dashes into single dashes. This method transforms a string such as “Visit the Zoo” to a URL-friendly string such as “visit-the-zoo.”

Make sure all the links in your site are now search engine-friendly, and let’s move on to the next task for this chapter.
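
The behavior of CleanUrlText() is easiest to see on a few sample strings. The sketch below is a standalone copy of the same three steps, so that it runs without the Link class or a database; cleanUrlText() and productPath() are hypothetical free functions written for this illustration, and the product name is passed in directly instead of being looked up through Catalog::GetProductName().

```php
<?php
// Standalone copy of the CleanUrlText() steps from the exercise, written
// as a free function for illustration.
function cleanUrlText(string $text): string
{
  // Drop everything except letters, digits, dashes, underscores, spaces
  $text = preg_replace('#[^-a-zA-Z0-9_ ]#', '', $text);
  // Remove leading and trailing spaces
  $text = trim($text);
  // Collapse runs of dashes, underscores and spaces into a single dash
  $text = preg_replace('#[-_ ]+#', '-', $text);
  return strtolower($text);
}

// Hypothetical helper mirroring the path that ToProduct() assembles before
// handing it to Build(); the name is supplied by the caller here.
function productPath(string $productName, int $productId): string
{
  return cleanUrlText($productName) . '-p' . $productId . '/';
}

echo cleanUrlText('Visit the Zoo'), "\n";
echo cleanUrlText('Hello,  World_-_2008!'), "\n";
echo productPath('Visit the Zoo', 36), "\n";
```

The script prints visit-the-zoo, hello-world-2008, and visit-the-zoo-p36/. One consequence of the character class worth keeping in mind is that anything outside a-z, A-Z, 0-9, dash, underscore, and space is dropped rather than transliterated, so a name containing accented letters would lose them.
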
[...]

... named 404.php, and type in the following code:

    <?php
    // Set the 404 status code
    header('HTTP/1.0 404 Not Found');
    require_once 'include/config.php';
    require_once PRESENTATION_DIR . 'link.php';
    ?>

TShirtShop Page Not Found (404): Demo Product Catalog from Beginning ... The TShirtShop team.

4. Modify .htaccess by adding this code:

    # Set the default 500 page for Apache errors
    ErrorDocument 500 /tshirtshop/500.php
    # Set the default 404 page
    ErrorDocument 404 /tshirtshop/404.php

■ Caution Be sure to check these are the correct locations of your 404.php and 500.php files.

5. Load http://localhost/tshirtshop/seasonal-d3/page-5/ ... TShirtShop should throw the 404 page as shown in Figure 7-8.

Figure 7-8. Testing the 404 page in TShirtShop

How It Works: 404 and 500

In this exercise, and in the previous one, you’ve learned how to work with the 404 and 500 status codes using the ...

    ... Link::ToIndex($i);
    }

    /* 404 redirect if the page number is larger than
       the total number of pages */
    if ($this->mPage > $this->mrTotalPages)
    {
      // Clean output buffer
      ob_clean();

      // Load the 404 page
      include '404.php';

      // Clear the output buffer and stop execution ...
Correctly Signaling 404 and 500 Errors

It is important to use the correct HTTP status code when something special happens to the visitor’s request. You’ve already seen that, when performing redirects, knowledge of HTTP status codes can make an important difference to your search engine optimization efforts. This time we will talk about 404 and 500. The 404 status code is used to tell the visitor that he or ...

... Browsers and web servers have templates that users get when you make such a request; you know, you’ve seen them. Hosting services let you specify a custom page to be displayed when such a 404 error occurs. This is obviously beneficial for your site, as you can provide some custom feedback to your visitor depending on what he or she was searching for. Sometimes, however, the 404 status code isn’t automatically ...

    ... ob_end_clean();
        exit();
      }
    }

4. Open index.php, and call this method like this:

    // Load the database handler
    require_once BUSINESS_DIR . 'database_handler.php';
    // Load Business Tier
    require_once BUSINESS_DIR . 'catalog.php';
    // URL correction
    Link::CheckRequest(); ...

TShirtShop Application Error (500): Demo Product Catalog from Beginning PHP and MySQL E-Commerce ...

... file:

    # Set the default 500 page for Apache errors
    ErrorDocument 500 /tshirtshop/500.php

■ Caution Be sure to modify the URL to the location of your 500.php file.

4. Let’s test our new 500.php file by creating an error in our web site. Open include\config.php, and set the DEBUGGING const to false to disable the debug mode (otherwise, our site won’t throw 500 errors):

    // These should be true ...
... on to the next exercise, be sure to set the DEBUGGING constant back to true, so that TShirtShop will show debugging data when an error happens, instead of throwing the 500 page. Also, remove the reference to inexistent_file.php.

Exercise: Using the 404 ...

Date posted: 12/08/2014, 10:21
