Matching and Replacing Substrings with String Functions

Finding Strings in Strings: strstr(), strchr(), strrchr(), stristr()

To find a string within another string you can use any of the functions strstr(), strchr(), strrchr(), or stristr().

The function strstr() is the most generic, and can be used to find a string or character match within a longer string. Note that in PHP, the strchr() function is exactly the same as strstr(), although its name implies that it is used to find a character in a string, similar to the C version of this function. In PHP, either of these functions can be used to find a string inside a string, including finding a string containing only a single character.

The prototype for strstr() is as follows:

    string strstr(string haystack, string needle);

You pass the function a haystack to be searched and a needle to be found. If an exact match of the needle is found, the function returns the haystack from the needle onward; otherwise it returns false. If the needle occurs more than once, the returned string will start from the first occurrence of needle.

For example, in the Smart Form application, we can decide where to send the email as follows:

    $toaddress = 'feedback@example.com';  // the default value

    // Change the $toaddress if the criteria are met
    if (strstr($feedback, 'shop'))
        $toaddress = 'retail@example.com';
    else if (strstr($feedback, 'delivery'))
        $toaddress = 'fulfilment@example.com';
    else if (strstr($feedback, 'bill'))
        $toaddress = 'accounts@example.com';

This code checks for certain keywords in the feedback and sends the mail to the appropriate person. If, for example, the customer feedback reads “I still haven’t received delivery of my last order,” the string “delivery” will be detected and the feedback will be sent to fulfilment@example.com.
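The keyword routing described above can be tried out in isolation. The following is only a sketch: the function name chooseRecipient() is hypothetical (it does not come from the Smart Form application), and the scalar type declarations assume PHP 7 or later. It also compares the result of strstr() against false explicitly, which is an addition to the original snippet.

```php
<?php
// Hypothetical helper (not part of the Smart Form code): picks a recipient
// address based on the first keyword found in the feedback text.
function chooseRecipient(string $feedback): string
{
    // strstr() returns the haystack from the first match onward, or false.
    if (strstr($feedback, 'shop') !== false) {
        return 'retail@example.com';
    }
    if (strstr($feedback, 'delivery') !== false) {
        return 'fulfilment@example.com';
    }
    if (strstr($feedback, 'bill') !== false) {
        return 'accounts@example.com';
    }
    return 'feedback@example.com'; // the default value
}

$feedback = "I still haven't received delivery of my last order";

$tail = strstr($feedback, 'delivery');   // "delivery of my last order"
$recipient = chooseRecipient($feedback); // "fulfilment@example.com"

echo $tail, "\n", $recipient, "\n";
```

The explicit !== false test is a design choice rather than a requirement of the original code. It guards against the rare case where the returned substring is itself a value PHP treats as false, such as the string "0".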
There are two variants on strstr(). The first variant is stristr(), which is nearly identical but is not case sensitive. This will be useful for this application as the customer might type 'delivery', 'Delivery', or 'DELIVERY'.

The second variant is strrchr(), which is again nearly identical, but will return the haystack from the last occurrence of the needle onward.

Finding the Position of a Substring: strpos(), strrpos()

The functions strpos() and strrpos() operate in a similar fashion to strstr(), except, instead of returning a substring, they return the numerical position of a needle within a haystack.

Chapter 4 String Manipulation and Regular Expressions

The strpos() function has the following prototype:

    int strpos(string haystack, string needle, int [offset] );

The integer returned represents the position of the first occurrence of the needle within the haystack. The first character is in position 0 as usual.

For example, the following code will echo the value 4 to the browser:

    $test = 'Hello world';
    echo strpos($test, 'o');

In this case, we have only passed in a single character as the needle, but it can be a string of any length.

The optional offset parameter is used to specify a point within the haystack to start searching. For example,

    echo strpos($test, 'o', 5);

This code will echo the value 7 to the browser because PHP has started looking for the character o at position 5, and therefore does not see the one at position 4.

The strrpos() function is almost identical, but will return the position of the last occurrence of the needle in the haystack. Unlike strpos(), it only works with a single character needle. Therefore, if you pass it a string as a needle, it will only use the first character of the string to match. (This describes PHP 4; from PHP 5 onward, strrpos() matches the whole needle.)
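The variants and position functions above can be compared side by side on the same test string. This is a small sketch; the variable names are illustrative only, and the second haystack is made up for the stristr() call.

```php
<?php
$test = 'Hello world';

// Position of the first 'o', counting from 0.
$first = strpos($test, 'o');            // 4

// Start searching at position 5, so the 'o' at position 4 is skipped.
$afterOffset = strpos($test, 'o', 5);   // 7

// Position of the last 'o'.
$last = strrpos($test, 'o');            // 7

// strrchr() returns the haystack from the last occurrence onward.
$tail = strrchr($test, 'o');            // "orld"

// stristr() ignores case but returns the haystack's original casing.
$insensitive = stristr('Late DELIVERY again', 'delivery'); // "DELIVERY again"

echo $first, ' ', $afterOffset, ' ', $last, "\n";
echo $tail, "\n", $insensitive, "\n";
```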
In any of these cases, if the needle is not in the string, strpos() or strrpos() will return false. This can be problematic because, in a weakly typed language such as PHP, false is equivalent to 0 under an ordinary comparison, and 0 is also the position of the first character in a string.

You can avoid this problem by using the === operator to test return values:

    $result = strpos($test, 'H');
    if ($result === false)
        echo 'Not found';
    else
        echo 'Found at position 0';

Note that this will only work in PHP 4. In earlier versions, you can test for false by testing the return value to see if it is a string (that is, false).

Replacing Substrings: str_replace(), substr_replace()

Find-and-replace functionality can be extremely useful with strings. We have used find and replace in the past for personalizing documents generated by PHP, for example by replacing <<name>> with a person’s name and <<address>> with their address. You can also use it for censoring particular terms, such as in a discussion forum application, or even in the Smart Form application.

Again, you can use string functions or regular expression functions for this purpose.

The most commonly used string function for replacement is str_replace(). It has the following prototype:

    mixed str_replace(mixed needle, mixed new_needle, mixed haystack);

This function will replace all the instances of needle in haystack with new_needle and return the new version of the haystack.

Note
As of PHP 4.0.5 you can pass all parameters as arrays and the function will work remarkably intelligently. You can pass an array of words to be replaced, an array of words to replace them with (respectively), and an array of strings to apply these rules to. The function will then return an array of revised strings.

For example, because people can use the Smart Form to complain, they might use some colorful words.
As programmers, we can prevent Bob’s various departments from being abused in that way:

    $feedback = str_replace($offcolor, '%!@*', $feedback);

The function substr_replace() is used to find and replace a particular substring of a string based on its position. It has the following prototype:

    string substr_replace(string string, string replacement, int start, int [length] );

This function will replace part of the string string with the string replacement. Which part is replaced depends upon the values of the start and optional length parameters.

The start value represents an offset into the string where replacement should begin. If it is 0 or positive, it is an offset from the beginning of the string; if it is negative, it is an offset from the end of the string. For example, this line of code will replace the last character in $test with "X":

    $test = substr_replace($test, 'X', -1);

The length value is optional and represents the point at which PHP will stop replacing. If you don’t supply this value, the string will be replaced from start to the end of the string.

If length is zero, the replacement string will actually be inserted into the string without overwriting the existing string.

A positive length represents the number of characters that you want replaced with the new string.

A negative length represents the point at which you’d like to stop replacing characters, counted from the end of the string.

Introduction to Regular Expressions

PHP supports two styles of regular expression syntax: POSIX and Perl. The POSIX style of regular expression is compiled into PHP by default, but you can use the Perl style by compiling in the PCRE (Perl-compatible regular expression) library. We’ll cover the simpler POSIX style, but if you’re already a Perl programmer, or want to learn more about PCRE, read the online manual at http://php.net.
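Before moving on to regular expressions, the replacement functions covered in the previous section can be tried out in one short sketch. The contents of $offcolor are not given in the text, so the word list and the sample feedback below are invented purely for illustration.

```php
<?php
// Hypothetical word list: the text does not say what $offcolor contains.
$offcolor = array('darn', 'heck');
$feedback = 'Where the heck is my darn order?';

// Every word in the array is replaced by the same string.
$clean = str_replace($offcolor, '%!@*', $feedback);
// "Where the %!@* is my %!@* order?"

$test = 'Hello world';

// Negative start, no length: replace from the last character to the end.
$lastReplaced = substr_replace($test, 'X', -1);      // "Hello worlX"

// Length 0: insert at position 6 without overwriting anything.
$inserted = substr_replace($test, 'big ', 6, 0);     // "Hello big world"

// Positive length: replace exactly one character, starting at position 0.
$positiveLen = substr_replace($test, 'J', 0, 1);     // "Jello world"

// Negative length: stop replacing one character before the end.
$negativeLen = substr_replace($test, '...', 5, -1);  // "Hello...d"

echo $clean, "\n", $lastReplaced, "\n", $inserted, "\n";
echo $positiveLen, "\n", $negativeLen, "\n";
```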
Note
POSIX regular expressions are easier to learn quickly, but they are not binary-safe.

So far, all the pattern matching we’ve done has used the string functions. We have been limited to exact match, or to exact substring match. If you want to do more complex pattern matching, you should use regular expressions. Regular expressions are difficult to grasp at first but can be extremely useful.

The Basics

A regular expression is a way of describing a pattern in a piece of text. The exact (or literal) matches we’ve done so far are a form of regular expression. For example, earlier we were searching for regular expression terms like "shop" and "delivery".

Matching regular expressions in PHP is more like a strstr() match than an equal comparison because you are matching a string somewhere within another string. (It can be anywhere within that string unless you specify otherwise.) For example, the string "shop" matches the regular expression "shop". It also matches the regular expressions "h", "ho", and so on.

We can use special characters to indicate a meta-meaning in addition to matching characters exactly.

For example, with special characters you can indicate that a pattern must occur at the start or end of a string, that part of a pattern can be repeated, or that characters in a pattern must be of a particular type. You can also match on literal occurrences of special characters. We’ll look at each of these.

Character Sets and Classes

Using character sets immediately gives regular expressions more power than exact matching expressions. Character sets can be used to match any character of a particular type; they’re really a kind of wildcard.

First of all, you can use the . character as a wildcard for any other single character except a new line (\n). For example, the regular expression

    .at

matches the strings 'cat', 'sat', and 'mat', among others.

This kind of wildcard matching is often used for filename matching in operating systems.
With regular expressions, however, you can be more specific about the type of character you would like to match, and you can actually specify a set that a character must belong to. In the previous example, the regular expression matches 'cat' and 'mat', but also matches '#at'. If you want to limit this to a character between a and z, you can specify it as follows:

    [a-z]

Anything enclosed in the special square brace characters [ and ] is a character class: a set of characters to which a matched character must belong. Note that the expression in the square brackets matches only a single character.

You can list a set; for example

    [aeiou]

means any vowel.

You can also describe a range, as we just did using the special hyphen character, or a set of ranges:

    [a-zA-Z]

This set of ranges stands for any alphabetic character in upper- or lowercase.

You can also use sets to specify that a character cannot be a member of a set. For example,

    [^a-z]

matches any character that is not between a and z. The caret symbol means not when it is placed inside the square brackets. It has another meaning when used outside square brackets, which we’ll look at in a minute.

In addition to listing out sets and ranges, a number of predefined character classes can be used in a regular expression. These are shown in Table 4.3.

Table 4.3 Character Classes for Use in POSIX Style Regular Expressions

    Class          Matches
    [[:alnum:]]    Alphanumeric characters
    [[:alpha:]]    Alphabetic characters
    [[:lower:]]    Lowercase letters
    [[:upper:]]    Uppercase letters
    [[:digit:]]    Decimal digits
    [[:xdigit:]]   Hexadecimal digits
    [[:punct:]]    Punctuation
    [[:blank:]]    Tabs and spaces
    [[:space:]]    Whitespace characters
    [[:cntrl:]]    Control characters
    [[:print:]]    All printable characters
    [[:graph:]]    All printable characters except for space
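The wildcard, sets, ranges, and predefined classes above can be checked against a few sample strings. One assumption to note: the POSIX regular expression functions are not available in PHP 7 and later, so this sketch substitutes preg_match() from the PCRE library, which accepts the same bracket expressions and [[:class:]] names. The helper matches() is hypothetical and exists only to keep the examples short.

```php
<?php
// Hypothetical helper: true if $pattern matches anywhere in $subject.
// preg_match() stands in for the POSIX functions; it needs delimiters,
// so the pattern is wrapped in slashes here.
function matches(string $pattern, string $subject): bool
{
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

var_dump(matches('.at', 'cat'));               // true: . matches 'c'
var_dump(matches('.at', '#at'));               // true: . also matches '#'
var_dump(matches('[a-z]at', '#at'));           // false: '#' is not in a-z
var_dump(matches('[a-z]at', 'mat'));           // true
var_dump(matches('[aeiou]', 'shop'));          // true: contains a vowel
var_dump(matches('[^a-z]', 'shop'));           // false: all characters are in a-z
var_dump(matches('[^a-z]', 'Shop'));           // true: 'S' is outside a-z
var_dump(matches('[[:digit:]]', 'order 66'));  // true: contains a digit
var_dump(matches('[[:upper:]]', 'shop'));      // false: no uppercase letter
```

As in the text, each bracket expression stands for a single character, and a match may occur anywhere in the subject string.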