The result is:

FirstName   LastName
Cary        Grant

Matching by Sound

Let’s turn from matching letters and characters to matching sounds. SQL provides two functions that give you some interesting ways to compare the sounds of words or phrases: SOUNDEX and DIFFERENCE. Let’s first look at an example that uses the SOUNDEX function:

SELECT
SOUNDEX ('Smith') AS 'Sound of Smith',
SOUNDEX ('Smythe') AS 'Sound of Smythe'

The result is:

Sound of Smith   Sound of Smythe
S530             S530

The SOUNDEX function returns a four-character code that represents the sound of the phrase. (MySQL’s version can return a longer string for long words.) The first character is always the first letter of the phrase. In this case, the first character is S because both Smith and Smythe begin with an S. The remaining three characters are calculated from an analysis of the sound of the rest of the phrase. Internally, the function ignores all vowels, along with the letters H, W, and Y. So the function reduces the MITH in SMITH to MT, and likewise reduces the MYTHE in SMYTHE to MT. It then replaces each remaining consonant with a digit that represents its sound (5 for M, 3 for T) and pads the result with zeros to three digits. In this example, the resulting number is 530. Since SOUNDEX returns a value of S530 for both Smith and Smythe, you can conclude that they probably sound very similar.

Microsoft SQL Server provides one additional function, called DIFFERENCE, which works in conjunction with the SOUNDEX function.

DATABASE DIFFERENCES: MySQL and Oracle

The DIFFERENCE function isn’t available in MySQL or Oracle.

Here’s an example, using the same words:

SELECT
DIFFERENCE ('Smith', 'Smythe') AS 'The Difference'

The result is:

The Difference
4

The DIFFERENCE function always requires two arguments. Internally, the function first retrieves the SOUNDEX values for each of the arguments and then compares those values. If it returns a value of 4, as in the previous example, that means that all four characters of the two SOUNDEX values are identical.
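Because a DIFFERENCE value of 4 means that the two SOUNDEX codes agree in all four characters, you can get much the same effect in MySQL or Oracle, which lack DIFFERENCE, by testing whether the SOUNDEX values are equal. The following is a minimal sketch, not taken from the book, that assumes a simplified Actors table holding only FirstName and LastName columns, loaded with the seven sample rows that appear in the SOUNDEX and DIFFERENCE listing later in this section. Skip the CREATE TABLE and INSERT statements if your Actors table already exists.

```sql
-- Assumed, simplified stand-in for the chapter's Actors table
-- (only FirstName and LastName). Skip these statements if Actors exists.
CREATE TABLE Actors (
    FirstName VARCHAR(25),
    LastName  VARCHAR(25)
);

INSERT INTO Actors (FirstName, LastName) VALUES ('Cary', 'Grant');
INSERT INTO Actors (FirstName, LastName) VALUES ('Mary', 'Steenburgen');
INSERT INTO Actors (FirstName, LastName) VALUES ('Jon', 'Voight');
INSERT INTO Actors (FirstName, LastName) VALUES ('Dustin', 'Hoffman');
INSERT INTO Actors (FirstName, LastName) VALUES ('John', 'Wayne');
INSERT INTO Actors (FirstName, LastName) VALUES ('Gary', 'Cooper');
INSERT INTO Actors (FirstName, LastName) VALUES ('Julie', 'Andrews');

-- Keep only rows whose first name has the same SOUNDEX code as 'John'.
-- SQL Server, MySQL, and Oracle all provide SOUNDEX, so no DIFFERENCE is needed.
SELECT
FirstName,
LastName
FROM Actors
WHERE SOUNDEX (FirstName) = SOUNDEX ('John')
ORDER BY LastName;
```

Against those seven rows, the query should return Jon Voight and John Wayne, because both first names have a SOUNDEX value of J500, the same code as John. Since MySQL’s SOUNDEX can return codes longer than four characters for long names, an equality test there may be somewhat stricter than a DIFFERENCE of 4.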
A value of 0 means that none of the characters is identical. Therefore, a DIFFERENCE value of 4 indicates the highest possible match, and a value of 0 indicates the lowest possible match.

With this in mind, here’s an example of how the DIFFERENCE function can be used to retrieve values that are very similar in sound to a specific phrase. Working from the Actors table, you’re going to attempt to find rows with a first name that sounds like John. The SELECT statement is:

SELECT
FirstName,
LastName
FROM Actors
WHERE DIFFERENCE (FirstName, 'John') = 4

The results are:

FirstName   LastName
Jon         Voight
John        Wayne

Chapter 9 ■ Inexact Matches

The DIFFERENCE function concluded that both John and Jon had a difference value of 4 when compared with the specified value of John. If you want to analyze exactly why these two rows were selected, you can alter your SELECT to show both the SOUNDEX and DIFFERENCE values for all rows in the table:

SELECT
FirstName,
LastName,
DIFFERENCE (FirstName, 'John') AS 'Difference Value',
SOUNDEX (FirstName) AS 'Soundex Value'
FROM Actors

This returns:

FirstName   LastName      Difference Value   Soundex Value
Cary        Grant         2                  C600
Mary        Steenburgen   2                  M600
Jon         Voight        4                  J500
Dustin      Hoffman       1                  D235
John        Wayne         4                  J500
Gary        Cooper        2                  G600
Julie       Andrews       3                  J400

Notice that both Jon Voight and John Wayne have a SOUNDEX value of J500 and a DIFFERENCE value of 4 for their first names. This explains why they were initially selected. Also notice that Julie Andrews has a DIFFERENCE value of 3. If you had specified a WHERE clause that accepted a DIFFERENCE value of 3 or 4, that actor would have been selected as well.

Looking Ahead

This concludes our study of matching phrases by pattern or sound. Matching by patterns is an important and widely used function of SQL. Any time you enter a word in a search box and attempt to retrieve all entities containing that word, you are using pattern matching. Efforts to match by sound are much less common.
The technology exists, but there is an inherent difficulty in translating words to sounds. The English language, or any language for that matter, contains too many quirks and exceptions for such a match to be reliable.

In our next chapter, “Summarizing Data,” we’re going to turn our attention to ways to separate data into groups and summarize the values in those groups with various statistics. Back in Chapter 4, we talked about scalar functions. The next chapter will introduce another type of function, called aggregate functions. These aggregate functions will allow you to summarize your data in many useful ways. For example, you’ll be able to look at any group of orders and determine the number of orders, the total dollar amount of the orders, and the average order size. With these techniques, you’ll be able to move beyond the presentation of detailed data and begin to truly add value for your users as you deliver summarized information.

Chapter 10
Summarizing Data

Keywords Introduced: DISTINCT, SUM, AVG, MIN, MAX, COUNT, GROUP BY, HAVING

Up until now, we’ve been presenting data basically as it exists in a database. Sure, we’ve used some functions to move things around and have created some additional calculations, but the rows we’ve retrieved have corresponded to rows in the underlying database.

We now want to turn to various methods to summarize our data. The computer term usually associated with this type of endeavor is aggregation, which means “to combine into groups.” The ability to aggregate and summarize your data is key to being able to move beyond a mere display of data to something approaching real information. There’s a bit of magic involved when users view summarized data in a report. They understand and appreciate that you’ve been able to extract some real meaning from the mass of data in a database, in order to present a clearer picture of what it all means.
Eliminating Duplicates

Although it doesn’t provide a true aggregation, the most elementary way to summarize data is to eliminate duplicates. SQL has a keyword named DISTINCT, which provides an easy way to remove duplicate rows from your output.
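As a minimal sketch of what DISTINCT does, assume a hypothetical CustomerStates table (not part of the book’s sample data) in which two of three customers live in the same state. Selecting the State column on its own returns the repeated value twice; adding DISTINCT collapses identical rows into one.

```sql
-- Hypothetical table, created only for this illustration
CREATE TABLE CustomerStates (
    CustomerName VARCHAR(25),
    State        CHAR(2)
);

INSERT INTO CustomerStates (CustomerName, State) VALUES ('Customer A', 'IL');
INSERT INTO CustomerStates (CustomerName, State) VALUES ('Customer B', 'NY');
INSERT INTO CustomerStates (CustomerName, State) VALUES ('Customer C', 'IL');

-- Without DISTINCT: three rows come back (IL, IL, NY)
SELECT
State
FROM CustomerStates
ORDER BY State;

-- With DISTINCT: the duplicate IL row is removed, leaving IL and NY
SELECT DISTINCT
State
FROM CustomerStates
ORDER BY State;
```

DISTINCT compares entire rows of the selected columns. If the query also selected CustomerName, all three rows would be returned, because no two of them would be identical.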