The third format of the COUNT function allows you to use the DISTINCT keyword in addition to a column name. Here’s an example:

    SELECT
    COUNT (DISTINCT FeeType) AS 'Number of Fee Types'
    FROM Fees

This statement counts the number of distinct values in the FeeType column. The result is:

    Number of Fee Types
    3

This means that there are three different values found in the FeeType column.

Grouping Data

The previous examples of aggregation functions are interesting, but of somewhat limited value. The real power of the aggregation functions will become evident after we introduce the concept of grouping data.

The GROUP BY keyword is used to separate data returned from a SELECT statement into any number of groups. For example, when looking at the previous Grades table, you may be interested in analyzing test scores based on the grade type. In other words, you want to separate the data into two groups, quizzes and homework. The value of the GradeType column can be used to determine which group each row belongs to.

Once data has been separated into groups, aggregation functions can be applied so that summary statistics for each group can be calculated and compared.

Let’s proceed with an example that introduces the GROUP BY keyword:

    SELECT
    GradeType AS 'Grade Type',
    AVG (Grade) AS 'Average Grade'
    FROM Grades
    GROUP BY GradeType
    ORDER BY GradeType

The result is:

    Grade Type    Average Grade
    Homework      86
    Quiz          77

In this example, the GROUP BY keyword specifies that groups are to be created based on the value of the GradeType column. The two columns in the SELECT columnlist are GradeType and a calculated field that uses the AVG function. The GradeType column was included in the columnlist because when creating a group, it’s usually a good idea to include the column on which the groups are based. The “Average Grade” calculated field aggregates values based on all rows in each group.
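The grouping query above can be tried out with a small script. The following is a minimal sketch, not the book's own setup: it assumes Python's built-in sqlite3 module as a stand-in database, and the nine rows of the Grades table are inferred from the result tables shown in this chapter rather than copied from an original listing. The function name is hypothetical.

```python
import sqlite3

# Assumed contents of the Grades table, inferred from the chapter's results.
GRADE_ROWS = [
    ("Susan", "Quiz", 92),
    ("Susan", "Quiz", 95),
    ("Susan", "Homework", 84),
    ("Kathy", "Quiz", 62),
    ("Kathy", "Quiz", 81),
    ("Kathy", "Homework", None),  # stored as NULL
    ("Alec", "Quiz", 58),
    ("Alec", "Quiz", 74),
    ("Alec", "Homework", 88),
]


def average_by_grade_type():
    """Return one (GradeType, average grade) row per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Grades (Student TEXT, GradeType TEXT, Grade INTEGER)"
    )
    conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)", GRADE_ROWS)
    rows = conn.execute(
        """
        SELECT GradeType, AVG(Grade)
        FROM Grades
        GROUP BY GradeType
        ORDER BY GradeType
        """
    ).fetchall()
    conn.close()
    return rows


# AVG skips the NULL homework grade, so Homework averages 88 and 84 only.
print(average_by_grade_type())  # [('Homework', 86.0), ('Quiz', 77.0)]
```

SQLite returns averages as floating-point numbers, which is why the values print as 86.0 and 77.0 rather than 86 and 77.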
Notice that the average homework grade has been computed as 86. Even though there is one row with a NULL value for the Homework type, SQL is smart enough to ignore rows with NULL values when computing an average. If you want the NULL value to be counted as a 0, then the ISNULL function can be used to convert the NULL to a 0, as follows:

    AVG (ISNULL (Grade, 0)) AS 'Average Grade'

It’s important to note that when using a GROUP BY keyword, all columns in the columnlist must either be listed as columns in the GROUP BY clause or else be used in an aggregation function. Nothing else would make any sense. For example, the following SELECT would produce an error:

    SELECT
    GradeType AS 'Grade Type',
    AVG (Grade) AS 'Average Grade',
    Student AS 'Student'
    FROM Grades
    GROUP BY GradeType
    ORDER BY GradeType

The problem with this statement is that the Student column is not in the GROUP BY clause, nor is it aggregated in any way. Since everything is being presented in groups, SQL doesn’t know what to do with the Student column.

DATABASE DIFFERENCES: MySQL

Unlike Microsoft SQL Server and Oracle, MySQL does not reject the previous statement when its ONLY_FULL_GROUP_BY SQL mode is disabled, which was the default before MySQL 5.7.5. Instead, it returns an arbitrary Student value for each group, which produces misleading results.

Multiple Columns and Sorting

The concept of groups can be extended so that the groups are based on more than one column. Let’s go back to the last SELECT and add the Student column to the GROUP BY clause and also to the columnlist. It now looks like:

    SELECT
    GradeType AS 'Grade Type',
    Student AS 'Student',
    AVG (Grade) AS 'Average Grade'
    FROM Grades
    GROUP BY GradeType, Student
    ORDER BY GradeType, Student

The resulting data is:

    Grade Type    Student    Average Grade
    Homework      Alec       88
    Homework      Kathy      NULL
    Homework      Susan      84
    Quiz          Alec       66
    Quiz          Kathy      71.5
    Quiz          Susan      93.5

You now see a breakdown not only of grade types, but also of students. The average grades are computed on each group.
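The two-column grouping and the NULL-to-zero conversion can both be checked with a short script. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in database, the Grades rows are inferred from the chapter's result tables, and the function names are hypothetical. SQLite does not offer the two-argument ISNULL function shown earlier, so the standard COALESCE function is swapped in to do the same job.

```python
import sqlite3

# Assumed contents of the Grades table, inferred from the chapter's results.
GRADE_ROWS = [
    ("Susan", "Quiz", 92), ("Susan", "Quiz", 95), ("Susan", "Homework", 84),
    ("Kathy", "Quiz", 62), ("Kathy", "Quiz", 81), ("Kathy", "Homework", None),
    ("Alec", "Quiz", 58), ("Alec", "Quiz", 74), ("Alec", "Homework", 88),
]


def open_grades():
    """Create an in-memory database holding the assumed Grades table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Grades (Student TEXT, GradeType TEXT, Grade INTEGER)"
    )
    conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)", GRADE_ROWS)
    return conn


def average_by_type_and_student():
    """Group on two columns: one row per (GradeType, Student) pair."""
    conn = open_grades()
    rows = conn.execute(
        """
        SELECT GradeType, Student, AVG(Grade)
        FROM Grades
        GROUP BY GradeType, Student
        ORDER BY GradeType, Student
        """
    ).fetchall()
    conn.close()
    return rows


def homework_average_nulls_as_zero():
    """Average homework grades, counting a NULL grade as 0 via COALESCE."""
    conn = open_grades()
    (average,) = conn.execute(
        """
        SELECT AVG(COALESCE(Grade, 0))
        FROM Grades
        WHERE GradeType = 'Homework'
        """
    ).fetchone()
    conn.close()
    return average


for row in average_by_type_and_student():
    print(row)  # Kathy's Homework average prints as None (NULL)
print(homework_average_nulls_as_zero())  # (88 + 0 + 84) / 3
```

A group whose only grade is NULL has nothing to average, so its result is NULL, which Python shows as None. With COALESCE, the same row contributes a 0 and pulls the homework average down from 86 to about 57.3.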
Note that the Homework row for Kathy shows a NULL value, since she only has one homework row, and that row has a value of NULL for the grade.

The order in which columns are listed in the GROUP BY clause has no significance. The results would be the same if the clause were:

    GROUP BY Student, GradeType

However, as always, the order in which columns are listed in the ORDER BY clause is meaningful. If you switch the ORDER BY clause to:

    ORDER BY Student, GradeType

then the results are:

    Grade Type    Student    Average Grade
    Homework      Alec       88
    Quiz          Alec       66
    Homework      Kathy      NULL
    Quiz          Kathy      71.5
    Homework      Susan      84
    Quiz          Susan      93.5

This still looks a bit strange, since it’s difficult to tell at a glance that the data is really sorted by Student and then by Grade Type. As a general rule of thumb, it often helps if columns are listed in the same order in which they are sorted. A more understandable SELECT statement would be:

    SELECT
    Student AS 'Student',
    GradeType AS 'Grade Type',
    AVG (Grade) AS 'Average Grade'
    FROM Grades
    GROUP BY GradeType, Student
    ORDER BY Student, GradeType

The data now looks like:

    Student    Grade Type    Average Grade
    Alec       Homework      88
    Alec       Quiz          66
    Kathy      Homework      NULL
    Kathy      Quiz          71.5
    Susan      Homework      84
    Susan      Quiz          93.5

This is more comprehensible, since the column order corresponds to the sort order.

There’s sometimes a certain confusion as to the difference between the GROUP BY and ORDER BY clauses. Just remember that the GROUP BY merely creates the groups. You still need to use the ORDER BY to present your data in the correct sequence.

Selection Criteria on Aggregates

One more topic needs to be added to our discussion of summarizing data. Once groups are created, selection criteria become a bit more complex. When applying any kind of selection criteria to a SELECT with a GROUP BY, one has to ask whether the selection criteria apply to the individual rows or to the entire group.
In essence, the WHERE clause handles selection criteria for individual rows. SQL provides a keyword named HAVING, which allows for selection criteria at the group level.

Returning to the Grades table, let’s say you want to only look at grades on quizzes that are 70 or higher. The grades you’d like to look at are individual grades, so you can use the WHERE clause, as normal. Such a SELECT might look like:

    SELECT
    Student AS 'Student',
    GradeType AS 'Grade Type',
    Grade AS 'Grade'
    FROM Grades
    WHERE GradeType = 'Quiz'
    AND Grade >= 70
    ORDER BY Student, Grade

The resulting data is:

    Student    Grade Type    Grade
    Alec       Quiz          74
    Kathy      Quiz          81
    Susan      Quiz          92
    Susan      Quiz          95
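The difference between row-level and group-level criteria can be seen by running both kinds of query against the same data. The sketch below makes several assumptions: it uses Python's built-in sqlite3 module as a stand-in database, the Grades rows are inferred from the chapter's result tables, and the function names are hypothetical. The HAVING query, with its threshold of 70 on the average quiz grade, is an illustration invented here and is not an example taken from the text.

```python
import sqlite3

# Assumed contents of the Grades table, inferred from the chapter's results.
GRADE_ROWS = [
    ("Susan", "Quiz", 92), ("Susan", "Quiz", 95), ("Susan", "Homework", 84),
    ("Kathy", "Quiz", 62), ("Kathy", "Quiz", 81), ("Kathy", "Homework", None),
    ("Alec", "Quiz", 58), ("Alec", "Quiz", 74), ("Alec", "Homework", 88),
]


def run_query(sql):
    """Load the assumed Grades table into memory and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Grades (Student TEXT, GradeType TEXT, Grade INTEGER)"
    )
    conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)", GRADE_ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


def quiz_grades_of_70_or_more():
    """Row-level criteria: WHERE tests each individual grade."""
    return run_query(
        """
        SELECT Student, GradeType, Grade
        FROM Grades
        WHERE GradeType = 'Quiz'
        AND Grade >= 70
        ORDER BY Student, Grade
        """
    )


def students_with_quiz_average_of_70_or_more():
    """Group-level criteria: HAVING tests each student's average."""
    return run_query(
        """
        SELECT Student, AVG(Grade)
        FROM Grades
        WHERE GradeType = 'Quiz'
        GROUP BY Student
        HAVING AVG(Grade) >= 70
        ORDER BY Student
        """
    )


print(quiz_grades_of_70_or_more())
print(students_with_quiz_average_of_70_or_more())
```

Alec appears in the first result because one of his quiz grades is 74, but he drops out of the second because his quiz average is 66. The WHERE clause removes rows before the groups are formed, while HAVING removes whole groups after the averages have been computed.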