Notice that quizzes with a score less than 70 aren’t shown. For example, you can see Alec’s quiz score of 74, but not his quiz score of 58. But what if you want to display data only for students who have an average quiz grade of 70 or more? Then you need to select on an average, not on individual rows. This is where the HAVING keyword comes in. You first group the grades by student and then apply your selection criteria to an aggregate statistic calculated over the entire group. The following statement produces the desired result:

SELECT
  Student AS 'Student',
  AVG (Grade) AS 'Average Quiz Grade'
FROM Grades
WHERE GradeType = 'Quiz'
GROUP BY Student
HAVING AVG (Grade) >= 70
ORDER BY Student

The output is:

Student   Average Quiz Grade
Kathy     71.5
Susan     93.5

This SELECT has both a WHERE and a HAVING clause. The WHERE clause ensures that you select only rows with a GradeType of “Quiz.” The HAVING clause guarantees that you select only students with an average score of at least 70.

What if you wanted to add a column with the GradeType value? If you simply add GradeType to the columnlist of the SELECT, the statement will produce an error. This is because every column in the columnlist must either be listed in the GROUP BY clause or be used within an aggregate function. To show the GradeType column, you must also add it to the GROUP BY clause, as follows:

SELECT
  Student AS 'Student',
  GradeType AS 'Grade Type',
  AVG (Grade) AS 'Average Grade'
FROM Grades
WHERE GradeType = 'Quiz'
GROUP BY Student, GradeType
HAVING AVG (Grade) >= 70
ORDER BY Student

The resulting data is:

Student   Grade Type   Average Grade
Kathy     Quiz         71.5
Susan     Quiz         93.5

Now that we’ve added the HAVING clause to the mix, let’s recap the general format of the SELECT statement:

SELECT columnlist
FROM tablelist
WHERE condition
GROUP BY columnlist
HAVING condition
ORDER BY columnlist

It should be emphasized that when you use any of the above keywords in a SELECT, they must be entered in the order shown.
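To try the two statements above without a full database server, here is a minimal sketch using Python’s built-in sqlite3 module. Only Alec’s quiz scores of 74 and 58 come from the text; every other row in the Grades table is a hypothetical value chosen so that the averages match the output shown. The column aliases use double quotes, the standard way to quote identifiers. Note that SQLite is more permissive than many databases: it accepts a non-aggregated column that is missing from GROUP BY, so the error described above can’t be reproduced here.

```python
import sqlite3

# In-memory database used only for this sketch.
grades_conn = sqlite3.connect(":memory:")
grades_conn.execute(
    "CREATE TABLE Grades (Student TEXT, GradeType TEXT, Grade INTEGER)"
)

# Hypothetical rows: only Alec's quiz scores (74 and 58) come from the text.
# Kathy's and Susan's quiz scores are invented to match the averages shown;
# Susan's homework row illustrates a row that WHERE filters out.
grades_conn.executemany(
    "INSERT INTO Grades VALUES (?, ?, ?)",
    [
        ("Alec", "Quiz", 74),
        ("Alec", "Quiz", 58),
        ("Kathy", "Quiz", 78),
        ("Kathy", "Quiz", 65),
        ("Susan", "Quiz", 96),
        ("Susan", "Quiz", 91),
        ("Susan", "Homework", 60),
    ],
)

# WHERE filters individual rows before grouping;
# HAVING filters whole groups using the aggregate AVG(Grade).
having_rows = grades_conn.execute(
    """
    SELECT Student AS "Student", AVG(Grade) AS "Average Quiz Grade"
    FROM Grades
    WHERE GradeType = 'Quiz'
    GROUP BY Student
    HAVING AVG(Grade) >= 70
    ORDER BY Student
    """
).fetchall()
print(having_rows)  # [('Kathy', 71.5), ('Susan', 93.5)]

# GradeType is added to the columnlist, so it is also added to GROUP BY.
grouped_rows = grades_conn.execute(
    """
    SELECT Student AS "Student", GradeType AS "Grade Type",
           AVG(Grade) AS "Average Grade"
    FROM Grades
    WHERE GradeType = 'Quiz'
    GROUP BY Student, GradeType
    HAVING AVG(Grade) >= 70
    ORDER BY Student
    """
).fetchall()
print(grouped_rows)  # [('Kathy', 'Quiz', 71.5), ('Susan', 'Quiz', 93.5)]
```

Alec’s quiz average of 66 is removed by HAVING even though one of his individual scores (74) is above 70, and Susan’s homework row is removed by WHERE before any grouping happens. Both statements list their clauses in the same order as the general format.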
For example, the HAVING keyword must always come after a GROUP BY but before an ORDER BY.

Looking Ahead

In this chapter, we covered several forms of aggregation, starting with the simplest: eliminating duplicates. We then introduced a number of aggregate functions, which are a different class of functions from the scalar functions seen in Chapter 4. The real power of aggregate functions becomes apparent when they are used in conjunction with the GROUP BY keyword, which allows for true aggregation of data into groups. Finally, we covered the HAVING keyword, which allows you to apply group-level selection criteria to values in aggregate functions.

In our next chapter, “Combining Tables with an Inner Join,” we’re going to begin exploring a key topic in SQL: the ability to access data from multiple tables. Up until now, all SELECT queries have been against a single table. In the real world, this is an unrealistic scenario. The true value of relational databases lies in their ability to utilize multiple tables with related data. Seldom would one require data from only a single table.

The topic of accessing data from multiple tables will be directly addressed in Chapters 11 and 12. Chapter 11 covers the inner join, and Chapter 12 looks at the outer join. Subsequently, Chapters 13 through 15 will explore variations on the same theme. After you complete the next five chapters, you will have mastered the essential techniques of obtaining data from multiple tables.

Chapter 11
Combining Tables with an Inner Join

Keywords Introduced: INNER JOIN, ON

Back in Chapter 1, we talked about the great advance of relational databases over their predecessors. The significant achievement of relational databases was their ability to allow data to be organized into any number of tables that are related but at the same time independent of each other.
Unlike earlier databases, the relationships between tables in relational databases are not explicitly defined by a series of pointers. Instead, relationships are inferred from columns that tables have in common. Sometimes these relationships are formalized through the definition of primary and foreign keys, but this isn’t always necessary. The great virtue of relational databases lies in the fact that someone can analyze business entities and then create an appropriate database design, which allows for maximum flexibility.

Let’s look at a common example. Most organizations have a business entity known as the “customer.” As such, it is typical for a database to contain a Customers table that defines each customer. Such a table would normally contain a primary key to uniquely identify each customer and any number of columns with attributes describing the customer. Common attributes might include phone number, address, city, state, and so on. The main idea is that all information about the customer is stored in a single table and only in that table. This simplifies the task of data updates. When a customer changes his phone number, there is only one table that needs to be updated. However, the downside to this setup is that whenever someone needs any information about a customer, that person needs to access the Customers table to retrieve it.

This brings us to the concept of a join. Let’s say that someone is analyzing products that have been purchased. Along with information about the products, it is often necessary to provide information about the customers who purchased each product. For example, an analyst may want to obtain customer ZIP codes for a geographic analysis. The ZIP code is stored only in the Customers table, while product information is stored in a Products table. To get information about both customers and products, the tables must be joined together in such a way that the information matches correctly.
In essence, the promise of relational databases is fulfilled by the ability to join tables together in any desired manner.

Joining Two Tables

To begin our exploration of the join process, let’s revisit the Orders table that we first encountered in Chapter 3:

OrderID   FirstName   LastName   QuantityPurchased   PricePerItem
1         William     Smith      4                   2.50
2         Natalie     Lopez      10                  1.25
3         Brenda      Harper     5                   4.00

The use of this table in earlier chapters was somewhat misleading. In reality, a competent database designer would never create a table such as this. The problem is that it contains information about two separate entities: customers and orders. In the real world, the information would be split into at least two separate tables. A Customers table might look like this:

CustomerID   FirstName   LastName
1            William     Smith
2            Natalie     Lopez
3            Brenda      Harper
4            Adam        Petrie
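As a preview of the inner join this chapter introduces, here is a minimal sketch using Python’s built-in sqlite3 module. The Customers rows come from the table above. The Orders table with a CustomerID column is a hypothetical redesign of the earlier Orders table, assumed here for illustration: the name columns are replaced by a CustomerID, and each order is assigned to the customer with the same name. The query uses the INNER JOIN and ON keywords to match rows on the column the two tables have in common.

```python
import sqlite3

shop_conn = sqlite3.connect(":memory:")

# Customers rows are copied from the table in the text.
shop_conn.execute(
    "CREATE TABLE Customers ("
    "CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
shop_conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "William", "Smith"),
        (2, "Natalie", "Lopez"),
        (3, "Brenda", "Harper"),
        (4, "Adam", "Petrie"),
    ],
)

# Hypothetical redesigned Orders table: the FirstName and LastName columns
# are replaced by a CustomerID column that refers to the Customers table.
shop_conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, "
    "QuantityPurchased INTEGER, PricePerItem REAL)"
)
shop_conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [(1, 1, 4, 2.50), (2, 2, 10, 1.25), (3, 3, 5, 4.00)],
)

# The inner join pairs each order with its customer via the shared column;
# only rows with a match in both tables are returned.
joined = shop_conn.execute(
    """
    SELECT Orders.OrderID, Customers.FirstName, Customers.LastName,
           Orders.QuantityPurchased, Orders.PricePerItem
    FROM Customers
    INNER JOIN Orders
        ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Orders.OrderID
    """
).fetchall()

for row in joined:
    print(row)
```

The joined rows reproduce the contents of the original single-table Orders layout, while each customer’s name is stored only once, in Customers. Adam Petrie does not appear in the result because no order refers to his CustomerID, which is how an inner join behaves.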