OCA/OCP Oracle Database 11g All-in-One Exam Guide
Chapter 11: Group Functions

Include or Exclude Grouped Rows Using the HAVING Clause

• Clustering rows using a common grouping attribute with the GROUP BY clause and applying an aggregate function to each of these groups returns group-level results.
• The HAVING clause provides the language to limit the group-level results returned.
• The HAVING clause may only be specified if there is a GROUP BY clause present.
• All grouping is performed and group functions are executed prior to evaluating the HAVING clause.

Self Test

1. What result is returned by the following statement?
   SELECT COUNT(*) FROM DUAL;
   (Choose the best answer.)
   A. NULL
   B. 0
   C. 1
   D. None of the above

2. Choose one correct statement regarding group functions.
   A. Group functions may only be used when a GROUP BY clause is present.
   B. Group functions can operate on multiple rows at a time.
   C. Group functions only operate on a single row at a time.
   D. Group functions can execute multiple times within a single group.

3. What value is returned after executing the following statement?
   SELECT SUM(SALARY) FROM EMPLOYEES;
   Assume there are ten employee records and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. (Choose the best answer.)
   A. 900
   B. 1000
   C. NULL
   D. None of the above

4. Which values are returned after executing the following statement?
   SELECT COUNT(*), COUNT(SALARY) FROM EMPLOYEES;
   Assume there are ten employee records and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. (Choose all that apply.)
   A. 10 and 10
   B. 10 and NULL
   C. 10 and 9
   D. None of the above

5. What value is returned after executing the following statement?
   SELECT AVG(NVL(SALARY,100)) FROM EMPLOYEES;
   Assume there are ten employee records and each contains a SALARY value of 100, except for one employee, who has a null value in the SALARY field. (Choose the best answer.)
   A. NULL
   B. 90
   C. 100
   D. None of the above

6. What value is returned after executing the following statement?
   SELECT SUM((AVG(LENGTH(NVL(SALARY,0))))) FROM EMPLOYEES GROUP BY SALARY;
   Assume there are ten employee records and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. (Choose the best answer.)
   A. An error is returned
   B. 3
   C. 4
   D. None of the above

7. How many rows are returned by the following query?
   SELECT SUM(SALARY), DEPARTMENT_ID FROM EMPLOYEES GROUP BY DEPARTMENT_ID;
   Assume there are 11 non-null and 1 null unique DEPARTMENT_ID values. All records have a non-null SALARY value. (Choose the best answer.)
   A. 12
   B. 11
   C. NULL
   D. None of the above

8. What values are returned after executing the following statement?
   SELECT JOB_ID, MAX_SALARY FROM JOBS GROUP BY MAX_SALARY;
   Assume that the JOBS table has ten records with the same JOB_ID value of DBA and the same MAX_SALARY value of 100. (Choose the best answer.)
   A. One row of output with the values DBA, 100
   B. Ten rows of output with the values DBA, 100
   C. An error is returned
   D. None of the above

9. How many rows of data are returned after executing the following statement?
   SELECT DEPT_ID, SUM(NVL(SALARY,100)) FROM EMP GROUP BY DEPT_ID HAVING SUM(SALARY) > 400;
   Assume the EMP table has ten rows and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. The first five rows have a DEPT_ID value of 10, while the second group of five rows, which includes the row with a null SALARY value, has a DEPT_ID value of 20. (Choose the best answer.)
   A. Two rows
   B. One row
   C. Zero rows
   D. None of the above

10. How many rows of data are returned after executing the following statement?
   SELECT DEPT_ID, SUM(SALARY) FROM EMP GROUP BY DEPT_ID HAVING SUM(NVL(SALARY,100)) > 400;
   Assume the EMP table has ten rows and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. The first five rows have a DEPT_ID value of 10, while the second five rows, which include the row with a null SALARY value, have a DEPT_ID value of 20. (Choose the best answer.)
   A. Two rows
   B. One row
   C. Zero rows
   D. None of the above

Self Test Answers

1. ✓ C. The DUAL table has one row and one column. The COUNT(*) function returns the number of rows in a table or group.
   ✗ A, B, and D.

2. ✓ B. By definition, group functions can operate on multiple rows at a time, unlike single-row functions.
   ✗ A, C, and D. A group function may be used without a GROUP BY clause. In this case, the entire dataset is operated on as a group. The COUNT function is often executed against an entire table, which behaves as one group. D is incorrect. Once a dataset has been partitioned into different groups, any group functions execute once per group.

3. ✓ A. The SUM aggregate function ignores null values and adds non-null values. Since nine rows contain the SALARY value 100, 900 is returned.
   ✗ B, C, and D. B would be returned if SUM(NVL(SALARY,100)) were executed. C is a tempting choice, since regular arithmetic with NULL values returns a NULL result. However, the aggregate functions, except for COUNT(*), ignore NULL values.

4. ✓ C. COUNT(*) considers all rows, including those with NULL values, while COUNT(SALARY) only considers the non-null rows.
   ✗ A, B, and D.

5. ✓ C. The NVL function converts the one NULL value into 100. Thereafter, the average function adds the SALARY values and obtains 1000. Dividing this by the number of records returns 100.
   ✗ A, B, and D. B would be returned if AVG(NVL(SALARY,0)) were selected.
   It is interesting to note that if AVG(SALARY) were selected, 100 would have also been returned, since the AVG function would sum the non-null values and divide the total by the number of rows with non-null SALARY values. So AVG(SALARY) would be calculated as: 900/9=100.

6. ✓ C. The dataset is segmented by the SALARY column. This creates two groups: one with SALARY values of 100 and the other with a null SALARY value. The average length of SALARY value 100 is 3 for the rows in the first group. The NULL salary value is first converted into the number 0 by the NVL function, and the average length of SALARY is 1. The SUM function operates across the two groups adding the values 3 and 1, returning 4.
   ✗ A, B, and D. A seems plausible, since group functions may not be nested more than two levels deep. Although there are four functions, only two are group functions, while the others are single-row functions evaluated before the group functions. B would be returned if the expression SUM(AVG(LENGTH(SALARY))) were selected.

7. ✓ A. There are 12 distinct DEPARTMENT_ID values. Since this is the grouping attribute, 12 groups are created, including 1 with a null DEPARTMENT_ID value. Therefore 12 rows are returned.
   ✗ B, C, and D.

8. ✓ C. Every column in the SELECT list that is not an argument to a group function must also appear in the GROUP BY clause. JOB_ID is selected but is neither a grouping attribute nor wrapped in a group function, so an error is returned.
   ✗ A, B, and D. These are incorrect, since the statement is invalid and is disallowed by Oracle. Do not mistake the column named MAX_SALARY for the MAX(SALARY) function.

9. ✓ B. Two groups are created based on their common DEPT_ID values. The group with DEPT_ID values of 10 consists of five rows with SALARY values of 100 in each of them. Therefore, the SUM(SALARY) function returns 500 for this group, and it satisfies the HAVING SUM(SALARY) > 400 clause. The group with DEPT_ID values of 20 has four rows with SALARY values of 100 and one row with a NULL SALARY.
   SUM(SALARY) only returns 400 and this group does not satisfy the HAVING clause.
   ✗ A, C, and D. Beware of the SUM(NVL(SALARY,100)) expression in the SELECT clause. This expression selects the format of the output. It does not restrict or limit the dataset in any way.

10. ✓ A. Two groups are created based on their common DEPT_ID values. The group with DEPT_ID values of 10 consists of five rows with SALARY values of 100 in each of them. Therefore the SUM(NVL(SALARY,100)) function returns 500 for this group and satisfies the HAVING SUM(NVL(SALARY,100)) > 400 clause. The group with DEPT_ID values of 20 has four rows with SALARY values of 100 and one row with a null SALARY. SUM(NVL(SALARY,100)) returns 500, and this group satisfies the HAVING clause. Therefore, two rows are returned.
    ✗ B, C, and D. Although the SELECT clause contains SUM(SALARY), which returns 500 and 400 for the two groups, the HAVING clause contains the SUM(NVL(SALARY,100)) expression, which specifies the inclusion or exclusion criteria for a group-level row.

CHAPTER 12
SQL Joins

Exam Objectives
In this chapter you will learn to
• 051.6.1 Write SELECT Statements to Access Data from More Than One Table Using Equijoins and Nonequijoins
• 051.6.2 Join a Table to Itself Using a Self-Join
• 051.6.3 View Data That Does Not Meet a Join Condition Using Outer Joins
• 051.6.4 Generate a Cartesian Product of All Rows from Two or More Tables

The three pillars of relational theory are selection, projection, and joining. This chapter focuses on the practical implementation of joining. Rows from different tables or views are associated with each other using joins. Support for joining has implications for the way data is stored, and many data models such as third normal form or star schemas have emerged to exploit this feature. Tables may be joined in several ways.
The most common technique is called an equijoin, where a row is associated with one or more rows in another table based on the equality of column values or expressions. Tables may also be joined using a nonequijoin, where a row is associated with one or more rows in another table if its column values fall into a range determined by inequality operators.

A less common technique is to associate rows with other rows in the same table. This association is based on columns with logical and usually hierarchical relationships with each other and is called a self-join. Rows with null or differing entries in common join columns are excluded when equijoins and nonequijoins are performed. An outer join is available to fetch these one-legged or orphaned rows, if necessary.

A cross join or Cartesian product is formed when every row from one table is joined to all rows in another. This join is often the result of missing or inadequate join conditions but is occasionally intentional.

Write SELECT Statements to Access Data from More Than One Table Using Equijoins and Nonequijoins

This section introduces the different types of joins in their primitive forms, outlining the broad categories that are available before delving into an in-depth discussion of the various join clauses. The modern ANSI-compliant and traditional Oracle syntaxes are discussed, but emphasis is placed on the modern syntax. This section concludes with a discussion of nonequijoins and additional join conditions. Joining is described by focusing on the following eight areas:
• Types of joins
• Joining tables using SQL:1999 syntax
• Qualifying ambiguous column names
• The NATURAL JOIN clause
• The natural JOIN USING clause
• The natural JOIN ON clause
• N-way joins and additional join conditions
• Nonequijoins

Types of Joins

Two basic joins are the equijoin and the nonequijoin.
Joins may be performed between multiple tables, but much of the following discussion will use two hypothetical tables to illustrate the concepts and language of joins. The first table is called the source, and the second is called the target. Rows in the source and target tables comprise one or more columns. As an example, assume that the source and target are the COUNTRIES and REGIONS tables from the HR schema, respectively.

The COUNTRIES table comprises three columns named COUNTRY_ID, COUNTRY_NAME, and REGION_ID, while the REGIONS table comprises two columns named REGION_ID and REGION_NAME. The data in these two tables is related via the common REGION_ID column. Consider the following queries:

Query 1: select * from countries where country_id='CA';
Query 2: select region_name from regions where region_id='2';

Query 1 retrieves the column values associated with the row from the COUNTRIES table with COUNTRY_ID='CA'. The REGION_ID value of this row is 2. Query 2 fetches Americas as the region name from the REGIONS table for the row with REGION_ID=2, thus identifying the one region in which Canada lies. Joining facilitates the retrieval of column values from multiple tables using a single query.

The source and target tables can be swapped, so the REGIONS table could be the source and the COUNTRIES table could be the target. Consider the following two queries:

Query 1: select * from regions where region_name='Americas';
Query 2: select country_name from countries where region_id='2';

Query 1 fetches one row with a REGION_ID value of 2. Joining in this reversed manner allows the following question to be asked: What countries belong to the Americas region? The answers from the second query are five countries named: Argentina, Brazil, Canada, Mexico, and the United States of America. These results may be obtained from a single query that joins the tables together.
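The two-step lookups above can be collapsed into one joined query each. The following is a minimal sketch rather than the book's own example: it assumes an in-memory SQLite database driven from Python as a stand-in for Oracle, and the three COUNTRIES rows and two REGIONS rows are an illustrative subset, not the full HR schema data. The helper names build_demo_db, region_of, and countries_in are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a small, illustrative
    subset of the COUNTRIES and REGIONS tables (assumed sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE regions (
            region_id   INTEGER PRIMARY KEY,
            region_name TEXT
        );
        CREATE TABLE countries (
            country_id   TEXT PRIMARY KEY,
            country_name TEXT,
            region_id    INTEGER
        );
        INSERT INTO regions VALUES (1, 'Europe'), (2, 'Americas');
        INSERT INTO countries VALUES
            ('CA', 'Canada', 2),
            ('BR', 'Brazil', 2),
            ('BE', 'Belgium', 1);
        """
    )
    return conn


def region_of(conn, country_id):
    """COUNTRIES is the source and REGIONS the target: one equijoin
    replaces the pair of single-table queries."""
    row = conn.execute(
        """
        SELECT r.region_name
        FROM countries c
        JOIN regions r ON c.region_id = r.region_id
        WHERE c.country_id = ?
        """,
        (country_id,),
    ).fetchone()
    return row[0] if row else None


def countries_in(conn, region_name):
    """Source and target swapped: REGIONS drives the join."""
    rows = conn.execute(
        """
        SELECT c.country_name
        FROM regions r
        JOIN countries c ON r.region_id = c.region_id
        WHERE r.region_name = ?
        ORDER BY c.country_name
        """,
        (region_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling region_of(conn, 'CA') on the database returned by build_demo_db() answers the first pair of queries in a single statement, and countries_in(conn, 'Americas') answers the reversed question. With this reduced sample data only two of the five Americas countries are present.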

Natural Joins

The natural join is implemented using three possible join clauses that use the following keywords in different combinations: NATURAL JOIN, USING, and ON. When the source and target tables share identically named columns, it is possible to perform a natural join between them without specifying a join column. This is sometimes referred to as a pure natural join. In this scenario, columns with the same names in the source and target tables are automatically associated with each other. Rows with matching column values in both tables are retrieved. The REGIONS and COUNTRIES tables share a commonly named column: REGION_ID. They may be naturally joined without specifying join columns, as shown in the first two queries in Figure 12-1.

The NATURAL JOIN keywords instruct Oracle to identify columns with identical names between the source and target tables. Thereafter, a join is implicitly performed between them. In the first query, the REGION_ID column is identified as the only commonly named column in both tables. REGIONS is the source table and appears after the FROM keyword. The target table is therefore COUNTRIES. For each row in the REGIONS table, a match for the REGION_ID value is sought from all the rows in the COUNTRIES table. An interim result set is constructed containing rows matching the join condition. This set is then restricted by the WHERE clause. In this case, because the COUNTRY_NAME must be Canada, the REGION_NAME Americas is returned.

The second query shows a natural join where COUNTRIES is the source table. The REGION_ID value for each row in the COUNTRIES table is identified and a search for a matching row in the REGIONS table is initiated. If matches are found, the interim results are limited by any WHERE conditions. The COUNTRY_NAME from rows with Americas as their REGION_NAME is returned.

Sometimes more control must be exercised regarding which columns to use for joins.
When there are identical column names in the source and target tables that you want to exclude as join columns, the JOIN ... USING format may be used. Remember that Oracle does not impose any rules stating that columns with the same name in two discrete tables must have a relationship with each other. The third query explicitly specifies that the REGIONS table be joined to the COUNTRIES table based on common values in their REGION_ID columns. This syntax allows natural joins to be formed on specific columns instead of on all commonly named columns.

The fourth query demonstrates the JOIN ... ON format of the natural join, which allows join columns to be explicitly stated. This format does not depend on the columns in the source and target tables having identical names. This form is more general and is the most widely used natural join format.

Figure 12-1 Natural joins

TIP Be wary when using pure natural joins, since database designers may assign the same name to key or unique columns. These columns may have names like ID or SEQ_NO. If a pure natural join is attempted between such tables, ambiguous and unexpected results may be returned.

Outer Joins

Not all tables share a perfect relationship, where every record in the source table can be matched to at least one row in the target table. It is occasionally required that rows with nonmatching join column values also be retrieved by a query. Suppose the EMPLOYEES and DEPARTMENTS tables are joined with common DEPARTMENT_ID values. EMPLOYEES records with null DEPARTMENT_ID values are excluded, along with records whose DEPARTMENT_ID values are absent from the DEPARTMENTS table. An outer join fetches these rows.

Cross Joins

A cross join or Cartesian product derives its name from mathematics, where it is also referred to as a cross product between two sets or matrices. This join creates one row of output for every combination of source and target table rows.
If the source and target tables have three and four rows, respectively, a cross join between them results in (3 × 4 = 12) rows being returned. Consider the row counts retrieved from the queries in Figure 12-2.

Figure 12-2 Cross join
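
The row-count arithmetic for a cross join can be checked with a short experiment. This is a sketch under stated assumptions, not a reproduction of the queries in Figure 12-2: it uses an in-memory SQLite database driven from Python in place of Oracle, and the table names SOURCE and TARGET and the helper cross_join_count are hypothetical.

```python
import sqlite3


def cross_join_count(source_rows, target_rows):
    """Return the number of rows produced by cross joining a SOURCE table
    of source_rows rows with a TARGET table of target_rows rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER)")
    conn.execute("CREATE TABLE target (id INTEGER)")
    conn.executemany(
        "INSERT INTO source VALUES (?)", [(i,) for i in range(source_rows)]
    )
    conn.executemany(
        "INSERT INTO target VALUES (?)", [(i,) for i in range(target_rows)]
    )
    # No join condition is supplied, so every source row is paired
    # with every target row.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM source CROSS JOIN target"
    ).fetchone()
    conn.close()
    return count
```

With three source rows and four target rows, cross_join_count(3, 4) yields the 12 combinations described above. An empty table on either side produces no rows at all, because there is nothing to pair with.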