Using Subqueries to Define Unknown Data
What You’ll Learn in This Hour:
. What a subquery is
. The justifications for using subqueries
. Examples of subqueries in regular database queries
. Using subqueries with data manipulation commands
. Embedded subqueries
In this hour, you are introduced to the concept of subqueries. Using subqueries enables you to perform complex queries more easily.
What Is a Subquery?
A subquery, also known as a nested query, is a query embedded within the WHERE clause of another query to further restrict data returned by the query.
A subquery returns data that is used in the main query as a condition to further restrict the data to be retrieved. Subqueries are employed with the SELECT, INSERT, UPDATE, and DELETE statements.
You can use a subquery in some cases in place of a join operation by indirectly linking data between the tables based on one or more conditions.
When a query contains a subquery, the subquery is resolved first, and then the main query is resolved according to the condition(s) resolved by the subquery. The results of the subquery feed the expressions in the WHERE clause of the main query. You can use the subquery in either the WHERE clause or the HAVING clause of the main query. You can use logical and relational operators, such as =, >, <, <>, !=, IN, NOT IN, AND, OR, and so on, within the subquery as well as to evaluate a subquery in the WHERE or HAVING clause.
Did You Know?
The Rules of Using Subqueries
The same rules that apply to standard queries also apply to subqueries. You can use join operations, functions, conversions, and other options within a subquery.
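The examples in this hour place the subquery in the WHERE clause, so here is a minimal sketch of one in the HAVING clause instead. It assumes SQLite through Python's built-in sqlite3 module, chosen only because it runs without setup; the ORDERS_TBL columns and rows are invented for illustration, and other implementations may behave differently.

```python
import sqlite3

# In-memory database; the table layout and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDERS_TBL (ORD_NUM TEXT, CUST_ID TEXT, QTY INTEGER);
    INSERT INTO ORDERS_TBL VALUES
        ('A1', '43', 15), ('A2', '43', 10),
        ('B1', '287', 10),
        ('C1', '432', 20);
""")

# The subquery computes the average quantity per order; the main query
# keeps only customers whose total quantity exceeds that average.
having_rows = conn.execute("""
    SELECT CUST_ID, SUM(QTY)
    FROM ORDERS_TBL
    GROUP BY CUST_ID
    HAVING SUM(QTY) > (SELECT AVG(QTY)
                       FROM ORDERS_TBL)
    ORDER BY CUST_ID
""").fetchall()

for cust_id, total in having_rows:
    print(cust_id, total)
```

The average quantity here is 13.75, so the customer whose total is 10 is filtered out after grouping.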
Use Indentation for Neater Statement Syntax
Notice the use of indentation in our examples. The indentation is merely for readability. The neater your statements are, the easier they are to read and the easier it is to find syntax errors.
Subqueries must follow a few rules:
. Subqueries must be enclosed within parentheses.
. A subquery can have only one column in the SELECT clause, unless multiple columns are in the main query for the subquery to compare its selected columns.
. You cannot use an ORDER BY clause in a subquery, although the main query can use an ORDER BY clause. You can use the GROUP BY clause in a subquery; in some implementations it returns rows in sorted order, although that ordering is not guaranteed.
. You can use subqueries that return more than one row only with multiple-value operators, such as the IN operator.
. The SELECT list cannot include references to values that evaluate to a BLOB, ARRAY, CLOB, or NCLOB.
. You cannot immediately enclose a subquery in a SET function.
. You cannot use the BETWEEN operator with a subquery; however, you can use the BETWEEN operator within the subquery.
The basic syntax for a subquery is as follows:
SELECT COLUMN_NAME
FROM TABLE
WHERE COLUMN_NAME = (SELECT COLUMN_NAME
                     FROM TABLE
                     WHERE CONDITIONS);
The following examples show how you can and cannot use the BETWEEN operator with a subquery. Here is an example of a correct use of BETWEEN in the subquery:
SELECT COLUMN_NAME
FROM TABLE_A
WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME
                            FROM TABLE_B
                            WHERE VALUE BETWEEN VALUE AND VALUE)
You cannot use BETWEEN as an operator outside the subquery. The following is an example of an illegal use of BETWEEN with a subquery:
SELECT COLUMN_NAME FROM TABLE_A
WHERE COLUMN_NAME BETWEEN VALUE AND (SELECT COLUMN_NAME FROM TABLE_B)
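As a runnable sketch of the permitted form, the following places BETWEEN inside the subquery and lets the main query compare against the result with IN. It assumes SQLite through Python's sqlite3 module; the columns and rows in TABLE_A and TABLE_B are hypothetical.

```python
import sqlite3

# Hypothetical tables: TABLE_A lists items, TABLE_B prices each price code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (ITEM TEXT, PRICE_CODE INTEGER);
    CREATE TABLE TABLE_B (PRICE_CODE INTEGER, PRICE REAL);
    INSERT INTO TABLE_A VALUES ('PEN', 1), ('LAMP', 2), ('DESK', 3);
    INSERT INTO TABLE_B VALUES (1, 2.50), (2, 18.00), (3, 140.00);
""")

# BETWEEN restricts the rows inside the subquery; the main query then
# matches its PRICE_CODE against the codes the subquery returns.
between_rows = conn.execute("""
    SELECT ITEM
    FROM TABLE_A
    WHERE PRICE_CODE IN (SELECT PRICE_CODE
                         FROM TABLE_B
                         WHERE PRICE BETWEEN 10 AND 150)
    ORDER BY ITEM
""").fetchall()

print(between_rows)
```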
Subqueries with the SELECT Statement
Subqueries are most frequently used with the SELECT statement, although you can use them within a data manipulation statement as well. The subquery, when employed with the SELECT statement, retrieves data for the main query to use.
The basic syntax is as follows:
SELECT COLUMN_NAME [, COLUMN_NAME ]
FROM TABLE1 [, TABLE2 ]
WHERE COLUMN_NAME OPERATOR
      (SELECT COLUMN_NAME [, COLUMN_NAME ]
       FROM TABLE1 [, TABLE2 ]
       [ WHERE ])
The following is an example:
SELECT E.EMP_ID, E.LAST_NAME, E.FIRST_NAME, EP.PAY_RATE
FROM EMPLOYEE_TBL E, EMPLOYEE_PAY_TBL EP
WHERE E.EMP_ID = EP.EMP_ID
  AND EP.PAY_RATE < (SELECT PAY_RATE
                     FROM EMPLOYEE_PAY_TBL
                     WHERE EMP_ID = '443679012');
The preceding SQL statement returns the employee identification, last name, first name, and pay rate for all employees who have a pay rate less than that of the employee with the identification 443679012. In this case, you do not necessarily know (or care) what the exact pay rate is for this particular employee; you only care about the pay rate for the purpose of getting a list of employees who bring home less than the employee specified in the subquery.
Using Subqueries for Unknown Values
Subqueries are frequently used to place conditions on a query when the exact
Did You Know?
The next query selects the pay rate for a particular employee. This query is used as the subquery in the following example.
SELECT PAY_RATE
FROM EMPLOYEE_PAY_TBL
WHERE EMP_ID = '220984332';

PAY_RATE
--------
      11

1 row selected.
The previous query is used as a subquery in the WHERE clause of the following query:
SELECT E.EMP_ID, E.LAST_NAME, E.FIRST_NAME, EP.PAY_RATE
FROM EMPLOYEE_TBL E, EMPLOYEE_PAY_TBL EP
WHERE E.EMP_ID = EP.EMP_ID
  AND EP.PAY_RATE > (SELECT PAY_RATE
                     FROM EMPLOYEE_PAY_TBL
                     WHERE EMP_ID = '220984332');

EMP_ID    LAST_NAME FIRST_NAME PAY_RATE
--------- --------- ---------- --------
442346889 PLEW      LINDA         14.75
443679012 SPURGEON  TIFFANY          15

2 rows selected.
The result of the subquery is 11 (shown in the last example), so the last condition of the WHERE clause is evaluated as
AND EP.PAY_RATE > 11
You did not know the value of the pay rate for the given individual when you executed the query. However, the main query was able to compare each individual’s pay rate to the subquery results.
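The substitution described above can be tried out with the following sketch, which runs the query once with the subquery and once by hand in two steps. It assumes SQLite through Python's sqlite3 module and cut-down versions of the two tables. The IDs and pay rates for 220984332, PLEW, and SPURGEON come from the output above; the names DOE and ROE and the ID 000000001 are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE_TBL (
        EMP_ID TEXT PRIMARY KEY, LAST_NAME TEXT, FIRST_NAME TEXT);
    CREATE TABLE EMPLOYEE_PAY_TBL (EMP_ID TEXT PRIMARY KEY, PAY_RATE REAL);
    -- DOE, ROE, and ID 000000001 are placeholders, not data from the text.
    INSERT INTO EMPLOYEE_TBL VALUES
        ('220984332', 'DOE', 'SAM'),
        ('000000001', 'ROE', 'PAT'),
        ('442346889', 'PLEW', 'LINDA'),
        ('443679012', 'SPURGEON', 'TIFFANY');
    INSERT INTO EMPLOYEE_PAY_TBL VALUES
        ('220984332', 11), ('000000001', 9.5),
        ('442346889', 14.75), ('443679012', 15);
""")

SUBQUERY = "SELECT PAY_RATE FROM EMPLOYEE_PAY_TBL WHERE EMP_ID = '220984332'"
MAIN_QUERY = """
    SELECT E.EMP_ID, E.LAST_NAME, E.FIRST_NAME, EP.PAY_RATE
    FROM EMPLOYEE_TBL E, EMPLOYEE_PAY_TBL EP
    WHERE E.EMP_ID = EP.EMP_ID
      AND EP.PAY_RATE > {condition}
    ORDER BY E.EMP_ID
"""

# One statement: the database resolves the subquery for you.
with_subquery = conn.execute(
    MAIN_QUERY.format(condition=f"({SUBQUERY})")).fetchall()

# Two steps by hand: run the subquery, then substitute its value.
(pay_rate,) = conn.execute(SUBQUERY).fetchone()
by_hand = conn.execute(
    MAIN_QUERY.format(condition="?"), (pay_rate,)).fetchall()

print(pay_rate)
print(with_subquery == by_hand)
```

Both forms return the same rows; the subquery simply saves you from looking up the pay rate yourself.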
Subqueries with the INSERT Statement
Always Remember to COMMIT Your DML
Remember to use the COMMIT and ROLLBACK commands when using DML commands such as the INSERT statement.
By the Way
You can also use subqueries in conjunction with Data Manipulation Language (DML) statements. The INSERT statement is the first instance you examine. It uses the data returned from the subquery to insert into another table. You can modify the selected data in the subquery with any of the character, date, or number functions.
The basic syntax is as follows:
INSERT INTO TABLE_NAME [ (COLUMN1 [, COLUMN2 ]) ]
SELECT [ * | COLUMN1 [, COLUMN2 ] ]
FROM TABLE1 [, TABLE2 ]
[ WHERE VALUE OPERATOR ]
The following is an example of the INSERT statement with a subquery:
INSERT INTO RICH_EMPLOYEES
SELECT E.EMP_ID, E.LAST_NAME, E.FIRST_NAME, EP.PAY_RATE
FROM EMPLOYEE_TBL E, EMPLOYEE_PAY_TBL EP
WHERE E.EMP_ID = EP.EMP_ID
  AND EP.PAY_RATE > (SELECT PAY_RATE
                     FROM EMPLOYEE_PAY_TBL
                     WHERE EMP_ID = '220984332');
2 rows created.
This INSERT statement inserts the EMP_ID, LAST_NAME, FIRST_NAME, and PAY_RATE into a table called RICH_EMPLOYEES for all records of employees who have a pay rate greater than the pay rate of the employee with identification 220984332.
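A minimal sketch of the same INSERT, assuming SQLite through Python's sqlite3 module. The column layout of RICH_EMPLOYEES is an assumption (it mirrors the SELECT list), and the names DOE and ROE and the ID 000000001 are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE_TBL (
        EMP_ID TEXT PRIMARY KEY, LAST_NAME TEXT, FIRST_NAME TEXT);
    CREATE TABLE EMPLOYEE_PAY_TBL (EMP_ID TEXT PRIMARY KEY, PAY_RATE REAL);
    -- Assumed layout: one column per item in the SELECT list below.
    CREATE TABLE RICH_EMPLOYEES (
        EMP_ID TEXT, LAST_NAME TEXT, FIRST_NAME TEXT, PAY_RATE REAL);
    -- DOE, ROE, and ID 000000001 are placeholders, not data from the text.
    INSERT INTO EMPLOYEE_TBL VALUES
        ('220984332', 'DOE', 'SAM'),
        ('000000001', 'ROE', 'PAT'),
        ('442346889', 'PLEW', 'LINDA'),
        ('443679012', 'SPURGEON', 'TIFFANY');
    INSERT INTO EMPLOYEE_PAY_TBL VALUES
        ('220984332', 11), ('000000001', 9.5),
        ('442346889', 14.75), ('443679012', 15);
""")

# The subquery supplies the pay-rate threshold; the SELECT supplies the rows.
conn.execute("""
    INSERT INTO RICH_EMPLOYEES
    SELECT E.EMP_ID, E.LAST_NAME, E.FIRST_NAME, EP.PAY_RATE
    FROM EMPLOYEE_TBL E, EMPLOYEE_PAY_TBL EP
    WHERE E.EMP_ID = EP.EMP_ID
      AND EP.PAY_RATE > (SELECT PAY_RATE
                         FROM EMPLOYEE_PAY_TBL
                         WHERE EMP_ID = '220984332')
""")
conn.commit()  # make the DML permanent, as the COMMIT reminder advises

rich = conn.execute(
    "SELECT LAST_NAME, PAY_RATE FROM RICH_EMPLOYEES ORDER BY LAST_NAME"
).fetchall()
print(rich)
```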
Subqueries with the UPDATE Statement
You can use subqueries in conjunction with the UPDATE statement to update single or multiple columns in a table. The basic syntax is as follows:
UPDATE TABLE
SET COLUMN_NAME [, COLUMN_NAME ] =
    (SELECT COLUMN_NAME [, COLUMN_NAME ]
     FROM TABLE
     [ WHERE ])
Examples showing the use of the UPDATE statement with a subquery follow.
The first query returns the employee identification of all employees who reside in Indianapolis. You can see that four individuals meet this criterion.
SELECT EMP_ID
FROM EMPLOYEE_TBL
WHERE CITY = 'INDIANAPOLIS';

EMP_ID
---------
442346889
313782439
443679012

4 rows selected.
The first query is used as the subquery in the following statement; it shows how many employee identifications are returned by the subquery. The following is the UPDATE with the subquery:
UPDATE EMPLOYEE_PAY_TBL
SET PAY_RATE = PAY_RATE * 1.1
WHERE EMP_ID IN (SELECT EMP_ID
                 FROM EMPLOYEE_TBL
                 WHERE CITY = 'INDIANAPOLIS');
4 rows updated.
As expected, four rows are updated. One important thing to notice is that, unlike the example in the first section, this subquery returns multiple rows of data. Because you expect multiple rows to be returned, you use the IN operator instead of the equal sign. Remember that IN compares an expression to values in a list. If you had used the equal sign, an error would have been returned.
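The following sketch applies the same UPDATE to a reduced data set, assuming SQLite through Python's sqlite3 module. Only the three Indianapolis IDs listed above are included, so three rows change here rather than four; the ID 000000001, its city, and the pay rates of 12.00 and 10.00 are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE_TBL (EMP_ID TEXT PRIMARY KEY, CITY TEXT);
    CREATE TABLE EMPLOYEE_PAY_TBL (EMP_ID TEXT PRIMARY KEY, PAY_RATE REAL);
    -- ID 000000001, GREENWOOD, and the 12.00 and 10.00 rates are placeholders.
    INSERT INTO EMPLOYEE_TBL VALUES
        ('442346889', 'INDIANAPOLIS'),
        ('313782439', 'INDIANAPOLIS'),
        ('443679012', 'INDIANAPOLIS'),
        ('000000001', 'GREENWOOD');
    INSERT INTO EMPLOYEE_PAY_TBL VALUES
        ('442346889', 14.75), ('313782439', 12.00),
        ('443679012', 15), ('000000001', 10.00);
""")

before = dict(conn.execute("SELECT EMP_ID, PAY_RATE FROM EMPLOYEE_PAY_TBL"))

# The subquery returns several EMP_ID values, so IN is the right operator.
conn.execute("""
    UPDATE EMPLOYEE_PAY_TBL
    SET PAY_RATE = PAY_RATE * 1.1
    WHERE EMP_ID IN (SELECT EMP_ID
                     FROM EMPLOYEE_TBL
                     WHERE CITY = 'INDIANAPOLIS')
""")
conn.commit()

after = dict(conn.execute("SELECT EMP_ID, PAY_RATE FROM EMPLOYEE_PAY_TBL"))
raised = sorted(emp for emp in after if after[emp] > before[emp])
print(raised)
```

Only the employees whose EMP_ID appears in the subquery's result receive the raise; the employee in the other city is left untouched.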
Subqueries with the DELETE Statement
You can also use subqueries in conjunction with the DELETE statement. The basic syntax is as follows:
DELETE FROM TABLE_NAME
[ WHERE OPERATOR [ VALUE ]
    (SELECT COLUMN_NAME
     FROM TABLE_NAME
     [ WHERE ]) ]
In the following example, you delete the BRANDON GLASS record from EMPLOYEE_PAY_TBL. You do not know Brandon’s employee identification number, but you can use a subquery to get his identification number from EMPLOYEE_TBL, which contains the FIRST_NAME and LAST_NAME columns.
DELETE FROM EMPLOYEE_PAY_TBL
WHERE EMP_ID = (SELECT EMP_ID
                FROM EMPLOYEE_TBL
                WHERE LAST_NAME = 'GLASS'
                  AND FIRST_NAME = 'BRANDON');
1 row deleted.
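Here is a sketch of the same DELETE, assuming SQLite through Python's sqlite3 module. It previews the affected rows with a SELECT that shares the DELETE's WHERE clause before removing anything. Only the name BRANDON GLASS comes from the text; every ID, pay rate, and other name is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE_TBL (
        EMP_ID TEXT PRIMARY KEY, LAST_NAME TEXT, FIRST_NAME TEXT);
    CREATE TABLE EMPLOYEE_PAY_TBL (EMP_ID TEXT PRIMARY KEY, PAY_RATE REAL);
    -- Placeholder IDs and pay rates; only BRANDON GLASS is from the text.
    INSERT INTO EMPLOYEE_TBL VALUES
        ('000000001', 'GLASS', 'BRANDON'),
        ('000000002', 'GLASS', 'JACOB'),
        ('000000003', 'PLEW', 'LINDA');
    INSERT INTO EMPLOYEE_PAY_TBL VALUES
        ('000000001', 12.5), ('000000002', 13), ('000000003', 14.75);
""")

# Shared WHERE clause: the subquery looks up the unknown EMP_ID by name.
CONDITION = """
    WHERE EMP_ID = (SELECT EMP_ID
                    FROM EMPLOYEE_TBL
                    WHERE LAST_NAME = 'GLASS'
                      AND FIRST_NAME = 'BRANDON')
"""

# Preview with SELECT first to confirm which rows the DELETE will remove.
preview = conn.execute(
    "SELECT EMP_ID FROM EMPLOYEE_PAY_TBL" + CONDITION).fetchall()

conn.execute("DELETE FROM EMPLOYEE_PAY_TBL" + CONDITION)
conn.commit()

remaining = [row[0] for row in conn.execute(
    "SELECT EMP_ID FROM EMPLOYEE_PAY_TBL ORDER BY EMP_ID")]
print(preview, remaining)
```

Both name conditions are needed in this data set because two employees share the last name GLASS.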
Embedded Subqueries
Check the Limits of Your System
You must check your particular implementation for limits on the number of subqueries, if any, that you can use in a single statement. It might differ between vendors.
By the Way
You can embed a subquery within another subquery, just as you can embed the subquery within a regular query. When a subquery is used, that subquery is resolved before the main query. Likewise, the lowest level subquery is resolved first in embedded or nested subqueries, working out to the main query.
The basic syntax for embedded subqueries is as follows:
SELECT COLUMN_NAME [, COLUMN_NAME ]
FROM TABLE1 [, TABLE2 ]
WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME
                            FROM TABLE
                            WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME
                                                        FROM TABLE
                                                        [ WHERE COLUMN_NAME OPERATOR VALUE ]))
The following example uses two subqueries, one embedded within the other.
You want to find out which customers have placed orders in which the quantity multiplied by the cost of a single order is greater than the sum of the cost of all products.
SELECT CUST_ID, CUST_NAME
FROM CUSTOMER_TBL
WHERE CUST_ID IN (SELECT O.CUST_ID
                  FROM ORDERS_TBL O, PRODUCTS_TBL P
                  WHERE O.PROD_ID = P.PROD_ID
                    AND O.QTY + P.COST < (SELECT SUM(COST)
                                          FROM PRODUCTS_TBL));
CUST_ID CUST_NAME
------- ------------------
090     WENDY WOLF
232     LESLIE GLEASON
287     GAVINS PLACE
43      SCHYLERS NOVELTIES
432     SCOTTYS MARKET
560     ANDYS CANDIES
Always Use a WHERE Clause
Do not forget the use of the WHERE clause with the UPDATE and DELETE statements. All rows are updated or deleted from the target table if the WHERE clause is not used. You can utilize a SELECT statement with the WHERE clause first to ensure that you are modifying the correct rows. See Hour 5, “Manipulating Data.”
Watch Out!
Six rows that meet the criteria of both subqueries were selected.
The following two examples show the results of each of the subqueries to aid your understanding of how the main query was resolved:
SELECT SUM(COST) FROM PRODUCTS_TBL;
SUM(COST)
---------
   138.08

1 row selected.
SELECT O.CUST_ID
FROM ORDERS_TBL O, PRODUCTS_TBL P
WHERE O.PROD_ID = P.PROD_ID
  AND O.QTY + P.COST > 138.08;

CUST_ID
-------
43
287

2 rows selected.
In essence, the main query, after the substitution of the second subquery, is evaluated as shown in the following example:
SELECT CUST_ID, CUST_NAME
FROM CUSTOMER_TBL
WHERE CUST_ID IN (SELECT O.CUST_ID
                  FROM ORDERS_TBL O, PRODUCTS_TBL P
                  WHERE O.PROD_ID = P.PROD_ID
                    AND O.QTY + P.COST > 138.08);
The following shows how the main query is evaluated after the substitution of the first subquery:
SELECT CUST_ID, CUST_NAME FROM CUSTOMER_TBL
WHERE CUST_ID IN (287,43);
The following is the final result:
CUST_ID CUST_NAME
------- ------------------
43      SCHYLERS NOVELTIES
287     GAVINS PLACE

2 rows selected.
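The inside-out substitution walked through above can be reproduced with the following sketch, assuming SQLite through Python's sqlite3 module. It uses the greater-than comparison from the step-by-step version. The three customer names and IDs come from the text; the products, costs, and orders are invented, so the total differs from 138.08.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER_TBL (CUST_ID TEXT PRIMARY KEY, CUST_NAME TEXT);
    CREATE TABLE PRODUCTS_TBL (PROD_ID TEXT PRIMARY KEY, COST REAL);
    CREATE TABLE ORDERS_TBL (
        ORD_NUM TEXT, CUST_ID TEXT, PROD_ID TEXT, QTY INTEGER);
    -- Customers are from the text; products and orders are invented.
    INSERT INTO CUSTOMER_TBL VALUES
        ('232', 'LESLIE GLEASON'), ('287', 'GAVINS PLACE'),
        ('43', 'SCHYLERS NOVELTIES');
    INSERT INTO PRODUCTS_TBL VALUES ('P1', 29.99), ('P2', 14.50), ('P3', 5.00);
    INSERT INTO ORDERS_TBL VALUES
        ('O1', '43', 'P1', 25), ('O2', '287', 'P2', 40), ('O3', '232', 'P3', 1);
""")

# The full statement, with one subquery embedded in the other.
nested = conn.execute("""
    SELECT CUST_ID, CUST_NAME
    FROM CUSTOMER_TBL
    WHERE CUST_ID IN (SELECT O.CUST_ID
                      FROM ORDERS_TBL O, PRODUCTS_TBL P
                      WHERE O.PROD_ID = P.PROD_ID
                        AND O.QTY + P.COST > (SELECT SUM(COST)
                                              FROM PRODUCTS_TBL))
    ORDER BY CUST_ID
""").fetchall()

# Step 1: resolve the lowest-level subquery.
(total_cost,) = conn.execute("SELECT SUM(COST) FROM PRODUCTS_TBL").fetchone()

# Step 2: resolve the middle subquery with the total substituted in.
cust_ids = [row[0] for row in conn.execute("""
    SELECT O.CUST_ID
    FROM ORDERS_TBL O, PRODUCTS_TBL P
    WHERE O.PROD_ID = P.PROD_ID
      AND O.QTY + P.COST > ?
""", (total_cost,))]

# Step 3: resolve the main query with the list of IDs substituted in.
placeholders = ", ".join("?" for _ in cust_ids)
stepwise = conn.execute(
    f"SELECT CUST_ID, CUST_NAME FROM CUSTOMER_TBL "
    f"WHERE CUST_ID IN ({placeholders}) ORDER BY CUST_ID",
    cust_ids).fetchall()

print(nested == stepwise)
```

Running each level separately like this is also a practical way to check a nested statement's logic one piece at a time.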
Multiple Subqueries Can Cause Problems
The use of multiple subqueries results in slower response time and might result in reduced accuracy of the results due to possible mistakes in the statement coding.
Watch Out!
Correlated Subqueries
Correlated subqueries are common in many SQL implementations. The concept of correlated subqueries is discussed as an ANSI-standard SQL topic and is covered briefly in this hour. A correlated subquery is a subquery that is dependent upon information in the main query. This means that tables in a subquery can be related to tables in the main query.
In the following example, the table join between CUSTOMER_TBL and ORDERS_TBL in the subquery is dependent on the alias for CUSTOMER_TBL (C) in the main query. This query returns the name of all customers who have ordered more than 10 units of one or more items.
SELECT C.CUST_NAME FROM CUSTOMER_TBL C
WHERE 10 < (SELECT SUM(O.QTY) FROM ORDERS_TBL O
WHERE O.CUST_ID = C.CUST_ID);
CUST_NAME
------------------
SCOTTYS MARKET
SCHYLERS NOVELTIES
MARYS GIFT SHOP

3 rows selected.
You can extract and slightly modify the subquery from the previous statement in the next statement to show you the total quantity of units ordered for each customer, allowing the previous results to be verified:
SELECT C.CUST_NAME, SUM(O.QTY)
FROM CUSTOMER_TBL C,
     ORDERS_TBL O
WHERE C.CUST_ID = O.CUST_ID
GROUP BY C.CUST_NAME;

CUST_NAME          SUM(O.QTY)
------------------ ----------
ANDYS CANDIES               1
GAVINS PLACE               10
LESLIE GLEASON              1
MARYS GIFT SHOP           100
SCHYLERS NOVELTIES         25
SCOTTYS MARKET             20
WENDY WOLF                  2

7 rows selected.
The GROUP BY clause in this example is required because another column is being selected with the aggregate function SUM. This gives you a sum for each customer. In the original subquery, a GROUP BY clause is not required because SUM achieves a total for the entire query, which is run against the record for each customer.
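The correlated subquery and its GROUP BY cross-check can be run side by side with the following sketch, assuming SQLite through Python's sqlite3 module. The per-customer totals match the output above; how each total is split into orders, the order numbers, and the CUST_ID '12' for MARYS GIFT SHOP are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER_TBL (CUST_ID TEXT PRIMARY KEY, CUST_NAME TEXT);
    CREATE TABLE ORDERS_TBL (ORD_NUM TEXT, CUST_ID TEXT, QTY INTEGER);
    -- Totals per customer follow the text; the order split and the
    -- CUST_ID '12' for MARYS GIFT SHOP are assumptions.
    INSERT INTO CUSTOMER_TBL VALUES
        ('090', 'WENDY WOLF'), ('12', 'MARYS GIFT SHOP'),
        ('232', 'LESLIE GLEASON'), ('287', 'GAVINS PLACE'),
        ('43', 'SCHYLERS NOVELTIES'), ('432', 'SCOTTYS MARKET'),
        ('560', 'ANDYS CANDIES');
    INSERT INTO ORDERS_TBL VALUES
        ('O1', '090', 2), ('O2', '12', 100), ('O3', '232', 1),
        ('O4', '287', 10), ('O5', '43', 15), ('O6', '43', 10),
        ('O7', '432', 20), ('O8', '560', 1);
""")

# Correlated: the subquery refers to C.CUST_ID from the main query,
# so it is evaluated against each customer row in turn.
correlated = [row[0] for row in conn.execute("""
    SELECT C.CUST_NAME
    FROM CUSTOMER_TBL C
    WHERE 10 < (SELECT SUM(O.QTY)
                FROM ORDERS_TBL O
                WHERE O.CUST_ID = C.CUST_ID)
    ORDER BY C.CUST_NAME
""")]

# Cross-check: total quantity per customer with a join and GROUP BY.
totals = dict(conn.execute("""
    SELECT C.CUST_NAME, SUM(O.QTY)
    FROM CUSTOMER_TBL C, ORDERS_TBL O
    WHERE C.CUST_ID = O.CUST_ID
    GROUP BY C.CUST_NAME
"""))

print(correlated)
print(totals)
```

GAVINS PLACE totals exactly 10, so it fails the "10 <" test and does not appear in the correlated result.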
Subquery Performance
Subqueries do have performance implications when used within a query.
You must consider those implications prior to implementing them in a production environment. Consider that a subquery must be evaluated prior to the main part of the query, so the time that it takes to execute the subquery has a direct effect on the time it takes for the main query to execute. Let’s look at our previous example:
Proper Use of Correlated Subqueries
In the case of a correlated subquery, you must reference the table in the main query before you can resolve the subquery.
By the Way
SELECT CUST_ID, CUST_NAME
FROM CUSTOMER_TBL
WHERE CUST_ID IN (SELECT O.CUST_ID
                  FROM ORDERS_TBL O, PRODUCTS_TBL P
                  WHERE O.PROD_ID = P.PROD_ID
                    AND O.QTY + P.COST < (SELECT SUM(COST)
                                          FROM PRODUCTS_TBL));
Imagine what would happen if PRODUCTS_TBL contained a couple of thousand product lines and ORDERS_TBL contained a few million lines of customer orders. The resulting effect of having to do a SUM across PRODUCTS_TBL and then join it with ORDERS_TBL could slow the query down considerably.
So always remember to evaluate the effect that using a subquery has on performance when deciding on a course of action to take for getting information out of the database.
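One way to evaluate that effect is to time the nested statement against an alternative that returns the same rows. The sketch below assumes SQLite through Python's sqlite3 module and entirely synthetic data. It compares the nested form with a join that receives the precomputed SUM as a parameter. Timings depend heavily on the implementation, its optimizer, indexes, and data volume, so treat the printed numbers as a measurement on one machine rather than a general result.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER_TBL (CUST_ID INTEGER PRIMARY KEY, CUST_NAME TEXT);
    CREATE TABLE PRODUCTS_TBL (PROD_ID INTEGER PRIMARY KEY, COST REAL);
    CREATE TABLE ORDERS_TBL (CUST_ID INTEGER, PROD_ID INTEGER, QTY INTEGER);
""")
# Synthetic data: 20 customers, 50 products, 20,000 orders.
conn.executemany("INSERT INTO CUSTOMER_TBL VALUES (?, ?)",
                 [(c, f"CUSTOMER {c}") for c in range(1, 21)])
conn.executemany("INSERT INTO PRODUCTS_TBL VALUES (?, ?)",
                 [(p, float(p)) for p in range(1, 51)])
# Even-numbered customers order in bulk; odd-numbered ones order singles.
conn.executemany(
    "INSERT INTO ORDERS_TBL VALUES (?, ?, ?)",
    [(n % 20 + 1, n % 50 + 1, 2000 if (n % 20 + 1) % 2 == 0 else 1)
     for n in range(20000)])
conn.commit()


def timed(sql, params=()):
    """Run a query and return its rows plus the elapsed seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start


nested_rows, nested_secs = timed("""
    SELECT CUST_ID, CUST_NAME
    FROM CUSTOMER_TBL
    WHERE CUST_ID IN (SELECT O.CUST_ID
                      FROM ORDERS_TBL O, PRODUCTS_TBL P
                      WHERE O.PROD_ID = P.PROD_ID
                        AND O.QTY + P.COST < (SELECT SUM(COST)
                                              FROM PRODUCTS_TBL))
    ORDER BY CUST_ID
""")

# Alternative: compute the SUM once, then use a three-table join.
(total_cost,) = conn.execute("SELECT SUM(COST) FROM PRODUCTS_TBL").fetchone()
join_rows, join_secs = timed("""
    SELECT DISTINCT C.CUST_ID, C.CUST_NAME
    FROM CUSTOMER_TBL C, ORDERS_TBL O, PRODUCTS_TBL P
    WHERE C.CUST_ID = O.CUST_ID
      AND O.PROD_ID = P.PROD_ID
      AND O.QTY + P.COST < ?
    ORDER BY C.CUST_ID
""", (total_cost,))

print(f"nested: {nested_secs:.4f}s  join: {join_secs:.4f}s")
print(nested_rows == join_rows)
```

Checking that both forms return identical rows before comparing their timings keeps the comparison honest.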
Summary
By simple definition and general concept, a subquery is a query that is performed within another query to place further conditions on a query. You can use a subquery in an SQL statement’s WHERE clause or HAVING clause.
Subqueries are typically used within other queries (Data Query Language), but you can also use them in the resolution of DML statements such as INSERT, UPDATE, and DELETE. All basic rules for DML apply when using subqueries with DML commands.
The subquery’s syntax is virtually the same as that of a standalone query, with a few minor restrictions. One of these restrictions is that you cannot use the ORDER BY clause within a subquery; you can use a GROUP BY clause, however, which in some implementations returns rows in sorted order, although that ordering is not guaranteed. Subqueries are used to place conditions that are not necessarily known for a query, providing more power and flexibility with SQL.
Q&A
Q. In the examples of subqueries, I noticed quite a bit of indentation. Is this necessary in the syntax of a subquery?
A. Absolutely not. The indentation is used merely to break the statement into separate parts, making the statement more readable and easier to follow.
Q. Is there a limit on the number of embedded subqueries that can be used in a single query?
A. Limitations such as the number of embedded subqueries allowed and the number of tables joined in a query are specific to each implementation. Some implementations might not have limits, although the use of too many embedded subqueries could drastically hinder SQL statement performance. Most limitations are affected by the actual hardware, CPU speed, and system memory available, although there are many other considerations.
Q. It seems that debugging a query with subqueries can prove to be confusing, especially with embedded subqueries. What is the best way to debug a query with subqueries?
A. The best way to debug a query with subqueries is to evaluate the query in sections. First evaluate the lowest-level subquery, and then work your way to the main query (the same way the database evaluates the query). When you evaluate each subquery individually, you can substitute the returned values for each subquery to check your main query’s logic. An error with a subquery often results from the use of the operator that evaluates the subquery, such as (=), IN, >, <, and so on.
Workshop
The following workshop is composed of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises are intended to afford you the opportunity to apply the concepts discussed during the current hour, as well as build upon the knowledge acquired in previous hours of study. Please take time to complete the quiz questions and exercises before continuing. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.
Quiz
1. What is the function of a subquery when used with a SELECT statement?
2. Can you update more than one column when using the UPDATE statement in conjunction with a subquery?
3. Do the following have the correct syntax? If not, what is the correct syntax?
a.
SELECT CUST_ID, CUST_NAME
FROM CUSTOMER_TBL
WHERE CUST_ID =
      (SELECT CUST_ID
       FROM ORDERS_TBL
       WHERE ORD_NUM = '16C17');
b.
SELECT EMP_ID, SALARY
FROM EMPLOYEE_PAY_TBL
WHERE SALARY BETWEEN '20000'
             AND (SELECT SALARY
                  FROM EMPLOYEE_ID
                  WHERE SALARY = '40000');
c.
UPDATE PRODUCTS_TBL
SET COST = 1.15
WHERE CUST_ID =
      (SELECT CUST_ID
       FROM ORDERS_TBL
       WHERE ORD_NUM = '32A132');
4. What would happen if you ran the following statement?
DELETE FROM EMPLOYEE_TBL WHERE EMP_ID IN
(SELECT EMP_ID
FROM EMPLOYEE_PAY_TBL);
Exercises
1. Write the SQL code for the requested subqueries, and compare your results to ours. Use the following tables to complete the exercises:
EMPLOYEE_TBL
EMP_ID          VARCHAR(9)    NOT NULL    primary key
LAST_NAME       VARCHAR(15)   NOT NULL
FIRST_NAME      VARCHAR(15)   NOT NULL
MIDDLE_NAME     VARCHAR(15)
ADDRESS         VARCHAR(30)   NOT NULL
CITY            VARCHAR(15)   NOT NULL
STATE           VARCHAR(2)    NOT NULL
ZIP             INTEGER(5)    NOT NULL
PHONE           VARCHAR(10)
PAGER           VARCHAR(10)

EMPLOYEE_PAY_TBL
EMP_ID          VARCHAR(9)    NOT NULL    primary key
POSITION        VARCHAR(15)   NOT NULL
DATE_HIRE       DATETIME
PAY_RATE        DECIMAL(4,2)  NOT NULL
DATE_LAST_RAISE DATETIME
CONSTRAINT EMP_FK FOREIGN KEY (EMP_ID) REFERENCES EMPLOYEE_TBL (EMP_ID)

CUSTOMER_TBL
CUST_ID         VARCHAR(10)   NOT NULL    primary key
CUST_NAME       VARCHAR(30)   NOT NULL
CUST_ADDRESS    VARCHAR(20)   NOT NULL
CUST_CITY       VARCHAR(15)   NOT NULL
CUST_STATE      VARCHAR(2)    NOT NULL
CUST_ZIP        INTEGER(5)    NOT NULL
CUST_PHONE      INTEGER(10)