SQL VISUAL QUICKSTART GUIDE- P48 doc

PostgreSQL

The function date_part() extracts the specified part of a datetime. current_timestamp returns the current (system) date and time. The standard addition and subtraction operators add and subtract time intervals from a date. Subtracting one date from another yields the number of days between them (Listing 15.50).

✔ Tip
■ An alternative to date_part() is extract().

Listing 15.50 Working with dates in PostgreSQL.

    -- Extract parts of the current datetime.
    SELECT
        date_part('second', current_timestamp) AS sec_pt,
        date_part('minute', current_timestamp) AS min_pt,
        date_part('hour',   current_timestamp) AS hr_pt,
        date_part('day',    current_timestamp) AS day_pt,
        date_part('month',  current_timestamp) AS mon_pt,
        date_part('year',   current_timestamp) AS yr_pt;

    -- Add or subtract days, months, and years.
    SELECT
        pubdate + INTERVAL '2 DAY'   AS p2d,
        pubdate - INTERVAL '2 DAY'   AS m2d,
        pubdate + INTERVAL '2 MONTH' AS p2m,
        pubdate - INTERVAL '2 MONTH' AS m2m,
        pubdate + INTERVAL '2 YEAR'  AS p2y,
        pubdate - INTERVAL '2 YEAR'  AS m2y
      FROM titles
      WHERE title_id = 'T05';

    -- Count the days between two dates.
    SELECT date2 - date1 AS days
      FROM
        (SELECT pubdate AS date1
           FROM titles
           WHERE title_id = 'T05') t1,
        (SELECT pubdate AS date2
           FROM titles
           WHERE title_id = 'T06') t2;

    -- Count the months between two dates.
    SELECT
        (date_part('year', date2)*12 + date_part('month', date2)) -
        (date_part('year', date1)*12 + date_part('month', date1)) AS months
      FROM
        (SELECT MIN(pubdate) AS date1,
                MAX(pubdate) AS date2
           FROM titles) t1;

Calculating a Median

The median describes the center of the data as the middle point of n (sorted) values. If n is odd, the median is observation number (n+1)/2. If n is even, the median is the midpoint (average) of observations n/2 and n/2+1. The examples in this section calculate the median of the column sales in the table empsales (Figure 15.39). The median is 550: the average of the middle two numbers, 500 and 600, in the sorted list.
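As a quick check of the odd/even rule just described, here is a minimal Python sketch applied to the ten sales values of empsales. The helper name median_by_rank is hypothetical (it is not from the book), and Python is used only because it is easy to run; it illustrates the definition and is not one of the SQL methods discussed in this section.

```python
def median_by_rank(values):
    """Median per the definition: the middle observation if n is odd,
    the average of observations n/2 and n/2+1 (1-based) if n is even."""
    ordered = sorted(values)              # duplicates are kept, not collapsed
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # 1-based rank (n+1)/2
    lower = ordered[n // 2 - 1]           # 1-based rank n/2
    upper = ordered[n // 2]               # 1-based rank n/2 + 1
    return (lower + upper) / 2


# The sales column of empsales (Figure 15.39).
sales = [300, 400, 500, 500, 500, 600, 700, 700, 800, 900]
median = median_by_rank(sales)
print(median)  # 550.0
```

Note that the three 500s are all counted. Collapsing duplicates first would leave seven values and give 600 instead of 550.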
Search online or in advanced SQL books, and you’ll find many standard and DBMS-specific ways to calculate the median. Listing 15.51 shows one way: it uses a self-join to create a Cartesian product (e1 and e2), uses GROUP BY to collapse that product to one group per distinct sales value, and then uses HAVING and SUM to keep the middle value or values: those for which the number of times e1.sales = e2.sales equals (or exceeds) the absolute difference between the number of times e1.sales > e2.sales and the number of times e1.sales < e2.sales. The outer AVG then averages what is left. Like all methods that use standard (or near-standard) SQL, it’s cumbersome, it’s hard to understand, and it runs slowly because it’s difficult to pick the middle value of an ordered set when SQL is about unordered sets.

emp_id  sales
E07       300
E08       400
E03       500
E04       500
E06       500
E01       600
E05       700
E10       700
E02       800
E09       900

Figure 15.39 The table empsales, sorted by ascending sales.

Listing 15.51 Calculate the median of sales in standard SQL.

    SELECT AVG(sales) AS median
      FROM
        (SELECT e1.sales
           FROM empsales e1, empsales e2
           GROUP BY e1.sales
           HAVING
             SUM(CASE WHEN e1.sales = e2.sales
                      THEN 1 ELSE 0 END) >=
               ABS(SUM(SIGN(e1.sales - e2.sales)))) t1;

Median vs. Mean

The median is a popular statistic because it’s robust, meaning it’s not affected seriously by extreme high or low values, either legitimate or caused by errors. The arithmetic mean (average), on the other hand, is so sensitive that it can swing wildly with the addition or removal of even a single extreme value. That’s why you see the median applied to skewed (lopsided) distributions such as wealth, house prices, military budgets, and gene expression. The median is also known as the 50th percentile or the second quartile. See also “Finding Extreme Values” later in this chapter.

It’s faster and more efficient to calculate the median by using DBMS-specific functions, if available. Listing 15.52 calculates the median in Microsoft SQL Server. Listing 15.53 calculates it in Oracle. The second query in Listing 15.52 also works in DB2. The DB2 SQL Reference, Vol. 2, shows how to create a median procedure by using a cursor (a scrolling marker that steps through rows; not covered in this book).

✔ Tips
■ If you use an alternate method to compute the median, make sure that it doesn’t eliminate duplicate values during calculations and that it averages the two middle observations for an even n (rather than just lazily choosing one of them as the median).
■ See also the “Statistics in SQL” sidebar in “Calculating an Average with AVG()” in Chapter 6.
■ To run Listing 15.51 in Microsoft Access, change the CASE expression to iif(e1.sales = e2.sales, 1, 0) and change SIGN to SGN.

Listing 15.52 Two ways to calculate the median in Microsoft SQL Server. The second way (which also works in DB2) is much faster than the first.

    -- Works in SQL Server 2000 and later.
    SELECT
      (
        (SELECT MAX(sales) FROM
          (SELECT TOP 50 PERCENT sales
             FROM empsales
             ORDER BY sales ASC) AS t1)
        +
        (SELECT MIN(sales) FROM
          (SELECT TOP 50 PERCENT sales
             FROM empsales
             ORDER BY sales DESC) AS t2)
      )/2 AS median;

    -- Works in SQL Server 2005 and later.
    -- Works in DB2.
    SELECT AVG(sales) AS median
      FROM
        (SELECT
            sales,
            ROW_NUMBER() OVER (ORDER BY sales) AS rownum,
            COUNT(*) OVER () AS cnt
          FROM empsales) t1
      WHERE rownum IN ((cnt+1)/2, (cnt+2)/2);

Listing 15.53 Two ways to calculate the median in Oracle.

    -- Works in Oracle 9i and later.
    SELECT
        percentile_cont(0.5)
          WITHIN GROUP (ORDER BY sales) AS median
      FROM empsales;

    -- Works in Oracle 10g and later.
    SELECT median(sales) AS median
      FROM empsales;

Finding Extreme Values

Listing 15.54 finds the rows with the highest and lowest values (ties included) of the column advance in the table royalties. Figure 15.40 shows the result.

✔ Tips
■ You also can use the queries in “Limiting the Number of Rows Returned” earlier in this chapter to find extremes, though not both highs and lows in the same query.
■ In Microsoft SQL Server, Oracle, and DB2, you can replicate Listing 15.54 by using the window functions MIN OVER and MAX OVER (Listing 15.55).

Listing 15.54 List the books with the highest and lowest advances. See Figure 15.40 for the result.

    SELECT title_id, advance
      FROM royalties
      WHERE advance IN (
        (SELECT MIN(advance) FROM royalties),
        (SELECT MAX(advance) FROM royalties));

Listing 15.55 List the books with the highest and lowest advances, using window functions.

    SELECT title_id, advance
      FROM
        (SELECT title_id, advance,
                MIN(advance) OVER () AS min_adv,
                MAX(advance) OVER () AS max_adv
           FROM royalties) t1
      WHERE advance IN (min_adv, max_adv);

title_id  advance
T07       1000000.00
T08             0.00
T09             0.00

Figure 15.40 Result of Listing 15.54.

Changing Running Statistics Midstream

You can modify values of an in-progress running statistic depending on values in another column. First, review Listing 15.1 in “Calculating Running Statistics” earlier in this chapter. Listing 15.56 calculates the running sum of book sales, ignoring biographies. The scalar subquery computes the running sum, and the inner CASE expression identifies biographies and changes their sales value to NULL, which is ignored by the aggregate function SUM(). (The outer CASE expression merely creates a label column in the result; it’s not part of the running-sum logic.) Figure 15.41 shows the result.

Listing 15.56 Calculate the running sum of book sales, ignoring biographies. See Figure 15.41 for the result.

    SELECT
        t1.title_id,
        CASE WHEN t1.type = 'biography'
          THEN '*IGNORED*'
          ELSE t1.type END AS title_type,
        t1.sales,
        (SELECT
            SUM(CASE WHEN t2.type = 'biography'
                  THEN NULL
                  ELSE t2.sales END)
           FROM titles t2
           WHERE t1.title_id >= t2.title_id) AS RunSum
      FROM titles t1;

title_id  title_type    sales   RunSum
T01       history         566      566
T02       history        9566    10132
T03       computer      25667    35799
T04       psychology    13001    48800
T05       psychology   201440   250240
T06       *IGNORED*     11320   250240
T07       *IGNORED*   1500200   250240
T08       children       4095   254335
T09       children       5000   259335
T10       *IGNORED*      NULL   259335
T11       psychology    94123   353458
T12       *IGNORED*    100001   353458
T13       history       10467   363925

Figure 15.41 Result of Listing 15.56.

✔ Tips
■ In the inner CASE expression, you can set the value being summed to any number, not only NULL. If you were summing bank transactions, for example, you could make the deposits positive and withdrawals negative.
■ To run Listing 15.56 in Microsoft Access, change the two CASE expressions to iif(t1.type = 'biography', '*IGNORED*', t1.type) and iif(t2.type = 'biography', NULL, t2.sales).
■ In Oracle and DB2, you can replicate Listing 15.56 by using the window function SUM OVER (Listing 15.57).

Listing 15.57 Calculate the running sum of book sales, ignoring biographies and using window functions.

    SELECT
        title_id,
        CASE WHEN type = 'biography'
          THEN '*IGNORED*'
          ELSE type END AS title_type,
        sales,
        SUM(CASE WHEN type = 'biography'
              THEN NULL
              ELSE sales END)
          OVER (ORDER BY title_id, sales) AS RunSum
      FROM titles;

Pivoting Results

Pivoting a table swaps its columns and rows, typically to display data in a compact format on a report. Listing 15.58 uses SUM functions and CASE expressions to list the number of books each author wrote (or cowrote).
But instead of displaying the result in the usual way (see Listing 6.9 in Chapter 6, for example), like this:

au_id  num_books
A01    3
A02    4
A03    2
A04    4
A05    1
A06    3
A07    0

Listing 15.58 produces a pivoted result:

A01  A02  A03  A04  A05  A06  A07
3    4    2    4    1    3    0

Listing 15.58 List the number of books each author wrote (or cowrote), pivoting the result.

    SELECT
        SUM(CASE WHEN au_id='A01' THEN 1 ELSE 0 END) AS A01,
        SUM(CASE WHEN au_id='A02' THEN 1 ELSE 0 END) AS A02,
        SUM(CASE WHEN au_id='A03' THEN 1 ELSE 0 END) AS A03,
        SUM(CASE WHEN au_id='A04' THEN 1 ELSE 0 END) AS A04,
        SUM(CASE WHEN au_id='A05' THEN 1 ELSE 0 END) AS A05,
        SUM(CASE WHEN au_id='A06' THEN 1 ELSE 0 END) AS A06,
        SUM(CASE WHEN au_id='A07' THEN 1 ELSE 0 END) AS A07
      FROM title_authors;

Listing 15.59 reverses the pivot. The first subquery in the FROM clause returns the unique authors’ IDs. The second subquery reproduces the result of Listing 15.58.

✔ Tip
■ To run Listings 15.58 and 15.59 in Microsoft Access, change the searched CASE expressions (the ones of the form CASE WHEN condition) to iif functions (for example, change the first CASE expression in Listing 15.58 to iif(au_id = 'A01', 1, 0)) and change the simple CASE expression (CASE au_ids.au_id WHEN ... in Listing 15.59) to a switch() function (see the DBMS Tip in “Evaluating Conditional Values with CASE” in Chapter 5).

Listing 15.59 List the number of books each author wrote (or cowrote), reverse-pivoting the result.

    SELECT
        au_ids.au_id,
        CASE au_ids.au_id
          WHEN 'A01' THEN num_books.A01
          WHEN 'A02' THEN num_books.A02
          WHEN 'A03' THEN num_books.A03
          WHEN 'A04' THEN num_books.A04
          WHEN 'A05' THEN num_books.A05
          WHEN 'A06' THEN num_books.A06
          WHEN 'A07' THEN num_books.A07
        END AS num_books
      FROM
        (SELECT au_id FROM authors) au_ids,
        (SELECT
            SUM(CASE WHEN au_id='A01' THEN 1 ELSE 0 END) AS A01,
            SUM(CASE WHEN au_id='A02' THEN 1 ELSE 0 END) AS A02,
            SUM(CASE WHEN au_id='A03' THEN 1 ELSE 0 END) AS A03,
            SUM(CASE WHEN au_id='A04' THEN 1 ELSE 0 END) AS A04,
            SUM(CASE WHEN au_id='A05' THEN 1 ELSE 0 END) AS A05,
            SUM(CASE WHEN au_id='A06' THEN 1 ELSE 0 END) AS A06,
            SUM(CASE WHEN au_id='A07' THEN 1 ELSE 0 END) AS A07
          FROM title_authors) num_books;

Working with Hierarchies

A hierarchy ranks and organizes people or things within a system. Each element (except the top one) is subordinate to a single other element. Figure 15.42 is a tree diagram of a corporate pecking order, with the chief executive officer (CEO) at the top, above vice presidents (VP), directors (DIR), and wage slaves (WS).

Hierarchical trees come with their own vocabulary. Each element in the tree is a node. Nodes are connected by branches. Two connected nodes form a parent–child relationship (three connected nodes form a grandparent–parent–child relationship, and so on). At the top of the pyramid is the root node (CEO, in this example). Nodes without children are end nodes or leaf nodes (DIR2 and all the WSs). Branch nodes connect to leaf nodes or other branch nodes (VP1, VP2, DIR1, and DIR3; think middle management).

The table hier (Figure 15.43) represents the tree in Figure 15.42. The table hier has the same structure as the table employees in “Creating a Self-Join” in Chapter 7. Review that section for the basics of using self-joins with hierarchies.

✔ Tip
■ Hierarchies are common in life and databases. Most of the books in the “Advanced SQL Books” sidebar at the start of this chapter cover hierarchies in more detail than I do.
For an advanced treatment, read Joe Celko’s Trees and Hierarchies in SQL for Smarties by Joe Celko (Morgan Kaufmann).

Figure 15.42 An organization chart showing a simple company hierarchy (nodes: CEO; VP1, VP2; DIR1, DIR2, DIR3; WS1 through WS5).

emp_id  emp_title  boss_id
E01     CEO        NULL
E02     VP1        E01
E03     VP2        E01
E04     DIR1       E02
E05     DIR2       E02
E06     DIR3       E03
E07     WS1        E04
E08     WS2        E04
E09     WS3        E04
E10     WS4        E06
E11     WS5        E06

Figure 15.43 The result of the query SELECT * FROM hier;. The table hier represents the organization chart in Figure 15.42.

Listing 15.60 uses a self-join to list who works for whom. See Figure 15.44 for the result.

✔ Tip
■ To run Listing 15.60 in Microsoft Access and Microsoft SQL Server, change each || to +. In MySQL, use CONCAT() to concatenate strings. See “Concatenating Strings with ||” in Chapter 5.

Listing 15.60 List the parent–child relationships. See Figure 15.44 for the result.

    SELECT h1.emp_title || ' obeys ' || h2.emp_title
             AS power_structure
      FROM hier h1, hier h2
      WHERE h1.boss_id = h2.emp_id;

power_structure
VP1 obeys CEO
VP2 obeys CEO
DIR1 obeys VP1
DIR2 obeys VP1
DIR3 obeys VP2
WS1 obeys DIR1
WS2 obeys DIR1
WS3 obeys DIR1
WS4 obeys DIR3
WS5 obeys DIR3

Figure 15.44 Result of Listing 15.60.
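The tree vocabulary from this section (root, branch, and leaf nodes) can be checked against the rows of hier from Figure 15.43. The following is a minimal sketch, not one of the book's listings: it assumes Python's built-in sqlite3 module as a stand-in DBMS, and the column alias node_type and the table alias c are names chosen here for illustration. A node is the root if it has no boss, a branch if some other row names it as boss, and a leaf otherwise.

```python
import sqlite3

# In-memory database holding the rows of Figure 15.43.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hier (emp_id TEXT PRIMARY KEY, emp_title TEXT, boss_id TEXT)"
)
conn.executemany(
    "INSERT INTO hier VALUES (?, ?, ?)",
    [
        ("E01", "CEO", None), ("E02", "VP1", "E01"), ("E03", "VP2", "E01"),
        ("E04", "DIR1", "E02"), ("E05", "DIR2", "E02"), ("E06", "DIR3", "E03"),
        ("E07", "WS1", "E04"), ("E08", "WS2", "E04"), ("E09", "WS3", "E04"),
        ("E10", "WS4", "E06"), ("E11", "WS5", "E06"),
    ],
)

# Classify each node: no boss = root; has at least one child = branch;
# otherwise leaf. The EXISTS subquery looks for rows that report to h.
query = """
SELECT h.emp_title,
       CASE
         WHEN h.boss_id IS NULL THEN 'root'
         WHEN EXISTS (SELECT 1 FROM hier c WHERE c.boss_id = h.emp_id)
           THEN 'branch'
         ELSE 'leaf'
       END AS node_type
  FROM hier h
  ORDER BY h.emp_id
"""
node_types = dict(conn.execute(query).fetchall())

for title, kind in node_types.items():
    print(title, kind)
```

Under these assumptions the classification agrees with the text: CEO is the root; VP1, VP2, DIR1, and DIR3 are branch nodes; DIR2 and the five WS rows are leaf nodes.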
