Giving:

PRODUCT      LOCATION   YEAR  AMOUNT
Blueberries  Pensacola  2005    7650
Blueberries  Pensacola  2006    9000
Cotton       Pensacola  2005   13600
Cotton       Pensacola  2006   16000
Lumber       Pensacola  2005    2975
Lumber       Pensacola  2006    3500

Then, we will use the MAX aggregate function and a BETWEEN condition on the RHS:

    SELECT product, location, year, s "Year Max"
    FROM sales1
    WHERE location like 'Pen%'
    MODEL RETURN UPDATED ROWS
      PARTITION BY (product)
      DIMENSION BY (location, year)
      MEASURES (amount s) IGNORE NAV
      (s['Pensacola', ANY] = max(s)['Pensacola', year between 2005 and 2006])
    ORDER BY product, location, year

Giving:

PRODUCT      LOCATION   YEAR  Year Max
Blueberries  Pensacola  2005      9000
Blueberries  Pensacola  2006      9000
Cotton       Pensacola  2005     16000
Cotton       Pensacola  2006     16000
Lumber       Pensacola  2005      3500
Lumber       Pensacola  2006      3500

We are not constrained to using wildcards on the RHS calculation of aggregates. In this case we controlled which rows would be included in the aggregate using the BETWEEN predicate.

Chapter 6 | The MODEL or SPREADSHEET Predicate in Oracle's SQL

Revisiting CV with Value Offsets: Using Multiple MEASURES Values

We have seen how to use the CV function inside an RHS expression. The CV function copies the value from the LHS and uses it in a calculation. We can also use logical offsets from the current value. For example, "cv()-1" would indicate the current value minus one. Suppose we wanted to calculate the increase in sales for each year, cv(). We will need the sales from the previous year to make the calculation, cv()-1.
We will restrict the data for the example; look first at sales in Pensacola:

    SELECT product, location, year, amount
    FROM sales1
    WHERE location like 'Pen%'
    ORDER BY product, location, year

Giving:

PRODUCT      LOCATION   YEAR  AMOUNT
Blueberries  Pensacola  2005    7650
Blueberries  Pensacola  2006    9000
Cotton       Pensacola  2005   13600
Cotton       Pensacola  2006   16000
Lumber       Pensacola  2005    2975
Lumber       Pensacola  2006    3500

We will PARTITION BY product in this example and we will DIMENSION BY location and year. We will use two new MEASURES, growth and pct (percent growth). We will calculate with RULES and display the two new values. In the MEASURES clause, we will need the amount value, although it does not appear in the result set. As before, we will alias "amount" as s to simplify the RULES statements. Also, we need to add the new result set columns growth and pct, but in the MEASURES clause, they are preceded by a zero so they can be aliased. We will use the RETURN UPDATED ROWS option to limit the output. Here is the query:

    SELECT product, location, year, growth, pct
    FROM sales1
    WHERE location like 'Pen%'
    MODEL RETURN UPDATED ROWS
      PARTITION BY (product)
      DIMENSION BY (location, year)
      MEASURES (amount s, 0 growth, 0 pct) IGNORE NAV
      (growth['Pensacola', year > 2005] =
         (s[cv(),cv()] - s[cv(),cv()-1]),
       pct['Pensacola', year > 2005] =
         (s[cv(),cv()] - s[cv(),cv()-1])/s[cv(),cv()-1])
    ORDER BY location, product

Giving:

PRODUCT      LOCATION   YEAR  GROWTH  PCT
Blueberries  Pensacola  2006    1350  .176470588
Cotton       Pensacola  2006    2400  .176470588
Lumber       Pensacola  2006     525  .176470588

Let us consider several things in this example. First, we are using "amount" in the calculation although we do not report amount directly. Note the syntax of this RULE:

    growth['Pensacola', year > 2005] = (s[cv(),cv()] - s[cv(),cv()-1])

The RULE says to compute a value for growth and hence growth appears on the LHS preceding the brackets.
The RULE uses location and year to define the rows in the table for which growth will be computed. Note that the calculation is based on amounts, aliased by s, which appears as the computing value on the RHS before the brackets. Remember that in the original explanation for this RULE:

    (new_amt['Pensacola', ANY] = new_amt['Pensacola', currentv(amount)]*2)

We said: The new_amt on the LHS before the brackets ['Pen...] means that we will compute a value for new_amt. The new_amt on the RHS before the brackets means we will use new_amt values (amount values) to compute the new values for new_amt on the LHS.

In this example, we have created a new variable on the LHS (growth) and used the old variable (s) on the RHS. Syntactically and logically, we must mention both the new variable and the old one in the MEASURES clause. We are not bound to report in the result set the values we use in the MEASURES clause. On the other hand, to use the values in the RULES we have to have defined them in MEASURES. To make the new variable (growth, for example) numeric, we precede the "declaration" of growth with a zero in the MEASURES clause.

Another quirk of this RULE:

    growth['Pensacola', year > 2005] = (s[cv(),cv()] - s[cv(),cv()-1])

is that we have used logical offsets in the calculation. Rather than ask for amounts (s) for calculation of a given growth for a given year, we offset the current value by -1 in the difference expression. What we are saying here is that for a particular year, we will use the values for that year and the previous year. So, for 2006 we compute the growth for Pensacola as the "cv(),cv()" minus the "cv(),cv()-1", which would be (using amount rather than its alias, s):

    amount('Pensacola',2006) - amount('Pensacola',2005)

The other calculation, "pct," is a bit more complex, but follows the same syntactical logic as the "growth" calculation.
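To make the arithmetic behind the two rules concrete, here is a minimal Python sketch that imitates them outside the database. It is only an illustration, not Oracle's evaluation engine: the dictionary layout and the function name growth_and_pct are assumptions made for this example, while the figures are the Pensacola rows shown in this section.

```python
# Pensacola rows from the example, grouped by product (the PARTITION BY column)
# and keyed by (location, year) (the DIMENSION BY columns).
sales = {
    "Blueberries": {("Pensacola", 2005): 7650, ("Pensacola", 2006): 9000},
    "Cotton": {("Pensacola", 2005): 13600, ("Pensacola", 2006): 16000},
    "Lumber": {("Pensacola", 2005): 2975, ("Pensacola", 2006): 3500},
}


def growth_and_pct(cells):
    """Apply both rules to one partition (hypothetical helper).

    Returns {(location, year): (growth, pct)} for the updated cells only,
    which is roughly what RETURN UPDATED ROWS reports.
    """
    updated = {}
    for (location, year), amount in cells.items():
        # The LHS addresses only ['Pensacola', year > 2005].
        if location != "Pensacola" or year <= 2005:
            continue
        # s[cv(), cv()-1]: same location, previous year.
        # A missing cell is treated as 0, imitating IGNORE NAV.
        previous = cells.get((location, year - 1), 0)
        growth = amount - previous
        # Like the SQL rule, this divides by the previous amount and
        # would fail if that amount were 0.
        updated[(location, year)] = (growth, growth / previous)
    return updated


results = {product: growth_and_pct(cells) for product, cells in sales.items()}
for product, rows in results.items():
    for (location, year), (growth, pct) in rows.items():
        print(product, location, year, growth, round(pct, 9))
```

Each product is processed independently, which is the role PARTITION BY plays in the query; the lookup of (location, year - 1) plays the role of the cv()-1 offset.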
We used the alias for amount as a shorthand notation, but the query works just as well and perhaps reads more clearly if we do not use the alias for amount:

    SELECT product, location, year, growth, pct
    FROM sales1
    WHERE location like 'Pen%'
    MODEL RETURN UPDATED ROWS
      PARTITION BY (product)
      DIMENSION BY (location, year)
      MEASURES (amount, 0 growth, 0 pct) IGNORE NAV
      (growth['Pensacola', year > 2005] =
         (amount[cv(),cv()] - amount[cv(),cv()-1]),
       pct['Pensacola', year > 2005] =
         (amount[cv(),cv()] - amount[cv(),cv()-1])/amount[cv(),cv()-1])
    ORDER BY location, product

Giving:

PRODUCT      LOCATION   YEAR  GROWTH  PCT
Blueberries  Pensacola  2006    1350  .176470588
Cotton       Pensacola  2006    2400  .176470588
Lumber       Pensacola  2006     525  .176470588

The use of the alias here is a trade-off between understandability and brevity.

As an aside, this result could have been had with a traditional (albeit arguably more complex) self-join:

    SELECT a.product, a.location, b.year,
           b.amount amt2006, a.amount amt2005,
           b.amount - a.amount growth,
           (b.amount - a.amount)/a.amount pct
    FROM sales1 a, sales1 b
    WHERE a.year = b.year - 1
      AND a.location LIKE 'Pen%'
      AND b.location LIKE 'Pen%'
      AND a.product = b.product
    ORDER BY product

Giving:

PRODUCT      LOCATION   YEAR  AMT2006  AMT2005  GROWTH  PCT
Blueberries  Pensacola  2006     9000     7650    1350  .176470588
Cotton       Pensacola  2006    16000    13600    2400  .176470588
Lumber       Pensacola  2006     3500     2975     525  .176470588

Having developed the example for one location, we can expand the MODEL statement to get the growth volume and percents for all locations using the ANY wildcard and commenting out the WHERE clause of the core query:

    SELECT product, location, year, growth, pct
    FROM sales1
    -- WHERE location like 'Pen%'
    MODEL RETURN UPDATED ROWS
      PARTITION BY (product)
      DIMENSION BY (location, year)
      MEASURES (amount s, 0 growth, 0 pct) IGNORE NAV
      (growth[ANY, year > 2005] =
         (s[cv(),cv()] - s[cv(),cv()-1]),
       pct[ANY, year > 2005] =
         (s[cv(),cv()] - s[cv(),cv()-1])/s[cv(),cv()-1])
    ORDER BY location, product

Giving:

PRODUCT      LOCATION   YEAR  GROWTH  PCT
Cotton       Mobile     2006    2400  .111111111
Lumber       Mobile     2006     280  .111111111
Plastic      Mobile     2006    3200  .111111111
Blueberries  Pensacola  2006    1350  .176470588
Cotton       Pensacola  2006    2400  .176470588
Lumber       Pensacola  2006     525  .176470588

Perhaps there is a lesson in query development here: it is easier to see results if the original data is filtered before we attempt to compute all values.

Ordering of the RHS

When a range of cells is in the result set, ordering may be necessary when computing the values of the cells. Consider this derivative table created from previous data and enhanced.

Ordered by year ascending:

LOCATION  PRODUCT  AMOUNT  YEAR
Mobile    Cotton    19872  2004
Mobile    Cotton    21600  2005
Mobile    Cotton    24000  2006

Ordered by year descending:

LOCATION  PRODUCT  AMOUNT  YEAR
Mobile    Cotton    24000  2006
Mobile    Cotton    21600  2005
Mobile    Cotton    19872  2004

The MODEL statement creates a virtual table from which it calculates results. If the MODEL statement updates the result that appears in the result set, the result calculation may depend on the order in which the data is retrieved. As we know, one can never depend on the order in which data is actually stored in a relational database. Consider the following examples, in which the RULES compute the sum of the amounts for the previous two years and either year may be evaluated first, depending on the ordering:

    SELECT product, t, s
    FROM sales2
    MODEL RETURN UPDATED ROWS
      -- PARTITION BY (location)
      DIMENSION BY (product, year t)
      MEASURES (amount s)
      (s['Cotton', t>=2005] ORDER BY t asc =
         sum(s)[cv(),t between cv(t)-2 and cv(t)-1])
    ORDER BY product

Giving:

PRODUCT  T     S
Cotton   2006  39744
Cotton   2005  19872

Note that the PARTITION BY statement is commented out, as the table contains only one location and hence partitioning is not necessary.
Next, we compute a new value for s based on the sum of other values of s, where on the RHS we sum over years cv()-1 and cv()-2. We have also added an ordering clause to the LHS to prescribe how we want to compute our new values: ascending by year in this case.

For ('Cotton',2006), you expect the new value of s to be the sum of the values for 2005 and 2004 (19872 + 21600) = 41472. You expect that the sum for 2005 would be just the 2004 value because there is no 2003. But instead, we get an odd value for 2006. What is going on here?

The problem here is that in the calculation, we need to order the "input" to the RULES. In the above case, we have ordered the year to be ascending on the LHS, so 2005 was calculated first. 2005 was correct as there was no 2003 and so the new value for 2005 was reported as the value for 2004:

    s['Cotton', t>=2005] = sum(s)[cv(),t between cv(t)-2 and cv(t)-1]

Becomes:

    s['Cotton', 2005] = sum(s)[cv(),t between 2003 and 2004]
    s['Cotton', 2005] = s['Cotton', 2004] + s['Cotton', 2003]
    s['Cotton', 2005] = 19872 + 0 = 19872

When calculating 2006, the statement becomes:

    s['Cotton', 2006] = sum(s)[cv(),t between 2004 and 2005]
    s['Cotton', 2006] = s['Cotton', 2005] + s['Cotton', 2004]

But 2005 has already been recalculated because of our ordering. So, the calculation for 2006 becomes:

    s['Cotton', 2006] = 19872 + 19872 = 39744

Now look what happens if the LHS years are in descending order:

    SELECT product, t, s
    FROM sales2
    MODEL RETURN UPDATED ROWS
      -- PARTITION BY (location)
      DIMENSION BY (product, year t)
      MEASURES (amount s)
      (s['Cotton', t>=2005] ORDER BY t desc =
         sum(s)[cv(),t between cv(t)-2 and cv(t)-1])
    ORDER BY product

Gives:

PRODUCT  T     S
Cotton   2006  41472
Cotton   2005  19872

We get the correct answers because 2006 is recalculated based on the original values for 2005 and 2004. Then, 2005 is recalculated.
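The effect of the evaluation order can be reproduced with a few lines of Python. This is a sketch that assumes a simplified picture of sequential rule evaluation (each assignment is immediately visible to the assignments that follow); the function name run_rule and the dictionary layout are invented for the illustration, and the amounts are the Mobile cotton figures from the table above.

```python
def run_rule(cells, descending):
    """Imitate s['Cotton', t>=2005] ORDER BY t asc|desc =
    sum(s)[cv(), t between cv(t)-2 and cv(t)-1] for one product.

    `cells` maps year -> amount. Returns only the updated cells.
    """
    s = dict(cells)  # working copy: later assignments see earlier ones
    targets = sorted((t for t in s if t >= 2005), reverse=descending)
    for t in targets:
        # Sum over the two previous years; a year with no row adds nothing.
        s[t] = sum(s.get(prev, 0) for prev in (t - 2, t - 1))
    return {t: s[t] for t in targets}


cotton = {2004: 19872, 2005: 21600, 2006: 24000}

ascending = run_rule(cotton, descending=False)
descending = run_rule(cotton, descending=True)

print("ascending: ", ascending)    # 2005 is overwritten before 2006 reads it
print("descending:", descending)   # 2006 reads the original 2005 value
```

In ascending order the 2006 cell picks up the already overwritten 2005 value (19872 + 19872 = 39744); in descending order it reads the original one (19872 + 21600 = 41472), matching the two result sets above.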
Because of the ordering problem, in some statements where ordering is necessary, we may get an error if no ordering is specified.

    SELECT product, t, s
    FROM sales2
    MODEL RETURN UPDATED ROWS
      -- PARTITION BY (location)
      DIMENSION BY (product, year t)
      MEASURES (amount s)
      (s['Cotton', t>=2005] = -- ORDER BY t desc =
         sum(s)[cv(),t between cv(t)-2 and cv(t)-1])
    ORDER BY product
    SQL> /

Gives:

    FROM sales2
         *
    ERROR at line 2:
    ORA-32637: Self cyclic rule in sequential order MODEL

When no ORDER BY clause is specified, you might think that the ordering specified by the DIMENSION should take precedence; however, it is far better to [...]...

Chapter 7 | Regular Expressions: String Searching and Oracle 10g

... target string); REs are incorporated into new functions in Oracle 10g that have these names: REGEXP_x, where x = INSTR, LIKE, REPLACE, SUBSTR (e.g., REGEXP_INSTR). The new functions may be used in both SQL and PL/SQL.

The four new and improved functions operate on character strings and return the same types as the older counterparts:

• REGEXP_INSTR...

REGEXP_INSTR

We will begin our exploration of REs using the REGEXP_INSTR function. As with INSTR, the function returns a number for the position of the matched pattern. Unlike INSTR, REGEXP_INSTR cannot work from the end of the string backward. The arguments for REGEXP_INSTR...

... are used in computer languages, e.g., Java, XML, UNIX scripting, and particularly Perl. For a programmer who uses REs in a programming language, their use within Oracle will be very similar. The conjunction of string searching, REs, Oracle 10g, and POSIX is that in rewriting the "normal" string functions like INSTR, one may use standardized POSIX symbols in REGEXP_INSTR (and other REGEXP_x functions) ...
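For readers who know regular expressions from a programming language, the position-returning behavior described for REGEXP_INSTR can be approximated in Python. This is only a sketch: the helper name regexp_instr is hypothetical, it covers just the source, pattern, and starting-position arguments, and Python's re module uses its own pattern dialect rather than POSIX, so patterns are not guaranteed to behave identically in Oracle.

```python
import re


def regexp_instr(source, pattern, position=1):
    """Hypothetical stand-in for the basic form of REGEXP_INSTR.

    Returns the 1-based position of the first match found at or after
    `position`, or 0 when there is no match. Searching backward from
    the end of the string is not supported.
    """
    # Python indexes from 0, so shift the 1-based starting position.
    match = re.compile(pattern).search(source, position - 1)
    return match.start() + 1 if match else 0


default_start = regexp_instr("Mary has a cold", "a")
from_third = regexp_instr("Mary has a cold", "a", 3)
no_match = regexp_instr("Mary has a cold", "z")

print(default_start, from_third, no_match)
```

With the default starting position the first "a" is found in position 2; starting the search in position 3 skips it and finds the "a" in position 7; a pattern that never matches yields 0.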
... Expressions: String Searching and Oracle 10g

For many years, Oracle has supported string functions well ("strings" in Oracle are also known as character or text literals). This chapter presumes familiarity with the "ordinary" string functions, particularly INSTR, LIKE, REPLACE, and SUBSTR. A "regular expression" (RE) is a character string (a pattern) that is used to match another string (a search string or target...

... target string (source string) "Mary has a cold." Position is the place in S to begin the search for P. The default is 1. Example:

    SELECT REGEXP_INSTR('Mary has a cold','a',3) position
    FROM dual

Gives:

    POSITION
    ----------
             7

Since we started in the third position of the search string, the first "a" after that was in the seventh position of the string. As mentioned above, Position in REGEXP_INSTR cannot...

... 10 - 0.395 = 9.605

    New value = 9.605 + (21 - (9.605*9.605)) * 0.005
              = 9.605 + (-71.25) * 0.005
              = 9.605 - 0.356
              = 9.248

etc. The method relies on the fact that the correction factor approaches the original value and, as it gets closer, the correction gets smaller. In this technique we have a choice of the correction constant. The size of the correction...
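The successive-correction arithmetic in the worked numbers above can be checked with a short Python sketch. The original value 21, the starting guess 10, and the constant 0.005 are read off those worked numbers and are assumptions about the surrounding example; the function name refine is invented for this illustration.

```python
def refine(original, guess, constant, steps):
    """Repeat: new value = old value + (original - old*old) * constant.

    Returns the list of successive values, one per iteration.
    """
    values = []
    for _ in range(steps):
        guess = guess + (original - guess * guess) * constant
        values.append(guess)
    return values


# First two corrections, starting from a guess of 10 for the root of 21.
first_two = refine(original=21, guess=10, constant=0.005, steps=2)
print([round(v, 3) for v in first_two])

# Many small corrections later, the value settles near the square root.
settled = refine(original=21, guess=10, constant=0.005, steps=2000)[-1]
print(settled, 21 ** 0.5)
```

Each pass shrinks the gap between the squared guess and the original value, so the correction itself shrinks as the guess approaches the root. A smaller constant means smaller, safer steps at the cost of more iterations.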
... the original functions and, like the original functions, there are other arguments that may enhance the use of the function. We will define each of the functions in turn, but we will primarily illustrate the function with minimal arguments. The regular expressions (REs) are POSIX compliant. POSIX stands for the Portable Operating System Interface standardization effort, which is overseen by various international...

    ... 0.0000000000001)
    (x['root'] = x['root'] + ((x['original'] - (x['root']*x['root']))*0.1),
     x['Number of iterations'] = ITERATION_NUMBER + 1)
    SQL> /

Gives:

    (x['root'] = x['root'] + ((x['original'] - (x['root']*x['root']))*0.1)
     *
    ERROR at line 9:
    ORA-01426: numeric overflow

... statement a bit:

    (x['root'] = x['root'] + ((x['original'] - (x['root']*x['root']))*0.005)

In this statement, we are saying that in each iteration, we will compute a new value for x['root']:

    x['root'] =

by taking the old value and adding to it 0.005 times the difference between the original value and the old value squared:

    x['root'] + ((x['original'] - (x['root']*x['root']))*0.005)

Unfortunately ...