Joe Celko’s SQL for Smarties: Advanced SQL Programming

Contents

25 Arrays in SQL
  25.1 Arrays via Named Columns
  25.2 Arrays via Subscript Columns
  25.3 Matrix Operations in SQL
    25.3.1 Matrix Equality
    25.3.2 Matrix Addition
    25.3.3 Matrix Multiplication
    25.3.4 Other Matrix Operations
  25.4 Flattening a Table into an Array
  25.5 Comparing Arrays in Table Format
26 Set Operations
  26.1 UNION and UNION ALL
    26.1.1 Order of Execution
    26.1.2 Mixed UNION and UNION ALL Operators
    26.1.3 UNION of Columns from the Same Table
  26.2 INTERSECT and EXCEPT
    26.2.1 INTERSECT and EXCEPT without NULLs and Duplicates
    26.2.2 INTERSECT and EXCEPT with NULLs and Duplicates
  26.3 A Note on ALL and SELECT DISTINCT
  26.4 Equality and Proper Subsets
27 Subsets
  27.1 Every nth Item in a Table
  27.2 Picking Random Rows from a Table
  27.3 The CONTAINS Operators
    27.3.1 Proper Subset Operators
    27.3.2 Table Equality
  27.4 Picking a Representative Subset
28 Trees and Hierarchies in SQL
  28.1 Adjacency List Model
    28.1.1 Complex Constraints
    28.1.2 Procedural Traversal for Queries
    28.1.3 Altering the Table
  28.2 The Path Enumeration Model
    28.2.1 Finding Subtrees and Nodes
    28.2.2 Finding Levels and Subordinates
    28.2.3 Deleting Nodes and Subtrees
    28.2.4 Integrity Constraints
  28.3 Nested Set Model of Hierarchies
    28.3.1 The Counting Property
    28.3.2 The Containment Property
    28.3.3 Subordinates
    28.3.4 Hierarchical Aggregations
    28.3.5 Deleting Nodes and Subtrees
    28.3.6 Converting Adjacency List to Nested Set Model
  28.4 Other Models for Trees and Hierarchies
29 Temporal Queries
  29.1 Temporal Math
  29.2 Personal Calendars
  29.3 Time Series
    29.3.1 Gaps in a Time Series
    29.3.2 Continuous Time Periods
    29.3.3 Missing Times in Contiguous Events
    29.3.4 Locating Dates
    29.3.5 Temporal Starting and Ending Points
    29.3.6 Average Wait Times
  29.4 Julian Dates
  29.5 Date and Time Extraction Functions
  29.6 Other Temporal Functions
  29.7 Weeks
    29.7.1 Sorting by Weekday Names
  29.8 Modeling Time in Tables
    29.8.1 Using Duration Pairs
  29.9 Calendar Auxiliary Table
  29.10 Problems with the Year 2000
    29.10.1 The Zeros
    29.10.2 Leap Year
    29.10.3 The Millennium
    29.10.4 Weird Dates in Legacy Data
    29.10.5 The Aftermath
30 Graphs in SQL
  30.1 Basic Graph Characteristics
    30.1.1 All Nodes in the Graph
    30.1.2 Path Endpoints
    30.1.3 Reachable Nodes
    30.1.4 Edges
    30.1.5 Indegree and Outdegree
    30.1.6 Source, Sink, Isolated, and Internal Nodes
  30.2 Paths in a Graph
    30.2.1 Length of Paths
    30.2.2 Shortest Path
    30.2.3 Paths by Iteration
    30.2.4 Listing the Paths
  30.3 Acyclic Graphs as Nested Sets
  30.4 Paths with CTE
    30.4.1 Nonacyclic Graphs
  30.5 Adjacency Matrix Model
  30.6 Points inside Polygons
31 OLAP in SQL
  31.1 Star Schema
  31.2 OLAP Functionality
    31.2.1 RANK and DENSE_RANK
    31.2.2 Row Numbering
    31.2.3 GROUPING Operators
    31.2.4 The Window Clause
    31.2.5 OLAP Examples of SQL
    31.2.6 Enterprise-Wide Dimensional Layer
  31.3 A Bit of History
32 Transactions and Concurrency Control
  32.1 Sessions
  32.2 Transactions and ACID
    32.2.1 Atomicity
    32.2.2 Consistency
    32.2.3 Isolation
    32.2.4 Durability
  32.3 Concurrency Control
    32.3.1 The Five Phenomena
    32.3.2 The Isolation Levels
    32.3.3 CURSOR STABILITY Isolation Level
  32.4 Pessimistic Concurrency Control
  32.5 SNAPSHOT Isolation: Optimistic Concurrency
  32.6 Logical Concurrency Control
  32.7 Deadlock and Livelocks
33 Optimizing SQL
  33.1 Access Methods
    33.1.1 Sequential Access
    33.1.2 Indexed Access
    33.1.3 Hashed Indexes
    33.1.4 Bit Vector Indexes
  33.2 Expressions and Unnested Queries
    33.2.1 Use Simple Expressions
    33.2.2 String Expressions
  33.3 Give Extra Join Information in Queries
  33.4 Index Tables Carefully
  33.5 Watch the IN Predicate
  33.6 Avoid UNIONs
  33.7 Prefer Joins over Nested Queries
  33.8 Avoid Expressions on Indexed Columns
  33.9 Avoid Sorting
  33.10 Avoid CROSS JOINs
  33.11 Learn to Use Indexes Carefully
  33.12 Order Indexes Carefully
  33.13 Know Your Optimizer
  33.14 Recompile Static SQL after Schema Changes
  33.15 Temporary Tables Are Sometimes Handy
  33.16 Update Statistics
References
  General References
  Logic
  Mathematical Techniques
  Random Numbers
  Scales and Measurements
  Missing Values
  Regular Expressions
  Graph Theory
  Introductory SQL Books
  Optimizing Queries
  Temporal Data and the Year 2000 Problem
  SQL Programming Techniques
  Classics
  Forum
  Updatable Views
  Theory, Normalization, and Advanced Database Topics
  Books on SQL-92 and SQL-99
  Standards and Related Groups
  Web Sites Related to SQL
  Statistics
  Temporal Databases
  New Citations
Index
About the Author

Introduction to the Third Edition

This book, like the first and second editions before it, is for the working SQL programmer who wants to pick up some advanced programming tips and techniques. It assumes that the reader is an SQL programmer with a year or more of experience. It is not an introductory book, so let’s not have any gripes in the Amazon.com reviews about that, as we did with the prior editions.

The first edition was published ten years ago and became a minor classic among working SQL programmers. I have seen copies of this book on the desks of real programmers in real programming shops almost everywhere I have been. The true compliment is the Post-it® notes sticking out of the top. People really use it often enough to put stickies in it! Wow!

1.1 What Changed in Ten Years

Hierarchical and network databases still run vital legacy systems in major corporations.
SQL people do not like to admit that Fortune 500 companies have more data in IMS files than in SQL tables. But SQL people can live with that, because we have all the new applications and all the important smaller databases.

Object and object-relational databases found niche markets but never caught on with the mainstream. But OO programming is firmly in place, so object-oriented people can live with that.

XML has become the popular data tool du jour as of this writing in 2005. Technically, XML is a syntax for describing and moving data from one platform to another, but its support tools allow searching and reformatting. It seems to be lasting longer and finding more users than DIF, EDI, and other earlier attempts at a “Data Esperanto” did. An SQL/XML subcommittee in INCITS H2 (the current name of the original ANSI X3H2 Database Standards Committee) is making sure that SQL and XML can work together.

Data warehousing is no longer an exotic luxury reserved for major corporations. Thanks to the declining prices of hardware and software, medium-sized companies now use the technology. Writing OLAP queries is different from writing OLTP queries, and OLAP probably needs its own Smarties book now.

Small “pseudo-SQL” products have appeared in the open source arena. Products such as MySQL are very different in syntax and semantics from Standard SQL, often being little more than a file system interface with borrowed reserved words. However, their small footprint and low cost have made them popular with Web developers.

At the same time, full-scale, serious SQL databases have become open source. Firebird (http://firebird.sourceforge.net/) has most ANSI SQL-92 features, and it runs on Linux, Microsoft Windows, and a variety of UNIX platforms. Firebird offers optimistic concurrency and language support for stored procedures and triggers.
It has been used in production systems (under a variety of names) since 1981 and became open source in 2000. Firebird is the open source version of Borland Software Corporation’s (née Inprise Corporation) InterBase product.

CA-Ingres became open source in 2004, and Computer Associates offered one million dollars in prize money to anyone who would develop software that would convert existing database code to Ingres. Ingres is one of the best database products ever written, but it was a commercial failure due to poor marketing.

Postgres is the open source descendant of the original Ingres project at UC Berkeley. It has commercial support from Pervasive Software, which also has a proprietary SQL product that evolved from their Btrieve products.

The SQL standard has changed over time, but not always for the best. Parts of the standard have become more relational and set-oriented, while other parts have added things that are clearly procedural, deal with nonrelational data, and are based on file system models. To quote David McGoveran, “A committee never met a feature it did not like.” In this case, he seems to be quite right.

But strangely enough, even with all the turmoil, the ANSI/ISO Standard SQL-92 is still the common subset that will port across various SQL products to do useful work. In fact, the U.S. Government described the SQL-99 standard as “a standard in progress” and required SQL-92 conformance for federal contracts.

The reason for the loyalty to SQL-92 is simple. The FIPS-127 conformance test suite was in place during the development of SQL-92, so all the vendors could move in the same direction. Unfortunately, the Clinton administration canceled the program, and conformity began to drift. Michael M. Gorman, the President of Whitemarsh Information Systems Corporation and secretary of INCITS H2 for more than 20 years, has a great essay on this and other political aspects of SQL’s history at www.wiscorp.com; it is worth reading.

1.2 What Is New in This Edition

Ten years ago, in the first edition, I tried to stick to the SQL-89 standard and to use only the SQL-92 features that were already used in most implementations. Five years ago, in the second edition, I wrote that it would be years before any vendor had a full implementation of SQL-92, but all products were moving toward that goal. This is still true today, as I write the third edition, but now we are much closer to universal implementations of intermediate and full SQL-92. I now feel brave enough to use some of the SQL-99 features found in current products, while doing most of the work in SQL-92.

In the second edition, I dropped some of the theory from the book and moved it to Joe Celko’s Data and Databases: Concepts in Practice (ISBN 1-55860-432-4). I find no reason to add it back into this edition.

I have moved and greatly expanded techniques for trees and hierarchies into a separate book (Joe Celko’s Trees and Hierarchies in SQL for Smarties, ISBN 1-55860-920-2) because there was enough material to justify it. I have included a short mention of some techniques here, but not at the detailed level offered in the other book.

I put programming tips for newbies into a separate book (Joe Celko’s SQL Programming Style, ISBN 0-12-088797-5). This book is an advanced programmer’s book, and I assume that the reader is writing real SQL, not some dialect or his or her native programming language in a thin disguise. I also assume that he or she can translate Standard SQL into their local dialect without much effort.

I have tried to provide comments with the solutions to explain why they work.
I hope this will help the reader see underlying principles that can be used in other situations.

A lot of people have contributed material, either directly or via newsgroups, and I cannot thank all of them. But I made a real effort to put names in the text next to the code. In case I missed anyone, I got material or ideas from Aaron Bertrand, Alejandro Mesa, Anith Sen, Craig Mullins, Daniel A. Morgan, David Portas, David Cressey, Dawn M. Wolthuis, Don Burleson, Erland Sommarskog, Itzik Ben-Gan, John Gilson, Knut Stolze, Louis Davidson, Michael L. Gonzales of HandsOn-BI LLC, Dan Guzman, Hugo Kornelis, Richard Romley, Serge Rielau, Steve Kass, Tom Moreau, Troels Arvin, and probably a dozen others I am forgetting.

1.3 Corrections and Additions

Please send any corrections, additions, suggestions, improvements, or alternative solutions to me or to the publisher:

Morgan Kaufmann Publishers
500 Sansome Street, Suite 400
San Francisco, CA 94111-3211

CHAPTER 1
Database Design

This chapter discusses the DDL (Data Definition Language), which is used to create a database schema. It is related to the next chapter on the theory of database normalization. Most bad queries start with a bad schema. To get data out of a bad schema, you have to write convoluted code, and you are never sure if it did what it was meant to do.

One of the major advantages of databases, relational and otherwise, was that the data could be shared among programs so that an enterprise could use one trusted source for information. Once the data was separated from the programs, we could build tools to maintain, back up, and validate the data in one place, without worrying about hundreds or even thousands of application programs possibly working against each other.

SQL has spawned a whole branch of data modeling tools devoted to designing its schemas and tables.
Most of these tools use a graphic or text description of the rules and the constraints on the data to produce a schema declaration statement that can be used directly in a particular SQL product.

It is often assumed that a CASE tool will automatically prevent you from creating a bad design. This is simply not true. Bad schema design leads to weird queries that are trying to work around the flaws. These flaws can include picking the wrong data …