Refactoring SQL Applications

Stéphane Faroult with Pascal L’Hermite

Copyright © 2008 Stéphane Faroult and Pascal L’Hermite. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.
Editor: Mary Treseler
Production Editor: Rachel Monaghan
Copyeditor: Audrey Doyle
Indexer: Lucie Haskins
Cover Designer: Mark Paglietti
Interior Designer: Marcia Friedman
Illustrator: Robert Romano

Printing History:
August 2008: First Edition.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Refactoring SQL Applications and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. Java™ is a trademark of Sun Microsystems, Inc.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book uses RepKover™, a durable and flexible lay-flat binding.

ISBN: 978-0-596-51497-6

CONTENTS

PREFACE
1 ASSESSMENT
    A Simple Example
    Assessing Possible Gains
2 SANITY CHECKS
    Statistics and Data Skewness
    Indexing Review
    Parsing and Bind Variables
    Bulk Operations
    Transaction Management
3 USER FUNCTIONS AND VIEWS
    User-Defined Functions
    Views
4 TESTING FRAMEWORK
    Generating Test Data
    Comparing Alternative Versions
5 STATEMENT REFACTORING
    Execution Plans and Optimizer Directives
    Analyzing a Slow Query
    Refactoring the Query Core
    Rebuilding the Initial Query
6 TASK REFACTORING
    The SQL Mindset
    Restructuring the Code
7 REFACTORING FLOWS AND DATABASES
    Reorganizing Processing
    Shaking Foundations
8 HOW IT WORKS: REFACTORING IN PRACTICE
    Can You Look at the Database?
    Queries of Death
    All These Fast Queries
    No Obvious Very Wrong Query
    Time to Conclude
A SCRIPTS AND SAMPLE PROGRAMS
B TOOLS
INDEX

Preface

Ma, sendo l’intento mio scrivere cosa utile a chi la intende, mi è parso più conveniente andare drieto alla verità effettuale della cosa, che alla immaginazione di essa.

But, it being my intention to write a thing which shall be useful to him who apprehends it, it appears to me more appropriate to follow up the real truth of a matter than the imagination of it.

Niccolò Machiavelli, Il Principe, XV

There is a story behind this book. I had hardly finished The Art of SQL, which wasn’t on sale yet, when my then editor, Jonathan Gennick, raised the idea of writing a book about SQL refactoring. SQL, I knew. But I had never heard of refactoring. I Googled the word.

In a famous play by Molière, a wealthy but little-educated man who takes lessons in his mature years marvels when he discovers that he has been speaking “prose” all his life. Like Monsieur Jourdain, I discovered that I had been refactoring SQL code for years without even knowing it: performance analysis for my customers led quite naturally to improving code through small, incremental changes that didn’t alter program behavior.

It is one thing to try to design a database as well as you can, and to lay out an architecture and programs that access this database efficiently. It is another matter to try to get the best performance from systems that were not necessarily well designed from the start, or that have grown out of control over the years, but that you have to live with. And there was something appealing in the idea of presenting SQL from a point of view that is so often mine in my professional life.

The last thing you want to do when you are done with a book is to start writing another one. But the idea had caught my fancy.
I discussed it with a number of friends, one of whom is one of the most redoubtable SQL specialists I know. This friend burst into righteous indignation against buzzwords. For once, I begged to differ with him. It is true that the idea, first popularized by Martin Fowler,* of improving code by small, almost insignificant, localized changes may look like a fad, the stuff that fills reports by corporate consultants who have just graduated from university. But for me, the true significance of refactoring lies in the fact that code that has made it to production is no longer considered sacred, and in the recognition that a lot of mediocre systems could, with a little effort, do much better. Refactoring is also the acknowledgment that the fault for unsatisfactory performance is in ourselves, not in our stars, and this is quite a revelation in the corporate world.

I have seen too many sites where IT managers had an almost tragic attitude toward performance: people who felt crushed by fate and were putting their last hope into “tuning.” If the efforts of database and system administrators failed, the only remaining option in their view was to sign and send the purchase order for more powerful machines. I have read too many audit reports by self-styled database experts who, after reformatting the output of system utilities, concluded that a few parameters should be bumped up and that more memory should be added. To be fair, some of these reports mentioned that a couple of terrible queries “should be tuned,” without being much more explicit than pasting execution plans as appendixes.

I haven’t touched database parameters for years (the technical teams of my customers are usually competent). But I have improved many programs, fearlessly digging into them, and I have tried as much as I could to work with developers, rather than stay in my ivory tower and prescribe from far above.
I have mostly met people who were eager to learn and understand, who needed little encouragement when put on the right tracks, who enjoyed developing their SQL skills, and who soon began to set performance targets for themselves.

When the passing of time wiped from my memory the pains of book writing, I took the plunge and began to write again, with the intent to expand the ideas I usually try to transmit when I work with developers. Database accesses are probably one of the areas where there is the most to gain by improving the code. My purpose in writing this book has been to give not recipes, but a framework to try to improve the less-than-ideal SQL applications that surround us without rewriting them from scratch (in spite of a very strong temptation sometimes).

Why Refactor?

Most applications bump, sooner or later, into performance issues. In the best of cases, the success of some old and venerable application has led it to handle, over time, volumes of data for which it had never been designed, and the old programs need to be given a new lease on life until a replacement application is rolled out in production. In the worst of cases, performance tests conducted before switching to production may reveal a dismal failure to meet service-level requirements. Somewhere in between, data volume increases, new functionalities, software upgrades, or configuration changes sometimes reveal flaws that had so far remained hidden, and backtracking isn’t always an option. All of those cases share extremely tight deadlines to improve performance, and high pressure levels.

* Fowler, M. et al. Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional.

The first rescue expedition is usually mounted by system engineers and database administrators who are asked to perform the magical parameter dance.
Unless some very big mistake has been overlooked (it happens), database and system tuning often improves performance only marginally.

At this point, the traditional next step has long been to throw more hardware at the application. This is a very costly option, because the price of hardware will probably be compounded by the higher cost of software licenses. It will interrupt business operations. It requires planning. Worryingly, there is no real guarantee of return on investment. More than one massive hardware upgrade has failed to live up to expectations. It may seem counterintuitive, but there are horror stories of massive hardware upgrades that actually led to performance degradation. There are cases when adding more processors to a machine simply increased contention among competing processes.

The concept of refactoring introduces a much-needed intermediate stage between tuning and massive hardware injection. Martin Fowler’s seminal book on the topic focuses on object technologies. But the context of databases is significantly different from the context of application programs written in an object or procedural language, and the differences bring some particular twists to refactoring efforts. For instance:

Small changes are not always what they appear to be
    Due to the declarative nature of SQL, a small change to the code often brings a massive upheaval in what the SQL engine executes, which leads to massive performance changes, for better or for worse.

Testing the validity of a change may be difficult
    While it is reasonably easy to check that a value returned by a function is the same in all cases before and after a code change, it is a different matter to check that the contents of a large table are still the same after a major update statement rewrite.
The context is often critical
    Database applications may work satisfactorily for years before problems emerge; it’s often when volumes or loads cross some thresholds, or when a software upgrade changes the behavior of the optimizer, that performance suddenly becomes unacceptable. Performance improvement work on database applications usually takes place in a crisis.

Database applications are therefore a difficult ground for refactoring, but at the same time the endeavor can also be, and often is, highly rewarding.

[...]

Figure 1-2. Refactoring gains with Oracle

[Figure 1-3. Refactoring gains with SQL Server. Chart of the performance increase ratio for code-oriented and SQL-oriented refactoring, from the second to the sixth example, with no index, a single-column index, and a two-column index.]

I plotted [...] and txdate

Table 1-6. Speed improvement factor with SQL rewriting and function rewriting

DBMS          Speed improvement
MySQL         x34
Oracle        x16
SQL Server    x44

Refactoring, Second Standpoint

The preceding change is already a change of perspective: instead of only modifying the code so as to execute fewer SQL statements, I have begun to replace two SQL statements with one. I already pointed out that loops are [...]

[Figure 1-1. Refactoring gains with MySQL. Chart of the performance increase ratio for code-oriented and SQL-oriented refactoring, from the second to the sixth example, with no index, a single-column index, and a two-column index.]

[Figure 1-2 chart: Oracle performance increase ratio for code-oriented and SQL-oriented refactoring, by index configuration.]
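Speed ratios such as those above only mean something if the rewritten code still leaves the data exactly as the original did, which is the testing difficulty raised in the preface. As a hedged illustration, and not the method the book itself develops, the sketch below compares two snapshots of rows (for instance, a table's contents after running the original and the rewritten statements against identical test data) as multisets: row order is ignored, but duplicate rows are counted. The class and method names are hypothetical, the sample rows are made up, and the approach assumes both snapshots fit in memory, so it suits small test data sets rather than genuinely large tables.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares two snapshots of a table as multisets of rows.
final class SnapshotComparator {
    private SnapshotComparator() {}

    // Counts how many times each distinct row occurs in a snapshot.
    static Map<List<Object>, Integer> countRows(List<List<Object>> rows) {
        Map<List<Object>, Integer> counts = new HashMap<>();
        for (List<Object> row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    // True when both snapshots contain the same rows with the same multiplicities,
    // regardless of the order in which the rows were fetched.
    static boolean sameContents(List<List<Object>> before, List<List<Object>> after) {
        if (before.size() != after.size()) {
            return false; // cheap early exit
        }
        return countRows(before).equals(countRows(after));
    }

    public static void main(String[] args) {
        // Made-up rows, fetched in a different order by each version of the code.
        List<List<Object>> original = List.of(
                List.<Object>of(1, "2008-08-01", 100),
                List.<Object>of(2, "2008-08-02", 250));
        List<List<Object>> rewritten = List.of(
                List.<Object>of(2, "2008-08-02", 250),
                List.<Object>of(1, "2008-08-01", 100));
        System.out.println(sameContents(original, rewritten)); // prints true
    }
}
```

Counting occurrences instead of collecting rows into a Set matters here: a rewrite that drops or duplicates one of two identical rows would otherwise go unnoticed.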
Table 1-1. Baseline for SimpleExample.java

DBMS          Baseline result
MySQL         11 minutes
Oracle        3 minutes
SQL Server    5.5 minutes

SQL Tuning, the Traditional Way

The usual approach at this stage is to forward the program to the in-house tuning specialist (usually a database administrator [DBA]). Very conscientiously, the MySQL DBA will probably [...]

* MySQL 5.1
† SQL Server 2005 and Oracle 11

[...] (accountid, txdate), against MySQL, Oracle, and SQL Server, and measured the performance ratio compared to the initial version. The results for FirstExample.java don’t appear explicitly in the figures that follow (Figures 1-1, 1-2, and 1-3), but the “floor” represents the initial run of FirstExample.

[Figure 1-1 chart: MySQL performance increase ratio for code-oriented and SQL-oriented refactoring, by index configuration.]

[...] seconds with MySQL, 10 seconds with Oracle, and a little under 9 seconds with SQL Server, improvements by respective factors of 24, 16, and 38 compared to the initial situation (see Table 1-5).

Table 1-5. Speed improvement factor with a two-column index and function rewriting

DBMS          Speed improvement
MySQL         x24
Oracle        x16
SQL Server    x38

Another possible improvement is hinted at in the MySQL log (as [...]

[...] result in a very small improvement: their cumulative effect makes the MySQL version about 10% faster. However, we see hardly any measurable gain with Oracle and SQL Server (see Table 1-3).

Table 1-3. Speed improvement factor after index, code cleanup, and no auto-commit

DBMS          Speed improvement
MySQL         x3.2
Oracle        x1.2
SQL Server    x1.3

SQL Tuning, Revisited

When one index fails to achieve the result we [...]
[...] Speed
MySQL         x3.4
Oracle        x1.2
SQL Server    x1.3

So far, I have taken what I’d call the “traditional approach” of tuning, a combination of some minimal improvement to SQL statements, common-sense use of features such as transaction management, and a sound indexing strategy. I will now be more radical, and take two different standpoints in succession. Let’s first consider how the program is organized.

Refactoring, [...]

[...] not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission. We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: Refactoring SQL Applications by Stéphane Faroult with Pascal L’Hermite. Copyright 2008 Stéphane Faroult and Pascal L’Hermite, [...]

[...]
            );
        st1.close();
        if (rs2 != null) {
            rs2.close();
        }
        st2.close();
        st3.close();
    } catch (SQLException ex) {
        System.err.println("==> SQLException: ");
        while (ex != null) {
            System.out.println("Message: " + ex.getMessage());
            System.out.println("SQLState: " + ex.getSQLState());
            System.out.println("ErrorCode: " + ex.getErrorCode());
            ex = ex.getNextException();
        [...]
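The catch block in the listing above walks the chain of exceptions that a JDBC driver may link together through getNextException(). A minimal variant, sketched here with a hypothetical class name, collects the same three fields for every link of the chain into one string, so the whole report can be logged in one piece or checked in a test, and goes to a single stream (the listing prints its header to System.err but the details to System.out). The messages, SQLState values, and error codes used in main are made up; only the java.sql.SQLException constructor and methods are real.

```java
import java.sql.SQLException;

// Hypothetical helper: formats every exception in a chained SQLException.
final class SqlExceptionReport {
    private SqlExceptionReport() {}

    // Follows getNextException() exactly like the listing's while loop,
    // but accumulates the output instead of printing it line by line.
    static String describe(SQLException first) {
        StringBuilder sb = new StringBuilder("==> SQLException:\n");
        for (SQLException ex = first; ex != null; ex = ex.getNextException()) {
            sb.append("Message: ").append(ex.getMessage()).append('\n')
              .append("SQLState: ").append(ex.getSQLState()).append('\n')
              .append("ErrorCode: ").append(ex.getErrorCode()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build a two-link chain by hand; a real driver builds it itself.
        SQLException first = new SQLException("first failure", "HY000", 1);
        first.setNextException(new SQLException("second failure", "HY000", 2));
        System.err.print(describe(first));
    }
}
```

Inside the original catch block, the while loop could then be replaced by a single System.err.print(SqlExceptionReport.describe(ex)) call.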