10.7 Manage test data in a shared database
◆ Problem
You want to test against a database, but you do not have access to a dedicated database against which to test.
◆ Background
You may be stuck sharing a test database for a number of reasons.
Consider the cost of licenses for the database platform your project uses. Procuring additional licenses may be quite expensive, and management is usually unclear on the cost/benefit trade-off. Managers tend to minimize the costs they can easily quantify (the exact price of a license) rather than the costs they have difficulty quantifying (how much less productive the team is because programmers have to share a database), even though the latter cost may well be much higher than the former. We can complain all we want, but until we can provide a definite cost/benefit analysis and convince management of its accuracy, things are not likely to change.
15 See the end of recipe 10.7, “Manage test data in a shared database,” for a discussion of “owning the plug.”
350 CHAPTER 10 Testing and JDBC
There may be political issues of “ownership” surrounding the database. Some managers feel that if the database group does not maintain total control over the database, chaos will follow. They are often afraid of programmers telling database administrators how to structure the database, which tends to create tension between the two groups. Past experience has taught these managers that, to keep the peace, the database group must both own and tightly control the databases. It is unclear whether trying to overcome this roadblock is worth the effort.
Whatever the reason, you may find yourself stuck with a shared test database and want to know how to deal with it.
◆ Recipe
Before diving into the painful world of bobbing and weaving around data you absolutely cannot destroy, consider some alternatives.
■ Download a free database product and have your own database. You can select from Mimer, MySQL, HSQLDB, and others. There are a number of companies providing free database products (at least for development) that give you the freedom to write the tests you need.
■ Create a separate tablespace or schema in the shared database for your tests. Although you are sharing a database, you are really only sharing disk space; other than that, you can do what you need to do. This option has the nice property of forcing your SQL code to be tablespace- or schema-independent, which eliminates an implicit dependency on the identity of the logged-in user and makes the option worth the extra effort on its own.
■ Execute your database tests during off hours, when you are less likely to collide with other testers. Here, we assume that you are sharing a test database16 with other programming groups. By staggering your testing times, you can share a license without the cost of trampling on each other’s data. On the other hand, restricting yourself to off hours reduces how often you can execute your tests and makes them harder to write in the first place.
16 If your organization lets you test on the production database, frankly it deserves what it gets. Nothing is worth that.
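One way to make SQL schema-independent, as the second alternative suggests, is to keep schema names out of the SQL text entirely and resolve table names against a schema supplied by configuration. The sketch below is only an illustration under that assumption. The `SchemaNames` helper, the `test.schema` property name, and the `coffee_product` table are all hypothetical and do not come from this recipe.

```java
// Hypothetical helper: qualifies table names with a configured schema
// instead of hard-coding one or relying on the logged-in user's default.
final class SchemaNames {
    private final String schema; // null means "use unqualified names"

    SchemaNames(String schema) {
        this.schema = (schema == null || schema.trim().isEmpty()) ? null : schema.trim();
    }

    // Reads the schema from a system property; the property name is an assumption.
    static SchemaNames fromSystemProperty(String propertyName) {
        return new SchemaNames(System.getProperty(propertyName));
    }

    // Returns "SCHEMA.table" when a schema is configured, otherwise just "table".
    String table(String tableName) {
        if (tableName == null) {
            throw new IllegalArgumentException("tableName must not be null");
        }
        return schema == null ? tableName : schema + "." + tableName;
    }

    // Example query: no schema name appears literally in the SQL text.
    String selectProductById() {
        return "select id, name from " + table("coffee_product") + " where id = ?";
    }
}
```

Each group could then run the same test code with, say, `-Dtest.schema=TEAM_A`. Drivers that implement JDBC 4.1 also offer `Connection.setSchema(String)`, which would let you leave table names unqualified altogether, although driver support for it varies.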
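If you take the off-hours route, a small guard can stop database tests from running outside the window the groups have agreed on. This is a minimal sketch built around a hypothetical `OffHoursWindow` class. The window boundaries are assumptions you would settle with the other groups, and a window may wrap past midnight.

```java
import java.time.LocalTime;

// Hypothetical guard: reports whether "now" falls inside an agreed off-hours window.
final class OffHoursWindow {
    private final LocalTime start; // inclusive
    private final LocalTime end;   // exclusive

    OffHoursWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    boolean contains(LocalTime now) {
        if (start.isBefore(end)) {
            // Window lies within a single day, e.g. 12:00 to 13:00.
            return !now.isBefore(start) && now.isBefore(end);
        }
        // Window wraps past midnight, e.g. 19:00 to 07:00.
        // (If start equals end, this treats the whole day as off hours.)
        return !now.isBefore(start) || now.isBefore(end);
    }
}
```

A database test could check `new OffHoursWindow(LocalTime.of(19, 0), LocalTime.of(7, 0)).contains(LocalTime.now())` and return early when the window is closed. With JUnit 4 or later, it could instead pass the result to `Assume.assumeTrue`. The 19:00 to 07:00 window is only an example.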
If you are forced to share a single database, tablespace, and schema with other groups and need to run your tests while they are running theirs, then you do not have many options left: every group must ensure that its test data does not collide with any other group’s test data. This means agreements like “For customer names, we’ll take A through D; and for coffee product IDs we’ll use 000-099.” Capture each of these decisions in a big chart and make it visible, preferably on a web site. When some rogue programmer causes a conflict, calmly point him to the web site and politely ask him to be more careful. If he does it a second time....17
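You can also enforce the chart's agreements in code, so that a test fails fast when it strays into another group's slice instead of silently damaging someone else's data. Here is a minimal sketch, assuming a hypothetical `TestDataSlice` class and the example ranges from the text (customer names A through D, product IDs 000-099).

```java
// Hypothetical description of one group's slice of shared test data.
final class TestDataSlice {
    private final char firstLetter;
    private final char lastLetter;
    private final int firstProductId;
    private final int lastProductId;
    private int nextProductId;

    TestDataSlice(char firstLetter, char lastLetter, int firstProductId, int lastProductId) {
        if (firstLetter > lastLetter || firstProductId > lastProductId) {
            throw new IllegalArgumentException("empty slice");
        }
        this.firstLetter = Character.toUpperCase(firstLetter);
        this.lastLetter = Character.toUpperCase(lastLetter);
        this.firstProductId = firstProductId;
        this.lastProductId = lastProductId;
        this.nextProductId = firstProductId;
    }

    // True if the customer name starts with one of this group's letters.
    boolean ownsCustomerName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char initial = Character.toUpperCase(name.charAt(0));
        return initial >= firstLetter && initial <= lastLetter;
    }

    boolean ownsProductId(int id) {
        return id >= firstProductId && id <= lastProductId;
    }

    // Hands out the next unused product ID, zero-padded to match "000-099".
    String nextProductId() {
        if (nextProductId > lastProductId) {
            throw new IllegalStateException("slice exhausted after "
                    + (lastProductId - firstProductId + 1) + " product IDs");
        }
        return String.format("%03d", nextProductId++);
    }
}
```

Note that the slice throws once its 100 product IDs are used up. Any fixed slice limits how much data a single test can create.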
◆ Discussion
Although carving up the database into slices is workable, there are a few constraints to consider:
■ Excluding large classes of data may lead to burying subtle defects—If you are writing tests for retrieving and processing contracts, you may have a special rule for contracts that expire in the first quarter of the year. If your test data consists only of contracts that expire between October and December, then you will never test that special rule.
■ You might run out of data—What happens when you want to write a stress test with customers buying from among 1,000 coffee bean products? If you only have 100 product IDs to choose from, then you cannot write that test or, at a minimum, you have to coordinate with the other teams about when to execute it, probably only during off-peak hours.
■ You might run out of row IDs—If your database provides IDENTITY columns (also known as auto-increment columns), then executing your tests 10 to 20 times per day, every day, may eventually exhaust the range of available row IDs. Admittedly, sharing a test database may already force you to execute your tests less often, but even if you manage to overcome that hurdle, this issue presents another one.
■ You cannot test certain edge cases—How can you verify your code’s behavior on an empty table if you cannot empty the table? You can take a mock objects approach (see recipe 10.9, “Test legacy JDBC code without the database”), but not everyone finds that solution satisfying. We do not mind it so much.
■ Data collisions are notoriously difficult to diagnose—When two tests collide, chaos follows, and the result is nearly impossible to diagnose beyond “Someone else is running tests right now.” Whom should you call? How bad is the damage? Those questions are not readily answered, which wastes time and effort that could be spent executing the tests and preventing defects.
17 http://c2.com/cgi/wiki?RolledUpNewspaper
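Whether row IDs really run out depends on how many rows each run inserts, how often tests run, and how wide the identity column is. The following back-of-the-envelope estimate uses a hypothetical `IdentityBudget` class, and you supply every input. It assumes that identity values consumed by deleted test rows are not handed out again. That is common, but worth checking for your database.

```java
// Hypothetical estimate of how long a shared identity column will last.
final class IdentityBudget {
    // maxId: largest value the column can hold (e.g. Integer.MAX_VALUE for a signed 32-bit INTEGER)
    // nextId: the next value the database will hand out
    // Returns the number of whole days of testing left before the column overflows.
    static long daysUntilExhausted(long maxId, long nextId, long rowsPerRun, long runsPerDay) {
        if (rowsPerRun <= 0 || runsPerDay <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        long remaining = Math.max(0, maxId - nextId + 1);              // IDs still available
        long rowsPerDay = Math.multiplyExact(rowsPerRun, runsPerDay);  // fails loudly on overflow
        return remaining / rowsPerDay;
    }
}
```

Take a signed 32-bit column starting at 1, and a stress test that inserts 1,000,000 rows per run, executed 20 times a day. `IdentityBudget.daysUntilExhausted(Integer.MAX_VALUE, 1, 1_000_000, 20)` gives 107 days for that case. When several groups insert into the same table, add their daily volumes together.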
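For the empty-table case, the idea behind the mock objects approach is that the code under test reaches its data through an interface that a test can replace. Here is a minimal sketch using a hand-written stub (a simpler cousin of a mock object) in place of the database. `ContractStore` and `RenewalReport` are hypothetical names, and recipe 10.9 may structure things differently.

```java
import java.util.List;

// Hypothetical data-access interface; production code would implement it with JDBC.
interface ContractStore {
    List<String> findExpiringContractIds();
}

// Hypothetical code under test; it must behave sensibly when the table is empty.
final class RenewalReport {
    private final ContractStore store;

    RenewalReport(ContractStore store) {
        this.store = store;
    }

    String summary() {
        List<String> ids = store.findExpiringContractIds();
        if (ids.isEmpty()) {
            return "No contracts are expiring.";
        }
        return ids.size() + " contract(s) expiring: " + String.join(", ", ids);
    }
}
```

In a test, a lambda such as `() -> Collections.emptyList()` plays the part of an empty table without touching the shared database.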
We recommend having separate test databases by any means necessary. We feel that it is impossible to overstate its importance.
NOTE You need to own the plug—Ward Cunningham wrote about the importance of “owning the plug” in his afterword to Kent Beck’s Sorted Collection, which you can find at http://c2.com/doc/forewords/beck2.html. Ward starts by writing, “While a program expresses intent, it is the computer, the hardware, that brings that intent to life. In order to have full control over your program’s expression you must control the computer that runs it. Therefore: Write your program for a computer with a plug. Should you be dissatisfied with the behavior of the computer, unplug it.” The same is true of databases. To write code that talks to a database effectively, you need to be able to “unplug” (that is, destroy) the database. The resulting tests take longer to execute, but you save an unbounded amount of time by not having to deal with the problems you incur by sharing a test database with others.
◆ Related
■ 10.9—Test legacy JDBC code without the database