Reusable Standard Database Components
One little spark, of inspiration, is at the heart, of all creation. Right at the start, of everything that’s new. One little spark, lights up for you.
– The Sherman Brothers for Disney
As we near the end of the database design process, the database is pretty much completed from a design standpoint. We have spent time looking at performance, concurrency, and security patterns that you can follow to help implement the database in a manner that will work well under almost any typical OLTP-style load. In this chapter (and again somewhat in the next two), we are going to look at applying “finishing touches” to the database that can be used to enhance the user experience and assist with the querying and maintaining of the database. We won’t flesh out all of the details for every concept in this chapter, because a lot of the examples include parts of the system that are decidedly more aligned to the DBA than the architect/programmer (or are just far too large to fit in this book). However, the point I will be making here is that our goal is to end up with a self-contained database container with as much of the database coding and maintenance functionality as possible. In Chapter 9, we introduced the concept of a contained database to help maintain a secure and portable database, and here I will present some additional add-in capabilities and expansion point ideas to make your database that much more usable.
The reality of database design is that most databases are rarely cookie-cutter affairs. Most companies, even when they buy a third-party package to implement some part of their business, are going to end up making (in many cases substantial) customizations to the database to fit their needs. If you are starting a very large project, you may even want to look at previous models or perhaps pre-built “universal data models,” such as those in Len Silverston’s series of books, the first of which is The Data Model Resource Book: A Library of Universal Data Models for All Enterprises (Wiley, 2001) (perhaps the only book on database design with a longer title than the book you hold in your hands right now). Karen Lopez (@datachick on Twitter) frequently speaks on the subject of universal models in the SQL PASS universe of presenters that I am generally involved with. Even these universal models may only be useful as starting points to help you map from your “reality” to a common view of a given sort of business.
In this chapter, however, I want to explore the parts of the database that I find to be useful and almost always the same for every database I create. Not every database will contain all of what I will present, but when I need a common feature I will use the exact same code in every database, with the obvious caveat that I am constantly looking for new ways to improve almost everything I use over time (not to mention it gives me something to write about when I run out of Lego sets to build). Hence, sometimes a database may use an older version of a feature until it can be upgraded. I will cover the following topics:
• Numbers table: A table of numbers, usually integers, that can be put to a number of interesting uses (not many of them mathematics-related).
• Calendar table: A table where every row represents a day, assisting in grouping data for queries, both for reports and operational usage.
• Utility objects: Every programmer has code that they use to make their job easier. Utilities to monitor usage of the system; extended DDL to support operations that aren’t part of the base DDL in T-SQL.
• Logging objects: Utilities to log the actions of users in the database, generally for system management reasons. A common use is an error log to capture when and where errors are occurring.
• Other possibilities: In this section, I will present a list of additional ideas for ways to extend your databases in ways that will give you independent databases that have common implementation.
No two databases look exactly alike; even two that do almost exactly the same thing will rarely be all that alike unless the designer is the same. But by following the patterns of implementation we have discussed throughout the book thus far and the practices we will discuss in this chapter, we can produce databases that are similar enough that the people supporting your work will have an easy time figuring out what you had in mind.
If you are dealing with a third-party system where it is forbidden to add any of your own objects, even in a schema that is separated from the shipped schemas, don’t think that what I am saying here doesn’t apply to you. All of the example code presented supposes a single-database approach, but in such cases a common approach is to create a companion database to hold the code you need to access the third-party data from the database tier. Some examples would need to be slightly reworked for that model, but that rework would be minimal.
Note
■ For the examples in this chapter, I am going to use a copy of the AdventureWorks2012 database to stick to the supposition of the chapter that we should place the tables in the database with the data you are working with. If you are working with a community version of AdventureWorks2012 that you cannot modify, you can build your own companion database for the examples. I will include a comment in the query to note where the data is specifically from that database. In cases where cross-database access will not be trivial, I will note that in the code with a comment.
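As a rough sketch of the companion database idea, the following creates a separate database with its own Tools schema and reads the vendor data through a three-part name. The database name AdventureWorks2012Companion is purely hypothetical, and the sketch assumes both databases live on the same instance and that your login has rights in each.

```sql
-- Hypothetical companion database; the name is an assumption for illustration only
CREATE DATABASE AdventureWorks2012Companion;
GO
USE AdventureWorks2012Companion;
GO
-- Schema to hold your own tool objects, kept apart from the shipped schemas
CREATE SCHEMA Tools;
GO
-- From the companion database, reach the third-party data with a three-part name
SELECT COUNT(*) AS PersonCount
FROM AdventureWorks2012.Person.Person;
```

Anything you build in the companion database then refers to the shipped tables by their full database.schema.table name, which is the only rework most of the examples would need.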
Numbers Table
A numbers table is a precalculated table of numbers, primarily non-negative integers, which you can use for some purpose. The name “numbers” is pretty open ended, but getting so specific as nonNegativeIntegers is going to get you ridiculed by the other programmers on the playground. In previous editions of the book I have used the name sequence, but with the addition of the sequence object, the name “numbers” was the next best thing. We will use the numbers table when we need to work with data in an ordered manner, particularly a given sequence of numbers. For example, if you needed a list of the top ten products sold and you only sold six, you would have to somehow manufacture four more rows for display. Having a table where you can easily output a sequence of numbers is going to be a very valuable asset at times indeed.
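To make the top ten example concrete, here is a minimal sketch. The product names and sales figures are invented, and a small inline list of values stands in for the numbers table so the batch runs on its own.

```sql
-- Hypothetical sales summary: only six products were sold
DECLARE @ProductSales TABLE
(
    ProductName varchar(30) NOT NULL,
    UnitsSold   int NOT NULL
);
INSERT INTO @ProductSales (ProductName, UnitsSold)
VALUES ('Widget',60),('Gadget',50),('Gizmo',40),
       ('Doohickey',30),('Thingamajig',20),('Whatsit',10);

;WITH Ranks (I) AS
(--inline stand-in for a numbers table, 1 through 10
 SELECT I
 FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS v (I))
,RankedSales AS
(--number the products that actually sold, best seller first
 SELECT ProductName, UnitsSold,
        ROW_NUMBER() OVER (ORDER BY UnitsSold DESC) AS SalesRank
 FROM @ProductSales)
SELECT Ranks.I AS SalesRank, RankedSales.ProductName, RankedSales.UnitsSold
FROM Ranks
     LEFT JOIN RankedSales
        ON RankedSales.SalesRank = Ranks.I
ORDER BY Ranks.I;
```

The outer join always returns ten rows; ranks 7 through 10 come back with NULL for the product columns, which the display layer can render as blank lines.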
While you can make the numbers table contain any type of numbers you may need, usually it is just a simple table of non-negative integers from 0 to some reasonable limit, where reasonable is more or less how many you find you need. I generally load mine by default up to 99999 (99999 gives you a full five digits and is a very convenient number for the query I will use to load the table). With the algorithm I will present, you can easily expand to create a sequence of numbers that is larger than you can store in SQL Server.
There are two really beautiful things behind this concept. First, the table of non-negative integers has some great uses dealing with text data, as well as for doing all sorts of math. Second, you can create additional attributes or even other sequence tables that you can use to represent other sets of numbers that you find useful or interesting. For example:
• Even or odd, prime, squares, cubes, and so on
• Other ranges or even other grains of values, for example, (-1, -.5, 0, .5, 1)
• Letters of the alphabet
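Most of these sets can be derived from plain integers with a simple expression. The following is only a sketch, using ten inline values so that it runs without any tables; the column names are my own invention.

```sql
;WITH digits (I) AS
(--ten integers, 0 through 9
 SELECT I
 FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v (I))
SELECT I,
       CASE WHEN I % 2 = 0 THEN 'Even' ELSE 'Odd' END AS EvenOrOdd,
       I * I AS SquareValue,
       I * I * I AS CubeValue,
       (I * 0.5) - 1 AS HalfStepValue, --rows 0 to 4 give -1, -.5, 0, .5, 1
       CHAR(ASCII('A') + I) AS Letter  --A through J
FROM digits
ORDER BY I;
```

Whether you compute such values on the fly like this or store them as extra columns or tables depends on how often you need them.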
In the examples in this section, we will look at several techniques you may find useful, and possibly quite often. The code to generate a simple numbers table of integers is pretty simple; though it looks a bit daunting the first time you see it. It is quite fast to execute in this form, but no matter how fast it may seem, it is not going to be faster than querying from a table that has the sequence of numbers precalculated and stored ahead of time.
;WITH digits (I) AS
(--set up a set of numbers from 0-9
 SELECT I
 FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))
,integers (I) AS (
 SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I)
        -- + (10000*D5.I) + (100000*D6.I)
 FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
      CROSS JOIN digits AS D4
      --CROSS JOIN digits AS D5 CROSS JOIN digits AS D6
)
SELECT I
FROM integers
ORDER BY I;
This code will return a set of 10,000 rows, as follows:
I
---
0
1
2
…
9998
9999
Uncommenting the code for the D5 and D6 tables will give you an order of magnitude increase for each, up to 1,000,000 rows (0 through 999,999). The code itself is pretty interesting (this isn’t a coding book, but it is a really useful technique).
Breaking the code down, you get the following:
;WITH digits (I) AS
(--set up a set of numbers from 0-9
 SELECT I
 FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))
This is just simply a set of ten rows from 0 to 9. The next bit is where the true brilliance begins. (No, I am not claiming I came up with this. I first saw it on Erland Sommarskog’s web site a long time ago, using a technique I will show you in a few pages to split a comma-delimited string.) You cross-join the first set over and over, multiplying each level by a greater power of 10. The result is that you get one permutation for each number. For example, since 0 is in each set, you get one permutation that results in 0. You can see this better in the following smallish set:
;WITH digits (I) AS
(--set up a set of numbers from 0-9
 SELECT I
 FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))
SELECT D1.I AS D1I, 10*D2.I AS D2I, D1.I + (10*D2.I) AS [Sum]
FROM digits AS D1 CROSS JOIN digits AS D2
ORDER BY [Sum];
This returns the following, and you can see that by multiplying the D2.I value by 10, you get the tens place repeated, giving you a very powerful mechanism for building a large set. In the full query, each of the additional digit table references has another power of ten as its multiplier in the SELECT clause, allowing you to create a very large set (rows removed and replaced with … for clarity and to save a tree):
D1I D2I Sum
---- ---- ---
0 0 0
1 0 1
2 0 2
3 0 3
4 0 4
5 0 5
6 0 6
7 0 7
8 0 8
9 0 9
0 10 10
1 10 11
2 10 12
3 10 13
4 10 14
…
6 80 86
7 80 87
8 80 88
9 80 89
0 90 90
1 90 91
2 90 92
3 90 93
4 90 94
5 90 95
6 90 96
7 90 97
8 90 98
9 90 99
This kind of combination of sets is a very useful technique in relational coding. As I said earlier, this isn’t a query book, but I feel it necessary to show you the basics of why this code works, because it is a very good mental exercise. Using the full query, you can create a sequence of numbers that you can use in a query.
So, initially create a simple table named Number with a single column I (because i is a typical name used in math to denote an index in a sequence, such as x_i, where the i denotes a position in a sequence of values of x). The primary purpose of the Number table is to introduce an ordering to a set to assist in an operation. I will create this table in a schema named Tools to contain the types of tool objects, functions, and procedures we will build in this chapter. In all likelihood, this is a schema on which you would grant EXECUTE and SELECT to public, making the tools available to any user you have given ad hoc query access to.
USE AdventureWorks2012;
GO
CREATE SCHEMA Tools;
GO
CREATE TABLE Tools.Number (
I int NOT NULL CONSTRAINT PKTools_Number PRIMARY KEY );
Then I will load it with integers from 0 to 99999.
;WITH digits (I) AS
(--set up a set of numbers from 0-9
 SELECT I
 FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))
--builds a table from 0 to 99999
,Integers (I) AS (
 SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I) + (10000*D5.I)
        --+ (100000*D6.I)
 FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
      CROSS JOIN digits AS D4 CROSS JOIN digits AS D5
      /* CROSS JOIN digits AS D6 */)
INSERT INTO Tools.Number(I)
SELECT I
FROM Integers;
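Before relying on the table, it is worth a quick sanity check that the load produced what you expected, and this is also a natural point to hand out the schema-level permissions mentioned earlier. Treat the following as a sketch; granting to public is a policy decision that depends on your environment.

```sql
--Confirm the load: 100,000 rows running from 0 to 99999
SELECT COUNT(*) AS NumberOfRows,
       MIN(I) AS MinValue,
       MAX(I) AS MaxValue
FROM Tools.Number;

--Make everything in the Tools schema available to ad hoc query users
GRANT SELECT, EXECUTE ON SCHEMA::Tools TO public;
```

Because the grant is on the schema rather than on individual objects, any tool object added to Tools later is covered automatically.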
So if you wanted to count the integers between 1 and 1000 (inclusive), it is as simple as:
SELECT COUNT(*) FROM Tools.Number
WHERE I BETWEEN 1 AND 1000;
Of course, that would be a bit simplistic, and there had better be 1000 values between (inclusive) 1 and 1000, but what if you wanted the number of integers between 1 and 1000 that are divisible by 9 or 7?
SELECT COUNT(*) FROM Tools.Number
WHERE I BETWEEN 1 AND 1000 AND (I % 9 = 0 OR I % 7 = 0);
This returns the obvious answer: 238. Sure, a math nerd could sit down and write a formula to do this, but why? And if you find you need these values quite often, you could create a table called Tools.DivisibleByNineAndSevenNumber, or add columns to the number table called DivisibleByNineFlag and DivisibleBySevenFlag if it were needed in context of integers that were not divisible by 9 or 7. The simple numbers table is the most typical need, but you can make a table of any sorts of numbers that you need (prime numbers? squares? cubes?). The last example of this chapter is an esoteric example of some pretty (nerdy) fun stuff you can do with a table of numbers, but for OLTP use, the goal will be (as we discussed in Chapter 5 on normalization) to pre-calculate values only when they are used often and can never change. Numbers-type tables are an excellent candidate for storing pre-calculated values because the sets of integers and prime numbers are the same now as they were back in 300 BC when Euclid was working with them.
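One possible way to add the two flag columns is as persisted computed columns, so the values are calculated once and stored. This is a sketch rather than a recommendation (the bit data type and PERSISTED option are my choices), and the last query is the formula a math nerd would write, using integer division to count multiples.

```sql
ALTER TABLE Tools.Number
    ADD DivisibleByNineFlag AS
            (CAST(CASE WHEN I % 9 = 0 THEN 1 ELSE 0 END AS bit)) PERSISTED,
        DivisibleBySevenFlag AS
            (CAST(CASE WHEN I % 7 = 0 THEN 1 ELSE 0 END AS bit)) PERSISTED;
GO
--Same question as before, answered from the stored flags
SELECT COUNT(*) AS FlagCount
FROM Tools.Number
WHERE I BETWEEN 1 AND 1000
  AND (DivisibleByNineFlag = 1 OR DivisibleBySevenFlag = 1);

--Multiples of 9, plus multiples of 7, minus multiples of both (63)
SELECT 1000/9 + 1000/7 - 1000/63 AS FormulaCount; --111 + 142 - 15 = 238
```

Both queries agree with the count from the modulo version. If you only wanted to experiment, the columns can be removed again with ALTER TABLE Tools.Number DROP COLUMN DivisibleByNineFlag, DivisibleBySevenFlag.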
In this section, I will present the following uses of the numbers table to get you going:
• Determining the contents of a string: Looking through a string without looping by using the ordered nature of the numbers table to manage the iteration using relational code.
• Determining gaps in a sequence: Having a set of data that contains all of the values allows you to use relational subtraction to find missing values.
• Separating comma-delimited items: Sometimes data is not broken down into scalar values like you desire.
• Stupid mathematic tricks: I take the numbers table to abstract levels, solving a fairly complex math problem that, while not terribly applicable in a practical manner, serves as an experiment to build upon if you have similar, complex problem-solving needs.
Determining the Contents of a String
As a fairly common example usage, it sometimes occurs that a value in a string you are dealing with is giving your code fits, but it isn’t easy to find what the issue is. If you want to look at the Unicode (or ASCII) value for every character in a string, you can do something like the following:
DECLARE @string varchar(20) = 'Hello nurse!';

SELECT Number.I AS Position,
       SUBSTRING(split.value,Number.I,1) AS [Character],
       UNICODE(SUBSTRING(split.value,Number.I,1)) AS [Unicode]
FROM Tools.Number
     CROSS JOIN (SELECT @string AS value) AS split
WHERE Number.I > 0 --No zeroth position
  AND Number.I <= LEN(@string)
ORDER BY Position;
This returns the following:
Position Character Unicode
--- --- ---
1 H 72
2 e 101
3 l 108
4 l 108
5 o 111
6 32
7 n 110
8 u 117
9 r 114
10 s 115
11 e 101
12 ! 33
This in and of itself is interesting, and sometimes when you execute this, you might see a little square character that can’t be displayed and a really large/odd Unicode value (like 20012, picking one randomly) that you didn’t expect in your database of English-only words. What really makes the technique awesome is that not only didn’t we have to write a routine to go character by character, we won’t have to go row by row either. Using a simple join, you can easily do this for a large number of rows at once, this time joining to a table in the AdventureWorks2012 database that can provide us with an easy example set.
SELECT LastName, Number.I AS position,
SUBSTRING(Person.LastName,Number.I,1) AS [char],
UNICODE(SUBSTRING(Person.LastName, Number.I,1)) AS [Unicode]
FROM /*Adventureworks2012.*/ Person.Person
     JOIN Tools.Number
        ON Number.I <= LEN(Person.LastName )
           AND UNICODE(SUBSTRING(Person.LastName, Number.I,1)) IS NOT NULL
ORDER BY LastName;
This returns 111,969 rows (one for each character in a last name) in only around 3 seconds on a virtual machine hosted on my writing laptop (which is a quite decent Alienware MX11 Core 2 Duo 1.3 GHz; 8 GB; 250 GB, 7200RPM SATA drive; 11.6-inch netbook-sized laptop).
LastName position char Unicode
--- --- --- ---
Abbas 1 A 65
Abbas 2 b 98
Abbas 3 b 98
Abbas 4 a 97
Abbas 5 s 115
Abel 1 A 65
Abel 2 b 98
Abel 3 e 101
Abel 4 l 108
…… … . …
With that set, you could easily start eliminating known safe Unicode values with a simple WHERE clause and find your evil outlier that is causing some issue with some process. For example, you could find all names that include a character not in the normal A–Z, space, apostrophe, or dash characters.
SELECT LastName, Number.I AS Position,
SUBSTRING(Person.LastName,Number.I,1) AS [Char],
UNICODE(SUBSTRING(Person.LastName, Number.I,1)) AS [Unicode]
FROM /*Adventureworks2012.*/ Person.Person
     JOIN Tools.Number
        ON Number.I <= LEN(Person.LastName )
           AND UNICODE(SUBSTRING(Person.LastName, Number.I,1)) IS NOT NULL
--Note I used both a-z and A-Z in LIKE in case of case sensitive AW database
WHERE SUBSTRING(Person.LastName, Number.I,1) NOT LIKE '[a-zA-Z ~''~-]' ESCAPE '~'
ORDER BY LastName, Position;
This returns the following:
LastName Position Char Unicode
--- --- --- ---
Mart¡nez 5 ¡ 161
This can be a remarkably powerful tool when trying to figure out what data is hurting your application with some unsupported text, particularly when dealing with a stream of data from an outside source.
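If you want to try the idea without AdventureWorks2012, the following sketch plants the odd character mentioned earlier (Unicode 20012) in a variable and then hunts for anything outside the basic ASCII range. The cutoff of 127 is my assumption about what counts as “safe”; adjust it to your own data.

```sql
--Build a test string with an unexpected character in position 7
DECLARE @string nvarchar(30) = N'Hello ' + NCHAR(20012) + N'nurse!';

SELECT Number.I AS Position,
       SUBSTRING(@string, Number.I, 1) AS [Character],
       UNICODE(SUBSTRING(@string, Number.I, 1)) AS [Unicode]
FROM Tools.Number
WHERE Number.I BETWEEN 1 AND LEN(@string)
  AND UNICODE(SUBSTRING(@string, Number.I, 1)) > 127 --outside basic ASCII
ORDER BY Position;
```

This returns a single row, for position 7 with a Unicode value of 20012, which tells you exactly where in the string to look.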
Finding Gaps in a Sequence of Numbers
Another common issue that we have when using a surrogate key is that there can be gaps in its values. Ideally, this should not be an issue, but when troubleshooting errors it is often useful to be able to determine the missing numbers in a range. For example, say you have a table with a domain of values between 1 and 10. How might you determine if a value isn’t used? This is fairly simple; you can just do a distinct query on the used values and then check to see what values aren’t used, right? Well, how about if you had to find missing values in 20,000+ distinct values? Checking by eye is not going to work, particularly if a lot of values aren’t used. For example, consider the Person table in the AdventureWorks2012 database. Running the following query, you can see that not every BusinessEntityID is used.
SELECT MIN(BusinessEntityID) AS MinValue,
       MAX(BusinessEntityID) AS MaxValue,
       MAX(BusinessEntityID) - MIN(BusinessEntityID) + 1 AS ExpectedNumberOfRows,
       COUNT(*) AS NumberOfRows,
       --expected rows minus actual rows; also correct when MinValue is not 1
       MAX(BusinessEntityID) - MIN(BusinessEntityID) + 1 - COUNT(*) AS MissingRows
FROM /*Adventureworks2012.*/ Person.Person;
This returns the following:
MinValue MaxValue ExpectedNumberOfRows NumberOfRows MissingRows
--- --- --- --- ---
1 20777 20777 19972 805
So we know that there are 805 rows missing between BusinessEntityID values 1 and 20777. To discover these rows, we take a set of values from 1 to 20777 with no gaps, and subtract the rows using the EXCEPT relational operator:
SELECT Number.I
FROM Tools.Number
WHERE I BETWEEN 1 AND 20777
EXCEPT
SELECT BusinessEntityID
FROM /*Adventureworks2012.*/ Person.Person;
Execute this query and you will find that there are 805 rows returned. Using the subtraction method with the Numbers table is a very powerful method that you can use in lots of situations where you need to find what isn’t there rather than what is.
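The subtraction is easier to see on a tiny set. This sketch uses a made-up table variable holding six of the values from 1 to 10 and subtracts it from the matching range of Tools.Number.

```sql
--Hypothetical set of used key values, with gaps
DECLARE @UsedValue TABLE (Id int NOT NULL PRIMARY KEY);
INSERT INTO @UsedValue (Id)
VALUES (1),(2),(4),(5),(8),(10);

--Everything in the range, minus what is used, leaves the gaps
SELECT Number.I AS MissingValue
FROM Tools.Number
WHERE Number.I BETWEEN 1 AND 10
EXCEPT
SELECT Id
FROM @UsedValue
ORDER BY MissingValue;
```

The result is the four missing values: 3, 6, 7, and 9.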
Separating Comma Delimited Items
My last example that you can translate to a direct business need comes from Erland Sommarskog’s web site (www.sommarskog.se/) on arrays in SQL Server, as well as Aaron Bertrand’s old ASPFAQ web site. Using this code, you can take a comma-delimited list to return it as a table of values (which is the most desirable form for data in SQL Server in case you have just started reading this book on this very page and haven’t learned about normalization yet).
DECLARE @delimitedList varchar(100) = '1,2,3';

SELECT SUBSTRING(',' + @delimitedList + ',',I + 1,
          CHARINDEX(',',',' + @delimitedList + ',',I + 1) - I - 1) AS value
FROM Tools.Number
WHERE I >= 1
  AND I < LEN(',' + @delimitedList + ',') - 1
  AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','
ORDER BY I;
This returns the following:
Value
---
1
2
3
The way this code works is pretty interesting in and of itself. It works by doing a substring on each row. The key is in the WHERE clause.
WHERE I >= 1
  AND I < LEN(',' + @delimitedList + ',') - 1
  AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','
The first line is there because SUBSTRING starts with position 1. The second limits the rows from Tools.Number to positions within the padded string (the @delimitedList value with a comma added to each end), stopping short of the trailing comma. The third includes rows only where the SUBSTRING of the value at the position returns the delimiter, in this case, a comma. So, take the following query:
DECLARE @delimitedList varchar(100) = '1,2,3';

SELECT I
FROM Tools.Number
WHERE I >= 1
  AND I < LEN(',' + @delimitedList + ',') - 1
  AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','
ORDER BY I;
Executing this, you will see the following results, showing you the position of the delimiter that precedes each value in the padded list:
I
---
1
3
5
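To see how those positions turn into values, it can help to show every intermediate piece side by side. The sketch below uses a list with items of different lengths and a helper variable, @padded, that I have introduced only to keep the expressions short.

```sql
DECLARE @delimitedList varchar(100) = 'a,bb,ccc';
--the list with a delimiter added to each end: ',a,bb,ccc,'
DECLARE @padded varchar(102) = ',' + @delimitedList + ',';

SELECT I AS DelimiterPosition,
       CHARINDEX(',', @padded, I + 1) AS NextDelimiterPosition,
       CHARINDEX(',', @padded, I + 1) - I - 1 AS ValueLength,
       SUBSTRING(@padded, I + 1,
                 CHARINDEX(',', @padded, I + 1) - I - 1) AS Value
FROM Tools.Number
WHERE I >= 1
  AND I < LEN(@padded) - 1
  AND SUBSTRING(@padded, I, 1) = ','
ORDER BY I;
```

The delimiters before each item sit at positions 1, 3, and 6, the next delimiters at 3, 6, and 10, so the value lengths work out to 1, 2, and 3, giving a, bb, and ccc. Each value is simply the text between one delimiter and the next.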