Beginning Microsoft SQL Server 2008 Programming, Part 5


Let me reiterate the importance of being sure that your customers are really considering their future needs. The PartNo column, using a simple 6-character field, is an example of where you might want to be very suspicious. Part numbers are one of those things that people develop new philosophies on almost as often as my teenage daughter develops new taste in clothes. Today's inventory manager will swear that's all they ever intend to use, and will be sincere in it, but tomorrow there's a new inventory manager, or perhaps your organization merges with another organization that uses a 10-digit numeric part number. Expanding the field isn't that bad of a conversion, but any kind of conversion carries risks, so you want to get it right the first time.

Description is one of those guessing games. Sometimes a field like this is going to be driven by your user interface requirements (don't make it wider than can be displayed on the screen); other times you're just going to be truly guessing at what is "enough" space. Here you use a variable-length varchar over a regular char for two reasons:

❑ To save a little space
❑ So we don't have to deal with trailing spaces (look at the char vs. varchar data types back in Chapter 1 if you have questions on this)

We haven't used an nchar or nvarchar because this is a simple invoicing system for a U.S. business, and we're not concerned about localization issues. If you're dealing with a multilingual scenario, you'll want to pay much more attention to the Unicode data types. You'll also want to consider them if you're storing inherently international information such as URLs, which can easily have kanji and similar characters in them.

Weight is similar to Description in that it is going to be somewhat of a guess. We've chosen a tinyint here because our products will not be over 255 pounds. Note that we are also preventing ourselves from keeping decimal places in our weight (integers only).
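To make the discussion concrete, here is a sketch of what the Products table might look like given these choices. The Description width of 50 is an assumed value for illustration only; the book's actual design may differ:

```sql
-- Hypothetical sketch of the Products table as discussed above.
-- The Description width (50) is an assumption for illustration.
CREATE TABLE Products
(
    PartNo      char(6)     NOT NULL PRIMARY KEY, -- fixed 9A9999 format
    Description varchar(50) NOT NULL,             -- variable length saves space
    Weight      tinyint     NOT NULL              -- whole pounds only, 0-255
);
```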
As we discussed back under PartNo, make sure you consider your needs carefully; conservative can be great, but being over-conservative can cause a great deal of work later.

We described the CustomerNo field back when we were doing the Orders table. CustomerName and CustomerAddress are pretty much the same situation as Description; the question is, how much is enough? But we need to be sure that we don't give too much.

As before, all fields are required (there will be no nulls in either table) and no defaults are called for. Identity columns also do not seem to fit the bill here, as both the customer number and part number have special formats that do not lend themselves to the automatic numbering system that an identity provides.

Adding the Relationships

OK, to make the diagram less complicated, I've gone through all four of my tables and changed the view on them down to just Column Names. You can do this, too, by simply right-clicking on the table and selecting the Column Names menu choice. You should get a diagram that looks similar to Figure 8-28.

[Figure 8-28: the four tables shown with column names only]

You may not have the exact same positions for your tables, but the contents should be the same. We're now ready to start adding relationships, but we probably ought to stop and think about what kind of relationships we need. All the relationships that we'll draw with the relationship lines in our SQL Server diagram tool are going to be one-to-zero, one, or many relationships. SQL Server doesn't really know how to do any other kind of relationship implicitly. As we discussed earlier in the chapter, you can add things such as unique constraints and triggers to augment what SQL Server will do naturally with relations, but, assuming you don't do any of that, you're going to wind up with a one-to-zero, one, or many relationship.
The bright side is that this is by far the most common kind of relationship out there. In short, don't sweat it that SQL Server doesn't cover every base here. The standard foreign key constraint (which is essentially what your reference line represents) fits the bill for most things that you need to do, and the rest can usually be simulated via some other means.

We're going to start with the central table in our system: the Orders table. First, we'll look at any relationships that it may need. In this case, we have one; it needs to reference the Customers table. This is going to be a one-to-many relationship with Customers as the parent (the one) and Orders as the child (the many) table.

To build the relationship (and a foreign key constraint to serve as the foundation for that relationship), we're going to simply click and hold in the leftmost column of the Customers table (in the gray area) right where the CustomerNo column is. We'll then drag to the same position (the gray area) next to the CustomerNo column in the Orders table and let go of the mouse button. SQL Server promptly pops up with the first of two dialogs to confirm the configuration of this relationship. The first, shown in Figure 8-29, confirms which columns actually relate. As I pointed out earlier in the chapter, don't sweat it if the names that come up don't match what you intended; just use the combo boxes to change them back so both sides have CustomerNo in them. Note also that the names don't have to be the same; keeping them the same just helps ease confusion in situations where they really are the same.

[Figure 8-29: the dialog confirming which columns relate]

Click OK for this dialog, and then also click OK to accept the defaults of the Foreign Key Relationship dialog. As soon as we click OK on the second dialog, we have our first relationship in our new database, as in Figure 8-30.
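If you prefer script to the diagram tool, the same foreign key can be created in T-SQL. This is a sketch, assuming CustomerNo is already the primary key of Customers and the column types match on both sides:

```sql
-- T-SQL equivalent of drawing the Customers-to-Orders relationship
-- line in the diagram tool (assumes CustomerNo is Customers' primary key).
ALTER TABLE Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerNo) REFERENCES Customers (CustomerNo);
```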
[Figure 8-30: the diagram showing the first relationship]

Now we'll just do the same thing for our other two relationships. We need to establish a one-to-many relationship from Orders to OrderDetails (there will be one order header for one or more order details) based on OrderID. Also, we need a similar relationship going from Products to OrderDetails (there will be one Products record for many OrderDetails records) based on ProductID, as shown in Figure 8-31.

Adding Some Constraints

As we were going through the building of our tables and relationships, I mentioned a requirement that we still haven't addressed. This requirement needs a constraint to enforce it: the part number is formatted as 9A9999, where "9" indicates a numeric digit 0-9 and "A" indicates an alpha (non-numeric) character. Let's add that requirement now by right-clicking the Products table and selecting Check Constraints to bring up the dialog shown in Figure 8-32.

It is at this point that we are ready to click Add and define our constraint. To restrict part numbers entered to the format we've established, we're going to need to make use of the LIKE operator:

(PartNo LIKE '[0-9][A-Z][0-9][0-9][0-9][0-9]')

This will essentially evaluate each character that the user is trying to enter in the PartNo column of our table. The first character will have to be 0 through 9, the second A through Z (an alpha), and the next four will again have to be numeric digits (the 0 through 9 thing again). We just enter this into the text box labeled Expression. In addition, we're going to change the default name for our constraint from CK_Products to CKPartNo, as shown in Figure 8-33.

That didn't take us too long, and we now have our first database that we designed!
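Under the covers, the Check Constraints dialog generates a constraint you could equally create in script. A sketch of the equivalent T-SQL:

```sql
-- Script equivalent of the Check Constraints dialog: enforce the
-- 9A9999 part number format (digit, letter, then four digits).
ALTER TABLE Products
    ADD CONSTRAINT CKPartNo
    CHECK (PartNo LIKE '[0-9][A-Z][0-9][0-9][0-9][0-9]');
```

With the constraint in place, an insert such as '1A2345' succeeds, while 'AA2345' is rejected with a constraint violation.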
This was, of course, a relatively simple model, but we've now done the things that make up perhaps 90 percent or more of the actual data architecture.

Summary

Database design is a huge concept, and one that has many excellent books dedicated to it as their sole subject. It is essentially impossible to get across every database design notion in just a chapter or two. In this chapter, I have, however, gotten you off to a solid start. We've seen that data is considered normalized when we take it out to the third normal form. At that level, repetitive information has been eliminated and our data is entirely dependent on our key; in short, the data is dependent on "the key, the whole key, and nothing but the key."

We've seen that normalization is, however, not always the right answer; strategic de-normalization of our data can simplify the database for users and speed reporting performance. Finally, we've looked at some non-normalization-related concepts in our database design, plus how to make use of the built-in diagramming tools to design our database. In our next chapter, we will be taking a very close look at how SQL Server stores information and how to make the best use of indexes.

Exercises

1.
Normalize the following data into the third normal form:

Patient          SSN          Physician          Hospital        Treatment            AdmitDate   ReleaseDate
Sam Spade        555-55-5555  Albert Schweitzer  Mayo Clinic     Lobotomy             10/01/2005  11/07/2005
Sally Nally      333-33-3333  Albert Schweitzer  NULL            Cortizone Injection  10/10/2005  10/10/2005
Peter Piper      222-22-2222  Mo Betta           Mustard Clinic  Pickle Extraction    11/07/2005  11/07/2005
Nicki Doohickey  123-45-6789  Sheila Sheeze      Mustard Clinic  Cortizone Injection  11/07/2005  11/07/2005

Chapter 9: SQL Server Storage and Index Structures

Indexes are a critical part of your database planning and system maintenance. They provide SQL Server (and any other database system, for that matter) with additional ways to look up data and take shortcuts to that data's physical location. Adding the right index can cut huge percentages of time off your query executions. Unfortunately, too many poorly planned indexes can actually increase the time it takes for your query to run. Indeed, indexes tend to be one of the most misunderstood objects that SQL Server offers, and therefore, they also tend to be one of the most mismanaged.

We will be studying indexes rather closely in this chapter from both a developer's and an administrator's point of view, but in order to understand indexes, you also need to understand how data is stored in SQL Server. For that reason, we will also take a look at SQL Server's data-storage mechanism.

SQL Server Storage

Data in SQL Server can be thought of as existing in something of a hierarchy of structures. The hierarchy is pretty simple. Some of the objects within the hierarchy are things that you will deal with directly, and will therefore understand easily. A few others exist under the covers, and while they can be directly addressed in some cases, they usually are not. Let's take a look at them one by one.

The Database

OK, this one is easy.
I can just hear people out there saying, "Duh! I knew that." Yes, you probably did, but I point it out as a unique entity here because it is the highest level of storage (for a given server). This is also the highest level at which a lock can be established, although you cannot explicitly create a database-level lock.

A lock is something of a hold and a place marker that is used by the system. As you do development using SQL Server, or any other database for that matter, you will find that understanding and managing locks is absolutely critical to your system. We will be looking into locking extensively in Chapter 14, but we will see the lockability of objects within SQL Server discussed in passing as we look at storage.

The Extent

An extent is the basic unit of storage used to allocate space for tables and indexes. It is made up of eight contiguous 8KB data pages (64KB in all). The concept of allocating space based on extents, rather than actual space used, can be somewhat difficult to understand for people used to operating system storage principles. The important points about an extent include:

❑ Once an extent is full, the next record will take up not just the size of the record, but the size of a whole new extent. Many people who are new to SQL Server get tripped up in their space estimations in part due to the allocation of an extent at a time rather than a record at a time.
❑ By pre-allocating this space, SQL Server saves the time of allocating new space with each record.

It may seem like a waste that a whole extent is taken up just because one too many rows were added to fit on the currently allocated extent(s), but the amount of space wasted this way is typically not that much. Still, it can add up, particularly in a highly fragmented environment, so it's definitely something you should keep in mind. The good news in taking up all this space is that SQL Server skips some of the allocation-time overhead.
Instead of worrying about allocation issues every time it writes a row, SQL Server deals with additional space allocation only when a new extent is needed.

Don't confuse the space that an extent is taking up with the space that a database takes up. Whatever space is allocated to the database is what you'll see disappear from your disk drive's available-space number. An extent is merely how things are, in turn, allocated within the total space reserved by the database.

The Page

Much like an extent is a unit of allocation within the database, a page is the unit of allocation within a specific extent. There are eight pages to every extent. A page is the last level you reach before you are at the actual data row. Whereas the number of pages per extent is fixed, the number of rows per page is not; that depends entirely on the size of the row, which can vary. You can think of a page as being something of a container for both table- and index-row data. A row is, in general, not allowed to be split between pages.

There are a number of different page types. For purposes of this book, the types we care about are:

❑ Data — Data pages are pretty self-explanatory. They are the actual data in your table, with the exception of any BLOB data that is not defined with the text-in-row option, varchar(max), or varbinary(max).
❑ Index — Index pages are also pretty straightforward: They hold both the non-leaf and leaf level pages (we'll examine what these are later in the chapter) of a non-clustered index, as well as the non-leaf level pages of a clustered index. These index types will become much clearer as we continue through this chapter.

Page Splits

When a page becomes full, it splits. This means more than just a new page being allocated; it also means that approximately half the data from the existing page is moved to the new page.
The exception to this process is when a clustered index is in use. If there is a clustered index and the next inserted row would be physically located as the last record in the table, then a new page is created, and the new row is added to the new page without relocating any of the existing data. We will see much more on page splits as we investigate indexes.

Rows

You have heard much about "row-level locking," so it shouldn't be a surprise to hear this term. Rows can be up to 8KB. In addition to the limit of 8,060 characters, there is also a maximum of 1,024 standard (non-sparse) columns. In practice, you'll find it very unusual to run into a situation where you run out of columns before you run into the 8,060-character limit. 1,024 gives you an average column width of just under 8 bytes. For most uses, you'll easily exceed that average (and therefore exceed the 8,060 characters before the 1,024 columns). The exception to this tends to be in measurement and statistical information, where you have a large number of different things that you are storing numeric samples of. Still, even those applications will find it a rare day when they bump into the 1,024-column-count limit. When you do, you can explore the notion of sparse columns, so let's look at that.

Sparse Columns

Sparse columns, in terms of a special data structure, are new with SQL Server 2008. These are meant to deal with the recurring scenario where you have columns that you essentially just need "sometimes." That is, they are going to be null a high percentage of the time. There are many scenarios where, if you bump into a few of these kinds of columns, you tend to bump into a ton of them. Using sparse columns, you can increase the total number of allowed columns in a single table to 30,000. Internally, the data from columns marked as being sparsely populated is embedded within a single column, allowing a way to break the former limitation of 1,024 columns without major architectural changes.
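Marking a column as sparse is simply a modifier on the column definition. A minimal sketch; the table and columns here are hypothetical, invented for illustration of the measurement scenario described above:

```sql
-- Hypothetical table with rarely populated measurement columns.
-- SPARSE columns must be nullable; they take no space when NULL
-- but slightly more than a normal column when populated.
CREATE TABLE SensorReadings
(
    ReadingID   int IDENTITY PRIMARY KEY,
    TakenAt     datetime NOT NULL,
    Temperature decimal(5,2) SPARSE NULL,
    Humidity    decimal(5,2) SPARSE NULL,
    WindSpeed   decimal(5,2) SPARSE NULL
);
```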
Image, text, ntext, geography, geometry, timestamp, and all user-defined data types are prohibited from being marked as a sparse column.

While sparse columns are handled natively by newer versions of the SQL Native Client, other forms of data access will have varying behavior when accessing sparse columns. The sparse property of a column will be transparent when selecting a column by name, but when selecting it using a "*" in the select list, different client access methods will vary between supplying the sparse columns as a unified XML column vs. not showing those columns at all. You'll want to upgrade your client libraries as soon as reasonably possible.

Sparse columns largely fall under the heading of "advanced topic," but I do want you to know they are there and can be a viable solution for particular scenarios.

Understanding Indexes

Webster's dictionary defines an index as:

A list (as of bibliographical information or citations to a body of literature) arranged usually in alphabetical order of some specified datum (as author, subject, or keyword).

I'll take a simpler approach in the context of databases and say it's a way of potentially getting to data a heck of a lot quicker. Still, the Webster's definition isn't too bad, even for our specific purposes. Perhaps the key thing to point out in the Webster's definition is the word "usually" that's in there. The definition of "alphabetical order" changes depending on a number of rules. For example, in SQL Server, we have a number of different collation options available to us. Among these options are:

❑ Binary — Sorts by the numeric representation of the character (for example, in ASCII, a space is represented by the number 32, the letter "D" is 68, and the letter "d" is 100). Because everything is numeric, this is the fastest option.
Unfortunately, it's not at all the way in which people think, and can also really wreak havoc with comparisons in your WHERE clause.

❑ Dictionary order — This sorts things just as you would expect to see in a dictionary, with a twist. You can set a number of different additional options to determine sensitivity to case, accent, and character set.

It's fairly easy to understand that if we tell SQL Server to pay attention to case, then "A" is not going to be equal to "a." Likewise, if we tell it to be case insensitive, then "A" will be equal to "a." Things get a bit more confusing when you add accent sensitivity. SQL Server pays attention to diacritical marks, and therefore "a" is different from "á," which is different from "à." Where many people get even more confused is in how collation order affects not only the equality of data, but also the sort order (and, therefore, the way it is stored in indexes).

By way of example, let's look at the equality of a couple of collation options in the following table, and what they do to our sort order and equality information:

Collation Order                                Comparison Values                            Index Storage Order
Dictionary order, case insensitive,            A = a = à = á = â = Ä = ä = Å = å            a, A, à, â, á, Ä, ä, Å, å
accent insensitive (the default)
Dictionary order, case insensitive,            A = a = à = á = â = Ä = ä = Å = å            A, a, à, â, á, Ä, ä, Å, å
accent insensitive, uppercase preference
Dictionary order, case sensitive               A ≠ a, Ä ≠ ä, Å ≠ å,                         A, a, à, á, â, Ä, ä, Å, å
                                               a ≠ à ≠ á ≠ â ≠ ä ≠ å, A ≠ Ä ≠ Å
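You can watch these collation rules in action without touching your database's default collation by applying a COLLATE clause directly to a comparison. A sketch using two standard SQL Server collation names:

```sql
-- Compare the same pair of values under a case/accent-insensitive
-- collation and a case/accent-sensitive one.
SELECT CASE WHEN 'A' = 'a' COLLATE Latin1_General_CI_AI
            THEN 'equal' ELSE 'not equal' END AS CaseInsensitive,
       CASE WHEN 'A' = 'a' COLLATE Latin1_General_CS_AS
            THEN 'equal' ELSE 'not equal' END AS CaseSensitive;
-- Returns: CaseInsensitive = 'equal', CaseSensitive = 'not equal'
```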
[Figure 9-1: a simple root/non-leaf/leaf structure pointing down to the actual data]
[Figure 9-2: a B-tree of key values with root, non-leaf, and leaf levels]

Page Splits — A First Look

All of this works quite nicely on the read side of the equation; it's the insert side where page splits come into play.

[Figure 9-5: navigating a clustered index for records 158 through 400; the leaf level is the data page itself]
[Figure 9-7: a non-clustered index whose leaf level points back to the clustered index key]
[Figure 9-8: resolving "SELECT EmployeeID WHERE FName LIKE 'T%'" through a non-clustered index, then performing clustered seeks on the resulting keys]

Creating, Altering, and Dropping Indexes
A table scan is a pretty straightforward process. When a table scan is performed, SQL Server starts at the physical beginning of the table, looking through every row in the table. As it finds rows that match the criteria of your query, it includes them in the result set. You may hear lots of bad things about table scans [...]

[...] Consider non-sequential inserts: when AP000025 gets inserted and there isn't room on the page, SQL Server is going to see AR000001 in the table and know that it's not a sequential insert. Half the records from the old page will be copied to a new page before AP000025 is inserted. The overhead of this can be staggering.
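One common way to soften this kind of split churn is to leave free space in each leaf page when the index is built or rebuilt, using FILLFACTOR. This is a hedged sketch; the index name is invented for illustration, and the right percentage depends entirely on your insert pattern:

```sql
-- Hypothetical index: leave 20% free space per leaf page so
-- non-sequential inserts (like AP000025 arriving among AR... keys)
-- have room to land without forcing an immediate page split.
CREATE NONCLUSTERED INDEX IX_Products_PartNo
    ON Products (PartNo)
    WITH (FILLFACTOR = 80);
```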
