Physical Model Implementation Case Study

Even in literature and art, no man who bothers about originality will ever be original: whereas if you simply try to tell the truth (without caring twopence how often it has been told before) you will, nine times out of ten, become original without ever having noticed it.

—C.S. Lewis

When the normalization task is complete, you have the basic structures ready for implementation, but tasks still need to be performed to complete the transformation from the logical model to the physical, relational model. Throughout the normalization process, you have produced legal, normalized tables that can be implemented. Using the information that should have been produced during the logical modeling phase, they are now ready for the finishing touches that will turn your theoretical model into something that users (or at least developers!) can start using. At a minimum, between normalization and actual implementation, take plenty of time to review the model to make sure you are completely happy with it.

In this chapter, I’ll take the normalized model and convert it into the final blueprint for the database implementation. Even starting from the same logical model, different people tasked with implementing the relational database will take a subtly (or even dramatically) different approach to the process. The final physical design will always be, to some extent, a reflection of the person or organization who designed it, although usually all of the reasonable solutions “should” resemble one another at their core.

The normalized model you have created is pretty much database agnostic and unaffected by whether the final implementation will be on Microsoft SQL Server, Microsoft Access, Oracle, Sybase, or any other relational database management system. (You should expect a lot of changes if you end up implementing with a nonrelational engine, naturally.) However, during this stage, in terms of the naming conventions that are defined, the datatypes chosen, and so on, the design is geared specifically for implementation on SQL Server 2012. Each of the relational engines has its own intricacies and quirks, so it is helpful to understand how to implement on the system you are tasked with. In this book, we will stick with SQL Server.

We will go through the following steps:

• Choosing names: We’ll look at naming concerns for tables and columns. The biggest thing here is making sure to have a standard and to follow it.

• Choosing key implementation: Throughout the earlier bits of the book, we’ve made several types of key choices. In this section, we will go ahead and finalize the implementation keys for the model.

• Determining domain implementation: We’ll cover the basics of choosing datatypes, nullability, and simple computed columns. Another decision will be choosing between using a domain table or a column with a constraint for types of values where you want to limit column values to a given set.

• Setting up schemas: This section provides some basic guidance in creating and naming your schemas. Beginning in SQL Server 2005, you can group tables into schemas, which provide groupings for usage as well as for security.

• Adding implementation columns: We’ll consider columns that are not part of the logical design but are common to almost every database that people implement.

• Using Data Definition Language (DDL) to create the database: In this section, we will go through the common DDL that is needed to build almost every database you will encounter.

• Baseline testing your creation: Because it’s a great practice to load some data and test your complex constraints, this section offers guidance on how you should approach and implement testing.

Note

■ For this and subsequent chapters, I’ll assume that you have SQL Server 2012 installed on your machine.

For the purposes of this book, I recommend you use the Developer Edition, which is available for a small cost from www.microsoft.com/sql/howtobuy/default.aspx. The Developer Edition gives you all of the functionality of the Enterprise Edition of SQL Server for developing software. It also includes the fully functional Management Studio for developing queries and managing your databases. (The Enterprise Evaluation Edition will also work just fine if you don't have any money to spend. Bear in mind that licensing changes are not uncommon, so your mileage may vary. In any case, there should be a version of SQL Server available to you to work through the examples.)

Another possibility is SQL Server Express Edition, which is free but doesn’t come with the full complement of features of the Developer Edition. For the most part, the feature list is complete enough to use with this book.

I won’t make required use of any of the extended features, but if you’re learning SQL Server, you’ll probably want to have the full feature set to play around with. You can acquire the Express Edition in the download section at www.microsoft.com/sql/.

Finally, I’ll work on a complete (if really small) database example in this chapter, rather than continue with any of the examples from previous chapters. The example database is tailored to keeping the chapter simple and to avoiding difficult design decisions, which we will cover in the next few chapters.

The main example in this chapter is based on a simple messaging database that a hypothetical company is building for its upcoming conference. Any similarities to other systems are purely coincidental, and the model is specifically created not to be overly functional but to be very, very small. The following are the simple requirements for the database:

• Messages can be 200 characters of Unicode text. Messages can be sent privately to one user, to everyone, or both. The user cannot send a message with the exact same text more than once per hour (to cut down on mistakes where users click send too often).

• Users will be identified by a handle that must be 5–20 characters, and they will use their conference attendee numbers and the key value on their badges to access the system.

• To keep up with your own group of people, apart from other users, users can connect themselves to other users. Connections are one-way, allowing users to see all of the speakers’ information without the reverse being true.
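To make the message rules concrete before any DDL exists, here is a minimal Python sketch (not SQL Server code) of how they might be checked in memory. It assumes that “once per hour” means once per clock hour, so the send time is truncated to the start of its hour, which is one reasonable reading of the RoundedMessageTime column in the model documentation; the MessageLog class and its method names are hypothetical.

```python
from datetime import datetime

MAX_MESSAGE_LENGTH = 200        # "Messages can be 200 characters of Unicode text"
MIN_HANDLE, MAX_HANDLE = 5, 20  # "a handle that must be 5-20 characters"


def round_to_hour(moment: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (assumed RoundedMessageTime)."""
    return moment.replace(minute=0, second=0, microsecond=0)


class MessageLog:
    """Hypothetical in-memory stand-in for the Message table's rules."""

    def __init__(self) -> None:
        # Acts like a uniqueness constraint on (UserHandle, Text, RoundedMessageTime).
        self._seen: set[tuple[str, str, datetime]] = set()

    def send(self, user_handle: str, text: str, sent_at: datetime) -> None:
        if not MIN_HANDLE <= len(user_handle) <= MAX_HANDLE:
            raise ValueError("handle must be 5-20 characters")
        if len(text) > MAX_MESSAGE_LENGTH:
            raise ValueError("message exceeds 200 characters")
        key = (user_handle, text, round_to_hour(sent_at))
        if key in self._seen:
            raise ValueError("same text already sent by this user this hour")
        self._seen.add(key)
```

Because the check is per clock hour, the same text sent at 10:59 and again at 11:01 is accepted; enforcing a sliding 60-minute window would need a different design.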

Figure 6-1 shows the logical database design for this application, on which I’ll base the physical design.

The following is brief documentation of the tables and columns in the model. I won’t be too specific with things like datatypes in this list. To keep things simple, I will expound on each need as we get to it.

Figure 6-1. Simple logical model of conferencing message database

• User: Represents a user of the messaging system, preloaded from another system with attendee information.

• UserHandle: The name the user wants to be known as. Initially preloaded with a value based on the person’s first and last name plus an integer value; changeable by the user.

• AccessKey: A password-like value given to the users on their badges to gain access.

• AttendeeNumber: The number that the attendees are given to identify themselves, printed on the front of their badges.

• TypeOfAttendee: Used to give the user special privileges, such as access to speaker materials, vendor areas, and so on.

• FirstName, LastName: Name of the user printed on the badge for people to see.

• UserConnection: Represents the connection of one user to another in order to filter results to a given set of users.

• UserHandle: Handle of the user who is going to connect to another user.

• ConnectedToUser: Handle of the user who is being connected to.

• Message: Represents a single message in the system.

• UserHandle: Handle of the user sending the message.

• Text: The text of the message being sent.

• RoundedMessageTime: The time of the message, rounded to the hour.

• SentToUserHandle: The handle of the user that is being sent a message.

• MessageTime: The time the message is sent, at a grain of one second.

• MessageTopic: Relates a message to a topic.

• UserHandle: User handle from the user who sent the message.

• RoundedMessageTime: The time of the message, rounded to the hour.

• TopicName: The name of the topic being sent.

• UserDefinedTopicName: Allows the users to choose the UserDefined topic styles and set their own topics.

• Topic: Predefined topics for messages.

• TopicName: The name of the topic.

• Description: Description of the purpose and utilization of the topics.
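The direction of a UserConnection row is easy to get backward, so the following Python sketch models rows as (UserHandle, ConnectedToUser) pairs and uses them to filter messages. The class and function names are illustrative assumptions, not part of the model; the point is only that a connection from an attendee to a speaker lets the attendee filter on the speaker’s messages without the reverse being true.

```python
from typing import NamedTuple


class UserConnection(NamedTuple):
    """One row of the UserConnection table; direction matters."""
    user_handle: str        # the user doing the connecting
    connected_to_user: str  # the user being connected to


class Message(NamedTuple):
    user_handle: str        # handle of the sender
    text: str


def connected_handles(rows: set[UserConnection], user_handle: str) -> set[str]:
    """Handles this user has connected to (not the users who connected to them)."""
    return {row.connected_to_user for row in rows if row.user_handle == user_handle}


def filter_messages(rows: set[UserConnection], messages: list[Message],
                    user_handle: str) -> list[Message]:
    """Keep only messages sent by users that user_handle is connected to."""
    wanted = connected_handles(rows, user_handle)
    return [m for m in messages if m.user_handle in wanted]
```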

Choosing Names

The target database for our model is SQL Server, so our table and column naming conventions must adhere to the rules imposed by this database and generally be consistent and logical. In this section, I’ll briefly cover some of the different concerns when naming tables and columns. All of the system constraints on names have been the same for the past few versions of SQL Server, including 2000, 2005, and 2008.

Names of columns, tables, procedures, and so on are referred to technically as identifiers. Identifiers in SQL Server are stored in a system datatype of sysname, which is equivalent to nvarchar(128): a string of up to 128 double-byte Unicode characters. SQL Server’s rules for identifiers consist of two distinct naming methods:

• Regular identifiers: This is the preferred method, with the following rules:

• The first character must be a letter as defined by Unicode Standard 3.2 (generally speaking, Roman letters A to Z, uppercase and lowercase, although this also includes other letters from other languages) or the underscore character (_). You can find the Unicode Standard at www.unicode.org.

• Subsequent characters can be Unicode letters, numbers, the “at” sign (@), the dollar sign ($), the number sign (#), or the underscore (_).

• The name must not be a SQL Server reserved word. You can find a large list of reserved words in SQL Server 2012 Books Online, in the “Reserved Keywords” section. Some of the keywords won’t cause an error, but it’s better to avoid all keywords if possible. Some of these are tough, like user, transaction, and table, as they do often come up in the real world. (Note that our original model includes the name User, which we will have to correct.)

• The name cannot contain spaces.

• Delimited identifiers: These are enclosed in either square brackets ([ ]) or double quotes ("); double quotes are allowed only when the SET QUOTED_IDENTIFIER option is set to ON. By placing delimiters around an object’s name, you can use any string as the name.

For example, [Table Name], [3232 fjfa*&(&^(], or [Drop Database Master] would be legal (but really annoying and dangerous) names. Names requiring delimiters are generally a bad idea when creating new tables and should be avoided if possible, because they make coding more difficult. However, they can be necessary for interacting with data tables in other environments. Delimiters are generally to be used when scripting objects because a name like [Drop Database Master] can cause “problems” if you don’t.

If you need to put a closing bracket (]) in a bracket-delimited name, you have to double it (]]), just as you double a single quote within a string; likewise, a double quote inside a name delimited with double quotes must be written as two double quotes. So, the name fred]olicious would have to be delimited as [fred]]olicious]. However, if you find yourself needing to include special characters of any sort in your names, take a good long moment to consider whether you really do need this. If you determine after some thinking that you do, please ask someone else for help naming your objects, or e-mail me at louis@drsql.org. This is a pretty horrible thing to do and will make working with your objects very cumbersome.

Even just including space characters is a bad enough practice that you and your users will regret for years. Note too that [name] and [name ] are treated as different names (see the embedded space). I once had a DBA name a database with a trailing space by accident . . . very annoying.
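The identifier rules above can be approximated in a short Python sketch. It checks the 128-character sysname limit, the first-character and subsequent-character rules, and a small, hypothetical subset of reserved words (the real list is in Books Online), and it shows the bracket-doubling rule that T-SQL’s QUOTENAME function applies. Python’s str.isalpha() follows the Unicode version bundled with the interpreter rather than exactly Unicode 3.2, so treat this as an approximation rather than SQL Server’s own parser.

```python
SYSNAME_MAX_LENGTH = 128  # sysname is nvarchar(128)

# Illustrative subset only; the full list is in "Reserved Keywords" in Books Online.
RESERVED_SUBSET = {"USER", "TRANSACTION", "TABLE", "SELECT", "DROP", "DATABASE", "ORDER"}


def is_regular_identifier(name: str) -> bool:
    """Approximate check of the regular-identifier rules (no delimiters needed)."""
    if not 0 < len(name) <= SYSNAME_MAX_LENGTH:
        return False
    first, rest = name[0], name[1:]
    if not (first.isalpha() or first == "_"):
        return False
    # Spaces and other punctuation are rejected because they are not in this set.
    if not all(ch.isalpha() or ch.isdecimal() or ch in "@$#_" for ch in rest):
        return False
    return name.upper() not in RESERVED_SUBSET


def bracket_delimit(name: str) -> str:
    """Wrap a name in [ ], doubling any embedded ] (as QUOTENAME(name) does)."""
    return "[" + name.replace("]", "]]") + "]"


def quote_delimit(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'
```

A name that fails is_regular_identifier is one that would need delimiters, which is the signal to rethink the name rather than reach for brackets.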

Note

■ Using policy-based management, you can create naming standard checks for whenever a new object is created. Policy-based management is a management tool rather than a design one, though it could pay to create naming standard checks to make sure you don’t accidentally create objects with names you won’t accept. In general, I find doing things that way too restrictive, because there are always exceptions to the rules and automated policy enforcement only works with a dictator’s hand. (Think Darth Vader, development manager!)

Table Naming

While the rules for creating an object name are pretty straightforward, the more important question is, “What kind of names should be chosen?” The answer is predictable: “Whatever you feel is best, as long as others can read it.” This might sound like a cop-out, but there are more naming standards than there are data architects.

(On the day this paragraph was written, I actually had two independent discussions about how to name several objects and neither person wanted to follow the same standard.) The standard I generally go with is the standard that was used in the logical model, that being Pascal-cased names, little if any abbreviation, and as descriptive as possible. With space for 128 characters, there’s little reason to do much abbreviating (other than extending the life of your keyboard, I would suppose).

Caution

■ Because most companies have existing systems, it’s a must to know the shop standard for naming tables so that it matches existing systems and so that new developers on your project will be more likely to understand your database and get up to speed more quickly. The key thing to make sure of is that you keep your full logical names intact for documentation purposes.

As an example, let’s consider the name of the UserConnection table we will be building later in this chapter.

The following list shows several different ways to build the name of this object:

• user_connection (or sometimes, by some awful mandate, an all-caps version USER_CONNECTION): Use underscores to separate values. Most programmers aren’t big friends of underscores, because they’re cumbersome to type until you get used to them. Plus, they have a COBOLesque quality that doesn’t please anyone.

• [user connection] or "user connection": This name is delimited by brackets or quotes. As I have already mentioned, this isn’t really favored by anyone who has done any programming, because it’s impossible to use this name when building variables in code, and it’s very easy to make mistakes with them. Being forced to use delimiters is annoying, and many other languages use double quotes to denote strings. (In SQL, you should always use single quotes!) On the other hand, the brackets [ and ] don’t denote strings, although they are a Microsoft-only convention that will not port well if you need to do any kind of cross-platform programming. Bottom line: delimited names are a bad idea anywhere except perhaps in a SELECT clause for a quickie report.

• UserConnection or userConnection: Pascal or camel case (respectively), using mixed case to delimit between words. I’ll use Pascal style in the examples, because it’s the style I like. (Hey, it’s my book. You can choose whatever style you want!)

• usrCnnct or usCnct: The abbreviated forms are problematic, because you must be careful always to abbreviate the same word in the same way in all your databases. You must maintain a dictionary of abbreviations, or you’ll get multiple abbreviations for the same word (for example, getting “description” as “desc,” “descr,” “descrip,” and/or “description”).
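To see the styles side by side, the following Python sketch renders one Pascal-cased logical name in each of the forms listed above. The abbreviated form is driven by a single shared dictionary, which is the point of the dictionary advice: every name built from the same word gets the same abbreviation. The function names and the sample abbreviations are hypothetical, and the word splitter assumes simple Pascal-cased words without all-caps acronyms such as PO.

```python
import re


def split_pascal(name: str) -> list[str]:
    """Split a Pascal-cased logical name such as 'UserConnection' into words."""
    return re.findall(r"[A-Z][a-z0-9]*", name)


def to_styles(logical_name: str, abbreviations: dict[str, str]) -> dict[str, str]:
    """Render one logical name in each naming style discussed above."""
    words = split_pascal(logical_name)
    return {
        "underscore": "_".join(w.lower() for w in words),
        "upper_underscore": "_".join(w.upper() for w in words),
        "delimited": "[" + " ".join(w.lower() for w in words) + "]",
        "pascal": "".join(words),
        "camel": words[0].lower() + "".join(words[1:]),
        # One shared dictionary keeps each word's abbreviation consistent.
        "abbreviated": "".join(abbreviations.get(w, w) for w in words),
    }
```

For example, to_styles("UserConnection", {"User": "usr", "Connection": "Cnnct"}) yields user_connection, USER_CONNECTION, [user connection], UserConnection, userConnection, and usrCnnct.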

Choosing names for objects is ultimately a personal choice but should never be made arbitrarily; it should be based first on existing corporate standards, then on existing software, and finally on legibility and readability. The most important thing to try to achieve is internal consistency. Naming, ownership, and datatypes are all things that will drive you nuts when not done consistently, because they keep everyone guessing what will be used next time. Your goal as an architect is to ensure that your users can use your objects easily and with as little thinking about structure as possible. Even a pretty bad naming convention will be better than having ten different good ones being implemented by warring architect/developer factions. And lest you think I am kidding, in many ways the Cold War was civil compared to the internal politics of database/application design.

Note

■ There is something to be said about the quality of corporate standards as well. If you have an archaic standard, like one that was based on the mainframe team’s standard back in the 19th century, you really need to consider trying to change the standards when creating new databases so you don’t end up with names like HWWG01_TAB_USR_CONCT_T just because the shop standards say so (and yes, I do know when the 19th century was).

Naming Columns

The naming rules for columns are the same as for tables as far as SQL Server is concerned. As for how to choose a name for a column—again, it’s one of those tasks for the individual architect, based on the same sorts of criteria as before (shop standards, best usage, and so on). This book follows this set of guidelines:

• Other than the primary key, my feeling is that the table name should rarely be included in the column name. For example, in an entity named Person, it isn’t necessary to have columns called PersonName or PersonSocialSecurityNumber. Most columns should not be prefixed with the table name, with the following two exceptions:

• A surrogate key such as PersonId: This reduces the need for role naming (modifying names of attributes to adjust meaning, especially used in cases where multiple migrated foreign keys exist).

• Columns that are naturally named with the entity name in them, such as PersonNumber, PurchaseOrderNumber, or something that’s common in the language of the client and used as a domain-specific term.

• The name should be as descriptive as possible. Use few abbreviations in names, with a couple of notable exceptions:

• Highly recognized abbreviations: As an example, if you were writing a purchasing system and you needed a column for a purchase-order table, you could name the object PO, because this is widely understood. Often, users will desire this, even if some abbreviations don’t seem that obvious.

• Pronounced abbreviations: If a value is read naturally as the abbreviation, then it can be better to use the abbreviation. For example, I always use id instead of identifier, first because it’s a common abbreviation that’s known to most people and second because the surrogate key of the Widget table is naturally pronounced Widget-Eye-Dee, not Widget-Identifier.

• Usually, the name should end in a “class” word that distinguishes the main function of the column. This class word gives a general idea of the purpose of the attribute and general expectation of datatype. It should not be the same thing as the datatype. For example:

• StoreId is the identifier for the store.

• UserName is a textual string, but whether or not it is a varchar(30) or nvarchar(128) is immaterial.

• EndDate is the date when something ends and does not include a time part.

• SaveTime is the point in time when the row was saved.

• PledgeAmount is an amount of money (using numeric(12,2), money, or any other suitable type).

• DistributionDescription is a textual string that is used to describe how funds are distributed.

• TickerCode is a short textual string used to identify a ticker row.

• OptInFlag is a two-value column (possibly three including NULL) that indicates a status, such as in this case if the person has opted in for some particular reason.
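These guidelines are conventions, not rules SQL Server enforces, but a small check can flag names that drift from them. The Python sketch below is an assumption-laden illustration: the class-word list is drawn from the examples in this chapter, naturally named exceptions such as PersonNumber must be passed in explicitly, and the results are warnings rather than errors because the guidelines say “usually.”

```python
# Hypothetical class-word list, taken from the example names in this chapter.
CLASS_WORDS = ("Id", "Name", "Number", "Date", "Time", "Amount",
               "Description", "Code", "Flag")


def column_name_warnings(table_name: str, column_name: str,
                         natural_names: frozenset[str] = frozenset()) -> list[str]:
    """Return advisory warnings (not errors) for one column name."""
    warnings: list[str] = []
    # The surrogate key (e.g., PersonId) and naturally named columns may keep the table name.
    allowed_prefixed = {table_name + "Id"} | set(natural_names)
    if column_name.startswith(table_name) and column_name not in allowed_prefixed:
        warnings.append("repeats the table name")
    if not column_name.endswith(CLASS_WORDS):
        warnings.append("no recognized class word at the end")
    return warnings
```

Run against the logical model, a check like this would surface the Message table’s Text column for review, since it has no class word; whether to rename it is still a design decision, not something the check decides.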
