22 DeliveringBusinessIntelligence with MicrosoftSQLServer 2008
Manufacturing Automation
The manufacturing automation system tracks the materials used to make each product.
It also stores which products are manufactured on which production lines. Finally, this
system tracks the number of items manufactured during each shift.
The manufacturing automation system uses a proprietary data-storage format. Data
can be exported from the manufacturing automation system to a comma-delimited
text file. This text file serves as the source for loading the manufacturing data into the
business intelligence systems.
Order Processing
The order processing system manages the inventory amounts for all products. It tracks
wholesale orders placed by non–Max Min retailers. The system also records product
amounts sold through the Max Min retail stores and the Max Min online store to
maintain inventory amounts.
The order processing system tracks order fulfillment, including product shipping.
It also generates invoices and handles the payment of those invoices. In addition, this
system records any products returned from the retailer.
The order processing system uses a MicrosoftSQLServer database as its backend.
Point of Sale
The point of sale (POS) system manages the cash registers at each of the five Max
Min–owned retail stores. This system also tracks the inventory at each retail store
using Universal Product Code (UPC) barcode stickers placed on each item. The POS
system handles both cash and credit card transactions. It also tracks information on any
products returned by the customer.
Information from each of the five POS systems is exported to an XML file. This
XML file is transferred nightly, using File Transfer Protocol (FTP), to a central
location. These XML files serve as the source for loading the POS data into the
business intelligence systems.
MaxMin.com
The MaxMin.com online store is an ASP.NET application. It uses SQLServer as its
backend database. All sales through the online store are paid with a credit card. All
customers of the online store must provide name, address, phone number, and e-mail
address with each purchase.
The online store tracks the shipping of orders. It also handles any products returned
by customers. Finally, the online store saves information on product promotions and
discounts that are run on the store site.
Chapter 2: Making the Most of What You’ve Got 23
Accounting
The accounting system tracks all the financial transactions for Max Min, Inc. This includes
the purchase of raw materials for manufacturing. The accounting system uses a SQL
Server database for its backend.
Building the Foundation
In Chapter 3, you will learn more about the foundations of our businessintelligence
systems. We explore possible sources for our businessintelligence data. We also look at
what the structure of those data sources might look like.
This page intentionally left blank
Chapter 3
Seeking the Source—
The Source of Business
Intelligence
In This Chapter
c
Seeking the Source
c
The Data Mart
c
Snowflakes, Stars, and
Analysis Services
26 DeliveringBusinessIntelligence with MicrosoftSQLServer 2008
Planning ahead is a good idea. It wasn’t raining when Noah built the ark.
—Anonymous
I
n the previous chapter, we discussed the various ways businessintelligence can aid
in making effective business decisions. We also looked at the characteristics of the
business intelligence used at different levels within our organizations. Finally, we
were introduced to Maximum Miniatures, Incorporated, the source for all sample data
in this book.
In this chapter, we begin planning the database structures to serve as the source
of our business intelligence. In some cases, we can extract our businessintelligence
information directly from the same database used to store the data from our daily
business operations. In many cases, however, we need to move that data into another
location before we can use it as business intelligence. This “other location” is known as
a data mart.
Seeking the Source
We have seen that businessintelligence is important for effective decision making in
our organizations. This, however, leads to a big question. Just where is this business
intelligence going to come from? Is businessintelligence a form of corporate espionage?
Do we need to send up spy satellites to watch our competitors and tap the phone lines
of our clients? Should we be hiring secret agents to infiltrate our rivals’ facilities? Of
course not!
Does businessintelligence require us to take the pulse of the people? Do we need to
commission large studies of our potential customers? Do we need to conduct a survey
to determine what people are thinking about our products or services? While some
business intelligence may come from customer satisfaction surveys or market research,
the customer’s buying behavior is a better gauge of their tendencies and satisfaction. At
any rate, this is not what we are going to focus on in this book.
The bulk of businessintelligence for most organizations comes from something they
already have: their transactional data.
Transactional Data
Most organizations need to keep track of the things they do to conduct their business.
Orders taken, products produced, services rendered, payments received from clients, and
payments made to vendors are all interactions that usually result in one or more entries
in some type of data store. Each of these interactions is a business transaction, so we
refer to this as transactional data.
Chapter 3: Seeking the Source—The Source of BusinessIntelligence 27
Definition
Transactional data is the information stored to track the interactions, or business transactions, carried out by
an organization.
The business transactions of an organization need to be tracked for that organization
to operate. Payments must be collected for products and services. Payments must be
made for goods and services received. Orders and service requests need to be fulfilled.
In general, the organization needs to keep track of what it has done and what it needs
to do. When these transactions are stored on and managed by computers, we refer to
this as online transaction processing, or OLTP.
Definition
Online transaction processing (OLTP) systems record business interactions as they happen. They support the
day-to-day operation of an organization.
The sum of these transactions stored in OLTP systems is the history of an
organization. This transactional data contains the raw numbers necessary to calculate
the measures we discussed in the previous chapter. Here, then, is the data we need to
create our business intelligence.
Difficulties Using Transactional Data for Business Intelligence
OLTP systems are the treasure chests holding the raw data we need to calculate
measures and create business intelligence. Problems arise, however, when we try to
extract these nuggets of raw data from our OLTP systems. Let’s take a look at some of
the difficulties.
The Nature of the Beast Well-designed OLTP systems are optimized for efficiently
processing and storing transactions. This means breaking data up into small chunks
using the rules of database normalization. This allows OLTP systems to process a
number of transactions at the same time without one transaction getting in another’s
way. Information of this type is best stored in a relational database.
The measures we are using for business intelligence, on the other hand, are not
designed to reflect the events of one transaction, but to reflect the net result of a
number of transactions over a selected period of time. Businessintelligence measures
are often aggregates of hundreds, thousands, or even millions of individual transactions.
Designing a system to provide these aggregates efficiently requires an entirely different
set of optimizations.
28 DeliveringBusinessIntelligence with MicrosoftSQLServer 2008
Definition
An aggregate is a number that is calculated from amounts in many detail records. An aggregate is often the sum
of many numbers, although it can also be derived using other arithmetic operations or even from a count of the
number of items in a group. For example, the total amount invoiced to a client in a given year is the aggregate
sum of all the invoice amounts for that client in the given year.
OLTP systems, because of the way they are designed, are usually not good at delivering
large aggregates. This is not what they were intended to do. We need to look to a different
type of data storage optimization to make these aggregates work efficiently.
Interfering withBusiness Operations OLTP systems are used by our organizations
to support their daily operations. In many cases, the organizations’ operation depends
on the performance of these systems. If the order processing system or the client
management system becomes too bogged down, our organizations can grind to a halt.
We’ve already discussed the fact that OLTP systems are not good at delivering the
aggregates needed for business intelligence. When OLTP systems are called on to
produce such aggregates, they typically use a large amount of processing power and take
a long time to produce a result. It is also possible that a large number of records will
be locked while the aggregate is being produced, rendering those records unavailable
to participate in transactional processing. Either of these two events can have a serious
impact on transactional processing efficiency.
In other words, requiring an OLTP system to create businessintelligence aggregates can
tax the system. This can have a detrimental effect on our organizations’ daily operations.
Archiving Because OLTP systems are concerned with the day-to-day operations,
they aren’t too worried about data from the distant past. These systems may only save
data for a relatively short period of time (and/or the data may only represent the current
state, such as current quantity in stock). The data may be saved for a year, and then
a year-end process may remove it from the database. It may be archived in another
format, a text file or a database backup file, or it might simply be deleted. Whether
deleted or archived, the data is no longer easily accessible.
OLTP systems use this archive process to ensure that the system continues to
operate efficiently. If a transaction table contains too many records, the OLTP system
can become bogged down and begin to operate slowly. Archiving allows an OLTP
system to stay lean and mean.
This archiving causes problems for business intelligence. When we are looking for
trends in our measures, we want to compare last year’s numbers to this year’s numbers.
We may even want to compare numbers over several years of operation. This is hard to
do when the data from past years has been archived or deleted.
Chapter 3: Seeking the Source—The Source of BusinessIntelligence 29
Divided They Stand Our organizations probably use a number of different OLTP
systems to manage different aspects of their operations. One system is used for order
processing, a different system for accounting, another for manufacturing, and still
another for personnel. As we saw in the previous chapter, Maximum Miniatures,
Incorporated has five different systems that can provide data for business intelligence.
Even with the move toward integrated Enterprise Resource Planning (ERP) systems, it
is unlikely that all of an organization’s transactional data will be in one location.
The measures used to provide business intelligence, on the other hand, do not respect
these lines of separation. Instead, they treat the organization as a whole. For example, a
reasonable measure to require is the profit margin for a particular product. To calculate
this measure, we need the list of raw materials from the manufacturing system, the cost
of those materials from the accounting system, the cost of labor required to produce
the product from the time entry system, and the amount paid for the product from the
order processing system. To calculate this type of a measure, then, we need to combine
data across systems to get what we need.
Aside from the necessity for communication between systems, this need to cross
systems leads to another problem. Each of these systems maintains its own set of
product numbering schemes, codes, and calendars. The same product may be known as
“12593” in the manufacturing system and “SD125RDS” in the order processing system.
The payroll system may work on two-week pay periods, while the accounting system
works on fiscal months. When data from these disparate systems is brought together,
we need to find some common ground.
The Data Mart
A number of problems can result when we try to use our organizations’ OLTP systems
as the source for our business intelligence. What we need to do is take the information
stored in these OLTP systems and move it into a different data store. This intermediate
data store can then serve as the source for our measure calculations. We need to store
the data so it is available for our businessintelligence needs somewhere outside of our
OLTP systems. When data is stored in this manner, it is referred to as a data mart.
Definition
A data mart is a body of historical data in an electronic repository that does not participate in the daily
operations of the organization. Instead, this data is used to create business intelligence. The data in the data
mart usually applies to a specific area of the organization.
30 DeliveringBusinessIntelligence with MicrosoftSQLServer 2008
note
In this book, we discuss the creation of data marts, rather than the perhaps more familiar term, data
warehouse. Data warehouses tend to be large, one-stop-shopping repositories where all the historical data
for the organization would be stored. Nothing is wrong with this as a concept; however, attempting to create
a data warehouse often led to huge, multiyear technology projects that were never quite finished or were
outdated when they finally did get done. In this book, we concern ourselves with creating data marts—smaller
undertakings that focus on a particular aspect of an organization.
Features of a Data Mart
Because the data mart is meant to serve as a source for businessintelligence rather than
managing the organization’s day-to-day transactions, it is not designed the same as an
OLTP database. Instead of being built around the rules of normalization, data marts are
built for speed of access. A data mart is still a relational database, but it is designed to
require fewer table joins when data is output for analysis and reporting. In a data mart,
it is acceptable to have data repeated (denormalized) for the sake of speed.
When designing a data mart, the rules of normalization are replaced by a different
method of design organized around “facts.” These new design approaches are called
stars and snowflakes. We discuss stars and snowflakes in the sections “The Star
Schema” and “The Snowflake Schema.” Stars and snowflakes may seem like the stuff
of children’s fantasies, but, in reality, they provide quite grownup and down-to-earth
approaches to creating information that is quick and easy to access.
Not Real-Time Data
OLTP systems store data from business transactions as they occur throughout the day.
Data marts, on the other hand, are updated at set intervals. Data is copied from the
OLTP systems periodically and written to the data mart. This is known as a data load.
Because the data mart exists separately from the OLTP systems, accessing the data
mart for businessintelligence information does not put any stress on the transactional
systems vital to the business’ operation. The only exception to this is during the data
load. During the data load, the OLTP systems may have to work hard to prepare the
data for copying to the data mart. The good news here is the data load is an automated
process that can be scheduled to run during off-peak hours.
As we discussed in Chapter 2, information in a data mart has some latency. In
most cases, some time elapses between the moment a transaction is completed and
the moment when the transaction is copied to the data mart. If a data mart load is
Chapter 3: Seeking the Source—The Source of BusinessIntelligence 31
scheduled to run each month right after the month-end processing, then the data
mart has a latency of one month. If the data load runs nightly, the data mart can have
a latency of up to one day.
The latency of the data mart must be set properly to fulfill the businessintelligence
requirements of that data mart. The information provided by the data mart must be up-
to-date enough to facilitate effective decision making. However, the data loads should
not occur so often that they cause unneeded stress on the OLTP systems.
Consolidation and Cleansing
Data from a number of different OLTP systems may be combined into a single data
mart. This enables us to calculate some complex measures for our business intelligence.
As we discussed earlier, this may also cause problems. Multiple OLTP systems can
have different ways of representing data. Inconsistent data types used for the same data,
dissimilar unique identifiers used for the same entity, and different time periods and
calendar systems can all cause a great deal of difficulty when trying to combine data
from heterogeneous systems.
In fact, problems can even arise when using data from a single system. The business
rules necessary for a meaningful measure calculation may be stricter than those
enforced within the OLTP system itself. If this is the case, some of the data coming
from the OLTP system may not meet the stricter rules. Inconsistencies with data types
and unique identifiers could also exist within the same system if the database has been
poorly designed or poorly maintained.
These problems must be resolved before the data can be stored in the data mart. We
must scrub out all the problem data. To do this, the data is put through a data cleansing
process.
Definition
Data cleansing removes inconsistencies and errors from transactional data so it has the consistency necessary for
use in a data mart.
Data cleansing transforms data into a format that does not cause problems in the
data mart environment. It converts inconsistent data types into a single type. Data
cleansing translates dissimilar identifiers to a standard set of codes for the data mart. In
addition, it repairs or removes any data that does not meet the business rules required
by the measures calculated from this data mart.
Data cleansing is usually done as part of a larger process. This process extracts the
data from the OLTP systems and loads it into the data mart. Thus, the entire procedure
is known as extract, transform, and load—or ETL.
. 22 Delivering Business Intelligence with Microsoft SQL Server 2008
Manufacturing Automation
The manufacturing. entirely different
set of optimizations.
28 Delivering Business Intelligence with Microsoft SQL Server 2008
Definition
An aggregate is a number that