Preface [ 5 ]

Another widely recognized name in the data warehousing arena is Ralph Kimball. Ralph Kimball is an author on the subject of data warehousing and business intelligence and received a Ph.D. in 1972 from Stanford University in Electrical Engineering, specializing in man-machine systems. He is widely regarded as the Guru of Data Warehousing and is known for his long-held conviction that data warehouses must be designed to be understandable and fast. Ralph's methodology is also known as dimensional modeling or the Kimball methodology.

The similarities between Mr. Inmon and Mr. Kimball are many and so are the differences. The following paradigm statements illustrate just how Mr. Inmon and Mr. Kimball are perceived in the world of data warehousing.

Bill Inmon's paradigm: The enterprise data warehouse is one part of the overall business intelligence system. An enterprise should have just one data warehouse and one to many data marts. The data marts then source their information from the data warehouse. In the data warehouse, information is stored in third normal form.

Ralph Kimball's paradigm: The enterprise data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

There is no right or wrong choice between these two ideas. They each represent a different data warehousing philosophy. In reality, the data warehouse philosophy used in most enterprises is closer to Ralph Kimball's idea. This is because most data warehouses started out as department-level efforts, and as such they originated as an activity-specific data mart. Only when more data marts are built later do they evolve into a data warehouse.

What is a data warehouse

Just what is a data warehouse really?
According to Bill Inmon, the famous author of several data warehouse books, "A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision making process."

A data warehouse is typically a relational database that is designed using dimensional modeling and is used for querying and data analysis rather than business transaction processing. It usually contains relevant historical data that is derived from transactional data. The data warehouse separates data analysis overhead from transactional overhead and enables an enterprise to consolidate its data from several sources or activities.

This material is copyright and is licensed for the sole use by Paul Corcorran on 5th July 2009 8601 ave. p #1, , lubbock, , 79423 Download at Boykma.Com

In simpler terms, an enterprise-wide data warehouse is a centralized data store where integral and mission-critical data that is relevant and necessary to the decision-making processes of the different business units can be stored and accessed in real time by the various business activities. One of the primary benefits of the enterprise data warehouse is the use of "One Number" across the enterprise. This means that what is called a part in one activity is the same part in another activity. Everyone is speaking the same language and is on the same page.

Different types of data warehouses

In addition to the relational database, an enterprise data warehouse environment often consists of an Extract, Transform, and Load (ETL) solution, an OLAP engine (hooray Essbase), client analysis tools, and other web or desktop applications that manage the gathering of data and its delivery to business users.

There are three types of data warehouses:

1. Enterprise Data Warehouse: An enterprise data warehouse provides a central database for decision support throughout the enterprise. It is recommended that there be only one data warehouse across the enterprise.
2. 
Operational Data Store: This has a broad, enterprise-wide scope, but unlike the real enterprise data warehouse, data is refreshed in near real time and used for routine business activity. One of the typical applications of the Operational Data Store (ODS) is to hold recent data before migration to the data warehouse. Typically, the ODS is not conceptually equivalent to the data warehouse, although it does store data with a deeper level of history than the OLTP data.
3. Data Mart: The data mart is a subset of the data warehouse and it supports a particular region, business unit, or business function. The data mart receives its source data from the data warehouse. There can be many data marts sourcing data from the one data warehouse.

In case you're wondering, here are a few words about an OLAP solution and an OLTP solution.

OLAP stands for On-Line Analytical Processing, which in a nutshell means that the data you are using for your analysis is mainly considered reporting or presentation data, and any updates or write-backs are solely for analytical purposes. The source data is rarely updated in this method.

OLTP stands for On-Line Transaction Processing, which means that the base or source data is directly updated with factual and historical data as an output of the analysis or data entry processes. Conventional straight-line reporting can be performed, and there is very little, if any, slice-and-dice analysis or what-if scenarios.

Data warehouses and data marts are usually built on dimensional data modeling, where fact tables are connected with dimension tables. This makes the data easy for users to access, since the database can be visualized as a cube containing many dimensions.
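The cube picture can be made concrete with a small sketch. The following Python example is only an illustration under assumed data: the product, market, and month names, the sales figures, and the helper functions build_cube and total_by are all hypothetical and do not come from Essbase or any other product. It indexes each sales figure by one coordinate per dimension and then views the same cube along different dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, market, month, gross_sales).
FACT_ROWS = [
    ("Cola", "East", "Jan", 100.0),
    ("Cola", "West", "Jan", 80.0),
    ("Cola", "East", "Feb", 120.0),
    ("Root Beer", "East", "Jan", 60.0),
    ("Root Beer", "West", "Feb", 90.0),
]

# The order of the coordinates used as the cube key (an assumption of this sketch).
DIMENSIONS = ("product", "market", "month")


def build_cube(rows):
    """Index each measure by its coordinates, one coordinate per dimension."""
    cube = defaultdict(float)
    for product, market, month, gross_sales in rows:
        cube[(product, market, month)] += gross_sales
    return cube


def total_by(cube, dimension):
    """Collapse the cube onto one dimension by summing over the others."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(float)
    for coordinates, value in cube.items():
        totals[coordinates[axis]] += value
    return dict(totals)


cube = build_cube(FACT_ROWS)
sales_by_market = total_by(cube, "market")
sales_by_month = total_by(cube, "month")
```

The same five fact rows answer a question about markets or a question about months without being restructured, which is the appeal of thinking in terms of a cube.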
A data warehouse and its smaller, more specific data mart provide an opportunity for slicing and dicing that visualized cube along any one of its dimensions.

Data warehouse data modeling

As mentioned above, even the so-called masters of the data warehouse have differing ideas as to the data modeling methodology that should be used in a data warehouse. There is general agreement, though, that the choices narrow down to just two popular architectures: the Third Normal Form and the Dimensional Data Model. Of the two, the more commonly used in data warehousing is the Dimensional Data Model. Read on as we briefly explain the differences between the two.

The Third Normal Form (3NF)

The Third Normal Form or 3NF method of database modeling is, in a nutshell, all about the primary key. What this means is that there is no data element in the database that cannot be referenced by the primary key. To achieve 3NF a database must also pass the first levels of normalization.

In the First Normal Form or 1NF the theory is that all of the data in all of the columns must be atomic. This means there can be no sets of data in one column. For instance, a name column that contains both first and last names has sets of data. It is better to have one column for the first name and a separate column for the last name.

To pass the Second Normal Form or 2NF the data must be 1NF compliant and must now also be fully key dependent. Where the 1NF model focuses on the atomic nature of the data, the 2NF model focuses on the key. What this means is that data in non-key columns cannot depend on only part of a composite primary key; it must depend on the whole key.

Finally there is the Third Normal Form or 3NF which, on top of organizing the data at the atomic level and tying the data to the whole key, requires every non-key column to depend directly on the primary key. To be 3NF, all data in non-key columns must be dependent on the primary key and on nothing else. No longer can the data in one column depend on data in another non-key column that is itself dependent on the primary key.

As we said earlier, there is no right or wrong reason to use either data modeling methodology. Both have their merits and their demerits. Although it is the less popular of the two data warehousing data models, the 3NF model is actually the most popular data modeling methodology used in active online transactional processing systems. Ironically, when data is exported from an Essbase cube to a flat file for load to a relational database, it more closely resembles a 3NF data model than a Dimensional Data Model.

The Dimensional Data Model

The Dimensional Data Model is the data modeling methodology most commonly used in data warehousing systems. The Dimensional Data Model differs substantially from the Third Normal Form, which is more commonly used for transactional systems. As you can imagine, the same data would be stored much differently in a dimensional model than in a 3NF model.

The Dimensional Data Model consists of Fact and Dimension tables. The Fact tables store the numerical, additive measures of the business, like Gross Sales or Gross Units. The Fact table also contains columns that link to the Dimension tables. The Dimension table stores the descriptive information about the dimension, and sometimes these are joined to other dimension tables to define the hierarchy of a dimension like Market (geographical information) or Time.

To understand Dimensional Data Modeling, we'll define some of the terms commonly used.
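Before turning to the terminology, the relationship between Fact and Dimension tables described above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database. The table names (dim_market, dim_time, fact_sales), the column names, the sample rows, and the sales_by_region helper are all assumptions made up for this sketch; they are not taken from any particular data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold the descriptive information.
    CREATE TABLE dim_market (
        market_id INTEGER PRIMARY KEY,
        city      TEXT NOT NULL,
        region    TEXT NOT NULL
    );
    CREATE TABLE dim_time (
        time_id INTEGER PRIMARY KEY,
        year    INTEGER NOT NULL,
        month   TEXT NOT NULL
    );
    -- The fact table holds additive measures plus a link to each dimension.
    CREATE TABLE fact_sales (
        market_id   INTEGER NOT NULL REFERENCES dim_market(market_id),
        time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
        gross_sales REAL NOT NULL,
        gross_units INTEGER NOT NULL
    );
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO dim_market VALUES (?, ?, ?)",
    [(1, "Boston", "East"), (2, "New York", "East"), (3, "Seattle", "West")],
)
conn.executemany(
    "INSERT INTO dim_time VALUES (?, ?, ?)",
    [(1, 2009, "Jan"), (2, 2009, "Feb")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100.0, 10),
        (2, 1, 150.0, 15),
        (3, 1, 80.0, 8),
        (1, 2, 120.0, 12),
        (3, 2, 90.0, 9),
    ],
)


def sales_by_region(connection):
    """Sum the fact measures, grouped by a descriptive dimension column."""
    cursor = connection.execute("""
        SELECT m.region, SUM(f.gross_sales), SUM(f.gross_units)
        FROM fact_sales AS f
        JOIN dim_market AS m ON m.market_id = f.market_id
        GROUP BY m.region
        ORDER BY m.region
    """)
    return cursor.fetchall()


region_totals = sales_by_region(conn)
```

Notice that the fact table carries only numbers and keys, while everything a user would read in a report, such as the city or region name, lives in the dimension tables.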
Pay attention here, as you may notice a definite similarity with the terms used to describe data in an Essbase database.

• Dimension: A category of information, for example, the Time dimension. The Time dimension would contain data relative to time periods such as days, months, or years.
• Attribute: A distinct level within a dimension. For example, Year is an attribute in the Time dimension.
• Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year | Quarter | Month | Day.

When the data in the data warehouse is modeled using the Dimensional Data Model instead of the 3NF method, which organizes everything in neat rows and columns with primary keys to identify everything, it usually follows the lines of the dimensions that are included as necessary components of your data. The resultant structure of the dimensional data method resembles a multidimensional cube more than two-dimensional rows and columns.

Where does Essbase fit in this

Okay, now for the big question. Where does Essbase fit in with all this data warehouse mumbo jumbo? Well, if you were paying attention a few paragraphs back, you would have noticed that we mentioned that a necessary tool in your enterprise data warehouse toolbox is an OLAP solution. Well, Essbase is it! Essbase is the perfect multidimensional OLAP database tool to use as your function-specific reporting and analysis data mart tool.

Consider this: if your data is stored in your relational database data warehouse under the Dimensional Data Model methodology, what better tool is there that has the power and capability to perform in the multidimensional arena? Essbase is a natural.
Consider this: with the proper hardware, Essbase is designed to support even the largest cubes with vast numbers of users, so scalability is not an issue. Essbase is also a superior real-time analysis and reporting tool that performs complex calculations. It can also be updated from the source database, in this case the data warehouse, quickly and effortlessly and, depending on the technology you use for your data warehouse, Essbase can also connect directly to the data warehouse database to draw its data. Knowing all this, what other choice is there besides Essbase?

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.