models (UML), business process models, and semi-structured data using eXtensible Markup Language (XML). Notice in Figure 9.14 how the fourth page, which contains graphics, can be oriented in landscape mode, while the remaining pages are kept in portrait mode. AllFusion ERwin and Rational Data Architect also provide rich reporting features, though Sybase PowerDesigner has the richest capabilities.

9.11 Modeling a Data Warehouse

In Chapter 8 we discussed the unique design considerations required for data warehousing and decision support. Typically, warehouses are designed to support complex queries that provide analysis of your data. As such, they exploit different schema topology models, such as star schema and horizontal partitioning. They typically exploit data views and materialized data views, data aggregation, and multidimensional modeling far more extensively than operational and transactional databases do.

Traditionally, warehouses have been populated with data that is extracted and transformed from other operational databases. However, more and more companies are moving to consolidate system resources and provide real-time analytics by either feeding warehouses data in near-real-time (i.e., with a few minutes' latency) or entirely merging their transactional data stores with their analytic warehouses into a single server or cluster. These trends are known as “active data warehousing,” and pose even more complex design challenges. There is a vast need for CASE tooling in this space.

Sybase offers a CASE tool known as Sybase Industry Warehouse Studio (IWS). Sybase IWS is really a set of industry-specific, prepackaged warehouses that require some limited customization. Sybase IWS tooling provides a set of wizards for designing star schemas, dimensional tables, denormalization, summarization, and partitioning; as usual, the Sybase tools are strong on reporting facilities.
The industry domains covered by IWS span a fairly reasonable range: they include IWS for Media, IWS for Healthcare, IWS for Banking, IWS for Capital Markets, IWS for Life Insurance, IWS for Telco, IWS for Credit Cards, IWS for P&C Insurance, and IWS for CRA.

IBM’s DB2 Cube Views (shown in Figure 9.15) provides OLAP and multidimensional modeling. DB2 Cube Views allows you to create metadata objects to dimensionally model OLAP structures and relational data. The graphical interface allows you to create, manipulate, import, or export cube models, cubes, and other metadata objects.

Teorey.book Page 207 Saturday, July 16, 2005 12:57 PM
208 CHAPTER 9 CASE Tools for Logical Database Design

Sybase IWS uses standard database design constructs that port to many database systems, such as DB2 UDB, Oracle, Microsoft SQL Server, Sybase Adaptive Server Enterprise, and Sybase IQ. In contrast, IBM’s DB2 Cube Views is designed specifically to exploit DB2 UDB. The advantage of DB2 Cube Views is that it can exploit product-specific capabilities in the DB2 database that may not be generally available in other databases. Some examples of this include materialized query tables (MQTs, that is, precomputed aggregates and cubes), multidimensional clustering, triggers, functional dependencies, shared-nothing partitioning, and replicated MQTs. Sybase IWS’s dependence on lowest-common-denominator database features provides flexibility when selecting the database server but may prove extremely limiting for even moderately sized marts and warehouses (i.e., larger than 100 GB), where advanced access and design features become critical.

To summarize and contrast, Sybase offers portable warehouse designs that require minimal customization and are useful for smaller systems, and DB2 Cube Views provides significantly richer and more powerful capabilities, which fit larger systems, require more customization, and necessitate DB2 UDB as the database server.
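To make the ideas of a star schema and a precomputed aggregate concrete, the following is a minimal sketch using Python's built-in sqlite3 module. It is not how Sybase IWS or DB2 Cube Views work internally. SQLite has no materialized query tables, so the sketch emulates one with an ordinary summary table that must be rebuilt by hand. All table names, column names, and data values are hypothetical.

```python
import sqlite3

# The aggregate a warehouse would typically precompute: sales totals
# grouped by two dimensions (product category and year).
AGGREGATE_QUERY = """
    SELECT p.category AS category, d.year AS year, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
"""


def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "books"), (2, "music")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, 2004), (2, 2005)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 10.0), (1, 2, 15.0), (2, 2, 7.5), (2, 2, 2.5)])


def materialize_summary(conn):
    """Emulate a materialized aggregate by storing the query result in a table."""
    conn.execute("DROP TABLE IF EXISTS sales_by_category_year")
    conn.execute("CREATE TABLE sales_by_category_year AS " + AGGREGATE_QUERY)


def totals(conn):
    """Read the precomputed totals instead of re-joining the fact table."""
    return conn.execute(
        "SELECT category, year, total FROM sales_by_category_year "
        "ORDER BY category, year"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
materialize_summary(conn)
for row in totals(conn):
    print(row)
```

In a database with true materialized query tables, the server keeps such a summary current and can route matching queries to it automatically. Here the summary goes stale until materialize_summary is called again, which is the kind of product-specific difference the comparison above refers to.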
Figure 9.15 DB2 Cube Views interface (courtesy IBM Rational Division)

AllFusion ERwin Data Modeler has basic support for modeling OLAP and multidimensional databases, but does not have the same richness of tooling and wizards that the other companies offer to substantially simplify the design process of these complex systems.

9.12 Semi-Structured Data, XML

XML (eXtensible Markup Language) is a data model consisting of nodes of several types linked together with ordered parent/child relationships to form a hierarchy. One representation of that data model is textual; there are others that are not text! XML has increasingly become a data format of choice for data sharing between systems. As a result, increasing volumes of XML data are being generated.

While XML data has some structure, it is not a fully structured format, unlike the table definitions that come from fully structured modeling using ER with IE or UML. XML is known in the industry as a semi-structured format: It lacks the strict adherence to a schema that structured data has, yet it has some degree of structure, which distinguishes it from completely unstructured data, such as image and video data.

Standards are forming around XML to allow it to be used for database-style design and query access. The dominant standards are XML Schema and XML Query (also known as XQuery). Also worth noting is the OMG XMI standard, which defines a standard, structured format for XML interchange, based on an object model. Primarily for interfacing reasons, UML tools such as MagicDraw have taken XMI seriously and have therefore become the preferred alternatives in the open source space.

XML data is text-based and self-describing (meaning that XML describes the type of each data point and defines its own schema).
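The hierarchy of ordered parent/child nodes described above can be illustrated with a minimal sketch using Python's standard xml.etree.ElementTree module. The recipe document below is a hypothetical example invented for illustration; it is not a reproduction of the figures in this chapter. The sketch walks the tree in document order and shows that a parser rejects text whose tags are not properly closed.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe document; element and attribute names are assumptions.
RECIPE_XML = """<recipe name="hot dog">
  <ingredient quantity="1">bun</ingredient>
  <ingredient quantity="1">sausage</ingredient>
  <step>Grill the sausage.</step>
  <step>Place the sausage in the bun.</step>
</recipe>"""


def parses_cleanly(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def describe(element, depth=0):
    """Return (depth, tag, text) tuples for the element and its descendants,
    in document order."""
    rows = [(depth, element.tag, (element.text or "").strip())]
    for child in element:
        rows.extend(describe(child, depth + 1))
    return rows


root = ET.fromstring(RECIPE_XML)
for depth, tag, text in describe(root):
    print("  " * depth + tag, text)

print(parses_cleanly(RECIPE_XML))
# The <step> element below is never closed, so the parser rejects the text.
print(parses_cleanly("<recipe><step>Grill</recipe>"))
```

Each value carries its own tag, so the program needs no separate table definition to interpret it. Checking a document against an XML Schema is a separate step that this sketch does not attempt.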
XML has become popular for Internet-based data exchange based on these qualities as well as being “well-formed.” Well-formed is a computer science term, implying that XML’s grammar is unambiguous through the use of mandated structure that guarantees terms are explicitly prefixed and closed. Figure 9.16 shows the conceptual design of a semi-structured document type named “recipe.” Figure 9.17 shows an XML document for a hot dog recipe. Notice that the file is completely textual.

Figure 9.16 An XML schema for a recipe

IBM Rational Data Architect and Sybase PowerDesigner have taken the lead in being early adopters of XML data modeling CASE tools. Both products support the modeling of semi-structured data through XML and provide graphical tooling for modeling XML hierarchies.

Figure 9.17 An XML document for a hot dog

9.13 Summary

There are several good CASE tools available for computer-assisted database design. This chapter has touched on some of the features of three of the leading products: IBM Rational Data Architect, Computer Associates AllFusion ERwin Data Modeler, and Sybase PowerDesigner. Each provides powerful capabilities to assist in developing ER models and transforming those models to logical database designs and physical implementations. All of these products support a wide range of database vendors, including DB2 UDB, DB2 for z/OS, Informix Data Server (IDS), Oracle, SQL Server, and many others through ODBC support. Each product has different advantages and strengths. The drawbacks a product may have now are certain to be addressed over time, so discussing the relative merits of each product in a book can be somewhat of an injustice to a product that will deliver improved capabilities in the near future.
At the time of authoring this text, Computer Associates’ AllFusion ERwin Data Modeler had advantages as a mature product with vast database support. The AllFusion products don’t have advanced feature support for XML and warehousing/analytics, but what they do support they do well. Sybase PowerDesigner sets itself apart with superior reporting capabilities. IBM’s Rational Data Architect has the best integration with a broad software application development suite of tooling, and the most mature use of UML. Both the Sybase and IBM tools are breaking new ground in their early support for semi-structured XML data and for CASE tools for warehousing and OLAP. The best products provide the highest level of integration into a larger software development environment for large-scale collaborative, and possibly geographically diverse, development. These CASE tools can dramatically reduce the time, cost, and complexity of developing, deploying, and maintaining a database design.

9.14 Literature Summary

Current logical database design tools can be found on manufacturer Web sites [Database Answers, IBM Rational Software, Computer Associates, Sybase PowerDesigner, Directory of Data Modeling Resources, Objects by Design, Understanding relational databases: referential integrity, and Widom].