DATA MODELING FUNDAMENTALS (P16)

[...] data modeling tools, analysis and design tools, and tools for documenting and testing applications.
Circular Structure. A data structure consisting of three or more entity types forming cyclical relationships: the first is related to the second, the second to the third, and so on, and the last is related back to the first. In a good data model, circular structures are resolved.
Composite Key. A primary key made up of more than one attribute.
Concatenated Key. Same as Composite Key.
Conceptual Completeness. Conceptual completeness of a data model implies that it is a complete representation of the information requirements of the organization.
Conceptual Correctness. Conceptual correctness of a data model implies that it is a true replica of the information requirements of the organization.
Conceptual Data Model. A generic data model capturing the true meaning of the information requirements of an organization. It does not conform to the conventions of any class of database systems, such as hierarchical, network, or relational.
Conceptual Entity Type. A set representing the type of the objects, not the physical objects themselves.
Data Dictionary. A repository holding the definitions of the data structures in a database. In a relational database, the data dictionary contains the definitions of all the tables, columns, and so on.
Data Integrity. Accuracy and consistency of the data stored in the organization’s database system.
Data Manipulation. Operations for accessing and altering data in the database. Data manipulation includes retrieval, addition, update, and deletion of data.
Data Mining. A knowledge discovery process. Data mining algorithms uncover hidden relationships and patterns in the set of data on which they operate. The discovery is automatic rather than the result of deliberate search and analysis by analysts.
Data Model. A representation of the real-world information requirements that gets implemented in a computer system. A data model provides a method and means for describing real-world information by using specific notations and conventions.
Data Repository. Storage of the organization’s data in databases. It stores all data values that are part of the databases.
Data View. See User View.
Data Warehouse. A specialized database holding a collection of transformed and integrated data, stored for the purpose of providing strategic information to the organization.
Database. A repository where an ordered, integrated, and related collection of the organization’s data is stored for use by computer applications and for information sharing.
Database Administration. Responsibility for the technical aspects of the organization’s database, including the physical design and the handling of technical details such as database security, performance, day-to-day maintenance, backup, and recovery. Database administration is more technical than managerial.
Database Administrator (DBA). A specially trained technical person performing the database administration functions in an organization.
Database Practitioners. The set of IT professionals, such as analysts, data modelers, designers, programmers, and database administrators, who design, build, deploy, and maintain database systems.
DBMS. Database Management System. A software system to store, access, maintain, manage, and safeguard the data in databases.
DDLC. Database Development Life Cycle. A complete process from beginning to end, with distinct phases for defining information requirements, creating the data model, designing the database, implementing the database, and maintaining it thereafter.
Decomposition of Relations. Splitting relations or tables into smaller relations for the purpose of normalizing them.
Degree. The number of entity types or object sets that participate in a relationship. For a binary relationship, the degree is 2.
Dimension Entity Type. In a STAR schema, a dimension entity type represents a business dimension, such as customer or product, along which metrics like sales are analyzed.
DKNF. Domain Key Normal Form. The ultimate goal in transforming a relation into the highest normal form. A relation is in DKNF if it represents a single topic and all of its business rules can be expressed through domain constraints and key relationships.
Domain. The set of all permissible data values and data types for an attribute of an entity type.
DSS. Decision Support System. An application that enables users to make strategic decisions. Decision support systems are driven by specialized databases.
End-Users. See Users.
Entity. A real-world “thing” of interest to an organization.
Entity Instance. A single occurrence of an entity type. For example, a single invoice is an instance of the entity type called INVOICE.
Entity Integrity. A rule or constraint to ensure the correctness of an entity type or relational table. In the relational model, it requires that no part of a table’s primary key be null.
ERD. Entity-Relationship Diagram. A graphical representation of entities and their relationships in the Entity-Relationship data modeling technique.
Entity Set. The collection of all entity instances of a particular type of entity.
Entity Type. Refers to the type of entity occurrences in an entity set. For example, all customers of an organization form the CUSTOMER entity type.
E-R Data Modeling. A design technique for creating an entity-relationship diagram from the information requirements.
Evolutionary Modeling. Data modeling as promoted by the Agile Software Development movement. This is a type of iterative modeling methodology in which the model evolves in “creation, feedback, revision” cycles.
External Data Model. Definition of the data structures in a database that are of interest to various user groups in an organization. It is the way users view the database from outside.
Fact Entity Type. In a STAR schema, a fact entity type represents metrics, such as sales, that are analyzed along business dimensions such as customer or product.
Feasibility Study. One of the earlier phases in DDLC, conducting a study of the readiness of an organization and of the technological, economic, and operational feasibility of a database system for the organization.
Fifth Normal Form (5NF). A relation that is already in the fourth normal form and has no join dependencies.
First Normal Form (1NF). A relation that has no repeating groups of values for a set of attributes in a single row.
Foreign Key. An attribute in a relational table used for establishing a direct relationship with another table, known as the parent table. The values of the foreign key attribute are drawn from the primary key values of the parent table.
Fourth Normal Form (4NF). A relation that is already in the third normal form and has no multivalued dependencies.
Functional Dependency. The dependence of the value of an attribute B in a relation on the value of another attribute A: each value of A uniquely determines the value of B in the relation.
Generalization. The concept that some entity types are general cases of other entity types. The entity types in the general cases are known as super-types.
Generalizing Specialists. A trend among software developers, promoted by the agile software development movement, in which specialists acquire increasingly diverse skills and expand their horizons. Accordingly, data modelers are no longer specialists with only data modeling skills.
Gerund. Representation of a relationship between two entity types as an entity type itself.
Homonyms. Two or more data elements having the same name but containing different data.
Identifier. One or more attributes whose values can uniquely identify the instances of an entity type.
Identifying Relationship. A relationship between two entity types in which one entity type depends on the other for its existence. For example, the entity type ORDER DETAIL cannot exist without the entity type ORDER.
Inheritance. The property that subsets inherit the attributes and relationships of their superset.
Intrinsic Characteristics. Basic or inherent properties of an object or entity.
IT. Information Technology. Covers all computing and data communications in an organization. Typically, the CIO is responsible for IT operations in an organization.
Iterative Modeling. Modeling that is not carried out strictly sequentially, as in modeling all entity types, then all relationships, then all attributes, and so on. Iterative modeling allows the data modeler to constantly go back, verify, readjust, and ensure cohesion and completeness.
Key. One or more attributes whose values can uniquely identify the rows of a relational table.
Logical Data Model. Sometimes also referred to as a conventional data model; consists of the logical data structure representing the information requirements of an organization. This data model conforms to the conventions of a class of database systems, such as hierarchical, network, or relational. The logical data model for a relational database system consists of tables or relations.
Logical Design. The process of designing and creating a logical data model.
Matrix. An arrangement of members or elements in rows and columns. In the relational data model, a table or relation may be compared to a matrix, thereby making it possible to apply matrix algebra functions to the data represented in the table.
MDDBMS. Multidimensional Database Management System. Used to create and manage multidimensional databases for OLAP.
Meta-data. Data about the data of an organization.
Model Transformation. The process of mapping and transforming the components of a conceptual data model to those of a logical or conventional data model.
MOLAP. Multidimensional Online Analytical Processing. An analytical processing technique in which multidimensional data cubes are created and stored in separate proprietary databases.
Normal Form. A state of a relation or table that is free from incorrect dependencies among its attributes. See also Boyce-Codd Normal Form, First Normal Form, Second Normal Form, and Third Normal Form.
Normalization. The step-by-step method of transforming a random table into a set of normalized relations that are free from incorrect dependencies and conform to the rules of the relational data model.
Null Value. A special value of an attribute, distinct from zero or blank, that indicates a missing, non-applicable, or unknown value.
OLAP. Online Analytical Processing. Powerful software systems providing extensive multidimensional analysis, complex calculations, and fast response times, usually present in data warehousing systems.
Physical Data Model. A data model representing the information requirements of an organization at the physical level of hardware and system software, consisting of actual components such as data files, blocks, records, storage allocations, indexes, and so on.
Physical Design. The process of designing the physical data model.
Practitioners. See Database Practitioners.
Primary Key. A single attribute or a set of attributes that uniquely identifies an instance of an object set or entity type and is chosen, from among the possible identifiers, as the main one.
RDBMS. Relational Database Management System.
Referential Integrity. A rule that applies to two relational tables that are directly related. Referential integrity between related tables is established if every non-null value in the foreign key attribute of the child table is a primary key value in the parent table.
Relation. In relational database systems, a relation is a two-dimensional table with columns and rows, conforming to relational rules.
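The Entity Integrity and Referential Integrity entries describe rules that a DBMS enforces on related tables. As a rough illustration only, the Python sketch below checks both rules over in-memory rows. The table and column names (`employees`, `departments`, `emp_id`, `dept_id`) and both helper functions are hypothetical; rows are plain dictionaries with `None` standing in for a null, and the key check also flags duplicate keys, because a primary key must uniquely identify each row.

```python
from typing import Any

# Hypothetical representation: each row is a dict; None plays the role of a null.
Row = dict[str, Any]


def primary_key_violations(rows: list[Row], pk: str) -> list[Row]:
    """Rows whose primary key is null (entity integrity) or repeats an earlier key (uniqueness)."""
    seen: set[Any] = set()
    bad: list[Row] = []
    for row in rows:
        key = row.get(pk)
        if key is None or key in seen:
            bad.append(row)
        else:
            seen.add(key)
    return bad


def referential_integrity_violations(
    child: list[Row], fk: str, parent: list[Row], pk: str
) -> list[Row]:
    """Child rows whose non-null foreign key value is not a primary key value in the parent."""
    parent_keys = {row[pk] for row in parent}
    # Null foreign keys are skipped: the rule only constrains non-null values.
    return [row for row in child if row.get(fk) is not None and row[fk] not in parent_keys]


# Hypothetical sample data.
departments = [{"dept_id": "D1", "name": "Sales"}, {"dept_id": "D2", "name": "IT"}]
employees = [
    {"emp_id": 101, "dept_id": "D1"},
    {"emp_id": 102, "dept_id": None},   # null foreign key: allowed by the rule
    {"emp_id": 103, "dept_id": "D9"},   # no such department
    {"emp_id": None, "dept_id": "D2"},  # null primary key
    {"emp_id": 101, "dept_id": "D2"},   # duplicate primary key
]

print([r["emp_id"] for r in primary_key_violations(employees, "emp_id")])  # [None, 101]
print([r["emp_id"] for r in referential_integrity_violations(employees, "dept_id", departments, "dept_id")])  # [103]
```

In an RDBMS these rules would normally be declared in the schema with PRIMARY KEY and FOREIGN KEY ... REFERENCES constraints rather than checked in application code; the sketch only shows what those constraints test.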
Relational Data Model. A conventional or logical data model where data is perceived as two-dimensional tables with rows and columns. Each table represents a business object; each column represents an attribute of the object; each row represents an instance of the object.
Relational Database. A database system built on the relational data model.
Relationship. A relationship between two object sets or entity types represents the associations of the instances of one object set with the instances of the other object set. Unary, binary, and ternary relationships are the common ones, depending on the number of object sets participating in the relationship. A unary relationship is recursive: instances of an object set are associated with instances of the same object set. Relationships may be mandatory or optional, depending on whether every instance must participate in the relationship or some instances may not.
Repeating Group. A group of attributes in a relation that has multiple sets of values for the attributes.
ROLAP. Relational Online Analytical Processing. An online analytical processing technique in which multidimensional data cubes are created on the fly by the relational database engine.
Second Normal Form (2NF). A relation that is already in the first normal form and has no partial key dependencies.
Set Theory. A mathematical concept in which individual members form a set. Set operations can be used to combine or select members from sets in several ways. In a relational data model, the rows or tuples of a table or relation may be considered as forming a set; as such, set operations may be applied to manipulate data represented as tables.
Specialization. The concept that some entity types are special cases of other entity types. The entity types in the special cases are known as sub-types.
SQL. Structured Query Language. It has become the standard language interface for relational databases.
Stakeholders. All people in the organization who have a stake in the success of the data system.
STAR Schema. The arrangement of the collection of fact and dimension entity types in the dimensional data model, resembling a star formation, with the fact entity type placed in the middle and surrounded by the dimension entity types. Each dimension entity type is in a one-to-many relationship with the fact entity type.
Strategic Information. Information in an organization used for making strategic decisions.
Strong Entity. An entity on which a weak entity depends for its existence. See also Weak Entity.
Sub-types. See Specialization.
Subset. An entity type that is a special case of another entity type, known as the superset.
Super-types. See Generalization.
Superset. An entity type that is a general case of another entity type, known as the subset.
Surrogate Key. A unique value generated by the computer system and used as a key for a relation. A surrogate key has no business meaning outside the computer system.
Synonyms. Two or more data elements containing the same data but having different names.
Syntactic Completeness. Syntactic completeness of a data model implies that the modeling process has been carried out completely to produce a good data model for the organization.
Syntactic Correctness. Syntactic correctness of a data model implies that the representation using the appropriate symbols does not violate any rules of the modeling technique.
Third Normal Form (3NF). A relation that is already in the second normal form and has no transitive dependencies, that is, dependencies of non-key attributes on the primary key that hold indirectly through other non-key attributes.
Transitive Dependency. In a relation, the dependency of a non-key attribute on the primary key through another non-key attribute, rather than directly.
Triad. A set of three related entity types in which one of the relationships is redundant. Triads must be resolved in a refined data model.
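To make the Functional Dependency, Transitive Dependency, and Third Normal Form entries concrete, the Python sketch below tests whether a dependency is consistent with a sample of rows and lists candidate transitive dependencies. It is an illustration under stated assumptions: the EMPLOYEE columns and the helpers `holds` and `transitive_dependencies` are hypothetical, the given key is assumed to be the primary key, and because functional dependencies are business rules, sample data can only refute a dependency, never prove it.

```python
from typing import Any

# Hypothetical representation: each row is a dict keyed by column name.
Row = dict[str, Any]


def holds(rows: list[Row], lhs: tuple[str, ...], rhs: tuple[str, ...]) -> bool:
    """True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen: dict[tuple[Any, ...], tuple[Any, ...]] = {}
    for row in rows:
        a = tuple(row[c] for c in lhs)
        b = tuple(row[c] for c in rhs)
        if seen.setdefault(a, b) != b:
            return False
    return True


def transitive_dependencies(
    rows: list[Row], pk: tuple[str, ...], non_key: tuple[str, ...]
) -> list[tuple[str, str]]:
    """Assuming pk is the primary key (so pk determines every attribute), return pairs
    (x, y) of non-key attributes where x -> y holds and x does not determine pk."""
    found: list[tuple[str, str]] = []
    for x in non_key:
        if holds(rows, (x,), pk):
            continue  # x identifies rows on its own here, so it acts as a candidate key
        for y in non_key:
            if y != x and holds(rows, (x,), (y,)):
                found.append((x, y))
    return found


# Hypothetical EMPLOYEE rows: dept_name depends on emp_id only through dept_id.
employee_rows = [
    {"emp_id": 1, "emp_name": "Ana", "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 2, "emp_name": "Bo", "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 3, "emp_name": "Cy", "dept_id": "D2", "dept_name": "IT"},
    {"emp_id": 4, "emp_name": "Di", "dept_id": "D3", "dept_name": "Sales"},
]

print(transitive_dependencies(employee_rows, ("emp_id",), ("emp_name", "dept_id", "dept_name")))
# [('dept_id', 'dept_name')]
```

Following the Decomposition of Relations entry, splitting these rows into EMPLOYEE(emp_id, emp_name, dept_id) and DEPARTMENT(dept_id, dept_name) would remove the transitive dependency and bring the relation into 3NF.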
Tuple. A row in a relational table.
UML. Unified Modeling Language. Its forerunners constitute the wave of object-oriented analysis and design methods of the 1980s and 1990s. UML is a unified language because it directly unifies the leading methods of Booch, Rumbaugh, and Jacobson. OMG (Object Management Group) has adopted UML as a standard.
User View. View of the database by a single user group. Therefore, a data view of a particular user group includes only those parts of the database that the group is concerned with. The collection of all data views of all the user groups constitutes the total data model.
Users. In connection with data modeling, the term users includes all people who use the data system that is built based on the particular data model.
Weak Entity. An entity that depends for its existence on another entity, known as a strong entity. For example, the entity type ORDER DETAIL cannot exist without the entity type ORDER. See also Strong Entity.
XML. eXtensible Markup Language. Introduced to overcome the limitations of HTML. XML is extensible, portable, structured, and descriptive. In a very limited way, it may be used in data modeling.

INDEX

Aggregation. See Relationships, special cases of, aggregation
Agile movement, the, 376–379; generalizing specialists, 379; philosophies, 378; principles, 378. See Data modeling, agile modeling principles. See also Modeling, agile; Modeling, evolutionary
Assembly structures, 147–148
Attribute, checklist for validation of, 178–180
Attributes, 100, 158–178; constraints for, 169–170; null values, 170; range, 170; type, 170; value set, 169; data, as, 161; domain, definition of, 164; domains, 164–169; attribute values, for, 166; information content, 165; misrepresented, 167; split, 167; names, 163; properties or characteristics, 158; relationships of, 160; types of, 171–175; optional, 173; simple and composite, 171; single-valued and multi-valued, 171; stored and derived values, with, 172; values, 162
Business intelligence, 300
Business rules, incorporation of, 25
Case study: E-R model, 84; UML model, 87
Categorization. See Specialization/Generalization, categorization
Circular structures. See Relationships, design issues of, circular structures
Class diagram, 62. See also UML
Conceptual and physical entity types, 145–147
Conceptual model symbols and meanings, 77
Data lifecycle, 7–9
Data mining, 334–342; OLAP versus data mining, 336; techniques, 338; data modeling for, 341
Data model: communication tool, 5; components of, 18–20; database blueprint, 5; external, 13, 75; conceptual, 14–15, 75; identifying components, 77–80; review procedure, 76–77; logical, 15–17, 75, 104–107; transformation steps, 107–110

Data Modeling Fundamentals. By Paulraj Ponniah. Copyright © 2007 John Wiley & Sons, Inc.
Data model (continued): physical, 17, 76, 111–112; quality, 26–29, 348; approach to good modeling, 351; assurance process, 365–373; aspects of, 365; assessment of, 370; stages of, 366; definitions, of, 351–360; checklists, 358; dimensions, 361; good and bad models, 349; meaning of, 360; relational, 109; symbols, 19–20
Data model diagram, review of, 103–104
Data modeling: agile modeling principles, application of, 34–35; approaches, 36–38, 44–47; data mining, for, 341; data warehouse, for the, 38–39; methods and techniques: IDEF1X, 51; Information Engineering, 50; Object Role Modeling (ORM), 55; Peter Chen (E-R) modeling, 48; Richard Barker’s, 53; XML, 57; steps of, 20–26; tips, practical, 392–421; bill-of-materials, 409; iterative modeling, 399–401; cycles, establishing, 399; increments, 400; partial models, integration of, 401; layout, conceptual model, 409–417; adding texts, 416; component arrangement, 410; visual highlights, 417; legal entities, 402; locations and places, 403; logical data model, 417–421; persons, 407; requirements definition, 393–396; stakeholder participation, 396–399; time periods, 405
Data system development life cycle. See DDLC
Data warehouse, 301–325; data staging, 304; data storage, 304; dimensional to relational, 322; families of STARS, 321; information delivery, 305; modeling: business data, dimensional nature of, 306; dimensional modeling, 308–312; dimension entity type, 309, 313; fact entity type, 309, 314; information package, 307; snowflake schema, 318; source data, 304; STAR schema, 312–318; data granularity, 315, 317; degenerate dimensions, 316; factless fact entity type, 316; fully additive measures, 315; semi-additive measures, 315; technologies, 302
Database design: conceptual to relational, 243; informal, 272; model transformation method: attributes to columns, 250; entity types to relations, 250; identifiers to keys, 252; transformation of relationships, 252–267; mandatory and optional conditions, 261–265; transformation summary, 267; when to use, 248; traditional method, 244
Databases, post-relational, 39–40
DDLC, 29–33; design, 31; implementation, 31; phases and tasks, 32; process, starting the, 30; requirements definition, 30; roles and responsibilities, 33
Decision-support systems, 296–301; data modeling for, 301; history of, 297
Dimensional analysis. See OLAP systems, dimensional analysis
Domains. See Attributes, domains
E-R modeling. See Data modeling, methods and techniques; Peter Chen (E-R) modeling
Entity, checklist for validation of, 153–155
Entity integrity. See Relational model, entity integrity
Entity types: aggregation, 129; association, 129; category of, 127; definition, comprehensive, 116; existence dependency, 132; homonyms, 125; ID dependency, 132; identifying, 120; intersection, 129; regular, 128; strong, 128; subtype, 128; supertype, 128; synonyms, 125
IDEF1X. See Data modeling, methods and techniques, IDEF1X
Identifiers or keys, 101, 175–178; generalization hierarchy, in, 177–178; guidelines for, 176; keys, definitions of, 175; need for, 175
Informal design, 272–276; potential problems, 273–276; addition anomaly, 276; deletion anomaly, 275; update anomaly, 275
Information engineering. See Data modeling, methods and techniques; Information engineering
Information levels, 11–13
Integration definition for information modeling. See Data modeling, methods and techniques, IDEF1X
Key: composite, 176; natural, 176; primary, 176; surrogate, 176. See also Identifiers or keys
Meta-modeling, 40
Modeling, agile, 379–385; documentation, 383; feasibility, 384; practices: additional, 383; primary, 381; principles: auxiliary, 381; basic, 380
Modeling, evolutionary, 385–387; benefits of, 387; flexibility, need for, 386; nature of, 386
Modeling time dimension, 149
Normalization methodology, 276–291; fundamental normal forms, 278–285; Boyce–Codd normal form, 284; first normal form, 278; second normal form, 279; third normal form, 281; higher normal forms, 285–288; domain-key normal form, 288; fifth normal form, 287; fourth normal form, 286; normalization as verification, 291; steps, 277, 290
OLAP systems, 325–333; data modeling for, 332; dimensional analysis, 326; features, 325; hypercubes, 328; MOLAP, 330; ROLAP, 330
Online analytical processing. See OLAP systems
ORM. See Data modeling, methods and techniques; Object Role Modeling
Peter Chen. See Data modeling, methods and techniques; Peter Chen (E-R) modeling
Process modeling, 40
Quality. See Data model, quality
Recursive structures, 145
Referential integrity. See Relational model, referential integrity
Relational model, 231–242; columns as attributes, 234; entity integrity, 240; functional dependencies, 242; mathematical foundation, 232; modeling concept, single, 232; notation for, 237; referential integrity, 240; relation or table, 233
[...]
[...] summarized, 144; when to be used, 137
STAR schema. See Data warehouse, STAR schema. See also Data warehouse, families of STARS
Symbols. See Data model, symbols
UML: activity diagram, 68; class diagram, 62; collaboration diagram, 65; sequence diagram, 65; state diagram, 65; data modeling using, 61–63; development process, in, 64–65; use case diagram, 65
Unified modeling language. See UML
User views, 33, 90
View integration, [...]
[...] participation, 198–200; partial, 198; total, 199; two-sided, 186; types of, 201–204; identifying, 202; nonidentifying, 204
Requirements definition. See Data modeling, tips, practical, requirements definition. See also DDLC, requirements definition
Richard Barker’s. See Data modeling, methods and techniques, Richard Barker’s
Specialization/generalization, 98, 134–144; attributes, inheritance of, 140; categorization, [...]
