Database Modeling & Design, Fourth Edition - P18

72 CHAPTER 4 Requirements Analysis and Conceptual Data Modeling

At this point we have sufficient commonality between schemas to attempt a merge. In schemas 1 and 2.2 we have two sets of common entities, Department and Topic-area. Other entities do not overlap and must appear intact in the superimposed, or merged, schema. The merged schema, schema 3, is shown in Figure 4.7a. Because the common entities are truly equivalent, there are no bad side effects of the merge due to existing relationships involving those entities in one schema and not in the other. (Such a relationship that remains intact exists in schema 1 between Topic-area and Report, for example.) If true equivalence cannot be established, the merge may not be possible in the existing form.

In Figure 4.7, there is some redundancy between Publication and Report in terms of the relationships with Department and Topic-area. Such a redundancy can be eliminated if there is a supertype/subtype relationship between Publication and Report, which does in fact occur in this case because Publication is a generalization of Report. In schema 3.1 (Figure 4.7b) we see the introduction of this generalization from Report to Publication.
[Figure 4.7 View integration: the merged schema. (a) Schema 3, the result of merging schema 1 and schema 2.2. (b) Schema 3.1, new generalization. (c) Schema 3.2, elimination of redundant relationships.]

Teorey.book Page 72 Saturday, July 16, 2005 12:57 PM
4.4 View Integration 73

Then in schema 3.2 (Figure 4.7c) we see that the redundant relationships between Report and Department and Topic-area have been dropped. The attribute “title” has been eliminated as an attribute of Report in Figure 4.7c because “title” already appears as an attribute of Publication at a higher level of abstraction; “title” is inherited by the subtype Report.

The final schema, in Figure 4.7c, expresses completeness because all the original concepts (report, publication, topic area, department, and contractor) are kept intact. It expresses minimality because of the transformation of “dept-name” from attribute in schema 1 to entity and attribute in schema 2.2, and the merger between schema 1 and schema 2.2 to form schema 3, and because of the elimination of “title” as an attribute of Report and of Report relationships with Topic-area and Department. Finally, it expresses understandability in that the final schema actually has more meaning than the individual original schemas. The view integration process is one of continual refinement and reevaluation.
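The merge and refinement steps described above can be sketched in code. The following Python sketch is illustrative only and is not the book's procedure. The names Schema, Relationship, merge, and generalize are hypothetical, the assignment of "has" and "includes" to Department and Topic-area is an assumption, and all attributes other than "title" are omitted. Entities that share a name are assumed to have already been judged truly equivalent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    name: str
    left: str
    right: str


@dataclass
class Schema:
    entities: dict                      # entity name -> set of attribute names
    relationships: set = field(default_factory=set)


def merge(a, b):
    """Superimpose two schemas; same-named entities are treated as equivalent."""
    entities = {name: set(attrs) for name, attrs in a.entities.items()}
    for name, attrs in b.entities.items():
        entities.setdefault(name, set()).update(attrs)
    return Schema(entities, a.relationships | b.relationships)


def generalize(schema, supertype, subtype):
    """Treat subtype as a specialization of supertype.

    Attributes the subtype would inherit are removed from it, and a subtype
    relationship is dropped when the supertype already relates to the same
    partner entity. That last test is a naive stand-in for the designer's
    judgment about which relationships are really redundant.
    """
    entities = {name: set(attrs) for name, attrs in schema.entities.items()}
    entities[subtype] -= entities[supertype]

    super_partners = {
        r.left if r.right == supertype else r.right
        for r in schema.relationships
        if supertype in (r.left, r.right)
    }
    kept = set()
    for r in schema.relationships:
        if subtype in (r.left, r.right):
            partner = r.left if r.right == subtype else r.right
            if partner in super_partners:
                continue                # redundant with the supertype's relationship
        kept.add(r)
    return Schema(entities, kept)


# Schema 1 and schema 2.2, reduced to what the text mentions (endpoints assumed).
schema1 = Schema(
    entities={"Department": set(), "Topic-area": set(),
              "Report": {"title"}, "Contractor": set()},
    relationships={
        Relationship("publishes", "Department", "Report"),
        Relationship("contains", "Topic-area", "Report"),
        Relationship("written-for", "Report", "Contractor"),
    },
)
schema2_2 = Schema(
    entities={"Department": set(), "Topic-area": set(), "Publication": {"title"}},
    relationships={
        Relationship("has", "Department", "Publication"),
        Relationship("includes", "Topic-area", "Publication"),
    },
)

schema3 = merge(schema1, schema2_2)                        # Figure 4.7a
schema3_2 = generalize(schema3, "Publication", "Report")   # Figure 4.7c
print(sorted(r.name for r in schema3_2.relationships))
```

In this sketch the generalization and the elimination of redundancy happen in one call, whereas the text shows them as two separate schemas (3.1 and 3.2). Splitting generalize into two functions would mirror the figures more closely.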
It should also be noted that minimality may not always be the most efficient way to proceed. If, for example, the elimination of the redundant relationships “publishes” and/or “contains” from schema 3.1 to 3.2 causes the time required to perform certain queries to be excessively long, it may be better from a performance viewpoint to leave them in. This decision could be made during the analysis of the transactions on the database or during the testing phase of the fully implemented database.

4.5 Entity Clustering for ER Models

This section presents the concept of entity clustering, which abstracts the ER schema to such a degree that the entire schema can appear on a single sheet of paper or a single computer screen. This has happy consequences for the end user and database designer in terms of developing a mutual understanding of the database contents and formally documenting the conceptual model. An entity cluster is the result of a grouping operation on a collection of entities and relationships. Entity clustering is potentially useful for designing large databases. When the scale of a database or information structure is large and includes a large number of interconnections among its different components, it may be very difficult to understand the semantics of such a structure and to manage it, especially for the end users or managers. In an ER diagram with 1,000 entities, the overall structure will probably not be very clear, even to a well-trained database analyst. Clustering is therefore important because it provides a method to organize a conceptual database schema into layers of abstraction, and it supports the different views of a variety of end users.

4.5.1 Clustering Concepts

One should think of grouping as an operation that combines entities and their relationships to form a higher-level construct.
The result of a grouping operation on simple entities is called an entity cluster. A grouping operation on entity clusters, or on combinations of elementary entities and entity clusters, results in a higher-level entity cluster. The highest-level entity cluster, representing the entire database conceptual schema, is called the root entity cluster.

Figure 4.8a illustrates the concept of entity clustering in a simple case where (elementary) entities R-sec (report section), R-abbr (report abbreviation), and Author are naturally bound to (dominated by) the entity Report; and entities Department, Contractor, and Project are not dominated. (Note that to avoid unnecessary detail, we do not include the attributes of entities in the diagrams.) In Figure 4.8b, the dark-bordered box around the entity Report and the entities it dominates defines the entity cluster Report. The dark-bordered box is called the EC box to represent the idea of an entity cluster. In general, the name of the entity cluster need not be the same as the name of any internal entity; however, when there is a single dominant entity, the names are often the same. The EC box number in the lower-right corner is a clustering-level number used to keep track of the sequence in which clustering is done. The number 2.1 signifies that the entity cluster Report is the first entity cluster at level 2. Note that all the original entities are considered to be at level 1.

The higher-level abstraction, the entity cluster, must maintain the same relationships between entities inside and outside the entity cluster as occur between the same entities in the lower-level diagram. Thus, the entity names inside the entity cluster should appear just outside the EC box along the path of their direct relationship to the appropriately related entities outside the box, maintaining consistent interfaces (relationships) as shown in Figure 4.8b.
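The clustering step described above can be sketched as a small transformation on a list of relationships. This Python sketch is a hedged illustration, not a published algorithm. The names Rel, Link, EntityCluster, and cluster are hypothetical, and the endpoints assigned to each relationship of Figure 4.8a are assumptions, since the text names only the entities. Relationships wholly inside the cluster are hidden, and each relationship that crosses the EC box keeps the name of its internal entity as an interface label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rel:
    name: str
    left: str
    right: str


@dataclass(frozen=True)
class Link:
    """A relationship as drawn after clustering."""
    name: str
    left: str
    right: str
    interface: Optional[str] = None   # internal entity named just outside the EC box


@dataclass(frozen=True)
class EntityCluster:
    name: str
    members: frozenset
    level: int        # original entities are level 1
    sequence: int     # order of creation within the level

    @property
    def label(self):
        return f"{self.level}.{self.sequence}"


def cluster(relationships, ec):
    """Collapse the members of ec into one box, preserving external interfaces."""
    links = []
    for r in relationships:
        left_in = r.left in ec.members
        right_in = r.right in ec.members
        if left_in and right_in:
            continue                              # hidden inside the EC box
        if left_in:
            links.append(Link(r.name, ec.name, r.right, interface=r.left))
        elif right_in:
            links.append(Link(r.name, r.left, ec.name, interface=r.right))
        else:
            links.append(Link(r.name, r.left, r.right))   # untouched by the clustering
    return links


# Figure 4.8a, with assumed relationship endpoints.
before = [
    Rel("has", "Report", "R-abbr"),
    Rel("has", "Report", "R-sec"),
    Rel("in", "Author", "Report"),
    Rel("has", "Department", "Author"),
    Rel("does", "Project", "Report"),
    Rel("does", "Contractor", "Report"),
]
report_ec = EntityCluster(
    name="Report",
    members=frozenset({"Report", "R-sec", "R-abbr", "Author"}),
    level=2,
    sequence=1,
)

after = cluster(before, report_ec)
for link in after:
    print(link.left, link.name, link.right, "via", link.interface)
print("EC box number:", report_ec.label)
```

Keeping the interface label on every crossing relationship is what lets a lower-level diagram be recovered later: the label records which internal entity the external entity was actually related to.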
For simplicity, we modify this rule slightly: If the relationship is between an external entity and the dominant internal entity (for which the entity cluster is named), the entity cluster name need not be repeated outside the EC box. Thus, in Figure 4.8b, we could drop the name Report both places it occurs outside the Report box, but we must retain the name Author, which is not the name of the entity cluster.

[Figure 4.8 Entity clustering concepts. (a) ER model before clustering. (b) ER model after clustering, showing the entity cluster Report with EC box number 2.1.]

4.5.2 Grouping Operations

Grouping operations are the fundamental components of the entity clustering technique. They define what collections of entities and relationships comprise higher-level objects, the entity clusters. The operations are heuristic in nature and include (see Figure 4.9):
