PATTERNS OF DATA MODELING- P34 potx

150 Chapter 11 / Identity IDEF1X uses the AKn.m annotation to indicate unique combinations of fields (alternate keys). The “n” is the number of the alternate key. The “m” is the sequence of fields within the alternate key. Thus in Figure 11.1 the airlineName and airlineCode each individually de- note an Airline. The combination of airlineID and accountNumber is also unique for a Fre- quentFlyerAccount. 11.1.2 UML Qualifiers UML qualifiers are also helpful for specifying intrinsic identity. A qualifier is an attribute that distinguishes among the entities at a “many” relationship end. A qualifier selects among the entities, reducing the effective multiplicity, often from “many” to “one”. For example, in Figure 11.1 the UML model and IDEF1X models have the same meaning. They specify that an Airline plus an accountNumber yield at most one FrequentFlyerAccount. In contrast, if the accountNumber is omitted, an Airline yields many FrequentFlyerAccounts. A qualifier specifies an important path for traversing a model and finding data. 11.1.3 Logical Horizon Traversal is a third aspect of intrinsic identity. Applications can navigate a model and its corresponding database by traversing relationships and generalizations. Traversals of a model become SQL joins in an implementation. The logical horizon [Feldman-1986] of an entity type is the set of entity types reachable by one or more paths terminating in a cumulative multiplicity of one. A path is a sequence of traversals of relationships and generalization levels. The purpose of the logical horizon is to compute the specific entities that can be inferred from a starting entity. First I will explain Figure 11.2 and then give some examples for logical horizon. A Pe r- son may have multiple FrequentFlyerAccounts, each of which is offered by an Airline. An accountNumber is unique within the context of an Airline. Each FrequentFlyerAccount can have Activities posted of various ActivityTypes. For example, a frequent flyer account can have credits posted for actual flight miles, bonus miles, dining activity, and car rentals. Figure 11.1 Intrinsic identity in a model. There are unique combinations of fields that provide starting points for finding data in a database. Airline name {unique} FrequentFlyerAccount 1 accountNumber 0 1 lastActivityDate /currentBalance IDEF1X model UML model airlineID Airline airlineName (AK1.1) frequentFlyerAccountID FrequentFlyerAccount lastActivityDate currentBalance airlineID (FK) (AK1.1) accountNumber (AK1.2) code {unique} airlineCode (AK2.1) 11.2 Names 151 The logical horizon of Person is null; the only relationship is to FrequentFlyerAccount and the multiplicity is “many.” The logical horizon of FrequentFlyerAccount is Pers on and Airline. The logical horizon of Activity is FrequentFlyerAccount, Person, Airline, and Activ- ityType. 11.2 Names Names are prominent in models and often helpful for finding entities. Webster’s dictionary defines a name as “a word or phrase that constitutes the distinctive designation of a person or thing.” There are four scenarios for how names can be used to find entities. • Unique names. Some names are unique and resolve to a single entity. For example, the name of a country corresponds to a specific country. In Figure 11.2 Airline name, Air- line code, and ActivityType name are globally unique. Figure 11.2 Names. Names are often helpful for finding entities. Airline name {unique} FrequentFlyerAccount 1 Person name 1 * accountNumber 0 1 lastActivityDate /currentBalance Activity amount date * 1 ActivityType name {unique} * 1 IDEF1X model UML model airlineID Airline airlineName (AK1.1) personID Person frequentFlyerAccountID FrequentFlyerAccount lastActivityDate currentBalance airlineID (FK) (AK1.1) accountNumber (AK1.2) personID (FK) personName activityID Activity amount date frequentFlyerAccountID (FK) activityTypeID (FK) activityTypeID ActivityType activityTypeName (AK1.1) code {unique} airlineCode (AK2.1) 152 Chapter 11 / Identity • Unique names within a context. Other names are not unique on their own but are unique when combined with a parent entity. For example, the names of provinces are unique within the context of a country. UML qualifiers (and their equivalent in IDEF1X alternate keys) provide a notation for specifying fields that are unique within a context. In Figure 11.2 accountNumber provides the unique name for a FrequentFlyerAccount within the context of an Airline. • Non-unique names. Still other names provide important description but alone cannot find an entity. For example, person names are important, but insufficient for finding an individual person. Sometimes non-unique names can be augmented with additional de- tails to find a specific entity. • Multiple unique names. Some entities have multiple names. Figure 11.3 promotes the substanceName attribute to an entity type because each chemical substance may have multiple aliases. For example, propylene is known as propylene and C 3 H 6 . 11.3 Surrogate Identity Sometimes entities can be identified via other entities with which they are closely related. For example, you cannot reliably identify a person with his or her name. However, in some applications you can identify a person via a passport, driver’s license, or identity card. 11.4 Structured Fields Some entities, especially mechanical parts and items for commerce, have structured fields that provide identity. A structured field is a field that is composed from constituent pieces with a specified grammar. Structured fields are synthetic but when parsed the pieces have meaning. Many structured fields are backed by standard protocols. As an example, consider the UPC, EAN, and GTIN codes for consumer packaged goods. The UPC (Universal Product Code) originated in the United States in the early 1970s SubstanceName substanceName alias Substance 1 * Figure 11.3 Multiple unique names. Chemical substances can have multiple names that identify the substance. 1 IDEF1X model UML model substanceID Substance substanceNameID SubstanceName substanceName substanceID (FK) 11.5 Master Applications 153 and has twelve digits. The first digit is the system digit and indicates the kind of item—gen- eral merchandise, random-weight item, health item, in-house item, and coupon. The next five digits denotes the vendor. Digits seven through eleven are called the item code—a unique code for an item within the context of a vendor. The last digit is a check digit. The UPC was followed by the EAN (European Article Number) which has thirteen digits. The EAN has the following sequence: two system digits, five vendor digits, five item digits, and one check digit. Most recently the UPC and EAN have both been replaced by the GTIN (Global Trade Item Number). The GTIN has fourteen digits with the following sequence: item digit, two system digits, five vendor digits, five item digits, and one check digit. Note that there are a total of six item digits, one at the start of the sequence and the other five later in the sequence. The extra item digit is often used to distinguish individual items from multiple items that are packaged into cases and pallets. Many manufacturers have product codes that embody information about the major com- ponents, location of manufacture, and date of manufacture. 11.5 Master Applications Some organizations have dedicated applications (called master applications) that enforce the identity of occurrences of crucial concepts and unify their data across an organization’s applications. All requests for creating, modifying, and deleting such occurrences must be co- ordinated via the corresponding master application. For example, customers, parts, loca- tions, securities, and contracts are often good candidates for master applications. An organization should have no more than a handful of master applications and the subject concepts should pervade the organization. Consider a Customer master application. If each application names customers, there could be many variations, such as AT&T, A.T.&T., and American Telephone and Telegraph. One application might have the current mailing address and another might have a stale address from the past. With this kind of chaos, it is difficult to have a global perspective for data. In contrast, a Customer master application would dictate the precise name for a com- pany for use throughout an organization and serve as a centralized source for its data. 11.6 Merging Data Duplicate copies of data can arise in various ways, such as through flaws in business pro- cesses. For example, an airline may create two frequent flyer accounts for a person and not realize that they are dealing with the same customer. Duplicate copies can also arise through business acquisitions that cause databases to be merged. For example, the merger of Delta Airlines and Northwest Airlines created a need to merge frequent flyer databases. The tricky part of merging data is determining if one entity is the same thing as another. The entities being compared must have matching real-world data that can identify them or there must be an algorithm to deduce a match (such as by equating the various AT&T names 154 Chapter 11 / Identity in Section 11.5). Sometimes the information required to make the determination is missing from a database; then a person has to investigate to reach a decision. Once a match is estab- lished the mechanics of combination are straightforward but can still be tedious to accom- plish without disrupting a business. There is no merge technique that is clearly superior. The appropriate solution depends on the situation as Table 11.1 explains. 11.7 Chapter Summary Identity is the property that distinguishes an entity from all others. In concept, an entity has intrinsic identity apart from how the entity may happen to be implemented. Candidate keys Merge approach Definition Advantages Drawbacks Combine entities Copy data from one entity to the other. Then discard the rejected entity. • Fully consoli- dates the data. • There can be much data to migrate, especially if there are many foreign keys. • May have to remap foreign keys and revise foreign key definitions. • It is easy to overlook a foreign key, causing a stale ref- erence. Favor an entity Mark one entity as deprecated and the other as active. Both entities remain in the database. Gradually migrate data to the active entity. • Eventually con- solidates the data. • There is less risk of stale references as obsolete data can persist for awhile. • In the meantime there is database clutter, possible confusion, and indirection in accessing data. • There can be much data to migrate, especially if there are many foreign keys. Logical merge Use a binding table to logically combine entities. (See Section 10.1.) One entity could be favored for new data. • Can readily merge data as well as unmerge. • There is no risk of stale references. • Causes indirection in accessing data, complicat- ing queries and slowing per- formance. • The database schema remains fragmented by multiple aspects of an entity. Table 11.1 Summary of Approaches to Merging Data Note: The appropriate solution depends on the situation. . acquisitions that cause databases to be merged. For example, the merger of Delta Airlines and Northwest Airlines created a need to merge frequent flyer databases. The tricky part of merging data is determining. entities remain in the database. Gradually migrate data to the active entity. • Eventually con- solidates the data. • There is less risk of stale references as obsolete data can persist for. location of manufacture, and date of manufacture. 11.5 Master Applications Some organizations have dedicated applications (called master applications) that enforce the identity of occurrences of crucial

Định dạng
Số trang	5
Dung lượng	146,02 KB