Tài liệu Managing time in relational databases- P7 pdf

20 419 1
Tài liệu Managing time in relational databases- P7 pdf

Đang tải... (xem toàn văn)

Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống

Thông tin tài liệu

result is that asserted version bi-temporal tables have two pairs of dates, representing two different time periods. They are the effective time period and the assertion time period of the rows in those tables. To assert a version is to claim that it is true. With any tables other than bi-temporal tables, if something happens that makes a row no longer true, that row is either updated or deleted. But with asserted version tables, that isn’t how we handle a row that we discover is not true. We handle it by assigning it an assertion end date (or a transaction end date, in the case of the standard temporal model) representing the date on which we acknowl- edge that it is not true. Then we add another row to the table. This new row is a new assertion about the same version, a new assertion about what the object is like during the same period of time. The multiple rows representing the same version are multiple assertions of that version. In a series of assertions about the same object during the same effective time period, the later one (in assertion time) of every consecutive pair is a correction to the earlier one, and the latest one of all is our current assertion about what is true. We may also note here that corrections and the versions they correct do not necessarily line up one to one. An error may span only part of a version, or may span multiple versions, or may span both. The ability to include in a table both corrections to versions, as well as the original versions themselves, is precisely what is lost when we remain with uni-temporal versions. This ability is precisely what we gain when we manage data bi- temporally. For example, suppose that a row in a table states that pol- icy P861 had a copay amount of $25 during the first six months of 2010. But on July 1 st of that year, we realized that the copay amount was actually $20forthatpolicyoverthat time period. If we were to overwrite that copay amount, we would lose the information that prior to July 1 st , all reports and queries would have shown a copay of $25. To avoid losing that information, we instead end the assertion time period for the row showing a $25 copay, and insert another row. This new row is also for policy P861, during the same six months of 2010, but it shows the correct copay of $20. Its assertion begin date is the same as the assertion end date of the row showing the $25 copay—July 1 st .WenolongerassertthatP861hada $25 copay during that period of time. Instead, we now assert that it had a $20 copay during that period of time. We have corrected a mistake without losing track of what the mistake Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING 103 was, when it occurred, and for how long it was mistakenly treated as the truth. Temporal Integrity Constraints The three integrity constraints in relational theory are entity integrity, referential integrity and domain integrity. Entity integ- rity insures that the object represented by a row in a non- temporal table is represented by that row only, and no other. But as we have seen, with temporal tables, an object may be represented by any number of rows. For these tables, then, the entity integrity constraint must be modified. That modification results in what we call temporal entity integrity. As for domain integrity, it is obviously possible for a domain to change over time. But such changes are not changes to data. They are changes to types of data, indeed to datatypes. Domain changes are one form of what computer scientists call schema evolution. Adding or removing tables, altering primary or foreign keys, and adding or removing non-key columns, are other ways in which database schemas evolve. Both the standard temporal model and Asserted Versioning are formalizations of temporal data management within a snapshot of the evolution of a data- base schema. Both temporal models, along with the IT industry’s best versioning practices, assume unchanging, stable database schemas. What, then, of referential integrity? Well, suppose we have two conventional tables, X and Y, with Y having a referential integrity dependency on X. Typically we would say that X is the parent table, and Y the child table. Now suppose that both tables are bi-temporal tables. In this case, any number of rows in X may represent the same object. So for any row in Y, which of the mul- tiple rows in X, all representing the same object, is that row in Y dependent on? Which row in X does the foreign key in Y point to? The answer is that the foreign key in Y does not point to any specific row in X. For these tables, then, the referential integrity constraint must be modified. That modification results in what we call temporal referential integrity (TRI). The foreign key in Y becomes what we call a temporal foreign key (TFK). Temporal Entity Integrity Breaking the one-to-one correspondence between objects and rows is no small thing. Its implications are significant. One of them is that the relational rule of entity integrity must be understood in a new way. 104 Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING With a non-temporal table, entity integrity is usually enforced by a primary key uniqueness constraint on the table. It may also be enforced by defining a unique index on an alternate key. When a surrogate key is used as the primary key of a table, pri- mary key uniqueness will guarantee only that rows are physically distinguishable from one another. To guarantee entity integrity, which is a semantic constraint, a unique index must also be defined on an alternate key, none of whose component columns are surrogate-valued. For temporal tables, the corresponding constraint is temporal entity integrity. Basically, it works like this. In a non-temporal table, DBMS-enforced entity integrity blocks any insertion that would result in a pair of rows both of which represent the same object. But when the target table is an asserted version table, temporal entity integrity (TEI) blocks any insertion that would result in a pair of rows that represent the same object and that have one or more effective time clock ticks in common within one or more assertion time clock ticks that they also have in common. In enforcing the rule that no two versions of the same object may conflict, TEI is obviously analogous to conventional entity integrity. For having one or more effective time clock ticks in common means that there are two rows which both purport to describe the same object during the same period of time. So both entity integrity and temporal entity integrity play the same semantic role; they both prevent conflicting truth claims. However, TEI has additional work to do, work that is not required of entity integrity. First of all, TEI must insure that adja- cent versions within the same episode [meet], i.e. that there are no temporal gaps between them. In addition, TEI must also insure that there are temporal gaps between adjacent episodes, i.e. that one episode is always [before] the other. And it’s impor- tant to understand why there must be at least one clock tick between any pair of adjacent episodes. It has to do with under- standing and enforcing the user’s intentions when she submits a temporal transaction. First, let’s consider insert transactions. With an insert to a conventional table, the user tacitly agrees that if a row with the same unique identifier already exists in the target table, the DBMS will reject the transaction. The user is telling the DBMS “I believe that this table does not contain a row for this object. So if I’m wrong, I don’t want the transaction to proceed. I want you to kick out the transaction, and notify me.” By the same token, then, with an insert to an asserted version table, the user is telling the DBMS to create a new episode of the Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING 105 indicated object, or to extend an existing episode into effective time clock ticks that it does not yet occupy. And analogously to entity integrity in non-temporal tables, the user is tacitly agree- ing that if an episode of the same object already exists in the tar- get table and if its effective and assertion time periods have any clock ticks in common with the time periods targeted by the transaction, the transaction should be rejected. The user is telling the Asserted Versioning Framework “I believe that this table does not contain any row or rows for this object that already represent the object in even a single clock tick that is specified on the transaction. So if I’m wrong, I don’t want the transaction to proceed. I want you to kick out the transaction, and notify me.” Next, let’s consider update and delete transactions. By a simi- lar process of reasoning, we can see that in submitting either type of transaction, the user is telling the Asserted Versioning Framework “I believe that the table does contain one or more rows for this object with one or more clock ticks that [ intersect] the clock ticks specified on this transaction. So if I’m wrong, I don’t want the transaction to proceed. I want you to kick out the transaction, and notify me.” One Clock Tick: Convention or Constraint? Given two rows representing the same object, there are three temporal relationships between them that are distinguishable by means of a single clock tick. First, if there is even a single clock tick between the end of one and the start of the next, then they are non-contiguous. In Allen relationship terms, one is [before] the other. Next, if there is even a single clock tick that is contained in both their time periods, then they [ intersect]. Finally, if neither is the case, then they are contiguous. In Allen relationship terms, they [meet]. Two versions of the same object that are non-contiguous may exist in the same target table at the same time. But if they do, they necessarily belong to different episodes. And if one of those versions is the only version of that object already in the target table and the other is a transaction, that transaction cannot be an update. If it were an update transaction, it would be equiva- lent to attempting a conventional update when there was no row for that object already in the target table. By the same token, if one of those versions is in the target table and the other is a transaction, that transaction cannot be a deletion. Two versions of the same object that [ intersect], in both effec- tive and assertion time, cannot exist in the same target table at 106 Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING the same time. If they did, they would violate temporal entity integrity. They would be two concurrently asserted statements about what the same object is like at the same time. By the same token, if one of those versions is in the target table and the other is a transaction, that transaction cannot be an insert transaction. If it were an insert transaction, it would be equivalent to attempting a conventional insert when a row for that object is already in the target table. But what about the third case, when the two versions are con- tiguous? As we have already seen, two contiguous versions of the same object can exist in the same table at the same time, and that, in doing so, they belong to the same episode. But when one is a transaction and the other a row already in the target table, which is the correct transaction to use—an insert or an update? As far as the results of the transaction are concerned, it doesn’t matter. Whether an insert or an update is used, the effect will be to expand an existing episode either forwards or backwards in time. Of course, if the episode is being expanded forwards in time, it must be an episode which, prior to the transaction, ended in a non-12/31/9999 date. Otherwise, the transaction’s begin date could not be contiguous with the end date of the episode, but instead would be included within the time period of the episode. But the principle is still the same. We have chosen to use an insert transaction in cases of contiguous time periods, but we could have chosen to use an update transaction instead. It is entirely a matter of convention, of deciding on a convention that will make a user’s background assumptions clear. We chose to use temporal insert transactions, in fact, to pre- serve a pleasing symmetry. For we now have a set of trans- actions in which all increases in clock tick representation are done with inserts, all reductions in clock tick representation are done with deletes, and updates do neither. Of course, pleasing symmetries often turn out to have practical as well as aesthetic benefits. In this case, for example, the AVF will never have to be concerned with temporal entity integrity or temporal referential integrity on update transactions, except for an update which changes the parent object referenced by a version. Making that determination on nothing more than transaction type is certainly more efficient than making it on the basis of some more complex set of criteria. With our convention in place, we have a clear and intuitive model of how time and transaction types pair up with one another. For a given object, if the effective and assertion time periods specified on a temporal transaction do not [ intersect] the time periods on any rows already in the target table, then Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING 107 the transaction is valid if it is an insert, and invalid otherwise. Conversely, for a given object, if the effective and assertion time periods specified on a temporal transaction do [ intersect] the time periods on one or more rows already in the target table, then the transaction is valid if it is an update or delete, and invalid otherwise. What we could not have chosen to do is to permit either inserts or updates to be used in the case of contiguous time per- iods. The reason is that we must preserve a core element in the semantics of conventional insert and update transactions, which is that the user knows whether or not the target table already contains a row which matches the transaction. By using a con- ventional insert, the user is telling us to reject the transaction if a matching representation of that object already exists in the table. The same must be true of temporal inserts against tempo- ral tables. By using a conventional update, the user is telling us to reject the transaction unless a matching representation of that object already exists in the table. Again, the same must be true of temporal updates. The difference is that a “matching representa- tion”, in a conventional table, is simply a row representing the same object. A “matching representation”, in an asserted version table, is a row representing the same object at the same time, i.e. in a set of one or more identical clock ticks. Temporal Referential Integrity Another consequence of breaking the one-to-one correspon- dence between objects and rows is that the relational rule of ref- erential integrity breaks down when the parent table in a referential integrity relationship is an asserted version table. In that case, the parent in any instance of that relationship is not a row; rather, it is an episode, and it may consist of any number of rows. Referential integrity (RI) reflects an existence dependency.Ifa child row is RI-dependent on a parent row, this is based on the fact that the object represented by the child row is existence- dependent on the object represented by the parent row. There- fore, a child row cannot be inserted into the database unless its referenced parent row is already present, and that parent row cannot be deleted from the database as long as any child row referencing it is present. The same logic is at work in the case of temporal referential integrity. If there is an existence dependency between a parent object and a child object—between a client and a policy, for example—then we cannot assert that the policy is ever in effect 108 Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING when the client is not. It follows that no change which reduces the total number of clock ticks in which a child policy is in effect, and no change which increases the total number of clock ticks in which a parent client is in effect, can create a TRI violation. And so the AVF never has to enforce TRI on a child-side temporal delete or a parent side-temporal insert. As for temporal updates, they never alter the number of clock ticks in which an object is represented. And so, unless they change the parent object for a version, there is no TRI enforcement needed. By the same token, any change which increases the total number of clock ticks in which a child policy is in effect, and any change which decreases the total number of clock ticks in which a parent client is in effect, can create a TRI violation. And so the AVF must enforce temporal referential integrity on parent-side temporal delete transactions, and also on child-side temporal inserts. Child-Side Temporal Referential Integrity The foreign key in a row in a child asserted version table is a temporal foreign key (TFK). It contains the object identifier (the oid) of the object that its object is existence-dependent on. But this object identifier isn’t sufficient to identify a specific row in the parent table. There may be many rows in the parent table with that object identifier. And that one row in the child table may be TRI-dependent on any number of those rows in the par- ent table. That one row in the child table is TRI-dependent on an episode in the parent table, an episode of the object designated by the object identifier in its TFK. The episode it is dependent on is the one episode of the object designated by that oid that, within shared assertion time, includes the effective time period of that version. Although the parent managed object in a TRI relationship is an episode, the child managed object is a version. Just as the for- eign key value in a row in a conventional table may change over time, so too the temporal foreign key value in a version in a tem- poral table may change over time from one version of an object to the next version of that same object. It does not have to be the same as the TFK value in any other version of the same object. Even within the same episode, a TFK value may change from one version in that episode to the next version in that same epi- sode. TRI child objects are versions, not episodes, because among versions of the same object, what is referenced by a temporal foreign key may change over time and, consequently, over versions. Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING 109 Parent-Side Temporal Referential Integrity Of course, TRI, like RI, can be violated from the parent side as well. In the case of TRI, a violation cannot occur unless the tem- poral extent of the parent episode is reduced. This can happen in one of three ways, First of all, the effective-time start of the epi- sode can be moved forwards. Second, the effective-time end of the episode can be moved back. This will happen when either the effective-time end of an episode is changed from 12/31/9999 to any other date, or is changed from a non-12/31/9999 date to an earlier date. Finally, the episode can be split into two episodes, leaving a gap where previously there had been none. But shortening the effective time extent of a parent episode will not always result in a TRI violation. It will do so only if the reduction removes the representation of the parent object from one or more clock ticks that are occupied by a child version whose TFK matches the oid of the versions in that parent epi- sode. For example, suppose a parent episode’s effective time is all of 2009, and a delete transaction splits that episode and creates in its place one January to April episode and another October to December episode. If none of the child versions has an effective time period that includes any of the six months from April to October, then TRI has not been violated. Conceptually, reducing the time period of an episode with dependent versions (versions which may be in the same table but, more commonly, are in other tables) so that the parent epi- sode no longer fully includes the time period of one or more of those child versions, is like a deletion in a conventional parent table in that there are three options for handling it. First, we may want to simply restrict the transaction and prevent the reduction from taking place. Second, we may want to permit the transaction to proceed, but find all the dependent child rows and set their temporal foreign keys to the parent object to NULL. Or, finally, we may want to reduce the temporal extent of all affected child rows so that the TRI constraint is re-established. For example, if a client is deleted effective September 2010, then any policy owned by that client must be deleted as of that same date. In asserted version tables, this means that if the most recent episode of that client is given an effective end date of Sep- tember 2010, then the most recent episode of the policy she owns must have an effective end date no later than September 2010. 4 4 To be completely accurate, this description would have to be modified a little to include situations in which there are future versions or episodes of that client or that policy. But we will leave those details for later. 110 Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING The mechanisms by which we check for and enforce tempo- ral referential integrity constraints are proprietary to Asserted Versioning, LLC. But one thing is obvious. At some point in the design process, the data modeler is going to have to declare TRI relationships; and from a record of those declarations, a mechanism will have the metadata needed to enforce both sides of the relationship. Bi-Temporal Tables and Non-Bi-Temporal Views The only physical tables managed by Asserted Versioning are bi-temporal tables. But on the basis of those bi-temporal tables, Asserted Versioning also manages two uni-temporal views, and one non-temporal view. Although we will frequently refer to them as views, sometimes it will be convenient to refer to them as tables instead. This is because we are accustomed to manag- ing the data they contain by means of physical tables. For example, in the preceding chapter, we reviewed several types of version tables. Asserted Versioning supports all the functionality of all those types of version tables, and important additional functionality as well. But version tables are not a distinct kind of physical table for which Asserted Versioning must define both schemas and the code to manage them. Instead, Asserted Versioning defines one bi-temporal schema, and manages only that one kind of physical table. Those tables support uni-temporal versions and uni-temporal assertions, but it supports them as views. And by means of views, those tables also support what appear to be conventional, non- temporal tables. The rows in uni-temporal version tables (views) have a single time period associated with them. This is the period from when the object, as it exists in the world, is as the row describes it to be, extending to 12/31/9999, or to when the object changes so that the description no longer applies to it, or to when the exis- tence of the object is no longer of concern to us. The rows in uni-temporal assertion tables (views) also have a single time period associated with them. This is the period from when they begin to be asserted as true statements, extending to either 12/31/9999 or to when they cease being asserted as true statements about what the objects they represent are currently like. Rows in uni-temporal assertion tables can only describe what the objects they represent are currently like because, by definition, such rows have no effective time. Having no effective Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING 111 time, like rows in conventional tables, the time in the life history of the objects they represent is the eternal Now(). In physical versioned tables, there is only one version for an object during any one period of time. It describes what the object was/is/will be like during that period of time. If we later find that the description is wrong, all we can do is overwrite it. We cannot create a second version about the same object during that same period of time. But when we overwrite it, we destroy the information that we once described that period of time in the life history of that object differently. In physical assertion tables, there is only one version for an object. It describes what the object is currently like, just as rows in conventional tables do. If something about the object changes, we update that version, just as we update conventional data. We overwrite the old data with the new data, destroying information about what the object used to be like. On the other hand, if we discover, not that the object has changed, but rather that our data is wrong, we do not overwrite the erroneous data. Instead, we create a new assertion about that object and, on the same clock tick that we enter that new, corrected assertion, we stop asserting the row with the erroneous data. 5 Of course, the rows in the physical tables that Asserted Versioning manages have both time periods. Each row is both a version and an assertion about that version. Assertion tables, as we said, are views on these physical tables. They are, as we will see, views of objects which are never represented by more than one version at a time. And version tables, by the same token, are also views on these physical tables. They are views of objects for which there is only one version in effect during any given period of time, that being the currently asserted description of what that object is like at that time. A conventional table view can also be defined on an asserted version bi-temporal table. This is a view of all and only those rows which are currently asserted and currently in effect. These rows in conventional table views have neither assertion nor effective time periods associated with them. Updates may either correct mistakes in the data, or reflect changes in the world. As with physical conventional tables, there is no way to look at the data and tell the difference. However, with a conventional table view, versions with future effective begin dates automati- cally roll into the view when that date arrives. Similarly, versions 5 If this description sounds unfamiliar, it is because IT professionals seldom use assertion tables. It would be like keeping a logfile of corrections inside a conventional table, and it just isn’t done. 112 Chapter 5 THE CORE CONCEPTS OF ASSERTED VERSIONING [...]... above it, graphics which position that data in both effective time and assertion time As well as additional diagrams which focus more on effective time than assertion time, we also present an in- line notation for single asserted Managing Time in Relational Databases Doi: 10.1016/B978-0-12-375041-9.00006-6 Copyright # 2010 Elsevier Inc All rights of reproduction in any form reserved 119 120 Chapter 6 DIAGRAMS... example, in saying that the effective begin date is the temporal delimiter of the start of an effective time period Eff-end Effective end date The date on which the business data in a row is no longer in effect In other words, the date which is one clock tick past the last date on which the business data in a row is in effect Asr-beg Assertion begin date The date on which the business data in a row... table, not just for those querying one As long as they do not specify either assertion time or effective time on their insert, update or delete transactions, it will seem to them that they are maintaining data in a conventional table This maintenance is not done by writing SQL transactions; it is done by making an API Call to our Asserted Versioning Framework.6 These are insert, update and delete transactions... be during a period of time in its life history There will be no cases in which any representation of an existence-dependent child object will exist in effective time unless a representation of its parent object also exists in that effective time Surrogate Keys and Bi-Temporal Match Logic In Chapter 7, we will see each of the three temporal transactions used in what we called the basic scenario In those... effective time period of the business data on that row; and (iii) a unique identifier of the assertion time period of that version In the Asserted Versioning implementation of bi-temporality, the effective and assertion begin dates each represent their full time periods in the table’s primary key But how can each of these dates represent an entire time period? For example, in Figure 6.1, the effective time. .. VERSIONING Glossary References Glossary entries whose definitions form strong interdependencies are grouped together in the following list The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter There will usually be several other, and often many other, glossary entries that are not included in. .. table, the primary key columns for the corresponding entity in the logical data model must be included as non-primary key columns in the asserted version table These columns make up what we call the business key of an asserted version table In order to be able to show sample tables across the width of a page, we have not included business keys in our examples Instead, we have used values for our surrogate... of writing queries directly against physical tables is essential With Asserted Versioning’s bi-temporal tables, the flexibility of writing queries directly against those tables is even more valuable As we will see in Chapters 12–14, the full range of all possible bi-temporal data is covered by nine categories of that data, those categories being the three-by-three combination of data which is in the... Mar10 Business Data Assertion Period Figure 6.1 The Asserted Versioning Table Row Creation Date Chapter 6 DIAGRAMS AND OTHER NOTATIONS description of what that object is like, beginning on that date As we have pointed out before, these columns which implement Asserted Versioning are all shown as date columns in this book, but they could as well be timestamps Also note that although this column heading is... eff_beg_dt We use hyphens in these column headings because underscores do not stand out as well, particularly when a column heading is underlined, as it is for all primary key columns When these columns names are used in the text, they will be italicized, although usually we will replace them with their full spelling equivalents For example, when referring, in the text, to the column heading eff-beg, we will . integrity constraints in relational theory are entity integrity, referential integrity and domain integrity. Entity integ- rity insures that the object. row in Y dependent on? Which row in X does the foreign key in Y point to? The answer is that the foreign key in Y does not point to any specific row in

Ngày đăng: 24/12/2013, 02:16

Từ khóa liên quan

Tài liệu cùng người dùng

Tài liệu liên quan