Document: Managing Time in Relational Databases - P15 (pptx)


are true statements, and beliefs that those statements are true statements.

Using the terminology of beliefs, we may say that the rows in tables in relational databases may relate data to time in any of nine ways. So where “thing” means, more precisely, “persistent object”, we can organize these nine relationships of rows to time as shown in Figure 12.1.

In Asserted Versioning, beliefs are what we assert by means of rows in our tables, and facts are what those rows describe about the objects they represent. Columns, in Figure 12.1, from left to right, represent past, present and future beliefs. Rows, in that same illustration, from top to bottom, represent past, present and future facts. Temporalized beliefs are represented by rows with assertion time periods. Temporalized facts are represented by rows with effective time periods, i.e. by versions.[2]

But temporal transactions cannot insert, update or delete all nine types of rows. Specifically, temporal transactions cannot insert, update or delete rows making statements about what we used to believe, statements of type (i), (ii) or (iii). It’s important to understand why this is so.

Temporal transactions create new rows in temporal tables. But these rows represent beliefs, and we can’t now make a statement about what we used to believe. On the other hand we can, of course, now make a statement about what used to be true. To understand what the two temporal dimensions of bi-temporal data really mean, we need to understand why distinctions like these are valid: why, in this case, we can make statements about how things used to be, but cannot make statements about what we used to think about them.

                               what we used    what we currently    what we will
                               to believe      believe              believe
what things used to be like    (i)             (iv)                 (vii)
what things are like           (ii)            (v)                  (viii)
what things will be like       (iii)           (vi)                 (ix)

(i) what we used to believe things used to be like
(ii) what we used to believe things are like now
(iii) what we used to believe things will be like
(iv) what we currently believe things used to be like
(v) what we currently believe things are like now
(vi) what we currently believe things will be like
(vii) what we will believe things used to be like
(viii) what we will believe things are like now
(ix) what we will believe things will be like

Figure 12.1 Facts, Beliefs and Time.

[2] Of course, since we cannot know the future, we cannot state with certainty either what the facts will be, or what we will believe. Instead, “what things will be like” should be taken as shorthand for “what things may turn out to be like”, and “what we will believe” should be taken as shorthand for “what we may come to believe”.

Chapter 12 DEFERRED ASSERTIONS AND OTHER PIPELINE DATASETS

So why can’t we? Surely we make statements about what we used to believe all the time. For example, we can now state that we used to believe that Bernie Madoff was an honest man. If we can make such statements in ordinary conversation, why can’t we make them as transactions that will update a database?

The reason is that in a database, as we said, a belief is expressed by the presence of a row in a table. No row, no belief. So if we write a transaction today that creates a row stating that we believed something yesterday, we are creating a row that states that we believed something at a time when there was no row to represent that belief.
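
The nine categories of Figure 12.1, and the rule that a transaction cannot create a row in past belief time, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: periods are taken to be closed-open pairs of dates, 12/31/9999 stands for "until further notice", and the names `position`, `category` and `creatable` are hypothetical helpers, not part of Asserted Versioning or the AVF.

```python
from datetime import date

END_OF_TIME = date(9999, 12, 31)  # stands in for the text's 12/31/9999
NUMERALS = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix"]


def position(begin, end, now):
    """Return 0, 1 or 2 for a past, current or future closed-open period."""
    if end <= now:
        return 0  # the period ended before now
    if begin > now:
        return 2  # the period has not started yet
    return 1      # now falls inside the period


def category(eff, asr, now):
    """Map an (effective, assertion) pair of periods to a numeral of Figure 12.1."""
    belief = position(asr[0], asr[1], now)  # column of the figure
    fact = position(eff[0], eff[1], now)    # row of the figure
    return NUMERALS[3 * belief + fact]


def creatable(asr_begin, now):
    """A new row cannot assert a belief held before the row existed."""
    return asr_begin >= now
```

With now set to January 2013, a row effective and asserted from August 2012 until further notice falls in category (v), while a row about May and June 2012 that is not asserted until January 2090 falls in category (vii).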
Given that the beliefs we are talking about are beliefs that certain statements about persistent objects are true, and given that those statements are the statements made by rows in tables, it would be a logical contradiction to state that we had such a belief at a point or period in time during which there was no row to represent that belief.[3]

This leaves us six combinations of beliefs and what they are about that we can, without logical contradiction, modify by means of a temporal transaction. Asserted Versioning recognizes all six combinations. But the standard temporal model does not permit data to be located in future belief time, and so it does not recognize combinations (vii), (viii) or (ix) as meaningful. It does not attempt to develop a data management framework within which we can make statements about what we may in the future believe.

Future beliefs, and their representation in temporal tables as not yet asserted rows, are precisely what make the difference between the assertion time dimension of Asserted Versioning and the transaction time dimension of the standard temporal model. Without them, the two temporal dimensions of Asserted Versioning are semantically equivalent to the two temporal dimensions of the standard temporal model. Without them, assertion time is equivalent to transaction time.

But is it valid to locate data in future belief time? After all, as we noted in a footnote a short while ago, we can be certain about what we once believed and about what we currently believe, but we cannot be certain about what we will believe.

On the other hand, a lack of certainty is not the same thing as a logical contradiction. There is nothing logically invalid about making statements about what we think was, is or may come to be true. By the same token, there is nothing logically invalid about making statements about what we currently believe or may come to believe was, is or may turn out to be true. The only logical contradiction is the one already noted: because of the temporalized extension of the CWA, it is a logical contradiction to create a row representing a statement about what, prior to the time the row was created, we then believed/asserted to be true.

We should now have a clear idea of what deferred transactions and deferred assertions are. They are the data in categories (vii), (viii) and (ix) of Figure 12.1. We understand that neither the standard temporal model nor, for that matter, any more recent computer science work on bi-temporality that we are aware of, recognizes data which represents what we are not yet willing to assert is true about what things were like, are like or may turn out to be like.

Before discussing deferred transactions and deferred assertions, we want to explain how they are one subtype of a more generalized concept, of something we call pipeline datasets. Once we have done that, the remainder of this chapter will focus on deferred transactions and deferred assertions, and the business value of internalizing them. Then, in the next chapter, we will look at several other kinds of pipeline datasets, and the business value of internalizing them as well.

[3] In fact, we offer this as a statement of what we will call the temporalized extension of the Closed World Assumption (CWA). All too briefly: the CWA is about the relationship of a collection of statements to the world. Its temporalized extension is about the relationship of beliefs (assertions, claims, etc.) to each of those statements.

The Internalization of Pipeline Datasets

We begin by introducing some new terminology. Dataset is an older technical term, and up to this point in the book, we have used it to refer to any physical collection of data. Going forward, we would like to narrow that definition a bit.
From now on, when we talk about datasets, we will mean physical files, tables, views or other managed objects in which the managed object itself represents a type and contains multiple managed objects, each of which represents an instance of that type. Thus, comma-delimited files are datasets, as are flat files, indexed files and relational tables themselves. A graphic image is not a dataset, in this narrower sense of the term, nor is a CLOB (a character large object).

Production datasets are datasets that contain production data. Production data is data that describes the objects and events of interest to the business. It is a semantic concept. Production databases are the collections of production datasets which the business recognizes as the official repositories of that data. Production databases consist of production tables, which are production datasets whose data is designated as always reliable and always available for use.

When production data is being worked on, it may reside in any number of production datasets, for example in those datasets we call batch transaction files, or transaction tables, or data staging areas. Once we’ve got the data just right, we use it to transform the production tables that are its targets. The transformation may be carried out by applying insert, update and delete transactions to the production tables. At other times, the transformation may be a merge of data we’ve been working on into those tables, or a replacement of some of the data in those tables with the data we’ve been working on.

When data is extracted from production tables, it has an intended destination. That destination may be another database or a business user, either of which may be internal to the business or external to it. Sometimes that data is delivered directly to its destination.
At other times, it must go through one or more intermediate stages in which various additional transformations are applied to it.

When first extracted from production tables, this data is usually said to be contained in query result sets. As that data moves farther away from its point of origin, and through additional transformations, the resulting production datasets tend to be called things like extracts. At its ultimate destinations, it is manifested as the content displayed on screens or in reports, or as data that has just been acquired by downstream organizations, perhaps to supply their own databases as datasets which tend to be called feeds.

Let’s make the metaphor underlying this description a little more explicit by using the concept of pipelines. Pipeline production datasets (pipeline datasets, for short) are points at which data comes to rest along the inflow pipelines whose termination points are production tables, or along the outflow pipelines whose points of origin are those same tables.

The points of origin of inflow pipelines may be external to the organization or internal to it; and the data that flows along these pipelines consists of the acquired or generated transactions that are going to update production tables. The termination points of outflow pipelines may also be either internal to the organization, or external to it; and we may think of the data that flows along these pipelines as the result sets of queries applied to those production tables.

There may be many points at which incoming production data comes to rest, for some period of time, prior to resuming its journey towards its target tables. Similarly, there may be many points at which outgoing data comes to rest, for some period of time, prior to continuing on to its ultimate destinations. These points at which production data comes to rest are the pipeline datasets.

But these points of rest, and the movement of data from one to another, exist in an environment in which that data is also at risk. The robust mechanisms with which DBMSs maintain the security and integrity of their production tables are not available to those pipeline datasets which exist outside the production database itself.

All in all, pipeline data flowing towards production tables would cost much less to manage, and would be managed to a higher standard of security and integrity, if that data could be moved immediately from its points of origin directly into the production tables which are its points of destination. Let’s see now if this is as far-fetched a notion as it may appear to be to many IT professionals. We will look at deferred transactions and deferred assertions in this chapter, and consider other pipeline datasets in the next chapter.

Deferred Assertions

We will discuss deferred transactions and deferred assertions, and how they work, by means of a series of scenarios in which deferred transactions are applied to sample data.

A Deferred Update to a Current Episode

We begin with an open episode of policy P861. As shown in Figure 12.2, the current version in this episode, P861(r4), has an [Aug 2012 – 12/31/9999] effective time period.[4] It also has an [Aug 2012 – 12/31/9999] assertion time period. From this, we know that there is no representation of this object anywhere else in the production table, in either temporal dimension, from August 2012 until further notice.

By now we should know how to read an asserted version table like this. The episode extends from an effective begin date of November 2011 to an effective end date of 12/31/9999. Every version in this episode is currently asserted.

Policy Table
Row #  oid   eff-beg  eff-end  asr-beg  asr-end  epis-beg  client  type  copay  row-crt
1      P861  Nov11    Mar12    Nov11    9999     Nov11     C882    ?     $20    Nov11
2      P861  Mar12    Apr12    Mar12    9999     Nov11     C882    ?     $50    Mar12
3      P861  Apr12    Aug12    Apr12    9999     Nov11     C882    HMO   $30    Apr12
4      P861  Aug12    9999     Aug12    9999     Nov11     C882    ?     $40    Aug12
[?: type value not recoverable from the source figure]

Figure 12.2 A Current Episode: Before the Deferred Assertion.

[4] The notation “P861(r4)” indicates row #4 in the referenced figure, in this case Figure 12.2. The policy identifier is not strictly necessary, and is included just to remind us which object we are talking about.

We will now submit a deferred temporal update. Again, we assume that it is now January 2013. That transaction looks like this:

    UPDATE Policy [P861,,, $55] May 2012, Jul 2012, Jan 2090

The three temporal parameters following the bracketed data are the effective begin date, effective end date and assertion begin date. All temporal updates discussed so far have accepted the default value for the assertion begin date, that value being Now(). Here, with our first deferred transaction, we override that default with a future date.

There are several things to note about this transaction. First of all, the object specified in this transaction is policy P861, and the transaction’s effective timespan is May 2012 to July 2012, i.e. the two months of May and June 2012. The assertion begin date is January 2090, a date which is several decades in the future.

The first thing the AVF does is to split one or more rows in the Policy table into multiple rows such that one or a contiguous set of those rows has the oid and the effective timespan specified on the transaction.
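
The splitting step can be sketched as follows. This is a minimal illustration, not the AVF's implementation: it assumes closed-open effective periods at month granularity, models only the oid, effective dates and copay of a version, and uses the hypothetical names `Version` and `split_for`. Assertion time is left out entirely.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Version:
    oid: str
    eff_beg: date
    eff_end: date  # exclusive: the period is [eff_beg, eff_end)
    copay: int


def split_for(version, span_beg, span_end):
    """Split one version at any transaction boundary that falls strictly inside it."""
    cuts = [version.eff_beg]
    for point in (span_beg, span_end):  # assumes span_beg < span_end
        if version.eff_beg < point < version.eff_end:
            cuts.append(point)
    cuts.append(version.eff_end)
    # Each adjacent pair of cut points becomes one piece with unchanged business data.
    return [replace(version, eff_beg=b, eff_end=e) for b, e in zip(cuts, cuts[1:])]
```

Applied to the version of P861 that is effective from April 2012 to August 2012, with the transaction's timespan of May 2012 to July 2012, the sketch yields three contiguous pieces. The middle piece has exactly the transaction's effective timespan, and all three carry the original copay.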
When a set of one or more contiguous asserted version rows, and a temporal transaction, have the same oid and also the same effective time period, we will say that they match. Since the transaction specifies an effective timespan of [May 2012 – July 2012], the AVF modifies the current assertions for P861 so that one version matches the transaction. That is P861(r6), as shown in Figure 12.3.

The split results in a set of rows that are semantically equivalent to the original row, those rows being P861(r5, r6 & r7). They cover the same effective time period as the original row; and they contain the same business data as the original row.

Policy Table
Row #  oid   eff-beg  eff-end  asr-beg  asr-end  epis-beg  client  type  copay  row-crt
1      P861  Nov11    Mar12    Nov11    9999     Nov11     C882    ?     $20    Nov11
2      P861  Mar12    Apr12    Mar12    9999     Nov11     C882    ?     $50    Mar12
<3>    P861  Apr12    Aug12    Apr12    Jan13    Nov11     C882    HMO   $30    Apr12
4      P861  Aug12    9999     Aug12    9999     Nov11     C882    ?     $40    Aug12
<5>    P861  Apr12    May12    Jan13    9999     Nov11     C882    HMO   $30    Jan13
<6>    P861  May12    Jul12    Jan13    9999     Nov11     C882    HMO   $30    Jan13
<7>    P861  Jul12    Aug12    Jan13    9999     Nov11     C882    HMO   $30    Jan13
[?: type value not recoverable from the source figure]

Figure 12.3 A Current Episode: Effective Time Alignment.

Note that, in Figure 12.3, we have not yet created the deferred assertion. We have just realigned version boundaries, within current assertion time, as a preliminary step to carrying out the update. Prior to this realignment, the effective timespan of the transaction was located [during] the effective time period of P861(r3). Now the effective timespan of the transaction [equals] the effective time period of P861(r6), and so the transaction matches that asserted version.

The result of this alignment is shown in Figure 12.3. P861(r3) has been withdrawn into past assertion time, into an assertion time period that ends on January 2013.
P861(r5, r6 & r7) have replaced it in current assertion time, in assertion time periods that begin on January 2013 (and not, let it be noted, on January 2090). Again, we use angle brackets on row numbers to indicate rows that are part of an atomic and isolated unit of work, a series of physical modifications to the database that must together all succeed or all fail, and a set of rows that are not visible in the database until the unit of work completes.

Note that P861(r5, r6 & r7) have the same episode begin date and the same business data as row 3. In addition, their three effective time periods cover exactly the same clock ticks as the withdrawn P861(r3). These three rows, together, are semantically equivalent to P861(r3). They represent the same object in exactly the same effective time clock ticks; and in every such clock tick, they attribute the same business data to that object.

Nor has the assertion time in the table been altered. Prior to this transaction, the statement made by P861(r3) was asserted from April 2012 to 12/31/9999. Midway into the transaction, at the point shown in Figure 12.3, the table still asserts that from April 2012 to 12/31/9999, P861 was owned by client C882, was an HMO policy, and had a copay of $30. It asserts this because the statement made by the logical conjunction of P861(r5, r6 & r7) is truth-functionally equivalent to the statement made by P861(r3), and the assertion times of [Apr 2012 – Jan 2013] and [Jan 2013 – 12/31/9999] [meet] and, together, [equal] the original assertion time of P861(r3), before it was withdrawn. At this point in the transaction, we have performed syntactic surgery on the target table, but have in no way altered its semantic content.

There is now one and only one row in the target table that matches the transaction. It is P861(r6). The AVF next withdraws P861(r6), moving it into closed assertion time, i.e. giving it an assertion time period with a non-12/31/9999 assertion end date.
It does so by giving P861(r6) an assertion end date that matches the assertion begin date on the transaction, thus preserving the assertion time continuity of this effective time history of P861.

The next thing the AVF does is to make a copy of P861(r6), apply the copay update to that copy, and give it an assertion time period of [Jan 2090 – 12/31/9999]. This becomes P861(r8), the row that supersedes row 6. This row is the deferred assertion. The result is shown in Figure 12.4.

Note that the closed assertion, P861(r6), is still current. It is currently January 2013, and so Now() still falls between the assertion begin and end dates of P861(r6), and will continue to do so until January 2090. So a closed assertion time period is one with a non-12/31/9999 end date. Some closed assertion time periods are past; they are no longer asserted. But others are current, like this one. And yet others may be assertion time periods that lie entirely in the future.

Note that this process is almost identical to the familiar process of withdrawing a version into past assertion time and superseding it with a row in current assertion time. The only difference is that the withdrawn assertion is moved into closed but still current assertion time, and the superseding assertion is placed into future assertion time.

At this point, both P861(r3 & r6) are locked. The AVF will never modify P861(r3) because it is already located in past assertion time. But P861(r6) is also locked, even though it is still currently asserted. The AVF treats any row with a non-12/31/9999 assertion end date as locked. The reason all such rows are locked, including those whose assertion time periods are not yet past, is that the database contains a later assertion which otherwise matches the locked assertion. In this case, P861(r6) is locked because the Policy table now contains a later assertion that was created from it.
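
The withdraw-and-supersede step, together with the locking rule, can be sketched like this. It is an illustration under assumptions rather than the AVF's own code: rows are modelled as immutable records with closed-open periods, 12/31/9999 marks an open assertion, and `Row`, `is_locked`, `currently_asserted` and `defer_update` are hypothetical names.

```python
from dataclasses import dataclass, replace
from datetime import date

END_OF_TIME = date(9999, 12, 31)  # stands in for the text's 12/31/9999


@dataclass(frozen=True)
class Row:
    oid: str
    eff_beg: date
    eff_end: date
    asr_beg: date
    asr_end: date
    copay: int


def is_locked(row):
    """Any row with a non-12/31/9999 assertion end date is treated as locked."""
    return row.asr_end != END_OF_TIME


def currently_asserted(row, now):
    """True when now falls inside the row's closed-open assertion period."""
    return row.asr_beg <= now < row.asr_end


def defer_update(row, new_copay, asr_beg):
    """Close the matching row at asr_beg and supersede it with a deferred copy."""
    if is_locked(row):
        raise ValueError("locked rows cannot be the target of an update")
    withdrawn = replace(row, asr_end=asr_beg)
    deferred = replace(row, copay=new_copay, asr_beg=asr_beg, asr_end=END_OF_TIME)
    return withdrawn, deferred
```

Calling `defer_update` on the row for May and June 2012 with a copay of 55 and an assertion begin date of January 2090 returns a withdrawn row that is locked yet still currently asserted in January 2013, and a deferred row that is unlocked but not yet asserted.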
The later assertion, P861(r8), was presumably written and submitted based on then-current knowledge of the contents of the database, specifically of what the database then asserted about what P861 was like in May and June of 2012. If that description is allowed to change before the later assertion becomes current, then all bets are off.

Policy Table
Row #  oid   eff-beg  eff-end  asr-beg  asr-end  epis-beg  client  type  copay  row-crt
1      P861  Nov11    Mar12    Nov11    9999     Nov11     C882    ?     $20    Nov11
2      P861  Mar12    Apr12    Mar12    9999     Nov11     C882    ?     $50    Mar12
<3>    P861  Apr12    Aug12    Apr12    Jan13    Nov11     C882    HMO   $30    Apr12
4      P861  Aug12    9999     Aug12    9999     Nov11     C882    ?     $40    Aug12
<5>    P861  Apr12    May12    Jan13    9999     Nov11     C882    HMO   $30    Jan13
<6>    P861  May12    Jul12    Jan13    Jan90    Nov11     C882    HMO   $30    Jan13
<7>    P861  Jul12    Aug12    Jan13    9999     Nov11     C882    HMO   $30    Jan13
<8>    P861  May12    Jul12    Jan90    9999     Nov11     C882    HMO   $55    Jan13
[?: type value not recoverable from the source figure]

Figure 12.4 Withdrawing a Current Assertion into Closed Assertion Time, and Superseding It.

Another way to think about the locking associated with deferred transactions and deferred assertions is that it serializes those transactions. If a process about to update a row in a database does not first lock that row from other updates, then another update process could read the row before the first process is complete. Then, whichever process physically updates that row on the database first, its changes will be lost, overwritten by the changes made by the process which updates the database last. This could happen with deferred assertions if they were not serialized.

The mechanics of deferred assertion locking are simple. Every temporal transaction has an assertion begin date, either the default date of Now() or an explicitly supplied future date. Temporal updates and temporal deletes begin their work by withdrawing the one or more versions which represent an object in any clock ticks included in the transaction’s effective timespan.
The versions they withdraw are those versions located in the most recent period of assertion time. That may be current assertion time, and usually is. But when a deferred transaction has been applied to versions in current assertion time, it closes their assertion periods with the same date that begins the assertion period of the deferred assertion it creates, just as the deferred update we are discussing closed P861(r6) and superseded it with P861(r8). And it creates a version that exists in future assertion time. Deferred transactions may then be applied to that deferred assertion, and we will explain how to do that in the next section.

Note what is not locked. The episode itself is not locked. Out of the entire currently asserted effective time period from November 2011 to 12/31/9999, for P861, only two months have been locked. Inserts, updates and deletes can continue to take place against any of the other clock ticks in the episode occupied by P861, or, for that matter, against any clock ticks not occupied by P861.

We have now completed the deferred transaction. As directed by the transaction, the AVF has created a version of P861, for the effective time months of May and June 2012, that will not be asserted until January 2090. If nothing happens between now and January 2090, then at that time, the database will stop asserting that P861 had a copay amount of $30 in May and June of 2012, and begin asserting, instead, that it had a copay amount of $55 during those two long-ago months.

A Deferred Update to a Deferred Assertion

Now we have a deferred assertion. Next, let’s consider an update which will apply to that deferred assertion. This transaction takes place in February 2013.
    UPDATE Policy [P861,,, $50] May 2012, Jun 2012, Jan 2090

Apparently, sometime in the month after the first deferred update, we decided that the copay should have been increased to $50, not to $55, for the month of May 2012.

To process this second deferred update, the AVF begins its work by looking for versions already in the target table, with the same oid, whose effective time periods [intersect] the effective timespan specified on the transaction. It ignores past assertions, because database modifications neither affect past assertions nor are affected by them. The effective timespan for P861 that the AVF is looking for is [May 2012 – Jun 2012].

The AVF finds two rows, P861(r6 & r8) (as shown in Figure 12.4), whose effective time periods include the timespan on the transaction. Both rows have the same oid as the transaction, and both include the effective-time clock tick of May 2012. P861(r6), however, is locked because there is a later assertion about the same object that includes all its effective time clock ticks.

It is P861(r8) that is the latest assertion which has an effective time period that [intersects] that of the transaction.[5] That row’s time period, to be more precise, [starts⁻¹] the effective time period on the transaction. So the target of the deferred update must be P861(r8). It is the latest, i.e. future-most, assertion about the month of May 2012, in the life of P861.

Next, because P861(r8) includes June as well as May, the first thing the AVF does is to split that row to create a semantically [...]

[5] As we said in Chapter 3, we will refer to Allen relationships by using the relationship name enclosed in brackets. And as we said in Chapter 9, we will refer to temporal extent state transformations by using the transformation name enclosed in braces. In both cases, when we refer to non-leaf nodes in either taxonomy, we will underline the name. Thus we can say that one time period [meets] another, or that one time period [intersects] another. We italicize the Allen relationship name equals, as we explained in Chapter 3, to mark the fact that, unlike all other Allen relationships, it has no distinct inverse.

... assertion time is a physical update, for which there is no corresponding date. For another thing, since a row in empty assertion time never was asserted, and never will be asserted, what information does it contain that would justify retaining it in the database? Well, in fact, a row in empty assertion time is informative. The information it contains is information about an intention. At one point in time, ...

... point in time, i.e. in either May or June 2012. In other words, P861(r8) can’t be allowed to remain in future assertion time because it would then be a TEI conflict waiting to happen. This is why the AVF moved it into empty assertion time. This is the semantically correct thing to do. With P861(r9 & r10) now in the database, which together match P861(r8), and with both being in yet-to-come assertion time, ...

... Creating P861(r9 & r10) is a preparatory move made by the AVF, to isolate a single deferred assertion that will match the update transaction. So P861(r8) was the correct one to go. Having nowhere in past assertion time to go, and obviously not belonging in current assertion time, it went to the only place it could go: into non-asserted time, i.e. into empty assertion time. A row in empty...

... P861(r6) exists in a closed period of assertion time, it can, and indeed in this case must, be overridden. So rather than thinking of the approval transaction as changing the assertion begin date on one or more deferred assertions, we should think of it as changing the hand-over clock tick between locked assertions and the deferred assertions that are being moved backwards in assertion time. The approval...
... cousins, lock matching assertions that were already in the database at the time those transactions were carried out. It locks them by giving them a non-12/31/9999 assertion end date. In the case of a non-deferred update or delete, these locked assertions exist in past assertion time. But in the case of a deferred transaction, the locked assertions remain in current assertion time, and their assertion time...

... AVF has created the two rows P861(r9 & r10). P861(r8) has been withdrawn into closed assertion time, but that assertion time is neither past nor present assertion time. It is empty assertion time, because the time period [Jan 2090 – Jan 2090] includes no clock ticks, not a single one.

Reflections on Empty Assertion Time

In all our dealings with temporal transactions, the assertion date specified on the transaction...

... information about an intention. At one point in time, we apparently intended that the business data on that row would one day be asserted. Perhaps we intended to deceive someone with that business data. In that case, that row is a record of an intent to deceive. By retaining the row, we retain a record of that intent. Non-deferred transactions are always against currently asserted versions which have a 12/31/9999...

... backwards in assertion time, those approved assertions override any locked matching assertions. In overriding them, it “sets them to naught” almost literally, by setting their assertion end dates to match their assertion begin dates, thus moving them into empty assertion time. But there is one last issue to deal with. We have emphasized that semantic constraints do not exist across assertion time periods. But...
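
The idea of empty assertion time in the excerpts above can be made concrete with a small sketch. It assumes month-long clock ticks (the granularity the examples use), closed-open periods, and dates that fall on the first of a month. The function names are hypothetical and are not taken from the AVF.

```python
from datetime import date


def clock_ticks(begin, end):
    """Count the month-long clock ticks in the closed-open period [begin, end)."""
    return max(0, (end.year - begin.year) * 12 + (end.month - begin.month))


def is_empty(begin, end):
    """A period is empty when it contains no clock ticks at all."""
    return clock_ticks(begin, end) == 0


def move_to_empty(asr_beg, asr_end):
    """Set the assertion end date to the assertion begin date."""
    return (asr_beg, asr_beg)
```

Under these assumptions the period [Jan 2090 – Jan 2090] contains zero clock ticks, while [May 2012 – Jul 2012] contains two, May and June.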
... managed object is moved backwards into an earlier period of assertion time, one which begins before the assertion time period containing its parent managed object, then the TRI relationship between them will be broken. The assertion time movement will make the child managed object a referential “orphan” until the passage of time reaches the beginning of the assertion time period of the parent managed...

... that, in all cases, the business is willing to wait for these assertions to fall into currency, i.e. to become current not because of some explicit action, but rather when the passage of time reaches their begin dates. Deferred assertions may be created in near future assertion time, or moved to it from far future assertion time when the business approves of those assertions becoming production data. Deferred ...

Date posted: 21/01/2014, 08:20
