Tài liệu Managing time in relational databases- P17 doc

THÔNG TIN TÀI LIỆU

Thông tin cơ bản

Định dạng
Số trang	20
Dung lượng	390,31 KB

Nội dung

Pending Updates exist in what we called, in the previous chapter, either the assertion-time near future or the assertion- time far future. Those in the near future have an assertion begin date close enough to Now() that the business is willing to let the passage of time make them current. Near future deferred assertions would typically have a begin date that will become current in the next few seconds, hours, days or weeks. In a conventional database, pending updates are transactions accumulated in an external batch transaction file, or perhaps in a batch transaction table within the database. Far future deferred assertions are the internalization of data located in what are often called staging areas. They are collections of data that are usually more complicated than usual to update. By placing them in far future assertion time, we guarantee that they will not inadvertently become current assertions simply because of the passage of time. They can become current assertions only when, presumably after a review-and-approve process, the business releases them into near-future assertion time. Pending Projections: Future Claims About the Future The Pending Projections dataset consists of all those rows in an asserted version table which lie in both the assertion time future and in the effective time future. Its subject matter is things as they may turn out to be. Its rows are claims about what currently lies in the future, but claims which we are not yet willing to make. Pending Projections are a record of what we may eventually be willing to say things are going to be like. Here is the view which re-presents Pending Projections. With the suffix “Pend_Proj” standing for “pending projections”, it looks like this: CREATE VIEW Policy_Pend_Proj AS SELECT oid, asr_beg_dt, asr_end_dt, eff_beg_dt, eff_end_dt, client, type, copay what things used to be like what we used to claim what we currently claim what we will claim things will be like what we will claim what things are like what things will be like Figure 13.12 Pending Projections. 306 Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS FROM Policy_AV WHERE asr_beg_dt > Now() AND eff_beg_dt > Now() As we have seen with our other re-presented pipeline datasets, Pending Projections include both the assertion and effective time period as part of the unique identifier because both temporal dimensions are specified as ranges, and neither as points in time. Mirror Images of the Nine-Fold Way As we said in Chapter 9, effective time exi sts within assertion time. First, logically speaking, we make a state ment about how things are. Next, logically speaking, we make a truth claim about that statement. Most of our queries against bi-temporal tables will specify a point in assertion time—most commonly Now()—and then ask for rows asserted at that point in time that were in effect at some point or period of effective time. For example, we might ask for all policies that were in effect on August 23, 2008, as we currently believe them to hav e been. Or we might ask for all policies which we currently claim were in effect any time in the first half of 2008. Pinning down a point in assertion time, and then asking for versions of objects claimed at that point in time to be correct, is the general form that queries will take when posed by business users. But we can look at bi-temporal data from the opposite point of view as well. We can pin down a point or period in effective time, and ask for everything we ever asserted about things at that point in time. It would not be too misleading to call this the auditor’s point of view. From this point of view, we are interested in the history of our claims about what is true, not in the history of what actually happened out there in the world. Of course, we could also ask for all future assertions about a given point in effective time. But auditors, by the nature of their work, have little interest in future assertions. By the same token, they are very interested in past assertions, along with current ones. So an auditor’s mirror- image of the nine categories reduces to a set of six categories, those shown in Figure 13.13. These views that auditors are interested in are physically the same ones we have already described. The “mirror-image” is in perspective, not in content. Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS 307 The Value of Internalizing Pipeline Datasets The cost of managing physical pipeline datasets is high. This cost is seldom discussed because it is universally thought to be just an inevitable cost of doing business. Bringing down this cost is a matter of doing all those various things that IT management has done for decades, and continues to do. Quality control pro- cedures are put in place so errors don’t creep into our databases and later have to be backed out. The platform costs of storing, transforming, and moving data into and out of pipeline datasets are controlled by minimizing redundancy, and by moving datasets up and down the storage hierarchy. Software that sets up and runs production schedules minimizes the human costs of scheduling work involving these pipeline datasets. But the work of managing pipeline datasets is tedious. And whenever the management of these datasets is a one-off kind of thing, i.e. whenever the development group has to manage these datasets rather than the IT Operations group that handles scheduled maintenance, errors in managing them are not uncommon. Asserted Versioning does not offer a way to more efficiently manage pipel ine datasets. It offers a way to eliminate them and, consequently, eliminate the totality of their management costs! There will always be some circumstances in which data must be manipulated in external pipeline datasets. But these can become the exception rather than the rule. In place of these pipeline datasets, Asserted Versioning stores the information contained in those pipeline datasets internally, within the production tables that are their sources and destinations. Pending transactions can be stored within the production tables themselves. Posted transactions can be, too. Data staging areas can also exist as semantically distinct sets of rows, physically contained within production tables. Pipeline datasets, then, cease to exist as distinct physical objects. They become virtualized, as semantically distinct collections of rows what things used to be like what we used to claim what we currently claim what we used to claim things used to be like what we currently claim things used to be like what we currently claim things are like now what we currently claim things will be like what we used to claim things are like now what we used to claim things will be like what things are like what things will be like Figure 13.13 The Auditor’s Mirror Image of the Nine-Fold Way. 308 Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS all physically existing within the same tables, re-presented in different views. We may think that the principal cost elimination benefit of internalizing pipeline datasets is that it reduces the number of distinct datasets that programs, SQL and production scheduling software have to identify and manage. This is a reduction in the cost of the mechanics of pipeline datasets. Instead of assembling data from multiple tables, it already exists all in one place. But the more significant cost reduction has to do with the semantics of pipeline datasets. With all data about the same things in the same place, we will, all of us, find all of it when we go looking for it. The most junior member of the business community will find the same set of data for his queries that the most senior member does. There won’t be differences in completeness of the source data, or quality of that data, as there so often are in today’s business world and today’s collections of business data. When we need any of this data, we won’t have to go looking for it. All of the data about what we once thought was true, or what we currently think is true, or what we are not yet willing to assert is true, will be available by simply changing the assertion point- in-time selection criterion on views and queries. By changing that predicate in a WHERE clause to a past point in assertion time, we will be able to access the internalized re-presentation of posted transactions. By changing the predicate to a future point in time, we will be able to access the internalized re-presentation of pending transactions. By the same token, we will be able to access historical data about what things used to be like from the same table that contains data about what they are like right now, and that may also contain data about what those things are going to be like sometime in the future. Again, it will be as easy as changing a predicate in a WHERE clause. Glossary References Glossary entries whose definitions form strong inter- dependencies are grouped together in the following list. The same glossar y entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be sev- eral other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfami liar term is encountered. Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS 309 We note, in particular, that none of the nine types of pipeline dataset are included in this list. In general, we leave category sets out of these lists, but recommend that the reader look them up in the Glossary. as-is as-was Asserted Versioning Framework (AVF) assertion time statement conventional table non-temporal table deferred assertion far future assertion time near future assertion time instance type managed object object oid queryable object pipeline dataset inflow pipeline outflow pipeline internalization of pipeline datasets re-presentation of pipeline datasets production database production table temporal dimension temporal transaction version 310 Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS 14 ALLEN RELATIONSHIP AND OTHER QUERIES Allen Relationship Queries 313 Time Period to Time Period Queries 316 Point in Time to Period of Time Queries 333 Point in Time to Point in Time Queries 341 A Claims Processing Example 343 In Other Words 346 Glossary References 347 In this chapter, we examine each of the thirteen Allen relation- shi ps, as well as each non-leaf node in the taxonomy of Allen relationships which we introduced in Chapter 3. We describe the Allen relationships as they hold between two time periods, between a time period and a point in time, and also between two points in time. We show how these relationships are expressed in terms of time periods represented with the closedopen convention, and we provide a sample query for each one. After a section in which we illustrate how much simpler these queries would be to express if we had a PERIOD datatype, we conclude this chapter by discussing queries which involve temporal joins. Figure 14.1 sho ws our tax onomy of the Allen relationships. Those relationships are the leaf nodes in this taxonomy. Every leaf node has an inverse relationship, except the [equals] relationship. We italicize that relationship name to emphasize that it has no inverse. So counting the [ equals] relationship, and the six leaf nodes and their inverses, we have the full set of thirteen Allen relationships. We also underline the non-leaf node relationships in the taxonomy, to emphasize that they are relationships we have defined, and are not one of the Allen relationships. Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00014-5 Copyright # 2010 Elsevier Inc. All rights of reproduction in any form reserved. 311 Many of the Allen relationships are used by the AVF to enforce TEI and TRI. For example, as we pointed out in Chapter 3, the [ intersects] relationship is important because it defines TEI. If two asserted versions of the same object share even a single effective time clock tick, within shared assertion time, then they [ intersect], and violate TEI. Otherwise, they don’t. The [fills] relationship is important because it defines TRI. If a TRI relationship fails, it is because there is no episode of the referenced parent object which temporally includes, i.e. [ fills À1 ], that of the child version. The [before] relationship is important because it distinguishes episodes from one another. Every episode of an object is non-contiguous with every other episode of the same object, and so one of them must be [before] the other. As for queries issued by business users, we have found that many ad hoc queries, and perhaps the majority of them, are queries about episodes, not about versions. That is, they are queries that want (i) the begin and end date of the episode and, for business data, (ii) the last version of past episodes, the current version of current episodes, or the latest version of future episodes. Because of the importance of episodes to queries, the SQL examples in this chapter will select episodes. The last, current or latest version contains the business data. The episode Intersects Excludes | | | | Before | | | | Meets | | | Equals | | | | Occupies Starts Finishes During | | | | Aligns | | | | | | | | Time Period Relationships Along a Common Timeline Fills Overlaps Figure 14.1 The Asserted Versioning Allen Relationship Taxonomy. 312 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES begin date that is on every version, and the version’s own effective end date, provide the effective time period of the episode itself. We also note that the SQL in many of the following examples does not represent typical queries that a business would write. Each of these queries focuses on one specific Allen relationship, and show how to express it in SQL. In particular, these sample queries do not include typical join criteria. Instead, the only join criteria used in these examples are two time periods and the Allen relationship between them. Another reason these sample queries don’t look very real world is that they select from two of the tables in our sample database that don’t have much to do with one another. In particular, there is no TRI relationship between them. They are the Policy and Wellness Program tables. If we had used, for example, the Client and Policy tables instead, many of the queries would have been more realistic. But TRI-related tables cannot illustrate all of the Allen relationships. In fact, every instance of a TRI relationship involves a parent and a child time period that is an instance of one of seven of the Allen relationships. This leaves six other Allen relationships that TRI-related tables cannot illustrate. Nevertheless, as overly simple and unrealistic as most of these samp le queries may be, they are the foundation for all queries that express temporal relationships. No query will ever need to express a temporal relationship that is not one of these relationships. So if we know how to write the temporal predicates in these queries, we will know how to write any temporal predicate for any query. Allen Relationship Queries The value of reviewing all the Allen relationships in terms of queries against asserted version tables is that, as we already know, the Allen relationships are exhaustive. There are no posi- tional relationships along a common timeline, among time periods and/or points in time, other than those ones. Thus, by showing how to write a query for each one of them, as well as for the groups of them identified in our taxonomy, we will have provided the basic material out of which any query against any assertion version table may be expressed. In addition to the thirteen Allen relationships themselves, our taxonomy provides five additional relationships, each of which is a logical combination of two or more Allen relationships. And these combinations are not formed simply by stringing together Allen relationships with OR predicates. Although they Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES 313 are, necessarily, logically equivalent to the OR’d set of those relationships, they are often much simpler expressions, easier to understand and faster when executed. In these sample queries, we will not include predicates for assertion time, and will pretend that our sample tables are uni-temporal versioned tables. This eliminates unnecessary detail from these examples. We will do this by using two version table views, shown below: V_Wellness_Program_Curr_Asr and V_Policy_Curr_Asr. The former is a view of all currently asserted Wellness Program versions. The latter is a view of all currently asserted Policy versions. CREATE VIEW V_Wellness_Program_Curr_Asr AS SELECT * FROM Wellness_Program_AV WHERE asr_beg_dt <¼ Now() AND asr_end_dt > Now() CREATE VIEW V_ Policy _Curr_Asr AS SELECT * FROM Policy_AV WHERE asr_beg_dt <¼ Now() AND asr_end_dt > Now() In these example queries, as we said before, we will be selecting episodes, not versions. For the two tables used in this chapter, these are the views which provide episodes as queryable managed objects: CREATE VIEW V_Wellness_Program_Epis AS SELECT wp.wellpgm_oid, wp.epis_beg_dt, wp.eff_end_dt AS epis_end_dt, wp.welllpgm_nm, wp.wellpgm_nbr, wp.wellpgm_cat_cd FROM V_Wellness_Program_Curr_Asr AS wp WHERE wp.eff_end_dt ¼ (SELECT MAX(wpx.eff_end_dt) FROM V_Wellness_Program_Curr_Asr AS wpx WHERE wpx.wellpgm_oid ¼ wp.wellpgm_oid AND wpx.epis_beg_dt ¼ wp.epis_beg_dt) CREATE VIEW V_Policy_Epis AS SELECT pol.policy_oid, pol.epis_beg_dt, pol.eff_end_dt AS epis_end_dt, pol.policy_type, pol.copay_amt, pol.client_oid, pol.policy_nbr FROM V_Policy_Curr_Asr AS pol WHERE pol.eff_end_dt ¼ (SELECT MAX(px.eff_end_dt) FROM V_Policy_Curr_Asr px WHERE px.policy_oid ¼ pol.policy_oid AND px.epis_beg_dt ¼ pol.epis_beg_dt) 314 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES These episode views are the query-side work of defining an episode datatype. The AVF presents episode s as maintainable managed objects. These views present episodes as queryable managed objects. This is a ver y important point. Both computer science research and IT practice have shown the importance of the concept of a string of one or more contiguous clock ticks with a known location in time. SQL does not directly support this concept; and so instead, we, and others, have to write code to exclude gaps and overlaps occurring in the timespan between a pair of date s or timestamps. A PERIOD datatype is the direct support needed for this concept. This datatype implements this concept at the correct level of abstraction. By the same token, our own research and practice has shown the importance of the concept of an episode, a string of one or more contiguous and non-overlapping versions of the same object. Without that concept, and the concepts of objects and versions on which it depends, there is also no concept of temporal entity integrity and temporal referential integrity. Without that concept, collections of rows are defined, as needed, within each SQL statement. As we can see with both the standard and alternative temporal models, their SQL insert, update and delete statements do result in bi-temporal data that satisfies what we call TEI and TRI. Their SQL queries do find episodes, when they need them, past assertions when they need them, and so on. But the level of abstraction is wrong, for the same reason that getting the same results with a pair of dates that one would get with a PERIOD datatype is wrong. So we now have two views which externalize, as queryable managed objects, the best data we currently have (i.e. our currently asserted data) about policy episodes and wellness program episodes. Now, using these two views, we will define another view that we will use to illustrate each of the Allen relationships. This is the view V_Allen_Example. This view will keep the examples as small and easy to understand as possi- ble, eliminating all extraneous and repetitive detail while focus- ing on the Allen relationships themselves. Here is the V_Allen_Example view: CREATE VIEW V_Allen_Example AS SELECT wp.wellpgm_oid, pol.policy_oid, wp.epis_beg_dt AS wp_epis_beg_dt, wp.epis_end_dt AS wp_epis_end_dt, pol.epis_beg_dt AS pol_epis_beg_dt, pol.epis_end_dt AS pol_epis_end_dt Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES 315 [...]... relationships which includes the [starts], [startsÀ1], [finishes], [finishesÀ1], [during] and [duringÀ1] relationships In other words, it combines the [during] relationships with the [aligns] relationships These are all the relationships in which one time period includes all the clock ticks that are in the other time period, but also includes at least one additional clock tick In the non-superscripted... limiting case of a time period that includes only a single clock tick P1 [starts] P2 This is a pair of relationships, one the inverse of the other.1 In the non-superscripted relationship, the first time period is the shorter one Figure 14.2 shows this relationship, and its place in our taxonomy The two dashed lines in the illustration graphically represent P1 and P2, with P1 being the upper dashed line... that is in P1 is also in P2, but that there is at least one clock tick in P2 that is not also in P1 321 322 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES Time Period Relationships Along a Common Timeline Intersects Fills Occupies During Aligns Starts Finishes | -| | | | -| | | | -| | -| Figure 14.6 P1 [occupies] P2 The inverse of this relationship is: P1 [occupiesÀ1] P2 In the superscripted... wp_epis_end_dt) P1 [during] P2 This is a pair of relationships, one the inverse of the other In the non-superscripted relationship, the first time period is the shorter one Time Period Relationships Along a Common Timeline Intersects Fills Occupies During | -| | | Figure 14.5 P1 [during] P2 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES The predicate for this relationship, as it holds between two time periods... a join predicate This is so we can see how the Allen relationships correlate the two tables using dates Time Period to Time Period Queries Allen relationships may relate two time periods, or a time period to a point in time, or two points in time We will represent time periods, using our closed-open notation, as follows: Period 1: [eff_beg_dt1 – eff_end_dt1] Referred to as P1 Period 2: [eff_beg_dt2... [finishes] P2 This is a pair of relationships, one the inverse of the other In the non-superscripted relationship, the first time period is the shorter one Figure 14.3 shows this relationship, and its place 317 318 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES Time Period Relationships Along a Common Timeline Intersects Fills Occupies Aligns Finishes | -| | -| Figure 14.3 P1 [finishes] P2 in. .. wp_epis_end_dt Time Period Relationships Along a Common Timeline Intersects Fills Equals | -| | -| Figure 14.7 P1 [equals] P2 323 324 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES P1 [fills] P2 This not an Allen relationship It is the node in our taxonomy of Allen relationships which includes the [starts], [startsÀ1], [finishes], [finishesÀ1], [during], [duringÀ1] and [equals] relationships In other... eff_end_dt2)) Time Period Relationships Along a Common Timeline Intersects Fills Occupies Aligns Starts | -| | | Figure 14.4 P1 [aligns] P2 Finishes | -| | -| 319 320 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES It says that P1 and P2 either start or end at the same time, but do not both start and end at the same time This relationship has no inverse Consider the following request for information:... eff_end_dt2) Time Period Relationships Along a Common Timeline Intersects Fills Equals Occupies | -| | -| During Aligns Starts | -| | | Figure 14.8 P1 [fills] P2 Finishes | -| | -| | -| | | Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES It says that P1 doesn’t start after P2 and doesn’t end before P2 The idea behind it is that it includes every relationship in which every clock tick in P2... are standard They were provided by Allen in his original article, and continue to be used throughout the computer science literature Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES Time Period Relationships Along a Common Timeline Intersects Fills Occupies Aligns Starts | -| | | Figure 14.2 P1 [starts] P2 two time periods expressed as pairs of dates using the closedopen convention, is: (eff_beg_dt1 . effect any time in the first half of 2008. Pinning down a point in assertion time, and then asking for versions of objects claimed at that point in time to. Period Queries 316 Point in Time to Period of Time Queries 333 Point in Time to Point in Time Queries 341 A Claims Processing Example 343 In Other Words 346 Glossary

Ngày đăng: 21/01/2014, 08:20

Xem thêm