Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống
1
/ 20 trang
THÔNG TIN TÀI LIỆU
Thông tin cơ bản
Định dạng
Số trang
20
Dung lượng
390,31 KB
Nội dung
Pending Updates exist in what we called, in the previous
chapter, either the assertion-time near future or the assertion-
time far future. Those in the near future have an assertion
begin date close enough to Now() that the business is willing
to let the passage of time make them current. Near future
deferred assertions would typically have a begin date that will
become current in the next few seconds, hours, days or weeks.
In a conventional database, pending updates are transactions
accumulated in an external batch transaction file, or perhaps
in a batch transaction table within the database.
Far future deferred assertions are the internalization of data
located in what are often called staging areas. They are collections
of data that are usually more complicated than usual to update. By
placing them in far future assertion time, we guarantee that they
will not inadvertently become current assertions simply because
of the passage of time. They can become current assertions only
when, presumably after a review-and-approve process, the busi-
ness releases them into near-future assertion time.
Pending Projections: Future Claims About the Future
The Pending Projections dataset consists of all those rows in
an asserted version table which lie in both the assertion time
future and in the effective time future. Its subject matter is
things as they may turn out to be. Its rows are claims about what
currently lies in the future, but claims which we are not yet
willing to make. Pending Projections are a record of what we
may eventually be willing to say things are going to be like.
Here is the view which re-presents Pending Projections. With
the suffix “Pend_Proj” standing for “pending projections”, it
looks like this:
CREATE VIEW Policy_Pend_Proj
AS SELECT oid, asr_beg_dt, asr_end_dt, eff_beg_dt, eff_end_dt,
client, type, copay
what things
used to be like
what we used to claim what we currently claim
what we will claim things
will be like
what we will claim
what things
are like
what things
will be like
Figure 13.12 Pending Projections.
306 Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS
FROM Policy_AV
WHERE asr_beg_dt > Now()
AND eff_beg_dt > Now()
As we have seen with our other re-presented pipeline
datasets, Pending Projections include both the assertion and
effective time period as part of the unique identifier because
both temporal dimensions are specified as ranges, and neither
as points in time.
Mirror Images of the Nine-Fold Way
As we said in Chapter 9, effective time exi sts within assertion
time. First, logically speaking, we make a state ment about how
things are. Next, logically speaking, we make a truth claim about
that statement.
Most of our queries against bi-temporal tables will specify a
point in assertion time—most commonly Now()—and then ask
for rows asserted at that point intime that were in effect at
some point or period of effective time. For example, we might
ask for all policies that were in effect on August 23, 2008, as we
currently believe them to hav e been. Or we might ask for all
policies which we currently claim were in effect any time in
the first half of 2008.
Pinning down a point in assertion time, and then asking for
versions of objects claimed at that point intime to be correct,
is the general form that queries will take when posed by business
users. But we can look at bi-temporal data from the opposite
point of view as well. We can pin down a point or period in effec-
tive time, and ask for everything we ever asserted about things at
that point in time.
It would not be too misleading to call this the auditor’s point
of view. From this point of view, we are interested in the history
of our claims about what is true, not in the history of what
actually happened out there in the world. Of course, we could
also ask for all future assertions about a given point in effective
time. But auditors, by the nature of their work, have little interest
in future assertions. By the same token, they are very interested
in past assertions, along with current ones. So an auditor’s mirror-
image of the nine categories reduces to a set of six categories,
those shown in Figure 13.13.
These views that auditors are interested in are physically the
same
ones we have
already described. The “mirror-image” is in
perspective, not in content.
Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS 307
The Value of Internalizing Pipeline Datasets
The cost of managing physical pipeline datasets is high. This
cost is seldom discussed because it is universally thought to be
just an inevitable cost of doing business. Bringing down this cost
is a matter of doing all those various things that IT management
has done for decades, and continues to do. Quality control pro-
cedures are put in place so errors don’t creep into our databases
and later have to be backed out. The platform costs of storing,
transforming, and moving data into and out of pipeline data-
sets are controlled by minimizing redundancy, and by moving
datasets up and down the storage hierarchy. Software that sets
up and runs production schedules minimizes the human costs
of scheduling work involving these pipeline datasets.
But the work of managing pipeline datasets is tedious. And
whenever the management of these datasets is a one-off kind
of thing, i.e. whenever the development group has to manage
these datasets rather than the IT Operations group that handles
scheduled maintenance, errors inmanaging them are not
uncommon.
Asserted Versioning does not offer a way to more efficiently
manage pipel ine datasets. It offers a way to eliminate them
and, consequently, eliminate the totality of their management
costs! There will always be some circumstances in which data
must be manipulated in external pipeline datasets. But these
can become the exception rather than the rule.
In place of these pipeline datasets, Asserted Versioning
stores the information contained in those pipeline datasets
internally, within the production tables that are their sources
and destinations. Pending transactions can be stored within
the production tables themselves. Posted transactions can be,
too. Data staging areas can also exist as semantically distinct sets
of rows, physically contained within production tables. Pipeline
datasets, then, cease to exist as distinct physical objects. They
become virtualized, as semantically distinct collections of rows
what things
used to be like
what we used to claim what we currently claim
what we used to claim
things used to be like
what we currently claim
things used to be like
what we currently claim
things are like now
what we currently claim
things will be like
what we used to claim
things are like now
what we used to claim
things will be like
what things
are like
what things
will be like
Figure 13.13 The Auditor’s Mirror Image of the Nine-Fold Way.
308 Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS
all physically existing within the same tables, re-presented in
different views.
We may think that the principal cost elimination benefit of
internalizing pipeline datasets is that it reduces the number of
distinct datasets that programs, SQL and production scheduling
software have to identify and manage. This is a reduction in the
cost of the mechanics of pipeline datasets. Instead of assembling
data from multiple tables, it already exists all in one place.
But the more significant cost reduction has to do with the
semantics of pipeline datasets. With all data about the same
things in the same place, we will, all of us, find all of it when
we go looking for it. The most junior member of the business
community will find the same set of data for his queries that
the most senior member does. There won’t be differences in
completeness of the source data, or quality of that data, as there
so often are in today’s business world and today’s collections of
business data.
When we need any of this data, we won’t have to go looking for
it. All of the data about what we once thought was true, or what
we currently think is true, or what we are not yet willing to assert
is true, will be available by simply changing the assertion point-
in-time selection criterion on views and queries. By changing that
predicate in a WHERE clause to a past point in assertion time, we
will be able to access the internalized re-presentation of posted
transactions. By changing the predicate to a future point in time,
we will be able to access the internalized re-presentation of pend-
ing transactions.
By the same token, we will be able to access historical data
about what things used to be like from the same table that
contains data about what they are like right now, and that may
also contain data about what those things are going to be like
sometime in the future. Again, it will be as easy as changing a
predicate in a WHERE clause.
Glossary References
Glossary entries whose definitions form strong inter-
dependencies are grouped together in the following list. The
same glossar y entries may be grouped together in different ways
at the end of different chapters, each grouping reflecting the
semantic perspective of each chapter. There will usually be sev-
eral other, and often many other, glossary entries that are not
included in the list, and we recommend that the Glossary be
consulted whenever an unfami liar term is encountered.
Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS 309
We note, in particular, that none of the nine types of pipeline
dataset are included in this list. In general, we leave category sets
out of these lists, but recommend that the reader look them up
in the Glossary.
as-is
as-was
Asserted Versioning Framework (AVF)
assertion time
statement
conventional table
non-temporal table
deferred assertion
far future assertion time
near future assertion time
instance
type
managed object
object
oid
queryable object
pipeline dataset
inflow pipeline
outflow pipeline
internalization of pipeline datasets
re-presentation of pipeline datasets
production database
production table
temporal dimension
temporal transaction
version
310 Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS
14
ALLEN RELATIONSHIP AND
OTHER QUERIES
Allen Relationship Queries 313
Time Period to Time Period Queries 316
Point inTime to Period of Time Queries 333
Point inTime to Point inTime Queries 341
A Claims Processing Example 343
In Other Words 346
Glossary References 347
In this chapter, we examine each of the thirteen Allen relation-
shi
ps, as well
as each non-leaf node in the taxonomy of Allen
relationships which we introduced in Chapter 3. We describe
the Allen relationships as they hold between two time periods,
between a time period and a point in time, and also between
two points in time. We show how these relationships are
expressed in terms of time periods represented with the closed-
open convention, and we provide a sample query for each one.
After a section in which we illustrate how much simpler these
queries would be to express if we had a PERIOD datatype, we
conclude this chapter by discussing queries which involve tem-
poral joins.
Figure 14.1 sho
ws our tax
onomy of the Allen relationships.
Those relationships are the leaf nodes in this taxonomy. Every
leaf node has an inverse relationship, except the [equals] rela-
tionship. We italicize that relationship name to emphasize that
it has no inverse. So counting the [ equals] relationship, and
the six leaf nodes and their inverses, we have the full set
of thirteen Allen relationships. We also underline the non-leaf
node relationships in the taxonomy, to emphasize that they
are relationships we have defined, and are not one of the Allen
relationships.
Managing TimeinRelational Databases. Doi: 10.1016/B978-0-12-375041-9.00014-5
Copyright
#
2010 Elsevier Inc. All rights of reproduction in any form reserved. 311
Many of the Allen relationships are used by the AVF to enforce
TEI and TRI. For example, as we pointed out in Chapter 3, the
[
intersects] relationship is important because it defines TEI. If
two asserted versions of the same object share even a single
effective time clock tick, within shared assertion time, then they
[
intersect], and violate TEI. Otherwise, they don’t. The [fills] rela-
tionship is important because it defines TRI. If a TRI relationship
fails, it is because there is no episode of the referenced parent
object which temporally includes, i.e. [
fills
À1
], that of the child
version. The [before] relationship is important because it
distinguishes episodes from one another. Every episode of an
object is non-contiguous with every other episode of the same
object, and so one of them must be [before] the other.
As for queries issued by business users, we have found that
many ad hoc queries, and perhaps the majority of them, are
queries about episodes, not about versions. That is, they are
queries that want (i) the begin and end date of the episode and,
for business data, (ii) the last version of past episodes, the current
version of current episodes, or the latest version of future
episodes. Because of the importance of episodes to queries, the
SQL examples in this chapter will select episodes. The last, cur-
rent or latest version contains the business data. The episode
Intersects
Excludes
| |
| |
Before
| | | |
Meets
| | |
Equals
| |
| |
Occupies
Starts Finishes
During
| |
| |
Aligns
| |
| |
| |
| |
Time Period Relationships
Along a Common Timeline
Fills Overlaps
Figure 14.1 The Asserted Versioning Allen Relationship Taxonomy.
312 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES
begin date that is on every version, and the version’s own effective
end date, provide the effective time period of the episode itself.
We also note that the SQL in many of the following examples
does not represent typical queries that a business would write.
Each of these queries focuses on one specific Allen relationship,
and show how to express it in SQL. In particular, these sample
queries do not include typical join criteria. Instead, the only join
criteria used in these examples are two time periods and the
Allen relationship between them.
Another reason these sample queries don’t look very real
world is that they select from two of the tables in our sample
database that don’t have much to do with one another. In partic-
ular, there is no TRI relationship between them. They are the
Policy and Wellness Program tables. If we had used, for example,
the Client and Policy tables instead, many of the queries would
have been more realistic.
But TRI-related tables cannot illustrate all of the Allen
relationships. In fact, every instance of a TRI relationship
involves a parent and a child time period that is an instance of
one of seven of the Allen relationships. This leaves six other Allen
relationships that TRI-related tables cannot illustrate.
Nevertheless, as overly simple and unrealistic as most of
these samp le queries may be, they are the foundation for all
queries that express temporal relationships. No query will ever
need to express a temporal relationship that is not one of these
relationships. So if we know how to write the temporal pre-
dicates in these queries, we will know how to write any temporal
predicate for any query.
Allen Relationship Queries
The value of reviewing all the Allen relationships in terms of
queries against asserted version tables is that, as we already
know, the Allen relationships are exhaustive. There are no posi-
tional relationships along a common timeline, among time
periods and/or points in time, other than those ones. Thus,
by showing how to write a query for each one of them, as well
as for the groups of them identified in our taxonomy, we will
have provided the basic material out of which any query against
any assertion version table may be expressed.
In addition to the thirteen Allen relationships themselves,
our taxonomy provides five additional relationships, each of which
is a logical combination of two or more Allen relationships.
And these combinations are not formed simply by stringing
together Allen relationships with OR predicates. Although they
Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES 313
are, necessarily, logically equivalent to the OR’d set of those
relationships, they are often much simpler expressions, easier to
understand and faster when executed.
In these sample queries, we will not include predicates for asser-
tion time, and will pretend that our sample tables are uni-temporal
versioned tables. This eliminates unnecessary detail from these
examples. We will do this by using two version table views, shown
below: V_Wellness_Program_Curr_Asr and V_Policy_Curr_Asr.
The former is a view of all currently asserted Wellness Program
versions. The latter is a view of all currently asserted Policy versions.
CREATE VIEW V_Wellness_Program_Curr_Asr AS
SELECT * FROM Wellness_Program_AV
WHERE asr_beg_dt <¼ Now()
AND asr_end_dt > Now()
CREATE VIEW V_ Policy _Curr_Asr AS
SELECT * FROM Policy_AV
WHERE asr_beg_dt <¼ Now()
AND asr_end_dt > Now()
In these example queries, as we said before, we will be
selecting episodes, not versions. For the two tables used in this
chapter, these are the views which provide episodes as queryable
managed objects:
CREATE VIEW V_Wellness_Program_Epis AS
SELECT wp.wellpgm_oid, wp.epis_beg_dt, wp.eff_end_dt
AS epis_end_dt, wp.welllpgm_nm, wp.wellpgm_nbr,
wp.wellpgm_cat_cd
FROM V_Wellness_Program_Curr_Asr AS wp
WHERE wp.eff_end_dt ¼
(SELECT MAX(wpx.eff_end_dt)
FROM V_Wellness_Program_Curr_Asr AS wpx
WHERE wpx.wellpgm_oid ¼ wp.wellpgm_oid
AND wpx.epis_beg_dt ¼ wp.epis_beg_dt)
CREATE VIEW V_Policy_Epis AS
SELECT pol.policy_oid, pol.epis_beg_dt, pol.eff_end_dt AS
epis_end_dt, pol.policy_type, pol.copay_amt,
pol.client_oid, pol.policy_nbr
FROM V_Policy_Curr_Asr AS pol
WHERE pol.eff_end_dt ¼
(SELECT MAX(px.eff_end_dt)
FROM V_Policy_Curr_Asr px
WHERE px.policy_oid ¼ pol.policy_oid
AND px.epis_beg_dt ¼ pol.epis_beg_dt)
314 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES
These episode views are the query-side work of defining an
episode datatype. The AVF presents episode s as maintainable
managed objects. These views present episodes as queryable
managed objects.
This is a ver y important point. Both computer science
research and IT practice have shown the importance of the con-
cept of a string of one or more contiguous clock ticks with a
known location in time. SQL does not directly support this con-
cept; and so instead, we, and others, have to write code to
exclude gaps and overlaps occurring in the timespan between a
pair of date s or timestamps. A PERIOD datatype is the direct
support needed for this concept. This datatype implements this
concept at the correct level of abstraction.
By the same token, our own research and practice has shown
the importance of the concept of an episode, a string of one or
more contiguous and non-overlapping versions of the same
object. Without that concept, and the concepts of objects and
versions on which it depends, there is also no concept of tempo-
ral entity integrity and temporal referential integrity. Without
that concept, collections of rows are defined, as needed, within
each SQL statement. As we can see with both the standard and
alternative temporal models, their SQL insert, update and delete
statements do result in bi-temporal data that satisfies what we
call TEI and TRI. Their SQL queries do find episodes, when they
need them, past assertions when they need them, and so on. But
the level of abstraction is wrong, for the same reason that getting
the same results with a pair of dates that one would get with a
PERIOD datatype is wrong.
So we now have two views which externalize, as queryable
managed objects, the best data we currently have (i.e. our
currently asserted data) about policy episodes and wellness
program episodes. Now, using these two views, we will define
another view that we will use to illustrate each of the Allen
relationships. This is the view V_Allen_Example. This view
will keep the examples as small and easy to understand as possi-
ble, eliminating all extraneous and repetitive detail while focus-
ing on the Allen relationships themselves. Here is the
V_Allen_Example view:
CREATE VIEW V_Allen_Example AS
SELECT wp.wellpgm_oid, pol.policy_oid,
wp.epis_beg_dt AS wp_epis_beg_dt,
wp.epis_end_dt AS wp_epis_end_dt,
pol.epis_beg_dt AS pol_epis_beg_dt,
pol.epis_end_dt AS pol_epis_end_dt
Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES 315
[...]... relationships which includes the [starts], [startsÀ1], [finishes], [finishesÀ1], [during] and [duringÀ1] relationships In other words, it combines the [during] relationships with the [aligns] relationships These are all the relationships in which one time period includes all the clock ticks that are in the other time period, but also includes at least one additional clock tick In the non-superscripted... limiting case of a time period that includes only a single clock tick P1 [starts] P2 This is a pair of relationships, one the inverse of the other.1 In the non-superscripted relationship, the first time period is the shorter one Figure 14.2 shows this relationship, and its place in our taxonomy The two dashed lines in the illustration graphically represent P1 and P2, with P1 being the upper dashed line... that is in P1 is also in P2, but that there is at least one clock tick in P2 that is not also in P1 321 322 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES Time Period Relationships Along a Common Timeline Intersects Fills Occupies During Aligns Starts Finishes | -| | | | -| | | | -| | -| Figure 14.6 P1 [occupies] P2 The inverse of this relationship is: P1 [occupiesÀ1] P2 In the superscripted... wp_epis_end_dt) P1 [during] P2 This is a pair of relationships, one the inverse of the other In the non-superscripted relationship, the first time period is the shorter one Time Period Relationships Along a Common Timeline Intersects Fills Occupies During | -| | | Figure 14.5 P1 [during] P2 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES The predicate for this relationship, as it holds between two time periods... a join predicate This is so we can see how the Allen relationships correlate the two tables using dates Time Period to Time Period Queries Allen relationships may relate two time periods, or a time period to a point in time, or two points in time We will represent time periods, using our closed-open notation, as follows: Period 1: [eff_beg_dt1 – eff_end_dt1] Referred to as P1 Period 2: [eff_beg_dt2... [finishes] P2 This is a pair of relationships, one the inverse of the other In the non-superscripted relationship, the first time period is the shorter one Figure 14.3 shows this relationship, and its place 317 318 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES Time Period Relationships Along a Common Timeline Intersects Fills Occupies Aligns Finishes | -| | -| Figure 14.3 P1 [finishes] P2 in. .. wp_epis_end_dt Time Period Relationships Along a Common Timeline Intersects Fills Equals | -| | -| Figure 14.7 P1 [equals] P2 323 324 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES P1 [fills] P2 This not an Allen relationship It is the node in our taxonomy of Allen relationships which includes the [starts], [startsÀ1], [finishes], [finishesÀ1], [during], [duringÀ1] and [equals] relationships In other... eff_end_dt2)) Time Period Relationships Along a Common Timeline Intersects Fills Occupies Aligns Starts | -| | | Figure 14.4 P1 [aligns] P2 Finishes | -| | -| 319 320 Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES It says that P1 and P2 either start or end at the same time, but do not both start and end at the same time This relationship has no inverse Consider the following request for information:... eff_end_dt2) Time Period Relationships Along a Common Timeline Intersects Fills Equals Occupies | -| | -| During Aligns Starts | -| | | Figure 14.8 P1 [fills] P2 Finishes | -| | -| | -| | | Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES It says that P1 doesn’t start after P2 and doesn’t end before P2 The idea behind it is that it includes every relationship in which every clock tick in P2... are standard They were provided by Allen in his original article, and continue to be used throughout the computer science literature Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES Time Period Relationships Along a Common Timeline Intersects Fills Occupies Aligns Starts | -| | | Figure 14.2 P1 [starts] P2 two time periods expressed as pairs of dates using the closedopen convention, is: (eff_beg_dt1 . effect any time in
the first half of 2008.
Pinning down a point in assertion time, and then asking for
versions of objects claimed at that point in time to. Period Queries 316
Point in Time to Period of Time Queries 333
Point in Time to Point in Time Queries 341
A Claims Processing Example 343
In Other Words 346
Glossary