Query Encapsulation
As we have already pointed out, production queries against
Asserted Versioning databases do not have to check for TEI or
TRI violations. The maintenance processes carried out by the AVF
guarantee that asserted version tables will already conform to
those semantic requirements. For example, when joining from a
TRI child to a TRI parent, these queries do not have to check that
the parent object is represented by an effective-time set of contig-
uous and non-overlapping rows whose end-to-end time period
fully includes that of the child row. Asserted Versioning already
guarantees that those parent version rows [meet] within an episode,
and that they [fill⁻¹] the effective time period of the child row.
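The condition that production queries are spared from checking can be sketched as follows. This is a minimal illustration, not the AVF's own code: the function name `parent_covers_child` is hypothetical, and time periods are assumed to be (begin, end) pairs that include their begin date and exclude their end date.

```python
from datetime import date


def parent_covers_child(parent_periods, child_period):
    """Hypothetical check: do parent rows [meet] end to end and cover the child?

    Periods are (begin, end) pairs, assumed closed at the start, open at the end.
    """
    child_beg, child_end = child_period
    covered_to = None  # end of the contiguous chain of parent rows found so far
    for beg, end in sorted(parent_periods):
        if covered_to is None:
            # Look for the parent row in effect when the child row begins.
            if beg <= child_beg < end:
                covered_to = end
        elif beg == covered_to:
            # This row [meets] the previous one: no gap and no overlap.
            covered_to = end
        else:
            # A gap (an episode boundary) or an overlap breaks the chain.
            return False
        if covered_to is not None and covered_to >= child_end:
            return True
    return False


# Two parent versions that meet on 2020-06-01, and a child row inside them.
parent = [
    (date(2020, 1, 1), date(2020, 6, 1)),
    (date(2020, 6, 1), date(2021, 1, 1)),
]
child = (date(2020, 3, 1), date(2020, 9, 1))
covered = parent_covers_child(parent, child)
```

Because the maintenance processes reject any transaction that would make this check fail, a query joining child to parent can rely on the result without repeating the check.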
Ad hoc queries against Asserted Versioning databases can be
written directly against asserted version tables. But as far as pos-
sible, they should be written against views in order to simplify
the query-writing task of predominately non-technical query
authors. So we recommend that a basic set of views be provided
for each asserted version table. Additional subject-matter-
specific views written against these basic views could also be
created. Some basic views that we believe might prove useful
for these query authors are:
(i) The Conventional Data View, consisting of all currently
asserted current versions in the table. This is a one-row-
per-object view.
(ii) The Current Versions View, consisting of all currently asserted
versions in the table, past, present and future. This is a view
that will satisfy all the requirements satisfied by any best
practice versioning tables, as described in Chapter 4.
(iii) The Episode View, consisting of one current assertion for
each episode. That is the current version for current
episodes, the last version for past episodes, and the latest
version for future episodes. This view is useful because it
filters out the “blow-by-blow” history which version tables
provide, and leaves only a “latest row” to represent each
episode of an object of interest.
(iv) The Semantic Logfile View, consisting of all no longer
asserted versions in the table. This view collects all asserted
version data that we no longer claim is true, and should be
of particular interest to auditors.
(v) The Transaction File View, consisting of all near future
asserted versions. These are deferred assertions that will
become currently asserted data soon enough that the busi-
ness is willing to let them become current by means of the
passage of time.
Chapter 16 CONCLUSION 387
(vi) The Staging Area View, consisting of all far future asserted
versions. These are deferred assertions that are still a work
in progress. They might be incomplete data that the busi-
ness fully intends to assert once they are completed. They
might also be hypothetical data, created to try out various
what-if scenarios.
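Several of the views listed above can be sketched with ordinary SQL views. The following is a minimal illustration using Python's standard sqlite3 module, not the AVF's own definitions. The table `Policy_AV`, its column names, the view names and the sample rows are all hypothetical. Time periods are assumed to include their begin date and exclude their end date, with 9999-12-31 standing for "until further notice". The near-future versus far-future split between views (v) and (vi) depends on a business-defined threshold, so the sketch shows deferred assertions as a single view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical asserted version table; names are illustrative only.
CREATE TABLE Policy_AV (
    row_id     INTEGER PRIMARY KEY,
    policy_oid INTEGER NOT NULL,
    eff_beg_dt TEXT NOT NULL,   -- effective time period, ISO dates
    eff_end_dt TEXT NOT NULL,
    asr_beg_dt TEXT NOT NULL,   -- assertion time period, ISO dates
    asr_end_dt TEXT NOT NULL,
    copay      INTEGER NOT NULL
);

-- (ii) Current Versions View: currently asserted versions, past, present and future.
CREATE VIEW Policy_Current_Versions AS
SELECT * FROM Policy_AV
WHERE asr_beg_dt <= date('now') AND date('now') < asr_end_dt;

-- (i) Conventional Data View: currently asserted and currently in effect.
CREATE VIEW Policy_Conventional AS
SELECT policy_oid, copay FROM Policy_Current_Versions
WHERE eff_beg_dt <= date('now') AND date('now') < eff_end_dt;

-- (iv) Semantic Logfile View: versions that are no longer asserted.
CREATE VIEW Policy_Semantic_Logfile AS
SELECT * FROM Policy_AV WHERE asr_end_dt <= date('now');

-- (v) and (vi) combined: deferred assertions, not yet currently asserted.
CREATE VIEW Policy_Deferred AS
SELECT * FROM Policy_AV WHERE asr_beg_dt > date('now');
""")

conn.executemany(
    "INSERT INTO Policy_AV (policy_oid, eff_beg_dt, eff_end_dt,"
    " asr_beg_dt, asr_end_dt, copay) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2000-01-01", "2005-01-01", "2000-01-01", "9999-12-31", 10),  # past version
        (1, "2005-01-01", "9999-12-31", "2005-01-01", "9000-01-01", 20),  # current version
        (1, "2000-01-01", "2005-01-01", "1999-06-01", "2000-01-01", 15),  # withdrawn assertion
        (1, "2005-01-01", "9999-12-31", "9000-01-01", "9999-12-31", 25),  # deferred assertion
    ],
)

conventional = conn.execute("SELECT policy_oid, copay FROM Policy_Conventional").fetchall()
current_versions = conn.execute(
    "SELECT copay FROM Policy_Current_Versions ORDER BY copay").fetchall()
logfile = conn.execute("SELECT copay FROM Policy_Semantic_Logfile").fetchall()
deferred = conn.execute("SELECT copay FROM Policy_Deferred").fetchall()
```

A query author working against `Policy_Conventional` sees one row per policy and never has to mention a temporal column.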
We also note that existing queries against conventional
tables will execute properly when their target tables are con-
verted to asserted version tables. In the conversion, the tables
are given new names. For example, we use the suffix “_AV” on
asserted version tables and only on those tables. One of the
views provided on each table, then, is one which selects exactly
those columns that made up the original table, and all and only
those rows that dynamically remain currently asserted and cur-
rently in effect. This dynamic view provides, as a queryable
object, a set of data that is row for row and column for column
identical to the original table. The view itself is given the name
the original table had. Every column has the same name it originally
had. This provides temporal upward compatibility for
all queries, whether embedded in application code or free-standing.
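The conversion described above can be sketched as follows, again with Python's standard sqlite3 module. The `Customer` table, its columns, the temporal column names and the conversion dates are hypothetical, and the steps shown are one plausible way to carry out the conversion by hand rather than the procedure the AVF itself follows. The point of the sketch is that the same query text returns the same rows before and after the conversion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A conventional table and a query written against it before the conversion.
conn.executescript("""
CREATE TABLE Customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO Customer VALUES (1, 'Acme'), (2, 'Bolt');
""")
legacy_query = "SELECT cust_id, name FROM Customer ORDER BY cust_id"
before = conn.execute(legacy_query).fetchall()

# Conversion: the data moves to a table with the _AV suffix, and a view
# takes over the original table name and the original column names.
conn.executescript("""
CREATE TABLE Customer_AV (
    row_id     INTEGER PRIMARY KEY,
    cust_id    INTEGER NOT NULL,
    name       TEXT NOT NULL,
    eff_beg_dt TEXT NOT NULL, eff_end_dt TEXT NOT NULL,
    asr_beg_dt TEXT NOT NULL, asr_end_dt TEXT NOT NULL
);
INSERT INTO Customer_AV (cust_id, name, eff_beg_dt, eff_end_dt, asr_beg_dt, asr_end_dt)
SELECT cust_id, name, '2000-01-01', '9999-12-31', '2000-01-01', '9999-12-31'
FROM Customer;
DROP TABLE Customer;

CREATE VIEW Customer AS
SELECT cust_id, name FROM Customer_AV
WHERE asr_beg_dt <= date('now') AND date('now') < asr_end_dt
  AND eff_beg_dt <= date('now') AND date('now') < eff_end_dt;

-- History added after the conversion: an earlier name for customer 1.
INSERT INTO Customer_AV (cust_id, name, eff_beg_dt, eff_end_dt, asr_beg_dt, asr_end_dt)
VALUES (1, 'Acme Ltd', '1990-01-01', '2000-01-01', '2000-01-01', '9999-12-31');
""")

# The unchanged legacy query now runs against the view.
after = conn.execute(legacy_query).fetchall()
```

The past version added to `Customer_AV` is invisible to the legacy query, which is what keeps existing application code working unchanged.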
We conclude that Asserted Versioning does provide query
encapsulation for bi-temporal data, and also temporal upward
compatibility for queries.
The Internalization of Pipeline Datasets
Non-current data is often found in numerous nooks and
crannies of conventional databases. Surrounding conventional
tables whose rows have no time periods explicitly attached to
them, and which represent our current beliefs about what their
objects are currently like, there may be various history tables,
transaction tables, staging area tables and developer-maintained
logfile tables. In some cases, temporality has even infiltrated
some of those tables themselves, transforming them into one
or another of some variation on the four types of version tables
which we described in Chapter 4.
When we began writing, we knew that deferred transactions
and deferred assertions went beyond the standard bi-temporal
semantics recognized in the computer science community. We
knew that they corresponded to insert, update or delete trans-
actions written but not yet submitted to the DBMS. The most
familiar collections of transactions in this state, we recognized,
are those called batch transaction datasets.
But as soon as we identified the nine logical categories of bi-
temporal data, we realized that deferred transactions and
deferred assertions dealt with only three of those nine
categories—with future assertions about past, present or future
versions. What, then, we wondered, did the three categories of
past assertions correspond to?
The answer is that past assertions play the role of a DBMS
semantic logfile, one specific to a particular production table.
Of course, by now we understand that past assertions do not
make it possible to fully recreate the physical state of a table as
of any point in past time because of deferred assertions which
are not, by definition, past assertions. Instead, they make it pos-
sible to recreate what we claimed, at some past point in time,
was the truth about the past, present and future of the things
we were interested in at the time. In this way, past assertions
support a semantic logfile, and allow us to recreate what we once
claimed was true, as of any point of time in the past. They provide
the as-was semantics for bi-temporal data.
But Asserted Versioning also supports a table-specific physi-
cal logfile. It does so with the row create date. With this date,
we can almost recreate everything that was physically in a table
as of any past point in time, no matter where in assertion time or
effective time any of those rows are located. (See footnote 2.)
This leaves us with only three of the nine categories—the cur-
rent assertion of past, present and future versions of objects. The
current assertions of current versions, of course, are the conven-
tional data in an asserted version table. This leaves currently
asserted past versions and currently asserted future versions.
But these are nothing new to IT professionals. They are what
IT best practice version tables have been trying to manage for
several decades.
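The nine categories, and the dataset each one stands in for, can be sketched as a small classification routine. This is an illustration under stated assumptions: the function names, the dictionary keys and the role labels are hypothetical, and time periods are assumed to include their begin date and exclude their end date.

```python
from datetime import date


def tense(beg, end, now):
    """Place a closed-open time period in the past, present or future of `now`."""
    if end <= now:
        return "past"
    if beg > now:
        return "future"
    return "present"


def category(row, now):
    """Return (assertion tense, version tense): one of the nine categories."""
    return (
        tense(row["asr_beg_dt"], row["asr_end_dt"], now),
        tense(row["eff_beg_dt"], row["eff_end_dt"], now),
    )


def dataset_role(row, now):
    """Name the conventional dataset that the row's category corresponds to."""
    assertion, version = category(row, now)
    if assertion == "past":
        return "semantic logfile"        # three categories
    if assertion == "future":
        return "deferred assertion"      # three categories
    if version == "present":
        return "conventional data"       # one category
    return "currently asserted %s version" % version  # two categories


now = date(2010, 1, 1)
forever = date(9999, 12, 31)
withdrawn = {"asr_beg_dt": date(2005, 1, 1), "asr_end_dt": date(2008, 1, 1),
             "eff_beg_dt": date(2005, 1, 1), "eff_end_dt": forever}
current = {"asr_beg_dt": date(2008, 1, 1), "asr_end_dt": forever,
           "eff_beg_dt": date(2005, 1, 1), "eff_end_dt": forever}
history = {"asr_beg_dt": date(2008, 1, 1), "asr_end_dt": forever,
           "eff_beg_dt": date(2000, 1, 1), "eff_end_dt": date(2005, 1, 1)}
roles = [dataset_role(r, now) for r in (withdrawn, current, history)]
```

Note that a withdrawn row counts as a past assertion whatever its effective time period is, which is why three of the nine categories fall to the semantic logfile.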
Now it all comes together. Instead of conventional physical
logfiles, Asserted Versioning supports queries which make both
semantic logfile data and physical logfile data available. Instead
of batch transaction datasets, Asserted Versioning keeps track
of what the database will look like when those transactions are
applied—which, for asserted version tables, means when those
future assertions pass into currency. Instead of variations on best
practice version tables which support some part of the seman-
tics of versioning, Asserted Versioning is an enterprise solution
which implements versioning, in every case, with the same
schemas and with support for the full semantics of versioning,
whether or not the specific business requirements, at the time,
specify those full semantics.
Footnote 2: The exception is deferred assertions that have been moved backwards in assertion
time. Currently, Asserted Versioning does not preserve information about the far future
assertion time these assertions originally existed in.
With all these various physical datasets internalized within
the production tables they are directed to or derived from,
Asserted Versioning eliminates the cost of managing them as
distinct physical data objects.
Asserted Versioning also eliminates the cost of coordinating
maintenance to them. There is no latency as updates to produc-
tion tables ripple out to downstream copies of that same data,
such as separate history tables. On the inward-bound side, there
is also no latency. As soon as a transaction is written, it becomes
part of its target table. The semantics supported here is, for
maintenance transactions, “submit it and forget it”.
We conclude that Asserted Versioning does support the
semantics of the internalization of pipeline datasets.
Performance
We have provided techniques on how to index, partition,
cluster and query an Asserted Versioning database. We’ve
recommended key structures for primary keys, foreign keys
and search keys, and recommended the placement of temporal
columns in indexes for optimal performance. We have also
shown how to improve performance with the use of currency
flags. All these techniques help to provide query performance
in Asserted Versioning databases which is nearly equivalent to
the query performance in equivalent conventional databases.
We conclude that queries against even very large Asserted
Versioning databases, especially those queries retrieving cur-
rently asserted current versions of persistent objects, will per-
form as well or nearly as well as the corresponding queries
against a conventional database.
Enterprise Contextualization
As temporal data has become increasingly important, much
of it has migrated from being reconstructable temporal data to
being queryable temporal data. But much of that queryable tem-
poral data is still isolated in data warehouses or other historical
databases, although some of it also exists in production databases
as history tables, or as version tables. Often, this queryable tem-
poral data fails to distinguish between data which reflects changes
in the real world, and data which corrects mistakes in earlier data.
So business needs for a collection of temporal data against
which queries can be written are often difficult to meet. Some
of the needed data may be in a data warehouse; the rest of
it may be contained in various history tables and version tables
in the production database, and the odds of those history tables
all using the same schemas and all being updated according
to the same rules are not good. As for version tables, we have
seen how many different kinds there are, and how difficult it
can be to write queries that extract exactly the desired data
from them.
We need an enterprise solution to the provision of queryable
bi-temporal data. We need one consistent set of schemas, across
all tables and all databases. We need one set of transactions that
update bi-temporal data, and enforce the same temporal integ-
rity constraints, across all tables and all databases. We need a
standard way to ask for uni-temporal or bi-temporal data. And
we need a way to remove all temporal logic from application
programs, isolate it in a separate layer of code, and invoke it
declaratively.
Asserted Versioning is that enterprise solution.
Asserted Versioning as a Bridge and
as a Destination
Asserted Versioning, either in the form of the AVF or of a
home-grown implementation of its concepts, has value as both
a bridge and as a destination. As a bridge to a standards-based,
vendor-supported implementation of bi-temporal data manage-
ment, Asserted Versioning is a way to begin migrating databases
and applications right away, using the DBMSs available today
and the SQL available today. As a destination, Asserted
Versioning is an implementation of a more complete semantics
for bi-temporality than has yet been defined in the academic
literature.
Asserted Versioning as a Bridge
Applications which manage temporal data intermingle code
expressing subject-matter-specific business rules with code for
managing these different forms in which temporal data is stored.
Queries which access temporal data in these databases cannot
be written correctly without a deep knowledge of the specific
schemas used to store the data, and of both the scope and limits
of the semantics of that data. Assembling data from two or more
temporal tables, whether in the same or in different physical
databases, is likely to require complicated logic to mediate the
discrepancies between different implementations of the same
semantics.
As a bridge to the new SQL standards and to DBMS support for
them, Asserted Versioning standardizes temporal semantics by
removing history tables, various forms of version tables, transac-
tion datasets, staging areas and logfile data from databases.
In their place, Asserted Versioning provides a standard canonical
form for bi-temporal data, that form being the Asserted Ver-
sioning schema used by all asserted version tables.
By implementing Asserted Versioning, businesses can begin
to remove temporal logic from their applications, and at each
point where often complex temporal logic is hardcoded inside
an application program, they can begin to replace that code with
a simple temporal insert, update or delete statement.
Sometimes this will be difficult work. Some implementations
of versioning, for example, are more convoluted than others. The
code that supports those implementations will be correspondingly
difficult to identify, isolate and replace. But if a business
is going to avail itself of standards-based temporal SQL and
commercial support for those temporal extensions—as it surely
will, sooner or later—then this work will have to be done, sooner
or later. With an Asserted Versioning Framework available to the
business, that work can begin sooner rather than later. It can
begin right now.
Asserted Versioning as a Destination
Even if the primary motivation for using the AVF (ours or a
home-grown version) is as a bridge to standards-based and
vendor-implemented bi-temporal functionality, that is certainly
not its only value. For as soon as the AVF is installed, hundreds
of person hours will typically be saved on every new project to
introduce temporal data into a database. Based on our own con-
sulting experience, which jointly spans about half a century and
several dozen client engagements, we can confidently say, with-
out exaggeration, that many large projects involving temporal
data will save thousands of person hours.
Here’s how. Temporal data modeling work that would other-
wise have to be done, will be eliminated. Project-specific designs
for history tables or version tables, likely differing in some way
from the many other designs that already exist in the databases
across the enterprise, will no longer proliferate. Separate code
to maintain these idiosyncratically different structures will no
longer have to be written. Temporal entity integrity rules and
temporal referential integrity rules will no longer be overlooked,
or only partially or incorrectly implemented.
Special instructions to those who will write the often complex
sets of SQL transactions required to carry out what is a single
insert, update or delete action from a business user perspective
will no longer have to be provided and remembered each time
a transaction is written. Special instructions to those who will
write queries against these tables, possibly joining them with
slightly different temporal tables designed and written by some
other project team, will no longer have to be provided and
remembered each time a query is written.
When the first set of tables is converted to asserted version
tables, seamless real-time access to bi-temporal data will be
immediately available for that data. This is declaratively specified
access, with the procedural complexities encapsulated within the
AVF. In addition, the benefits of the internalization of pipeline
datasets will also be made immediately available, this being one
of the principal areas in which Asserted Versioning extends bi-
temporal semantics beyond the semantics of the standard model.
We conclude that Asserted Versioning has value both as a bridge
and as a destination. It is a bridge to a standards-based SQL that
includes support for PERIOD datatypes, Allen relationships and
the declarative specification of bi-temporal semantics. It is a desti-
nation in the sense that it is a currently available solution which
provides the benefits of declaratively specified, seamless real-time
access to bi-temporal data, including the extended semantics of
objects, episodes and internalized pipeline datasets.
Ongoing Research and Development
Bi-temporal data is an ongoing research and development
topic within the computer science and DBMS vendor com-
munities. Most of that research will affect IT professionals
only as products delivered to us, specifically in the form of
enhancements to the SQL language and to relational DBMSs.
But bi-temporal data and its management by means of
Asserted Versioning’s conceptual and software frameworks is an
ongoing research and development topic for us as well. Some
of this ongoing work will appear as future releases of the
Asserted Versioning AVF. Some of it will be published on our
website, AssertedVersioning.com, and some of it will be made
available as seminars. Following is a partial list of topics that
we are working on as this book goes to press.
(i) An Asserted Versioning Ontology. A research topic. We have
begun to formalize Asserted Versioning as an ontology by
translating our Glossary into a FOPL axiomatic system.
The undefined predicates of the system are being collected
into a controlled vocabulary. Multiple taxonomies will be
identified as KIND-OF threads running through the ontol-
ogy. Theorems will be formally proved, demonstrating
how automated inferencing can extract useful information
from a collection of statements that are not organized as
a database of tables, rows and columns.
(ii) Asserted Versioning and the Relational Model. A research
topic. Bi-temporal extensions to the SQL language have
been blocked for over 15 years, in large part because of
objections that those extensions violate Codd’s relational
model and, in particular, his Information Principle. We will
discuss those objections, especially as they apply to
Asserted Versioning, and respond to them.
(iii) Deferred Transaction Workflow Management and the AVF.
A development topic. When deferred assertion groups are
moved backwards in assertion time, and when isolation
cannot be maintained across the entire unit of work, vio-
lations of bi-temporal semantics may be exposed to the
database user. We are developing a solution that identifies
semantic components within and across deferred asser-
tion groups, and moves those components backwards in
a sequence that preserves temporal semantic integrity at
each step of the process.
(iv) Asserted Versioning and Real-Time Data Warehousing.
A methodology topic. Asserted Versioning supports bi-
temporal tables in OLTP source system databases and/or
Operational Data Stores. It is a better solution to the man-
agement of near-term historical data than is real-time data
warehousing, for several reasons. First, much near-term
historical data remains operationally relevant, and must
be as accessible to OLTP systems as current data is. Thus,
it must either be maintained in ad hoc structures within
OLTP systems, or retrieved from the data warehouse with
poorly-performing federated queries. Second, data ware-
houses, and indeed any collection of uni-temporal data,
do not support the important as-was vs. as-is distinction.
Third, real-time feeds to data warehouses change the
warehousing paradigm. Data warehouses originally kept
historical data about persistent objects as a time-series of
periodic snapshots. Real-time updating of warehouses for-
ces versioning into warehouses, and the mixture of
snapshots and versions is conceptually confused and
confusing. Asserted Versioning makes real-time data
warehousing neither necessary nor desirable.
(v) Temporalized Unique Indexes. A development topic. Values
which are unique to one row in a conventional table may
appear on any number of rows when the table is converted
to an asserted version table. So unique indexes on conventional
tables are no longer unique after the conversion. To
make those indexes unique, both an assertion and an
effective time period must be added to them. This reflects
the fact that although those values are no longer unique
across all rows in the converted table, they remain unique
across all rows in the table at any one point in time, specif-
ically at any one combination of assertion and effective
time clock ticks.
(vi) Instead Of Triggers. A development topic. Instead Of
triggers function as updatable views. These updatable
views make Asserted Versioning’s temporal transactions
look like conventional SQL. When invoked, the triggered
code recognizes insert, update and delete statements as
temporal transactions. As described in this book, it will
translate them into multiple physical transactions, apply
TEI and TRI checks, and manage the processing of those
physical transactions as atomic and isolated units of work.
The utilization of Instead Of triggers by the AVF is ongoing
work, as we go to press.
(vii) Java and Hibernate. A research and development topic.
Hibernate is an object/relational persistence and query
service framework for Java. It hides the complexities of
SQL, and functions as a data access layer supporting
object-oriented semantics (not to be confused with the
semantics of objects, as Asserted Versioning uses that
term). Hibernate and other frameworks can be used to
invoke the AVF logic to enforce TEI and TRI while
maintaining an Asserted Versioning bi-temporal database.
(viii) Archiving. A methodology topic. An important archiving
issue is how to archive integral semantic units, i.e. how
to archive without leaving “dangling references” to
archived data in the source database. Assertions, versions,
episodes and objects define integral semantic units, and
we are developing an archiving strategy, and AVF support
for it, based on those Asserted Versioning concepts.
(ix) Star Schema Temporal Data. A methodology topic. Bi-
temporal dimensions can make the “cube explosion prob-
lem” unmanageable, and bi-temporal semantics do not
apply to fact tables the same way they apply to dimension
tables. We are developing a methodology for supporting
both versioning, and the as-was vs. as-is distinction, in
both fact and dimension tables.
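The uniqueness condition stated in item (v) above, that a value remains unique at any one combination of assertion and effective time clock ticks, can be sketched as a pairwise check. This is an illustration only and says nothing about how a temporalized unique index would actually be built. The function names and dictionary keys are hypothetical, dates are ISO strings (which compare correctly as text), and time periods are assumed to include their begin date and exclude their end date.

```python
from itertools import combinations


def overlaps(beg_a, end_a, beg_b, end_b):
    """True if two closed-open periods share at least one clock tick."""
    return beg_a < end_b and beg_b < end_a


def unique_at_every_point(rows):
    """True if no two rows carry the same value at the same point in both
    assertion time and effective time."""
    for a, b in combinations(rows, 2):
        if (a["value"] == b["value"]
                and overlaps(a["eff_beg_dt"], a["eff_end_dt"],
                             b["eff_beg_dt"], b["eff_end_dt"])
                and overlaps(a["asr_beg_dt"], a["asr_end_dt"],
                             b["asr_beg_dt"], b["asr_end_dt"])):
            return False
    return True


# Two versions carrying the same value whose effective periods meet.
v1 = {"value": "X1", "eff_beg_dt": "2010-01-01", "eff_end_dt": "2012-01-01",
      "asr_beg_dt": "2010-01-01", "asr_end_dt": "9999-12-31"}
v2 = {"value": "X1", "eff_beg_dt": "2012-01-01", "eff_end_dt": "9999-12-31",
      "asr_beg_dt": "2012-01-01", "asr_end_dt": "9999-12-31"}
# A third row that overlaps v1 in both temporal dimensions.
v3 = {"value": "X1", "eff_beg_dt": "2011-06-01", "eff_end_dt": "2013-01-01",
      "asr_beg_dt": "2011-01-01", "asr_end_dt": "9999-12-31"}

still_unique = unique_at_every_point([v1, v2])
violated = not unique_at_every_point([v1, v2, v3])
```

The value "X1" appears on two rows in the first call, so a conventional unique index would reject it, yet at no single point in assertion and effective time do both rows hold.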
Going Forward
We thank our readers who have stuck with us through an
extended discussion of some very complex ideas. For those
who would like to learn more about bi-temporal data, and about
Asserted Versioning, we recommend that you visit our website,
AssertedVersioning.com, and our webpage at Elsevier.com.
At our website, we have also created a small sample database
of asserted version tables. Registered users can write both main-
tenance transactions and queries against that database. Because
these tables contain data from all nine temporal categories, we
recommend that interested readers first print out the contents
of these tables before querying them. It is by comparing the full
contents of those tables to query result sets that the work of each
query can best be understood, and the semantic richness of the
contents of Asserted Versioning databases best be appreciated.
Glossary References
Glossary entries whose definitions form strong inter-
dependencies are grouped together in the following list. The
same glossary entries may be grouped together in different ways
at the end of different chapters, each grouping reflecting the
semantic perspective of each chapter. There will usually be sev-
eral other, and often many other, glossary entries that are not
included in the list, and we recommend that the Glossary be
consulted whenever an unfamiliar term is encountered.
ad hoc query
production query
Allen relationships
time period
as-is query
as-was query
asserted version table
assertion
assertion time