approach is better at tracking changes to persistent objects and to relationships other than metric balances.

State Temporal Data: Uni-Temporal and Bi-Temporal Data

At this point in our discussion, we are concerned with state data rather than with event data, and with state data that is queryable rather than state data that needs to be reconstructed. What then are the various options for managing temporal queryable state data?

First of all, we need to recognize that there are two kinds of states to manage. One is the state of the things we are interested in, the states those things pass through as they change over time. But there is another kind of state, that being the state of the data itself. Data, such as rows in tables, can be in one of two states: correct or incorrect. (As we will see in Chapter 12, it can also be in a third state, one in which it is neither correct nor incorrect.) Version tables and assertion tables record, respectively, the state of objects and the state of our data about those objects.

Uni-Temporal State Data

In a conventional Customer table, each row represents the current state of a customer. Each time the state of a customer changes, i.e. each time a row is updated, the old data is overwritten with the new data. Adding one (or sometimes two) dates or timestamps to the primary key of the table turns it into a uni-temporal table. But since we already know that there are two different temporal dimensions that can be associated with data, we know to ask “What kind of uni-temporal table?”

As we saw in the Preface, there are uni-temporal version tables and uni-temporal assertion tables. Version tables keep track of changes that happen in the real world, changes to the objects represented in those tables. Each change is recorded as a new version of an object. Assertion tables keep track of corrections we have made to data we later discovered to be in error. Each correction is recorded as a new assertion about the object.
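As a rough illustration of the step from a conventional table to a uni-temporal version table, the following sketch adds a begin date to the primary key so that a change creates a new row instead of overwriting the old one. It is only a sketch under stated assumptions: the table name `customer_version`, the columns `ver_begin` and `ver_end`, the `9999-12-31` open-end marker, and the helper functions are all hypothetical, and SQLite is used simply because it ships with Python.

```python
import sqlite3

# Assumed marker meaning "in effect until further notice" (hypothetical convention).
OPEN_END = "9999-12-31"


def make_db():
    """Create an in-memory database holding one uni-temporal version table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE customer_version (
            customer_id INTEGER NOT NULL,
            ver_begin   TEXT    NOT NULL,  -- first date the version is in effect
            ver_end     TEXT    NOT NULL,  -- first date it is no longer in effect
            name        TEXT    NOT NULL,
            PRIMARY KEY (customer_id, ver_begin)  -- the date is part of the key
        )
        """
    )
    return conn


def record_change(conn, customer_id, name, change_date):
    """Record a real-world change as a new version; nothing is overwritten."""
    # Close the version that was open until now, if there is one.
    conn.execute(
        "UPDATE customer_version SET ver_end = ? "
        "WHERE customer_id = ? AND ver_end = ?",
        (change_date, customer_id, OPEN_END),
    )
    # Add the new version, open-ended into the future.
    conn.execute(
        "INSERT INTO customer_version VALUES (?, ?, ?, ?)",
        (customer_id, change_date, OPEN_END, name),
    )


def name_as_of(conn, customer_id, as_of):
    """Return the customer's name as it was on the given ISO date, or None."""
    row = conn.execute(
        "SELECT name FROM customer_version "
        "WHERE customer_id = ? AND ver_begin <= ? AND ? < ver_end",
        (customer_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None
```

Dates are stored as ISO-8601 strings so that plain string comparison orders them correctly. Note that this sketch records versions only; a correction applied with `record_change` would be indistinguishable from a real change.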
The versions make up a true history of what happened to those objects. The assertions make up a virtual logfile of corrections to the data in the table.

Usually, when table-level temporal data is discussed, the tables turn out to be version tables, not assertion tables. In their book describing the alternative temporal model [2002, Date, Darwen, Lorentzos], the authors focus on uni-temporal versioned data. Bi-temporality is not even alluded to until the penultimate chapter, at which point it is suggested that “logged time history” tables be used to manage the other temporal dimension. Since bi-temporality receives only a passing mention in that book, we choose to classify the alternative temporal model as a uni-temporal model.

In IT best practices for managing temporal data, which we will discuss in detail in Chapter 4, once again the temporal tables are version tables, and error correction is an issue that is mostly left to take care of itself.[4] For the most part, it does so by overwriting incorrect data.[5] This is why we classify IT best practices as uni-temporal models.

[Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS]

The Alternative Temporal Model

What we call the alternative temporal model was developed by Chris Date, Hugh Darwen and Dr. Nikos Lorentzos in their book Temporal Data and the Relational Model (Morgan-Kaufmann, 2002).[6] This model is based in large part on techniques developed by Dr. Lorentzos to manage temporal data by breaking temporal durations down into temporally atomic components, applying various transformations to those components, and then re-assembling the components back into those temporal durations. As the authors note, the applicability of this technique is not restricted to temporal data.

As we said, except for the penultimate chapter in that book, the entire book is a discussion of uni-temporal versioned tables.
In that chapter, the authors recommend that if there is a requirement to keep track of the assertion time history of a table (which they call “logged-time history”), it be implemented by means of an auxiliary table which is maintained by the DBMS.

[4] Lacking criteria to distinguish the best from the rest, the term “best practices” has come to mean little more than “standard practices”. What we call “best practices”, and which we discuss in Chapter 4, are standard practices we have seen used by many of our clients.

[5] An even worse solution is to mix up versions and assertions by creating a new row, with a begin date of Now(), both every time there is a real change, and also every time there is an error in the data to correct. When that happens, we no longer have a history of the changes things went through, because we cannot distinguish versions from corrections. And we no longer have a “virtual logfile” of corrections because we don’t know how far back the corrections should actually have taken effect.

[6] The word “model”, as used here and also in the phrases “alternative model” and “Asserted Versioning model”, obviously doesn’t refer to a data model of specific subject matter. It means something like theory, but with an emphasis on its applicability to real-world problems. So “the relational model”, as we use the term, for example, means something like “relational theory as implemented in current relational technology”.

In addition, these authors do not attempt, in their book, to explain how this method of managing temporal data would work with current relational technology. As in much of the computer science research on temporal data, they allude to SQL operators and other constructs that do not yet exist, and so their book is in large part a recommendation to the standards committees to adopt the changes to the SQL language which they describe.
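The decompose, transform and re-assemble technique described above for the alternative temporal model can be pictured with a small sketch. This is not the authors' notation or any existing SQL operator. It is a hedged illustration in which time is assumed to be a sequence of integer clock ticks, periods are assumed to be closed-open `(begin, end)` pairs, and the function names `unpack` and `pack` are chosen here purely for illustration.

```python
def unpack(periods):
    """Break closed-open (begin, end) periods into a set of atomic ticks."""
    ticks = set()
    for begin, end in periods:
        ticks.update(range(begin, end))  # end itself is not included
    return ticks


def pack(ticks):
    """Re-assemble atomic ticks into maximal closed-open (begin, end) periods."""
    periods = []
    for tick in sorted(ticks):
        if periods and periods[-1][1] == tick:
            # The tick directly follows the last period, so extend that period.
            periods[-1] = (periods[-1][0], tick + 1)
        else:
            # A gap precedes this tick, so it starts a new period.
            periods.append((tick, tick + 1))
    return periods


# Coalescing overlapping periods: decompose, then re-assemble.
merged = pack(unpack([(1, 5), (3, 8), (10, 12)]))

# Removing one period from another: ordinary set difference on the ticks.
remainder = pack(unpack([(1, 10)]) - unpack([(4, 6)]))
```

Once the periods are reduced to atomic components, ordinary set operations do the work that would otherwise need special-case period logic. Materializing every tick is clearly impractical for fine granularities, so the sketch should be read as a model of the idea rather than as an implementation strategy.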
Because our own concern is with how to implement temporal concepts with today’s technologies, and also with how to support both kinds of uni-temporal data, as well as fully bi-temporal data, we will have little more to say about the alternative temporal model in this book.

Best Practices

Over several decades, a best practice has emerged in managing temporal queryable state data. It is to manage this kind of data by versioning otherwise conventional tables. The result is versioned tables which, logically speaking, are tables which combine the history tables and current tables described previously. Past, present and future states of customers, for example, are kept in one and the same Customer table. Corrections may or may not be flagged; but if they are not, it will be impossible to distinguish versions created because something about a customer changed from versions created because past customer data was entered incorrectly. On the other hand, if they are flagged, the management and use of these flags will quickly become difficult and confusing.

There are many variations on the theme of versioning, which we have grouped into four major categories. We will discuss them in Chapter 4.

The IT community has always used the term “version” for this kind of uni-temporal data. And this terminology seems to reflect an awareness of an important concept that, as we shall see, is central to the Asserted Versioning approach to temporal data. For the term “version” naturally raises the question “A version of what?”, to which our answer is “A version of anything that can persist and change over time”. This is the concept of a persistent object, and it is, most fundamentally, what Asserted Versioning is about.

Bi-Temporal State Data

We now come to our second option, which is to manage both versions and assertions and, most importantly, their interdependencies. This is bi-temporal data management, the subject both of Dr.
Rick Snodgrass’s book [2000, Snodgrass] and of our book.

The Standard Temporal Model

What we call the standard temporal model was developed by Dr. Rick Snodgrass in his book Developing Time-Oriented Database Applications in SQL (Morgan-Kaufmann, 2000). Based on the computer science work current at that time, and especially on the work Dr. Snodgrass and others had done on the TSQL (temporal SQL) proposal to the SQL standards committees, it shows how to implement both uni-temporal and bi-temporal data management using then-current DBMSs and then-current SQL.

We emphasize that, as we are writing, Dr. Snodgrass’s book is a decade old. We use it as our baseline view of computer science work on bi-temporal data because most of the computer science literature exists in the form of articles in scientific journals that are not readily accessible to many IT professionals. We also emphasize that Dr. Snodgrass did not write that book as a compendium of computer science research for an IT audience. Instead, he wrote it as a description of how some of that research could be adapted to provide a means of managing bi-temporal data with the SQL and the DBMSs available at that time.

One of the greatest strengths of the standard model is that it discusses and illustrates both the maintenance and the querying of temporal data at the level of SQL statements. For example, it shows us the kind of code that is needed to apply the temporal analogues of entity integrity and referential integrity to temporal data. And for any readers who might think that temporal data management is just a small step beyond the versioning they are already familiar with, many of the constraint-checking SQL statements shown in Dr. Snodgrass’s book should suffice to disabuse them of that notion.

The Asserted Versioning Temporal Model

What we call the Asserted Versioning temporal model is our own approach to managing temporal data.
Like the standard model, it attempts to manage temporal data with current technology and current SQL.

The Asserted Versioning model of uni-temporal and bi-temporal data management supports all of the functionality of the standard model. In addition, it extends the standard model’s notion of transaction time by permitting data to be physically added to a table prior to the time when that data will appear in the table as production data, available for use. This is done by means of deferred transactions, which result in deferred assertions, those being the inserted, updated or logically deleted rows resulting from those transactions.[7] Deferred assertions, although physically co-located in the same tables as other data, will not be immediately available to normal queries. But once time in the real world reaches the beginning of their assertion periods, they will, by that very fact, become currently asserted data, part of the production data that makes up the database as it is perceived by its users.

We emphasize that deferred assertions are not the same thing as rows describing what things will be like at some time in the future. Those latter rows are current claims about what things will be like in the future. They are ontologically post-dated. Deferred assertions are rows describing what things were, are, or will be like, but rows which we are not yet willing to claim make true statements. They are epistemologically post-dated.

Another way that Asserted Versioning differs from the standard temporal model is in the encapsulation and simplification of integrity constraints. The encapsulation of integrity constraints is made possible by distinguishing temporal transactions from physical transactions. Temporal transactions are the ones that users write. The corresponding physical transactions are what the DBMS applies to asserted version tables.
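The difference between a row that is post-dated in effective time and a deferred assertion, as discussed above, can be made concrete with a small in-memory sketch. Everything named here is an assumption made for illustration: the `Row` class, the field names `eff_begin`, `eff_end`, `asr_begin` and `asr_end`, the `FOREVER` marker, and the closed-open treatment of both periods are not taken from the text.

```python
from dataclasses import dataclass
from datetime import date

# Assumed marker for a period with no known end (hypothetical convention).
FOREVER = date(9999, 12, 31)


@dataclass(frozen=True)
class Row:
    object_id: str
    eff_begin: date  # effective period: when the described state holds
    eff_end: date
    asr_begin: date  # assertion period: when we claim the row is true
    asr_end: date
    data: str


def currently_asserted(rows, now):
    """Rows a normal query may see: 'now' falls inside the assertion period."""
    return [r for r in rows if r.asr_begin <= now < r.asr_end]


def deferred(rows, now):
    """Rows physically present whose assertion period has not yet begun."""
    return [r for r in rows if r.asr_begin > now]


rows = [
    # Asserted since January; describes the first half of the year.
    Row("C1", date(2010, 1, 1), date(2010, 7, 1),
        date(2010, 1, 1), FOREVER, "current state"),
    # Asserted now, but about the future: ontologically post-dated.
    Row("C1", date(2010, 7, 1), FOREVER,
        date(2010, 1, 1), FOREVER, "future state"),
    # Not yet asserted at all: epistemologically post-dated (deferred).
    Row("C2", date(2010, 1, 1), FOREVER,
        date(2010, 9, 1), FOREVER, "deferred claim"),
]
```

With `now` set to 1 June 2010, `currently_asserted` returns the first two rows, including the one about the future, while the third row is only reported by `deferred`. Once `now` reaches 1 September 2010, the third row becomes currently asserted without any further update to the table.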
The Asserted Versioning Framework (AVF) uses an API to accept temporal transactions. Once it validates them, the AVF translates each temporal transaction into one or more physical transactions. By means of triggers generated from a combination of a logical data model together with supplementary metadata, the AVF enforces temporal semantic constraints as it submits physical transactions to the DBMS.

The simplification of these integrity constraints is made possible by introducing the concept of an episode. With non-temporal tables, a row representing an object can be inserted into that table at some point in time, and later deleted from the table. After it is deleted, of course, that table no longer contains the information that the row was ever present. Corresponding to the period of time during which that row existed in that non-temporal table, there would be an episode in an asserted version table, consisting of one or more temporally contiguous rows for the same object. So an episode of an object in an asserted version table is in effect during exactly the period of time that a row for that object would exist in a non-temporal table. And just as a deletion in a conventional table can sometime later be followed by the insertion of a new row with the same primary key, the termination of an episode in an asserted version table can sometime later be followed by the insertion of a new episode for the same object.

[7] The term “deferred transaction” was suggested by Dr. Snodgrass during a series of email exchanges which the authors had with him in the summer of 2008.

In a non-temporal table, each row must conform to entity integrity and referential integrity constraints. In an asserted version table, each version must conform to temporal entity integrity and temporal referential integrity constraints. As we will see, the parallels are in more than name only.
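The notion of an episode introduced above, one or more temporally contiguous rows for the same object, can be sketched as a grouping step over versions. This is a minimal sketch and not the AVF's own logic: it assumes that each version is a hypothetical `(object_id, eff_begin, eff_end)` tuple with a closed-open effective period, that periods can be compared with `==` and sorted, and that no two versions of the same object overlap.

```python
def episodes(versions):
    """Group versions into episodes.

    versions: iterable of (object_id, eff_begin, eff_end) tuples, where each
    effective period is closed-open and versions of one object never overlap.
    Returns a dict mapping object_id to a list of (begin, end) episodes.
    """
    result = {}
    # Sorting puts each object's versions together, in order of begin time.
    for object_id, begin, end in sorted(versions):
        object_episodes = result.setdefault(object_id, [])
        if object_episodes and object_episodes[-1][1] == begin:
            # Contiguous with the previous version: same episode, extend it.
            object_episodes[-1] = (object_episodes[-1][0], end)
        else:
            # First version, or a gap in time: a new episode begins.
            object_episodes.append((begin, end))
    return result


versions = [
    ("C1", 1, 5),    # inserted at tick 1
    ("C1", 5, 9),    # updated at tick 5, ended at tick 9
    ("C1", 12, 15),  # re-inserted later: a second episode
    ("C2", 3, 4),
]
by_object = episodes(versions)
```

In this example the object `C1` has two episodes, which corresponds to a row in a non-temporal table that was inserted, updated, deleted, and then inserted again with the same primary key.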
Temporal entity integrity really is entity integrity applied to temporal data. Temporal referential integrity really is referential integrity applied to temporal data.

Glossary References

Glossary entries whose definitions form strong interdependencies are grouped together in the following list. The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be several other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered.

as-is
as-was
Asserted Versioning
Asserted Versioning Framework (AVF)
episode
persistent object
state
thing
physical transaction
temporal transaction
temporal entity integrity (TEI)
temporal referential integrity (TRI)
the alternative temporal model
the Asserted Versioning temporal model
the standard temporal model

PART 2 AN INTRODUCTION TO ASSERTED VERSIONING

Chapter Contents

3. The Origins of Asserted Versioning: Computer Science Research
4. The Origins of Asserted Versioning: The Best Practices
5. The Core Concepts of Asserted Versioning
6. Diagrams and Other Notations
7. The Basic Scenario

Part 1 provided the context for Asserted Versioning, a history and a taxonomy of various ways in which temporal data has been managed over the last several decades. Here in Part 2, we introduce Asserted Versioning itself and prepare the way for the detailed discussion in Part 3 of how Asserted Versioning actually works.

In Chapter 3, we discuss the origins of Asserted Versioning in computer science research. Based on the work of computer scientists, we introduce the concepts of a clock tick and an atomic clock tick, the latter of which, in their terminology, is called a chronon.
We go on to discuss the various ways in which time periods are represented by pairs of dates or of timestamps, since SQL does not directly support the concept of a time period.

Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00024-8 Copyright © 2010 Elsevier Inc. All rights of reproduction in any form reserved.

There are only a finite number of ways that two time periods can be situated, with respect to one another, along a common timeline. For example, one time period may entirely precede or entirely follow another, they may partially overlap or be identical, they may start at different times but end at the same time, and so on. These different relationships among pairs of time periods have been identified and catalogued, and are called the Allen relationships. They will play an important role in our discussions of Asserted Versioning because there are various ways in which we will want to compare time periods. With the Allen relationships as a completeness check, we can make sure that we have considered all the possibilities.

Another important section of this chapter discusses the difference between the computer science notion of transaction time, and our own notion of assertion time. This difference is based on our development of the concepts of deferred transactions and deferred assertions, and on their subsumption under the more general concept of a pipeline dataset.

In Chapter 4, we discuss the origins of Asserted Versioning in IT best practices, specifically those related to versioning. We believe that these practices are variations on four basic methods of versioning data. In this chapter, we present each of these methods by means of examples which include sample tables and a running commentary on how inserts, updates and deletes affect the data in those tables.

In Chapter 5, we present the conceptual foundations of Asserted Versioning.
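The Allen relationships mentioned in the Chapter 3 summary above lend themselves to a short classification sketch. It rests on several assumptions: periods are closed-open `(begin, end)` pairs with `begin < end`, and the thirteen relationship names follow common usage in the interval-algebra literature, which may differ from the names used later in this book. The function name `allen` is hypothetical.

```python
def allen(a, b):
    """Name the Allen relationship of period a to period b.

    Both periods are closed-open (begin, end) pairs with begin < end.
    """
    a_begin, a_end = a
    b_begin, b_end = b
    if not (a_begin < a_end and b_begin < b_end):
        raise ValueError("each period must have begin < end")

    # No shared time at all, with or without a gap between the periods.
    if a_end < b_begin:
        return "before"
    if b_end < a_begin:
        return "after"
    if a_end == b_begin:
        return "meets"
    if b_end == a_begin:
        return "met-by"

    # From here on the two periods share at least one clock tick.
    if a_begin == b_begin and a_end == b_end:
        return "equals"
    if a_begin == b_begin:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_begin > b_begin else "finished-by"
    if b_begin < a_begin and a_end < b_end:
        return "during"
    if a_begin < b_begin and b_end < a_end:
        return "contains"
    return "overlaps" if a_begin < b_begin else "overlapped-by"
```

Because every pair of valid periods falls into exactly one of the thirteen branches, a function of this kind can serve as the completeness check described above: any period comparison in application code can be tested against all thirteen cases.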
The core concepts of objects, episodes, versions and assertions are defined, a discussion which leads us to the fundamental statement of Asserted Versioning, that every row in an asserted version table is the assertion of a version of an episode of an object. We continue on to discuss how time periods are represented in asserted version tables, how temporal entity integrity and temporal referential integrity enforce the core semantics of Asserted Versioning, and finally how Asserted Versioning internalizes the complexities of temporal data management.

In Chapter 6, we introduce the schema common to all asserted version tables, as well as various diagrams and notations that will be used in the rest of the book. We also introduce the topic of how Asserted Versioning supports the dynamic views that hide the complexities of that schema from query authors who would otherwise likely be confused by that complexity.

When an object is represented by a row in a non-temporal table, the sequence of events begins with the insertion of that row, continues with zero or more updates, and either continues on with no further activity, or ends when the row is eventually deleted. When an object is represented in an asserted version table, the result includes one row corresponding to the insert in the non-temporal table, additional rows corresponding to the updates to the original row in the non-temporal table, and an additional row if a delete eventually takes place. This sequence of events constitutes what we call the basic scenario of activity against both conventional and asserted version tables. In Chapter 7, we describe how the basic scenario works when the target of that activity is an asserted version table.

Glossary References

Glossary entries whose definitions form strong interdependencies are grouped together in the following list.
The same Glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be several other, and often many other, Glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered.

Allen relationships
time period
assertion
version
episode
object
assertion time
transaction time
atomic clock tick
chronon
clock tick
deferred assertion
deferred transaction
pipeline dataset
temporal entity integrity
temporal referential integrity

3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH

CONTENTS

The Roots of Asserted Versioning
Computer Science Research
Clocks and Clock Ticks
Time Periods and Date Pairs
The Very Concept of Bi-Temporality
Allen Relationships
Advanced Indexing Strategies
Temporal Extensions to SQL
Glossary References

We begin this chapter with an overview of the three sources of Asserted Versioning: computer science work on temporal data; best practices in the IT profession related to versioning; and original work by the authors themselves. We then spend the rest of this chapter discussing computer science contributions to temporal data management, and the relevance of some of these concepts to Asserted Versioning.

The Roots of Asserted Versioning

Over the last three decades, the computer science community has done extensive work on temporal data, and especially on bi-temporal data. During that same period of time, the IT community has developed various forms of versioning, all of which are methods of managing one of the two kinds of uni-temporal data.
Asserted Versioning may be thought of as a method of managing both uni- and bi-temporal data which, unlike the standard model of temporal data management, recognizes that rows in bi-temporal tables represent versions of things and that, [...]

Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00003-0 Copyright © 2010 Elsevier Inc. All rights of reproduction in any form reserved.

... an interval of time defined on the basis of atomic clock ticks, and that is used in an Asserted Versioning database to delimit the two time periods of rows in asserted version tables, and also to indicate several important points in time. In asserted version tables, clock ticks are used for effective time begin and end dates and for episode begin dates; and atomic clock ticks are used for assertion time ...

... second concept is that of the internalization of pipeline datasets. We define a pipeline dataset as any collection of business data that is not a production table, but that contains data whose destination or origin is such a table.[1] Pipeline datasets ...

[1] The term “production” indicates that these tables contain “real” data. Regularly scheduled processes are being carried out to maintain these tables, and to ...

... orientation of Asserted Versioning is manifest in its encapsulation of the complexities of temporal data structures and the processes that manage them. Asserted Versioning is an integrated method of managing temporal data which relieves data modelers of the burden of designing and maintaining data models that must explicitly define temporal data structures and integrity constraints on them. It also relieves ...
... assertions eliminates batch files of transactions waiting to be applied to a database by also internalizing them within their target tables. In this book, we will show how the use of these internalized managed objects reduces the costs of maintaining databases by replacing external files or tables such as history tables, transaction files and logfiles, with structures internalized within production tables ...

... timestamps as an accurate record of when business events happen in the real world. In addition, for most business purposes, we assume that a SQL timestamp plus a defined interval of time will result in a second timestamp that represents when some second event will occur.[3]

Time Periods and Date Pairs

SQL does not recognize a period of time as a managed object. Instead, we have to use a pair of dates. There ...

... of dates to do this. Either the beginning date, or the ending date, or both, may or may not be included in the time period they delimit. If a date is not included, the time period is said to be open on ...

[3] A more detailed discussion of how SQL timestamps relate to real-world events is contained in Chapter 3 of Dr. Snodgrass’s book.

... datatype has been introduced by such vendors as Oracle and Teradata. But what that support means may differ from vendor to vendor. Can a unique index be defined on a PERIOD datatype? Which of the Allen relationship comparisons are also supported? So, lacking a standard for the PERIOD datatype, we will continue the practice of defining periods of time in terms of their begin and end points in time ...
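The choice the fragments above point to, whether the begin and end dates of a pair are themselves part of the period, can be illustrated with a small sketch. It assumes a clock tick of one day and uses hypothetical helper names; it does not claim to show which convention the book adopts, only how two common conventions relate to each other.

```python
from datetime import date, timedelta

# Assumed granularity for this sketch: one clock tick is one day.
ONE_TICK = timedelta(days=1)


def to_closed_open(begin, last_included):
    """Convert a closed-closed date pair to the equivalent closed-open pair."""
    return begin, last_included + ONE_TICK


def contains(period, day):
    """True if a closed-open period includes the given day."""
    begin, end = period
    return begin <= day < end


def adjacent(first, second):
    """True if the second closed-open period starts exactly where the first ends."""
    return first[1] == second[0]


# January and February 2010, first written with both end dates included.
january = to_closed_open(date(2010, 1, 1), date(2010, 1, 31))
february = to_closed_open(date(2010, 2, 1), date(2010, 2, 28))
```

One practical consequence of the closed-open form is that adjacency becomes a plain equality test between one period's end and the next period's begin, with no need to know the granularity at comparison time.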
... tables are inflow pipeline datasets. Pipeline datasets which contain data derived from production tables are outflow pipeline datasets. History tables are one example of a pipeline dataset. Sets of transactions, accumulated in files or tables and waiting to be applied to their target tables, are another example. While the use of versions eliminates history tables by internalizing them within the tables whose ...

... determine the mapping for us; and in most cases this is perfectly adequate. But IT data management professionals should at least be aware that issues like these do exist.

[2] We are not referring here to the cesium-based atomic clock on which standard time is based. An atomic clock tick, in the sense being defined here, is a logical concept, not a physical one.

... of how SQL timestamps map to when things happen in the real world. SQL uses Coordinated Universal Time (UTC), which is based on cesium clocks, which might lead us to conclude that SQL timestamps are extremely accurate. Precise they may be; but issues of accuracy involved in their use do exist. For example, suppose we know that an astronomical event which has just happened will happen again in exactly ...