
Managing Time in Relational Databases: How to Design, Update and Query Temporal Data


DOCUMENT INFORMATION

Basic information

Pages: 490
Size: 6.86 MB

Contents

MANAGING TIME IN RELATIONAL DATABASES
How to Design, Update and Query Temporal Data

TOM JOHNSTON
RANDALL WEIS

Companion Web site: ancillary materials are available online at www.elsevierdirect.com/companions/9780123750419

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann Publishers is an imprint of Elsevier.
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

This book is printed on acid-free paper.

© 2010 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein.
In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Application submitted

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-375041-9

For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.elsevierdirect.com

Printed in the United States of America
10 11 12 13 14   5 4 3 2 1

ABOUT THE AUTHORS

Tom Johnston

Tom Johnston is an independent consultant specializing in the design and management of data at the enterprise level. He has a doctorate in Philosophy, with an academic concentration in ontology, logic and semantics. He has spent his entire working career in business IT, in such roles as programmer, systems programmer, analyst, systems designer, data modeler and enterprise data architect. He has designed and implemented systems in over a dozen industries, including healthcare, telecommunications, banking, manufacturing, transportation and retailing.
His current research interests are (i) the management of bi-temporal data with today's DBMS technology; (ii) overcoming this newest generation of information stovepipes (for example, in medical records and national security databases) by more cleanly separating the semantics of data from the syntax of its representation; and (iii) providing additional semantics for the relational model of data by supplementing its first-order predicate logic statements with modalities such as time and person.

Randall J. Weis

Randall J. Weis, founder and CEO of InBase, Inc., has more than 24 years of experience in IT, specializing in enterprise data architecture, including the logical and physical modeling of very large database (VLDB) systems in the financial, insurance and health care industries. He has been implementing systems with stringent temporal and performance requirements for over 15 years. The bi-temporal pattern he developed for modeling history, retroactivity and future dating was used for the implementation of IBM's Insurance Application Architecture (IAA) model. This pattern allows a multidimensional temporal view of data as of any given effective and assertion point in time. InBase, Inc. has developed software used by many of the nation's largest companies, and is known for creating the first popular mainframe spellchecker, Lingo, early in Randy's career. Weis has been a senior consultant at InBase and other companies, such as PricewaterhouseCoopers LLP, Solving IT International Inc., Visual Highway and Beyond If Informatics. Randy has been a presenter at various user groups, including Guide, Share, Midwest Database Users Group and Camp IT Expo, and has developed computer courses used in colleges and corporate training programs. Randy has been married to his wife Marina for over 30 years, and has three children: Matt, Michelle and Nicolle.
He plays guitar and sings; he enjoys running, and has run several marathons. He also creates web sites and produces commercial videos. He may be reached via email at randyw@inbaseinc.com.

PREFACE

Over time, things change: things like customers, products, accounts, and so forth. But most of the data we keep about things describes what they are like currently, not what they used to be like. When things change, we update the data that describes them so that the description remains current. But all these things have a history, and many of them have a future as well, and often data about their past or about their future is also important.

It is usually possible to restore and then to retrieve historical data, given enough time and effort. But businesses are finding it increasingly important to access historical data, as well as data about the future, without those associated delays and costs. More and more, business value attaches to the ability to directly and immediately access non-current data as easily as current data, and to do so with equivalent response times.

Conventional tables contain data describing what things are currently like. But to provide comparable access to data describing what things used to be like, and what they may be like in the future, we believe it is necessary to combine data about the past, the present and the future in the same tables. Tables which do this, which contain data about what the objects they represent used to be like and also data about what they may be like later on, together with data about what those objects are like now, are versioned tables. Versioned tables are one of two kinds of uni-temporal tables.
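To make the idea of a versioned table concrete, here is a minimal Python sketch that uses in-memory rows in place of a real DBMS table. Each row carries an effective period written as a closed-open pair [eff_begin, eff_end), and `date.max` stands in for "until further notice". Both conventions, and the names `Version`, `as_of` and `END_OF_TIME`, are assumptions made for illustration; they are not taken from the book's own schema. The sketch shows past, current and future versions of one customer sitting side by side in the same table and being reached with one simple query.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical sentinel for "until further notice" (an assumption of this sketch).
END_OF_TIME = date.max

@dataclass(frozen=True)
class Version:
    """One row of a versioned table: an object's state over a closed-open effective period."""
    oid: str          # identifier of the object the row describes (hypothetical name)
    data: str         # the business data that holds during the period
    eff_begin: date   # first date the row is in effect (inclusive)
    eff_end: date     # first date the row is no longer in effect (exclusive)

def as_of(table: List[Version], oid: str, when: date) -> Optional[Version]:
    """Return the version of `oid` whose effective period contains `when`, if any."""
    for row in table:
        if row.oid == oid and row.eff_begin <= when < row.eff_end:
            return row
    return None

# Past, present and future versions of one customer live in the same table.
customers = [
    Version("C1", "status=standard", date(2008, 1, 1), date(2009, 6, 1)),
    Version("C1", "status=VIP",      date(2009, 6, 1), date(2011, 1, 1)),
    Version("C1", "status=platinum", date(2011, 1, 1), END_OF_TIME),
]

print(as_of(customers, "C1", date(2008, 3, 15)).data)  # status=standard
print(as_of(customers, "C1", date(2010, 7, 4)).data)   # status=VIP
print(as_of(customers, "C1", date(2012, 1, 1)).data)   # status=platinum
```

Closed-open periods let adjacent versions meet without overlapping, so exactly one version answers any as-of date that falls inside the customer's recorded history.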
In this book, we will show how the use of versioned tables lowers the cost and increases the value of temporal data: data that describes what things used to be like as well as what they are like now, and sometimes what they will be like as well. Costs, as we will see, are lowered by simplifying the design, maintenance and querying of temporal data. Value, as we will see, is increased by providing faster and more accurate answers to queries that access temporal data.

Another important thing about data is that, from time to time, we get it wrong. We might record the wrong data about a particular customer's status, indicating, for example, that a VIP customer is really a deadbeat. If we do, then as soon as we find out about the mistake, we will hasten to fix it by updating the customer's record with the correct data.

But that doesn't just correct the mistake. It also covers it up. Auditors are often able to reconstruct erroneous data from backups and logfiles. But for the ordinary query author, no trace remains in the database that the mistake ever occurred, let alone what the mistake was, or when it happened, or for how long it went undetected.

Fortunately, we can do better than that. Instead of overwriting the mistake, we can keep both the original customer record and its corrected copy in the same table, along with information about when and for how long the original was thought to be correct, and when we finally realized it wasn't and then did something about it. Moreover, while continuing to provide undisturbed, directly queryable, immediate access to the data that we currently believe is correct, we can also provide that same level of access to data that we once believed was correct but now realize is not correct.

There is no generally accepted term for this kind of table. We will call it an assertion table.
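The correction-without-overwriting behavior described above can be sketched the same way. In this hypothetical Python model, each row records when we began and when we stopped asserting it (`asr_begin`, `asr_end`, again closed-open, with `date.max` meaning "still asserted"). A correction does not update the mistaken row; it ends that row's assertion period and adds the corrected row. The names `Assertion`, `correct` and `asserted_at` are illustrative assumptions, not the book's API, and the dates are made-up sample data.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

# Assumed sentinel meaning "still asserted" (hypothetical; not taken from the book).
END_OF_TIME = date.max

@dataclass(frozen=True)
class Assertion:
    """One row of an assertion table (hypothetical layout for illustration)."""
    oid: str          # identifier of the object the row describes
    data: str         # what the row claims about that object
    asr_begin: date   # when we began asserting this row is true (inclusive)
    asr_end: date     # when we stopped asserting it (exclusive)

def correct(table: List[Assertion], oid: str, new_data: str, on: date) -> List[Assertion]:
    """Correct `oid` without overwriting: withdraw the current assertion and add a new one."""
    result = []
    for row in table:
        if row.oid == oid and row.asr_end == END_OF_TIME:
            # Keep the mistaken row, but record when we stopped believing it.
            result.append(replace(row, asr_end=on))
        else:
            result.append(row)
    result.append(Assertion(oid, new_data, on, END_OF_TIME))
    return result

def asserted_at(table: List[Assertion], oid: str, when: date) -> Optional[Assertion]:
    """Return the row for `oid` that the table asserted to be true at time `when`."""
    for row in table:
        if row.oid == oid and row.asr_begin <= when < row.asr_end:
            return row
    return None

# A VIP customer is mistakenly recorded as a deadbeat, and the error is found later.
status_rows = [Assertion("C1", "status=deadbeat", date(2009, 1, 10), END_OF_TIME)]
status_rows = correct(status_rows, "C1", "status=VIP", date(2009, 3, 2))

print(asserted_at(status_rows, "C1", date(2009, 2, 1)).data)  # status=deadbeat (believed then)
print(asserted_at(status_rows, "C1", date(2010, 1, 1)).data)  # status=VIP (believed now)
```

Because the mistaken row is kept, a query posed as of any earlier date still returns what the table said at that time, which is exactly what recreating an old report requires.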
Assertion tables, as we will see, are essential for recreating reports and queries, at a later time, when the objective is to retrieve the data as it was originally entered, warts and all. Assertion tables are the second of the two kinds of uni-temporal tables. The same data management methods which lower the cost and increase the value of versioned data also lower the cost and increase the value of asserted data.

There are also tables which combine versions and assertions, and combine them in the sense that every row in these tables is both a version and an assertion. These tables contain data about what we currently believe the objects they represent were/are/will be like, data about what we once believed but no longer believe those objects were/are/will be like, and also data about what we may in the future come to believe those objects were/are/will be like. Tables like these, tables whose rows contain data about the past, the present and the future of things, and also about the past, the present and the future of our beliefs about those things, are bi-temporal tables.

In spite of several decades of work on temporal data, and a growing awareness of the value of real-time access to it, little has been done to help IT professionals manage temporal data in real-world databases. One reason is that a temporal extension to the SQL language has yet to be approved, even though a proposal to add temporal features to the language was submitted over fifteen years ago. Lacking approved standards to guide them, DBMS vendors have been slow to build temporal support into their products.

In the meantime, IT professionals have developed home-grown support for versioning, but have paid almost no attention to bi-temporality. In many cases, they don't know what bi-temporality is.
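The bi-temporal tables defined above give every row two time periods: an effective period, describing when things were that way out in the world, and an assertion period, describing when the table claimed they were. The following Python sketch is a hypothetical in-memory model, again assuming closed-open periods and `date.max` as an open end; `BitemporalRow` and `lookup` are illustrative names and the addresses and dates are made-up sample data. It records an address that was wrong from the start and was corrected on 2009-05-01, then asks the same effective-time question as of two different assertion times.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

END_OF_TIME = date.max  # assumed "until further notice" sentinel

@dataclass(frozen=True)
class BitemporalRow:
    """A row that is both a version and an assertion (hypothetical layout)."""
    oid: str
    data: str
    eff_begin: date  # when, out in the world, this state began (inclusive)
    eff_end: date    # when it ended (exclusive)
    asr_begin: date  # when we began asserting this row is true (inclusive)
    asr_end: date    # when we stopped asserting it (exclusive)

def lookup(table: List[BitemporalRow], oid: str,
           effective: date, asserted: date) -> Optional[BitemporalRow]:
    """What did we believe, as of `asserted`, that `oid` was like on `effective`?"""
    for r in table:
        if (r.oid == oid
                and r.eff_begin <= effective < r.eff_end
                and r.asr_begin <= asserted < r.asr_end):
            return r
    return None

customer_rows = [
    # Original row: asserted from 2009-01-01 until the error was found on 2009-05-01.
    BitemporalRow("C1", "address=Elm St", date(2009, 1, 1), END_OF_TIME,
                  date(2009, 1, 1), date(2009, 5, 1)),
    # Correction: same effective period, asserted from 2009-05-01 onward.
    BitemporalRow("C1", "address=Oak St", date(2009, 1, 1), END_OF_TIME,
                  date(2009, 5, 1), END_OF_TIME),
]

print(lookup(customer_rows, "C1", date(2009, 2, 1), date(2009, 3, 1)).data)  # address=Elm St
print(lookup(customer_rows, "C1", date(2009, 2, 1), date(2010, 1, 1)).data)  # address=Oak St
```

Both rows describe the same effective period; only the assertion period tells them apart, so the first answer is what we believed on 2009-03-01 and the second is what we believe now. The sketch handles a single correction and omits the integrity rules a real bi-temporal table needs, such as preventing two rows for the same object from being asserted for the same effective and assertion times.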
In most cases, their business users, unaware of the benefits of bi-temporal data, don't know to ask for such functionality. And among those who have at least heard of bi-temporality, or to whom we have tried to explain it, we have found two common responses. One is that Ralph Kimball solved this problem a long time ago with his three kinds of slowly changing dimensions. Another is that we can get all the temporal functionality we need by simply versioning the tables to which we wish to add temporal data. But both responses are mistaken. Slowly changing dimensions do not support bi-temporal data management at all. Nor does versioning. Both are methods of managing versions; but both also fall, as we shall see, far short of the extensive support for versioning that Asserted Versioning provides.

Objectives of this Book

Seamless Access to Temporal Data

One objective of this book is to describe how to manage uni-temporal and bi-temporal data in relational databases in such a way that they can be seamlessly accessed together with current data.¹ By “seamlessly” we mean (i) maintained with transactions simple enough that anyone who writes transactions against conventional tables could write them; (ii) accessed with queries simple enough that anyone who writes queries against conventional tables could write them; and (iii) executed with performance similar to that for transactions and queries that target conventional data only.

Encapsulation of Temporal Data Structures and Processes

A second objective is to describe how to encapsulate the complexities of uni-temporal and bi-temporal data management. These complexities are nowhere better illustrated than in a book published ten years ago by Dr. Richard Snodgrass, the [...]

¹ Both forms of temporal data can be implemented in non-relational databases also. For that matter, they can be implemented with a set of flat files. We use the language of relational technology simply because the ubiquity of relational database technology makes that terminology a lingua franca within business IT departments.

... between non-temporal and temporal data and, in the latter category, the two ways that time and data are interwoven. However, it is not until Part 2 that we will begin to discuss the complexities of bi-temporal data, and how Asserted Versioning renders that complexity manageable. But since there are any number of things we could be talking about under the joint heading of time and data, and since it would ...

... table will contain all the instances of a given type (e.g. all policies) that are needed to satisfy a query. Think of a world of corporate data in which none of that is necessary, a world in which all pipeline datasets are contained in the single table that is their destination or their point of origin. In this world, maintaining data is a “submit it and forget it” activity, not one in which maintenance transactions ...

... coordinated asynchronous feeds from one database to another. These processes and environments are both expensive to maintain and conducive to error. For example, with history tables, and work-in-progress in external staging areas, and a series of pending transaction datasets, a change to a single semantic unit of information, e.g. to the policy type of an insurance policy, may need to be applied to many ...

Part 1: AN INTRODUCTION TO TEMPORAL DATA MANAGEMENT

... row the query author is interested in, two of those components being time periods that may or may not intersect in various ways. Continuing on, consider the possibilities involved in joining bi-temporal tables to one another, or to uni-temporal tables, or to non-temporal tables!
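Because rows in temporal tables carry time periods that may or may not intersect, joining them means matching on keys and also on overlapping periods. The minimal Python sketch below assumes closed-open [begin, end) periods and a hypothetical (key, data, period) row shape; `overlaps`, `intersection` and `temporal_join` are illustrative helpers, not a DBMS feature, and the policy and client rows are made-up sample data. It tests two periods for overlap, computes the shared sub-period, and uses that for a simple temporal join of one policy version against two client versions.

```python
from datetime import date
from typing import List, Optional, Tuple

# A time period as a closed-open pair [begin, end): an assumption of this sketch.
Period = Tuple[date, date]

def overlaps(a: Period, b: Period) -> bool:
    """True if the two periods share at least one point in time."""
    return a[0] < b[1] and b[0] < a[1]

def intersection(a: Period, b: Period) -> Optional[Period]:
    """The sub-period common to a and b, or None if they do not overlap."""
    begin, end = max(a[0], b[0]), min(a[1], b[1])
    return (begin, end) if begin < end else None

Row = Tuple[str, str, Period]  # (key, data, effective period); hypothetical row shape

def temporal_join(left: List[Row], right: List[Row]) -> List[Tuple[str, str, str, Period]]:
    """Pair rows with equal keys whose periods overlap; each result covers only the shared period."""
    joined = []
    for lkey, ldata, lper in left:
        for rkey, rdata, rper in right:
            common = intersection(lper, rper)
            if lkey == rkey and common is not None:
                joined.append((lkey, ldata, rdata, common))
    return joined

policies = [("C1", "policy=HMO", (date(2009, 1, 1), date(2010, 1, 1)))]
clients = [
    ("C1", "name=Smith", (date(2008, 1, 1), date(2009, 7, 1))),
    ("C1", "name=Jones", (date(2009, 7, 1), date(2011, 1, 1))),
]

for row in temporal_join(policies, clients):
    print(row)
```

The join yields two rows, one for each client version, each clipped to the part of the policy's period it shares: January to July 2009 for the first, July 2009 to January 2010 for the second. Periods that merely meet, such as one ending on the day the next begins, do not overlap under the closed-open convention.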
... the level of tables and their instances. And at this level, temporal data is still a second-class citizen. To manage it, developers have to build temporal structures and the code to manage them, by hand. In order to fully appreciate both the costs and the benefits of managing temporal data at this level, we need to see it in the context of methods of temporal data management as a whole. In Chapter 1, the ...

... Devlin and Paul Murphy.¹ This concept introduced temporal data management at the database level (as opposed to the table, row or column levels), since data warehouses are entire databases devoted to historical data.

History Tables

On an architecturally smaller scale, IT developers were also beginning to design and implement several other ways of managing temporal data. One of them was the use of history ...

... and application developers. You are the ones who will introduce these methods of temporal data management to your organizations, and explain the value of seamless real-time access to temporal data to your business users. Successful implementation of seamless access to all data, and not just to data about the present, will result in better customer service, more accurate accounting, improved forecasting, ...

... “uni-temporal” because the term “uni-temporal” suggests the idea of a single temporal dimension to the data, a single kind of time associated with the data, and this notion of one (or two) temporal dimensions is a useful one to keep in mind. In fact, it may be useful to think of these two temporal dimensions as the X and Y axes of a Cartesian graph, and of each row in a bi-temporal table as represented by ...
... May 2012 to January 2013. The first will report that customer id-1 had data 123 and 456 during that period of time. The second will report that customer id-1 had data 123 and 457 during that same period of time. So bd1 and ed1 delimit the time period out in the world during which things were as the data describes them, whereas bd2 and ed2 delimit a time period in the table, the time period during which ...

... accounting, improved forecasting, and better tracking of data used in research. The methods of managing temporal data introduced in this book will enhance systems used in education, finance, health care, insurance, manufacturing, retailing and transportation, all industries in which the authors have had consulting experience. In using these methods, you will play your own role in their evolution. If DBMS vendors ...

... to describe how to bring pending transactions into the production tables that are their targets, and how to retain posted transactions in those same tables. Pending transactions are insert, update ...

Posted: 27/06/2014, 06:20
