Advanced Database Technology and Design, Part 4

is d10, and so d10 shows as the TO value for each tuple that pertains to the current state of affairs. Note: You might be wondering what mechanism could cause all of those d10s to be replaced by d11s on the stroke of midnight. Unfortunately, we have to set this issue aside for the moment; we will return to it in Section 5.11.

Note that the temporal database of Table 5.3 includes all of the information from the semitemporal one of Table 5.2, together with historical information concerning a previous period (from d02 to d04) during which supplier S2 was under contract. The predicate for S_FROM_TO is "Supplier S# was named SNAME, had status STATUS, was located in city CITY, and was under contract, from day FROM (and not on the day immediately before FROM) to day TO (and not on the day immediately after TO)." The predicate for SP_FROM_TO is analogous.

5.3.2.1 Constraints (First Temporal Database)

First of all, we need to guard against the absurdity of a FROM-TO pair appearing in which the TO timepoint precedes the FROM timepoint:

CONSTRAINT S_FROM_TO_OK IS_EMPTY ( S_FROM_TO WHERE TO < FROM ) ;
CONSTRAINT SP_FROM_TO_OK IS_EMPTY ( SP_FROM_TO WHERE TO < FROM ) ;

Next, observe from the underlining in Table 5.3 that we have included the FROM attribute in the primary key for both S_FROM_TO and SP_FROM_TO. For example, the primary key of S_FROM_TO obviously cannot be just {S#}, for then we could not have the same supplier under contract for more than one continuous period. A similar observation applies to SP_FROM_TO. Note: We could have used the TO attributes instead of the FROM attributes. In fact, S_FROM_TO and SP_FROM_TO both have two candidate keys and are good examples of relvars for which there is no obvious reason to choose one of those keys as primary. We make the choices we do purely for definiteness.

However, these primary keys do not of themselves capture all of the constraints we would like them to. Consider relvar S_FROM_TO, for example.
It should be clear that if there is a tuple for supplier Sx in that relvar with FROM value f and TO value t, then we want there not to be a tuple for supplier Sx in that relvar indicating that Sx was under contract on the day immediately before f or the day immediately after t. For example, consider supplier S1, for whom we have just one S_FROM_TO tuple, with FROM = d04 and TO = d10. The mere fact that {S#, FROM} is the primary key for this relvar is clearly insufficient to prevent the appearance of an additional overlapping S1 tuple with, say, FROM = d02 and TO = d06. Such a tuple would indicate, among other things, that S1 was under contract on the day immediately before d04. Clearly, what we would like is for these two S1 tuples to be coalesced into a single tuple with FROM = d02 and TO = d10 (footnote 7).

The fact that {S#, FROM} is the primary key for S_FROM_TO is also insufficient to prevent the appearance of an abutting S1 tuple with, say, FROM = d02 and TO = d03. That tuple would again indicate that S1 was under contract on the day immediately before d04. As before, what we would like is for the tuples to be coalesced into a single tuple. Here then is a constraint that does prohibit such overlapping and abutting:

CONSTRAINT AUG_S_FROM_TO_PK
   IS_EMPTY ( ( ( S_FROM_TO RENAME FROM AS F1, TO AS T1 )
              JOIN
              ( S_FROM_TO RENAME FROM AS F2, TO AS T2 ) )
            WHERE ( F1 ≠ F2 AND T1 ≥ F2 AND T2 ≥ F1 )
               OR ( F2 = T1+1 OR F1 = T2+1 ) ) ;

The condition F1 ≠ F2 is needed because the join pairs every tuple with itself, and every tuple trivially overlaps itself. Since {S#, FROM} is a key, F1 ≠ F2 restricts the overlap test to pairs of distinct tuples for the same supplier.

This expression is quite complicated. We have also taken the gross liberty of writing, for example, T1+1 to designate the immediate successor of the day denoted by T1, a point we will come back to in Section 5.5. Note: Assuming this constraint is indeed stated (and enforced, of course), some writers would refer to the attribute combination {S#, FROM, TO} as a temporal candidate key (in fact, a temporal primary key). The term is not very good, however, because the temporal candidate key is not in fact a candidate key in the first place.
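As an illustration only, the following Python sketch checks the condition that AUG_S_FROM_TO_PK is meant to enforce. It makes three assumptions:

* S_FROM_TO is held in memory as a list of (S#, FROM, TO) tuples.
* Days are encoded as integers, so d04 becomes 4.
* With that encoding, "T1 + 1" really is the successor of T1.

The function name is hypothetical.

```python
from itertools import combinations

def overlapping_or_abutting(s_from_to):
    """Return pairs of distinct same-supplier tuples that overlap or abut.

    Each tuple is (supplier, from_day, to_day), with days as integers.
    An empty result means the AUG_S_FROM_TO_PK condition holds.
    """
    bad = []
    # combinations() never pairs a tuple with itself; this plays the role
    # of the "distinct tuples" condition in the Tutorial D self-join.
    for (s1, f1, t1), (s2, f2, t2) in combinations(s_from_to, 2):
        if s1 != s2:
            continue  # the JOIN on S# only compares tuples for the same supplier
        overlaps = t1 >= f2 and t2 >= f1
        abuts = f2 == t1 + 1 or f1 == t2 + 1
        if overlaps or abuts:
            bad.append(((s1, f1, t1), (s2, f2, t2)))
    return bad

ok = [("S1", 4, 10), ("S2", 7, 10), ("S2", 2, 4)]
print(overlapping_or_abutting(ok))                   # []
print(overlapping_or_abutting(ok + [("S1", 2, 6)]))  # reports the overlap with S1 [4, 10]
print(overlapping_or_abutting(ok + [("S1", 2, 3)]))  # reports the abutment with S1 [4, 10]
```

Comparing every pair is quadratic. A real system would more likely sort each supplier's periods and compare neighbors, but the pairwise form mirrors the self-join most directly.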
(In Section 5.9, by contrast, we will encounter temporal candidate keys that genuinely are candidate keys in the classical sense.)

Footnote 7: Observe that not coalescing such tuples would be almost as bad as permitting duplicates. Duplicates amount to saying the same thing twice. And those two tuples for S1 with overlapping time intervals do indeed say the same thing twice; to be specific, they both say that S1 was under contract on days 4, 5, and 6.

Next, note carefully that the attribute combination {S#, FROM} in relvar SP_FROM_TO is not a foreign key from SP_FROM_TO to S_FROM_TO, even though it does involve the same attributes (S# and FROM) as the primary key of S_FROM_TO. However, we certainly do need to ensure that if a certain supplier appears in SP_FROM_TO, then that same supplier appears in S_FROM_TO as well:

CONSTRAINT AUG_SP_TO_S_FK_AGAIN1
   SP_FROM_TO { S# } ⊆ S_FROM_TO { S# } ;

But constraint AUG_SP_TO_S_FK_AGAIN1 is not enough by itself. We also need to ensure that (even if all desired coalescing of tuples has been done) if SP_FROM_TO shows some supplier as being able to supply some part during some interval of time, then S_FROM_TO shows that same supplier as being under contract during that same interval of time. We might try the following:

CONSTRAINT AUG_SP_TO_S_FK_AGAIN2 /* Warning: incorrect! */
   IS_EMPTY ( ( ( S_FROM_TO RENAME FROM AS SF, TO AS ST )
              JOIN
              ( SP_FROM_TO RENAME FROM AS SPF, TO AS SPT ) )
            WHERE SPF < SF OR SPT > ST ) ;

As the comment indicates, however, this specification is in fact incorrect. To see why, let S_FROM_TO be as shown in Table 5.3, and let SP_FROM_TO include a tuple for supplier S2 with, say, FROM = d03 and TO = d04. Such an arrangement is clearly consistent, yet constraint AUG_SP_TO_S_FK_AGAIN2 as stated actually prohibits it. The SP_FROM_TO tuple also joins with the S2 tuple for the period from d07 to d10, and for that pair SPF = d03 is less than SF = d07. We will not try to fix this problem here, deferring it instead to a later section (Section 5.9).
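The failure can be reproduced with a small Python sketch. Days are integers, and the data are the S2 rows just discussed.

* `naive_again2_holds` mirrors the incorrect constraint.
* `covered` spells out, by brute force, what we actually want to say: every day of every SP period falls inside some contract period for the same supplier.

Both function names are hypothetical. The `covered` check only illustrates the intended meaning; it is not the fix developed in Section 5.9.

```python
# S2 is under contract on days 2..4 and 7..10; it supplies something on days 3..4.
S_FROM_TO = [("S2", 2, 4), ("S2", 7, 10)]
SP_FROM_TO = [("S2", 3, 4)]

def naive_again2_holds(s_rows, sp_rows):
    """Mirror AUG_SP_TO_S_FK_AGAIN2: no joined pair may have SPF < SF or SPT > ST."""
    for s, sf, st in s_rows:
        for s_sp, spf, spt in sp_rows:
            if s == s_sp and (spf < sf or spt > st):
                return False
    return True

def covered(s_rows, sp_rows):
    """True if every SP day lies in some contract period of the same supplier."""
    for s, spf, spt in sp_rows:
        for day in range(spf, spt + 1):
            if not any(s == s2 and sf <= day <= st for s2, sf, st in s_rows):
                return False
    return True

print(naive_again2_holds(S_FROM_TO, SP_FROM_TO))  # False: consistent data is rejected
print(covered(S_FROM_TO, SP_FROM_TO))             # True
```

The naive version fails because it compares the SP tuple against every contract period of the supplier, including ones it was never meant to fall inside.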
However, we remark as a matter of terminology on how SP_FROM_TO relates to S_FROM_TO. Suppose (as noted earlier) that attribute combination {S#, FROM, TO} in relvar S_FROM_TO is regarded as a temporal candidate key. Then attribute combination {S#, FROM, TO} in relvar SP_FROM_TO might be regarded as a temporal foreign key, though it is not in fact a foreign key as such. Again, see Section 5.9 for further discussion.

5.3.2.2 Queries (First Temporal Database)

Here now are fully temporal versions of Queries 1.1 and 1.2:

• Query 3.1: Get S#-FROM-TO triples for suppliers who have been able to supply some part at some time, where FROM and TO together designate a maximal continuous period during which supplier S# was in fact able to supply some part. Note: We use the term "maximal" here as a convenient shorthand. In the case at hand, it means that supplier S# was unable to supply any part on the day immediately before FROM or on the day immediately after TO.

• Query 3.2: Get S#-FROM-TO triples for suppliers who have been unable to supply any parts at all at some time, where FROM and TO together designate a maximal continuous period during which supplier S# was in fact unable to supply any part.

Well, you might like to take a little time to convince yourself that, like us, you would really prefer not even to attempt these queries. If you do make the attempt, you will eventually find that they can be expressed, albeit exceedingly laboriously. But it will surely be obvious that some kind of shorthand is very desirable.

In a nutshell, therefore, the problem of temporal data is that it quickly leads to constraints and queries that are unreasonably complex to state. That is so unless the system provides some well-designed shorthands, of course, which (as we know) today's commercial products do not.

5.4 Intervals

We now embark on our development of an appropriate set of shorthands.
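To get a feel for what such shorthands must take off our hands, the following Python sketch computes Query 3.1 "by hand". It runs over the SP_DURING sample values of Table 5.4, which hold the same information as SP_FROM_TO, with days encoded as integers. For each supplier it collects every day on which some part could be supplied, then groups consecutive days into maximal runs. The function name is hypothetical, and this only illustrates the result the query must produce; it is not a relational formulation.

```python
from collections import defaultdict

# SP rows as (S#, P#, FROM, TO) from Table 5.4, with d04 written as 4, and so on.
SP = [
    ("S1", "P1", 4, 10), ("S1", "P2", 5, 10), ("S1", "P3", 9, 10),
    ("S1", "P4", 5, 10), ("S1", "P5", 4, 10), ("S1", "P6", 6, 10),
    ("S2", "P1", 2, 4), ("S2", "P2", 3, 3), ("S2", "P1", 8, 10),
    ("S2", "P2", 9, 10), ("S3", "P2", 8, 10), ("S4", "P2", 6, 9),
    ("S4", "P4", 4, 8), ("S4", "P5", 5, 10),
]

def query_3_1(sp_rows):
    """Return (S#, FROM, TO) triples for maximal periods of supplying some part."""
    days = defaultdict(set)
    for s, _part, f, t in sp_rows:
        days[s].update(range(f, t + 1))  # every day on which s supplied something
    result = []
    for s in sorted(days):
        ordered = sorted(days[s])
        start = prev = ordered[0]
        for d in ordered[1:]:
            if d != prev + 1:            # a gap closes the current maximal run
                result.append((s, start, prev))
                start = d
            prev = d
        result.append((s, start, prev))
    return result

print(query_3_1(SP))
# [('S1', 4, 10), ('S2', 2, 4), ('S2', 8, 10), ('S3', 8, 10), ('S4', 4, 10)]
```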
The first and most fundamental step is to recognize the need to deal with intervals as such in their own right, instead of having to treat them as pairs of separate values as we have been doing up to this point.

What exactly is an interval? According to Table 5.3, supplier S1 was able to supply part P1 during the interval from day 4 to day 10. But what does "from day 4 to day 10" mean? It is clear that days 5, 6, 7, 8, and 9 are included, but what about the start and end points, days 4 and 10? It turns out that, given some specific interval, we sometimes want to regard the specified start and end points as included in the interval and sometimes not.

* If the interval from day 4 to day 10 does include day 4, we say it is closed with respect to its start point; otherwise we say it is open with respect to that point.
* Likewise, if it includes day 10, we say it is closed with respect to its end point; otherwise we say it is open with respect to that point.

Conventionally, therefore, we denote an interval by its start point and its end point (in that order), preceded by either an opening bracket or an opening parenthesis and followed by either a closing bracket or a closing parenthesis. Brackets are used where the interval is closed, parentheses where it is open. Thus, for example, there are four distinct ways to denote the specific interval that runs from day 4 to day 10 inclusive:

[d04, d10]
[d04, d11)
(d03, d10]
(d03, d11)

Note: You might think it odd to use, for example, an opening bracket but a closing parenthesis; the fact is, however, there are good reasons to allow all four styles. Indeed, the so-called closed-open style (opening bracket, closing parenthesis) is the one most used in practice (footnote 8). However, the closed-closed style (opening bracket, closing bracket) is surely the most intuitive, and we will favor it in what follows.
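A short Python sketch makes the "four notations, one interval" point concrete. It assumes the chapter's "dNN" day notation can be read as plain integers. The parser `closed_form` is hypothetical. It turns any of the four bracket styles into the same closed (start, end) pair by:

* replacing an open start with the successor of the start point;
* replacing an open end with the predecessor of the end point.

```python
def closed_form(literal):
    """Convert a literal such as '[d04, d11)' to closed (start, end) day numbers.

    Assumes the hypothetical 'dNN' day notation used in this chapter.
    """
    literal = literal.strip()
    opening, closing = literal[0], literal[-1]
    if opening not in "[(" or closing not in "])":
        raise ValueError(f"not an interval literal: {literal!r}")
    lo, hi = (int(part.strip().lstrip("d")) for part in literal[1:-1].split(","))
    start = lo if opening == "[" else lo + 1  # open start: first point is lo's successor
    end = hi if closing == "]" else hi - 1    # open end: last point is hi's predecessor
    return start, end

forms = ["[d04, d10]", "[d04, d11)", "(d03, d10]", "(d03, d11)"]
print({closed_form(f) for f in forms})  # {(4, 10)}
```

All four literals collapse to the same pair. This is the sense in which they are different denotations of a single interval value.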
Given that intervals such as [d04, d10] are values in their own right, it makes sense to combine the FROM and TO attributes of, say, SP_FROM_TO (see Table 5.3) into a single attribute, DURING, whose values are drawn from some interval type (see the next section). This idea has several advantages:

* It avoids the need to make the arbitrary choice as to which of the two candidate keys {S#, FROM} and {S#, TO} should be primary.
* It avoids the need to decide whether the FROM-TO intervals of Table 5.3 are to be interpreted as closed or open with respect to each of FROM and TO. In fact, [d04, d10], [d04, d11), (d03, d10], and (d03, d11) now become four distinct possible representations of the same interval, and we have no need to know which (if any) is the actual representation.
* Relvar constraints to guard against the absurdity of a FROM-TO pair appearing in which the TO timepoint precedes the FROM timepoint (as we put it in Section 5.3) are no longer necessary. The constraint FROM ≤ TO is implicit in the very notion of an interval type (loosely speaking).
* Other constraints might also be simplified, as we will see in Section 5.9.

Table 5.4 shows what happens to our example database if we adopt this approach.

Footnote 8: To see why the closed-open style might be advantageous, consider the operation of splitting the interval [d04, d10] immediately before, say, d07. The result is the immediately adjacent intervals [d04, d07) and [d07, d10].

5.5 Interval Types

Our discussion of intervals in the previous section was mostly intuitive in nature; now we need to approach the issue more formally.
Table 5.4 The Suppliers and Parts Database (Sample Values): Final Fully Temporal Version, Using Intervals

S_DURING
S#   SNAME   STATUS   CITY     DURING
S1   Smith   20       London   [d04, d10]
S2   Jones   10       Paris    [d07, d10]
S2   Jones   10       Paris    [d02, d04]
S3   Blake   30       Paris    [d03, d10]
S4   Clark   20       London   [d04, d10]
S5   Adams   30       Athens   [d02, d10]

SP_DURING
S#   P#   DURING
S1   P1   [d04, d10]
S1   P2   [d05, d10]
S1   P3   [d09, d10]
S1   P4   [d05, d10]
S1   P5   [d04, d10]
S1   P6   [d06, d10]
S2   P1   [d02, d04]
S2   P2   [d03, d03]
S2   P1   [d08, d10]
S2   P2   [d09, d10]
S3   P2   [d08, d10]
S4   P2   [d06, d09]
S4   P4   [d04, d08]
S4   P5   [d05, d10]

First of all, observe that the granularity of the interval [d04, d10] is days. More precisely, we could say it is type DATE. By that term we mean the member of the usual family of datetime data types whose precision is day (as opposed to, say, hour or millisecond or month). This observation allows us to pin down the exact type of the interval in question, as follows:

• First and foremost, of course, it is some interval type. This fact by itself is sufficient to determine the operators that are applicable to the interval value in question. It works just as saying that a value r is of some relation type is sufficient to determine the operators (JOIN, etc.) that are applicable to that value r.
• Second, the interval in question is, very specifically, an interval from one date to another, and this fact is sufficient to determine the set of interval values that constitute the interval type in question.

The specific type of [d04, d10] is thus INTERVAL(DATE), where:

a. INTERVAL is a type generator (like RELATION in Tutorial D, or array in conventional programming languages) that allows us to define a variety of specific interval types (see further discussion below);
b. DATE is the point type of this specific interval type.
It is important to note that, in general, point type PT determines both the type and the precision of the start and end points (and all points in between) of values of type INTERVAL(PT). (In the case of type DATE, of course, the precision is implicit.)

Note: Normally, we do not regard precision as part of the applicable type but, rather, as an integrity constraint. Given the declarations DECLARE X TIMESTAMP(3) and DECLARE Y TIMESTAMP(6), for example, X and Y are of the same type but are subject to different constraints. X is constrained to hold millisecond values and Y is constrained to hold microsecond values. Strictly speaking, therefore, to say that, for example, TIMESTAMP(3) (or DATE) is a legal point type is to bundle together two concepts that should really be kept separate. Instead, it would be better to define two types T1 and T2, both with a TIMESTAMP possible representation but with different precision constraints. We would then say that T1 and T2 (not, for example, TIMESTAMP(3) and TIMESTAMP(6)) are legal point types. For simplicity, however, we follow conventional usage in this chapter and pretend that precision is part of the type.

What properties must a type possess if it is to be legal as a point type? Well, we have seen that an interval is denoted by its start and end points. We have also seen that (at least informally) an interval consists of a set of points. Suppose we want to determine the complete set of points given just the start point s and the end point e. We must first be able to determine the point that immediately follows the point s (in some agreed ordering). We call that immediately following point the successor of s; for simplicity, let us agree to refer to it as s + 1. Then the function by which s + 1 is determined from s is the successor function for the point type (and precision) in question.
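The dependence of the successor function on precision shows up in a short Python sketch. Python's `datetime` resolves to microseconds. The sketch therefore models TIMESTAMP(p), for p from 0 to 6, by stepping 10^(6 - p) microseconds; that modeling is an assumption made for illustration, not anything defined in this chapter. The helper names are hypothetical. The same starting value has different successors at different precisions.

```python
from datetime import date, datetime, timedelta

def date_successor(d: date) -> date:
    """Successor function for point type DATE (precision: one day)."""
    return d + timedelta(days=1)

def timestamp_successor(precision: int):
    """Build a successor function for timestamps with `precision` fractional-second digits."""
    if not 0 <= precision <= 6:
        raise ValueError("datetime resolves to microseconds, so precision must be 0..6")
    step = timedelta(microseconds=10 ** (6 - precision))
    return lambda ts: ts + step

ms_next = timestamp_successor(3)  # behaves like a TIMESTAMP(3) point type
us_next = timestamp_successor(6)  # behaves like a TIMESTAMP(6) point type
t = datetime(2000, 1, 1, 12, 0, 0)
print(ms_next(t))                         # 2000-01-01 12:00:00.001000
print(us_next(t))                         # 2000-01-01 12:00:00.000001
print(date_successor(date(2000, 2, 28)))  # 2000-02-29
```

As in the text, the successor is undefined for the last value. In Python, `date_successor(date.max)` raises `OverflowError`.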
That successor function must be defined for every value of the point type, except the one designated as last. (There will also be one point designated as first, which is not the successor of anything.)

Having determined that s + 1 is the successor of s, we must next determine whether or not s + 1 comes after e, according to the same agreed ordering for the point type in question. If it does not, then s + 1 is indeed a point in [s,e], and we must now consider the next point, s + 2. We continue this process until we come to the first point s + n that comes after e (that is, the successor of e). At that point we will have discovered every point of [s,e]. We can now safely say that the only property a type PT must have to be legal as a point type is that a successor function must be defined for it. The existence of such a function implies that there must be a total ordering for the values in PT. We can therefore assume that the usual comparison operators (such as < and ≥) are available and defined for all pairs of PT values.

By the way, you will surely have noticed by now that we are no longer talking about temporal data specifically. Indeed, most of the rest of this chapter is about intervals in general rather than time intervals in particular, though we will consider certain specifically temporal issues in Section 5.11.

Here then (at last) is a precise definition. Let PT be a point type. Then an interval (or interval value) i of type INTERVAL(PT) is a scalar value for which two monadic scalar operators (START and END) and one dyadic operator (IN) are defined, such that:

a. START(i) and END(i) each return a value of type PT.
b. START(i) ≤ END(i).
c. Let p be a value of type PT. Then p IN i is true if and only if START(i) ≤ p and p ≤ END(i) are both true.

Note that this definition relies on the ordering of type PT, which (as explained above) is implied by the successor function defined for that type.
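The definition can be mirrored in a small generic Python class:

* `start` and `end` play the roles of START and END.
* `__post_init__` enforces property b.
* Python's `in` operator implements IN (property c).
* `points` enumerates [s,e] by repeated application of a caller-supplied successor function.

The class is a sketch under these assumptions, not a Tutorial D feature.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Generic, Iterator, TypeVar

PT = TypeVar("PT")  # the point type; it only needs to be totally ordered

@dataclass(frozen=True)
class Interval(Generic[PT]):
    """Closed interval [start, end] over point type PT (a sketch, not Tutorial D)."""
    start: PT
    end: PT

    def __post_init__(self) -> None:
        # Property b: START(i) <= END(i), so an interval is never empty.
        if not self.start <= self.end:
            raise ValueError("START(i) must not come after END(i)")

    def __contains__(self, p: PT) -> bool:
        # Property c: p IN i iff START(i) <= p and p <= END(i).
        return self.start <= p <= self.end

    def points(self, successor: Callable[[PT], PT]) -> Iterator[PT]:
        """Yield s, s + 1, s + 2, ... up to and including END(i)."""
        p = self.start
        while True:
            yield p
            if p == self.end:
                return  # stop here rather than ask for the successor of END
            p = successor(p)

def next_day(d: date) -> date:
    return d + timedelta(days=1)

i = Interval(date(2000, 1, 4), date(2000, 1, 10))
print(date(2000, 1, 7) in i)          # True
print(len(list(i.points(next_day))))  # 7
```

The loop stops on reaching END(i) instead of computing END(i) + 1 and testing it. That way it still works when END(i) is the last value of PT, which has no successor.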
Note also that, by definition, intervals are always nonempty (that is, there is always at least one point IN any given interval).

Observe very carefully that a value of type INTERVAL(PT) is a scalar value; that is, it has no user-visible components. It is true that it does have a possible representation (in fact, several possible representations, as we saw in the previous section), and those possible representations in turn do have user-visible components. The interval value per se, however, does not. Another way of saying the same thing is to say that intervals are encapsulated.

5.6 Scalar Operators on Intervals

In this section we define some useful scalar operators (most of them more or less self-explanatory) that apply to interval values. Consider the interval type INTERVAL(PT), and let p be a value of type PT.

* We will continue to use the notation p + 1, p + 2, and so on, to denote the successor of p, the successor of p + 1, and so on. A real language might provide some kind of NEXT operator.
* Similarly, we will use the notation p − 1, p − 2, and so on, to denote the value whose successor is p, the value whose successor is p − 1, and so on. A real language might provide some kind of PRIOR operator.

Let p1 and p2 be values in PT. Then we define MAX(p1,p2) to return p2 if p1 < p2 is true and p1 otherwise, and MIN(p1,p2) to return p1 if p1 < p2 is true and p2 otherwise.

The notation we have already been using will do for interval selectors (at least in informal contexts). For example, the selector invocations [3,5] and [3,6) both yield that value of type INTERVAL(INTEGER) whose contained points are 3, 4, and 5. (A real language would probably require some more explicit syntax, as in, for example, INTERVAL([3,5]).)

Let i1 be the interval [s1,e1] of type INTERVAL(PT). As we have already seen, START(i1) returns s1 and END(i1) returns e1; we additionally define STOP(i1), which returns e1 + 1.
Also, let i2 be the interval [s2,e2], also of type INTERVAL(PT). Then we define the following more or less self-explanatory interval comparison operators. Note: These operators are often known as Allen's operators, having first been proposed by Allen in [6].

• i1 = i2 is true if and only if s1 = s2 and e1 = e2 are both true.
• i1 BEFORE i2 is true if and only if e1 < s2 is true.
• i1 MEETS i2 is true if and only if s2 = e1 + 1 is true or s1 = e2 + 1 is true.
• i1 OVERLAPS i2 is true if and only if s1 ≤ e2 and s2 ≤ e1 are both true.
• i1 DURING i2 is true if and only if s2 ≤ s1 and e2 ≥ e1 are both true (footnote 9).
• i1 STARTS i2 is true if and only if s1 = s2 and e1 ≤ e2 are both true.
• i1 FINISHES i2 is true if and only if e1 = e2 and s1 ≥ s2 are both true.

Following [2], we can also define the following useful additions to Allen's operators:

• i1 MERGES i2 is true if and only if i1 MEETS i2 is true or i1 OVERLAPS i2 is true.
• i1 CONTAINS i2 is true if and only if i2 DURING i1 is true (footnote 10).
• To obtain the "length," so to speak, of an interval, we have DURATION(i), which returns the number of points in i. For example, DURATION([d03,d07]) = 5.

Finally, we define some useful dyadic operators on intervals that return intervals:

• i1 UNION i2 yields [MIN(s1,s2),MAX(e1,e2)] if i1 MERGES i2 is true and is otherwise undefined.
• i1 INTERSECT i2 yields [MAX(s1,s2),MIN(e1,e2)] if i1 OVERLAPS i2 is true and is otherwise undefined.

Note: UNION and INTERSECT here are the general set operators, not their special relational counterparts. Reference [2] calls them MERGE and INTERVSECT, respectively.

5.7 Aggregate Operators on Intervals

In this section we introduce two extremely important operators, UNFOLD and COALESCE. Each of these operators takes a set of intervals all of the same type as its single operand and returns another such set. The result in both cases can be regarded as a particular canonical form for the original set.
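Before UNFOLD and COALESCE are defined, it may help to see the scalar operators of Section 5.6 in executable form. The Python sketch below makes these assumptions:

* Intervals are closed (s, e) pairs of integers, so the successor of p is p + 1.
* Interval equality is just tuple equality.
* `None` stands in for "undefined".

The function names simply echo the operator names; they are not a real library API.

```python
from typing import Optional, Tuple

# A closed interval [s, e] over integer points, where the successor of p is p + 1.
Interval = Tuple[int, int]

def before(i1: Interval, i2: Interval) -> bool:
    return i1[1] < i2[0]

def meets(i1: Interval, i2: Interval) -> bool:
    return i2[0] == i1[1] + 1 or i1[0] == i2[1] + 1

def overlaps(i1: Interval, i2: Interval) -> bool:
    return i1[0] <= i2[1] and i2[0] <= i1[1]

def during(i1: Interval, i2: Interval) -> bool:
    return i2[0] <= i1[0] and i2[1] >= i1[1]

def starts(i1: Interval, i2: Interval) -> bool:
    return i1[0] == i2[0] and i1[1] <= i2[1]

def finishes(i1: Interval, i2: Interval) -> bool:
    return i1[1] == i2[1] and i1[0] >= i2[0]

def merges(i1: Interval, i2: Interval) -> bool:
    return meets(i1, i2) or overlaps(i1, i2)

def contains(i1: Interval, i2: Interval) -> bool:
    return during(i2, i1)

def duration(i: Interval) -> int:
    return i[1] - i[0] + 1  # the number of points in i

def stop(i: Interval) -> int:
    return i[1] + 1  # STOP(i) = END(i) + 1

def union(i1: Interval, i2: Interval) -> Optional[Interval]:
    # Undefined (None here) unless the operands merge.
    if not merges(i1, i2):
        return None
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

def intersect(i1: Interval, i2: Interval) -> Optional[Interval]:
    # Undefined (None here) unless the operands overlap.
    if not overlaps(i1, i2):
        return None
    return (max(i1[0], i2[0]), min(i1[1], i2[1]))

print(duration((3, 7)))            # 5, as in DURATION([d03,d07])
print(union((4, 6), (7, 10)))      # (4, 10): the operands meet
print(intersect((4, 6), (7, 10)))  # None: they meet but do not overlap
```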
Footnote 9: Observe that here (for once) DURING does not mean "throughout the interval in question."

Footnote 10: INCLUDES might be a better keyword than CONTAINS here; then we could use CONTAINS as the inverse of IN, defining i CONTAINS p to be equivalent to p IN i.

[...] ... step merely discards part numbers. Its result, T1, thus looks like this:

S#   DURING
S1   [d04, d10]
S1   [d05, d10]
S1   [d09, d10]
S1   [d06, d10]
S2   [d02, d04]
S2   [d03, d03]
S2   [d08, d10]
S2   [d09, d10]
S3   [d08, d10]
S4   [d06, d10]
S4   [d04, d08]
S4   [d05, d10]

Note that this relation contains redundant information; for example, we are told no less ...

... (X) AS Y) {ALL BUT X} AS T3 :

T3 looks like this:

S#   Y
S1   DURING: [d04, d10]
S2   DURING: [d02, d04], [d08, d10]
S3   DURING: [d08, d10]
S4   ...

Finally, we ungroup (T3 UNGROUP Y):

S#   DURING
S1   [d04, d10]
...

This expression yields the relation we earlier called RESULT. In other words, now showing all the steps together (and simplifying slightly), RESULT is the result of evaluating ...

... for his careful review and useful comments.

References

[1] Snodgrass, R. T. (ed.), The TSQL2 Temporal Query Language, Boston, MA: Kluwer Academic Publishers, 1995.
[2] Lorentzos, N. A., and Y. G. Mitsopoulos, "SQL Extension for Interval Data," IEEE Trans. on Knowledge and Data Engineering, Vol. 9, No. 3, May/June 1997.
[3] Darwen, H., and C. J. Date, Foundation ...

... define the monadic operators UNFOLD and COALESCE. Let X be a set of intervals of type INTERVAL(PT). Then UNFOLD(X) returns the unfolded form of X, while COALESCE(X) returns the coalesced form of X. Note: We should add that "unfolded form" and "coalesced form" are not standard terms; in fact, there do not appear to be any standard terms for these concepts, even ...
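The two canonical forms, and the group/coalesce/ungroup steps that lead from T1 to RESULT, can be sketched in Python. The sketch assumes closed integer intervals. It also assumes the following readings of the two forms:

* The unfolded form replaces every interval by unit intervals [p, p], one per point.
* The coalesced form merges overlapping or meeting intervals into maximal ones.

These follow the chapter's informal descriptions; the formal definitions are elided in this excerpt. Only the S1 and S2 rows of T1 are used, and the dictionary grouping is a loose stand-in for GROUP and UNGROUP.

```python
from collections import defaultdict

def unfold(intervals):
    """Unfolded form: one unit interval (p, p) for every point p in any input interval."""
    return {(p, p) for s, e in intervals for p in range(s, e + 1)}

def coalesce(intervals):
    """Coalesced form: maximal intervals, none of which overlap or meet each other."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:  # overlaps or meets the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return set(merged)

# The S1 and S2 rows of T1 shown above, with days as integers.
T1 = [("S1", (4, 10)), ("S1", (5, 10)), ("S1", (9, 10)), ("S1", (6, 10)),
      ("S2", (2, 4)), ("S2", (3, 3)), ("S2", (8, 10)), ("S2", (9, 10))]

T2 = defaultdict(set)  # roughly: group DURING values into a set X per supplier
for s, during in T1:
    T2[s].add(during)

# Coalesce each group (as in T3), then ungroup into (S#, DURING) pairs.
RESULT = sorted((s, i) for s, x in T2.items() for i in coalesce(x))
print(RESULT)  # [('S1', (4, 10)), ('S2', (2, 4)), ('S2', (8, 10))]
```

Unfolding first and then coalescing gives the same result as coalescing directly. Both forms describe the same set of points.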
... shorthand for them (footnote 12). To be specific, it seems worth capturing as a single operation the sequence (a) unfold both operands, (b) take the difference, and then (c) coalesce. Here is our proposed further shorthand:

R1 I_MINUS R2 ON A

R1 and R2 are relational expressions denoting relations r1 and r2 of the same type, and A is an attribute of some interval type that is common to those two relations (and the ... stands for "interval," of course). As we have more or less seen already, this expression is defined to be semantically equivalent to the following:

( ( R1 UNFOLD A ) MINUS ( R2 UNFOLD A ) ) COALESCE A

The definitions of possible further "I_" operators, such as I_UNION and ...

Footnote 12: Note that (by contrast) we did not define a special shorthand for temporal projection.

... Decomposition

Even before temporal data was studied (and before SQL was invented, for that matter), some writers argued in favor of decomposing relvars as far as possible, instead of just as far as classical normalization would require. Some of those writers unfortunately damaged their cause by proposing database designs consisting entirely of binary relvars. One ...

S#   DURING
...
S1   [d10, d10]
S2   [d07, d07]
S2   [d08, d08]
S2   [d09, d09]
S2   [d10, d10]
S2   [d02, d02]
S2   [d03, d03]
S2   [d04, d04]
S3   [d03, d03]
...

Given the sample data of Table 5.4, T1 actually contains a total of 23 tuples. (Exercise: Check this claim.)

If we define a "unary relation" version of UNFOLD (analogous to the "unary relation" version of COALESCE), ...
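The I_MINUS shorthand can be approximated in Python by literally following its definition: unfold both operands, take the set difference, then coalesce. The sketch makes two assumptions:

* Each relation is a set of (key, (start, end)) pairs. The key stands for all the non-interval attributes (here just S#), and days are integers.
* The helper names are hypothetical.

The example subtracts the S1 and S2 periods of supplying some part from the S1 and S2 contract periods of Table 5.4. This is in the spirit of Query 3.2.

```python
from collections import defaultdict

def unfold_rel(rel):
    """(key, (s, e)) pairs -> one (key, (p, p)) pair per point."""
    return {(key, (p, p)) for key, (s, e) in rel for p in range(s, e + 1)}

def coalesce_rel(rel):
    """Merge overlapping or meeting intervals that share the same key."""
    by_key = defaultdict(list)
    for key, interval in rel:
        by_key[key].append(interval)
    out = set()
    for key, intervals in by_key.items():
        merged = []
        for s, e in sorted(intervals):
            if merged and s <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        out.update((key, i) for i in merged)
    return out

def i_minus(r1, r2):
    """( ( R1 UNFOLD A ) MINUS ( R2 UNFOLD A ) ) COALESCE A."""
    return coalesce_rel(unfold_rel(r1) - unfold_rel(r2))

# S_DURING {S#, DURING} and SP_DURING {S#, DURING} for S1 and S2 (Table 5.4).
contract = {("S1", (4, 10)), ("S2", (7, 10)), ("S2", (2, 4))}
supplying = {("S1", (4, 10)), ("S1", (5, 10)), ("S1", (9, 10)), ("S1", (6, 10)),
             ("S2", (2, 4)), ("S2", (3, 3)), ("S2", (8, 10)), ("S2", (9, 10))}
print(i_minus(contract, supplying))  # {('S2', (7, 7))}
```

Materializing the unfolded form is simple but costs one tuple per point. An implementation would presumably compute the same result directly on interval endpoints.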
... UPDATE and DELETE problems.

• UPDATE: The UPDATE problem can be addressed by extending the UPDATE operator as suggested by the following example (footnote 15):

UPDATE S_DURING
WHERE S# = S# ('S2')
DURING INTERVAL ( [d09,d09] )
STATUS := 20 ;

The third line here specifies the interval attribute to which the COALESCED specification applies (DURING, in the example), and the ...

... contradiction. In any case, the only variables in a truly relational database are the relation variables constituting that database. Here are some examples of questions arising from the notion of "now" that you might care to ponder over:

• What happens to the interval [now, d14] at midnight on day 14?
• What is the value of END([d04, now]) on day 14? Is it d14 or is it now?

We believe it is hard to give coherent ...

... now move on to Query 3.2. Query 4.2 is a restatement of that query in terms of the database of Table 5.4: ...

Footnote 11: The A operand could be extended to permit ...

... with a single interval.

T2 looks like this:

S#   X
S1   DURING: [d04, d10], [d05, d10], [d09, d10], [d06, d10]
S2   DURING: [d02, d04], [d03, d03], [d08, d10 ...
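One plausible reading of the extended UPDATE shown earlier is this: unfold the affected tuples on DURING, change STATUS only for the points in [d09,d09], then coalesce again. The precise semantics are not given in this excerpt, so that reading is an assumption. The Python sketch below implements it with three further assumptions:

* Days are integers.
* A relation is a set of (non-interval attributes, (start, end)) pairs.
* `update_during` and `coalesce_rel` are hypothetical helpers.

The input is supplier S2's two S_DURING tuples from Table 5.4.

```python
from collections import defaultdict

def coalesce_rel(rel):
    """Merge overlapping or meeting intervals of tuples with identical other attributes."""
    groups = defaultdict(list)
    for attrs, interval in rel:
        groups[attrs].append(interval)
    out = set()
    for attrs, intervals in groups.items():
        merged = []
        for s, e in sorted(intervals):
            if merged and s <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        out.update((attrs, i) for i in merged)
    return out

def update_during(rel, where, window, change):
    """Apply `change` to matching tuples, but only at the points inside `window`."""
    ws, we = window
    unfolded = set()
    for attrs, (s, e) in rel:
        for p in range(s, e + 1):
            hit = where(attrs) and ws <= p <= we
            unfolded.add((change(attrs) if hit else attrs, (p, p)))
    return coalesce_rel(unfolded)

# Supplier S2's tuples in S_DURING: (S#, SNAME, STATUS, CITY) plus DURING.
S_DURING = {(("S2", "Jones", 10, "Paris"), (7, 10)),
            (("S2", "Jones", 10, "Paris"), (2, 4))}

def set_status_20(a):
    return (a[0], a[1], 20, a[3])

result = update_during(S_DURING, lambda a: a[0] == "S2", (9, 9), set_status_20)
for row in sorted(result, key=lambda r: r[1]):
    print(row)
```

Under this reading the S2 period from d07 to d10 splits into three tuples: [d07, d08] with STATUS 10, [d09, d09] with STATUS 20, and [d10, d10] with STATUS 10. The [d02, d04] tuple is left unchanged.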

Posted: 08/08/2014, 18:21
