CHAPTER 13

BETWEEN and OVERLAPS Predicates

The BETWEEN and OVERLAPS predicates both offer a shorthand way of showing that one value lies within a range defined by two other values. The BETWEEN predicate works with scalar range limits; the OVERLAPS predicate looks at two time periods (defined either by start and end points or by a starting time and an INTERVAL) to see if they overlap in time.

13.1 The BETWEEN Predicate

The predicate

    <value expression> [NOT] BETWEEN <low value expression> AND <high value expression>

is a feature of SQL that is used often enough to deserve special attention. It is also just tricky enough to fool beginning programmers. This predicate is actually just shorthand for the expression:

    ((<low value expression> <= <value expression>)
     AND (<value expression> <= <high value expression>))

Please note that the end points are included in this definition. This predicate works with any data types that can be compared. Most programmers miss this fact and use it only for numeric values, but it can be used for character strings and temporal data as well. The <high value expression> and <low value expression> can be expressions or constants, but again, programmers tend to use just constants.

13.1.1 Results with NULL Values

The results of this predicate with NULL values for <value expression>, <low value expression>, or <high value expression> follow directly from the definition. If both <low value expression> and <high value expression> are NULL, the result is UNKNOWN for any value of <value expression>. If <low value expression> or <high value expression> is NULL, but not both of them, the result is determined by the value of <value expression> and its comparison with the remaining non-NULL term. If <value expression> is NULL, the results are UNKNOWN for any values of <low value expression> and <high value expression>.
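The NULL rules above can be checked by running the predicate against a few probe rows. The following is a minimal sketch, assuming a product that accepts a derived table built from a VALUES list; the name Probe and its columns x, lo, and hi are hypothetical and exist only for this illustration.

```sql
-- Hypothetical probe rows: (x, lo, hi) triples covering the NULL cases.
-- The CASE expression turns the three-valued result into a readable label.
SELECT x, lo, hi,
       CASE WHEN x BETWEEN lo AND hi THEN 'TRUE'
            WHEN NOT (x BETWEEN lo AND hi) THEN 'FALSE'
            ELSE 'UNKNOWN' END AS between_result
  FROM (VALUES (13, 12, 15),                          -- TRUE
               (13, CAST(NULL AS INTEGER), 15),       -- UNKNOWN
               (20, CAST(NULL AS INTEGER), 15),       -- FALSE
               (13, 12, CAST(NULL AS INTEGER)),       -- UNKNOWN
               (CAST(NULL AS INTEGER), 12, 15),       -- UNKNOWN
               (13, CAST(NULL AS INTEGER),
                    CAST(NULL AS INTEGER)))           -- UNKNOWN
       AS Probe (x, lo, hi);
```

The third row is the interesting one: with a NULL low limit, the comparison against the remaining high limit (20 <= 15) is FALSE, and FALSE AND UNKNOWN is FALSE, so the whole predicate is FALSE rather than UNKNOWN.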

13.1.2 Results with Empty Sets

Notice that if <high value expression> is less than <low value expression>, the expression will always be FALSE unless the value is NULL; then it is UNKNOWN. That is a bit confusing, since there is no value to which <value expression> could resolve itself that would produce a TRUE result. But this follows directly from expanding the definition:

    x BETWEEN 12 AND 15       depends on the value of x
    x BETWEEN 15 AND 12       FALSE, or UNKNOWN if x is NULL
    x BETWEEN NULL AND 15     FALSE if x > 15; otherwise UNKNOWN
    NULL BETWEEN 12 AND 15    always UNKNOWN
    x BETWEEN 12 AND NULL     FALSE if x < 12; otherwise UNKNOWN
    x BETWEEN x AND x         TRUE, or UNKNOWN if x is NULL

13.1.3 Programming Tips

The BETWEEN range includes the end points, so you have to be careful. Here is an example that deals with changing a percent range on a test into a letter grade:

    Grades
    low_score  high_score  grade
    ============================
        90        100       'A'
        80         90       'B'
        70         80       'C'
        60         70       'D'
         0         60       'F'

However, this will not work when a student gets a grade on the borderlines (90, 80, 70, or 60), because a borderline score falls within two ranges at once; a score of 90, for example, matches both the 'A' row and the 'B' row. One way to solve the problem is to change the table by adding 1 to the low scores. Of course, the student who got 90.1 will argue that he should have gotten an ‘A’ and not a ‘B’. If you add 0.01 to the low scores, the student who got 90.001 will argue that he should have gotten an ‘A’ and not a ‘B’, and so forth. This is a problem with a continuous variable. A better solution might be to change the predicate to

    (score BETWEEN low_score AND high_score) AND (score > low_score)

or simply to

    ((low_score < score) AND (score <= high_score))

Neither approach will be much different in this example, since few values will fall on the borders between grades and this table is very, very small.

As a sidebar, the reader might want to look up an introductory book on fuzzy logic. In that model, an entity can have a degree of membership in a set, rather than being strictly in or out of the set. Some experimental databases use fuzzy logic.
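The second form of the grade predicate can be sketched as a complete lookup. This is only an illustration under stated assumptions: the Grades columns follow the table shown in this section, but the data types, the Scores table, its columns (student_name, score), and the sample students are all hypothetical.

```sql
-- Grade ranges as shown in the text; the data types are an assumption.
CREATE TABLE Grades
(low_score DECIMAL(5,2) NOT NULL,
 high_score DECIMAL(5,2) NOT NULL,
 grade CHAR(1) NOT NULL PRIMARY KEY,
 CHECK (low_score < high_score));

INSERT INTO Grades (low_score, high_score, grade)
VALUES (90, 100, 'A'), (80, 90, 'B'), (70, 80, 'C'),
       (60, 70, 'D'), (0, 60, 'F');

-- Hypothetical table of test results.
CREATE TABLE Scores
(student_name VARCHAR(30) NOT NULL PRIMARY KEY,
 score DECIMAL(5,2) NOT NULL);

INSERT INTO Scores (student_name, score)
VALUES ('Alice', 90.10), ('Bob', 90.00), ('Carol', 59.50);

-- Half-open range: the low end point is excluded, the high one included,
-- so each score can match at most one row of Grades.
SELECT S.student_name, S.score, G.grade
  FROM Scores AS S, Grades AS G
 WHERE G.low_score < S.score
   AND S.score <= G.high_score;
```

With these sample rows, Alice gets an 'A', Bob gets a 'B', and Carol gets an 'F'. Note the consequences of this choice of operators: a score of exactly 90 goes to the lower grade, and a score of exactly 0 matches no row at all. If borderline scores should go to the higher grade instead, the comparisons would become (low_score <= score) AND (score < high_score), which then needs separate handling for a perfect score of 100.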

However, some indexing schemes might make the BETWEEN predicate the better choice for larger tables of this sort. They will keep index values in trees whose nodes hold a range of values (look up a description of the B-Tree family in a computer science book). An optimizer can compare the range of values in the BETWEEN predicate to the range of values in the index nodes as a single action. If the BETWEEN predicate were presented as two comparisons, it might execute them as separate actions against the database, which would be slower.

13.2 OVERLAPS Predicate

The OVERLAPS predicate is a feature not yet available in most SQL implementations, because it requires more of the Standard SQL temporal data features than most implementations have. Many programmers have been faking the functionality of the INTERVAL data type with the existing date and time features of their products.

13.2.1 Time Periods and OVERLAPS Predicate

An INTERVAL is a measure of temporal duration, expressed in units such as days, hours, minutes, and so forth. This is how you add or subtract days to or from a date, hours and minutes to or from a time, and so forth. When INTERVALs are more generally available, you will also have an OVERLAPS predicate, which compares two time periods. These time periods are defined as row values with two columns. The first column (the starting time) of the pair is always a <datetime> data type, and the second column (the termination time) is either a <datetime> data type or an INTERVAL that can be used to compute a <datetime> value. If the starting and termination times are the same, this is an instantaneous event.
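As a small illustration of the datetime arithmetic described above, the following sketch uses Standard SQL literal syntax. Products differ in how much of this syntax they accept, so treat it as an assumption rather than portable code; the derived table OneRow is a placeholder that only supplies a single row to select from.

```sql
-- Adding a duration to a date: 1 February 2005 plus 27 days.
SELECT DATE '2005-02-01' + INTERVAL '27' DAY AS termination_date
  FROM (VALUES (1)) AS OneRow (dummy);

-- The same time period written both ways. In the second form the
-- termination time is computed from the starting time and the INTERVAL.
--   (DATE '2005-02-01', DATE '2005-02-28')
--   (DATE '2005-02-01', INTERVAL '27' DAY)
```

The query yields 28 February 2005, although some products report the result as a timestamp at midnight rather than as a date.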

The result of the <overlaps predicate> is formally defined as the result of the following expression:

    (S1 > S2 AND NOT (S1 >= T2 AND T1 >= T2))
    OR (S2 > S1 AND NOT (S2 >= T1 AND T2 >= T1))
    OR (S1 = S2 AND (T1 <> T2 OR T1 = T2))

In this expression, S1 and S2 are the starting times of the two time periods, and T1 and T2 are their termination times.

The rules for the OVERLAPS predicate should be intuitive, but they are not. The principles that we wanted in the standard were:

1. A time period includes its starting point, but does not include its end point. The reason for this model is that it follows the ISO convention that there is no 24:00 today; midnight is 00:00 tomorrow. Half-open durations have closure properties that are useful. The concatenation of two half-open durations is a half-open duration.

2. If the time periods are not instantaneous, they overlap when they share a common time period.

3. If the first term of the predicate is a time period and the second term is an instantaneous event (a <datetime> data type), they overlap when the second term is in the time period (but is not the end point of the time period).

4. If the first and second terms are both instantaneous events, they overlap only when they are equal.

5. If the starting time is NULL and the finishing time is a <datetime> value, the finishing time becomes the starting time and we have an event. If the starting time is NULL and the finishing time is an INTERVAL value, then both the finishing and starting times are NULL.

Please consider how your intuition reacts to these results, when the granularity is at the YEAR-MONTH-DAY level. Remember that a day begins at 00:00.

    (today, today) OVERLAPS (today, today)           is TRUE
    (today, tomorrow) OVERLAPS (today, today)        is TRUE
    (today, tomorrow) OVERLAPS (tomorrow, tomorrow)  is FALSE
    (yesterday, today) OVERLAPS (today, tomorrow)    is FALSE

Since the OVERLAPS predicate is not yet common in SQL products, let’s see what we have to do to handle overlapping times. Consider a table of hotel guests with the days of their stays and a table of special events being held at the hotel. The tables might look like this:

    CREATE TABLE Guests
    (guest_name CHARACTER(30) NOT NULL PRIMARY KEY,
     arrival_date DATE NOT NULL,
     depart_date DATE NOT NULL);

    Guests
    guest_name       arrival_date  depart_date
    ==========================================
    'Dorothy Gale'   '2005-02-01'  '2005-11-01'
    'Indiana Jones'  '2005-02-01'  '2005-02-01'
    'Don Quixote'    '2005-01-01'  '2005-10-01'
    'James T. Kirk'  '2005-02-01'  '2005-02-28'
    'Santa Claus'    '2005-12-01'  '2005-12-25'

    CREATE TABLE Celebrations
    (celeb_name CHARACTER(30) NOT NULL PRIMARY KEY,
     start_date DATE NOT NULL,
     finish_date DATE NOT NULL);

    Celebrations
    celeb_name            start_date    finish_date
    ===============================================
    'Apple Month'         '2005-02-01'  '2005-02-28'
    'Christmas Season'    '2005-12-01'  '2005-12-25'
    'Garlic Festival'     '2005-01-15'  '2005-02-15'
    'National Pear Week'  '2005-01-01'  '2005-01-07'
    'New Year's Day'      '2005-01-01'  '2005-01-01'
    'St. Fred's Day'      '2005-02-24'  '2005-02-24'
    'Year of the Prune'   '2005-01-01'  '2005-12-31'

The BETWEEN operator will work just fine with single dates that fall between the starting and finishing dates of these celebrations, but please remember that the BETWEEN predicate will include the end point of an interval, and the OVERLAPS predicate will not.
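To make the end-point difference concrete, the sketch below evaluates both predicates for the last day of 'Apple Month'. It assumes a product that implements OVERLAPS with the Standard semantics described earlier in this section; the derived table OneRow is a placeholder that only supplies a single row.

```sql
-- 28 February 2005 is the finish date of 'Apple Month'.
-- BETWEEN includes the end point; OVERLAPS treats the period as
-- half-open, so an instantaneous event at the end point does not overlap.
SELECT CASE WHEN DATE '2005-02-28'
                 BETWEEN DATE '2005-02-01' AND DATE '2005-02-28'
            THEN 'TRUE' ELSE 'FALSE' END AS between_result,   -- 'TRUE'
       CASE WHEN (DATE '2005-02-01', DATE '2005-02-28')
                 OVERLAPS (DATE '2005-02-28', DATE '2005-02-28')
            THEN 'TRUE' ELSE 'FALSE' END AS overlaps_result   -- 'FALSE'
  FROM (VALUES (1)) AS OneRow (dummy);
```

The OVERLAPS result follows from the formal definition: the second starting time is later than the first, but it is not earlier than the first termination time, so none of the three branches of the expression is TRUE.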

To find out if a particular date occurs during an event, you can simply write queries like:

    SELECT guest_name, ' arrived during ', celeb_name
      FROM Guests, Celebrations
     WHERE arrival_date BETWEEN start_date AND finish_date
       AND arrival_date <> finish_date;

This query will find the guests who arrived at the hotel during each event. The final predicate can be kept, if you want to conform to the ANSI convention, or dropped, if that makes more sense in your situation. From now on, we will keep both end points to make the queries easier to read.

    SELECT guest_name, ' arrived during ', celeb_name
      FROM Guests, Celebrations
     WHERE arrival_date BETWEEN start_date AND finish_date;

    Results
    guest_name       " arrived during "  celeb_name
    ===================================================
    'Dorothy Gale'   'arrived during'    'Apple Month'
    'Dorothy Gale'   'arrived during'    'Garlic Festival'
    'Dorothy Gale'   'arrived during'    'Year of the Prune'
    'Indiana Jones'  'arrived during'    'Apple Month'
    'Indiana Jones'  'arrived during'    'Garlic Festival'
    'Indiana Jones'  'arrived during'    'Year of the Prune'
    'Don Quixote'    'arrived during'    'National Pear Week'
    'Don Quixote'    'arrived during'    'New Year's Day'
    'Don Quixote'    'arrived during'    'Year of the Prune'
    'James T. Kirk'  'arrived during'    'Apple Month'
    'James T. Kirk'  'arrived during'    'Garlic Festival'
    'James T. Kirk'  'arrived during'    'Year of the Prune'
    'Santa Claus'    'arrived during'    'Christmas Season'
    'Santa Claus'    'arrived during'    'Year of the Prune'

The obvious question is which guests were at the hotel during each event.
A common programming error when trying to find out if two intervals overlap is to write the query with the BETWEEN predicate, thus:

    SELECT guest_name, ' was here during ', celeb_name
      FROM Guests, Celebrations
     WHERE arrival_date BETWEEN start_date AND finish_date
        OR depart_date BETWEEN start_date AND finish_date;

This is wrong, because it does not cover the case where the event began and finished during the guest’s visit. Seeing his error, the programmer will sit down and draw a timeline diagram of all four possible overlapping cases, as shown in Figure 13.1.

    Figure 13.1 Timeline Diagram of All Possible Overlapping Cases.

So the programmer adds more predicates, thus:

    SELECT guest_name, ' was here during ', celeb_name
      FROM Guests, Celebrations
     WHERE arrival_date BETWEEN start_date AND finish_date
        OR depart_date BETWEEN start_date AND finish_date
        OR start_date BETWEEN arrival_date AND depart_date
        OR finish_date BETWEEN arrival_date AND depart_date;

A thoughtful programmer will notice that the last predicate is not needed and might drop it, but either way, this is a correct query. But it is not the best answer. In the case of the overlapping intervals, there are two cases where a guest’s stay at the hotel and an event do not both fall within the same time frame: either the guest checked out before the event started, or the event ended before the guest arrived. If you want to do the logic, that is what the first predicate will work out to be when you also add the conditions that arrival_date <= depart_date and start_date <= finish_date.
But it is easier to see in a timeline diagram, thus:

    Figure 13.2 Timeline Diagram.

Both cases can be represented in one SQL statement as:

    SELECT guest_name, celeb_name
      FROM Guests, Celebrations
     WHERE NOT ((depart_date < start_date)
                OR (arrival_date > finish_date));

    VIEW GuestCelebrations
    guest_name       celeb_name
    ====================================
    'Dorothy Gale'   'Apple Month'
    'Dorothy Gale'   'Garlic Festival'
    'Dorothy Gale'   'St. Fred's Day'
    'Dorothy Gale'   'Year of the Prune'
    'Indiana Jones'  'Apple Month'
    'Indiana Jones'  'Garlic Festival'
    'Indiana Jones'  'Year of the Prune'
    'Don Quixote'    'Apple Month'
    'Don Quixote'    'Garlic Festival'
    'Don Quixote'    'National Pear Week'
    'Don Quixote'    'New Year's Day'
    'Don Quixote'    'St. Fred's Day'
    'Don Quixote'    'Year of the Prune'
    'James T. Kirk'  'Apple Month'
    'James T. Kirk'  'Garlic Festival'
    'James T. Kirk'  'St. Fred's Day'
    'James T. Kirk'  'Year of the Prune'
    'Santa Claus'    'Christmas Season'
    'Santa Claus'    'Year of the Prune'

This VIEW is handy for other queries. The reason for using the NOT in the WHERE clause is so that you can add or remove it to reverse the sense of the query. For example, to find out how many celebrations each guest could have seen, you would write:

    CREATE VIEW GuestCelebrations (guest_name, celeb_name)
    AS SELECT guest_name, celeb_name
         FROM Guests, Celebrations
        WHERE NOT ((depart_date < start_date)
                   OR (arrival_date > finish_date));

    SELECT guest_name, COUNT(*) AS celebcount
      FROM GuestCelebrations
     GROUP BY guest_name;

    Results
    guest_name       celebcount
    ===========================
    'Dorothy Gale'       4
    'Indiana Jones'      3
    'Don Quixote'        6
    'James T. Kirk'      4
    'Santa Claus'        2

Then, to find out how many guests were at the hotel during each celebration, you would write:

    SELECT celeb_name, COUNT(*) AS guestcount
      FROM GuestCelebrations
     GROUP BY celeb_name;

    Result
    celeb_name            guestcount
    ================================
    'Apple Month'             4
    'Christmas Season'        1
    'Garlic Festival'         4
    'National Pear Week'      1
    'New Year's Day'          1
    'St. Fred's Day'          3
    'Year of the Prune'       5

This last query is only part of the story. What the hotel management really wants to know is how many room nights were sold for a