662 CHAPTER 29: TEMPORAL QUERIES

There is a technical difference between a Julian date and a Julianized date. A Julian date is an astronomer’s term that counts the number of days since January 1, 4713 B.C.E. This count is now well over 2 million; only astronomers use it. However, computer companies have corrupted the term to mean a count from some fixed point in time from which they can build a date or time. The fixed point is usually the year 1, or 1900, or the start of the Gregorian calendar.

A Julianized, or ordinal, date is the position of the date within its year, so it falls between 1 and 365 or 366. You will see this number printed on the bottom edges of desk calendar pages. The usual way to find the Julianized day within the current year is to use a simple program that stores the number of days in each month as an array and sums the days in the preceding months with the day of the month for the date in question. The only difficult part is remembering to add one if the year is a leap year and the month is after February.

Here is a very fast and compact algorithm that computes the Julian date from a Gregorian date and vice versa. These algorithms appeared as Algorithm 199 (ACM 1980) and were first written in ALGOL by Robert Tantzen. Here are SQL translations of the code:

    CREATE FUNCTION Julianize1
    (greg_day INTEGER, greg_month INTEGER, greg_year INTEGER)
    RETURNS INTEGER
    LANGUAGE SQL
    BEGIN
    DECLARE century INTEGER;
    DECLARE yearincentury INTEGER;
    -- renumber the months so that the year starts with March
    IF (greg_month > 2)
    THEN SET greg_month = greg_month - 3;
    ELSE SET greg_month = greg_month + 9;
         SET greg_year = greg_year - 1;
    END IF;
    SET century = greg_year/100;
    SET yearincentury = greg_year - 100 * century;
    RETURN ((146097 * century)/4
          + (1461 * yearincentury)/4
          + (153 * greg_month + 2)/5
          + greg_day + 1721119);
    END;

Remember that the division will be integer division, because the variables involved are all integers.

Here is a Pascal procedure taken from Numerical Recipes in Pascal (Press et al.
1990) for converting a Gregorian date to a Julian date. First, you need to know the difference between TRUNCATE() and FLOOR(). The FLOOR() function is also called the greatest integer function; it returns the greatest integer less than or equal to its argument. The TRUNCATE() function returns the integer part of a number. Thus, they behave differently with negative decimals.

    FLOOR(-2.5) = -3      TRUNCATE(-2.5) = -2
    FLOOR(-2)   = -2      TRUNCATE(-2)   = -2
    FLOOR(2.5)  =  2      TRUNCATE(2.5)  =  2
    FLOOR(2)    =  2      TRUNCATE(2)    =  2

Here is an SQL/PSM version of the algorithm.

    CREATE FUNCTION Julianize
    (IN greg_year INTEGER, IN greg_month INTEGER, IN greg_day INTEGER)
    RETURNS INTEGER
    BEGIN
    DECLARE gregorian INTEGER;
    DECLARE jul_year INTEGER;   -- working copy of the year
    DECLARE jul_month INTEGER;  -- working copy of the month
    DECLARE jul_day INTEGER;    -- the result
    DECLARE jul_leap INTEGER;
    SET gregorian = 588829;  -- 15 + 31 * (10 + 12 * 1582)
    IF greg_year = 0  -- error: there is no year zero
    THEN SIGNAL SQLSTATE 'no year zero';  -- not an actual SQLSTATE code!
    END IF;
    SET jul_year = greg_year;
    IF jul_year < 0
    THEN SET jul_year = jul_year + 1;
    END IF;
    IF greg_month > 2
    THEN SET jul_month = greg_month + 1;
    ELSE SET jul_year = jul_year - 1;
         SET jul_month = greg_month + 13;
    END IF;
    SET jul_day = TRUNCATE(365.25 * jul_year)
                + TRUNCATE(30.6001 * jul_month)
                + greg_day + 1720995;
    -- the Gregorian calendar test uses the original arguments
    IF (greg_day + 31 * (greg_month + 12 * greg_year) >= gregorian)
    THEN SET jul_leap = TRUNCATE(jul_year * 0.01);
         SET jul_day = jul_day + 2 - jul_leap + TRUNCATE(0.25 * jul_leap);
    END IF;
    RETURN jul_day;
    END;

This algorithm to convert a Julian day number into a Gregorian calendar date is due to Peter Meyer. You need to assume that you have FLOOR() and TRUNCATE() functions.
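Since the conversion routines depend on FLOOR() and TRUNCATE() behaving as described, it can help to reproduce the table of values outside the database before trusting a product's versions of these functions. The following is only a sketch in Python, not SQL; the helper names sql_floor and sql_truncate are made up for illustration and stand in for whatever the SQL product provides.

```python
import math


def sql_floor(x: float) -> int:
    """Greatest integer less than or equal to x (hypothetical stand-in for FLOOR)."""
    return math.floor(x)


def sql_truncate(x: float) -> int:
    """Integer part of x, dropping the fraction toward zero (stand-in for TRUNCATE)."""
    return math.trunc(x)


if __name__ == "__main__":
    # Print the same four sample values used in the text.
    for value in (-2.5, -2, 2.5, 2):
        print(value, sql_floor(value), sql_truncate(value))
```

The two functions agree for every non-negative argument and for whole numbers; they differ only for negative values with a fractional part, which is exactly the case that arises when a year before the Common Era is passed in.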
    CREATE PROCEDURE JulDate
    (IN julian INTEGER,
     OUT greg_year INTEGER, OUT greg_month INTEGER, OUT greg_day INTEGER)
    LANGUAGE SQL
    DETERMINISTIC
    BEGIN
    DECLARE z INTEGER;
    DECLARE r DECIMAL(3,2);   -- fraction of a day; 0.5 for a whole-number input
    DECLARE g DECIMAL(12,2);
    DECLARE a INTEGER;
    DECLARE b INTEGER;
    DECLARE c INTEGER;
    SET z = FLOOR(julian - 1721118.5);
    SET r = julian - 1721118.5 - z;
    SET g = z - 0.25;
    SET a = FLOOR(g/36524.25);
    SET b = a - FLOOR(a/4.0);
    SET greg_year = FLOOR((b + g)/365.25);
    SET c = b + z - FLOOR(365.25 * greg_year);
    SET greg_month = TRUNCATE((5 * c + 456)/153);
    -- drop the fraction of a day so the result is a whole day number
    SET greg_day = TRUNCATE(c - TRUNCATE((153 * greg_month - 457)/5) + r);
    IF greg_month > 12
    THEN SET greg_year = greg_year + 1;
         SET greg_month = greg_month - 12;
    END IF;
    END;

There are two problems with these algorithms. First, the Julian day the astronomers use starts at noon. If you think about it, it makes sense because they are doing their work at night. The second problem is that the integers involved get large, and you cannot use floating-point numbers to replace them because the rounding errors are too great. You need long integers that can go to 2.5 million.

29.5 Date and Time Extraction Functions

No two SQL products agree on the functions that should be available for use with <datetime> data types. In keeping with the practice of overloading functions, the SQL3 proposal has a function for extracting components from a datetime or interval value. The syntax looks like this:

    <extract expression> ::=
        EXTRACT (<extract field> FROM <extract source>)

    <extract field> ::= <datetime field> | <time zone field>

    <time zone field> ::= TIMEZONE_HOUR | TIMEZONE_MINUTE

    <extract source> ::=
        <datetime value expression> | <interval value expression>

The interesting feature is that this function always returns a numeric value. For example, EXTRACT (MONTH FROM birthday) will be an INTEGER between 1 and 12.
Vendors might also provide separate functions, such as YEAR(<date>), MONTH(<date>), and DAY(<date>), that extract components from a <datetime> data type. Most versions of SQL also have a library function something like MAKEDATE(<year>, <month>, <day>), DATE(<year>, <month>, <day>), or an equivalent, which will construct a date from three numbers representing a year, month, and day. Standard SQL uses the CAST function, but the details are not pretty, since it involves assembling a string in the ISO format and then converting it to a date.

Bill Karwin came up with a fairly portable trick for doing extraction in SQL products that do not have this library function. Use the LIKE predicate and CAST() operator (or whatever the product uses for formatting output) to convert the DATE expressions into character string expressions and test them against a template. For example, to find all the rows of data in the month of March:

    SELECT *
      FROM Table1
     WHERE CAST(datefield AS CHAR(10)) LIKE '%MAR%';

The template has to match the way the product displays dates; a product that casts dates to the ISO format would need the pattern '____-03-__' instead. Obviously, this technique can be extended to use other string functions to search for parts of a date or time, to look for ranges of dates, and so forth. The best advice is to read your SQL product manual and see what you can do with its library functions.

29.6 Other Temporal Functions

Another common set of functions, which are not represented in Standard SQL, deals with weeks. For example, Sybase’s SQL Anywhere (née WATCOM SQL) has a DOW(<date>) function that returns a number between 1 and 7 to represent the day of the week (1 = Sunday, 2 = Monday, . . ., 7 = Saturday). Be aware that ISO 8601 numbers the days differently, from 1 = Monday to 7 = Sunday. You can also find functions that add or subtract weeks from a date, give the number of the date within the year, and so on.
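Because day-of-week functions differ in where they start counting, it is worth checking a product's numbering against a known date. The sketch below is Python rather than SQL and only illustrates the two conventions; dow_sunday_first is a hypothetical helper name, and the ISO 8601 numbering (1 = Monday through 7 = Sunday) comes from Python's own date.isoweekday().

```python
from datetime import date


def dow_sunday_first(d: date) -> int:
    """Day of the week with 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # isoweekday() returns 1 = Monday ... 7 = Sunday, so rotate by one day.
    return d.isoweekday() % 7 + 1


if __name__ == "__main__":
    sample = date(2003, 2, 6)          # a Thursday
    print(sample.isoweekday())         # ISO 8601 numbering
    print(dow_sunday_first(sample))    # Sunday-first numbering
```

The same date therefore yields two different numbers depending on the convention, which is why a stored day number is only meaningful alongside the rule that produced it.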
The function for finding the day of the week for a date is called Zeller’s algorithm:

    CREATE FUNCTION Zeller
    (IN z_year INTEGER, IN z_month INTEGER, IN z_day INTEGER)
    RETURNS INTEGER
    LANGUAGE SQL
    DETERMINISTIC
    BEGIN
    DECLARE m INTEGER;  -- month number, counting March as 1
    DECLARE y INTEGER;  -- year, counted from March
    DECLARE c INTEGER;  -- century
    DECLARE d INTEGER;  -- year within the century
    SET y = z_year;
    SET m = z_month - 2;
    IF (m <= 0)
    THEN SET m = m + 12;
         SET y = y - 1;
    END IF;
    SET c = y/100;
    SET d = MOD(y, 100);
    -- 5 * c is congruent to -2 * c modulo 7 and keeps the sum positive;
    -- the result is 1 = Sunday through 7 = Saturday
    RETURN (MOD(z_day + (13 * m - 1)/5 + d + d/4 + c/4 + 5 * c, 7) + 1);
    END;

DB2 and other SQLs have an AGE(<date1>, <date2>) function, which returns the difference in years between <date1> and <date2>.

The table in Section 9.1 gives a summary of the type conversions involving <datetimes> and <intervals> in Standard SQL. Arithmetic operations involving <datetimes> or <intervals> obey the natural rules associated with dates and times and yield valid <datetime> or <interval> results according to the Common Era calendar.

Operations involving items of type <datetime> require that the <datetime> items be mutually comparable. Operations involving intervals require that the <interval> items be mutually comparable.

Operations involving a <datetime> and an <interval> preserve the time zone of the <datetime> operand. If the <datetime> operand does not include a time zone part, then the local time zone is used.

The OVERLAPS predicate determines whether two chronological periods overlap in time (see Section 13.2 for details). A chronological period is specified either as a pair of <datetimes> (starting and ending) or as a starting <datetime> and an <interval>.

EXTRACT (<temporal unit> FROM <temporal expression>) takes a <datetime> or an <interval> and returns an exact numeric value representing the value of one component of the <datetime> or <interval>.

29.7 Weeks

Weeks are not part of the SQL temporal functions, but they are part of the ISO 8601 standard.
While it is not as common in the United States as it is in Europe, many commercial and industrial applications use the week within a year as a unit of time. Week 01 of a year is defined as the first week that has a Thursday in that year, which is equivalent to the week that contains the fourth day of January. In other words, the first week of a new year is the week that has the majority of its days in the new year. Week 01 might also contain days from the previous year, so it does not align with the years.

The standard notation uses the letter ‘W’ to announce that the following two digits are a week number. The week number component of the vector can be separated with a hyphen or not, as required by space.

    1999-W01 or 1999W01

A single digit between 1 and 7 can extend this notation for the day of the week. For example, the day 1996-12-31, which is Tuesday (day 2) of the first week of 1997, can be shown as:

    1997-W01-2 or 1997W012

The ISO standard avoids explicitly stating the possible range of week numbers, but a little thought will show that the range is between 01 and 52 or between 01 and 53, depending on the particular year. There is one exception to the rule that a year has at least 52 weeks: 1752, when the Gregorian calendar was introduced in Britain and its colonies, had fewer than 365 days and therefore fewer than 52 weeks.

SQL Server programmers have to be very careful, because their product does not follow ISO Standards for numbering the weeks in its function library. Furthermore, it is not easy to see how to calculate the weeks between two different dates. Here is an example from Rudy Limeback (SQL Consultant, r937.com) taken from http://searchdatabase.techtarget.com/dateQuestionNResponse/0,289625,sid13_cid517627_tax285649,00.html:

Suppose we have a beginning date of ‘2003-02-06’ and an end date of ‘2003-02-19’. I would like to see the weeks as two because the 17th is not a Tuesday.
There are a number of ways to approach this problem, and the solution depends on what the meaning of the word “week” is. Here is the calendar for that month, just in case you cannot figure it out in your head.

    Su Mo Tu We Th Fr Sa
                       1
     2  3  4  5  6  7  8
     9 10 11 12 13 14 15
    16 17 18 19 20 21 22
    23 24 25 26 27 28

In this example, we want the number of weeks between February 6th and 19th.

First method: One. One week after the 6th is the 13th. Another week is the 20th. Since we are only as far as the 19th, it is not two weeks yet.

Second method: Two. The number of days is 14, if we count both the 6th and the 19th at the beginning and end of the specified range. Since there are seven days in a week, 14/7 = 2.

Third method: One. We should not count both the beginning and end days. We do not do it for years, for example. How many years are between 1999 and 2007? Most people would say eight, not nine, and they do this by subtracting the earlier from the later. So using days, 19 − 6 = 13. Then 13/7 = 1.857142. . . which truncates to one.

Fourth method: Two. We want a whole number of weeks, so it is okay to round 1.857142 up to 2.

Fifth method: One. Did you mean whole weeks? There’s only one whole week in that date range, and it is the week from the 9th to the 15th. In fact, if the starting date were the 3rd and the ending date the 21st, that would be 18 (or 19) days, and there’s still only one whole week in there.

Sixth method: Three. February 6th is in week 6 of 2003. February 19th is in week 8. Between them are several days from each of three different weeks.

Seventh method: Two. February 6th is in week 6 of 2003. February 19th is in week 8. Subtract the week numbers to get 2.

This is why Standard SQL prefers to deal with days, a nice unit of time that does not have fractional parts.

29.7.1 Sorting by Weekday Names

This trick is due to Craig S. Mullins.
There is a table with a column containing the names of the days of the week on which an event happened, like this:

    CREATE TABLE Foobar
    ( day_name CHAR(3) NOT NULL
        CHECK (day_name IN ('SUN', 'MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT')),
      ...);

How do we sort it properly? We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. Well, if we write the first query that comes to mind, the results will obviously be sorted improperly:

    SELECT day_name, col1, col2
      FROM Foobar
     ORDER BY day_name;

The results from this query would be ordered alphabetically; in other words:

    FRI
    MON
    SAT
    SUN
    THU
    TUE
    WED

Of course, one solution would be to design the table with a numeric column that uses Zeller’s number. There is another solution that is both elegant and does not require any change to the database.

    SELECT day_name, col1, col2,
           POSITION (day_name IN 'SUNMONTUEWEDTHUFRISAT') AS day_nbr
      FROM Foobar
     ORDER BY day_nbr;

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value, given only the day’s name.

    CAST (POSITION (day_name IN 'SUNMONTUEWEDTHUFRISAT')/3 AS INTEGER) + 1;

Obviously the same trick can be used with the three-letter month abbreviations. This was very handy in the first release of ACCESS, which sorted dates alphabetically.

29.8 Modeling Time in Tables

Since the nature of time is a continuum, and the ISO model is half-open intervals, the best approach is to have (start_time, end_time) pairs for each event in a history. This is a state transition model of data, where the facts represented by the columns in that row were true for the time period given. For this to work, we need the constraint that the (start_time, end_time) pairs do not overlap.

A NULL ending time is the flag for an “unfinished fact,” such as a hotel room stay that is still in progress.
A history for an entity can clearly have at most one NULL at a time.

    CREATE TABLE FoobarHistory
    (foo_key INTEGER NOT NULL,
     start_date DATE DEFAULT CURRENT_DATE NOT NULL,
     PRIMARY KEY (foo_key, start_date),
     end_date TIMESTAMP,  -- null means current
     foo_status INTEGER NOT NULL,
     CONSTRAINT started_before_ended
       CHECK (start_date < end_date),
     CONSTRAINT end_time_open_interval
       CHECK (end_date = CAST(end_date AS DATE)
                         + INTERVAL '23:59:59.999' HOUR TO SECOND),
     CONSTRAINT no_date_overlaps
       CHECK (NOT EXISTS
              (SELECT *
                 FROM FoobarHistory AS H1, Calendar AS C1
                WHERE C1.cal_date BETWEEN H1.start_date AND H1.end_date
                GROUP BY H1.foo_key, C1.cal_date
               HAVING COUNT(*) > 1)),
     CONSTRAINT only_one_current_status
       CHECK (NOT EXISTS
              (SELECT *
                 FROM FoobarHistory AS H1
                WHERE H1.end_date IS NULL
                GROUP BY foo_key
               HAVING COUNT(*) > 1))
    );

The Calendar table is explained in a following section. Table level CHECK() constraints are still not common in SQL implementations, so you might have to use a TRIGGER to enforce integrity.
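When the rules have to be enforced outside a table-level CHECK() constraint, it helps to state them in procedural terms first. The following is a minimal sketch in Python, not SQL, of the three rules for the history of a single entity; it assumes half-open periods (the end is not part of the period) and uses None where the table uses NULL. The name find_violations and the tuple layout are made up for illustration.

```python
from datetime import date
from typing import List, Optional, Tuple

# (start, end) for one row of history; end is None while the fact is current.
Period = Tuple[date, Optional[date]]


def find_violations(periods: List[Period]) -> List[str]:
    """Return a description of each rule broken by one entity's history."""
    problems = []
    # Rule 1: a finished period must start before it ends.
    for start, end in periods:
        if end is not None and not start < end:
            problems.append(f"{start} does not precede {end}")
    # Rule 2: at most one row may be open-ended (current).
    if sum(1 for _, end in periods if end is None) > 1:
        problems.append("more than one current row")
    # Rule 3: after sorting by start, each period must end
    # no later than the start of the next one.
    ordered = sorted(periods, key=lambda p: p[0])
    for (s1, e1), (s2, _) in zip(ordered, ordered[1:]):
        if e1 is None or s2 < e1:
            problems.append(f"period starting {s1} overlaps period starting {s2}")
    return problems


if __name__ == "__main__":
    history = [(date(2003, 1, 1), date(2003, 2, 1)), (date(2003, 2, 1), None)]
    print(find_violations(history))
```

Comparing only neighboring rows after sorting is enough to detect whether any overlap exists, which is the same question a trigger would have to answer for each inserted or updated row.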