SQL Performance Tuning: Dates in SQL
By Peter Gulutzan,Trudy Pelzer
Date: Feb 14, 2003
Article is provided courtesy of Addison Wesley.
Using dates in database-management systems (DBMSs) brings up some portability problems and uncovers some undocumented and underexplained features. Peter Gulutzan and Trudy Pelzer explore the DATE data type and its impact on eight SQL DBMSs.
During research for our book SQL Performance Tuning (Addison Wesley Professional, 2002), we found that using dates brings up some portability problems, some undocumented features, and some areas that cry out for more explanation. This article shows how the DATE data type works with eight SQL database-management systems: IBM DB2, Informix, Ingres, InterBase, Microsoft SQL Server, MySQL, Oracle, and Sybase (we'll call these "the Big Eight DBMSs").
Our concern is what you can do with dates, so we'll leave aside information on how to format a date and other locale-specific trivia. We'll use extended dates such as January 4 1970 throughout, except when we're showing a value in ANSI/ISO SQL Standard form (such as 1970-01-04).
Range of Values
The next chart shows the minimum and maximum date values according to the ANSI/ISO SQL Standard and the Big Eight. (Note: MySQL's minimum is really 0001-01-01; there's an error in its documentation.)
|
Minimum Value |
Maximum Value |
|
|
ANSI SQL |
0001-01-01 |
9999-12-31 |
|
IBM |
0001-01-01 |
9999-12-31 |
|
Informix |
0001-01-01 |
9999-12-31 |
|
Ingres |
0001-01-01 |
9999-12-31 |
|
InterBase |
0001-01-01 |
9999-12-31 |
|
Microsoft |
1753-01-01 |
9999-12-31 |
|
MySQL |
1000-01-01 |
9999-12-31 |
|
Oracle |
4713-01-01 BC |
9999-12-31 |
|
Sybase |
1753-01-01 |
9999-12-31 |
Generally, the range of values is between the year 1 and the year 9999, but what's interesting is the odd-looking minimum date values for Microsoft, Oracle, and Sybase.
The explanation of Microsoft and Sybase's minimum date is that they're trying to avoid the questions that arise with date arithmetic that uses the Julian calendar rules (which we'll discuss in the next section, "The Western Calendar").
Oracle's minimum date can be explained by the fact that it's allowing for all the Julian Day Numbers (see the section "Julian Day Numbers," below). Incidentally, we are very sure that the minimum value is 4713 BCwe know that the Oracle manual says 4712 BC, but the manual is wrong.
The Western Calendar
During the period between 1 A.D. and today, the Western world has actually used two main calendars: the Julian calendar of Julius Caesar and the Gregorian calendar of Pope Gregory XIII. The two calendars differ with respect to only one rule: the rule for deciding what a leap year is. In the Julian calendar, all years divisible by four are leap years. In the Gregorian calendar, all years divisible by four are leap years, except that years divisible by 100 (but not divisible by 400) are not leap years. Thus, the years 1700, 1800, and 1900 are leap years in the Julian calendar but not in the Gregorian calendar, while the years 1600 and 2000 are leap years in both calendars.
When Pope Gregory XIII introduced his calendar in 1582, he also directed that the days between October 4, 1582, and October 15, 1582, should be skippedthat is, he said that the day after October 4 should be October 15. Many countries delayed changing over, though. England and her colonies didn't switch from Julian to Gregorian reckoning until 1752, so for them, the skipped dates were between September 4 and September 14, 1752. Other countries switched at other times, but 1582 and 1752 are the relevant dates for the DBMSs that we're discussing.
Thus, two problems arise with date arithmetic when one goes back many years. The first is, should leap years before the switch be calculated according to the Julian or the Gregorian rules? The second problem is, when and how should the skipped days be handled?
This is how the Big Eight DBMSs handle these questions:
Pretend there was no switch. This is what the SQL Standard seems to require, although the standard document is unclear: It just says that dates are "constrained by the natural rules for dates using the Gregorian calendar"whatever "natural rules" are. This is the option that DB2 chose. When there is a pretence that a single calendar's rules have always applied even to times when nobody heard of the calendar, the technical term is that a "proleptic" calendar is in force. So, for example, we could say that DB2 follows a proleptic Gregorian calendar.
Avoid the problem entirely. Microsoft and Sybase set their minimum date values at January 1, 1753, safely past the time that America switched calendars. This is defendable, but from time to time complaints surface that these two DBMSs lack a useful functionality that the other DBMSs have and that the SQL Standard requires.
Pick 1582. This is what Oracle did. An Oracle user would find that the date-arithmetic expression October 15 1582 minus October 4 1582 yields a value of 1 day (because October 514 don't exist) and that the date February 29 1300 is valid (because the Julian leap-year rule applies). Why did Oracle go to extra trouble when the SQL Standard doesn't seem to require it? The answer is that users might require it. Historians and astronomers use this hybrid system instead of a proleptic Gregorian calendar. (This is also the default option that Sun picked when implementing the GregorianCalendar class for Javadespite the name, GregorianCalendar is a hybrid calendar.)
Julian Day Numbers
There is a simpler way to measure the passage of days. Based on a date that Joseph Scaliger suggested in 1583, astronomers use a continuous count of days since a fixed start point that is well before recorded history. Now, because many documentation sources (including most vendors' manuals) have this detail wrong, we're going to use a bit of emphasis here:
Julian Day 0 is the same as January 1 4713 B.C. on the proleptic Julian calendar.
In most DBMSs, there is a function for getting the Julian Day Number given a DATE value. Unfortunately, for a few DBMSs, there is a tendency to use the word Julian when giving the ordinal day number within a yearfor example, they say that any January 1 date is Julian Day 1.
Naturally, different DBMSs return different values for a Julian-day function, depending on the way that they handle the Julian/Gregorian switch.
Day of the Week
The majority of DBMSs have a function that returns the day of the week as a number. There is some dispute whether the first day is Sunday or Monday.
The arguments for Sunday are as follows:
The ODBC spec for the DAYOFWEEK function says that Sunday is Day 1 and Saturday is Day 7.
Jews and Christians agree that the first Sabbaththe "seventh day" on which God restedwas Saturday.
The German word for Wednesday, Mittwoch, means midweek.
The arguments for Monday are these:
The ISO 8601 standard says that Monday is Day 1 and Sunday is Day 7.
Other bodies such as the BSI follow the ISO in this respect.
When no built-in function is available, you can use SQL's MOD function with the Julian Day Number as an argument.
Week of the Year
The Big Eight DBMSs have five different ways to calculate a week number:
The first day of the first week is January 1. This has the advantage of avoiding weeks that are split across years, but it doesn't correspond to the "first day of the week" possibilities that we've just discussed. Supporters include Oracle.
Weeks start on Monday, and the first week of the year is the first week that has a Thursday in it. This has the advantage that all weeks have seven days. On the other hand, this can mean that days between January 1 and January 3 are counted as being part of a week in the previous year. Supporters include ISO, DB2, Ingres, and Oracle.
Weeks start on Sunday, and the first week starts wherever the first Sunday is. This has the same advantages and disadvantages as the previous method.
Weeks start on Monday, and the first week starts wherever the first Monday is. Any days before the first Monday are Week 0. This has nearly the same advantages and disadvantages as methods two and three. Supporters include Ingres.
Weeks start on Sunday, and the first week (Week 1) is the Sunday that begins the week of January 1. Values are from 1 to 54. Supporters include DB2.
Option 2 is what the ISO 8601 standard requires. DB2, Ingres, and Oracle each have two "week" functions: one for a proprietary calculation, and one for the option 2 calculation. ODBC requires a WEEK function but is vague about the requirements.
Subtracting and Adding
All DBMSs allow the subtraction of a date from a date. By default, the result is an integer representing the number of result days. For example, March 1 2000 minus February 28 2000 returns 2.
The SQL Standard actually demands more complexity. It defines two new data types: year-month INTERVAL and day INTERVAL (there are other INTERVAL variants, but they are for differences between two times or between two timestamps). So, with standard SQL, a subtraction results in either a number of years/months or a number of days, but never a combination of the two. Informix, Ingres, and Oracle have some support for the INTERVAL data types.
DB2 tries to allow for subtractions that result in more complex "duration" values. For example, March 31 2000 minus February 28 2000 returns a duration value telling us that the difference is one month and three days. This is bold because there can be difficulties deciding on the order of execution and the end of the month.
Adding months to dates can be troublesome. Should March 31 plus 1 month be an error, should it result in April 30, or should it result in May 1? The consensus is that the answer is April 30. Some DBMSs allow the calculation but return a warning. The SQL Standard says the result should be an error.
Birthdays
The following are some simple examples for handling date-related queries. Assume that you have an example table called Datetests, which contains a DATE column called datefield. We will use pure SQL Standard syntax.
List people whose birthdays come within the next 14 days:
SELECT name, datefield FROM Datetests WHERE (CURRENT_DATE (datefield - INTERVAL '14' DAY)) YEAR <> (CURRENT_DATE - datefield) YEAR;
This will fail if any datefield value is greater than CURRENT_DATE.
Return the current millennium of Christ's era:
SELECT SUBSTRING (CAST (EXTRACT (YEAR FROM CURRENT_DATE) AS CHAR(4)) FROM 1 FOR 1) FROM Datetests;
This works and avoids the MOD operator, which requires SQL:1999 support. It also returns 2 if the year is 2000, which is the only correct answer.
List all rows and say Happy Birthday where appropriate:
SELECT datefield, CASE WHEN (datefield + INTERVAL '1' DAY) - CURRENT_DATE) YEAR <> (datefield - CURRENT_DATE) YEAR THEN 'Happy Birthday' ELSE ' ' END FROM Datetests;
This is technically more accurate than using EXTRACT to compare month and day because it gives people born on February 29 a birthday on March 1 in non-leap years. We think that is the ordinary convention.
Month End
Some DBMSs have a built-in function that returns the last day of the month. For example, the Oracle function LAST_DAY(DATE '1992-04-01') returns 30. What if the DBMS does not provide such a function, though?
Again using standard SQL, we have a (somewhat complex) solution to the question: What is the last day of the month? Let's take it in steps because the initial parts have some independent value. Assume that there is a table named Table1 with a DATE column named datecol.
To get the last day of the previous month, you would use this query:
SELECT datecol CAST (EXTRACT (DAY FROM datecol) AS INTERVAL DAY) FROM Table1;
Thus, you would do this to get the first day of the current month:
SELECT datecol CAST ((EXTRACT (DAY FROM datecol) - 1) AS INTERVAL DAY) FROM Table1;
You would do this to get the first day of the following month:
SELECT datecol CAST ((EXTRACT (DAY FROM datecol) - 1) AS INTERVAL DAY) + INTERVAL '1' MONTH FROM Table1;
And you would do this to get the last day of the current month:
SELECT datecol CAST ((EXTRACT (DAY FROM datecol) - 1) AS INTERVAL DAY) + INTERVAL '1' MONTH INTERVAL '1' DAY FROM Table1;
Quod erat demonstrandum.
Tips and Warnings
Oracle has some slight bugs that affect calculations with BC dates.
Arithmetic on dates is slower than arithmetic on integers. If the only date calculations that you do involve number of days, and if you do a lot of arithmetic, it is cheaper to store Julian Day Numbers rather than DATE values. This tip is useless for Informix, SQL Server, or Sybase because their internal formats have something like a Julian Day Number anyway (as a signed integer with January 1 1900 equal to 0).
Although the Big Eight DBMSs all have the same functionality for dates, the storage, value ranges, and calendar calculations differ. So, portability is always possible but is never smooth.
In our book, we give some reasons for preferring TIMESTAMP over DATE for most situations. Check out SQL Performance Tuning (Addison Wesley Professional, 2002) for more information.