InformIT

SQL Performance Tuning: Dates in SQL

By and

Date: Feb 14, 2003

Article is provided courtesy of Addison Wesley.

Return to the article

Using dates in database-management systems (DBMSs) brings up some portability problems and uncovers some undocumented and underexplained features. Peter Gulutzan and Trudy Pelzer explore the DATE data type and its impact on eight SQL DBMSs.

During research for our book SQL Performance Tuning (Addison Wesley Professional, 2002), we found that using dates brings up some portability problems, some undocumented features, and some areas that cry out for more explanation. This article shows how the DATE data type works with eight SQL database-management systems: IBM DB2, Informix, Ingres, InterBase, Microsoft SQL Server, MySQL, Oracle, and Sybase (we'll call these "the Big Eight DBMSs").

Our concern is what you can do with dates, so we'll leave aside information on how to format a date and other locale-specific trivia. We'll use extended dates such as January 4 1970 throughout, except when we're showing a value in ANSI/ISO SQL Standard form (such as 1970-01-04).

Range of Values

The next chart shows the minimum and maximum date values according to the ANSI/ISO SQL Standard and the Big Eight. (Note: MySQL's minimum is really 0001-01-01; there's an error in its documentation.)

Minimum Value

Maximum Value

ANSI SQL

0001-01-01

9999-12-31

IBM

0001-01-01

9999-12-31

Informix

0001-01-01

9999-12-31

Ingres

0001-01-01

9999-12-31

InterBase

0001-01-01

9999-12-31

Microsoft

1753-01-01

9999-12-31

MySQL

1000-01-01

9999-12-31

Oracle

4713-01-01 BC

9999-12-31

Sybase

1753-01-01

9999-12-31


Generally, the range of values is between the year 1 and the year 9999, but what's interesting is the odd-looking minimum date values for Microsoft, Oracle, and Sybase.

The explanation of Microsoft and Sybase's minimum date is that they're trying to avoid the questions that arise with date arithmetic that uses the Julian calendar rules (which we'll discuss in the next section, "The Western Calendar").

Oracle's minimum date can be explained by the fact that it's allowing for all the Julian Day Numbers (see the section "Julian Day Numbers," below). Incidentally, we are very sure that the minimum value is 4713 BC—we know that the Oracle manual says 4712 BC, but the manual is wrong.

The Western Calendar

During the period between 1 A.D. and today, the Western world has actually used two main calendars: the Julian calendar of Julius Caesar and the Gregorian calendar of Pope Gregory XIII. The two calendars differ with respect to only one rule: the rule for deciding what a leap year is. In the Julian calendar, all years divisible by four are leap years. In the Gregorian calendar, all years divisible by four are leap years, except that years divisible by 100 (but not divisible by 400) are not leap years. Thus, the years 1700, 1800, and 1900 are leap years in the Julian calendar but not in the Gregorian calendar, while the years 1600 and 2000 are leap years in both calendars.

When Pope Gregory XIII introduced his calendar in 1582, he also directed that the days between October 4, 1582, and October 15, 1582, should be skipped—that is, he said that the day after October 4 should be October 15. Many countries delayed changing over, though. England and her colonies didn't switch from Julian to Gregorian reckoning until 1752, so for them, the skipped dates were between September 4 and September 14, 1752. Other countries switched at other times, but 1582 and 1752 are the relevant dates for the DBMSs that we're discussing.

Thus, two problems arise with date arithmetic when one goes back many years. The first is, should leap years before the switch be calculated according to the Julian or the Gregorian rules? The second problem is, when and how should the skipped days be handled?

This is how the Big Eight DBMSs handle these questions:

Julian Day Numbers

There is a simpler way to measure the passage of days. Based on a date that Joseph Scaliger suggested in 1583, astronomers use a continuous count of days since a fixed start point that is well before recorded history. Now, because many documentation sources (including most vendors' manuals) have this detail wrong, we're going to use a bit of emphasis here:

Julian Day 0 is the same as January 1 4713 B.C. on the proleptic Julian calendar.

In most DBMSs, there is a function for getting the Julian Day Number given a DATE value. Unfortunately, for a few DBMSs, there is a tendency to use the word Julian when giving the ordinal day number within a year—for example, they say that any January 1 date is Julian Day 1.

Naturally, different DBMSs return different values for a Julian-day function, depending on the way that they handle the Julian/Gregorian switch.

Day of the Week

The majority of DBMSs have a function that returns the day of the week as a number. There is some dispute whether the first day is Sunday or Monday.

The arguments for Sunday are as follows:

  1. The ODBC spec for the DAYOFWEEK function says that Sunday is Day 1 and Saturday is Day 7.

  2. Jews and Christians agree that the first Sabbath—the "seventh day" on which God rested—was Saturday.

  3. The German word for Wednesday, Mittwoch, means midweek.

The arguments for Monday are these:

  1. The ISO 8601 standard says that Monday is Day 1 and Sunday is Day 7.

  2. Other bodies such as the BSI follow the ISO in this respect.

When no built-in function is available, you can use SQL's MOD function with the Julian Day Number as an argument.

Week of the Year

The Big Eight DBMSs have five different ways to calculate a week number:

  1. The first day of the first week is January 1. This has the advantage of avoiding weeks that are split across years, but it doesn't correspond to the "first day of the week" possibilities that we've just discussed. Supporters include Oracle.

  2. Weeks start on Monday, and the first week of the year is the first week that has a Thursday in it. This has the advantage that all weeks have seven days. On the other hand, this can mean that days between January 1 and January 3 are counted as being part of a week in the previous year. Supporters include ISO, DB2, Ingres, and Oracle.

  3. Weeks start on Sunday, and the first week starts wherever the first Sunday is. This has the same advantages and disadvantages as the previous method.

  4. Weeks start on Monday, and the first week starts wherever the first Monday is. Any days before the first Monday are Week 0. This has nearly the same advantages and disadvantages as methods two and three. Supporters include Ingres.

  5. Weeks start on Sunday, and the first week (Week 1) is the Sunday that begins the week of January 1. Values are from 1 to 54. Supporters include DB2.

Option 2 is what the ISO 8601 standard requires. DB2, Ingres, and Oracle each have two "week" functions: one for a proprietary calculation, and one for the option 2 calculation. ODBC requires a WEEK function but is vague about the requirements.

Subtracting and Adding

All DBMSs allow the subtraction of a date from a date. By default, the result is an integer representing the number of result days. For example, March 1 2000 minus February 28 2000 returns 2.

The SQL Standard actually demands more complexity. It defines two new data types: year-month INTERVAL and day INTERVAL (there are other INTERVAL variants, but they are for differences between two times or between two timestamps). So, with standard SQL, a subtraction results in either a number of years/months or a number of days, but never a combination of the two. Informix, Ingres, and Oracle have some support for the INTERVAL data types.

DB2 tries to allow for subtractions that result in more complex "duration" values. For example, March 31 2000 minus February 28 2000 returns a duration value telling us that the difference is one month and three days. This is bold because there can be difficulties deciding on the order of execution and the end of the month.

Adding months to dates can be troublesome. Should March 31 plus 1 month be an error, should it result in April 30, or should it result in May 1? The consensus is that the answer is April 30. Some DBMSs allow the calculation but return a warning. The SQL Standard says the result should be an error.

Birthdays

The following are some simple examples for handling date-related queries. Assume that you have an example table called Datetests, which contains a DATE column called datefield. We will use pure SQL Standard syntax.

  1. List people whose birthdays come within the next 14 days:

    SELECT name, datefield
    FROM Datetests
    WHERE (CURRENT_DATE – 
       (datefield - INTERVAL '14' DAY)) YEAR
       <>
       (CURRENT_DATE - datefield) YEAR;

    This will fail if any datefield value is greater than CURRENT_DATE.

  2. Return the current millennium of Christ's era:

    SELECT SUBSTRING (CAST (EXTRACT 
                             (YEAR FROM CURRENT_DATE) 
                           AS CHAR(4)) 
      FROM 1 FOR 1)
    FROM Datetests;

    This works and avoids the MOD operator, which requires SQL:1999 support. It also returns 2 if the year is 2000, which is the only correct answer.

  3. List all rows and say Happy Birthday where appropriate:

    SELECT datefield, CASE 
       WHEN 
       (datefield + INTERVAL '1' DAY)
          - CURRENT_DATE) YEAR 
       <>
       (datefield - CURRENT_DATE) YEAR
    THEN 'Happy Birthday'
    ELSE '       '
    END
    FROM Datetests;

This is technically more accurate than using EXTRACT to compare month and day because it gives people born on February 29 a birthday on March 1 in non-leap years. We think that is the ordinary convention.

Month End

Some DBMSs have a built-in function that returns the last day of the month. For example, the Oracle function LAST_DAY(DATE '1992-04-01') returns 30. What if the DBMS does not provide such a function, though?

Again using standard SQL, we have a (somewhat complex) solution to the question: What is the last day of the month? Let's take it in steps because the initial parts have some independent value. Assume that there is a table named Table1 with a DATE column named datecol.

To get the last day of the previous month, you would use this query:

SELECT datecol – 
CAST (EXTRACT (DAY FROM datecol) AS INTERVAL DAY)
FROM Table1;

Thus, you would do this to get the first day of the current month:

SELECT datecol – 
CAST ((EXTRACT (DAY FROM datecol) - 1) AS INTERVAL DAY)
FROM Table1;

You would do this to get the first day of the following month:

SELECT datecol – 
CAST ((EXTRACT (DAY FROM datecol) - 1) AS INTERVAL DAY) + 
   INTERVAL '1' MONTH
FROM Table1;

And you would do this to get the last day of the current month:

SELECT datecol – 
CAST ((EXTRACT (DAY FROM datecol) - 1) AS INTERVAL DAY) + 
   INTERVAL '1' MONTH – 
   INTERVAL '1' DAY 
FROM Table1;

Quod erat demonstrandum.

Tips and Warnings

In our book, we give some reasons for preferring TIMESTAMP over DATE for most situations. Check out SQL Performance Tuning (Addison Wesley Professional, 2002) for more information.

800 East 96th Street, Indianapolis, Indiana 46240