Using DB2 Functions
- May 10, 2004
Chapter 3: Using DB2 Functions
In this Chapter
The RAISE_ERROR Function
MQSeries Built-in Functions
XML Publishing Built-in Functions
Built-in Function Guidelines
Two types of built-in functions can be applied to data in a DB2 table using SQL: column functions and scalar functions. You can use these functions to further simplify the requirements of complex data access.
DB2 also provides the capability for users to create their own functions. This capability, called user-defined functions, is discussed in-depth in Chapter 4, "Using DB2 User-Defined Functions and Data Types."
Functions are called by specifying the function name and any required operands. A built-in function can be used any place an expression can be used (with some exceptions).
Column functions compute, from a group of rows, a single value for a designated column or expression. This provides the capability to aggregate data, thereby enabling you to perform statistical calculations across many rows with one SQL statement. To fully appreciate the column functions, you must understand SQL's set-level processing capabilities.
This list shows some rules for the column functions:
Column functions can be executed only in SELECT statements.
A column function must be specified for an explicitly named column or expression.
Each column function returns only one value for the set of selected rows.
If you apply a column function to one column in a SELECT statement, you must apply column functions to any other columns specified in the same SELECT statement, unless you also use the GROUP BY clause.
Use GROUP BY to apply a column function to a group of named columns. Any other column named in the SELECT statement must be operated on by a column function.
The result of any column function (except the COUNT and COUNT_BIG functions) will have the same data type as the column to which it was applied. The COUNT function returns an integer number; COUNT_BIG returns a decimal number.
The result of any column function (except the COUNT and COUNT_BIG functions) can be null. COUNT and COUNT_BIG always return a numeric result.
Columns functions will not return a SQLCODE of +100 if the predicate specified in the WHERE clause finds no data. Instead, a null is returned. For example, consider the following SQL statement:
SELECT MAX(SALARY) FROM DSN8810.EMP WHERE EMPNO = '999999';
There is no employee with an EMPNO of '999999' in the DSN8810.EMP table. This statement therefore returns a null for the MAX(SALARY). Of course, this does not apply to COUNT and COUNT_BIG, both of which always return a value, never a null.
When using the AVG, MAX, MIN, STDDEV, SUM, and VARIANCE functions on nullable columns, all occurrences of null are eliminated before applying the function.
You can use the DISTINCT keyword with all column functions to eliminate duplicates before applying the given function. DISTINCT has no effect, however, on the MAX and MIN functions.
You can use the ALL keyword to indicate that duplicates should not be eliminated. ALL is the default.
A column function can be specified in a WHERE clause only if that clause is part of a subquery of a HAVING clause. Additionally, every column name specified in the expression of the column function must be a correlated reference to the same group.
The column functions are AVG, COUNT, COUNT_BIG, MAX, MIN, STDDEV, SUM, and VARIANCE.
The AVG Function
The AVG function computes the average of the values for the column or expression specified as an argument. This function operates only on numeric arguments. The following example calculates the average salary of each department:
SELECT WORKDEPT, AVG(SALARY) FROM DSN8810.EMP GROUP BY WORKDEPT;
The AVG function is the preferred method of calculating the average of a group of values. Although an average, in theory, is nothing more than a sum divided by a count, DB2 may not return equivalent values for AVG(COL_NAME) and SUM(COL_NAME)/COUNT(*). The reason is that the COUNT function will count all rows regardless of value, whereas SUM ignores nulls.
The COUNT Function
The COUNT function counts the number of rows in a table, or the number of distinct values for a given column. It can operate, therefore, at the column or row level. The syntax differs for each. To count the number of rows in the EMP table, issue this SQL statement:
SELECT COUNT(*) FROM DSN8810.EMP;
It does not matter what values are stored in the rows being counted. DB2 will simply count the number of rows and return the result. To count the number of distinct departments represented in the EMP table, issue the following
SELECT COUNT(DISTINCT WORKDEPT) FROM DSN8810.EMP;
The keyword DISTINCT is not considered an argument of the function. It simply specifies an operation to be performed before the function is applied. When DISTINCT is coded, duplicate values are eliminated.
If DISTINCT is not specified, then ALL is implicitly specified. ALL also can be explicitly specified in the COUNT function. When ALL is specified, duplicate values are not eliminated.
The argument of the COUNT function can be of any built-in data type other than a large object: CLOB, DBCLOB, or BLOB. Character string arguments can be no longer 255 bytes and graphic string arguments can be no longer than 127 bytes.
The result of the COUNT function cannot be null. COUNT always returns an INTEGER value greater than or equal to zero.
The COUNT_BIG Function
The COUNT_BIG function is similar to the COUNT function. It counts the number of rows in a table, or the number of distinct values for a given column. However, the COUNT_BIG function returns a result of data type DECIMAL(31,0), whereas COUNT can return a result only as large as the largest DB2 integer value, namely +2,147,483,647.
The COUNT_BIG function works the same as the COUNT function, except it returns a decimal value. Therefore, the example SQL for COUNT is applicable to COUNT_BIG. Simply substitute COUNT_BIG for COUNT. For example, the following statement counts the number of rows in the EMP table (returning a decimal value, instead of an integer):
SELECT COUNT_BIG(*) FROM DSN8810.EMP;
The COUNT_BIG function has the same restrictions as the COUNT function. The argument of the COUNT_BIG function can be of any built-in data type other than a large object: CLOB, DBCLOB, or BLOB. Character string arguments can be no longer than 255 bytes and graphic string arguments can be no longer than 127 bytes.
The result of the COUNT_BIG function cannot be null. COUNT_BIG returns a decimal value greater than or equal to zero.
The MAX Function
The MAX function returns the largest value in the specified column or expression. The following SQL statement determines the project with the latest end date:
SELECT MAX(ACENDATE) FROM DSN8810.PROJACT;
The result of the MAX function is of the same data type as the column or expression on which it operates.
The argument of the MAX function can be of any built-in data type other than a large object: CLOB, DBCLOB, or BLOB. Character string arguments can be no longer than 255 bytes and graphic string arguments can be no longer than 127 bytes.
A somewhat more complicated example using MAX is shown below. It returns the largest salary paid to a man in department D01:
SELECT MAX(SALARY) FROM DSN8810.EMP WHERE WORKDEPT = 'D01' AND SEX = 'M';
The MIN Function
The MIN function returns the smallest value in the specified column or expression. To retrieve the smallest bonus given to any employee, issue this SQL statement:
SELECT MIN(BONUS) FROM DSN8810.EMP;
The result of the MIN function is of the same data type as the column or expression on which it operates.
The argument of the MIN function can be of any built-in data type other than a large object: CLOB, DBCLOB, or BLOB. Character string arguments can be no longer than 255 bytes and graphic string arguments can be no longer than 127 bytes.
The STDDEV Function
The STDDEV function returns the standard deviation of a set of numbers. The standard deviation is calculated at the square root of the variance. For example
SELECT STDDEV(SALARY) FROM DSN8810.EMP WHERE WORKDEPT = 'D01';
The argument of the STDDEV function can be any built-in numeric data type. The resulting standard deviation is a double precision floating-point number.
The SUM Function
The accumulated total of all values in the specified column or expression are returned by the SUM column function. For example, the following SQL statement calculates the total yearly monetary output for the corporation:
SELECT SUM(SALARY+COMM+BONUS) FROM DSN8810.EMP;
This SQL statement adds each employee's salary, commission, and bonus. It then aggregates these results into a single value representing the total amount of compensation paid to all employees.
The argument of the SUM function can be any built-in numeric data type. The resulting sum must be within the range of acceptable values for the data type. For example, the sum of an INTEGER column must be within the range 2,147,483,648 to +2,147,483,647. This is because the data type of the result is the same as the data type of the argument values, except:
The sum of SMALLINT values returns an INTEGER result.
The sum of single precision floating point values returns a double precision floating-point result.
The VARIANCE Function
The VARIANCE function returns the variance of a set of numbers. The result is the biased variance of the set of numbers. The variance is calculated as follows:
VARIANCE = SUM(X**2)/COUNT(X) - (SUM(X)/COUNT(X))**2
The argument of the VARIANCE function can be any built-in numeric data type. The resulting variance is a double precision floating-point number.
For brevity and ease of coding, VARIANCE can be shortened to VAR.