Home > Articles

  • Print
  • + Share This
This chapter is from the book

10.7 General Rules for the GROUP BY Clause

This section describes a number of important rules for select blocks with a GROUP BY clause.

  • Rule 1. Section 9.7 gives several rules for the use of aggregation functions in the SELECT clause. For many SQL products, the following rule applies: If a select block has a GROUP BY clause, any column specification in the SELECT clause must exclusively occur as a parameter of an aggregation function, or in the list of columns given in the GROUP BY clause, or in both. Therefore, for most products, the following statement is incorrect because the TOWN column appears in the SELECT clause, yet it is not the parameter of an aggregation function and does not occur in the list of columns by which the result is grouped.

    SELECT   TOWN, COUNT(*)
    FROM     PLAYERS
    GROUP BY SEX

    This restriction is because the result of an aggregation function always consists of one value for each group. The result of a column specification on which grouping is performed also always consists of one value per group. These results are compatible. In contrast, the result of a column specification on which no grouping is performed consists of a set of values. This would not be compatible with the results of the other expressions in the SELECT clause.

    This rule does not apply for MySQL. The previous query will return the following result:

    TOWN        COUNT(*)
    ---------  --------
    Stratford          9
    Inglewood          5

    The value of the second column is understandable—it is the number of players per sex. But the answer of the first column is unexpected. Why does it show Stratford in the first row and Inglewood in the second? This is strange because, for each sex, there can be multiple towns. The answer is that MySQL itself determines the values to be returned. Those values are selected almost randomly. We can enforce this rule if we add the setting ONLY_FULL_GROUP_BY to the SQL_MODE system variable.

    Therefore, we strongly recommend that you do not formulate this type of SQL statement, and instead adhere to this rule that applies to most SQL products.

  • Rule 2. In most examples, the expressions used to form groups also occur in the SELECT clause. However, that is not necessary. An expression that occurs in the GROUP BY clause can appear in the SELECT clause.
  • Rule 3. An expression that is used to form groups can also occur in the SELECT clause within a compound expression. See the next example.

    Example 10.12: Get the list with the different penalty amounts in cents.

    SELECT   CAST(AMOUNT * 100 AS SIGNED INTEGER)
             AS AMOUNT_IN_CENTS
    FROM     PENALTIES
    GROUP BY AMOUNT

    The result is:

    AMOUNT_IN_CENTS
    ---------------
               2500
               3000
               5000
               7500
              10000

    Explanation: A grouping is performed on a simple expression consisting of the column name AMOUNT. In the SELECT clause, that same AMOUNT column occurs within a compound expression. This is allowed.

    No matter how complex a compound expression is, if it occurs in a GROUP BY clause, it can be included in its entirety only in the SELECT clause. For example, if the compound expression PLAYERNO * 2 occurs in a GROUP BY clause, the expressions PLAYERNO * 2, (PLAYERNO * 2) – 100 and MOD(PLAYERNO * 2, 3) – 100 can occur in the SELECT clause. On the other hand, the expressions PLAYERNO, 2 * PLAYERNO, PLAYERNO * 100, and 8 * PLAYERNO * 2 are not allowed.

  • Rule 4. If an expression occurs more than once in a GROUP BY clause, double expressions are simply removed. The GROUP BY clause GROUP BY TOWN, TOWN is converted to GROUP BY TOWN. Also GROUP BY SUBSTR(TOWN,1,1), SEX, SUBSTR(TOWN,1,1) is converted to GROUP BY SUBSTR(TOWN,1,1), SEX.
  • Rule 5. Section 9.4 described the cases in which the use of DISTINCT in the SELECT clause is superfluous. The rules given in that section apply to SELECT statements without a GROUP BY clause. A different rule exists for SELECT statements with a GROUP BY clause: DISTINCT (if used outside an aggregation function) is superfluous when the SELECT clause includes all the columns specified in the GROUP BY clause. The GROUP BY clause groups the rows in such a way that the column(s) on which they are grouped no longer contain duplicate values.

Exercise 10.12: Describe why the following statements are incorrect:

  1. SELECT   PLAYERNO, DIVISION
    FROM     TEAMS
    GROUP BY PLAYERNO
  2. SELECT   SUBSTR(TOWN,1,1), NAME
    FROM     PLAYERS
    GROUP BY TOWN, SUBSTR(NAME,1,1)
  3. SELECT   PLAYERNO * (AMOUNT + 100)
    FROM     PENALTIES
    GROUP BY AMOUNT + 100

Exercise 10.13: In which of the following statements is DISTINCT superfluous?

  1. SELECT   DISTINCT PLAYERNO
    FROM     TEAMS
    GROUP BY PLAYERNO
  2. SELECT   DISTINCT COUNT(*)
    FROM     MATCHES
    GROUP BY TEAMNO
  3. SELECT   DISTINCT COUNT(*)
    FROM     MATCHES
    WHERE    TEAMNO = 2
    GROUP BY TEAMNO
  • + Share This
  • 🔖 Save To Your Account