Home > Articles > Data > SQL

  • Print
  • + Share This

Sorting with DISTINCT and UNION

Sorting shows up in a number of places in addition to the ORDER BY clause. Here, check how it works in DISTINCT, UNION, and WHERE.

DISTINCT

Some SQL users automatically throw a DISTINCT into every query (Figure 6–17). Don't do it unless you really need to get rid of duplicates! DISTINCT means that results are generated and then sorted.

Figure 6-17Figure 6–17. DISTINCT

In addition, DISTINCT can serve as a cover-up for "broken" queries. Don't reach for a DISTINCT every time you get multiples instead of singles. Think through the logic first. Make sure your query is really asking the question you have in your mind.

When DISTINCT means more work, and no real increase in result coherence, do without it. This doesn't mean to avoid DISTINCT—it's a very useful element. It just means to weigh the cost against the advantage.

UNION

In the same way, use UNION ALL rather than UNION unless you need to eliminate duplicates. UNION returns rows from each query included in the statement, puts them in a work table, and then sorts them to remove duplicates. UNION ALL skips the last step. (Information on UNION appears in Chapter 3.)

Here's a UNION ALL query and the ASA PLAN it generated:

Adaptive Server Anywhere
select plan ('
select name, state
from supplier
union all
select lname, state
from customer ') 

Estimate 5 I/O operations
Take all rows from Subquery1,Subquery2
Subquery1:
 Estimate 1 I/O operations
 Scan supplier sequentially
  Estimate getting here 7 times
Subquery2:
 Estimate 3 I/O operations
 Scan customer sequentially
  Estimate getting here 12 times

Following is the plan for the same query with the ALL removed. Without knowing anything about PLAN messages, it's clear that UNION takes more processing than UNION ALL. An additional subquery is listed, for example.

Adaptive Server Anywhere
Estimate 11 I/O operations
Summarize Subquery1 grouping by expr,expr
Subquery1:
 Estimate 11 I/O operations
 Temporary table on (expr,expr)
  Take all rows from Subquery2,Subquery3
Subquery2:
 Estimate 1 I/O operations
 Scan supplier sequentially
  Estimate getting here 7 times
Subquery3:
 Estimate 3 I/O operations
 Scan customer sequentially
  Estimate getting here 12 times

In this case, the two queries actually produce the same results. You see differences only if the UNIONed queries contain duplicate result rows.

Adaptive Server Anywhere
name                 state
==================== =====
Connectix Co.         CA
Soft Stuff            CA
Total Recall         (NULL
Hi Finance!           NY
TrendMaster           WA
Above Average Arts    MA
Emu Sister Prdctns    PA
McBaird               CA
aziz                  MA
khandasamy            NY
mokoperto             MA
Peters                NY
WONG                  MD
archer                CA
le blanc              MA
sato                  WA
deathmask-z           MA
rs                    TX
Menendez              NY

[19 rows]

WHERE

Some Oracle references suggest avoiding a sort by using a meaningless condition on a WHERE clause, forcing the data into some order. Let's say you want to see customers in last-name order. There is a unique index on customer number. You add one (not unique, given the nature of names) on lname fname. In this section, there is a leading space on lname rs and fname SAM.

Oracle
SQL> create index custnmix on customer(lname, fname);

Index created.

A query with no WHERE or ORDER BY clause shows data in this order:

Oracle
SQL> select lname, fname, custnum
 2 from customer;

LNAME                FNAME                CUSTNUM
-------------------- -------------------- ---------
McBaird              geoff lowell         111222222
archer               ruby                 111223333
aziz                 phillip              111333333
le blanc             felipe               111334444
sato                 kimiko               111444444
khandasamy           SAM                  223456789
deathmask-z                               777777777
mokoperto            merit                777777778
 rs                  pete pete            776677778
Peters               Pete                 776667778
Menendez             lauren               923457789
WONG                 LI-REN               999456789

12 rows selected.

Without using an ORDER BY, add a WHERE clause on the column in question (lname), assuming that there are no conflicting WHERE conditions that call for some other index and the indexed column does not use nulls. Oracle (and some other systems) may produce results in lname order. If you get the results you want, use your performance tools to measure the difference between this technique and using ORDER BY. However, this technique can fail if the indexes change. Be sure to document it and note it's a trick.

Oracle
SQL> select lname, fname, custnum
 2  from customer
 3  where lname >= ' ';

LNAME                FNAME                CUSTNUM
-------------------- -------------------- ---------
 rs                  pete pete            776677778
McBaird              geoff lowell         111222222
Menendez             lauren               923457789
Peters               Pete                 776667778
WONG                 LI-REN               999456789
archer               ruby                 111223333
aziz                 phillip              111333333
deathmask-z                               777777777
khandasamy           SAM                  223456789
le blanc             felipe               111334444
mokoperto            merit                777777778
sato                 kimiko               111444444

12 rows selected.
  • + Share This
  • 🔖 Save To Your Account

Related Resources

There are currently no related titles. Please check back later.