Sorting with DISTINCT and UNION
Sorting shows up in a number of places in addition to the ORDER BY clause. Here, check how it works in DISTINCT, UNION, and WHERE.
Some SQL users automatically throw a DISTINCT into every query (Figure 617). Don't do it unless you really need to get rid of duplicates! DISTINCT means that results are generated and then sorted.
Figure 617. DISTINCT
In addition, DISTINCT can serve as a cover-up for "broken" queries. Don't reach for a DISTINCT every time you get multiples instead of singles. Think through the logic first. Make sure your query is really asking the question you have in your mind.
When DISTINCT means more work, and no real increase in result coherence, do without it. This doesn't mean to avoid DISTINCTit's a very useful element. It just means to weigh the cost against the advantage.
In the same way, use UNION ALL rather than UNION unless you need to eliminate duplicates. UNION returns rows from each query included in the statement, puts them in a work table, and then sorts them to remove duplicates. UNION ALL skips the last step. (Information on UNION appears in Chapter 3.)
Here's a UNION ALL query and the ASA PLAN it generated:
Adaptive Server Anywhere select plan (' select name, state from supplier union all select lname, state from customer ') Estimate 5 I/O operations Take all rows from Subquery1,Subquery2 Subquery1: Estimate 1 I/O operations Scan supplier sequentially Estimate getting here 7 times Subquery2: Estimate 3 I/O operations Scan customer sequentially Estimate getting here 12 times
Following is the plan for the same query with the ALL removed. Without knowing anything about PLAN messages, it's clear that UNION takes more processing than UNION ALL. An additional subquery is listed, for example.
Adaptive Server Anywhere Estimate 11 I/O operations Summarize Subquery1 grouping by expr,expr Subquery1: Estimate 11 I/O operations Temporary table on (expr,expr) Take all rows from Subquery2,Subquery3 Subquery2: Estimate 1 I/O operations Scan supplier sequentially Estimate getting here 7 times Subquery3: Estimate 3 I/O operations Scan customer sequentially Estimate getting here 12 times
In this case, the two queries actually produce the same results. You see differences only if the UNIONed queries contain duplicate result rows.
Adaptive Server Anywhere name state ==================== ===== Connectix Co. CA Soft Stuff CA Total Recall (NULL Hi Finance! NY TrendMaster WA Above Average Arts MA Emu Sister Prdctns PA McBaird CA aziz MA khandasamy NY mokoperto MA Peters NY WONG MD archer CA le blanc MA sato WA deathmask-z MA rs TX Menendez NY [19 rows]
Some Oracle references suggest avoiding a sort by using a meaningless condition on a WHERE clause, forcing the data into some order. Let's say you want to see customers in last-name order. There is a unique index on customer number. You add one (not unique, given the nature of names) on lname fname. In this section, there is a leading space on lname rs and fname SAM.
Oracle SQL> create index custnmix on customer(lname, fname); Index created.
A query with no WHERE or ORDER BY clause shows data in this order:
Oracle SQL> select lname, fname, custnum 2 from customer; LNAME FNAME CUSTNUM -------------------- -------------------- --------- McBaird geoff lowell 111222222 archer ruby 111223333 aziz phillip 111333333 le blanc felipe 111334444 sato kimiko 111444444 khandasamy SAM 223456789 deathmask-z 777777777 mokoperto merit 777777778 rs pete pete 776677778 Peters Pete 776667778 Menendez lauren 923457789 WONG LI-REN 999456789 12 rows selected.
Without using an ORDER BY, add a WHERE clause on the column in question (lname), assuming that there are no conflicting WHERE conditions that call for some other index and the indexed column does not use nulls. Oracle (and some other systems) may produce results in lname order. If you get the results you want, use your performance tools to measure the difference between this technique and using ORDER BY. However, this technique can fail if the indexes change. Be sure to document it and note it's a trick.
Oracle SQL> select lname, fname, custnum 2 from customer 3 where lname >= ' '; LNAME FNAME CUSTNUM -------------------- -------------------- --------- rs pete pete 776677778 McBaird geoff lowell 111222222 Menendez lauren 923457789 Peters Pete 776667778 WONG LI-REN 999456789 archer ruby 111223333 aziz phillip 111333333 deathmask-z 777777777 khandasamy SAM 223456789 le blanc felipe 111334444 mokoperto merit 777777778 sato kimiko 111444444 12 rows selected.