Home > Articles > Data > SQL

  • Print
  • + Share This

Managing the WHERE Clause

The WHERE clause is the place to start your search for SQL you can change to get better performance. You already know what indexes you have. Now look at the optimizer plan for a problem query and see if the indexes are getting used or if the optimizer is searching the table row by row.

Why a Table Scan?

Just because you have an index on a column doesn't mean your optimizer will use it.

  • If the amount of data is trivial (as in the msdpn tables), a search using an index may not be faster than a table scan and the optimizer may choose not to use the index.

  • If the query includes all rows in the table (no WHERE clause), the optimizer does a table scan.

  • If you're retrieving a lot of data from the table, an index may give no advantage.

  • If the optimizer does not have accurate information on data distribution, it may pick the wrong index or no index at all.

  • If you include certain elements in the WHERE clause, you may make the index unavailable.

  • Of these elements, only the last two are under your control.

Data Distribution Statistics

First, check your system documents to see if the DBA needs periodically to run a command to keep the optimizer current (UPDATE STATISTICS in Transact-SQL and Informix, ANALYZE in Oracle). Microsoft SQL Server also supports a command that tells you when the command was last run (STAT_DATE—there's more information on it in "Using System Functions" in Chapter 7). See Table 6–2 for a list of commands associated with index statistics. The SQL Anywhere ESTIMATE command is included in the table, but it works differently than the UPDATE STATISTICS or ANALYZE commands, allowing the user to give the optimizer hints on data distribution.

Table 6–2. Index Statistics

ANSI

SQL Anywhere

SAE

MS SQL Server

Oracle

Informix

 

ESTIMATE

UPDATE STATISTICS

UPDATE STATISTICS

STATS_DATE

ANALYZE

UPDATE STATISTICS


Disabling an Index with a Bad WHERE

Second, don't disable a valid index by the way you construct your WHERE clause. The easiest elements to optimize are comparison operators (=, >, <, and variants) or operators that can be translated to comparison operators (BETWEEN and some LIKE clauses).

Anything else may make your indexes unavailable. Here are some suspicious areas to investigate. Since architecture and optimizers vary so much, you'll have to check your system documentation to find out just how these areas affect (or don't affect!) your queries.

  • Comparing columns in the same table
  • Choosing columns with low-selectivity indexes
  • Doing math on a column before comparing it to a constant
  • Applying a function to column data before comparing it to a constant
  • Finding ranges with BETWEEN
  • Matching with LIKE
  • Comparing to NULL
  • Negating with NOT
  • Converting values
  • Using OR
  • Finding sets of values with IN
  • Using multicolumn indexes

Comparing Columns in the Same Table

In many systems, comparing columns in the same table makes an index useless. For example, consider matching the empnum column to a string constant versus matching it to another column, bossnum. The two queries return the same results.

Adaptive Server Anywhere
select fname, lname, empnum, bossnum
from employee
where empnum = bossnum

select fname, lname, empnum, bossnum
from employee
where empnum = '443232366'
 
fname                lname                empnum    bossnum 
==================== ==================== ========= =========
Scorley              Blake-Pipps          443232366 443232366

[1 row]

With a nontrivial number of rows, the first query scans the table sequentially. The second uses the empix index. If you test this query on another system, you may not see this difference, because the employee table is so small. However, it's clear that an index on empid helps find the employee with a particular identification number and is less useful in finding one with the same number as the boss. Why? Because a constant (443232366) is constant. An index points to a known value, not an unknown value.

Using Nonselective Indexes

A unique index is 100% selective. Every index entry points to a single location in the data. A nonselective index is just the opposite—each index entry points to multiple data locations. You can think of selectivity as roughly the number of distinct index entries divided by the number of data rows. Optimizers don't get much advantage from nonselective indexes, and they often don't use them.

Returning to the book index comparison, imagine an index that listed every occurrence of the word "and." Because "and" is so common, you'd find an index reference to just about every page in the book. Using the index does not make finding "and" faster than paging through the book, because "and" has low selectivity.

Consider attributes such as gender, where there are only two choices. There's no point indexing this kind of column unless the data distribution is very skewed. If 90% of the employees are male and 10% are female, the optimizer might use the index where gender is female. It would not use it in searches for males.

Doing Math on a Column

Another WHERE clause element to watch out for is doing math on an indexed column before comparing it to a constant. The index can find the column value but not the column value * 2. To test this out on the msdpn database on the Adaptive Server Anywhere CD, run these two queries.

Adaptive Server Anywhere
select prodnum, price 
from product
where price * 2 > 200

select prodnum, price
from product
where price > 100

  prodnum   price
=========== ==========
       2111    119.99
       1106    149.00
       1794    400.00

[3 rows]

The first one does a table scan. The second takes advantage of the pricix index. This change is relatively easy to enforce, thanks to our good friend Ms. Algebra. Unadorned indexed columns on the left! Computations on the right! Get in the habit of changing WHERE clauses as in Figure 6–9.

Figure 6-9Figure 6–9. Calculation in the WHERE Clause

Using Functions

Functions have the same effect on an indexed column in a WHERE clause as math does. The index points to a value but understands nothing about calculations or functions. You already know that a search for employees by number uses the empix index.

Adaptive Server Anywhere
select fname, lname, empnum, bossnum
from employee
where empnum = '443232366'

If you modify the column name (empnum) with a substring function, even one that does exactly the same match and produces the same results, you'll get a table scan.

Adaptive Server Anywhere
select fname, lname, empnum, bossnum
from employee
where substr(empnum, 1, 9) = '443232366'

fname                lname                empnum    bossnum 
==================== ==================== ========= =========
Scorley              Blake-Pipps          443232366 443232366

[1 row]

Here's another example. The first query does a table scan. The second uses the empid index.

Adaptive Server Anywhere
select fname, lname, empnum, bossnum
from employee
where empnum || ' ' || fname = '443232366 Scorley'

select fname, lname, empnum, bossnum
from employee
where empnum = '443232366' and fname = 'Scorley'

fname                lname                empnum    bossnum 
==================== ==================== ========= =========
Scorley              Blake-Pipps          443232366 443232366

[1 row]

Finding Ranges with BETWEEN

If there is an index on a column, BETWEEN will not disable it. BETWEEN is treated as a pair of comparison operators. The low value must precede the high value.

The first of the following queries uses the prodix index on prodnum. The second uses the index on price.

Adaptive Server Anywhere
select prodnum, name
from product
where prodnum between '1110' and '1357'

  prodnum     name        
=========== ====================
    1110      star systems
    1255      bug stories
    1357      nt guru

[3 rows]

select prodnum, name, price
from product
where price between 300 and 400

  prodnum   name                 price
=========== ==================== ==========
    1794    memory8              400.00

[1 row]

NOT BETWEEN is not as easy to resolve with an index—an index points to a specific value. In the following case, you get a table scan.

Adaptive Server Anywhere
select prodnum, name, price
from product
where price not between 10 and 100

  prodnum     name                price
=========== ==================== ==========
    2111      memory tripler      119.99
    1794      memory8             400.00
    1106      z_connector         149.00

[3 rows]

Matching with LIKE

Most systems can take advantage of an index with LIKE as long as you provide the first character in the pattern (see "Matching Patterns" in Chapter 2). To follow this example, start by creating an index on the name columns of customer, last name first.

Adaptive Server Anywhere
create unique index custnmix 
on customer (lname, fname)

select lname, fname
from customer
where lname like 'Pe%'

The SQL engine translates the LIKE into a range comparison and is able to take advantage of the index.

Adaptive Server Anywhere
select lname, fname
from customer
where lname >= 'Pe' and lname < 'Pf'

lname                fname        
==================== ====================
Peters               Pete

[1 row]

However, the picture is different if you start with a wildcard. The optimizer does not use an index—the index cannot point to an unknown value.

Adaptive Server Anywhere
select lname, fname
from customer
where lname like '%ete%'

You get the same answer, of course, but the method is a table scan.

Comparing to NULL

Adaptive Server Anywhere uses the index on product.price in both the following queries, but not all systems do. Check vendor documentation to find out how nulls and indexes relate. If your system takes the "a null is not equal to anything" dictum into the realm of indexes, you may want to campaign for defaults instead of nulls at design time.

Adaptive Server Anywhere
select prodnum, price
from product
where price = 49.99

select prodnum, price
from product
where price is null

Another issue to check out is whether or not null returns a value in an IN. The following query may return one or two rows, depending on how your system handles null values in this situation.

Adaptive Server Anywhere
select prodnum, price
from product
where price in (null, 400.00)

Negating with NOT

A related area is how NOT and negatives such as <> and != affect index use. Indexes, after all, point to entries. What do they know about nonentries? The two following queries return the same results here (they could return quite disparate results with different data). The first uses the custnmix on last and first names. The second does not.

Adaptive Server Anywhere
select lname, fname
from customer
where lname = 'WONG'

select lname, fname
from customer
where lname not between ' ' and 'W'

lname                  fname        
==================== ====================
WONG                   LI-REN
[1 row]

Get in the habit of converting NOT phrases when you can (Figure 6–10). In most cases, you'll be better off looking for rows with prices greater than zero rather than rows with prices not equal to zero. In addition to its effect on index uses, negative logic is sometimes hard to understand and therefore open to error.

Figure 6-10Figure 6–10. Negatives in WHERE

Converting Values

Conversions, like math or functions, can disable an index, and this is true for autoconversions the system handles as well as conversions you perform. For starters, keep the conversion on the right side of the equation, if possible (Figure 6–11).

Figure 6-11Figure 6–11. Conversions in WHERE

In ASE, character data defined to allow null is actually stored as VARCHAR data, so joining two columns that differ only in whether or not they allow nulls may cause a conversion. The optimizer cannot use an index on the converted column. In Oracle, some autoconversions deactivate an index, others don't. Check your vendor documentation for details.

You'll find information about convert functions in "Converting Dates (and Other Datatypes)" in Chapter 2.

Using OR

An OR clause returns results if any one of the conditions is true (A = 1 or B > A or C = 3). If the columns in an OR clause have indexes, the optimizer can use the relevant indexes or do a table scan. Using the indexes means dumping results from each clause into an intermediate table and then removing duplicates from the intermediate table (Figure 6–12).

Figure 6-12Figure 6–12. OR Processing: Indexes

In a cost-based system, it's important that the table statistics be up to date. Otherwise, the optimizer may make the wrong choice.

If the ORed columns do not have indexes or the optimizer chooses not to use the indexes, you'll see a table scan (Figure 6–13).

Figure 6-13Figure 6–13. OR Processing: No Indexes

Finding Sets of Values with IN

Many third-party front-end applications overuse IN, sometimes employing it to find a single value or to return values in a range (Figure 6–14). This can be a problem, as IN is a form of OR processing, and tends to be expensive. If you are lucky, an IN with multiple terms can be re-stated as a range. This works only when the elements in the IN are the only ones in the range. If there were a $200 price in the second Figure 6–14 query, the BETWEEN translation would not be equivalent to the IN version.

Figure 6-14Figure 6–14. IN

Check IN clauses carefully. Could you do the same work with a simple equals comparison? With a range?

Using Multicolumn Indexes

In many cases, indexes consist of two or more columns (Figure 6–15). The ordprodix index in orderdetail is an example: it includes the ordnum and prodnum columns, in that order, and the order matters. The optimizer can use the index to find ordnum or combinations of ordnum and prodnum, but it can't follow pointers to go directly to prodnum values. Think of a phone book entry. It helps you find all the subscribers surnamed Smith and all the Smiths with Heather as a first name. It's no good for finding people with an unknown last name whose first name is Heather, though.

Figure 6-15Figure 6–15. Multicolumn Indexes

When you construct WHERE clauses, keep in mind the order of the columns in the index. Order of columns in the SELECT list has no effect.

  • + Share This
  • 🔖 Save To Your Account

Related Resources

There are currently no related titles. Please check back later.