Home > Articles > Data > SQL

SQL Performance Tuning: Simple "Searches"

Master SQL syntax-based optimizing and simple search conditions. Learn which search conditions are best, and armed with this information, you can decide whether to change the order of expressions, or to substitute one expression for another that does the same work more efficiently.
This chapter is from the book

This chapter is from the book

In this chapter, we'll talk about syntax-based optimizing and simple search conditions.

A syntax is a choice of words and their arrangement in the SQL statement. To optimize based on syntax, assume that nonsyntactic factors (e.g., indexes, table sizes, storage) are irrelevant or unknown. This is the lowest level of optimizing—it's usually predictable, and some of it can be done on the client.

There's no point in attempting to optimize most SQL syntax because only certain SQL statements have options that lend themselves to optimization. The particular syntax that offers many optimization possibilities is the SQL search condition. Here are three examples of search conditions:

 ... WHERE title LIKE 'The %' OR title LIKE 'A %'
 ... WHERE name <> 'Smith'
 ... WHERE number = 5

Although the slowest search conditions are those that contain joins and subqueries, this chapter deals only with single-table searches. (We'll talk about joins and subqueries later.) Also, although search conditions can appear in HAVING, IF, or ON clauses, we'll talk only about search conditions in WHERE clauses. So the chapter title—"Simple Searches"—is an understatement. Don't worry—the complex queries will come later.

General Tuning

In this part of the chapter, we'll look at some general ideas you should keep in mind when writing simple search conditions.

Code for Points

The best search conditions are those that work on few rows with easy-to-do comparisons. Table 2-1 and Table 2-2 show typical lists (derived from vendors' manuals) of types of search conditions, in order from best to worst. Each search condition component has a "point count"—the better the component, the higher the score. You can see from the allotted points shown in Tables 2-1 and 2-2 that the best search condition is something like:

 ... WHERE smallint_column = 12345

Table 2-1. Search Condition Point Counts for Operators

Operator

Points

=

10

>

5

>=

5

<

5

<=

5

LIKE

3

<>

0


Table 2-2. Search Condition Point Counts for Operands

Operand

Points

Literal alone

10

Column alone

5

Parameter alone

5

Multioperand expression

3

Exact numeric data type

2

Other numeric data type

1

Temporal data type

1

Character data type

0

NULL

0


This example gets a total of 27 points, calculated as follows:

  • Five points for the column (smallint_column) alone on the left

  • Two points for the exact numeric (smallint_column) operand data type

  • Ten points for the equals operator

  • Ten points for the literal (12345) alone on the right

Here's another example:

 ... WHERE char_column >= varchar_column || 'x'

The point count for this type of search condition is much lower—only 13:

  • Five points for the column (char_column) alone on the left

  • Zero points for the CHAR (char_column) operand data type

  • Five points for the greater than or equal to operator

  • Three points for the multioperand expression (varchar_column || 'x') on the right

  • Zero points for the VARCHAR (varchar_column) operand data type

The precise point count for a search condition varies from vendor to vendor, so it's pointless to memorize anything other than the order and the concept for this optimization technique. So just remember:

The condition that takes the least time—usually because it involves fewer rows or easier comparisons—gets the most points.

Armed with this concept, you can decide whether to change the order of expressions, or to substitute one expression for another that does the same work. Even though a modern cost-based DBMS optimizer has many more rules that require information outside the SQL statement itself, all DBMSs still fall back on the point count when no other information is available. The possibility is always there that you will use an item of information that the optimizer doesn't. (For more information about cost-based optimizers, see Chapter 17 "Cost-Based Optimizers.")

Another way you can optimize a search condition is by putting multiple expressions in the correct order. The expressions in this WHERE clause are already in optimal order:

 SELECT * FROM Table1
  WHERE column1 = 5
   AND column2 = 77.3
   AND column3 = 'Smith'
   AND column4 < 117
   AND column4 > column5
 GAIN: 0/8

The note at the bottom of this example says there is a GAIN: 0/8. That's an important number, and we're going to say "GAIN: x/8" in many of our examples, so let's clarify. As explained in Chapter 1, the gain shows how many of the Big Eight run faster when the search condition is optimized. Mileage varies with different data and different machines, of course. We're only reporting what our tests showed.

So "GAIN: 0/8" means "you'd be wasting your time if you rearranged this particular WHERE clause into optimum order because the DBMS does this for you." All DBMS makers know the basics of point counting, so the rearrangement is automatic. This means that, in ordinary cases, you will gain nothing by doing your own syntax-based optimization. However, there are many exceptions to the rule. For the rest of this chapter, we'll look at cases where the gain is both significant and predictable.

Constant Propagation

"All men are mortal. Socrates is a man. Therefore, Socrates is mortal."
—Attributed to Aristotle

Formally, the Law of Transitivity states that:

 IF
 (A <comparison operator> B) IS TRUE AND (B <comparison operator> C) IS TRUE
 THEN
 (A <comparison operator> C) IS TRUE AND NOT (A <comparison operator> C) IS FALSE

(Comparison operator is any one of: = or > or >= or < or <= but not one of: <> or LIKE.)

The Law of Transitivity leads to the simple observation that we can substitute C for B without changing the meaning of an expression. When such a substitution involves substituting a constant value, the process is called constant propagation.1

The next two expressions mean the same thing, but the second expression has a better point count because it substitutes a literal (5) for a column name (column1):

 Expression #1
 ... WHERE column1 < column2
    AND column2 = column3
    AND column1 = 5

 Expression #2
 ... WHERE 5 < column2
    AND column2 = column3
    AND column1 = 5
 GAIN: 2/8

Expression #2 is called a transform of Expression #1. (Writing a transform means rewriting an SQL statement to produce the same result but with different syntax. When two SQL statements have different syntax, but will predictably and regularly produce the same outputs, they are known as transforms of one another.) Most good DBMSs do this sort of thing automatically. But some DBMSs won't try transforms when the expression contains multiple parentheses and NOTs. For example, this SELECT statement can be slow:

 SELECT * FROM Table1
  WHERE column1 = 5 AND
   NOT (column3 = 7 OR column1 = column2)

Applying the transforms ourselves, we came up with this statement:

 SELECT * FROM Table1
  WHERE column1 = 5
   AND column3 <> 7
   AND column2 <> 5
 GAIN: 5/8

The transformed statement is faster more than half of the time. In other words, sometimes it pays to do your own transforms.

Sometimes constant propagation won't work with floats, because it's possible to be both "greater than" and "equal to" at the same time when approximate numeric comparisons happen. When it does work, though, expect a GAIN: 5/8. And sometimes, constant propagation won't work for CHAR expressions. But when it does, expect a GAIN: 4/8.

Exercise time: The MySQL online documentation has this example:

 ... WHERE a < b AND b = c AND a = 5

transforms to:

 ... WHERE b > 5 AND b = c AND a = 5

The quiz question here is—Did the MySQL folks make a mistake?2

In the real world, you'll find many semiconstant operands, as program parameters or functions. Examples are the niladic functions like CURRENT_DATE (a niladic function is a function that has no arguments). Because using a constant value always accelerates accesses, try a transform to speed up these cases. Here's an example: Query #1 transforms to Query #2:

 Query #1:
 SELECT * FROM Table1
  WHERE date_column = CURRENT_DATE
   AND amount * 5 > 100.00
 Query #2:
 SELECT * FROM Table1
  WHERE date_column = DATE '2002-01-01'
   AND amount * 5 > 100.00
 GAIN: 5/8

If you're thinking of transforming this type of expression, keep in mind that (because of the DATE constant), you'd have to change the query every day. That's only practical when an application program is generating the queries on the server.

Dead Code Elimination

"Branches with no leaves should be cut off."
—Ortho Guide To Pruning Bushes and Shrubs

In some old SQL programs, you'll encounter literals on both sides of the comparison operator, as in this example:

 SELECT * FROM Table1
  WHERE 0 = 1
   AND column1 = 'I hope we never execute this'

In the days before C-style /* comments */ were legal in an SQL statement, this was a way to add an inline comment string. Because the expression 0 = 1 is always false, this query will always return zero rows, and therefore DBMSs can skip the whole WHERE clause. But some of them don't. We tested this by removing the WHERE clause and got a gain:

 SELECT * FROM Table1
 GAIN: 5/8

It is, of course, obvious that these two queries aren't equivalent—the point is merely that it should take less time to retrieve zero rows due to an always-false condition than it should to do a full table scan-provided the DBMS doesn't evaluate the always-false condition. This example shows that DBMSs don't always throw out always-false conditions and all their dependents in the PREPARE stage. But they're pretty reliable at throwing out always-true conditions. So you can use always-true conditions for an SQL equivalent of conditional compilation. For example, if you worry that a DBMS won't give high precision for division results, add a separate condition that comes into play only when necessary—as in this example:

 ... WHERE (77 / 10 = 7.7 AND column1 / 10 = 7.7)
    OR (77 / 10 = 7 AND column1 * 10 = 77)
 GAIN: 5/8

Because of the unreliability aspect, though, it is usually a bad idea to put in redundant code. Suppose that a column, indexed_column, is an indexed NOT NULL column. You could transform this SQL statement:

 SELECT * FROM Table1

to this statement:

 SELECT * FROM Table1
  WHERE indexed_column > 0

This is a way of forcing the DBMS to look up via the index. Alas, it works only with a few DBMSs. In general, then, don't add redundant conditions to WHERE clauses.

Ensure You Use the Right DBMS

There are several ways to ensure that a specific DBMS (and no other) executes an expression. Here are three examples, all of which use nonstandard SQL extensions:

 Example 1:
 ... WHERE :variable = 'Oracle'
    AND /* Oracle-specific code here */

 Example 2:
 SELECT /* ! HIGH_PRIORITY */ ...
  /* all DBMSs except MySQL ignore this */

 Example 3:
 ... WHERE <escape-sequence> AND /* ODBC code */

While we're on the subject, Oracle allows you to add a comment that indicates what index you want to use. It looks like this:

 SELECT /*+ INDEX(Widgets Widget_index) */
    column1, column2, column3
  FROM Widgets
  WHERE column1 <> 7;
 GAIN: only 1/8 because it's Oracle-specific

Oracle-specific optimizations are bad ideas if they tie you to Oracle. In this case, the hint is in a comment, so other DBMSs will ignore it. That's good—it's more portable than putting hints outside comments as Microsoft and Sybase do. So it's okay—until other DBMSs start to put executable data inside comments too. Some are already starting to do so. So right now hints are okay, but eventually they will lead to conflict and chaos.

Constant Folding

"Go north one mile, west one mile, west one more mile, then south one mile, and you will be at your starting point."
—The South Pole Riddle

Anyone who has used C will know that the expression x=1+1-1-1 is folded to x=0 at compile time. So it may surprise you that many SQL DBMSs do not fold these five obvious-looking transform candidates:

 ... WHERE column1 + 0

 ... WHERE 5 + 0.0

 ... WHERE column1 IN (1, 3, 3)

 ... CAST(1 AS INTEGER)

 ... WHERE 'a' || 'b'

If you find expressions like these in old code though, our tip is—Leave them alone. They are there for historical reasons, such as forcing the DBMS to ignore indexes, changing the result data type, allowing for the difference between SMALLINT and INTEGER, or evading a limit on line size. Sorry—but the obvious-looking cases are precisely the cases where you should stop and wonder whether the original programmer had some reason for the weird syntax choice. We read a Java optimization article once that sums it up nicely: "Rule #1: Understand the code." Nevertheless, we do recommend that you transform this search condition:

 ... WHERE a - 3 = 5

to:

 ... WHERE a = 8     /* a - 3 = 5 */
 GAIN: 6/8

Although it's useless in simple cases, constant folding can lead to constant propagation and is therefore A Good Thing.

Case-Insensitive Searches

Microsoft's Access Jet considers the strings 'SMITH' and 'Smith' to be equal, so Access is called a case-insensitive DBMS. Oracle, on the other hand, is usually case sensitive (the engine would say 'SMITH' and 'Smith' are unequal strings). Sybase allows you to decide about case sensitivity when you install, and a true SQL Standard DBMS will allow you to switch case sensitivity at runtime. We've seen many programmers try to ensure case insensitivity by using the fold function UPPER, as in:

 ... WHERE UPPER(column1) = 'SMITH'

That can be a mistake if you're dealing with strings that contain anything other than strictly Latin letters. With some DBMSs, when you translate certain French or German strings to uppercase, you lose information. For example, the function:

 ... UPPER('rÈsumÈ')

returns RESUME—that is, the accent marks are lost, changing the meaning of the word from "curriculum vitae" to "begin again." Because information isn't lost going the other way, it's better to use the LOWER function, like this:

 ... WHERE LOWER(column1) = 'rÈsumÈ'

An even better way is to eliminate the fold function entirely if that's possible, because—we appeal to authority here—both the Microsoft and Oracle manuals say: "Avoid functions on columns." We're sure they mean "avoid functions on columns when there's another way to get the result needed"—for example, to ensure case insensitivity, the best method is to use a case-insensitive collation rather than a fold function.

A slightly faster search assumes that the data is clean and asks for the only reasonable combinations, like this:

 ... WHERE column1 = 'SMITH'
    OR column1 = 'Smith'
 GAIN: 8/8

which is still slow. Our tip here is—Take advantage of dead code elimination so that the 'Smith' search happens only when the DBMS is case sensitive. Here's how:

 ... WHERE column1 = 'SMITH'
    OR ('SMITH' <> 'Smith' AND column1 = 'Smith')
 GAIN: 3/8

Sargability

"CHUFFED. adj. [1] Pleased. [2] Displeased."
—The Concise Oxford Dictionary

The ideal SQL search condition has the general form:

 <column> <comparison operator> <literal>

In the early days, IBM researchers named these kinds of search conditions "sargable predicates" because SARG is a contraction for Search ARGument. In later days, Microsoft and Sybase redefined "sargable" to mean "can be looked up via the index." Alas—when the same word has two very different meanings, it's not much use as a word any more! So we titled this section "Sargability" just for fun. But although the word is dead, the idea lives on in this rule:

The left side of a search condition should be a simple column name; the right side should be an easy-to-look-up value.

To enforce this rule, all DBMSs will transpose the expression:

 5 = column1

to:

 column1 = 5

When there's arithmetic involved, though, only some DBMSs will transpose. For example, we tested this transform:

 ... WHERE column1 - 3 = -column2

transforms to:

 ... WHERE column1 = -column2 + 3
 GAIN: 4/8

The gain shows that doing the transform ourselves helped considerably.

On a 32-bit computer, arithmetic is fastest if all operands are INTEGERs (because INTEGERs are 32-bit signed numbers) rather than SMALLINTs, DECIMALs, or FLOATs. Thus this condition:

 ... WHERE decimal_column * float_column

is slower than:

 ... WHERE integer_column * integer_column
 GAIN: 5/8

You might expect "GAIN: 8/8" with the last example, but some DBMSs (Oracle is an example) don't distinguish between 16-bit and 32-bit numbers or store them in binary form. (Recall that dBASE stores numbers in ASCII. When a DBMS stores numbers in ASCII, truncation will always beat division by ten. So store using the base that you might divide by.)

The Bottom Line: General Tuning

The left side of a search condition should be a simple column name; the right side should be an easy-to-look-up value.

Each component of a search condition has a point count. The higher the points, the faster the component. The condition that takes the least time gets the most points.

Put multiple expressions in the correct order.

Use the Law of Transitivity and the concept of constant propagation to substitute a literal for a column name or column expression whenever you can do so without changing the meaning of an expression.

Some DBMSs won't fold most obvious-looking (to a C programmer) expressions. Don't use this principle as the reason to always transform such expressions when you find them in old code though—usually they're there for historical reasons. Remember Rule #1: Understand the code before changing it.

If the code involves an obvious math expression, do evaluate it and transform the condition to the evaluated result. Constant folding can lead to constant propagation and is therefore A Good Thing.

Avoid functions on columns.

If you can't avoid functions, don't use UPPER to ensure case insensitivity. Use LOWER instead.

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Overview


Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information


To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information


Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security


Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children


This site is not directed to children under the age of 13.

Marketing


Pearson may send or direct marketing communications to users, provided that

  • Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
  • Such marketing is consistent with applicable law and Pearson's legal obligations.
  • Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
  • Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information


If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out


Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information


Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents


California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure


Pearson may disclose personal information, as follows:

  • As required by law.
  • With the consent of the individual (or their parent, if the individual is a minor)
  • In response to a subpoena, court order or legal process, to the extent permitted or required by law
  • To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
  • In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
  • To investigate or address actual or suspected fraud or other illegal activities
  • To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
  • To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
  • To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links


This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact


Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice


We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020