Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

Transact-SQL: Ordering and Aggregating Data

Last updated Mar 28, 2003.

We've been learning about the Transact-SQL language, beginning with the SELECT command. So far we've worked primarily with text-based fields, but we need to spend a little time with my favorite type of data: numbers.

While the WHERE, BETWEEN, IN and EXISTS work just fine with text or numbers, you need to remember to remove the single quotes (called "ticks") we've been using on text. Here's an example from Query Analyzer. Notice the number at the end:

  USE pubs
  GO
  SELECT * 
  FROM sales
  WHERE qty > 5

Here are the results from that query:

7066

A2976

1993-05-24 00:00:00.000

50

Net 30

PC8888

7066

QA7442.3

1994-09-13 00:00:00.000

75

ON invoice

PS2091

7067

D4482

1994-09-14 00:00:00.000

10

Net 60

PS2091

 

 

(and so on)

 

 

 


There's one other interesting aside regarding the use of numbers in a select query. It has to do with the concatenation of text and numbers. Examine this query:

  SELECT ord_num, qty 
  FROM sales 

This query returns both numbers (qty) and text (ord_num) correctly:

6871

5

722a

3

A2976

50

QA7442.3

75

(and so on)

 


Now look at this query:

  USE pubs
  GO
  SELECT ord_num + qty 
  FROM sales

This one doesn't work – when we run it, we get the following result:

Server: Msg 245, Level 16, State 1, Line 1
Syntax error converting the varchar value '722a' to a column of data type smallint.

The reason is that the plus sign is used in math to add two numbers. We don't want to add the value in the ord_num text field to the number in qty; we want to display them together on the screen.

Even worse, sometimes we want to display the text, but the field can be treated as a number – such as a date field. The fact that a query might actually work is a worse situation in which the query works but brings back bad data.

So how do we resolve this dilemma? We'll see the answer in full in another article, but the short answer is that we need to convert the number to a text field.

Don't focus too much on the syntax just yet, but here's the corrected query:

  USE pubs
  GO
  SELECT ord_num + CAST(qty AS varchar(10)) 
  FROM sales

Running that query gives us this set:

68715

722a3

A297650

QA7442.375

(and so on)


Not that this concatenation makes sense, but you see the point.

Now that we're familiar with selecting numbers from our tables, let's take a look at the aggregation functions.

Aggregation is simply the act of working on groups of data to obtain a single result. The most obvious example would be to get a sum of the values in a column, a very common task. To do that, we use the first of our aggregations, the SUM() function, in our selection. Here's how it works:

  USE pubs
  GO
  SELECT SUM(qty)
  FROM sales

Giving us:

493


We can see the format that is used, where the aggregate function name, SUM, is followed by parentheses. In between the parenthesis is the column to aggregate.

Once again, using our layering techniques, we can sum the number of books sold in 1993:

  USE pubs
  GO
  SELECT SUM(qty)
  FROM sales
  WHERE ord_date BETWEEN '01/01/1993' AND '12/31/1993'

...And here is the result:

250


Since SUM is a function, we can use it to calculate other metrics along with the aggregate function. The following query calculates the 5% profit margin for store number 7067:

  SELECT SUM(qty * .05) AS 'StoreTotal'
  FROM sales
  WHERE stor_id = '7067'

When we run the query, we see why bookstores must sell coffee to make ends meet!

4.50


The important thing to notice is that the multiplication is applied one row at a time, and then the aggregate is applied to the result.

The SUM() function isn't the only aggregate we have to work with. We also have the AVG() function, which gives us the average of a range of numbers. The format and use is the same as SUM(), but here we'll introduce another concept that deal with aggregates, the DISTINCT keyword.

With DISTINCT, the aggregate function operates only on unique values. This query returns the average quantity of books sold, but leaves out duplicate quantities:

  SELECT AVG(DISTINCT qty) 
  FROM sales

Resulting in:

28


We can use DISTINCT with SUM(), AVG(), and our next function, COUNT(). The COUNT() function is used to count the number of rows returned by the query. At its simplest, we can use the function to find out how many rows are in a table, like this:

  SELECT COUNT(*) 
  FROM sales

Which gives us:

21


To extend that query a bit further, we can use the DISTINCT keyword to find the number of stores that have sold books at all:

  SELECT COUNT(DISTINCT stor_id) 
  FROM sales

Returning:

6


And we can use the function with a WHERE limiter as well.

The other aggregate functions work in much the same way, so here's a handy chart of the more useful ones:

AVG

Gets the average of the selected values

COUNT

Returns the count of the selected values

MAX

Finds the maximum value in the group of the selected values

MIN

Finds the minimum value in the group of the selected values

STDEV

Returns the standard deviation of the selected values, useful when combined with the average number

SUM

Finds the sum of the selected values

VAR

Finds the statistical variance of the selected values


Notice we've said "more useful" because there are a couple of other aggregates (such as COUNT_BIG(), VARP() and STDDEVP()) that we don't run into that often. We can always look those up if we need them!

Next, let's take a look at two other aggregate concepts, the GROUP BY and HAVING commands.

If you tried some of these queries, you might have been tempted to try and place headings out to the side, like this:

Book 1 Sales

90.00

Book 2 Sales

15.00

Total of Sales:

105.00


The issue with constructing this query is that aggregate functions work on multiple values, and we want the values to break on a variety of values. We simply can't put those together with a single-value result. Happily, SQL provides a command to do just that: GROUP BY. The format of that command looks like this:

SELECT column, AggregateFunction(column) 
FROM table 
GROUP BY column

We'll layer this idea just as we have been, remembering that the result is always cumulative. Let's take a simple query and build on it to show the result.

First, we'll query the authors table to show the number of authors in each state:

  SELECT state, COUNT(*)
  FROM authors
  GROUP BY state

This query returns the following results:

CA

15

IN

1

KS

1

MD

1

MI

1

OR

1

TN

1

UT

2


...Which gives us the authors, broken down by state.

Notice that if we use a column in the SELECT list, it must appear in either the GROUP BY clause or it must be used as an argument to one of the SQL aggregate functions like SUM() or COUNT(). As we can see, the GROUP BY addition is used to calculate values for each subset. We can also use GROUP BYs on multiple columns. Try it!

The results of what we're building are additive – meaning that the SELECT and all its results are processed first, and then the aggregates are performed. By extension, this means that we can use the WHERE clause before the grouping just as we always have, but we can't use a WHERE on the grouped results.

The reason for this is that, once again, we're missing a set of scalar row by row values with aggregated calculated values. SQL Server solves the issue with the HAVING extension to the ORDER BY command. HAVING addresses the fact that a WHERE cannot be used with aggregate functions. Here's an example of the HAVING extension, showing the same results from above with the added constraint that is used after the grouping:

  SELECT state, COUNT(*)
  FROM authors
  GROUP BY state
  HAVING COUNT(*) > 1

Returning the following results:

CA

15

UT

2


We covered quite a few concepts in the last few articles. Make sure you practice with several queries on your own. The examples I've used have been very simple, to explain these complex ideas, but you should try a few more real-world queries. If you get stuck, jump over to the weblogs area and ask me (or another reader) for help; people really love to give assistance.

That's all for this week – Oh, one final note: GROUP BY can only be used with an aggregate function such as AVG(), COUNT(), MAX(), MIN(), or SUM().

Online Resources

Chuo-Han Lee has a great SQL tutorial on his site, and also covers the GROUP BY clause.

InformIT Articles and Sample Chapters

Chris Fehily has a sample chapter from his work called Summarizing and Grouping Data in SQL Sample Chapter. This sample chapter introduces SQL's aggregate functions, or set functions, which operate on a group of values to produce a single, summarizing value.