- Introduction
-
Table of Contents
- Microsoft SQL Server Defined
- Microsoft SQL Server Features
- Microsoft SQL Server Administration
-
Microsoft SQL Server Programming
- An Outline for Development
- Database
- Database Services
- Database Objects: Databases
- Database Objects: Tables
- Database Objects: Table Relationships
- Database Objects: Keys
- Database Objects: Constraints
- Database Objects: Data Types
- Database Objects: Views
- Database Objects: Stored Procedures
- Database Objects: Indexes
- Database Objects: User Defined Functions
- Database Objects: Triggers
- Database Design: Requirements, Entities, and Attributes
- Business Process Model Notation (BPMN) and the Data Professional
- Business Questions for Database Design, Part One
- Business Questions for Database Design, Part Two
- Database Design: Finalizing Requirements and Defining Relationships
- Database Design: Creating an Entity Relationship Diagram
- Database Design: The Logical ERD
- Database Design: Adjusting The Model
- Database Design: Normalizing the Model
- Creating The Physical Model
- Database Design: Changing Attributes to Columns
- Database Design: Creating The Physical Database
- Database Design Example: Curriculum Vitae
- NULLs
- The SQL Server Sample Databases
- The SQL Server Sample Databases: pubs
- The SQL Server Sample Databases: NorthWind
- The SQL Server Sample Databases: AdventureWorks
- The SQL Server Sample Databases: Adventureworks Derivatives
- UniversalDB: The Demo and Testing Database, Part 1
- UniversalDB: The Demo and Testing Database, Part 2
- UniversalDB: The Demo and Testing Database, Part 3
- UniversalDB: The Demo and Testing Database, Part 4
- Getting Started with Transact-SQL
- Transact-SQL: Data Definition Language (DDL) Basics
- Transact-SQL: Limiting Results
- Transact-SQL: More Operators
- Transact-SQL: Ordering and Aggregating Data
- Transact-SQL: Subqueries
- Transact-SQL: Joins
- Transact-SQL: Complex Joins - Building a View with Multiple JOINs
- Transact-SQL: Inserts, Updates, and Deletes
- An Introduction to the CLR in SQL Server 2005
- Design Elements Part 1: Programming Flow Overview, Code Format and Commenting your Code
- Design Elements Part 2: Controlling SQL's Scope
- Design Elements Part 3: Error Handling
- Design Elements Part 4: Variables
- Design Elements Part 5: Where Does The Code Live?
- Design Elements Part 6: Math Operators and Functions
- Design Elements Part 7: Statistical Functions
- Design Elements Part 8: Summarization Statistical Algorithms
- Design Elements Part 9:Representing Data with Statistical Algorithms
- Design Elements Part 10: Interpreting the Data—Regression
- Design Elements Part 11: String Manipulation
- Design Elements Part 12: Loops
- Design Elements Part 13: Recursion
- Design Elements Part 14: Arrays
- Design Elements Part 15: Event-Driven Programming Vs. Scheduled Processes
- Design Elements Part 16: Event-Driven Programming
- Design Elements Part 17: Program Flow
- Forming Queries Part 1: Design
- Forming Queries Part 2: Query Basics
- Forming Queries Part 3: Query Optimization
- Forming Queries Part 4: SET Options
- Forming Queries Part 5: Table Optimization Hints
- Using SQL Server Templates
- Transact-SQL Unit Testing
- Index Tuning Wizard
- Unicode and SQL Server
- SQL Server Development Tools
- The SQL Server Transact-SQL Debugger
- The Transact-SQL Debugger, Part 2
- Basic Troubleshooting for Transact-SQL Code
- An Introduction to Spatial Data in SQL Server 2008
- Performance Tuning
- Practical Applications
- Professional Development
- Application Architecture Assessments
- Business Intelligence
- Tips and Troubleshooting
- Additional Resources
Transact-SQL: Ordering and Aggregating Data
Last updated Mar 28, 2003.
We've been learning about the Transact-SQL language, beginning with the SELECT command. So far we've worked primarily with text-based fields, but we need to spend a little time with my favorite type of data: numbers.
While the WHERE, BETWEEN, IN and EXISTS work just fine with text or numbers, you need to remember to remove the single quotes (called "ticks") we've been using on text. Here's an example from Query Analyzer. Notice the number at the end:
USE pubs GO SELECT * FROM sales WHERE qty > 5
Here are the results from that query:
|
7066 |
A2976 |
1993-05-24 00:00:00.000 |
50 |
Net 30 |
PC8888 |
|
7066 |
QA7442.3 |
1994-09-13 00:00:00.000 |
75 |
ON invoice |
PS2091 |
|
7067 |
D4482 |
1994-09-14 00:00:00.000 |
10 |
Net 60 |
PS2091 |
|
|
|
(and so on) |
|
|
|
There's one other interesting aside regarding the use of numbers in a select query. It has to do with the concatenation of text and numbers. Examine this query:
SELECT ord_num, qty FROM sales
This query returns both numbers (qty) and text (ord_num) correctly:
|
6871 |
5 |
|
722a |
3 |
|
A2976 |
50 |
|
QA7442.3 |
75 |
|
(and so on) |
|
Now look at this query:
USE pubs GO SELECT ord_num + qty FROM sales
This one doesn't work when we run it, we get the following result:
Server: Msg 245, Level 16, State 1, Line 1 Syntax error converting the varchar value '722a' to a column of data type smallint.
The reason is that the plus sign is used in math to add two numbers. We don't want to add the value in the ord_num text field to the number in qty; we want to display them together on the screen.
Even worse, sometimes we want to display the text, but the field can be treated as a number such as a date field. The fact that a query might actually work is a worse situation in which the query works but brings back bad data.
So how do we resolve this dilemma? We'll see the answer in full in another article, but the short answer is that we need to convert the number to a text field.
Don't focus too much on the syntax just yet, but here's the corrected query:
USE pubs GO SELECT ord_num + CAST(qty AS varchar(10)) FROM sales
Running that query gives us this set:
|
68715 |
|
722a3 |
|
A297650 |
|
QA7442.375 |
|
(and so on) |
Not that this concatenation makes sense, but you see the point.
Now that we're familiar with selecting numbers from our tables, let's take a look at the aggregation functions.
Aggregation is simply the act of working on groups of data to obtain a single result. The most obvious example would be to get a sum of the values in a column, a very common task. To do that, we use the first of our aggregations, the SUM() function, in our selection. Here's how it works:
USE pubs GO SELECT SUM(qty) FROM sales
Giving us:
|
493 |
We can see the format that is used, where the aggregate function name, SUM, is followed by parentheses. In between the parenthesis is the column to aggregate.
Once again, using our layering techniques, we can sum the number of books sold in 1993:
USE pubs GO SELECT SUM(qty) FROM sales WHERE ord_date BETWEEN '01/01/1993' AND '12/31/1993'
...And here is the result:
|
250 |
Since SUM is a function, we can use it to calculate other metrics along with the aggregate function. The following query calculates the 5% profit margin for store number 7067:
SELECT SUM(qty * .05) AS 'StoreTotal' FROM sales WHERE stor_id = '7067'
When we run the query, we see why bookstores must sell coffee to make ends meet!
|
4.50 |
The important thing to notice is that the multiplication is applied one row at a time, and then the aggregate is applied to the result.
The SUM() function isn't the only aggregate we have to work with. We also have the AVG() function, which gives us the average of a range of numbers. The format and use is the same as SUM(), but here we'll introduce another concept that deal with aggregates, the DISTINCT keyword.
With DISTINCT, the aggregate function operates only on unique values. This query returns the average quantity of books sold, but leaves out duplicate quantities:
SELECT AVG(DISTINCT qty) FROM sales
Resulting in:
|
28 |
We can use DISTINCT with SUM(), AVG(), and our next function, COUNT(). The COUNT() function is used to count the number of rows returned by the query. At its simplest, we can use the function to find out how many rows are in a table, like this:
SELECT COUNT(*) FROM sales
Which gives us:
|
21 |
To extend that query a bit further, we can use the DISTINCT keyword to find the number of stores that have sold books at all:
SELECT COUNT(DISTINCT stor_id) FROM sales
Returning:
|
6 |
And we can use the function with a WHERE limiter as well.
The other aggregate functions work in much the same way, so here's a handy chart of the more useful ones:
|
AVG |
Gets the average of the selected values |
|
COUNT |
Returns the count of the selected values |
|
MAX |
Finds the maximum value in the group of the selected values |
|
MIN |
Finds the minimum value in the group of the selected values |
|
STDEV |
Returns the standard deviation of the selected values, useful when combined with the average number |
|
SUM |
Finds the sum of the selected values |
|
VAR |
Finds the statistical variance of the selected values |
Notice we've said "more useful" because there are a couple of other aggregates (such as COUNT_BIG(), VARP() and STDDEVP()) that we don't run into that often. We can always look those up if we need them!
Next, let's take a look at two other aggregate concepts, the GROUP BY and HAVING commands.
If you tried some of these queries, you might have been tempted to try and place headings out to the side, like this:
|
Book 1 Sales |
90.00 |
|
Book 2 Sales |
15.00 |
|
Total of Sales: |
105.00 |
The issue with constructing this query is that aggregate functions work on multiple values, and we want the values to break on a variety of values. We simply can't put those together with a single-value result. Happily, SQL provides a command to do just that: GROUP BY. The format of that command looks like this:
SELECT column, AggregateFunction(column) FROM table GROUP BY column
We'll layer this idea just as we have been, remembering that the result is always cumulative. Let's take a simple query and build on it to show the result.
First, we'll query the authors table to show the number of authors in each state:
SELECT state, COUNT(*) FROM authors GROUP BY state
This query returns the following results:
|
CA |
15 |
|
IN |
1 |
|
KS |
1 |
|
MD |
1 |
|
MI |
1 |
|
OR |
1 |
|
TN |
1 |
|
UT |
2 |
...Which gives us the authors, broken down by state.
Notice that if we use a column in the SELECT list, it must appear in either the GROUP BY clause or it must be used as an argument to one of the SQL aggregate functions like SUM() or COUNT(). As we can see, the GROUP BY addition is used to calculate values for each subset. We can also use GROUP BYs on multiple columns. Try it!
The results of what we're building are additive meaning that the SELECT and all its results are processed first, and then the aggregates are performed. By extension, this means that we can use the WHERE clause before the grouping just as we always have, but we can't use a WHERE on the grouped results.
The reason for this is that, once again, we're missing a set of scalar row by row values with aggregated calculated values. SQL Server solves the issue with the HAVING extension to the ORDER BY command. HAVING addresses the fact that a WHERE cannot be used with aggregate functions. Here's an example of the HAVING extension, showing the same results from above with the added constraint that is used after the grouping:
SELECT state, COUNT(*) FROM authors GROUP BY state HAVING COUNT(*) > 1
Returning the following results:
|
CA |
15 |
|
UT |
2 |
We covered quite a few concepts in the last few articles. Make sure you practice with several queries on your own. The examples I've used have been very simple, to explain these complex ideas, but you should try a few more real-world queries. If you get stuck, jump over to the weblogs area and ask me (or another reader) for help; people really love to give assistance.
That's all for this week Oh, one final note: GROUP BY can only be used with an aggregate function such as AVG(), COUNT(), MAX(), MIN(), or SUM().
Online Resources
Chuo-Han Lee has a great SQL tutorial on his site, and also covers the GROUP BY clause.
InformIT Articles and Sample Chapters
Chris Fehily has a sample chapter from his work called Summarizing and Grouping Data in SQL Sample Chapter. This sample chapter introduces SQL's aggregate functions, or set functions, which operate on a group of values to produce a single, summarizing value.
