Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

Building a Reporting Data Server

Last updated Mar 28, 2003.

If you've been in the technology industry for any length of time, you've probably seen a project or two that ended up just a little past the limits of its design. You may have even been unfortunate enough to be part of one of those projects. What I'm talking about is a when a project is designed to fill a certain need, but isn't designed to be flexible enough to extend beyond its initial purpose.

In my past as a consultant, I saw this happen time and time again. Most of the time the IT staff felt that the problem wasn't their fault — but in fact it was. It is incumbent on us as data professionals not only to design a system that fits the current goal, but one that is as adaptable as possible given current technology, talent and budget. We need to think about the future and anticipate as many needs the organization might have as we can.

Of course, this clairvoyance only goes so far. We can't be expected to design a single system on a small budget for a department that can scale to a multi-purpose system that can serve an enterprise. But as the professional you should explain to the people who request your designs to find out what the ultimate purpose of the system will be. And don't take "this is all we will ever want this system to do" as an answer — that answer is almost never true. You'll need to plumb the corners and ask questions like "What other users might possibly be interested in this data? Where are they? What would they do with it?"

This overview deals with the data and platform design decisions for creating a system for reporting. In another article I've covered Reporting Services found in SQL Server 2000 as an add-on and 2005 as a built-in feature. You can use the information I'll give you here not only for a Reporting Services application, but in a broader sense as a platform for that application as well as any other.

Always Plan for Reporting

There are some design decisions that you can make almost every time as a "safe bet". It's a given that most system accept input, manipulate and store data, and produce output. And it's that output part where you can have a big impact on how you design your system.

Interestingly, this is the area where I have seen the most glaring problems in a system's design. In a typical organization, the It staff will be asked to design or implement a new system to handle some function. At the outset of the design, the primary focus from the user's side is the user interface — they want to know how the workflow will be laid out, which buttons to click, and generally how they handle their day-to-day tasks on the system. From the IT design side, the initial focus is often on the business logic. Since the IT group might not be as familiar with what that particular set of users does every day, they try to understand the rules that the users follow. After the initial design, the users are often not as involved as they were during the initial phase. The IT group, on the other hand, is on to the details of the system, which most of the time includes the data design. Often at the end of the process, reporting is put into the system, and very often, it's not as thorough a design discovery as the user interface screens are.

This is a mistake. The users will quickly get familiar with the screen layouts in an application, and any corrections there are often not a difficult task to work around or change. But soon after the data piles up, the users will begin to want reports from the system that they did not originally anticipate. And they will often run reports on the system in a frequency that the IT design group did not anticipate.

The results of this process are that the system begins to become overloaded, locks on the database begin to affect input, and things begin to slow down. The IT staff looks at the symptoms, but the real problem is that the reports were never "baked in", meaning that reporting isn't treated as a separate function.

So the first fix is to do just that — elevate the reporting function during your design. Make sure that the users understand just how much reporting they are liable to do from the system, and have them think that part of the process through.

Designing Reporting Into the System

To begin, you'll need to account for reporting in your system. It needs to be one of the central parts of your system design, even if you're putting together a small application.

In my articles on Business Intelligence, I talk a lot about developing a layout for your enterprise that can support intelligent analysis. The first level of these layout decisions is to ensure that your data is clean and that you are following all the basics of good design. Just after users are able to enter and trust the data in the system, they will begin to want to report on it, even if the data never ends up on paper. In those articles I continue on talking about the rest of the steps for creating an analytical system, but we'll stop at the reporting system in this overview. In fact, for many applications, intelligent report design can be used when many users suspect they need a Business Intelligence system.

So how do you design a reporting system into an application? You start with the data itself. Take a look at a typical table layout for most applications. Within those structures you'll see data that changes often, and other data that remains static. That's often the first place you can start. By breaking out those fields, you can handle the change-schedule of data that you will later want to report on. By separating the static from the rapidly changing data, you gain quite a few advantages.

The first advantage is that you automatically deal with many locking issues. Since the data doesn't change very often in some fields, putting them in another table keeps the table locks away from the tables where you put data that does change quickly.

Another advantage is that you can design better indexes. Indexes on quickly changing data take longer to update. By placing data that doesn't change as often in another table, you can index it more heavily (only where needed, of course) and make the indexes you do have to have on the quickly changing data smaller.

The final advantage in splitting the data this way is that you can handle imports and exports better. This brings up the next strategy for report data design.

At the very outset of the system design, you should think about making a copy of any data useful for reporting. I'll discuss a few distribution methods in a moment, but the important part is to think about the reporting data as a separate entity from the On-Line Transaction Processing (OLTP), even if they share data. OLTP systems are concerned with getting data in to the system – reporting is concerned with getting the data out. One is INSERT-heavy, the other is SELECT-heavy. This affects components all the way down to the disk controller. If you keep the tables and data apart, you can optimize those components.

This has coding implications as well. The end-application, whether it is in C# or a web page, should be designed so that the reporting function makes its own data connection, even if that is the same server as the OLTP function. Even better is to make that a configurable option, so that the administrator or even the user can set the server used for reports.

Selecting the Layout

Even if the system is small, you should think about the layout of the components in a logical rather than a physical layout. What that means is that you should determine the requirements, access methods and properties of the data for each component (data entry, data manipulation, and reporting) separately during your design.

Your system should be architected in such a way so that any time you need to you can add a reporting server. That server might be small at first, but by treating the report function separately, you should be able to move the reporting function from one system to another with little downtime.

From the architecture side, the most basic step is to carefully evaluate your data layout and separate it into tables that can be stored on separate Filegroups. That's the first method you can use to optimize your reporting onto separate devices. You instantly gain the ability to use separate drives for reporting, which frees up the drives you are using for the OLTP activity to work unencumbered by heavy reporting loads. Don't forget to separate at least the tempdb database, log files, data files and indexes onto separate physical drives as well.

The next level of architecture design has a great deal to do with how you plan to move the data from the OLTP system to the reporting system. Again, this is something you should plan for even when the system is small. You don't have to implement these features, but they should be part of the design documentation, so that the poor soul that has to extend your design has the ability to do that.

Programmatic Methods

Once the application grows and you begin to notice load and locking issues on your primary OTLP system, the first method you can use to offload the reporting function is to create a program, stored procedure, CLR or a batch job that runs periodically to copy data from the proper OLTP tables to a reporting server or even just another instance or database on the same server.

The advantage of this process is that can be inexpensive to implement. You don't have to buy any other hardware, and you can "starve" the other instance or database for resources so that reporting slows down, but the OLTP activity doesn't. You can use this method to transfer data to another server entirely, which mitigates any issues of staying on the same server hardware.

There are numerous disadvantages with this method, however. Since it's a manually-designed process, there are error points that are bound to pop up. You will need to document the process heavily so that everyone understands what is happening.

It's also a manual process to change. When the needs for reporting grow or extend, you'll possibly have to recode your entire process to accommodate the change, which might possibly cause yet another set of errors.

If you don't buy more hardware, you're still using the same resources for an increased load. Something has to give, and the tradeoff is that either the reporting has to be slower or the OLTP has to be slower. Nothing is free.

Finally, if you're using a timed or batched method, the data might be "stale", or old. Most users believe that they need the most current data for a report, but this isn't always true. Make sure you press them on this point – could the data be three hours old or even longer? If they say "no", then find out exactly which elements of the data has to be current, and batch that more often. You can often design a report on even the most fragmented table to be very responsive if you slice it up properly.

Triggers

Let's assume that the users tell you that the data they report on must be exactly the same as what went into the OLTP system only seconds before. In this case you can use a trigger on the columns of data that make the report.

There are some heavy concerns here. You should never put a lot of triggers on the database, because if you do you'll cause the same kind of load and locking issues you're trying to solve. Triggers also make errors hard to track down, especially if you nest them more than one level deep.

You can use triggers effectively, however. If you trigger only the last part of the process, perhaps when the user clicks the "submit" button, you can make a batch-type process. The big issue here is the latency of the data. You want to have enough currency on the data so that the users will use it, but not so current as to negatively impact the OLTP system. Most of the time, getting the data in the system is more important than getting it out.

Replication

A more painless solution (most of the time anyway) is to use SQL Server's Replication feature, built in to most every edition of SQL Server 2000 and 2005. In fact, even the MSDE (2000) and Express (2005) free editions can act as a subscriber to replication, and in many cases they make a great solution for a small reporting server.

I've covered replication in another tutorial, and you can read more about how to set it up here.

For reporting, I normally set up a one-way, read-only transaction replication. On the reporting server side, I include any tables that store user-preferences and so on that need to be written to from the reporting component. That way when users run reports not only do they not take locks on the current OLTP data, but they also don't write at all to the OLTP system.

Once again, the application needs to be flexible enough to point to various servers for the reporting feature.

Database Mirroring

Under certain circumstances, you need to be able to report off of most or even all of an OLTP system's data. In this case, you can set up a read-only copy of a copy of the database using SQL Server 2005's Database Mirroring feature. I've covered Database Mirroring in more depth here.

Even when you use this approach, the data is always read-only, so I use a second database and use three-part names in views and stored procedures. Here's a simple scenario of that process:

Let's assume I have database D1 that has a lot of data I want to report on. I set up the database to be mirrored, and then I create other tables on another database called D2 on the second server. So now on the other server I have a read-only database called D1, and a read-write database called D2. Assuming I need the "name" field from table T2 on database D2, and the "Status" field from table T1 on database D1, I just make a view that references the mirrored database. Since the databases are on the same server, there are almost no performance hits.

Selecting the Hardware

If you've designed your system properly, the hardware is a tuning feature. As I mentioned earlier, one of the most important components is the storage subsystem. You should lay out that storage properly, as I've described in this tutorial.

Another important component in a reporting system is memory. The more the better, because the query plans are reused in memory, and reporting is all about queries.

I have more advice about configuration here, and more about performance tuning here.

Don't Forget About the Network

Finally, you need to consider the network. Breaking out the reporting server is great, but you're still sending the same amount of data across the wire. Even worse, because your reporting is more efficient, users are liable to put more traffic on the wire. In addition, it may have been the network that was the reporting bottleneck to begin with.

You should always make sure your reporting system is on a different network segment if possible from your OLTP and maintenance traffic.

Informit Articles and Sample Chapters

If you're using Reporting Services on SQL Server 2000, check out this article.

Online Resources

Need to do some more tuning on your Reporting Services system? Check this article out.