Table of Contents
- Microsoft SQL Server Defined
- Microsoft SQL Server Features
- Microsoft SQL Server Administration
- Microsoft SQL Server Programming
- Performance Tuning
- Practical Applications
- Professional Development
- Application Architecture Assessments
- BI Explained
- Developing a Data Dictionary
- BI Security
- Gathering BI Requirements
- Source System Extracts and Transforms
- ETL Mechanisms
- Business Intelligence Landscapes
- Business Intelligence Layouts and the Build or Buy Decision
- A Single Version of the Truth
- The Operational Data Store (ODS)
- Data Marts – Combining and Transforming Data
- Designing Data Elements
- The Enterprise Data Warehouse — Aggregations and the Star Schema
- On-Line Analytical Processing (OLAP)
- Data Mining
- Key Performance Indicators
- BI Presentation - Client Tools
- BI Presentation - Portals
- Implementing ETL - Introduction to SQL Server 2005 Integration Services
- Building a Business Intelligence Solution, Part 1
- Building a Business Intelligence Solution, Part 2
- Building a Business Intelligence Solution, Part 3
- Tips and Troubleshooting
- Additional Resources
Building a Business Intelligence Solution, Part 2
Last updated Mar 28, 2003.
In a previous tutorial, I explained the general process I follow to implement a Business Intelligence project. In this series of tutorials, I’ll implement that process using a need I now have for some intelligence data. If you’re just joining me, start with that article first, since you’ll need the background for this tutorial. You’ve got a front-row seat to my experiment as I work through it.
SQL Server 2008 introduces a new feature called the Management Data Warehouse. This feature uses yet another new feature, called the Data Collector, to store performance data from various systems in a single location. Together with a new set of reports, these features combine to show you how your systems are performing, and how you can correct many of them. I’m using the data store for that feature for my “source”.
As I mentioned in the last article, I’m in a bit of a circular pattern here. The first thing you normally want to do when you create a BI solution is to define the type of answers you want to be able to find. To do that in this situation, I need to know what kind of data I can use to find those answers. After all, I can’t report using data I don’t have, so in this case I actually need to evaluate the source data before I state what I want to learn from it. I’m using the standard Data Collectors in the Management Data Warehouse feature, and as I explained in the last tutorial, I do know the data structures.
Since the Management Data Warehouse (MDW) comes with reports over the data I’ve collected, I have a leg up already on what kind of information I can see, and even a pretty good idea of how to see it. So that’s where I’ll start — I’ll dig in to the reports I already have, and then work backwards from that to find out the data they use. From there, I’ll know what level of detail I have and I can extrapolate from that the kind of rollups to do.
This process is one of the methods you can use to flesh out your own BI implementation. This is a “bottom-up” approach, since I have a set of pre-defined data that I understand well. In other cases, you might already know the questions the business wants to ask, and you’ll have to find — or start collecting — the data to answer them.
So step one is to evaluate the reports I have in the MDW (see the article referenced in the Note above for a quick description of those reports). The first MDW report I open is called “Server Activity”:
This looks promising. I can see CPU, Memory, I/O and Network data, along with the data that shows the waits the users are experiencing. It also has data that shows user activity. All that information is interesting, but not necessarily in a BI sense. For instance, is it important to know a particular wait occurred a year and a half ago? Probably not. Is it important to know about the network usage? Possibly. But what strikes me as very interesting is the part about the Disk I/O — if I could find out more about my drive subsystem, I can track growth, trend usage, and incorporate the data into my decision making about SAN purchases and more. I think I’ve hit on something I want to investigate further.
I click on the Disk I/O graph and get the following information:
Perfect — this is just the kind of thing I want. I’ll scroll down a little to see what the bottom part of the report shows:
Hmm. This is interesting as well, but since it shows fairly detailed information on the specific applications that are generating reads and writes, it isn’t as useful to me in the long run. That’s more of a short-term thing, at least in my case.
Now I’ll return to the main report overview for the MDW system, and select the “Disk Usage” report:
This report is exactly what I’m looking for. It shows not only the size of the database files, but the earlier sizes and even an average growth number. In fact, if I just kept this data, this would be one of the reports I would use. As it is, this data will “roll forward” every two weeks (that’s the retention period I have set) so I’ll use this report for a model of what I want.
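If two weeks isn’t long enough for your own trending, the retention period for a collection set can be changed. As a sketch (the collection set name here assumes the standard “Disk Usage” system collection set; verify the name on your own server), something like this adjusts how long the data is kept before it rolls forward:

```sql
-- Hypothetical example: extend retention for the Disk Usage collection
-- set to 30 days. Collection set names can vary, so check
-- syscollector_collection_sets on your server first.
USE msdb;
GO
DECLARE @set_id int;

SELECT @set_id = collection_set_id
FROM dbo.syscollector_collection_sets
WHERE name = N'Disk Usage';

EXEC dbo.sp_syscollector_update_collection_set
    @collection_set_id = @set_id,
    @days_until_expiration = 30;
GO
```

The same setting is available in the Data Collector properties dialog in SQL Server Management Studio if you prefer not to script it.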
Now on to the data that this report is using. You’ll have the same process to follow on your own systems — you can locate the reports or even application screens that show the data you want, and then find out what those screens and reports are using. There are a lot of ways to do that, but in this case it’s very simple.
Recall from the last tutorial in this series that I already have a full outline of the schema and what each table in the database does. That’s a rare privilege, so I’ll take full advantage of it and start with another tool I use quite often — the Database Diagrams in SQL Server Management Studio. I add four tables to the tool, and it automatically draws the relationships between them:
The reason this tool is so useful to me is that it shows the tables, their joins, and all of the columns I need.
Based on this diagram, I begin to work with the data in the MDW with two sets of queries that will show me the database read and write patterns, and the size of the data. The first query uses three tables that show me the read and write patterns, along with the “stall” information, which is the time that users spend waiting on the disk subsystem:
USE MDW;
GO
SELECT  c.instance_name
      , a.database_name
      , a.logical_disk
      , a.logical_file_name
      , a.type_desc
      , a.num_of_reads
      , a.num_of_writes
      , a.num_of_bytes_read
      , a.num_of_bytes_written
      , a.io_stall_read_ms
      , a.io_stall_write_ms
      , a.collection_time
FROM snapshots.io_virtual_file_stats a
INNER JOIN core.snapshots_internal b
        ON a.snapshot_id = b.snapshot_id
INNER JOIN core.source_info_internal c
        ON b.source_id = c.source_id
ORDER BY c.instance_name, a.database_name, a.type_desc;
GO
This information is very useful over the long term, since I can use it to show which databases are more read-oriented and which are more write-intensive. Why is that useful? Well, it allows me to pick the drive types I can use for a particular application — certain RAID levels do better on reads, others on writes. Since there is a cost associated with that, I want to put the right data in the right place. That also saves time, and power. Hey — green computing with a little BI!
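To make that read-versus-write call concrete, a simple rollup over the same table works as a first pass. This is a rough sketch using the schema from the query above; note that these counters are cumulative since the last restart, so for precise trending you would diff consecutive snapshots rather than sum them, but the totals are still indicative of each database’s profile:

```sql
USE MDW;
GO
-- Sketch: profile each database as read- or write-heavy.
-- Counters are cumulative, so treat the ratio as indicative only.
SELECT  a.database_name
      , SUM(a.num_of_bytes_read)    AS total_bytes_read
      , SUM(a.num_of_bytes_written) AS total_bytes_written
      , CASE WHEN SUM(a.num_of_bytes_written) = 0 THEN NULL
             ELSE CAST(SUM(a.num_of_bytes_read) AS float)
                  / SUM(a.num_of_bytes_written)
        END AS read_write_ratio
FROM snapshots.io_virtual_file_stats a
GROUP BY a.database_name
ORDER BY read_write_ratio DESC;
GO
```

A database with a high ratio is a candidate for a read-optimized RAID level; a low ratio suggests optimizing for writes.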
The next query shows the file sizes and growth. I had to investigate the source of this data a little further, since I wasn’t sure of the units the table was using. With a little experimentation, I found that the numbers were in pages, and a page in SQL Server is 8 KB. So to get the size in megabytes, I multiplied by 8 and divided by 1024:
SELECT  f.instance_name
      , d.database_name
      , (d.dbsize * 8) / 1024  AS data_size_mb
      , (d.logsize * 8) / 1024 AS log_size_mb
      , d.ftsize
      , d.pages
      , d.reservedpages
      , d.collection_time
FROM snapshots.disk_usage d
INNER JOIN core.snapshots_internal e
        ON d.snapshot_id = e.snapshot_id
INNER JOIN core.source_info_internal f
        ON f.source_id = e.source_id
ORDER BY d.collection_time;
GO
To find all that out, I just navigated down to the Management node of the server that is being monitored and double-clicked the Data Collectors that feed the Server Activity and Disk Usage reports. There in the details pane I saw the SELECT statements used to collect the data. I’m not interested in running those statements myself; I just want to know where the data comes from. Once I knew that, I could work out the number translations.
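If you’d rather not click through the UI, the same information can be pulled from the Data Collector metadata in msdb. As a sketch (the collection item’s T-SQL is stored inside the parameters XML column, so you’ll need to inspect that XML for the actual statement):

```sql
-- Sketch: list each collection item and the parameters XML that
-- holds its collection T-SQL.
USE msdb;
GO
SELECT  cs.name AS collection_set
      , ci.name AS collection_item
      , ci.parameters
FROM dbo.syscollector_collection_items ci
INNER JOIN dbo.syscollector_collection_sets cs
        ON ci.collection_set_id = cs.collection_set_id
ORDER BY cs.name, ci.name;
GO
```

This is handy when you’re documenting exactly what each collector gathers.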
With all this information, I’m ready for the next step: creating my tables that will hold all this data for the analysis. I’ll cover that process next week.
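To give a flavor of where this is headed, here is a hypothetical sketch of the kind of staging table I have in mind for the disk-usage data — the names, types, and actual design are my own placeholders, not the final schema:

```sql
-- Hypothetical staging table for disk-usage history; columns mirror
-- the size query above. The real design is worked out later.
CREATE TABLE dbo.DiskUsageHistory (
    instance_name   sysname        NOT NULL,
    database_name   sysname        NOT NULL,
    data_size_mb    decimal(18, 2) NOT NULL,
    log_size_mb     decimal(18, 2) NOT NULL,
    collection_time datetimeoffset NOT NULL
);
```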
InformIT Articles and Sample Chapters
There’s a lot more on the MDW feature in The SQL Server 2008 Management Data Warehouse and Data Collector elsewhere in this Reference Guide.
Books and eBooks
Before we’re through with this project, you’ll need a good background on SQL Server Integration Services. I’ve got the reference for that right here: Microsoft SQL Server 2008 Integration Services Unleashed. (Also available as a downloadable eBook and in Safari Books Online.)
To make sure I give credit where it is due, here is the reference from Books Online that I used in this tutorial.