Tips and Troubleshooting
Last updated Mar 28, 2003.
Here in the "Tips and Troubleshooting" area I’ll show you how to do things with SQL Server that you might not find in other areas. I’ll also help you with common problems that you’ll face with SQL Server.
The sections in this Guide are designed to allow quick access to what you need. The tutorials and overviews can be read in just a few minutes, and many contain useful scripts and hands-on guides to examples you can follow.
Tips for SQL Server
There is a LOT of documentation for SQL Server, from the official “Books Online” product (which I’ve explained in detail in this article) to hundreds of web sites, including this one. By carefully reading these sources of information, you can find ways to solve almost any technical problem using SQL Server.
But that’s a lot of reading. SQL Server Books Online runs to an estimated (as of this writing) 60 thousand printed pages, which is why it isn’t printed any more and is available only in electronic form. And finding what you want on any website can be a little daunting.
So to fix that issue, I’ve included this section of the guide, where I’ll put two kinds of information: one for when you’re in trouble (more on that in a moment) and the other for things I get asked how to do a lot. For instance, I was asked not long ago about working with Microsoft Excel and SQL Server, and that’s just the kind of thing I’ll write about here.
Now some of these solutions and tips deal with specific conditions, versions, and configurations. Does that mean you can't use one of the tips here unless it exactly fits your situation at the time? No. You can actually find use for the information in each of these articles even when your servers aren’t set up the way mine are. If you read and understand the example, and extrapolate that information out to your situation, you'll find that with a little modification the tip can work for you.
You'll notice that there is a place to comment on each article. If you find a tip that didn't work for you, or you modify it to fit a different situation, by all means, post it there. And if you’re looking for a specific tip or troubleshooting step, please contact me and let me know. If I hear from enough folks to make it an interesting topic for all of us, I’ll write it up.
Troubleshooting Basics for SQL Server
I’ve mentioned a basic troubleshooting process for databases in a previous tutorial, but there are some steps that are generic to almost any kind of troubleshooting, so I’ll spend a moment on the broad steps you should follow if you have an issue with SQL Server, even before you take a look at the database itself. I normally break this simple methodology into six steps, which I’ll cover in a moment.
You might think this is all a bit much for solving a problem. Someone contacts you and says that they can’t connect to the server, you ask a couple of questions, they re-start their computer and everything is fixed. Problem solved. I’ve even had these kinds of situations myself.
But some problems are more complicated than that. They aren’t solved using simple quick fixes, and some even start growing and becoming worse. Then you’re in fire-fighting mode, with lots of people crowding around your cubicle. You begin to make mistakes, and that’s when the stress gets a lot higher. I’ve also seen those “simple” fixes not last, and the problem just repeats itself later because you really didn’t deal with it, you just postponed it.
So I now follow these steps, even when it takes a little more time, even when it frustrates the users a little. I explain that if I’m allowed to follow a scientific method that takes up more of their time, the payoff is that I’m less likely to have to revisit the issue later.
Step One: Identify the Components in the System
This is a lot easier if the components (hardware, software and configuration) are documented ahead of time, but even if they aren’t, take the time to lay out what you’re looking at.
I don’t mean that whenever someone calls with a simple issue you have to stop and document the entire application stack, from the configuration of the user’s hardware to each stored procedure in the database. But if you know the issue is on the server (more on that in a moment), then just identify or look up the parts involved in making that component of the system work properly. Then you’ll know how to move on to the next step.
As a simple example, suppose you have an application that runs on the user’s desktop, which has an ODBC connection directly to a SQL Server Instance and Database. A quick list of the components involved here is:
- User’s hardware
- User’s software configuration (operating system, ODBC driver, application)
- Server Hardware
- Server Software configuration
- SQL Server Instance
- SQL Server Database
- SQL Server Database Objects
Of course, each of these components has a lot of settings and so on within themselves, but I’ll keep it simple for this example.
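If you want to keep that component list somewhere more durable than your head, here is a minimal sketch in Python. The `Component` class and the `walk` function are names made up for this illustration; only the component names come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    name: str
    side: str  # "client" or "server"


# Ordered from the client end to the server end, as in the list above.
COMPONENTS = [
    Component("User's hardware", "client"),
    Component("User's software configuration", "client"),
    Component("Server hardware", "server"),
    Component("Server software configuration", "server"),
    Component("SQL Server Instance", "server"),
    Component("SQL Server Database", "server"),
    Component("SQL Server Database Objects", "server"),
]


def walk(direction: str = "client") -> List[str]:
    """Return component names in the order you would check them.

    direction="client" starts at the user's machine; anything else
    starts at the database objects and works back toward the user.
    """
    ordered = COMPONENTS if direction == "client" else list(reversed(COMPONENTS))
    return [c.name for c in ordered]
```

Keeping the list ordered means the same data can drive a check from either end of the application.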
Step Two: Identify the Stop/Start Point of the Issue
With the parts of the application identified, you can now begin to figure out where the issue starts. You can pick a “direction,” meaning starting at the client end or at the server end, and then work through until a feature stops working (client end) or starts working (server end). Sometimes this is quite simple. In my example, I’ve received a call that the user’s application is “not working.” That’s a little vague of course, so I begin by asking “Has it ever worked?” If the answer is no, then I move on to other steps. If the answer is yes, then I ask “What has changed?” Of course, the answer to this question is ALWAYS “nothing,” which isn’t a lot of help, but I ask anyway.
The point is, I work from the client or the server through what is working until I find what isn’t. Most of the time I start at the client end, but I might know from experience to start at the other end. It doesn’t always matter which direction you come from, only that you have a plan in place to test the connections and functions from that end towards your target.
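The idea of working through the components until one fails can be sketched in a few lines of Python. This is only an illustration: `first_failure` is a hypothetical helper, and the checks in the usage example are stand-in lambdas, not real connectivity tests.

```python
from typing import Callable, Optional, Sequence, Tuple

# A check is a component name paired with a function that returns
# True when that component is working.
Check = Tuple[str, Callable[[], bool]]


def first_failure(checks: Sequence[Check]) -> Optional[str]:
    """Run the checks in order and return the first component that fails.

    Returns None when every check passes.
    """
    for name, check in checks:
        if not check():
            return name
    return None


# Stand-in checks, listed from the client end toward the server end.
client_to_server = [
    ("User's software configuration", lambda: True),
    ("Server software configuration", lambda: True),
    ("SQL Server Instance", lambda: False),  # pretend the instance is down
    ("SQL Server Database", lambda: False),
]

print(first_failure(client_to_server))
```

To come from the server end instead, pass the same checks in reverse order.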
In the example here, I would first ask if anyone else is having the issue. If the answer is yes, then I can guess that it is probably something to do with the network, server or SQL Server parts of the application. I would open a sample application on my virtual machine and try to connect myself to verify this.
If the answer is that others (or myself) can get into the application with no trouble, then of course I’m dealing with either the user’s system hardware or software, or their network connection.
If the user can get “partway” through, that is, if the application starts, they log in and so on, but then they can’t run a particular function, then once again I try this from another user to see if there is a configuration or permissions issue. This might be on the server or the client, but at least now I have identified where the issue is, which is all I’m after in this step.
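The questions in this step boil down to a small decision tree. Here is one way to write it down in Python, as a rough sketch: the function name and its two parameters are assumptions for illustration, and a real triage will have more branches than this.

```python
def likely_area(others_affected: bool, gets_partway: bool = False) -> str:
    """Suggest where to look first, based on the questions in this step."""
    if gets_partway:
        # The application starts and the user logs in, but one function fails.
        return "configuration or permissions (client or server)"
    if others_affected:
        # More than one person has the problem.
        return "network, server or SQL Server"
    # Everyone else can get in without trouble.
    return "user's hardware, software or network connection"


print(likely_area(others_affected=True))
```

The result is only a starting point for the search. It narrows down where to look, not what the fix is.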
Step Three: Locate Logs and Warnings
Users are notorious for not being able to describe a problem thoroughly. That isn’t always their fault, since many times they are unfamiliar with the technology they are using. Most of us have systems we deal with each day (perhaps it’s plumbing or a car for you) that we don’t understand completely. Don’t allow yourself to become frustrated with the user or in any way insult them. That will only serve to make them angry, and you won’t get the information you need to fix the issue. Be as polite, professional and helpful as you can, even when they are frustrated. Remember, this is interrupting their day in a big way.
Have the user repeat the steps that cause the problem, and get as much information from them as possible. If they get an error message, have them read it to you carefully, and then repeat it back to them. Explain what you’re going to do, and then have them work on something else, or at least give them your best guess at how long it will take you to find out what to do next.
From there, check as many things as you can at the location where the problem starts. If it is on the user’s workstation or the server, open the Windows Event logs there and begin to scrub those for all kinds of errors, not just the ones you think directly affect your application. It might be a driver issue that is causing a network behavior, something that your application doesn’t directly control. Even if it is a networking device, most of those have logs that you can find.
As a tip, you might want to document the application flow (from the first step) and specify the logging and warning locations that each component provides. It’s easier to do that when you aren’t in a crisis.
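Here is a rough Python sketch of both ideas: a map from component to the places it logs, and a filter that keeps every error or warning regardless of which source wrote it. The map entries and the sample records are illustrative assumptions. Substitute whatever your own components provide and however you export their logs.

```python
from typing import Dict, List

# Illustrative only: where each component in the example application logs.
LOG_LOCATIONS: Dict[str, List[str]] = {
    "User's software configuration": ["Windows Event logs (workstation)"],
    "Server software configuration": ["Windows Event logs (server)"],
    "SQL Server Instance": ["SQL Server error log", "Windows Event logs (server)"],
}

PROBLEM_LEVELS = {"warning", "error", "critical"}


def problems(records: List[dict]) -> List[dict]:
    """Keep every warning or error, whatever its source.

    Each record is assumed to be a dict with "level", "source" and
    "message" keys, for instance rows exported from an event log.
    """
    return [r for r in records if r["level"].lower() in PROBLEM_LEVELS]


# Made-up sample records for demonstration.
sample = [
    {"level": "Information", "source": "MyApp", "message": "Started"},
    {"level": "Error", "source": "NetworkDriver", "message": "Link reset"},
    {"level": "Warning", "source": "MyApp", "message": "Slow response"},
]

for record in problems(sample):
    print(record["source"], "-", record["message"])
```

Note that the filter deliberately does not look at the source, so the driver error is kept even though it doesn't come from the application.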
Step Four: Develop Solution and Backout Strategy
You now know the general area you’re dealing with, and you’ve researched the logs. Hopefully you have an error code, probably you have an error description, and at least you know what is happening. From all this information, you might already know a possible solution. If not, use your search engine of choice to research Books Online first, then other web sites. As a last resort, you can search or post to a question-and-answer site, but be very careful here. Just because someone responds, that doesn’t mean they know the real solution. Only a thorough examination of your particular environment could do something like that.
In any case, get your test system ready to try the solution. Before you do anything else, ask yourself what you will do if it all goes wrong: if your solution does not fix the problem, or even makes it worse. Ensure that you have backups, you know how to use them, and that you have tested your restore procedures somewhere. If you don’t have reliable backups, my recommendation is to take one and then test it. If you’re not willing to do that, stop what you are doing and call the product’s technical support. I can’t emphasize this enough. Don’t continue until you know what you will do if you make the situation worse.
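If it helps to make that gate explicit, here is a small Python sketch of a go/no-go check. The function and its parameters are hypothetical; it simply turns the questions above into a list of what is still missing.

```python
from typing import List


def missing_safeguards(has_backup: bool, restore_tested: bool,
                       backout_plan: str) -> List[str]:
    """Return what still has to be done before trying the fix.

    An empty list means the safeguards described above are in place.
    """
    missing = []
    if not has_backup:
        missing.append("take a backup")
    if not restore_tested:
        missing.append("test the restore procedure")
    if not backout_plan.strip():
        missing.append("write down the backout plan")
    return missing


todo = missing_safeguards(has_backup=True, restore_tested=False, backout_plan="")
if todo:
    print("Do not continue yet:", ", ".join(todo))
```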
Step Five: Test and Implement
Now run the test solution on a system that closely resembles your production environment. It’s only in the most extreme situations that I ever do anything directly on production without testing it first.
When you’re satisfied that the solution will fix the issue, run it on production and let the users back into the pool.
Step Six: Monitor and Document
But you’re not finished yet. No matter how trivial the issue, I recommend that you watch the system’s counters and health after you make the change to ensure you haven’t tickled another problem by making that alteration.
Make sure you document the change, what caused the problem, and why you chose that resolution. That will help you in the future if you have a similar issue, and it will help make sure that someone doesn’t accidentally back out your change by applying a service pack, changing a setting and so on. Over time you’ll develop a great knowledge base on your fixes. It isn’t a bad idea to post the problem and resolution on one of those question-and-answer sites, either. It’s a great way to give back to the community that helps you.
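A change record doesn’t need to be fancy. As a minimal sketch, assuming you want plain text you can paste into whatever documentation system you use, something like the following Python would do. The `ChangeRecord` class, its field names and the sample values are all made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChangeRecord:
    when: date
    problem: str
    cause: str
    change: str
    reason: str  # why this resolution was chosen over others

    def as_text(self) -> str:
        """Render the record as plain text, one field per line."""
        return "\n".join([
            f"Date: {self.when.isoformat()}",
            f"Problem: {self.problem}",
            f"Cause: {self.cause}",
            f"Change: {self.change}",
            f"Reason: {self.reason}",
        ])


# Made-up values, for demonstration only.
record = ChangeRecord(
    when=date(2003, 3, 28),
    problem="Users could not connect to the application",
    cause="ODBC driver setting changed on the workstation",
    change="Restored the previous driver setting",
    reason="Smallest change that fixed the issue on the test system",
)
print(record.as_text())
```

Recording the reason alongside the change is what tells the next person not to back it out by accident.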
In the articles and tutorials that follow, I’ll explain a few of the more common errors you’ll encounter with SQL Server. As always, read and understand what you’re up against before you try anything in your environment.
InformIT Articles and Sample Chapters
I created a system that you can use to track your servers over time to prevent issues from happening in the first place, and you can also use it to store the documentation you make along the way. It’s called the SQL Server Central Management System.
Books and eBooks
Although Cisco Internetwork Troubleshooting deals specifically with Cisco networks, there are some great general troubleshooting tips here. You can even read a sample chapter on that very topic.