- A Problem-Solving Pattern
- Step 1: Clearly Describe the Symptoms
- Step 2: Understand the Environment
- Step 3: List Hypotheses
- Step 4: Prioritize Hypotheses and Narrow Focus
- Step 5: Create a Plan of Attack
- Step 6: Act on Your Plan
- Step 7: Test Results
- Step 8: Apply Results of Testing to Hypotheses
- Step 9: Iterate as Needed
- About this Article
A Problem-Solving Pattern
A pattern is just that: it is not a firm set of rules but a set of guidelines. Following a troubleshooting method consistently will help you find solutions more easily, zero in on the root cause of an issue, and resolve it quickly. One nice thing about this pattern is that it is neither Linux- nor TCP/IP-specific. You can apply it to a variety of problems (I make no promises about in-law problems, though).
To put the pattern in context, each of its nine steps, shown in Figure 1, is described in its own section.
Figure 1 A nine-step problem-solving pattern
Step 1: Clearly Describe the Symptoms
There's no good way to attack a problem until you know what the problem really is. Far too often, system and network administrators hear a rather poor (if not outright misleading) description of the problem. It's then your job to dig in and find out what's really going on.
As you can probably guess, you'll need some interviewing skills to get a clear description of the symptoms from a user. People don't intend to hide the truth from you, but they have often already decided what the problem is, and that colors their perception of the issues involved.
It's a good idea to take notes as you're talking with someone, periodically summarizing the problem description as you go. This can help you spot follow-up questions to ask the user. It can also help jog a user's memory for other tidbits.
Never hesitate to call or email the user back with further questions to clarify the situation. It is certainly better to get all the answers you need up-front, but the reality is that you might not know all the questions that you need to ask until you've gotten your hands dirty working on the problem. If you need more detail, go get it.
Holding your interview at the customer's location also gives you a chance to say, "Show me." This enables you to see what the user is doing and perhaps to identify some more key points about the problem. Sometimes it will also reveal the problem as one of those transient things that just won't show up when you're there to see it.
If you run into a problem that you can't reproduce, you have yet another problem on your hands: what to do about it. The best approach is often to set up a monitoring plan with the user. Get all the details you can, and ask the user to call you back when the problem recurs. Leave the user with a list of questions to try to answer before calling you back. On your end, maintain a log so that you can track details about the problem.
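One way to keep such a log is sketched below in Python. The record layout (`Occurrence`, `ProblemLog`, and their fields) is a hypothetical illustration, not a prescribed format; a notebook or spreadsheet with the same columns serves just as well. The point is to capture when, where, and by whom each recurrence was seen.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Occurrence:
    """One reported recurrence of the problem (hypothetical record layout)."""
    when: datetime
    host: str
    reported_by: str
    notes: str = ""


@dataclass
class ProblemLog:
    """Hypothetical log for tracking details of an intermittent problem."""
    summary: str
    occurrences: list[Occurrence] = field(default_factory=list)

    def record(self, when: datetime, host: str, reported_by: str,
               notes: str = "") -> Occurrence:
        """Add an occurrence, keeping the log in chronological order."""
        entry = Occurrence(when, host, reported_by, notes)
        self.occurrences.append(entry)
        # Callbacks may arrive out of order, so re-sort by event time.
        self.occurrences.sort(key=lambda o: o.when)
        return entry

    def hosts_affected(self) -> set[str]:
        """Every host that has reported the problem so far."""
        return {o.host for o in self.occurrences}
```

For example, recording a 09:15 report from one workstation and then a late callback about an 08:50 event on another leaves the 08:50 entry first, and `hosts_affected()` returns both host names.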
There is no good rule for determining when a problem is clearly stated; this is fairly subjective. If you think it's clear enough, it probably is. If you're not sure, try describing the problem to someone else. (It really doesn't matter whether that person understands networking. In fact, you could try explaining it to a house plant; it's the process of talking through the problem and describing the symptoms that helps clarify things for you.)
As you're talking with people about the problem, see if there are other hosts with the same symptoms. If people haven't seen this problem, ask them to try to reproduce it. If there isn't anyone else available, try to reproduce it yourself. Knowing whether this problem affects a single host, a local group of hosts, or all the hosts on a network will help you when you hit Step 2.
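Once you know which hosts show the symptoms, sorting the problem into one of those scopes can be sketched in a few lines. This Python example is a hypothetical helper (`classify_scope`). It assumes hosts are identified by IPv4 addresses and treats a "local group" as hosts sharing a subnet of an assumed prefix length (/24 by default). Both are illustrative assumptions, since your own notion of "local" might be a switch, a building, or a VLAN instead.

```python
import ipaddress


def classify_scope(affected: list[str], all_hosts: list[str],
                   prefix_len: int = 24) -> str:
    """Roughly classify how widespread a problem is (hypothetical helper).

    `affected` and `all_hosts` are IPv4 address strings. A "local group"
    is assumed to mean hosts that share one /prefix_len subnet.
    """
    affected_set = set(affected)
    if not affected_set:
        return "no hosts"
    if affected_set >= set(all_hosts):
        return "all hosts"
    if len(affected_set) == 1:
        return "single host"
    # Map each affected address to its enclosing network (strict=False
    # lets host addresses, not just network addresses, be passed in).
    subnets = {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in affected_set
    }
    if len(subnets) == 1:
        return f"local group ({subnets.pop()})"
    return f"multiple groups ({len(subnets)} subnets)"
```

With hosts 10.0.1.5, 10.0.1.6, and 10.0.2.7, a problem seen only on the first two is reported as `local group (10.0.1.0/24)`.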
You should know the answers to these key questions:
- What applications or protocols are affected?
- What hosts are involved?
- What is common between affected hosts?
- When did the problem start?
- Is this a constant problem?
- If the problem is not constant, does it occur at a regular time or interval?
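For the last question, a rough periodicity check can help once you have timestamps for several occurrences, for example from the problem log. This Python sketch is a hypothetical helper (`regular_interval`) built on an assumed heuristic: occurrences count as regular when every gap between consecutive events falls within a tolerance (five minutes by default) of the median gap. It checks intervals only. A problem that strikes at the same clock time each day would show up as a gap of about 24 hours.

```python
import statistics
from datetime import datetime, timedelta
from typing import Optional


def regular_interval(times: list[datetime],
                     tolerance: timedelta = timedelta(minutes=5)) -> Optional[timedelta]:
    """Return the median gap between occurrences if they look periodic, else None.

    Hypothetical heuristic: the occurrences are "regular" when every gap
    between consecutive events is within `tolerance` of the median gap.
    """
    if len(times) < 3:
        return None  # two events give a single gap, which shows no pattern
    ordered = sorted(times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    typical = statistics.median(gaps)  # median works on timedelta values
    if all(abs(gap - typical) <= tolerance for gap in gaps):
        return typical
    return None
```

With occurrences at 02:00, 03:01, 03:59, and 05:00 on the same day, the gaps are 61, 58, and 61 minutes, so the function reports a regular interval of 61 minutes. The tolerance is a judgment call: too tight and real jitter hides a pattern, too loose and coincidences look periodic.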