Define Network as Control Problem
Security is about control. If we can't control our environment, we can't give any assurances that we are secure. One day it hit me: Our security problem is really a process control problem.
Allow me to digress for a moment and explain how I reached this conclusion. I had the unique experience of being an electrical engineer while I was also responsible for securing the network. It wasn't uncommon for me to be designing a flight termination system for a missile while I was writing the security procedures for a classified network. (By the way, if you have a sleep-deprivation problem, I highly recommend either "Standard Practices and Procedures for the Classified Network Supporting the Theater High Altitude Area Defense System" or "Range Ordnance Safety Specifications for the White Sands Missile Test Range." The last one is a little hard to get because they're mimeographed copies,6 but if you can get it, it's cheaper than Ambien!)
One of my other tasks was designing and installing a system that was designed to control the world's second largest cryo-vacuum chamber. A special type of computer called a programmable logic controller (PLC) is used to interface with the valves, pumps, switches, and sensors that are needed to simulate a deep space environment in the cryovac chamber. The computer knows what the set-points are for atmospheric pressure and temperature and manages the devices to achieve that goal. Whereas the thermostat in your house is really a bimodal control (it turns the heater on and off), my cryovac system used a proportional process control methodology. The difference is that the only feedback to your home system is via the thermostat, whereas in a process control system numerous feedback paths help to achieve and maintain a specific set-point.
Why do we need these numerous feedback paths? Because there are time constants associated with each control action. For example, when your home thermostat detects that the air temperature is too low, it turns on the heater. The heater does its job of pumping hot air back into the house through the various paths provided by the air ducts. The temperature isn't changed instantly, so the thermostat has to wait for the warm air to reach it. What that means is that by the time the air at the thermostat reaches the correct temperature, the air near the heating ducts is actually warmer. The rate of change, all things being equal, is fairly constant.
The net result of this bang-bang type of control is that the temperature in the house actually varies around the set-point by a couple of degrees. If you set the thermostat to 68 degrees, the temperature in the rooms typically oscillates around the set-point. I know that this is going to sound a bit anal, but I have a recording thermometer in my bedroom that records the minimum and maximum temperatures (along with the humidity). With the hallway thermostat set to 68 degrees, the temperature in the master bedroom, way down the hall, records a minimum of 66 and a maximum of 70 degrees, as shown in Figure 3-2.
Figure 3-2 Typical room temperature variance around the set-point as verified in the author's master bedroom. Notice the time difference between when the heater turns on and when the room temperature begins to rise.
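The oscillation in Figure 3-2 is easy to reproduce with a few lines of Python. The thermal constants below are made-up assumptions for illustration, not measurements from my house:

```python
def simulate_bang_bang(setpoint=68.0, hysteresis=2.0, steps=200):
    """Return the min and max temperatures reached once the loop settles."""
    temp = 62.0           # starting room temperature (assumed)
    heater_on = False
    history = []
    for _ in range(steps):
        # Thermostat logic: switch the heater on/off around the set-point.
        if temp <= setpoint - hysteresis / 2:
            heater_on = True
        elif temp >= setpoint + hysteresis / 2:
            heater_on = False
        # Toy thermal model: the heater warms the room, the house leaks heat.
        temp += 0.5 if heater_on else -0.3
        history.append(temp)
    settled = history[len(history) // 2:]   # skip the warm-up transient
    return min(settled), max(settled)

low, high = simulate_bang_bang()
print(f"room oscillates between {low:.1f} and {high:.1f} degrees")
```

Run it and you see exactly the behavior in the figure: the room never sits at 68; it swings a degree or two above and below it.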
Now let's look at something also close to you—your car. If you're fortunate enough to have a car with an environmental control system, you'll notice that the fan is running all the time. If you look closely, you'll also notice that the air conditioner is running. The reason is that your car uses a proportional sensor that tells the computer what the temperature is and how much difference there is between it and the desired temperature or set-point. Your car actually mixes heat and cold to produce an output air temperature designed to keep the temperature inside the car where you set it. When the sun beats down on the windows, the system mixes in more cold air. When you're scraping snow off the windshield, it mixes in more hot air.
This is how they can sell cars with dual-zone environmental controls.
I brought this up at the beginning of the chapter, and I want to discuss one more device in your life that is a great example of proportional control: the toilet in your house. Yes, the toilet in your house has a proportional control mechanism in it. As you can see in Figure 3-3, the proportional control in your toilet is a combination of two valves and a float. One valve, the refill valve, allows water to refill the system at a rate based on the position of the float. The other valve, the control valve, or as it's known at the hardware store, the flapper valve, allows the system to be "activated" and reset.
Figure 3-3 A toilet is a basic proportional control at work. The float controls the level of water in the system within a narrow band. The pressure regulator ensures that the float and valve fill the tank to the same level every time.
When the system has completed its designed task, the reset process kicks in, and this is where the proportional control takes over. This is a fairly critical process. If not enough water is put back into the system, it fails when we try to use it. If there's too much water put into the system, we have another, arguably less desirable, failure mode to deal with. We need the same amount of water each and every time no matter how much water pressure there is. The float connected to the refill valve provides this type of proportional control. As the float rises in the tank, it gradually closes off the opening in the valve, thereby slowing down the rate at which the tank fills until the valve is shut off completely and the tank is full. The system has been successfully reset.
When you activate the system, the water rushes out cleaning the bowl, the flapper valve closes, the float drops, and the water is allowed to refill the tank. The metric for success is simple: You look into the bowl and either know success or hit the lever again. If the system fails, the failure is obvious ... on many levels.
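The refill behavior can be sketched as a proportional control in Python. The gain and the normalized units are illustrative assumptions; the point is that the flow tapers off as the float rises, so the tank settles at the same level every time:

```python
def refill(level=0.0, full=1.0, gain=0.2, steps=60):
    """Fill the tank; flow is proportional to the remaining gap."""
    for _ in range(steps):
        gap = full - level        # the error signal: distance from full
        level += gain * gap       # the float throttles the valve proportionally
    return level

print(f"tank level after refill: {refill():.3f}")
```

Because the flow shrinks with the gap, the level approaches "full" asymptotically and never overshoots, regardless of where it started.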
Map Control Modes to Technology
Because we're looking to proportional control technology to help us solve our problem, let's look at the basic components of that solution.
As mentioned in the preceding section, the main component in this process is the proportional control, the central process that acts as the foundation for the system. In our previous examples, we found that this process was a combination of a sensor, such as the thermostat or the float, working in conjunction with an energy source, such as the heater or the water pressure. However, we know that in those systems there is always variation. Sometimes, the doors are open a bit longer, and the heater has to run longer to catch up. The result is that the temperature in the room doesn't stay a constant 72 degrees. It varies around the set-point by a few degrees because there is nothing to tell it how fast the room is cooling down or how fast it's heating up.
To address the basic shortcomings of a proportional-only control, two other "helper" processes make it easier for the proportional control to do its job. These control modes are derivative and integral.
The derivative process controls the rate of change. Using our toilet process in Figure 3-3, let's say that the water pressure doubles in the system. Because the float and valve are designed to work within a fairly narrow band of water pressure, doubling the pressure causes the tank to fill up much faster and to a slightly higher level. This is because the float and valve can't work fast enough to prevent the overflow. So, to control the pressure, we add a pressure regulator to the system to keep the water pressure that the intake valve sees at a normal level or slightly below it. What this means is that it will take a little more time to fill the tank. It also means that derivative controls lower the response frequency of the system: instead of 60 flushes per hour, we might get only 50.
Unfortunately, using proportional and derivative controls means that it's possible to have a stable system that still doesn't hit the set-point, because the resolution on the sensors is not capable of seeing the potential error. Enter integration. The integral process adds the small errors over time to create an accumulated error that forces the system to once again correct itself.
Although a toilet is a great example of a proportional control, as a simple mechanical device it's not a great example of either a derivative or an integral control. (After all, I don't know of anyone with a regulator on his or her incoming toilet water.) So, we'll have to go back to our climate control example to explain integration.
Say our system samples the air temperature once every 20 seconds and discovers that the temperature is 71 degrees rather than 72 degrees. Because our thermostat has only a 2-degree resolution, we need a way to tell the system that we're not really at our set-point if we want to hit exactly 72 degrees. Integration enables us to do this by accumulating our error each time we collect a sample: we add the 1-degree difference to our feedback signal with every sample. Eventually, our feedback signal exceeds our 2-degree threshold, the heater is forced back on, and the process starts all over again.
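That accumulation is only a few lines of Python. The 20-second sample period and 2-degree threshold come from the example above; the loop counts how many samples it takes a persistent 1-degree error to force the heater on:

```python
setpoint = 72.0
measured = 71.0        # stuck 1 degree low, below the thermostat's resolution
threshold = 2.0        # thermostat resolution: ignores errors under 2 degrees

accumulated = 0.0
samples = 0
while accumulated <= threshold:          # one sample every 20 seconds
    accumulated += setpoint - measured   # add the 1-degree error each sample
    samples += 1

print(f"heater forced on after {samples} samples ({samples * 20} seconds)")
```

A sub-resolution error the proportional control would ignore forever trips the system after just a minute of accumulation.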
We can see that it takes all three control modes—proportional, integral, and derivative—to make a functioning PID control system. Derivative and integral functions are there to ensure that the set-point the proportional control works around is accurately achieved and maintained.
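Putting the three modes together gives a textbook PID loop. In this toy Python sketch, the gains and the one-line "plant" model are made-up assumptions, not tuned values, and the actuator can both heat and cool (like the car's climate system); the point is only that the combined loop settles on the set-point instead of oscillating around it:

```python
def pid_run(setpoint=72.0, kp=0.6, ki=0.05, kd=0.2, steps=300):
    temp = 65.0                 # starting cabin temperature (assumed)
    integral = 0.0
    prev_error = setpoint - temp
    for _ in range(steps):
        error = setpoint - temp
        integral += error                 # integral: error accumulated over time
        derivative = error - prev_error   # derivative: how fast the error changes
        output = kp * error + ki * integral + kd * derivative
        prev_error = error
        # Toy plant: actuator output moves the temperature toward the demand.
        temp += 0.1 * output
    return temp

print(f"settled temperature: {pid_run():.2f}")
```

Compare this with the bang-bang thermostat: instead of swinging a couple of degrees around 68, the PID loop converges on its set-point and stays there.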
Identify Feedback Paths
The reason closed-loop control processes work is because they have identified what kind of feedback they need to close the loop. In the heating example, it's the thermometer in the various rooms. In the toilet example, the feedback path is the float. As the water rises, it pushes the float and slowly closes off the valve.
The lesson here is that feedback can be either electronic or mechanical. We just need to identify what kind of feedback we need in our system. The good news is that an examination of our network reveals that it's just one big potential feedback loop!
Each system that lives on the network produces logs and alerts, and most can exchange management messages. Authentication protocols are designed to provide a feedback loop such that failed attempts are reported as alerts and accounts can be locked out. This is a great example of an integration function because it takes a number of failed attempts over time to generate a change in the system.
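Account lockout really is the integral idea in miniature: any single failed login is below the noise floor, but they accumulate until the system reacts. A minimal sketch, with an assumed five-attempt threshold:

```python
LOCKOUT_THRESHOLD = 5      # illustrative assumption, not a recommendation

failed_attempts = {}

def record_login(user, success):
    """Return True if the account should now be locked out."""
    if success:
        failed_attempts[user] = 0          # success resets the accumulator
        return False
    failed_attempts[user] = failed_attempts.get(user, 0) + 1
    return failed_attempts[user] >= LOCKOUT_THRESHOLD

for _ in range(5):
    locked = record_login("some_user", success=False)
print("locked out" if locked else "still open")
```

Just like the thermostat's accumulated error, the counter converts a stream of individually ignorable events into a control action.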
Another good example of a basic feedback loop can be observed in 802.1x,7 an authentication protocol initially designed for wireless networks. 802.1x works in conjunction with a Remote Authentication Dial In User Service (RADIUS) server and can serve as the backbone of a proportional control process because it acts as the valve that meters the amount of risk introduced onto the network.
An 802.1x-enabled network could query each endpoint that makes an attempt to join the network based on the following:
- Endpoint security state
- User authentication
- Resources accessed
A decision can be made to allow privileged access, decline all access, initiate a remediation plan, or allow restricted access. This is a bit more than present 802.1x authentication does, so we discuss how this works a bit later.
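The decision logic described above might be sketched like this. The posture fields, thresholds, and outcome names here are hypothetical assumptions for illustration; as noted, this is more than stock 802.1x authentication does today:

```python
def admission_decision(posture):
    """Map an endpoint's reported state to a network access decision."""
    if not posture.get("user_authenticated"):
        return "decline"                       # no identity, no access
    if posture.get("patch_levels_behind", 99) > 2:
        return "remediate"                     # quarantine until patched
    if posture.get("requests_privileged_resources"):
        if posture.get("patch_levels_behind") == 0:
            return "privileged"                # current endpoint, full access
        return "restricted"                    # stale endpoint, limited access
    return "restricted"

print(admission_decision({
    "user_authenticated": True,
    "patch_levels_behind": 1,
    "requests_privileged_resources": True,
}))
```

Note how this behaves like the valve in the proportional control analogy: access isn't simply on or off; the amount of risk admitted is metered against the endpoint's state.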
Identify Metrics That Influence Other Metrics
You can find some good books on metrics,8 but by using our process control model, we can more accurately identify metrics that have a greater impact on our security. As you might recall from the previous discussion, time constants are associated with the control process. By adding controls, we're essentially adding delay lines. These delay lines can help us by slowing down the spread of fast-moving worms, or they can hurt us by slowing down the remediation process. Without understanding where and what these metrics are, we have no way of planning for their usage or implementation.
If we make the assumption that no endpoint is going to join our network unless it meets a minimum level of trust, and part of that trust is based on the security posture of the endpoint, it stands to reason that one element that we must consider is patch level.
Some good metrics to examine at this point are as follows:
- How many endpoints need patches?
- How many patch levels are required per endpoint?
- How long does it take to deploy new patches to the enterprise?
- How has this changed since the last time we looked at it?
An answer to these questions would look something like this:
- 546 of 785, or 70% of endpoints require patches9
- 50% require the latest patch (one level down)
- 5% require the latest two patches (two levels down)
- 2 days to approve a deployment
- 45 minutes to deploy to the enterprise
- 6% improvement over last week
Many people would measure the time it takes to load the patch file into the server and push it out to the endpoints, saying that anything else is out of their control. This would only be the tip of the iceberg, however.
Automated patch management systems do help a bit, but how many of the endpoints are truly being updated? Other, "long" time constant questions must be asked:
- How long does the approval process take for the deployment?
- How long does it take to determine just how many endpoints are on the network?
- What percentage of the endpoints meets the requirements for the patch?
- What is the difference in deployment time between desktop endpoints and critical resources?
The difference here is that these questions usually generate responses with long time constants because a human has to get into the loop to provide an accurate answer.
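A little arithmetic on the example numbers above makes the point about where the time really goes. The counts are the chapter's example figures; the conclusion is just division:

```python
total_endpoints = 785
endpoints_needing_patches = 546
approval_minutes = 2 * 24 * 60      # 2 days to approve a deployment
push_minutes = 45                   # 45 minutes to push to the enterprise

pct_needing = round(100 * endpoints_needing_patches / total_endpoints)
human_share = round(100 * approval_minutes / (approval_minutes + push_minutes))
print(f"{pct_needing}% of endpoints need patches")
print(f"the human approval step is {human_share}% of total deployment time")
```

The 45-minute automated push that gets reported to management is a rounding error next to the human approval step, which is why the long-time-constant questions are the ones worth asking.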
Map Business and Technology Paths
This might sound like a no-brainer, but it's a bit more complicated when you dig into it. We've learned to think of technology as complex mechanisms and sophisticated software. However, if you talk to an archeologist, the stone axe is also an example of technology. Ancient technology, yes, but technology nonetheless.
I think this narrow view of what "technology" is explains why we ignore a major type of technology that glues our present solutions together: people. When an organization engages in process reengineering, the first thing it does is look at the relationships among people and how efficiently they exchange information in the quest to accomplish their mission. It asks how well they use the tools that have been afforded them and how many workarounds are in place to "fix" poorly engineered processes. All too often, we're given new technology, but instead of reexamining how we can put this new technology to good use, we just use it to take the place of an older process without understanding how it can make the overall process better.
We do this with our security technology by trying to make it completely transparent. We overlay it on top of our existing processes in the hope that we can get some level of increased protection without disturbing the user community. The problem with that is that it obscures the human element of the security problem to both the practitioners and the users.
To counter this, we must examine our business processes with respect to security so that we can understand where the human paths are with respect to the technology paths. We must also be willing to push for change where needed. Our paths, both human and technological, need to be understood if we're going to create a closed-loop process.
We need to be able to identify them and measure them to understand how much of an impact any delay is going to have on our security process. For example, your organization might have an automated patch management system that pushes patches and updates out to thousands of endpoints in a few minutes. Because of this technology, you can stand up in front of the board of directors and tell them that your solution pushes updates to vulnerabilities in minutes! The problem is that in many organizations there's a manual process of evaluating the patch, called regression testing, that can take as long as three months!
I'm not saying that you should eliminate regression testing. What I am saying is that for a process control solution to work, you must embrace the idea that you do have human feedback paths that can dramatically degrade your ability to respond to an attack. Regression testing is a business process that has a huge effect on security.
Another example of business and security intersecting is during the incident response cycle. Many people think of incident response as responding to an intrusion detection system (IDS) alert. What if I call the help desk and claim that I'm the CFO and I want my password changed? This is clearly an indicator that my network may be under attack and that something should be done, but how long will it take for this information to move through the business process of the help desk?
This means that we, as security people, need to understand our company's business processes and instead of saying "no," we need to find ways to say "yes" that encourage the business plan to grow and adapt to the changing business objectives. When new technologies appear, we need to understand how those technologies will impact our security and our ability to compete effectively in the marketplace. How many organizations, because the security group is afraid of it, haven't deployed wireless technology regardless of its demonstrated ability to simplify deployment and reduce associated costs?
Who do you think is going to win in the marketplace when the market gets tough and margins get small? The organization afraid to use technology because their security process can't handle it, or the agile group that understands that security and business processes can work together?
Can We Build a Better Model?
I believe the answer to this question is a resounding yes. I think that most of what we need is already here; we just need to connect it a little better than we have in the past.
The answer lies in identifying how we allow risk to be introduced into our networks and setting a low limit that prevents endpoints that don't meet our criteria from joining. That instantly raises the question of how to define risk. Well, I think that's the wrong question to ask. I think we need to ask this: What is an acceptable risk? When I go car shopping, I know what I don't want. I don't want a car that's so old that it doesn't have air bags and antilock brakes. I don't want a car that has broken windows and bald tires. I don't want a car that has a torn-up interior or rusty fenders.
I know that I can have a mechanic go over the car with a fine-tooth comb, but that won't eliminate the possibility of a flat tire or an exploding engine later on. I've reduced my risk by examining the car prior to buying it, but I still run the risk that something could happen later.
What I have done by taking the effort to examine the car is begin the process of engendering trust. By setting a minimum level of capability, I have enabled myself to trust the system—in this case, my car—to behave in a manner acceptable to me. I believe that this is also possible on our networks. By setting a minimum level of capability, we can set a minimum level of trust in the systems that join our network.