Identifying Control Nodes
Now that we have a new way of approaching the problem using closed-loop process control, the next step is to identify the parts of our network that can perform the basic control modes: proportional, integral, and derivative.
Map Technology to Control Nodes
A control node is a place where we can enforce a condition or extract data for the purposes of managing the process. In our networks, we have multiple devices that we can easily consider control nodes, including the following:
- Switches
- Routers
- VPN gateways
- DHCP servers
These are great examples of control nodes because they all have the capability to decide what happens to the traffic that passes through them. In addition, they all can report data that enables us to make other decisions in support of either derivative or integral control functions.
From a basic security perspective, we also have the following:
- Firewalls
- IDSs (intrusion detection systems)
- IPSs (intrusion prevention systems)
- AV systems (antivirus systems)
Map Control Nodes to Control Modes
When we consider their roles in our PID-based solution, we can see that most of these systems, with a few exceptions, fall under the category of derivative controls. Their purpose is to help us understand just how fast things are changing and to give us notice that we might have to deal with an overshoot of our expected status quo. I say "overshoot" because it's not often that our systems notify us that nothing is happening.
As mentioned previously, we can use some log information to provide an integration function. A rule that locks out an endpoint for ten minutes after three failed login attempts is a good example of this function.
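The lockout rule above can be sketched as an integral-style control: failures accumulate over time, and the control trips once the sum crosses a threshold. This is a minimal illustration; the thresholds and data structures are assumptions, not a reference to any particular product.

```python
import time

FAIL_LIMIT = 3            # failures before lockout (assumed policy)
LOCKOUT_SECONDS = 600     # ten-minute lockout (assumed policy)

failures = {}             # endpoint -> accumulated failed attempts
locked_until = {}         # endpoint -> timestamp when lockout expires

def record_failure(endpoint, now=None):
    """Integrate failures per endpoint; trip the lockout at the limit."""
    now = now if now is not None else time.time()
    if locked_until.get(endpoint, 0) > now:
        return "locked"                       # still inside the lockout window
    failures[endpoint] = failures.get(endpoint, 0) + 1
    if failures[endpoint] >= FAIL_LIMIT:
        locked_until[endpoint] = now + LOCKOUT_SECONDS
        failures[endpoint] = 0                # reset the integrator after tripping
        return "locked"
    return "open"
```

The point is the accumulation: no single failed attempt triggers the control; the sum of attempts over the window does.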
The exceptions I referred to earlier are firewalls and VPN concentrators. Both can also function as proportional controls if their operation is tied to a graduated action, such as limiting traffic loads, rather than a simple bimodal yes or no. However, some people are not comfortable with an automated system changing the configuration of the network. Failures have occurred, and money has been lost, so there is now usually a human in the loop.
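The difference between a graduated action and a bimodal yes/no can be shown with a small sketch of proportional throttling. The set-point and gain here are invented values for illustration; a real device would expose its own tuning parameters.

```python
SETPOINT_MBPS = 100.0   # desired traffic level (assumed value)
KP = 0.5                # proportional gain (assumed value)

def allowed_rate(observed_mbps):
    """Throttle in proportion to how far traffic overshoots the set-point.

    At or below the set-point, traffic passes untouched. Above it, the
    permitted rate is reduced by KP times the error, never dropping
    below the set-point itself.
    """
    error = observed_mbps - SETPOINT_MBPS
    if error <= 0:
        return observed_mbps              # under the set-point: pass as-is
    correction = KP * error               # response scales with the overshoot
    return max(SETPOINT_MBPS, observed_mbps - correction)
```

Contrast this with a bang-bang control, which at 101 Mbps would have to choose between passing everything and dropping everything.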
In Table 3-1, you can see how the different types of technology map to the four control modes. Devices can be classified as proportional, integral, or derivative. Some devices are simple bimodal on-or-off controls, known as bang-bang controls.
Table 3-1. Devices Mapped to Control Modes

| Device | Function | Proportional | Integral | Derivative | Bang-Bang |
|---|---|---|---|---|---|
| Firewall | Perimeter control | Not alone | No | No | Yes |
| HIDS | Intrusion trigger | No | No | Yes | Yes |
| NIDS | IRP trigger | No | No | Yes | Yes |
| HIPS | Attack prevention | No | No | No | Yes |
| NIPS | Network protection | No | No | No | Yes |
| SAV | Server AV | No | No | No | Yes |
| EAV | Endpoint AV | No | No | No | Yes |
| Router | Traffic control | No | No | No | Yes |
| Switch | Traffic control | No | No | No | Yes |
| VPN | Privacy enforcement | Not alone | No | No | Yes |
| DHCP | Network provisioning | No | No | No | Yes |
| Probes | Vulnerability assessment | No | Yes | No | No |
| Logs | Due diligence | No | Yes | Yes | Yes |
| Alerts | IRP trigger | No | No | Yes | Yes |
| Correlation (SIM) | Policy management | No | Yes | Yes | No |
Now, just to confuse things a little, all these systems also function as bang-bang controls, because they make binary decisions about what to do with traffic: either it passes or it doesn't. I think this dual-mode operation has masked their possible contribution as control systems.
Quantify Time Constants
A time constant is just the amount of time it takes to complete any specific part of the process. If it takes one minute to fill the toilet bowl prior to a reflush, the time constant for that process is one minute. If you try to recycle the process before the time constant completes, you wind up with less than satisfactory results. To have an effective process control system, you must understand these time constants; otherwise, you risk creating a system that oscillates wildly around the set-point.
The hard part is identifying them in your process and accurately measuring them. This is part of what the metrics people are trying to do. The problem is that each enterprise has a different set of requirements and dependencies, and therefore the same process in a different environment has different time constants associated with it.
Let's look at the incident response cycle again. Every enterprise that has a decent security program has an incident response plan. It's triggered when something evil happens and an alert is sent "somewhere." Maybe the IDS has seen a suspicious packet stream and has sent out an alert, or perhaps the help desk has too many trouble tickets complaining of slow systems. In many cases, this alert is sent to the security group. Someone with a pager gets the alert and either runs to a computer or, if that person is off-site, makes a phone call. That call can be to someone close to the system or it can be to the data center. After the call has been made, the process of evaluating the event kicks into gear, and the decision process takes over:
- Is it a false positive?
- Is it a truly evil event?
- Is it internal or external?
- Are we hemorrhaging data?
- Can we recover?
- Do we need to call law enforcement?
- How much time has elapsed since the initial alert?
For most organizations, this time constant will probably be on the order of minutes.
By deconstructing the processes, you can discover how long each individual part of it takes, and thus identify where you should put your effort to improve it. Each breakpoint in the process is an opportunity to gather some information about the state of the process.
We can move from the alerting entity to the notification channel, through the analysis process, and into the resolution process, tracking the time each one takes. For example, suppose we examine the notification portion of our hypothetical process and discover that it takes more than 15 minutes. Clearly, 15 minutes is not a reasonable time to wait to learn that a critical condition exists on your network.
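Measuring those time constants amounts to timestamping each breakpoint in the process and taking differences. The stage names and durations below are hypothetical, chosen to mirror the alert-to-resolution walk-through above.

```python
def stage_durations(timestamps):
    """Compute the time constant of each stage from breakpoint timestamps.

    timestamps: ordered list of (stage_name, epoch_seconds) pairs.
    Returns a dict mapping each stage to the seconds elapsed since the
    previous breakpoint.
    """
    durations = {}
    for (_, prev_t), (name, t) in zip(timestamps, timestamps[1:]):
        durations[name] = t - prev_t
    return durations

# Hypothetical incident: alert raised, human notified, event analyzed, resolved.
events = [
    ("alert_raised",      0),
    ("notified",        900),    # 15 minutes to reach a human: too long
    ("analysis_done",  1500),
    ("resolved",       3300),
]
```

Laying the numbers out this way shows where to put the improvement effort; here, notification dominates everything before resolution.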
Another benefit of this effort is to identify exactly where the various control nodes in your network are, be they technological or human based. You now have a list that you can use or pass on to someone else. What you've done is move from a talent-based response to a role-based response that doesn't pigeonhole you as a resource. You'll also discover that the human-based process components are the ones with the long time constants and, by the way, the ones with the lowest level of repeatability and reliability.
Control Paths and the Business Processes
You might be wondering what exactly a control path is. A control path is the path that the control and feedback signals take to change the set-point of the system. I believe that you need to map your control paths to the business processes to understand where the cracks in the security process are. Understanding how control signals are generated and understanding where they go, and possibly don't go, can prove critical to your success. This can also help you understand where a little bit of automation can make your life a lot easier (and identify some important metrics).
Let's start by looking at some of the information that passes through a control path. We'll call this our control signal. Perhaps that will help us understand how the business process affects our control process. Because we're talking about security, let's define a control signal as anything that is security relevant, such as the following:
- Failed login attempts
- Firewall rule violations
- IDS alerts
- New user requests
- User termination
- New software requests
- New protocol requests
- Software decommissioning
- Network access requests
Next, we have to ask ourselves how much of this information is made available to us by our control nodes as they were defined in the previous section, and how much is made available to us by the business process. As you can see, things such as firewall rule violations and IDS alerts are more like spam, because they're "made available" to us all the time in large numbers. However, the rest of them are made available to us through a business process that may or may not include the security group in the notification path.
The other sad part of this story is that all these processes are open loop—that is, there is no notification that they were completed, denied, or simply disregarded. How do you manage network access requests? Does a verification process occur prior to the decision to allow access? In most cases, the answer is yes, but only for that particular moment. After access has been granted, there is little follow-up to ensure that the system remains compliant with policy, so our control process breaks at the point where we hand the user his or her system.
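Closing that loop means re-verifying compliance after access is granted, not just at the moment of the decision. The sketch below is one way to express the idea; the compliance check itself is a stand-in for whatever policy verification an enterprise actually runs.

```python
def recheck(grants, is_compliant):
    """Periodically re-verify granted access instead of checking once.

    grants: iterable of system names that currently have access.
    is_compliant: callable returning True if a system still meets policy.
    Returns the systems whose access should be flagged for revocation,
    closing the loop that the one-time check leaves open.
    """
    return [system for system in grants if not is_compliant(system)]
```

Run on a schedule, this turns the open-loop "grant and forget" process into a feedback loop: the control signal (non-compliance) flows back to the node that can act on it.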
Another good example starts with a question: Where do login errors go, and how are they processed? A large number of failed logins can indicate that someone is trying to break into your network. If that's the case, the behavior of the network should change in a way that attempts to eliminate those login attempts. In many cases, low-frequency failures are not noticed because they don't trigger the "three failed attempts per hour" rule. This kind of low and slow attack can easily be automated, but is difficult to detect. An interesting metric that you can use as a control signal is the number of failed login attempts compared to the number of successful attempts per user over a longer period of time (for example, a day). You can then compare that number to the preceding day and look for trends.
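The failed-versus-successful login metric described above can be sketched in a few lines. The ratio and the day-over-day comparison are straight from the text; the threshold factor is an assumption you would tune for your own environment.

```python
def fail_ratio(failed, succeeded):
    """Failed attempts per successful login for one user over one day."""
    return failed / succeeded if succeeded else float(failed)

def trending_up(today_ratio, yesterday_ratio, factor=2.0):
    """Flag a user whose failure ratio jumped versus the prior day.

    factor is an assumed sensitivity setting; a low-and-slow attack
    shows up as a sustained rise in this ratio even when no single
    hour trips the "three failed attempts" rule.
    """
    return today_ratio > factor * yesterday_ratio
```

Because the window is a full day rather than an hour, an attacker spacing out attempts still moves the ratio, which is exactly what the hourly rule misses.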