Watching Ports Versus Watching Applications
In the "Processing and Overhead" section, earlier in the chapter, we briefly discussed redundant plugins that monitored a Web server. One plugin simply connected to port 80 on the Web server, while the other attempted to log in to the Web site hosted by the server. The latter plugin is an example of what is increasingly referred to as end-to-end (E2E) monitoring, which uses the monitored services in the same way a user might. Instead of monitoring port 25 on a mail server, the E2E approach would be to send an email through the system. Instead of monitoring the processes required for CIFS, an E2E plugin would attempt to mount a shared drive, and so on.
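For contrast with the E2E approach, the conventional port-watching plugin can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for an existing plugin: the function name `check_port` and the message wording are assumptions made for this example, while the return codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
import socket

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_port(host, port, timeout=5.0):
    """Return (exit_code, message) for a bare TCP connect to host:port.

    Hypothetical helper for illustration only.
    """
    try:
        # A successful connect proves only that something is listening,
        # not that the application behind the port is working.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"PORT OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts, and name resolution errors.
        return CRITICAL, f"PORT CRITICAL - {host}:{port} unreachable ({exc})"
```

A real plugin would print the message and exit with the code. Note what this check cannot see: a Web server that accepts connections on port 80 but serves only error pages still passes.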
While introducing more overhead individually, E2E plugins can actually lighten the load when used to replace several of their conventional counterparts. A set of plugins that monitors a Web application by checking the Web ports, database services, and application server availability might be replaced by a single plugin that logs in to the Web site and makes a query. E2E plugins also tend to be "smarter." That is, they catch more problems by virtue of detecting the outcome of an attempted use of a service, rather than watching single points of likely failure. For example, an E2E plugin that parses the content of a Web site can find and alert on a permissions problem, where a simple port watcher cannot.
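A content-parsing check of the kind just described might look roughly like the following Python sketch. It is deliberately simplified: it fetches one page without logging in, and the names `evaluate_page` and `check_page`, the `expected_text` parameter, and the message wording are all assumptions made for this example. Only the exit codes (0 for OK, 2 for CRITICAL) come from the Nagios plugin convention.

```python
import urllib.error
import urllib.request

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_page(status, body, expected_text):
    """Map an HTTP status and page body to a Nagios (exit_code, message).

    Hypothetical helper for illustration only.
    """
    if status in (401, 403):
        return CRITICAL, f"E2E CRITICAL - permission problem (HTTP {status})"
    if status != 200:
        return CRITICAL, f"E2E CRITICAL - unexpected HTTP status {status}"
    if expected_text not in body:
        # The server answered, but not with the page a user should see.
        return CRITICAL, f"E2E CRITICAL - page loaded but {expected_text!r} is missing"
    return OK, "E2E OK - page loaded with expected content"


def check_page(url, expected_text, timeout=10.0):
    """Fetch url as a user would and judge the result by its content."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_page(resp.status, body, expected_text)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return evaluate_page(exc.code, "", expected_text)
    except OSError as exc:
        # URLError, timeouts, and socket errors are all OSError subclasses.
        return CRITICAL, f"E2E CRITICAL - request failed ({exc})"
```

Because the verdict depends on what the page contains, a response such as "Permission denied" delivered with a normal HTTP 200 status is still reported as CRITICAL, which a port watcher would never notice. A production plugin would also need to handle the login step and print the message before exiting with the returned code.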
Catching more problems is sometimes a good thing and sometimes not. What E2E gains in rate of detection, it loses in resolution: you often know that there is a problem but not where it resides, which is unhelpful when the cause lies in a completely unrelated system. For example, an E2E plugin that watches an email system can detect failure and send notifications in the event of a DNS outage, because the mail servers cannot perform MX lookups and, therefore, cannot send mail. This makes E2E plugins susceptible to what some may consider false alarms, so they should be used sparingly.
On the other hand, a problem in some unrelated infrastructure that affects a system responsible for transferring funds is something bank management needs to know about, regardless of the root cause. E2E is great at catching failures in unexpected places and can be a real lifesaver when used on systems for which problem detection is absolutely critical.
Commercial monitoring systems have been slow to adopt E2E, because it's difficult to predict what customers' needs are, which makes it hard to write agent software. Nagios, on the other hand, excels at this sort of application-layer monitoring because it makes no assumptions about how you want to monitor things, so extending Nagios' functionality is usually trivial. More on plugins and how they work is in Chapter 2, "Theory of Operations."