
Windows Server Reference Guide


Two More Influential Kernel Changes

Last updated Sep 26, 2003.

Last year, when Microsoft unveiled many of the changes it would be making to Windows Vista, it tossed around a concept called Mandatory Integrity Control. Sometimes Microsoft's initiatives come as a result of its product developments instead of acting as the inspiration for them. We haven't heard much about MIC as a phrase, though the technologies it was created to encompass in marketing brochures live on.

#5: Address Space Layout Randomization

Microsoft did not invent the concept of Address Space Layout Randomization (ASLR), nor even the term. So now that we've made the perfunctory concession to Linux proponents, let's discuss how this works in Windows Server 2008.

Malware, almost by definition, is software that forgoes the protection of being loaded by system services. The exceptions are programs that exploit some overlooked behavior to get system services to load them into memory, where they can still wreak havoc. But typically, malware has to use stealth to make its way into memory, and continue to use stealth to communicate with the rest of the system. Because it can't be loaded normally, it takes advantage of such opportune things as buffer overruns to slip code into memory, to be executed later.

Because this type of malware is unregistered, it can't contact other running services in the normal way. So the only way it can execute code outside of itself is by addressing it directly, by its location in memory. Which means it has to know where that code is.

Malware has been able to know where Windows services reside in memory because they have always been loaded at the same addresses on the memory map. So the purpose of ASLR is to make it far harder for malware to "know" where system processes are in memory. Legitimate software loses nothing in the bargain, because other processes address them indirectly anyway.

Beginning with Vista and now with WS2K8, system processes may be loaded at any one of 256 base addresses, spread across a range of about 16 megabytes. Frankly, the random distribution algorithm doesn't need to be sophisticated; all it needs to do is force malware to deploy some kind of detection mechanism or heuristics to determine system processes' locations. One thing novelists and Hollywood screenplay writers fail to note about malware authors is that they're typically not very smart; they wait for real engineers to discover the deficiencies in software (e.g., system processes always loading into the same addresses) and then exploit them like vultures exploiting fresh road kill. They're not going to take the trouble to write sophisticated algorithms, though they may wait for the miracle of someone else writing one for them.
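To make the 256-address idea concrete, here is a minimal Python sketch of a loader choosing a base address at random. It is an illustration only, not Windows' actual algorithm: the 64 KiB spacing is an assumption chosen because 256 slots of that size add up to the 16-megabyte figure, and `REGION_START`, `pick_load_address`, and `count_lucky_guesses` are hypothetical names.

```python
import secrets

SLOT_COUNT = 256             # candidate base addresses, per the text
SLOT_SIZE = 64 * 1024        # assumed spacing: 256 * 64 KiB = 16 MiB
REGION_START = 0x7700_0000   # hypothetical start of the randomized region


def pick_load_address(randbelow=secrets.randbelow):
    """Return one of SLOT_COUNT evenly spaced candidate load addresses."""
    slot = randbelow(SLOT_COUNT)          # integer in 0..SLOT_COUNT-1
    return REGION_START + slot * SLOT_SIZE


def count_lucky_guesses(hardcoded_guess, loads=10_000):
    """Count simulated loads where a hard-coded address happens to be right."""
    return sum(pick_load_address() == hardcoded_guess for _ in range(loads))


if __name__ == "__main__":
    # Malware written for a fixed layout assumes one address and never adapts.
    guess = REGION_START
    hits = count_lucky_guesses(guess)
    print(f"hard-coded guess was right on {hits} of 10000 simulated loads")
```

With a uniform choice among 256 slots, a hard-coded address should be right only about one time in 256, which is the whole point: the code that used to work every time now fails nearly every time.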

#4: Windows Hardware Error Architecture

Administrators who first learned of the existence of Windows Hardware Error Architecture (WHEA) from the 2007 WinHEC conference in Los Angeles snickered openly at the idea that Microsoft had re-invented the error. In fact, in terms of error codes generated by attached hardware, Microsoft had never "invented" it until now. There had been no standard protocol for hardware to register problems with the operating system, which sometimes left it to the OS to detect trouble with third-party drivers that may have been behaving erratically.

This has been one of the principal problems with Windows on desktop machines, and the real cause of users' perceptions that new and unusual hardware seems to drag down the system.

If you're already familiar with Event Tracing for Windows (ETW), which handles system events for the entire OS, this is the signaling mechanism WHEA uses. But in order to effectively put expansion and bus-connected hardware into the error loop, WHEA has to create a low-level signal line. By "low-level" in this case, we're referring to a hardware interrupt, the "stop cable" that was created back in the early PC days for an expansion card to tug on when it needs the attention of the CPU. These days, an interrupt (or more likely, part of one) is assigned to a Low-Level Hardware Error Handler (LLHEH). This handler is programmed to trap errors at this level (on an I/O bus, for instance, within its bus driver) and to generate error messages that are passed on to the kernel.

The kernel is set to listen for incoming signals from these error handlers. These messages are not themselves ETW events. Although WHEA error reports are now standardized, it is the WHEA mechanism in the kernel that generates the ETW events, both for its own logs and for the admin. This is because hardware shouldn't necessarily expect the operating system to always correct a problem on its behalf, nor should anyone assume, however tempting, that when a problem does occur the OS must be at fault. Maybe the hardware itself can fix it, maybe it can't, or maybe it's trying to fix it and hasn't succeeded yet. In any of these three cases, the expansion hardware tugs on the interrupt and then reports the error to the kernel. Inside the kernel, WHEA creates an ETW event that alerts the rest of the operating system to the problem, an alert which should eventually make its way to the admin.
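The flow just described (interrupt, low-level handler, kernel, logged event) can be modeled in a few lines of Python. This is a toy simulation rather than anything resembling the real WHEA interfaces: every class and field name here is hypothetical, and the event log is a plain list standing in for ETW.

```python
from dataclasses import dataclass
from enum import Enum


class Recovery(Enum):
    """The three cases from the text, as seen by the hardware."""
    CORRECTED = "hardware corrected it"
    FAILED = "hardware could not correct it"
    PENDING = "hardware is still trying"


@dataclass(frozen=True)
class ErrorReport:
    """Hypothetical stand-in for a standardized hardware error report."""
    source: str
    detail: str
    recovery: Recovery


class KernelErrorCollector:
    """Stand-in for the kernel side: turns each report into a logged event."""

    def __init__(self):
        self.event_log = []   # plays the role of the ETW event stream

    def receive(self, report):
        event = f"[{report.source}] {report.detail} ({report.recovery.value})"
        self.event_log.append(event)
        return event


class LowLevelHandler:
    """Stand-in for a low-level handler attached to one error source."""

    def __init__(self, source, kernel):
        self.source = source
        self.kernel = kernel

    def on_interrupt(self, detail, recovery):
        # Whatever the outcome, the error is reported upward to the kernel.
        report = ErrorReport(self.source, detail, recovery)
        return self.kernel.receive(report)


if __name__ == "__main__":
    kernel = KernelErrorCollector()
    bus = LowLevelHandler("io-bus", kernel)
    for outcome in Recovery:
        bus.on_interrupt("parity error", outcome)
    print("\n".join(kernel.event_log))
```

Note that the handler reports in all three cases. The hardware says what happened and how far its own recovery got, while the kernel alone decides how that becomes an event.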

Imagine an application that can stop trying to flush its contents to disk whenever it's been given an ETW alert that the disk seems unusually preoccupied. Now that the alert mechanism has been standardized, it can be up to developers to enable their applications to adjust their own behaviors whenever it's been verified that the system is behaving erratically.
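As a rough illustration of that idea, the Python sketch below shows a writer that holds on to its data while an alert says the disk is busy and flushes once the alert clears. How an application would actually subscribe to such alerts is not covered here; `on_alert` is a hypothetical callback, and the "disk" is just a list.

```python
class BufferedWriter:
    """Toy writer that defers flushing while the disk is reported busy."""

    def __init__(self, sink):
        self.sink = sink          # list standing in for the disk
        self.pending = []         # chunks accepted but not yet flushed
        self.disk_busy = False

    def on_alert(self, disk_busy):
        # Hypothetical callback invoked when a hardware alert arrives.
        self.disk_busy = disk_busy

    def write(self, chunk):
        self.pending.append(chunk)
        self.flush()

    def flush(self):
        """Flush pending chunks unless the disk is busy; return count flushed."""
        if self.disk_busy:
            return 0
        flushed = len(self.pending)
        self.sink.extend(self.pending)
        self.pending.clear()
        return flushed


if __name__ == "__main__":
    disk = []
    writer = BufferedWriter(disk)
    writer.write("a")
    writer.on_alert(disk_busy=True)
    writer.write("b")              # held back while the alert is active
    writer.on_alert(disk_busy=False)
    writer.flush()
    print(disk)
```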

As Microsoft technical fellow Mark Russinovich told a crowd at WinHEC in May 2007, "[WHEA] architecture unifies all of these disparate error sources under a common umbrella. If you've got an error device that's generating a specific format of errors, you write a WHEA plug-in, [which will connect to] the WHEA architecture and deliver all the events in the same, common format. So all events from these different devices, if they come in through WHEA, are all in the same format, they're all discoverable in the same way, and that makes it possible for someone to write a management tool that can go and discover the error sources on a system, and report and monitor those errors."
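The plug-in idea in that quote can be sketched as a registry of translators, each converting one device's native error format into a shared record. This Python sketch makes no claim about the real plug-in interface: the registry, the record fields (`source`, `severity`, `message`), and both raw formats are invented for illustration.

```python
class ErrorSourceRegistry:
    """Toy registry: one translator per error source, one common record shape."""

    def __init__(self):
        self._plugins = {}
        self.records = []

    def register(self, name, translate):
        self._plugins[name] = translate

    def sources(self):
        """Let a management tool discover which error sources exist."""
        return sorted(self._plugins)

    def report(self, name, raw):
        fields = self._plugins[name](raw)
        record = {
            "source": name,
            "severity": fields["severity"],
            "message": fields["message"],
        }
        self.records.append(record)
        return record


def memory_plugin(raw):
    # Hypothetical native format: a (bank, corrected) tuple.
    bank, corrected = raw
    severity = "corrected" if corrected else "fatal"
    return {"severity": severity, "message": f"ECC error in bank {bank}"}


def bus_plugin(raw):
    # Hypothetical native format: a string such as "slot=3;code=0x1f".
    parts = dict(item.split("=") for item in raw.split(";"))
    message = f"bus error {parts['code']} in slot {parts['slot']}"
    return {"severity": "fatal", "message": message}


if __name__ == "__main__":
    registry = ErrorSourceRegistry()
    registry.register("memory", memory_plugin)
    registry.register("pci_bus", bus_plugin)
    registry.report("memory", (2, True))
    registry.report("pci_bus", "slot=3;code=0x1f")
    for record in registry.records:
        print(record)
```

Because every record has the same shape, a monitoring tool can list the sources and read their errors without knowing anything about the devices behind them.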
