Defragmenting the Windows Registry
Sooner or later, nearly anybody who’s worked with Windows for some length of time will find him- or herself up against that mysterious and forbidding repository of configuration data, settings, preferences, and whatnot known as “the registry.” Windows newcomers may be thrilled or scared off by warnings about the irreparable harm they might do to their systems by messing about with the registry. As I first got to know this OS in the 3.1, 95, and 98 Windows days, I often thought the registry should have the same sign posted over the entrance that Dante puts over the gate of Hell: “Abandon All Hope, Ye Who Enter Here.”
But in reality, while the registry is complex and often grows to huge proportions (typical sizes for XP run from 20-40MB, whereas sizes for Vista in the 60-80MB range are fairly common), it’s not something a serious Windows user can avoid for long. Whether you want to improve network performance, re-activate the hidden Administrator account, or change default folder assignments, you will occasionally find it worthwhile to dive in and make some changes to the registry.
Over Time, a Registry Picks Up Interesting Artifacts
Interestingly, while the registry represents a static snapshot of all kinds of Windows information on disk, at runtime it’s also a dynamic and vital source of information about hardware, drivers, devices, and the software your Windows system runs. Over time, in fact, the registry can pick up considerable detritus, either from devices that were installed and later removed, or from software that was installed and then incompletely or incorrectly uninstalled.
Registry data is stored in special file structures called hives, which reflect a hierarchical structure of stored keys and values. On disk, a hive is divided up into allocation units called blocks, much like the way a hard disk gets divided into groups of sectors called clusters by the file system that uses it. In the registry, block size is set to 4,096 bytes (4KB). As new data is added to a hive, the hive expands in 4KB increments.
Inside a hive file, data stored as part of the hive is organized into containers called cells. A cell might contain a key, a value, a security descriptor, or a list of subkeys or key values. The first element in each cell is a field that identifies its datatype, which is then followed by one or more values of that type. When a cell joins a hive and the hive must expand to contain it, the system creates an allocation unit known as a bin, whose size is the size of the cell rounded up to the next block boundary (some multiple of 4KB, in other words). Any vacant space after the cell up to that boundary is free space that the registry configuration manager can allocate for other cells as needs dictate. Thus, any given hive file consists of a collection of blocks grouped into bins, and each bin holds one or more cells along with whatever empty space lies among them.
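The rounding rule just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the 4KB block size given above; the function names are hypothetical, and the sketch ignores any header space a real bin reserves for itself.

```python
BLOCK_SIZE = 4096  # hive block size in bytes (4KB), as stated in the text


def bin_size_for(cell_size: int) -> int:
    """Round a cell's size up to the next multiple of BLOCK_SIZE.

    Hypothetical helper: models only the rounding, not bin headers.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    blocks = -(-cell_size // BLOCK_SIZE)  # ceiling division
    return blocks * BLOCK_SIZE


def free_space_in_new_bin(cell_size: int) -> int:
    """Bytes left over in a freshly created bin after its first cell."""
    return bin_size_for(cell_size) - cell_size


print(bin_size_for(5000))           # 8192 (two blocks)
print(free_space_in_new_bin(5000))  # 3192 bytes available for other cells
```

In this model, a 5,000-byte cell forces a two-block bin, and the remaining 3,192 bytes are the kind of slack the configuration manager can hand out to later cells.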
To make things more interesting still, when you delete a registry key or value, the various cells associated with that item are unlinked from the lists that define accessible registry entries, and the space those cells occupied gets linked into a list of empty space that can be re-allocated upon demand. This turns the registry files into a palimpsest, where specific locations in a file may or may not be in use, depending on what kinds of new data get added to the registry over time.
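A toy model may help make the unlink-and-reuse behavior concrete. The sketch below is not how the configuration manager is implemented; it assumes fixed-size slots and a simple last-freed, first-reused policy, and the `ToyHive` class is purely hypothetical.

```python
class ToyHive:
    """Toy model: a hive as a list of equal-size slots plus a free list."""

    def __init__(self):
        self.slots = []      # slot contents; None marks an unlinked (free) cell
        self.free_list = []  # indices of slots available for reuse

    def add(self, item):
        """Store item, reusing freed space before growing the hive."""
        if self.free_list:
            index = self.free_list.pop()  # reuse a previously freed location
            self.slots[index] = item
        else:
            index = len(self.slots)       # no free space: the hive grows
            self.slots.append(item)
        return index

    def delete(self, index):
        """Unlink the cell and put its space on the free list."""
        if self.slots[index] is None:
            raise ValueError("slot is already free")
        self.slots[index] = None
        self.free_list.append(index)


hive = ToyHive()
a = hive.add("KeyA")
b = hive.add("KeyB")
c = hive.add("KeyC")
hive.delete(b)        # KeyB's space goes onto the free list
d = hive.add("KeyD")  # KeyD lands where KeyB used to live
print(hive.slots)     # ['KeyA', 'KeyD', 'KeyC']
```

Note that the hive never shrinks in this model: deleting an item only frees space for later reuse, which is the palimpsest effect described above.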
In their classic book Inside Microsoft Windows, pp. 226-228, authors Mark Russinovich and David Solomon describe this situation as follows:
“When the system adds and deletes cells in a hive, the hive can contain empty bins interspersed with active bins. This situation is similar to disk fragmentation, which occurs when the system creates and deletes files on the disk.”
The same section goes on to observe that “the configuration manager never tries to compact a registry hive,” noting further that specific Win32 API functions compact a registry hive as a by-product of their operation.
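Conceptually, compaction amounts to copying only the live data into fresh storage and leaving the holes behind. The following sketch shows that idea on a plain Python list, with `None` standing in for freed space; it is an assumption-laden illustration, not the mechanism those Win32 functions actually use, and the `compact` helper is hypothetical.

```python
def compact(slots):
    """Pack live entries together.

    Returns (new_slots, remap), where remap maps each live entry's
    old index to its new index. Freed entries (None) are dropped.
    """
    new_slots = []
    remap = {}
    for old_index, item in enumerate(slots):
        if item is None:
            continue  # skip empty space left behind by deletions
        remap[old_index] = len(new_slots)
        new_slots.append(item)
    return new_slots, remap


fragmented = ["KeyA", None, "KeyC", None, None, "KeyF"]
packed, remap = compact(fragmented)
print(packed)  # ['KeyA', 'KeyC', 'KeyF']
print(remap)   # {0: 0, 2: 1, 5: 2}
```

The remap table hints at why compaction is not free: anything that referred to a cell by its old location has to be updated to point at the new one.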