Further Impact on Indexes
All RDMBSs have implemented b-tree indexes. The reason for the proliferation of these binary trees is that a critical mass of PhD theses have proven that this is the best way to go. (Remember that Ted Codd was a PhD and that relational theory, in particular, springs from the "science" part of "computer science.")
The way b-trees work is elegantly simple (see Figure 2). Indexes are a physical structure, meaning that they occupy "formatted" disk areas given to the RDBMScalled blocks or pages, depending on which RDBMS product you're using.
The entry point into the index is a root. The root contains ranges of index values and the address of the block in the next layer, called a branch, for the value being sought. The branch block then points to the bottom layer, referred to as the leaf to keep the tree analogy from becoming a mixed metaphor. The leaf has the index value and the pointer to where the actual table row is located. The index values are in ascending order.
Figure 2 Binary tree index structure.
In a perfect world, a maximum of four blocks must be touched to retrieve a single row: root, branch, leaf, and table. Actually, the average is less than four, as RDBMSs save frequently used blocks in a buffered data-cache memory area. Ask your resident PhD math whiz if you want to know the exact probability.
But we don't live in a perfect world; we've been trying to live in a natural world.
When we use naturally occurring data, we don't have control over the values. Hence, it's possible (even likely) that new rows being inserted will fall into the middle of the domain, not at the end. Imagine that we have a set of employee data, using name as the PK, and we acquire a new company in Ireland. As we add the new employees, we notice that many of the names begin with the letter O. The leaf block where the O's are quickly becomes saturated, and another block is spawned. If this happens too many times, the branch block pointing to it can no longer manage the values, and a new branch level is created. Now, without checking with your math whiz, how many index blocks will have to be touched to locate O'Reilly?
We can fix this problem by re-creating the index, but if that index is huge, this process could take a while. And don't forget about the foreign-key indexes that we built for each table with which this table has a relationship. It starts to add up.