2.2 Physical Aggregation
In the preceding chapters, we talked about the atomic unit of physical design, which we call a component, and also the physical hierarchy created by their (acyclic) physical dependencies. Scalability demands hierarchy, and the hierarchy imposed by physical dependency, while of critical importance, is only one architectural aspect of large-scale physical design. Separately, we must also consider how related components can be packaged into larger cohesive physical units. We refer to this other hierarchical dimension of component-based design as physical aggregation.
2.2.1 General Definition of Physical Aggregate
The purpose of aggregation is to bring together logical content (in the form of C++ source code) as a cohesive physical entity that can be treated architecturally as an atomic unit. At one end of the physical-aggregation spectrum lies the component. Each individual component aggregates logical content. Figure 2-4 illustrates schematically a collection of 15 components having 5 separate levels of physical dependency that together might represent a hierarchically reusable subsystem.
 
FIGURE 2-4: Logical content aggregated within 15 individual components
2.2.2 Small End of Physical-Aggregation Spectrum
By design, each component embodies a limited amount of code — typically only a few hundred to a thousand lines of source3 (excluding comments and the component’s associated test driver). A single component is therefore too fine-grained (section 0.4) to fully represent most nontrivial architectural subsystems and patterns.4 For example, given a protocol (section 1.7.5) for, say, an (abstract) memory allocator (see Volume II, section 4.10), we might want to provide several distinct components defining various concrete implementations, each tailored to address a different specific behavioral and performance need.5 Taken as a whole, these components naturally represent a larger cohesive architectural entity, as illustrated in Figure 2-5. To capture these and other cohesive relationships among logically related components — assuming they do not have substantially disparate physical dependencies — we might choose to colocate them within a larger physical unit (see sections 2.8, 2.9, and 3.3). In so doing, we can facilitate both the discovery and management of our library software.
 
FIGURE 2-5: Suite of logically similar yet independent components
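To make the idea of such a suite concrete, the sketch below shows one protocol and two concrete implementations. In practice, each would reside in its own component. All names (mylib, Allocator, NewDeleteAllocator, CountingAllocator) are hypothetical and are not the allocator interface described in Volume II. The pieces are concatenated into one listing only so that it compiles as a single translation unit. Each implementation depends on the protocol but not on its siblings, which is what allows them to be colocated without entangling their physical dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// ---- hypothetical component: mylib_allocator (the protocol) ----
namespace mylib {
class Allocator {
  public:
    virtual ~Allocator() = default;
    virtual void *allocate(std::size_t size) = 0;
    virtual void deallocate(void *address) = 0;
};
}  // close namespace mylib

// ---- hypothetical component: mylib_newdeleteallocator ----
// Depends only on the protocol; general-purpose global-heap allocation.
namespace mylib {
class NewDeleteAllocator : public Allocator {
  public:
    void *allocate(std::size_t size) override { return ::operator new(size); }
    void deallocate(void *address) override { ::operator delete(address); }
};
}  // close namespace mylib

// ---- hypothetical component: mylib_countingallocator ----
// Also depends only on the protocol; forwards to any upstream allocator
// while tracking the number of outstanding allocations (e.g., for testing).
namespace mylib {
class CountingAllocator : public Allocator {
    Allocator   *d_upstream_p;
    std::size_t  d_numOutstanding = 0;

  public:
    explicit CountingAllocator(Allocator *upstream) : d_upstream_p(upstream) {}

    void *allocate(std::size_t size) override {
        void *address = d_upstream_p->allocate(size);
        ++d_numOutstanding;
        return address;
    }

    void deallocate(void *address) override {
        if (address) {
            --d_numOutstanding;  // null deallocations are not counted
        }
        d_upstream_p->deallocate(address);
    }

    std::size_t numOutstanding() const { return d_numOutstanding; }
};
}  // close namespace mylib
```

A client written against mylib::Allocator& can be handed either implementation without knowing which component supplied it.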
2.2.3 Large End of Physical-Aggregation Spectrum
At the other end of the physical-aggregation spectrum is the unit of release (UOR), which represents a physically (and usually also logically) cohesive collection of software (source code) that is designed to be deployed and consumed in an all-or-nothing fashion. Each UOR typically comprises multiple separate smaller physical aggregates, bringing together vastly more source code than would occur in any individual component. Even so, we should expect our library software will in time grow to be far too large to belong to any one UOR. Hence, from an enterprise-wide planning perspective, we must be prepared to accommodate the many UORs that are likely to appear at the top level of our inventory of library source code.
2.2.4 Conceptual Atomicity of Aggregates
Even though a UOR may aggregate otherwise physically independent entities, it should nonetheless always be treated, for design purposes, as atomic.6 As with a component (and every physical aggregate), the granularity with which the contents of a UOR are incorporated into a dependent program will depend on organizational, platform-specific, and deployment details, none of which can be relied upon at design time. Hence, we must assume that any use of a UOR could well result in incorporating all of it — and everything it depends on — into our final executable program. For this reason alone, how we choose to aggregate our software into distinct UORs is vital.
2.2.5 Generalized Definition of Dependencies for Aggregates
This definition of physical dependency for aggregates intentionally casts a wide net, so that it can be applied to aggregates that do not necessarily follow our methodology. For aggregates composed entirely of components as defined by the four properties in Chapter 1,7 the definition of direct dependency of y on x reduces to whether any file in y includes a header from x.
Given the atomic nature with which physical aggregates must be treated for design purposes, if an aggregate z Depends-On y (directly or otherwise) and y in turn Depends-On x, then we must assume, at least from an architectural perspective, that z Depends-On x.
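Assuming that Depends-On propagates through aggregates amounts to taking the transitive closure of the direct-dependency graph. Below is a minimal sketch. It assumes a hypothetical representation in which each aggregate name maps to the set of aggregates it directly depends on.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: aggregate name -> names of the aggregates
// it directly Depends-On.
using DependencyGraph = std::map<std::string, std::set<std::string>>;

// Return every aggregate that 'start' Depends-On, directly or indirectly.
// Because aggregates are treated atomically, if z -> y and y -> x, we must
// assume z -> x as well.
std::set<std::string> transitiveDependencies(const DependencyGraph& graph,
                                             const std::string&     start)
{
    std::set<std::string>    result;
    std::vector<std::string> pending{start};
    while (!pending.empty()) {
        std::string current = pending.back();
        pending.pop_back();
        auto it = graph.find(current);
        if (it == graph.end()) {
            continue;  // no outgoing dependencies recorded for this aggregate
        }
        for (const std::string& dep : it->second) {
            if (result.insert(dep).second) {  // first time this one is reached
                pending.push_back(dep);
            }
        }
    }
    return result;
}
```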
2.2.6 Architectural Significance
Architecturally significant entities are those parts of a UOR that are intended to be seen (and potentially used) directly by external clients. These entities together effectively form the public interface of the UOR, any changes to which could adversely affect the stability of its clients. The definition of architectural significance emphasizes deliberate intent, rather than just the actual physical manifestation, because it is that intent that is necessarily reflected by the architecture.
A suboptimal implementation might, for example, inadvertently expose a symbol (at the .o level) that was never intended for use outside the UOR. If such unintentional visibility were to occur within a UOR consisting entirely of components, it would likely be due to an accidental violation of Component Property 2 (section 1.6.2) and not a deliberate (and misguided) attempt to provide a secret “backdoor” access point. Repairing such defects would not constitute a change in architecture — especially in this case, since any use of such a symbol would itself be a violation of Component Property 4 (section 1.11.1).
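A symbol often escapes unintentionally when a helper function is defined at namespace scope in a .cpp file but never declared in the header. This gives the helper external linkage in the resulting .o file. The sketch below uses a hypothetical component mylib_parser, with its header and implementation concatenated into one listing. It gives such a helper internal linkage by placing it in an unnamed namespace. This is one conventional repair, not necessarily the only one.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// ---- mylib_parser.h (hypothetical component header) ----
namespace mylib {
struct Parser {
    // Return the number of decimal digits in 'text'.
    static int countDigits(const std::string& text);
};
}  // close namespace mylib

// ---- mylib_parser.cpp ----
namespace mylib {
namespace {
// Internal linkage: this helper is not declared in the header, so it must not
// be visible outside this translation unit. Defined at namespace scope without
// the unnamed namespace, it would have external linkage and be exported from
// mylib_parser.o as an unintended part of the UOR's surface.
bool isDigit(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
}  // close unnamed namespace

int Parser::countDigits(const std::string& text)
{
    int count = 0;
    for (char c : text) {
        if (isDigit(c)) {
            ++count;
        }
    }
    return count;
}
}  // close namespace mylib
```

The fix changes only what the .o file exports, not anything declared in mylib_parser.h. It therefore leaves the component's architecture untouched.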
2.2.7 Architectural Significance for General UORs
In our component-based methodology, all the software that we write outside the file that implements main() is implemented in terms of components. Unfortunately, not all UORs that we might want or need (or be compelled) to use are necessarily component-based (the way we would have designed them). We will start by considering the parts of a general UOR that are architecturally significant irrespective of whether or not they are made up exclusively of components. Later we will discuss the specifics of those that fortunately are.
2.2.8 Parts of a UOR That Are Architecturally Significant
In a nutshell, each externally accessible .h file,8 each nonprivate logical construct declared within those .h files, and the UOR itself are all architecturally significant. To make use of logical entities from outside the UOR in which they are defined, their (package-qualified) names (see section 2.4.6) will be needed. In addition, the .h files declaring those entities must (or at least should) be included (section 1.11.1) — by name — directly (see section 2.6) for clients to make substantive use of them. Finally, to refer to the particular library comprising the .o files corresponding to a UOR (e.g., for linking purposes), it will be necessary to identify it, again, by name.
2.2.9 What Parts of a UOR Are Not Architecturally Significant?
While .h files are naturally architecturally significant, .cpp files and their corresponding .o files are not. Renaming header files or redistributing the logical constructs declared within them would adversely affect the stability of the UOR's clients; renaming or reorganizing .cpp or .o files would not. Assuming the UOR is identified in its entirety by its name, the internal organization of the library archive holding the UOR's .o files (corresponding to its .cpp files) has absolutely no effect on client source code. What’s more, changing such insulated details (see section 3.11.1) will not require client code even to recompile.
2.2.10 A Component Is “Naturally” Architecturally Significant
For UORs consisting of .h /.cpp pairs forming components as defined in Chapter 1, both the .h and .cpp files will each have the component name as a prefix (see section 2.4.6), making components architecturally significant as well. To maximize hierarchical reuse (section 0.4), all components within a UOR and all nonprivate constructs defined within those components are normally architecturally significant. There are, however, valid engineering reasons for occasionally suppressing the architectural significance of a component. Section 2.7 describes how we can — by conventional naming — effectively limit the visibility of (1) nonprivate logical entities outside of the component in which they are defined, and (2) a component as a whole.
2.2.11 Does a Component Really Have to Be a .h/.cpp Pair?
What ultimately characterizes a component architecturally is governed entirely by its .h file. In Chapter 1, we arrived at the definition of a component as a .h /.cpp pair satisfying four essential properties. In virtually all cases, this phrasing serves as the definition of a component in C++.9 For completeness, however, we point out that, though this definition is sufficient and practically useful, it is not strictly necessary. The true essential requirement for components in C++ is that there be exactly one .h file and at least one10 .cpp file (possibly more; see below) that together satisfy these four essential properties.
2.2.12 When, If Ever, Is a .h /.cpp Pair Not Good Enough?
In exceedingly rare cases,11 there might be sufficient justification to represent a single component using multiple .cpp files. Unlike header files, .cpp files in a component, and especially the resulting .o files in a statically linked library (.a), are not considered architecturally significant. For example, a component myutil defining three logically related but physically independent functions might reasonably be implemented as having a single header file myutil.h and multiple implementation files — e.g., myutil.1.cpp, myutil.2.cpp, and myutil.3.cpp — each uniquely named, but all sharing the component name as a common prefix. Consequently, a program calling only one of the three functions might, under certain deployment strategies (see section 2.15), wind up incorporating only the one .o file corresponding to the needed function. Such nuanced considerations are not relevant to typical development and are usually relegated to the subdomain of embedded systems.
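The myutil example can be sketched as follows. The three function names are hypothetical. The files are concatenated into a single listing so that it compiles as one translation unit. In practice, each myutil.N.cpp would be compiled separately, each beginning with #include "myutil.h". That yields three .o files, which a static linker could pull from the archive independently.

```cpp
#include <cassert>

// ---- myutil.h: the component's single header (architecturally significant) ----
#ifndef INCLUDED_MYUTIL
#define INCLUDED_MYUTIL
namespace myutil {
long square(long x);         // defined in myutil.1.cpp
long cube(long x);           // defined in myutil.2.cpp
long absoluteValue(long x);  // defined in myutil.3.cpp
}  // close namespace myutil
#endif

// ---- myutil.1.cpp (would begin with #include "myutil.h") ----
namespace myutil {
long square(long x) { return x * x; }
}  // close namespace myutil

// ---- myutil.2.cpp ----
namespace myutil {
long cube(long x) { return x * x * x; }
}  // close namespace myutil

// ---- myutil.3.cpp ----
namespace myutil {
long absoluteValue(long x) { return x < 0 ? -x : x; }
}  // close namespace myutil
```

Clients see only myutil.h. Merging the three .cpp files back into one, or splitting them further, changes nothing they can observe.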
2.2.13 Partitioning a .cpp File Is an Organizational-Only Change
It is important to realize that the aggressive physical partitioning discussed above is permissible only because it is organizational and not architectural. That is, our view and use of the component, its logical design, and its physical dependencies are left unaffected by such architecturally insignificant optimizations. Introducing (or removing) such optimizations has no effect on the client-facing interface (including any need for recompilation) or logical behavior, only on program size. By contrast, introducing multiple .h files for a single component would represent an architectural change manifestly affecting usage; hence, a component — in all cases — must have exactly one header file, whose root name identifies the component uniquely (see section 2.2.23).
2.2.14 Entity Manifest and Allowed Dependencies
To be practically useful, every aggregate (from a component to a UOR) must, at a minimum, somehow allow us to specify contractually the entities it aggregates, as well as the other physical entities upon which those contained entities are allowed (i.e., explicitly permitted) to depend directly. Much of our design methodology is anchored in understanding the physical dependencies among the discrete logically and physically cohesive (see section 2.3) entities within our software. Given a dependency graph, without knowing the specific (outwardly visible) entities at its nodes or its (permissible) edges, there is simply no good way to reason about it.
For any given component, as illustrated in Figure 2-6a, the manifest of aggregated entities is implied by the accessible logical entities declared within its header file. The allowed direct dependencies are implied by the combined #include directives embedded within the .h and .cpp files of that component (section 1.11). For the second and successive levels of physical aggregation, however, the manifest of member aggregates and the list of allowed dependencies are an essential part of the architectural specification and must somehow be stated explicitly (Figure 2-6b).
 
FIGURE 2-6: Specifying members and allowed dependencies for aggregates
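The language offers no way to state these facts for a package, so the metadata has to live alongside the code. The sketch below shows what such metadata might capture. It assumes a hypothetical in-memory form rather than any particular file format: for each aggregate, a manifest of member components and an envelope of allowed dependencies. It also includes a function that lifts component-level #include edges to aggregate-level dependencies, using the reduced definition from section 2.2.5 (y depends directly on x if any file in y includes a header from x). All type, function, and package names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical metadata for one second-level aggregate.
struct AggregateSpec {
    std::set<std::string> members;      // manifest of member components
    std::set<std::string> allowedDeps;  // envelope of allowed direct dependencies
};

using Specs = std::map<std::string, AggregateSpec>;

// Return the aggregate whose manifest lists 'component', or "" if none does.
std::string ownerOf(const Specs& specs, const std::string& component)
{
    for (const auto& [name, spec] : specs) {
        if (spec.members.count(component)) {
            return name;
        }
    }
    return {};
}

// Lift component-level #include edges (includer, includee) to the aggregate
// level: y depends directly on x if any member of y includes a header from x.
std::map<std::string, std::set<std::string>>
actualAggregateDeps(const Specs& specs,
                    const std::vector<std::pair<std::string, std::string>>& includes)
{
    std::map<std::string, std::set<std::string>> result;
    for (const auto& [from, to] : includes) {
        std::string y = ownerOf(specs, from);
        std::string x = ownerOf(specs, to);
        if (!y.empty() && !x.empty() && x != y) {  // ignore intra-aggregate edges
            result[y].insert(x);
        }
    }
    return result;
}
```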
Unfortunately, the C++ language itself does not support any notion of architecture beyond a single translation unit.12 Hence, much of the aggregative structure we discuss in this chapter will have to be implemented alongside the language using metadata (see section 2.16). This metadata will be kept locally as an integral part of each aggregate to help guide the tools we use to develop, build, and deploy our software.13 An abstract subsystem consisting of four second-level aggregates forming three separate (aggregate) dependency levels is illustrated schematically in Figure 2-7.
 
FIGURE 2-7: Schematic subsystem built from second-level physical aggregates
2.2.15 Need for Expressing Envelope of Allowed Dependencies
Expressing the envelope of allowed dependencies for aggregations of components explicitly might, at first, seem redundant and therefore unnecessary. As noted in section 1.11, numerous dependency-analysis tools can extract the actual dependencies from the aggregated components and automatically produce the envelope of those dependencies across physical aggregates. Relying on such tools alone, however, misses the point: The purpose of stating allowed dependencies is to be anticipatory, not reactive. Characterizing a set of proposed aggregations and then supplying an envelope of allowed dependencies among those aggregations enables us to express our physical design (intent) before any code is written. As new functionality is added, unexpected physical dependencies can be detected and flagged as implementation errors. Without specifying allowed dependencies a priori, there is no physical design to implement, let alone verify. Hence, explicitly specifying — and verifying — allowed dependencies is necessary at every level of physical aggregation.
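Once allowed dependencies are declared up front, checking an implementation against them reduces to a set comparison. The sketch below assumes that some analysis tool has already extracted the actual aggregate-level dependencies. The Edges type, the findViolations function, and the package names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: aggregate name -> names of aggregates it (may) depend on directly.
using Edges = std::map<std::string, std::set<std::string>>;

// Report every actual direct dependency (from -> to) that lies outside the
// envelope of allowed dependencies declared, a priori, for 'from'.
std::vector<std::pair<std::string, std::string>>
findViolations(const Edges& allowed, const Edges& actual)
{
    std::vector<std::pair<std::string, std::string>> violations;
    for (const auto& [from, targets] : actual) {
        auto it = allowed.find(from);
        for (const std::string& to : targets) {
            if (it == allowed.end() || it->second.count(to) == 0) {
                violations.emplace_back(from, to);
            }
        }
    }
    return violations;
}
```

Each reported pair is either an implementation error or a prompt to revisit the physical design deliberately. The tool flags the pair, but the declared envelope, not the tool, defines what is correct.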
2.2.16 Need for Balance in Physical Hierarchy
Between a component and a UOR, we might imagine that there could (in theory) be any number of intermediate levels of physical aggregation, each of which might or might not have architectural significance. Some physical aggregation hierarchies are better than others. In particular, an unbalanced hierarchy, such as the one illustrated schematically in Figure 2-8, is suboptimal.
 
FIGURE 2-8: UOR having unbalanced levels of physical aggregation (BAD IDEA)
2.2.17 Not Just Hierarchy, but Also Balance
Effective regular decomposition of large systems requires not only hierarchy, but also balance. We choose to model our software development accordingly. Although not strictly necessary, we want each aggregate to comprise entities having similar physical complexity. In particular, we deliberately avoid placing components alongside larger aggregates within a UOR. We find that having entities of comparable complexity at each aggregation depth improves comprehension and facilitates reuse.
At each increasing level of physical aggregation, we strive to bring together a significant, but not overwhelming, amount of information and engineering at a uniform level of abstraction such that it can be understood and used effectively. As a rule, we would like the relevant schematic detail to correspond to what might reasonably fit on a single 8½ × 11-inch sheet of paper14, as suggested by the complexity of each of the individual diagrams in Figure 2-9. By achieving this balance — much like the chapters and sections within this book — we provide fairly uniformly chunked content, which makes it more convenient to analyze and discuss.
 
FIGURE 2-9: Balancing complexity at each level of physical aggregation
2.2.18 Having More Than Three Levels of Physical Aggregation Is Too Many
While components (being deliberately fine-grained) are too small to be practical to release or deploy individually, having more than three appropriately balanced levels of physical aggregation (as illustrated schematically in Figure 2-10) is not especially useful and can be impractical due to the sheer magnitude of the code involved. There are limits as to what we can reasonably fit into a single physical library and what typical development and build tools can accommodate. There are also design and deployment issues that would tend to discourage physically aggregating such massive architectural entities.
 
FIGURE 2-10: More than three levels of physical aggregation (BAD IDEA)
2.2.19 Three Levels Are Enough Even for Larger Systems
In our experience, we find that three appropriately balanced, architecturally significant levels of physical aggregation have been sufficient to represent very large libraries. When there are three architecturally significant levels, we will consistently refer to each entity at the second level of architecturally significant aggregates within the UOR as a package15 (see section 2.8) and the UOR itself as a package group (see section 2.9).
For example, using even the modest size estimates for a component, package, and package group illustrated in Figure 2-11, each UOR would, on average, support a couple of hundred thousand lines of noncommentary source code — excluding, of course, the corresponding component-level test drivers (see Volume III, section 7.5). Thus, an enterprise-wide body of library software consisting of 10 million lines of source code could fit comfortably within fifty such UORs, with yet larger code bases requiring only proportionately more.
 
FIGURE 2-11: Modest size estimates of components, packages, and package groups.
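This calculation reads "a couple of hundred thousand lines" as roughly 200,000 lines per UOR. That round figure comes from the sentence above, not from the individual estimates in Figure 2-11. With it, the fifty-UOR figure follows directly:

```latex
\[
\frac{10{,}000{,}000\ \text{lines}}{\approx 200{,}000\ \text{lines per UOR}} \approx 50\ \text{UORs}
\]
```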
2.2.20 UORs Always Have Two or Three Levels of Physical Aggregation
Hence, in our methodology, the number of appropriately balanced, architecturally significant levels of physical aggregation within our library software will always be at least two (i.e., the individual components and the UOR that comprises them), but never more than three.
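The balance and depth rules can be checked mechanically over a model of the physical hierarchy. The sketch below relies on several simplifying assumptions. It uses a hypothetical PhysicalNode type in which a leaf is a component. "Balanced" is taken to mean that every child of an aggregate has the same depth. An acceptable UOR has a depth of two (components directly in the UOR) or three (components in packages in a package group).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: a node with no children is a component; any other node
// is an aggregate (a package or a UOR).
struct PhysicalNode {
    std::string               name;
    std::vector<PhysicalNode> children;
};

// Levels of physical aggregation, counting components as level 1. Returns -1
// if the hierarchy is unbalanced, i.e., if some aggregate mixes children of
// differing depths (such as components alongside packages).
int balancedDepth(const PhysicalNode& node)
{
    if (node.children.empty()) {
        return 1;  // a component
    }
    int childDepth = balancedDepth(node.children.front());
    for (const PhysicalNode& child : node.children) {
        int depth = balancedDepth(child);
        if (depth < 0 || depth != childDepth) {
            return -1;
        }
    }
    return childDepth + 1;
}

// A UOR satisfies this rule if it is balanced and has two or three levels.
bool isWellFormedUor(const PhysicalNode& uor)
{
    int depth = balancedDepth(uor);
    return depth == 2 || depth == 3;
}
```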
There might, in rare cases, be valid reasons — e.g., to accommodate a large, monolithic, externally designed interface16 — to introduce, purely for organizational purposes, an additional, intervening level of physical aggregation. Any such organization-based partitioning of the implementation of an architecturally significant aggregate — as with that of a component — should, of course, never be architecturally significant (see section 2.11).
2.2.21 Three Balanced Levels of Aggregation Are Sufficient. Trust Me!
The “artificial” constraints on physical aggregation suggested here do not in any way stop individual developers from being creative; rather, this regularly structured physical aggregation model helps to focus creativity where it will be most effective — the functionality, not the packaging — thereby making our software developers as a whole more successful. It will turn out that having a regular, balanced, and fairly shallow architectural structure also lends itself to an economical notation for identifying every architecturally significant logical and physical entity within our proprietary library software (see section 2.4).
2.2.22 There Should Be Nothing Architecturally Significant Larger Than a UOR
We deliberately avoid creating anything architecturally significant that is larger than a single (physical) UOR.17 Treating such expansive logical units atomically, as illustrated in Figure 2-12a, would increase our envelope of allowed dependencies without providing any concrete encapsulation of logical functionality within a cohesive physical entity (see section 2.3). Instead, we choose to model such coarse architectural policy more articulately as individual allowed physical dependencies among UORs (Figure 2-12b). The more that we can encapsulate each logical subsystem within a single (architecturally significant) physical aggregate, the more we will be able to infer useful physical dependencies (section 1.9) from logical relationships across those entities.
 
FIGURE 2-12: Supplanting logical aggregation with allowed physical dependency
2.2.23 Architecturally Significant Names Must Be Unique
The C++ language requires that the name of every logical entity visible outside of the translation unit in which it is defined be unique within a program (section 1.3.1). We need more. We require that the names of all externally accessible logical entities within our library identify each entity uniquely because, with reuse, a combination of those logical entities might one day wind up within the same program (see section 3.9.4). For the same reason, the names of all UORs (package groups and packages) and components — each also being visible to external clients — must be globally unique as well.
Even without our cohesive naming strategy (see section 2.4), there remain compelling advantages (e.g., see sections 2.4.6 and 2.15.2) to ensuring that component filenames are themselves guaranteed to be globally unique throughout the enterprise — irrespective of directory structure.18
- The benefit of unique filenames is uniqueness. When one sees a filename (such as xyza_context.h) anywhere in the system — be it in a log message, an assertion, an email, or a tab in a text editor – one knows, uniquely, the component to which it refers. Unique filenames also make the rendering of include directives in source code orthogonal to the physical placement of headers on a filesystem. A lack of unique filenames does not break any one thing, but makes a large collection of tasks more difficult because the filename itself is no longer a unique identifier. In a large-scale organization with hundreds of thousands of components (among which there will inevitably be many having the base name “context”), maintaining the filename as a unique identifier has been, and will continue to be, a very valuable property indeed! 
- — Mike Verschell 
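Enforcing globally unique filenames, irrespective of directory structure, reduces to checking base names across the entire code base. Below is a minimal sketch. It assumes '/'-separated paths, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Return the final component of a '/'-separated path.
std::string baseName(const std::string& path)
{
    std::string::size_type slash = path.find_last_of('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Report every base filename that occurs at more than one path, i.e., every
// filename that no longer identifies a single component on its own.
std::set<std::string> duplicateFilenames(const std::vector<std::string>& paths)
{
    std::map<std::string, int> counts;
    for (const std::string& path : paths) {
        ++counts[baseName(path)];
    }
    std::set<std::string> duplicates;
    for (const auto& [name, count] : counts) {
        if (count > 1) {
            duplicates.insert(name);
        }
    }
    return duplicates;
}
```

With prefixed names such as xyza_context.h, many components can share the root name "context" without colliding. Two bare context.h files in different directories, by contrast, would be flagged.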
2.2.24 No Cyclic Physical Dependencies!
Cyclic physical dependencies19 among any physical entities — irrespective of the level of physical aggregation — do not scale and are always undesirable. Such cyclically interdependent architectures are not only harder to build, they are also much, much harder to comprehend, test, and maintain than their acyclic counterparts. In fact, to help improve human cognition, we almost always structure our source code to avoid forward references to logical entities even within the same component. Whenever the physical specification of a design would allow cyclic dependencies among architecturally significant physical aggregates, we assert that the design is unacceptably flawed. Even if, for some unusual (organizational) reason, we were to choose to partition an outwardly visible aggregate into subaggregates that were not architecturally significant (e.g., see section 2.11), we would nonetheless insist that the allowed dependencies among those subaggregates be acyclic as well (see also Figure 2-89, section 2.15.10).
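A standard depth-first search that looks for a back edge can check whether a proposed envelope of allowed dependencies admits a cycle. The sketch below assumes a hypothetical map from each aggregate's name to the names of the aggregates it is allowed to depend on directly.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical: aggregate name -> names it is allowed to depend on directly.
using DependencyGraph = std::map<std::string, std::set<std::string>>;

namespace {
enum class Mark { unvisited, inProgress, done };

// Depth-first search from 'node'; returns true on reaching a node that is
// still on the current search path (a back edge, hence a cycle).
bool visit(const DependencyGraph&       graph,
           const std::string&           node,
           std::map<std::string, Mark>& marks)
{
    Mark& mark = marks[node];  // std::map references survive later insertions
    if (mark == Mark::inProgress) {
        return true;
    }
    if (mark == Mark::done) {
        return false;
    }
    mark = Mark::inProgress;
    auto it = graph.find(node);
    if (it != graph.end()) {
        for (const std::string& dep : it->second) {
            if (visit(graph, dep, marks)) {
                return true;
            }
        }
    }
    mark = Mark::done;
    return false;
}
}  // close unnamed namespace

// Return true if any cycle exists among the aggregates in 'graph'.
bool hasCycle(const DependencyGraph& graph)
{
    std::map<std::string, Mark> marks;
    for (const auto& entry : graph) {
        if (visit(graph, entry.first, marks)) {
            return true;
        }
    }
    return false;
}
```

The same check applies unchanged at every level of aggregation, including to subaggregates that are not architecturally significant.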
2.2.25 Section Summary
In summary, a physical aggregate is a physically cohesive unit of logical content and a necessary abstraction in any development process. The organizational details of a physical aggregate will likely vary from one platform, compiler/linker technology, and deployment strategy to the next; hence, each physical aggregate is treated, at least architecturally, as atomic. Our logical designs must also, therefore, always be governed by the envelope of architecturally allowed (rather than actual) physical dependencies specified for the aggregate. Balancing complexity at each successive level of aggregation facilitates human cognition and potential reuse. The use of three balanced levels of architecturally significant physical aggregation has been demonstrated to be sufficient (and in fact optimal) to describe even the largest of systems. We do, however, want to avoid architecturally significant logical entities (other than an enterprise-wide namespace) that span UORs.