Going Large-Scale with C++, Part 2: Maximizing Returns on Scale with Hierarchical Reuse

Dec 17, 2015

John Lakos, author of Large-Scale C++ LiveLessons (Workshop): Applied Hierarchical Reuse Using Bloomberg's Foundation Libraries, concludes his two-part series by discussing how hierarchical reuse avoids common pitfalls of large development projects and takes advantage of economies of scale.

Read Part 1.

Introduction

Widening the scope of a software project brings opportunities for effective reuse. In Part 1 of this series, we considered the development efforts involved in increasing size for a single application. Such applications could be partitioned over time into a sequence of successive projects, shorter development segments (sometimes called sprints), or successive versions of the product.

In this article, we consider that even disparate projects developed within a single enterprise are likely to comprise similar, lower-level components. By adhering to canonical class categories, we can more easily extract commonality at every level of the physical hierarchy, and thereby dramatically improve upon the efficiency gains afforded by conventional reuse.

Multiple Applications and Global Reuse

Suppose your enterprise has multiple related products at varying stages of development. The opportunities for reuse potentially increase by an order of magnitude! A well-factored component arising from one application or product might now be reused in many others. Of course, such a "reusable" component must be segregated from malleable application code, and must reside in a stable library at an appropriate level in the enterprise's physical hierarchy. This kind of refactoring, known as demotion, is one of several levelization techniques commonly used to avoid cyclic or excessive link-time dependencies. [1]

Software Insulation Techniques

As code size continues to increase, we encounter a new problem: excessive compile-time coupling. Ideally, making a local change requires recompiling only a single translation unit. However, even a minor change to the implementation of C++ templates or inline functions can force significant portions of a large code base to be recompiled, with potentially devastating implications for development time.

Fortunately, a variety of specific techniques (such as abstract base classes and non-inline methods) exist for insulating implementation details so that they can be modified without forcing clients to recompile. [2] These insulation techniques are most effective when applied early in the software development lifecycle.
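
As a rough illustration of insulation, the sketch below shows a client-facing abstract base class paired with a non-inline factory function. All names (Counter, SimpleCounter, makeCounter, and the file names in the comments) are hypothetical, and both "files" are shown in one listing only so that the example is self-contained.

```cpp
#include <cassert>
#include <memory>

// ---- What would live in the header (hypothetical 'my_counter.h') ----
// Clients compile against this pure abstract interface only.
class Counter {
  public:
    virtual ~Counter() = default;
    virtual void increment() = 0;
    virtual int value() const = 0;
};

// Non-inline factory: declared in the header, defined in the '.cpp' file.
std::unique_ptr<Counter> makeCounter();

// ---- What would live in the implementation file ('my_counter.cpp') ----
namespace {

class SimpleCounter : public Counter {
    int d_count = 0;  // layout can change without clients recompiling

  public:
    void increment() override { ++d_count; }
    int value() const override { return d_count; }
};

}  // close unnamed namespace

std::unique_ptr<Counter> makeCounter()
{
    return std::unique_ptr<Counter>(new SimpleCounter);
}

// ---- Client code: depends only on the header portion above ----
inline int countTwice()
{
    std::unique_ptr<Counter> counter = makeCounter();
    counter->increment();
    counter->increment();
    assert(2 == counter->value());
    return counter->value();
}
```

Under this arrangement, changing SimpleCounter (say, widening d_count) would require recompiling only the implementation file and relinking; the cost is a virtual call and a heap allocation per object.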

Preventing Client Misuse of Libraries

Misuse of perfectly correct library software by its clients is a common source of errors, and this misuse tends to grow along with a successful code base, leading to increasing support costs. Design by contract, characterized by explicit preconditions and postconditions, facilitates testing—arguably for even small, short-lived programs. As the development effort grows to span several developers or longer time periods, the essential behavior of each function, along with the specific conditions on initial state and input that clients must satisfy prior to invoking it, can no longer reside solely in the minds of developers. Thoughtful, descriptive function and parameter names are always vital, but cannot substitute for concise function-level contracts that are primarily directed at potential human clients, yet sufficient to enable thorough unit testing.

Communicating all of the necessary contract information through code alone is not viable. Certain preconditions also require knowledge of execution history and/or global context that simply cannot be validated mechanically, not even at runtime. The only general, practical way to capture all the information developers need to use a function properly is to spell it out in prose, a dimension of engineering that takes considerable time and effort for most software developers to master.
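
To make the distinction concrete, here is a minimal sketch of a function-level contract written in prose, alongside the portion of it that can be checked mechanically. The function mean is a hypothetical example, not taken from any library.

```cpp
#include <cassert>
#include <cstddef>

// Return the arithmetic mean of the specified 'numValues' elements in the
// specified 'values' array.  The behavior is undefined unless
// '0 < numValues' and 'values' refers to at least 'numValues' contiguous,
// initialized elements.
inline double mean(const double *values, std::size_t numValues)
{
    assert(values);         // checkable part of the contract
    assert(0 < numValues);  // likewise

    // Whether 'values' really refers to 'numValues' initialized elements
    // cannot be verified here; only the prose contract above conveys it.
    double sum = 0.0;
    for (std::size_t i = 0; i < numValues; ++i) {
        sum += values[i];
    }
    return sum / static_cast<double>(numValues);
}
```

The comment tells a human client what must be true before the call; the assertions cover only the subset of that contract a machine can observe.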

Defensive Programming

We need to weed out client misuse early, during development and beta testing, without impacting production software. In our methodology, each application must be able to specify coarsely (at compile time) how much runtime effort robust library software, such as Bloomberg's Basic Development Environment (BDE), should spend checking for precondition violations. If a precondition violation is detected, the application should also be able to specify (at runtime) the specific action to be taken, such as aborting the program, throwing an exception, logging an error, saving client data, or whatever else might make sense for that particular application. By using a centralized defensive programming facility such as bsls_assert, [3] each individual application owner retains explicit control over every aspect of defensive checking during each phase of development, up to and including the release of production software.
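
The two controls described above (a compile-time switch for how much checking is done, and a runtime choice of what happens on a violation) can be sketched as follows. This is a simplified, hypothetical facility written for illustration; it is not the bsls_assert interface, and every name in it (EXAMPLE_ASSERT, EXAMPLE_CHECKS_DISABLED, the example namespace, elementAt) is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>

// Hypothetical facility for illustration only; NOT the bsls_assert API.
namespace example {

typedef void (*ViolationHandler)(const char *expr, const char *file, int line);

inline void abortHandler(const char *expr, const char *file, int line)
{
    std::fprintf(stderr, "Precondition failed: %s (%s:%d)\n", expr, file, line);
    std::abort();
}

inline void throwHandler(const char *expr, const char *, int)
{
    throw std::logic_error(expr);
}

inline ViolationHandler& currentHandler()
{
    static ViolationHandler handler = &abortHandler;  // default action
    return handler;
}

// Runtime control: the application owner picks the action.
inline void setViolationHandler(ViolationHandler h) { currentHandler() = h; }

}  // close namespace example

// Compile-time control: the application's build decides whether checks exist.
#ifndef EXAMPLE_CHECKS_DISABLED
#define EXAMPLE_ASSERT(X) \
    ((X) ? (void)0 : example::currentHandler()(#X, __FILE__, __LINE__))
#else
#define EXAMPLE_ASSERT(X) ((void)0)
#endif

// Library function with a narrow contract: the behavior is undefined unless
// '0 <= index < length'.
inline int elementAt(const int *data, int length, int index)
{
    EXAMPLE_ASSERT(0 <= index && index < length);
    return data[index];
}

// Usage sketch: an application that prefers exceptions installs its handler
// once; misuse then surfaces as a catchable error (when checks are enabled).
inline bool misuseIsCaught()
{
    example::setViolationHandler(&example::throwHandler);
    const int data[] = { 10, 20, 30 };
    assert(20 == elementAt(data, 3, 1));  // in contract
    try {
        elementAt(data, 0, 0);            // out of contract: length is 0
    }
    catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

The library author writes the check once; whether it runs, and what it does when it fires, remains the application owner's decision.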

Moving from Conventional to Hierarchical Reuse

As the sheer magnitude of software grows, we can observe common recurring patterns in even the sub-parts of the components that make up a stable infrastructure. Although we could rewrite these sub-parts each time the need arises, a better option is to factor them out into their own useful, stable components, placing them in appropriate (lower-level) libraries within the enterprise-wide physical hierarchy, and then making those libraries publicly accessible. With this practice, both developers and clients can reuse these well-crafted, thoroughly tested implementation details without continually having to reinvent them.

To this end, we segregate each publicly accessible class into its own distinct physical component, unless there is a compelling reason to do otherwise (such as C++ friendship). [5] This extremely beneficial form of fine-grained reuse constitutes what we refer to as hierarchical reuse.

Vocabulary Types and Interoperability

Not all repeated code is troublesome. Redundant code in the implementations of several modules can cause unnecessary code bloat and lead to increased maintenance costs, but is otherwise relatively innocuous. Compare that with distinctly named types that represent the same concrete value (such as a date) or abstract service (such as a connection) across functional interface boundaries. Duplication of such critical vocabulary types quickly leads to interoperability problems across cooperating subsystems.

A pervasive example of duplicate value types leading to significant performance degradation (due to frequent conversions from one to the other) is that of (const char *) and std::string, as has been observed in the Chromium project. [6] Given an increasing body of software developed by a single entity, a centralized authority will need to ensure a couple of things:

Duplication of vocabulary types is avoided.
A single vocabulary type for any required value or service resides at the appropriate level in the enterprise-wide physical hierarchy.
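
The difference can be sketched in a few lines. The namespaces and types below (pricing, settlement, base, and their members) are hypothetical and exist only to contrast duplicated vocabulary with a shared, lower-level type.

```cpp
#include <cassert>

// Duplicated vocabulary: two subsystems each define their own date type.
namespace pricing    { struct Date         { int year, month, day; }; }
namespace settlement { struct CalendarDate { int day, month, year; }; }

// Every call across the boundary now needs conversion glue like this.
inline settlement::CalendarDate toSettlement(const pricing::Date& d)
{
    settlement::CalendarDate result = { d.day, d.month, d.year };
    return result;
}

// Shared vocabulary: one date type at a lower level that both subsystems use.
namespace base { struct Date { int year, month, day; }; }

namespace pricing2 {
inline base::Date tradeDate()
{
    base::Date d = { 2015, 12, 17 };
    return d;
}
}  // close namespace pricing2

namespace settlement2 {
inline int yearOf(const base::Date& d) { return d.year; }
}  // close namespace settlement2

inline void exampleUsage()
{
    // With duplicated types, the caller must convert first.
    pricing::Date p = { 2015, 12, 17 };
    assert(2015 == toSettlement(p).year);

    // With a shared type, values flow across the boundary unchanged.
    assert(2015 == settlement2::yearOf(pricing2::tradeDate()));
}
```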

Canonical Rendering

Consistent rendering is another important consideration. A few people can easily agree on a general style. Larger development groups have more difficulty reaching consensus on any sort of consistent, uniform rendering. As projects grow, codifying such standards in writing and reinforcing them with supporting tools becomes increasingly important.

Allowing multiple styles to creep in virtually ensures that no unified style will emerge. Gratuitous variation in rendering adds no value, and it detracts from human understanding by masking important visual cues and obscuring the location of needed information. Unfortunately, the larger the project, the harder it is to avoid such variations. A style-checking tool is an important part of any substantial development effort.

Organizing Code Into Canonical Class Categories

Large bodies of software are notorious for being incomprehensible. Insisting on a regular physical packaging structure and avoiding cyclic, excessive, or otherwise inappropriate physical dependencies goes a long way toward improving human cognition. But there are still many other opportunities for facilitating a better common understanding of how software works in general.

Over time, it has become apparent that we need just a few distinct categories of common types:

Value-semantic types: Instantiable C++ types that attempt to represent ethereal, Platonic (such as mathematical) values. (Instantiable here means that we intend to construct objects of this type.)
Mechanisms: Instantiable types that don't try to represent values.
Protocols: Pure abstract interfaces.
Utilities: Non-instantiable types comprising collections of non-primitive algorithms that typically operate on value types.

Along with the use of these common types, classes that reside in the same category will share many important, familiar properties. For more information and specific examples, see our taxonomy. [7]

By deliberately designing the overwhelming majority of classes to fit these categories, we achieve a high degree of shared context that greatly facilitates communication among developers.
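
As a minimal sketch, one representative of each of the four categories listed above might look like the following. The classes (Point, Sink, MemorySink, PointUtil) are hypothetical examples invented here, not components of any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Value-semantic type: represents a value; copyable and equality-comparable.
class Point {
    int d_x;
    int d_y;

  public:
    Point(int x, int y) : d_x(x), d_y(y) {}
    int x() const { return d_x; }
    int y() const { return d_y; }
};

inline bool operator==(const Point& lhs, const Point& rhs)
{
    return lhs.x() == rhs.x() && lhs.y() == rhs.y();
}

inline bool operator!=(const Point& lhs, const Point& rhs)
{
    return !(lhs == rhs);
}

// Protocol: a pure abstract interface.
class Sink {
  public:
    virtual ~Sink() {}
    virtual void write(const std::string& message) = 0;
};

// Mechanism: instantiable and stateful, but does not represent a value
// (so no copying and no equality comparison).
class MemorySink : public Sink {
    std::vector<std::string> d_messages;

  public:
    MemorySink() {}
    MemorySink(const MemorySink&) = delete;
    MemorySink& operator=(const MemorySink&) = delete;

    void write(const std::string& message) override
    {
        d_messages.push_back(message);
    }
    std::size_t numMessages() const { return d_messages.size(); }
};

// Utility: a non-instantiable 'struct' serving as a scope for non-primitive
// algorithms that operate on value types.
struct PointUtil {
    PointUtil() = delete;

    static int manhattanDistance(const Point& a, const Point& b)
    {
        return std::abs(a.x() - b.x()) + std::abs(a.y() - b.y());
    }
};

inline void exampleUsage()
{
    const Point origin(0, 0);
    const Point corner(3, -4);
    assert(origin != corner);
    assert(7 == PointUtil::manhattanDistance(origin, corner));

    MemorySink memory;
    Sink& sink = memory;  // clients program against the protocol
    sink.write("hello");
    assert(1 == memory.numMessages());
}
```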

Ensuring Effective Testing

Given a high degree of both logical and physical regularity, we can tackle one of the most problematic aspects of software development: effective testing. For application programs that fit within a single file, the usual way of testing the program takes one of two forms:

Running the entire program repeatedly, supplying varying inputs.
Embedding repeatable test code within the application source itself.

Another common approach to verification is peer review, which brings certain important advantages that testing doesn't address, but alone is generally insufficient to assure correctness on all but the tiniest of programs.

Once the overall program size compels us to segment it into separate translation units, however, we can test the lower-level functionality in these units non-intrusively, independently of the parent application and other peer components. Unit testing involves associating a dedicated test program with each component, allowing the various tests to be rerun independently and automatically.

At its heart, unit testing requires us to render an equivalent (redundant) representation of the functionality of the component under test. We then sample these two "implementations" both appropriately (systematically) and with sufficient frequency (applying enough tests) to be sure that they exhibit the same essential behavior. Knowledge of systematic data-selection methods is essential, as is familiarity with transparent (highly readable and maintainable) test-case implementation techniques. Fine-grained physical modularity makes thorough unit testing possible, and adherence to common class categories provides a blueprint for efficient testing of representatives of each respective category.
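
One common way to render that redundant representation is a table of inputs and independently determined expected results, with rows chosen systematically around each boundary of the rule being tested. The sketch below assumes a hypothetical component function, isLeapYear, and a hypothetical test driver, testIsLeapYear.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Component under test (hypothetical).
inline bool isLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Redundant representation: expected results tabulated independently of the
// implementation, one row per boundary of the Gregorian rule.
struct LeapYearCase {
    int  d_line;      // source line, to locate a failing row
    int  d_year;
    bool d_expected;
};

static const LeapYearCase LEAP_YEAR_DATA[] = {
    //  LINE     YEAR   EXPECTED
    { __LINE__,  1999,  false },  // not divisible by 4
    { __LINE__,  2004,  true  },  // divisible by 4, not by 100
    { __LINE__,  2015,  false },
    { __LINE__,  2016,  true  },
    { __LINE__,  1900,  false },  // divisible by 100, not by 400
    { __LINE__,  2100,  false },
    { __LINE__,  2000,  true  },  // divisible by 400
    { __LINE__,  2400,  true  },
};

// Compare the two "implementations" row by row; return the mismatch count.
inline int testIsLeapYear()
{
    const std::size_t numCases =
                          sizeof LEAP_YEAR_DATA / sizeof *LEAP_YEAR_DATA;
    int failures = 0;
    for (std::size_t i = 0; i < numCases; ++i) {
        const LeapYearCase& c = LEAP_YEAR_DATA[i];
        if (isLeapYear(c.d_year) != c.d_expected) {
            std::printf("FAILED row at line %d (year %d)\n",
                        c.d_line, c.d_year);
            ++failures;
        }
    }
    return failures;
}
```

Keeping the data in a table separates what is being sampled from how it is checked, which is one way to keep test cases readable as they grow.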

Testing Instantiable Types

To test any instantiable type (that is, a value-semantic type or mechanism), we first need to identify two categories of (typically member) functions:

Primary manipulators: A minimal set of constructor and manipulator methods capable of bringing an object to every state required for thorough testing.
Basic accessors: A sufficient set of accessor methods having direct access to object state.

Then, in the case of value-semantic types, we must first test the equality-comparison operations before using them to help verify the postconditions of other value-semantic operations, such as copy construction and assignment. Once the kernel of a type is proven, we can then use that functionality to flesh out the remaining tests. By identifying common class categories, we also identify common strategies for organizing our testing, which in turn provides profound benefits in developer productivity.
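
The ordering described above can be sketched for a tiny value-semantic type. Money and testMoney are hypothetical names, and the copy operations are the compiler-generated ones; the point is the sequence of the test cases, not the type itself.

```cpp
#include <cassert>

// Hypothetical value-semantic type with a deliberately small kernel.
class Money {
    long d_cents;

  public:
    Money() : d_cents(0) {}
    void setCents(long cents) { d_cents = cents; }  // primary manipulator
    long cents() const { return d_cents; }          // basic accessor
};

inline bool operator==(const Money& lhs, const Money& rhs)
{
    return lhs.cents() == rhs.cents();
}

inline bool operator!=(const Money& lhs, const Money& rhs)
{
    return !(lhs == rhs);
}

inline void testMoney()
{
    const long VALUES[] = { 0, 1, -1, 12345 };  // distinct sample values
    const int  NUM_VALUES = sizeof VALUES / sizeof *VALUES;

    // 1. Primary manipulators, verified using only the basic accessors.
    for (int i = 0; i < NUM_VALUES; ++i) {
        Money m;
        assert(0 == m.cents());
        m.setCents(VALUES[i]);
        assert(VALUES[i] == m.cents());
    }

    // 2. Equality comparison, verified over the cross product of values.
    for (int i = 0; i < NUM_VALUES; ++i) {
        for (int j = 0; j < NUM_VALUES; ++j) {
            Money a;  a.setCents(VALUES[i]);
            Money b;  b.setCents(VALUES[j]);
            assert((i == j) == (a == b));
            assert((i != j) == (a != b));
        }
    }

    // 3. Copy construction, verified using the now-proven 'operator=='.
    for (int i = 0; i < NUM_VALUES; ++i) {
        Money original;  original.setCents(VALUES[i]);
        const Money copy(original);
        assert(copy == original);
    }

    // 4. Assignment, including self-assignment through an alias.
    for (int i = 0; i < NUM_VALUES; ++i) {
        Money source;  source.setCents(VALUES[i]);
        Money target;
        target = source;
        assert(target == source);

        Money& alias = target;
        target = alias;
        assert(target == source);
    }
}
```

Each step relies only on functionality verified in an earlier step, so a failure points at the newly exercised operation and not at the tools used to check it.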

Dynamic and Static Analysis Tools

The more people who are involved in a project, the more opportunities there are for mistakes. Although thorough testing and peer review are highly effective, complementary, and synergistic ways to make sure that carefully documented functions behave as described, each demands substantial time and effort.

Successful code bases tend to spawn many applications, not all of which are necessarily tested and reviewed with the same level of care as the underlying library infrastructure. Where applicable, publicly available analysis tools such as Google's (dynamic) ThreadSanitizer and especially Bloomberg's (static) bde_verify are inexpensive to run, effectively offset some of the cost of peer review for library software, and can readily be applied (at very low cost) to improve the quality of application software. At a certain scale, however, it becomes cost-effective (for review tasks that can be automated) to invest in the development of custom tools when existing ones fail to address specific recurring needs.

Scalable Distributed Systems

The nature of large-scale software has changed significantly since my book Large-Scale C++ Software Design was published in 1996. A typical large system these days (such as Solr, Hadoop, or Spark) is not rendered as a monolithic executable running in a single process, but instead as a collection of cooperating programs running in multiple processes. These discrete programs, in addition to having the dimensions of complexity discussed previously, have to deal with complications of inter-process, inter-computer, and even wide-area communication.

Nonetheless, the previously described general approach to developing large-scale C++ software applies to each of these cooperating programs individually. The extent to which hierarchically reusable library software can be leveraged in multiple programs in the same distributed system demonstrates the true value of understanding and addressing the many dimensions of large-scale C++ software design and development from the start.

Conclusion

Hierarchical reuse involves making the sub-parts of each reusable piece available and stable, and therefore themselves reusable. This practice becomes more important with increasing project size, and dramatically so when the development effort encompasses multiple related programs.

To achieve effective reuse, certain key physical design rules, such as avoiding cyclic dependencies and long-distance friendships, must be followed scrupulously. In addition, a canonical rendering strategy and general adherence to a small set of common class categories facilitate human cognition, thereby further promoting profitable reuse. Finally, these common class categories lead to highly effective standard testing strategies, improving reliability and bolstering client confidence, furthering the adoption of the software. If we think ahead, we will gain economies of scale as our software capital asset continues to grow and pay dividends.

References

[1] To learn more about avoiding cycles in software, see Chapter 5, "Levelization," in Large-Scale C++ Software Design.

[2] For more on avoiding compile-time dependencies, see Chapter 6, "Insulation," in Large-Scale C++ Software Design.

[3] See how bsls_assert is used in released software.

[4] The article "Language Support for Contract Assertions (Revision 10)" discusses preconditions as proposed for C++17.

[5] I discuss the evils of long-distance friendships in the section "Friendship" in Chapter 3, "Components," in Large-Scale C++ Software Design.

[6] See how duplicate types impacted the Chromium project.

[7] Learn more about our class-category taxonomy.
