Distribute Your Work

Veteran technology and business executives discuss scaling databases and services through cloning and replication, separating functionality or services, and splitting similar data sets across storage and application systems. Using these three approaches, you will be able to scale nearly any system or database to a level that approaches infinite scalability.

In 2004 the founding team of ServiceNow (originally called Glidesoft) built a generic workflow platform they called “Glide.” In looking for an industry in which to apply the Glide platform, the team felt that the Information Technology Service Management (ITSM) space, founded on the Information Technology Infrastructure Library (ITIL), was primed for a platform as a service (PaaS) player. While competition, or potential substitutes, existed in this space in the form of on-premises software solutions such as Remedy, the team felt that the success of companies like Salesforce in customer relationship management (CRM) solutions was a good indication of potential adoption for online ITSM solutions.

In 2006 the company changed its name to ServiceNow in order to better represent its approach to the needs of buyers in the ITSM solution space. By 2007 the company was profitable. Unlike many startups, ServiceNow appreciated the value of designing, implementing, and deploying for scale early in its life. The initial solution design included the notions of both fault isolation (covered in Chapter 9, “Design for Fault Tolerance and Graceful Failure”) and Z axis customer splits (covered in this chapter). This fault isolation and customer segmentation allowed the company both to scale to profitability early on and to avoid the noisy-neighbor effect common to so many early SaaS and PaaS offerings. Furthermore, the company valued the cost effectiveness afforded by multitenancy, so while it created fault isolation along customer boundaries, it still designed its solution to leverage multitenancy within a database management system (DBMS) for smaller customers not requiring complete isolation. Finally, the company also valued the insight offered by outside perspectives and the expertise inherent in experienced employees.

ServiceNow contracted with AKF Partners over a number of engagements to help them think through their future architectural needs and ultimately hired one of the founding partners of AKF, Tom Keeven, to augment their already-talented engineering staff. “We were born with incredible scalability from the date of launch,” indicated Tom. “Segmentation along customer boundaries using the AKF Z axis of scale went a long way to ensuring that we could scale into our early demand. But as our customer base grew and the average size of our customer increased beyond small early adopters to much larger Fortune 500 companies, the characterization of our workload changed and the average number of seats per customer dramatically increased. All of these led to each customer performing more transactions and storing more data. Furthermore, we were extending our scope of functionality, adding significantly greater value to our customer base with each release. This functionality extension meant even greater demand was being placed on the systems for customers both large and small. Finally, we had a small problem with running multiple schemas or databases under a single DBMS within MySQL. Specifically, the catalog functionality within MySQL [sometimes technically referred to as the information_schema] was starting to show contention when we had 30 high-volume tenants on each DBMS instance.”

Tom Keeven’s experience building Web-based products, from the high-flying days of Gateway Computer to the Wild West startup days of the Internet at companies like eBay and PayPal, along with his work across a number of clients at AKF, made him uniquely suited to help solve ServiceNow’s challenges. Tom explained, “The database catalog problem was simple to solve. For very large customers we simply had to dedicate a DBMS per customer, thereby reducing the blast radius of the fault isolation zone. Medium-size customers might share a DBMS instance with fewer than 30 tenants, and small customers could continue to have a high degree of multitenancy [for more on this see Chapter 9]. The AKF Scale Cube was helpful in offsetting both the increasing size of our customers and the increased demands of rapid functionality extensions and value creation. For large customers with heavy transaction processing demands we incorporated the X axis by replicating data to read-only databases. With this configuration, reports, which are typically computationally and I/O intensive but read-only, could be run without impact to the scale of the lighter-weight online transaction processing (OLTP) requests. While the report functionality also represented a Y axis (service/function or resource-based) split, we added further Y axis splits by service to enable additional fault isolation by service, significantly greater caching of data, and faster developer throughput. All of these splits, the X, Y, and Z axes, allowed us to have consistency within the infrastructure and purchase similar commodity systems for any type of customer. Need more horsepower? The X axis allows us to increase transaction volumes easily and quickly. If data is starting to become unwieldy on databases, our architecture allows us to reduce the degree of multitenancy (Z axis) or split discrete services off (Y axis) onto similarly sized hardware.”

This chapter discusses scaling databases and services through cloning and replication, separating functionality or services, and splitting similar data sets across storage and application systems. Using these three approaches, you will be able to scale nearly any system or database to a level that approaches infinite scalability. We use the word approaches here as a bit of a hedge, but in our experience across hundreds of companies and thousands of systems these techniques have yet to fail. To help visualize these three approaches to scale we employ the AKF Scale Cube, a diagram we developed to represent these methods of scaling systems. Figure 2.1 shows the AKF Scale Cube, which is named after our partnership, AKF Partners.

Figure 2.1 AKF Scale Cube

At the heart of the AKF Scale Cube are three simple axes, each with an associated rule for scalability. The cube is a great way to represent the path from minimal scale (lower left front of the cube) to near-infinite scalability (upper right back corner of the cube). Sometimes, it’s easier to see these three axes without the confined space of the cube. Figure 2.2 shows the axes along with their associated rules. We cover each of the three rules in this chapter.

Figure 2.2 Three axes of scale

Not every company will need all of the capabilities (all three axes) inherent to the AKF Scale Cube. For many of our clients, one of the types of splits (X, Y, or Z) meets their needs for a decade or more. But when you have the type of viral success achieved by the likes of ServiceNow, it is likely that you will need two or more of the splits identified within this chapter.

Rule 7—Design to Clone or Replicate Things (X Axis)

Often, the hardest part of a solution to scale is the database or persistent storage tier. The problem can be traced back to Edgar F. Codd’s 1970 paper “A Relational Model of Data for Large Shared Data Banks,”1 which is credited with introducing the concept of the relational database management system (RDBMS). Today’s most popular RDBMSs, such as Oracle, MySQL, and SQL Server, allow, just as the name implies, for relations between data elements. These relationships can exist within or between tables. The tables of most OLTP systems are normalized to third normal form,2 where every record in a table has the same fields, no nonkey field depends on only part of a composite key, and every nonkey field depends on the key and nothing but the key. Within a table, each piece of data is related to other pieces of data in that table. Between tables there are often relationships known as foreign keys. Most applications depend on the database to support and enforce these relationships because of its ACID properties (see Table 2.1). Requiring the database to maintain and enforce these relationships makes it difficult to split the database without significant engineering effort.

Table 2.1 ACID Properties of Databases

Atomicity: All of the operations in the transaction will complete, or none will.
Consistency: The database will be in a consistent state when the transaction begins and ends.
Isolation: The transaction will behave as if it is the only operation being performed upon the database.
Durability: Upon completion of the transaction, the operation will not be reversed.
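
To make the cost of these relationships concrete, here is a minimal sketch in Python using the built-in sqlite3 module; the customers and bookings tables are illustrative, not drawn from any client example. It shows the database itself enforcing a foreign key. Once related tables live in separate databases, that enforcement, and any transaction spanning both tables, has to be rebuilt in application code, which is the engineering effort referred to above.

```python
import sqlite3

# A single in-memory database standing in for one OLTP instance.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE bookings (
        booking_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO bookings VALUES (10, 1, 'Room 42')")  # parent row exists, accepted

try:
    # The database itself rejects an orphaned child row.
    conn.execute("INSERT INTO bookings VALUES (11, 99, 'Room 7')")
except sqlite3.IntegrityError as err:
    print("Rejected by the database:", err)

# If customers and bookings lived in separate databases, this check (and any
# transaction spanning both tables) would have to be rebuilt in application code.
```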

One technique for scaling databases is to take advantage of the fact that most applications and databases perform significantly more reads than writes. A client of ours that handles booking reservations for customers has on average 400 searches for a single booking. Each booking is a write and each search a read, resulting in a 400:1 read-to-write ratio. This type of system can be easily scaled by creating read-only copies (or replicas) of the data.

There are a couple of ways that you can distribute the read copy of your data depending on the time sensitivity of the data. Time (or temporal) sensitivity is how fresh or completely correct the read copy has to be relative to the write copy. Before you scream out that the data has to be instant, real time, in sync, and completely correct across the entire system, take a breath and appreciate the costs of such a system. While perfectly in-sync data is ideal, it costs . . . a lot. Furthermore, it doesn’t always give you the return that you might expect or desire for that cost. Rule 19, “Relax Temporal Constraints” (see Chapter 5, “Get Out of Your Own Way”), will delve more into these costs and the resulting impact on the scalability of products.

Let’s go back to our client with the reservation system that has 400 reads for every write. They’re handling reservations for customers, so you would think the data they display to customers would have to be completely in sync. For starters, you’d be keeping 400 sets of data in sync for the one piece of data that the customer wants to reserve. Second, just because the data is out of sync with the primary transactional database by 3 or 30 or 90 seconds doesn’t mean that it isn’t correct, just that there is a chance that it isn’t correct. This client probably has 100,000 pieces of data in their system at any one time and books 10% of those each day. If those bookings are evenly distributed across the course of a day, they are booking one reservation roughly every 9 seconds (8.64 seconds, to be precise). All things being equal, the chance of a customer wanting a particular booking that is already taken by another customer (assuming a 90-second sync of data) is about 0.01%. Of course even at 0.01% some customers will select a booking that is already taken, which might not be ideal but can be handled in the application by doing a final check before allowing the booking to be placed in the customer’s cart. Certainly every application’s data needs are going to be different, but from this discussion we hope you will get a sense of how you can push back on the idea that all data has to be kept in sync in real time.
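
The arithmetic behind that estimate is easy to reproduce. Here is a back-of-the-envelope sketch using the illustrative figures above (100,000 pieces of data, 10% booked per day, a 90-second replication delay):

```python
# Back-of-the-envelope estimate of stale-read conflicts for the reservation example.
inventory        = 100_000                    # bookable pieces of data in the system
bookings_per_day = 0.10 * inventory           # 10% of the inventory is booked each day
replication_lag  = 90                         # seconds a read copy may trail the write copy

bookings_per_second = bookings_per_day / 86_400              # ~0.116/s, one every ~8.6 s
bookings_in_window  = bookings_per_second * replication_lag  # ~10.4 bookings per lag window

# Chance that the specific item a customer wants was booked during the stale window.
p_conflict = bookings_in_window / inventory
print(f"{p_conflict:.4%}")                    # ~0.0104%
```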

Now that we’ve covered time sensitivity, let’s discuss the ways to distribute the data. One way is to use a caching tier in front of the database. An object cache can be read from instead of going back to the database for each query. Only when the data has been marked expired would the application have to query the primary transactional database to retrieve the data and refresh the cache. We highly recommend this as a first step given the availability of numerous excellent open-source key-value stores that can be used as object caches.
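
A common way to implement such a caching tier is the cache-aside pattern: check the cache first, and only on a miss or an expired entry query the primary transactional database and refresh the cache. The sketch below is a minimal illustration; a plain dictionary stands in for a real key-value store such as memcached or Redis, and load_from_database is a hypothetical placeholder for the actual database query.

```python
import time

CACHE_TTL_SECONDS = 90   # how long a cached object may be served before it must be refreshed
_cache = {}              # key -> (value, expires_at); stands in for memcached/Redis

def load_from_database(key):
    # Hypothetical placeholder for the query against the primary transactional database.
    return {"key": key, "loaded_at": time.time()}

def get_object(key):
    """Cache-aside read: serve from the cache unless the entry is missing or expired."""
    entry = _cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value                                    # cache hit: no database round trip
    value = load_from_database(key)                         # miss or expired: hit the database
    _cache[key] = (value, time.time() + CACHE_TTL_SECONDS)  # refresh the cache
    return value
```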

The next step beyond an object cache between the application tier and the database tier is replicating the database. Most major relational database systems allow for some type of replication “out of the box.” Many databases implement replication through some sort of master-slave concept—the master database being the primary transactional database that gets written to, and the slave databases being read-only copies of the master database. The master database keeps track of updates, inserts, deletes, and so on in a binary log. Each slave requests the binary log from the master and replays these commands on its database. While this is asynchronous, the latency between data being updated in the master and then in the slave can be very low, depending on the amount of data being inserted or updated in the master database. In our client’s example, 10% of the data changed each day, resulting in roughly one update every 9 seconds. This is likely a low enough volume of change to maintain the slave databases with low latency. Often this implementation consists of several slave databases or read replicas that are configured behind a load balancer. The application makes a read request to the load balancer, which passes the request in either a round-robin or least-connections manner to a read replica. Some databases further allow replication using a master-master concept in which either database can be used to read or write. Synchronization processes help ensure the consistency and coherency of the data between the masters. While this technology has been available for quite some time, we prefer solutions that rely on a single write database to help eliminate confusion and logical contention between the databases.
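
One way for the application to take advantage of these read replicas is to route all writes to the single primary and spread reads across the replicas in round-robin fashion, much as the load balancer described above would. The following is a minimal sketch under those assumptions; the connection objects and their execute method are placeholders for whatever database driver is actually in use, and failover handling is omitted.

```python
import itertools

class ReplicatedDatabase:
    """Route writes to the single primary; round-robin reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary                      # the one write (master) connection
        self._replicas = itertools.cycle(replicas)  # endless round-robin over read-only copies

    def execute_write(self, sql, params=()):
        # Inserts, updates, and deletes always go to the primary.
        return self.primary.execute(sql, params)

    def execute_read(self, sql, params=()):
        # Reads can tolerate a little replication lag, so any replica will do.
        return next(self._replicas).execute(sql, params)
```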

We call this type of split (replication) an X axis split, and it is represented on the AKF Scale Cube in Figure 2.1 as the X axis—Horizontal Duplication. An example that many developers familiar with hosting Web applications will recognize is on the Web or application tier of a system: running multiple servers behind a load balancer, all with the same code. A request comes in to the load balancer, which distributes it to any one of the many Web or application servers to fulfill. The great thing about this distributed model on the application tier is that you can put dozens, hundreds, or even thousands of servers behind load balancers, all running the same code and handling similar requests.

The X axis can be applied to more than just the database. Web servers and application servers typically can be easily cloned. This cloning allows transactions to be distributed evenly across systems for horizontal scale. Cloning of application or Web services tends to be relatively easy to perform and allows us to scale the number of transactions processed. Unfortunately, it doesn’t really help us when trying to scale the data we must manipulate to perform these transactions. In-memory caching of data unique to several customers or unique to disparate functions might create a bottleneck that keeps us from scaling these services without significant impact on customer response time. To solve these memory constraints we’ll look to the Y and Z axes of our scale cube.
