An Interview with the authors of “The Practice of Cloud System Administration” on DevOps and Data Security

Win Treese interviews Thomas A. Limoncelli, Strata R. Chalup, and Christina J. Hogan, the authors of The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems, Volume 2, about DevOps, tearing down silos, the challenge of data security, and why we’re all doomed.
Win Treese: A focal point for your new book is “DevOps”. For those not familiar with the term, how would you describe DevOps and its philosophy?

Thomas Limoncelli: DevOps is breaking down the silos of Dev(eloper)s and Op(eration)s so that they work together. In the old days, Devs wrote code and threw it over the wall (or into a shrink-wrap machine) and Ops picked it up and were stuck trying to figure out how to make it operational in an efficient way. Requests for code changes that would improve operability went largely ignored.

Christina Hogan: The result was that Ops resisted change, introducing change management processes and sign-off tasks that made it harder to push new releases into production. This was frustrating for Devs, as it delayed new features being made available to customers and them seeing the benefit of the Devs’ hard work. Debugging was hard because of the long time between developing the code and getting bug reports. Releases were larger and less frequent, and therefore riskier and more error prone, making Ops even more change averse. It was a vicious circle that benefited no one.

Limoncelli: In a web environment, Devs and Ops work at the same company and can work together. If management makes both Devs and Ops responsible for uptime and performance, it becomes natural to tear down the silos and work together. Automation becomes the rule, not the exception. Devs and Ops develop empathy for each other’s roles and can focus on global optimizations, rather than local optimizations that are globally ineffective.

Hogan: Devs benefit from getting new features more quickly into production, and having less overhead and resistance for deploying new releases. Ops benefit from more reliable releases and better operational features.

Strata Chalup: These lessons are not just for web environments. While the techniques and cultural changes came from the web, they are now being applied to all environments because every organization has silos and can benefit from more cooperation.

Treese: Your book is quite comprehensive, possibly with an intimidating amount of detail. How would you advise existing groups about starting to work as a DevOps team?

Limoncelli: It usually happens by a grassroots initiative. Two people from different teams sit down, usually outside of the office, and have a heart-to-heart conversation about ways they can improve the entire system by working closer together. When I worked at Google I would often have team leads meet and discuss their processes and the handoffs between the teams. Soon people would be listing “pain points” and both teams would commit to changes that would make things better.

Alternatively it can start from the top. However, if an executive puts out a mandate that everyone should tear down silos, that’s not going to change anything. They should, instead, set goals for end-to-end improvements in latency (how long it takes a feature to get to production), and quality (how many failed deployments) and then make sure the middle management understands that these improvements can best be made by making it possible for their teams to cooperate directly across silos. In other words, management needs to get out of the way.
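As an illustration only (it is not from the interview), the two end-to-end goals named above, latency and failed deployments, could be tracked with something like the following minimal sketch. The record layout, the sample data, and the function names are all assumptions made for the example.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records: (commit time, time live in production, succeeded?)
DEPLOYMENTS = [
    (datetime(2015, 1, 5, 9, 0), datetime(2015, 1, 7, 9, 0), True),
    (datetime(2015, 1, 6, 9, 0), datetime(2015, 1, 10, 9, 0), False),
    (datetime(2015, 1, 8, 9, 0), datetime(2015, 1, 9, 9, 0), True),
    (datetime(2015, 1, 12, 9, 0), datetime(2015, 1, 13, 9, 0), True),
]


def median_lead_time_hours(deployments):
    """Latency: median hours from commit to running in production."""
    hours = [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed, _ in deployments
    ]
    return median(hours)


def failure_rate(deployments):
    """Quality: fraction of deployments that failed."""
    failed = sum(1 for _, _, succeeded in deployments if not succeeded)
    return failed / len(deployments)


if __name__ == "__main__":
    print(f"median lead time: {median_lead_time_hours(DEPLOYMENTS):.1f} h")
    print(f"failure rate: {failure_rate(DEPLOYMENTS):.0%}")
```

Measuring across the whole pipeline, rather than per team, keeps the numbers aligned with the end-to-end goals instead of rewarding local optimizations.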

Here’s a quick test an executive can use: If one team needs to talk to another to make an improvement, are they permitted to talk directly to the person that they need to talk with to fix the problem, or do they have to go up one management chain, to the common manager, then down another chain, getting approval at each manager until a couple weeks later they’re finally talking to the right person? Or, worse yet, is the message conveyed on the engineer's behalf by management who act as gatekeepers so that critical technical details are lost? If the norm is anything other than direct person-to-person contact, the executives have built a company with bad culture and they need to change that.

Hogan: Not only should employees be encouraged to talk directly to the person they need to talk to, in a highly productive culture people are scolded for trying to go up the management chain rather than talking directly. At most companies it is the other way around.

Treese: The Practice of Cloud System Administration is described as Volume 2, following your previous book, The Practice of System and Network Administration. How have things changed since volume 1?

Hogan: Volume 1 was written in 2000, when outsourcing often meant laying off an IT department and replacing it with a company that did IT services, often using the people that had just been laid off. In retrospect, these deals rarely achieved the desired improvements in cost or quality. Since then using IaaS, PaaS and SaaS offerings has become an alternative way to outsource.

Volume 2 covers designing and operating these services. It looks at the DevOps cultural and organizational approach that is an essential part of successful cloud services. It describes the benefits of continuous integration, continuous delivery and continuous deployment. And it introduces an assessment framework for driving and measuring continuous improvement.

Limoncelli: Volume 1 has a lot of information about how to run a helpdesk and how to maintain a fleet of desktops and laptops. That used to be a big part of system administration. Now helpdesks generally don’t exist, and when they do they’re not staffed by system administrators because the skill set required is not as technical. When Volume 1 was written we were radicals for saying that OS installation should be automated so that each machine started out the same; and that even machines that come from a vendor preloaded with an OS should be wiped and re-installed with the standard configuration. Now that kind of thing is conventional wisdom.

Hogan: Volume 1 is also getting a make-over in the coming year to bring it up-to-date with the current state of the art, but it will still be quite distinct from Volume 2. We refer to Volume 1 as “the enterprise book” and Volume 2 as “the cloud book”. For better or worse, the enterprise is inherently more concerned with helping the end-users to be able to perform their jobs effectively and “ship product” that is not an Internet-based service. Enterprise IT teams need to provide all the services that their end-users need—not just provide a select few really well at massive scale. On the other hand, scaling of most services in the enterprise is much less of an issue than it is “in the cloud”. Techniques in Volume 2 can be applied in the enterprise, of course, particularly in large enterprises that are building private clouds. Also, most enterprises are moving in the direction of becoming customers for the various cloud offerings, rather than building those services themselves. Enterprise SA teams also need to prepare for moving more services into the cloud, and to know when to recommend and how to manage IaaS or PaaS solutions.

Treese: How does the emergence of cloud computing affect enterprise IT? How is it different from outsourcing?

Hogan: Cloud providers offer a clearly defined service, akin to something you buy from a hardware or software vendor. A cloud service does not try to replicate the existing complex mess at the enterprise and support it for less money. The cloud provider leaves the enterprise to decide if its solution meets the enterprise’s needs or not, and to adapt its processes to the cloud solution. The cloud provider takes requests for features and customization, but only implements the ones that make sound business and technical sense, again akin to a hardware or software vendor. When an enterprise moves something into the cloud, the service and the costs are clearly defined up-front and there are fewer surprises. I say “fewer” rather than “no” because enterprises that do not understand what their usage patterns will really be may still end up surprised at the costs. Because the service is clearly defined, economies of scale can be (and are) realized. The provider’s operational costs are amortized over many customers. The customer trades off the ability to customize the solution for a cheaper and better service. The benefits of the standardized cloud service usually outweigh the benefits of the customization.

For most small businesses, using cloud services should be a clear win. They typically just need some standard infrastructure services that should be available as cloud services. That approach should reduce the IT expertise that the company needs, obviate the need for expensive computing infrastructure and datacenter space, and provide them with better reliability than they could otherwise afford.

Larger enterprises tend to have more customized environments. However, there is a lot of commonality between the needs of different companies. We expect that over time companies will move towards using more SaaS solutions, and that SaaS solutions for more services that enterprises need will be developed. We also expect that the challenges posed by data privacy and integrity needs, local regulatory requirements and other blockers will be addressed by cloud providers, and solutions for particular industry sectors such as banking and healthcare will emerge.

Treese: A startup company doesn’t begin with a large-scale operation. How is this book relevant to them in the early days?

Chalup: Some people feel that DevOps is only needed by companies like Google, Facebook and Yahoo!... the unicorns of this industry: amazingly unique and special entities. The truth is that there are no unicorns. The problems that big companies face are the same problems that small companies face. If anything, most big companies are just conglomerates of many, many small teams with all the same problems that small companies have.

Limoncelli: Every startup wants to eventually have millions, or billions, of users. The architectural decisions they make at the start will enable or inhibit that kind of growth. The book has a lot of advice about platform selection, tools, and methodologies that work at small, medium, and large scale.

Treese: Sometimes it can seem impossible to keep up with changing tools, best practices, new products, and new open source projects that all promise to make life better for system administrators and developers. How do you stay on top of those things for your job?

Limoncelli: This is a book of fundamental principles and practices that are timeless. Therefore we don't make recommendations about which specific products or technologies to use. We could provide a comparison of the top five most popular web servers or NoSQL databases or continuous build systems. If we did, then the book would be out of date the moment it was published. Instead, we discuss the qualities one should look for when selecting such things. We provide a model to work from. This approach is intended to prepare the reader for a long career where technology changes over time but they are always prepared. We do, of course, illustrate our points with specific technologies and products, but not as an endorsement of those products and services.

Hogan: The best way to stay on top of all the developments in our field is to go to conferences and talk to people. Find out what they are doing and share your experiences with others. For people outside of the US, it can be a lot harder to get companies to fund attendance of the mostly US-based conferences. However, if you are willing to bear some of the costs yourself (e.g. flights), usually your company will agree to pay the rest (e.g. accommodation, meals and conference fees). If you care about your career and staying current, it’s worth it.

Treese: Many components of the systems you describe in the book are available as commercial or open source products, but they often require substantial work to integrate them. How do you think about making the build-or-buy decision?

Chalup: In a perfect world the exact modules and subsystems we need already exist and our role is simply buy-and-integrate. Sadly the world isn’t like that. Instead we need to build systems when we can do a better job, or there is value in a more tightly integrated solution.

Limoncelli: Google has achieved an unparalleled level of efficiency because they created their own stack; not just software but hardware too. Their software stack is years ahead of anyone else. Their hardware stack is too. They have rethought how the concrete floors of the datacenter should be engineered, to how the roof should be made, and everything in between. As a result they can’t include off-the-shelf products (software or hardware) without a lot of pain. However, that occasional sacrifice is minor compared to the huge advantage from their tight integration.

Treese: Recent security attacks are forcing everyone to up their game when it comes to cybersecurity. What are the best short-term steps for a DevOps team to take?

Hogan: Recent events like Heartbleed, Shellshock and Poodle provided some golden opportunities for DevOps teams to identify bottlenecks. How quickly were their systems patched, and how long did each step in the process take? Which steps need the most improvement in order for the team to be able to upgrade more quickly the next time? These steps are the ones that should be improved in the near term.
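The review Hogan describes can be sketched in a few lines: record how long each step of a patch rollout took, then rank the steps by their share of the total. The step names and hours below are hypothetical, as is the function name; this is a minimal sketch, not a method given in the interview.

```python
# Hypothetical hours spent in each step of one patch rollout.
PATCH_STEPS = {
    "identify affected systems": 6.0,
    "obtain and test patch": 10.0,
    "get change approval": 30.0,
    "deploy to production": 4.0,
}


def rank_bottlenecks(step_hours):
    """Return (step, hours, share of total time), slowest step first."""
    total = sum(step_hours.values())
    ranked = sorted(step_hours.items(), key=lambda item: item[1], reverse=True)
    return [(name, hours, hours / total) for name, hours in ranked]


if __name__ == "__main__":
    for name, hours, share in rank_bottlenecks(PATCH_STEPS):
        print(f"{name:28s} {hours:5.1f} h  {share:.0%}")
```

The steps at the top of the ranking are the candidates for near-term improvement.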

Limoncelli: There’s a new layer in “multi-layer security” called: change-ability. If it is uncomfortable to upgrade software, you’re doomed to be stuck with insecure software. When Heartbleed struck, nearly every device and software package in every server and system needed to be upgraded. This made organizations painfully aware of which products could be upgraded easily and which were virtually impossible (and everything in between). Organizations that had one or two servers that never get upgraded because “the last software upgrade went badly, let’s never do it again” were in for a world of hurt. Every box and software system needs to be change-able: you need to be able to rapidly, frequently, and confidently roll out changes. If the system is change-averse for technical, political, or operational reasons, you need to re-engineer it so that it is change-able. Continuous Integration (CI) isn't about speed, it is about confidence that new releases will be quality. Speed is a happy side effect.

If a system's upgrade process is risky, we avoid upgrades thinking we are reducing risk. However, we are actually creating a bigger risk by becoming “out of practice”. If you upgrade it frequently, you’ll force yourself to get better at it, which reduces risk of failure and improves your ability to react faster to software vulnerabilities. The first few upgrades will be painful, but over time the process will get smoother. Risk of negative outcomes is reduced. The scientific practice of improvement through repetition is “practice makes perfect” and if every grade school child knows this, then IT should too.

Being change-able doesn’t only affect security. An organization that has grown accustomed to never changing internal processes can’t improve itself. When we accept software that is difficult or unable to be upgraded, we become accustomed to its deficiencies. Soon our time is consumed with working around that calcified organization or buggy software. This becomes hugely inefficient and ultimately a competitive disadvantage.

Treese: What are the best long-term security practices to put in place?

Limoncelli: Stop focusing on compliance and focus on being better at protecting your data. Much of the security world is focused on compliance which, basically, prepares you for last year’s problems. Don’t be shocked that a minimum password length and desktops that lock their screens after 2 minutes don’t protect you from Heartbleed. Those were problems in the 1990s. Focus on new architectures that are secure from the start: develop a good certificate management system and then use it consistently for everything; think hard about every type of data you have and whether it should be in the cloud, on premises, or on computers that never connect to networks at all; stop buying software from vendors that have never taken security seriously. If a company has a history of making vulnerable products with no remorse, we must stop giving them money so that they’ll stop hurting society.

I think we need new business models. The current ones aren't working. If you buy a product from someone else, their goal is to make money. Any security fix or patch they do is just a painful subtraction from their highly profitable maintenance contract. There is a perverse incentive that leads to insecure products. Building software securely from the start is too expensive (see A Security Market for Lemons) or delays your product from reaching market; either puts you out of business, leaving snake-oil and sloppy, insecure vendors remaining. It is cheaper to hide problems than to fix them. The alternative is to write your own software stack so that the entire value chain is controlled by you and therefore aligned with your best interests. That is impossibly expensive and requires every company to have technical prowess that is unrealistic to achieve. So basically, we’re doomed.

Hogan: On that cheerful note... I agree with Tom that the current financial incentives for information security are flawed. Finding a solution to this conundrum is non-trivial, and not something that technologists with little real understanding of economics can do alone. However, there is an annual workshop on the economics of information security (WEIS), which brings together economists and technocrats who are interested in this problem. Some of the things that they have analyzed include the effectiveness of using civil liability as a financial incentive for improving security (PDF), the effect of cloud-based SaaS offerings on security (PDF), and the effectiveness and incentives of bi-directional ISP-based filtering (PDF). A recent ACM article by WEIS attendee Cormac Herley indicates that increasing the cost of an attack can significantly reduce the likelihood of successful break-ins by financially motivated attackers. In other words, encryption and strong authentication do help. Also, the IETF is actively looking into standards and protocol revisions to enable, and in some cases enforce, better security practices.

So, in a nutshell, our recommendations are: integrate security from the ground up; know your data; understand your data security requirements; use encryption and strong authentication; manage your encryption keys well; stay current; be change-able; and avoid vendors who don’t take security seriously.
