"Your organization needs this book!"
--Peter Salus, Chief Knowledge Officer, Matrix.Net, "The Bookworm"
This book describes the best practices of system and network administration, independent of specific platforms or technologies. It features six key principles of site design and support practices: simplicity, clarity, generality, automation, communication, and basics first. It examines the major areas of responsibility for system administrators within the context of these principles. The book also discusses change management and revision control, server upgrades, maintenance windows, and service conversions. You will find experience-based advice on topics such as:
And there's more! When was the last time you read a book that dealt with:
Chapters are divided into The Basics and The Icing. The Basics are those key elements that, when done right, make every other aspect of the job easier. These include starting all new hosts with the same configuration and picking the right things to automate first. The Icing sections contain all those powerful things that can be done on top of the basics to wow customers and managers. Do the basics first. The icing is a vision for the future that usually only comes with decades of experience.
About the Authors.
Do These Now!
Use a Trouble-Ticket System.
Manage Quick Requests Right.
Start Every New Host in a Known State.
I. THE PRINCIPLES.
1. Desktops.
Loading the System Software and Applications Initially.
Updating the System Software and Applications.
Dynamic DNS with DHCP.
High Confidence in Completion.
Involve Customers in the Standardization Process.
A Variety of Standard Configurations.
Buy Server Hardware for Servers.
Vendors Known for Reliable Products.
Does Server Hardware Really Cost More?
Maintenance Contracts and Spare Parts.
Servers Live in the Data Center.
Same, Different, or a Stripped-Down OS on Clients.
Remote Administration Access.
Mirrored Root Disks.
Redundant Power Supplies.
Full and n + 1 Redundancy.
Separate Networks for Administrative Functions.
Opposing View: Many Inexpensive Workstations.
Single or Multiple Servers.
Centralization and Standards.
Learn the Customer's Problem.
Find the Problem's Cause and Fix It.
Have the Right Tools.
Formal Training on the Tools.
End-to-End Understanding of the System.
Conclusion.
5. Fixing Things Once.
Fix Things Once, Rather Than Over and Over.
Avoid the Temporary Fix Trap.
Learning from Carpenters.
Namespaces Need Policies.
Namespaces Need Change Procedures.
Namespace Management Should Be Centralized.
One Huge Database That Drives Everything.
Customers Do Many of the Updates.
Next-Level Namespace Ubiquity.
Conclusion.
7. Security Policy.
Build Security Using a Solid Infrastructure.
Ask the Right Questions.
Document the Company's Security Policies.
Basics for the Technical Staff.
Management and Organizational Issues.
Make Security Pervasive.
Stay Up-to-Date: Contacts and Technologies.
Conclusion.
8. Disaster Recovery and Data Integrity.
What Is a Disaster?
Professional Code of Conduct.
Network/Computer User Code of Conduct.
Privileged Access Code of Conduct.
Working with Law Enforcement.
Setting Expectations on Privacy and Monitoring.
Being Told to Do Something Illegal/Unethical.
II. THE PROCESSES.
10. Change Management and Revision Control.
Process and Documentation.
Change Management Meetings.
Streamline the Process.
Conclusion.
11. Server Upgrades.
The Steps in Detail.
Add and Remove Services at the Same Time.
Reusing the Tests.
A Dress Rehearsal.
Install Old and New Versions on the Same Machine.
Minimal Changes From the Base.
Conclusion.
12. Maintenance Windows.
The Master Plan.
Mechanics and Coordination.
Deadlines for Change Completion.
Comprehensive System Testing.
Re-enable Remote Access.
Visible Presence the Next Morning.
Mentoring a New Flight Director.
Trending of Historical Data.
Providing Limited Availability.
Conclusion.
13. Service Conversions.
Small Groups First, Then Expand Communication.
Layers Versus Pillars.
Avoid Explicit Conversions.
Conclusion.
14. Centralization and Decentralization.
Candidates for Centralization.
Candidates for Decentralization.
III. THE PRACTICES.
15. Helpdesks.
Have a Helpdesk.
A Friendly Face.
Defined Scope of Coverage.
Defined Processes for Staff.
An Escalation Process.
Out of Hours and 24 x 7 Coverage.
Better Advertising for the Helpdesk.
Different "Desks" for Service Provision Versus Problem Resolution.
Conclusion.
16. Customer Care.
Ticket Tracking Software.
Phase A: The Greeting.
Phase B: Problem Identification ("What's Wrong?").
Phase C: Planning and Execution ("Fix It").
Phase D: Verification ("Verify It").
Perils of Skipping a Step.
Team of One.
Training Based on the Model.
The Single Point of Contact.
Increasing Customer Familiarity.
Special Announcements for Major Outages.
Customers That Know the Process.
Architectural Decisions That Match the Process.
Conclusion.
17. Data Centers.
Picking a Location.
Power and Air.
Tools and Supplies.
Ideal Data Centers.
Tom's Dream Data Center.
Christine's Dream Data Center.
The OSI Model.
Intermediate Distribution Frame.
Main Distribution Frame.
Simple Host Routing.
Use Network Devices.
Number of Vendors.
Single Administrative Domain.
Leading-Edge Versus Reliability.
Multiple Administrative Domains.
Conclusion.
19. Email Service.
High-Volume List Processing.
Conclusion.
20. Print Service.
Select the Level of Centralization.
Print Architecture Policy.
Designing the System.
Automatic Fail-Over and Load Balancing.
Dedicated Clerical Support.
Dealing with Printer Abuse.
Conclusion.
21. Backup and Restore.
Three Reasons for Restores.
The Backup Schedule.
Time and Capacity Planning.
The Restore Process.
Backup Media and Off-Site Storage.
High DB Availability.
Conclusion.
22. Remote Access Service.
Remote Access Requirements.
Define a Remote Access Policy.
Define Service Levels.
Cost Analysis and Reduction.
Conclusion.
23. Software Depot Service.
Understand the Justification.
Understand the Technical Expectations.
Set the Policy.
Selecting Depot Software.
Create the Process Manual.
A Unix Example.
A Windows Example.
Different Configurations for Different Hosts.
Including Commercial Software in the Depot.
Handling Second-Class Citizens.
Conclusion.
24. Service Monitoring.
Application Response Time Monitoring.
IV. MANAGEMENT.
25. Organizational Structures.
Consultants and Contractors.
Sample Organizational Structures.
Universities and Non-Profit Organizations.
Conclusion.
26. Perception and Visibility.
A Good First Impression.
Attitude, Perception, and Customers.
Align Your Priorities with Customer Expectations.
Be the System Advocate.
The System Status Web Page.
Mail to All Customers.
Conclusion.
27. Being Happy.
Organizing for Excellent Follow-Through.
Constant Professional Development.
Learn to Negotiate.
Loving Your Job.
Managing Your Manager.
Conclusion.
28. A Guide for Technical Managers.
Working with Nontechnical Managers.
Working with Your Employees.
Make Your Team Even Stronger.
Sell Your Department to Senior Management.
Work on Your Own Career Growth.
Do Something You Enjoy.
Conclusion.
29. A Guide for Nontechnical Managers.
Look for One-Year Plans.
Technical Staff and the Budget Process.
Have a Five-Year Vision.
Meetings with Single Point of Contact.
Understand the Technical Staff's Work.
Conclusion.
30. Hiring System Administrators.
Select the Interview Team.
Sell the Position.
Conclusion.
31. Firing System Administrators.
Follow Your Corporate HR Policy.
Remove Physical Access.
Remove Remote Access.
Remove Service Access.
Fewer Access Databases.
A Single Authentication Database.
Monitoring System File Changes.
The goal of this book is to write down all the things that we've learned from our mentors and our real-world experiences. These are the things that are beyond what the manuals and the usual system administration books teach. System administrators (SAs) often find themselves swamped with work, struggling to keep the site running, and faced with requests for new technologies from their customers. Servers are overloaded or unreliable, but fixing the problem requires weeks of planning and painstakingly untangling a mess of services so that they can be moved to new machines. Hidden dependencies are lurking around every corner, and getting bitten by one can be catastrophic. In the meantime, repetitive day-to-day tasks still need to be done. The challenges seem insurmountable.
Most sites grow organically, with little thought given to the big picture as each little change is implemented. Haphazardly, SAs learn about the fundamentals of good site design and support practices. They are taught by mentors, if at all, about the importance of simplicity, clarity, generality, automation, communication, and doing the basics first. These six principles are recurring themes in this book.
These principles are universal. They apply at all levels of the system. They apply to physical networks and to computer hardware. They apply to all operating systems running at the site, all protocols used, all software, and all services provided. They apply at universities, non-profit institutions, government sites, businesses, and Internet service sites.
Explaining What System Administration Entails
It's difficult to define system administration, but trying to explain it to a nontechnical person is even more difficult, especially if that person is your mom. Moms have the right to know how their offspring are paying their rent. A friend of Christine's always had trouble explaining to his mother what he did for a living and ended up giving a different answer every time she asked. Therefore, she kept repeating the question every couple of months, waiting for an answer that would be meaningful to her. Then he started working for WebTV. When the product became available, he bought one for his mom. From then on, he told her that he made sure that her WebTV service was working and was as fast as possible. She was very happy that she could now show her friends something and say, "That's what my son does!"
System administrators do many things. They look after computers, networks, and the people who use them. An SA may look after hardware, operating systems, software, configurations, applications, or security. A system administrator is someone who influences how effectively other people can use their computers and networks.
System administration matters because computers and networks matter. Computers are a lot more important than they were years ago. What happened?
First of all, the technology has changed. Corporate computers used to be independent; now they are connected. Business processes used to have a component that involved using a computer; now entire processes are done online and come to a halt if any part of the system is broken.
The widespread use of the Internet and intranets, and the move to a dot-com world, have redefined the way companies depend on computers. The Internet is a 24 x 7 operation, and sloppy operations can no longer be tolerated. A paper purchase order can be processed any time, anywhere; therefore, there is an expectation that the computer system that automates the process will be available all the time, from anywhere. Nightly maintenance windows have become an unheard-of luxury. That unreliable power system in the machine room that caused occasional but bearable problems now prevents sales from being recorded.
The biggest change, however, is due to CEOs putting a new importance on computing. In business, nothing is important unless the CEO feels it is important. The CEO controls funding and sets priorities. Now CEOs have become dependent on email. They notice when an outage or an overloaded system slows down their email. The massive preparations for Y2K also brought home to CEOs how dependent their organizations have become on computers.
We use the term chief executive officer (CEO) loosely to mean the top person in an organization. Educational institutions have CEOs; they're just called president, provost, proctor, or head. Governments have CEOs, too; they're just called mayor, governor, prime minister, leader, or president.
Management now has a more realistic view of computers. Previously, people had unrealistic ideas of what computers could do, seeing them as they were portrayed in film: big, all-knowing, self-sufficient miracle machines. This has changed. Even the need for SAs is now portrayed in films. In 1993, Jurassic Park (Crichton 1993) was the first mainstream movie to portray computers as needing system administration, leading to a better public understanding of what it is.
Computers matter more than ever. If computers are to work and work well, then system administration matters. We matter.
This book was born from our experiences as SAs in a variety of companies. We have helped sites to grow. We have worked at small start-ups and universities, where lack of funding was an issue. We have worked at mid-size and large multinationals, where mergers and spin-offs give rise to more challenges. We've worked at fast-paced companies that do business on the Internet and have high-availability, high-performance, and rapid scaling issues. On the surface, these are very different environments with diverse challenges. But underneath, they all need the same building blocks, and the same fundamental principles apply.
This book gives you a framework, a way of thinking about system administration problems, rather than a narrow how-to solution to a particular problem. Given a solid framework, you can solve problems every time they appear, no matter what operating system (OS), brand of computer, or type of environment is involved. This book is unique because it looks at system administration from this point of view, whereas most books for SAs focus on how to maintain one particular type of OS. With experience, however, all SAs learn that the big-picture problems and solutions are largely independent of the platform. This book will change the way you approach your work as an SA and the way you view the site you maintain.
The principles in this book apply to all environments. The approaches described may need to be scaled up or down, depending on your environment, but the basic principles still apply. In chapters where we felt it might not be obvious how to apply the information to other environments, we have included a section that illustrates how to apply the principles at different companies.
This book is not about how to configure or debug a particular OS. It will not tell you how to recover the shared libraries or DLLs when someone accidentally moves them. There are some excellent books that do cover those topics, and we will refer you to many of them throughout the book. What we will discuss here are the principles of good system administration, both basic and advanced, that we have learned through our own and others' experiences. These principles apply to all OSs. Following them well can make your life a lot easier. If you improve the way you approach problems, the benefit will be multiplied. Get the fundamentals right, and everything else falls into place. If they aren't done well, you will waste time repeatedly fixing the same things, and your customers[2] will be unhappy because they can't work effectively with broken machines.
[2] Throughout the book, we refer to the end users of our systems as customers rather than users. A detailed explanation of why we do this is in Section 26.1.2.
We believe that SAs of all levels will benefit from reading this book. It gives junior SAs insight into the bigger picture of how sites work, their roles in the organizations, and how their careers can progress. Intermediate SAs will learn how to approach more complex problems and how to improve their sites, making their jobs easier and more interesting and their customers happier. It will help you understand what is behind your day-to-day work, learn the things that you can do now to save time in the future, decide policy, become an architect and designer, plan far into the future, negotiate with vendors, and interface with management. These are the things that concern senior SAs. None of them are covered in an OS's manual. Even senior SAs and systems architects can learn from our experiences and the experiences of our colleagues that are captured in these pages, as we have learned from each other in writing this book. We also cover several management topics, both for SA managers and for SAs who aspire to move into management.
The easiest way to learn is usually by example, particularly in practical areas like system administration. Throughout the book, we use examples to illustrate the points we are making. The examples are mostly from medium or large sites, where scale adds its own problems. Typically, the examples are generic rather than specific to a particular OS, although some are OS-specific, usually Unix or Windows. One of the strongest motivations we had for writing this book is the understanding that the problems SAs face are the same across all OSs. A new OS that is significantly different from what we are used to can seem like a black box, a nuisance, or even a threat. However, despite the unfamiliar interface, as we get used to the new technology, we eventually realize that we face the same set of problems in deploying, scaling, and maintaining the new OS. Recognizing that fact, knowing what problems need solving, and understanding how to approach the solutions by building on experience with other OSs let us master the new challenges more easily.
We want this book to be something that changes your career. We want you to become so successful that if you see us on the street you'll give us a great big hug.
This book has four major parts:
The book ends with several appendices.
Each chapter discusses a different topic, and the topics vary from the technical to the nontechnical. If one chapter doesn't apply to you, feel free to skip it. The chapters are linked to each other, so you may find yourself returning to a chapter that you previously thought was boring. We won't be offended.
There are two halves to each chapter: The Basics and The Icing. The Basics discusses the essentials that you just plain have to get right. Skipping any of these items will simply create more work for you in the future. Consider them investments that pay off in efficiency later on. The Icing deals with the cool things that you can do to be spectacular. Don't spend your time on these things until you are done with The Basics. We have tried to drive the points home through anecdotes and case studies from personal experience. We hope that this makes the advice here more real for you. Never trust salespeople who don't use their own products.
Each chapter stands on its own. Feel free to jump around. However, we have carefully ordered the chapters so that they make the most sense if you read the book from start to finish. Either way, we hope you enjoy the book. We have learned a lot and had a lot of fun writing it. Let's begin.
Thomas A. Limoncelli
P.S. Books, like software, always have bugs. We intend to maintain a list of updates to this book on its web site: http://www.awl.com/cseng/titles/0-201-70271-1 or our web site, http://www.EverythingSysAdmin.com. Please visit!