My previous article discussed the importance of having a QA function in the infrastructure development and support organization. This article describes that group's responsibilities.
The production control group handles second-level production support. It acts as a liaison between the applications development staff, the user community, database administration, systems administration, and computer operations to resolve production problems, implement new systems, and change existing systems. It also owns key data center processes.
Three-Tier Support Model
Whoever developed the idea of three levels of support within an organization was a genius. This is one of the best structures ever designed, and probably the single most important reason that the mainframe world—in particular, the data center—was so successful. Some of the roles and responsibilities of this structure are described in the following sections.
First-Level Support
Monitor systems (servers, network, peripheral devices).
Perform incremental and full backups.
Provide tape librarian functions.
Assist in the physical layout of production servers.
Issue trouble tickets and monitor the data center on a 24x7 basis.
Perform first-level problem determination and attempt resolution. After N minutes, as determined by the problem management process, the problem is escalated to second-level support.
Second-Level Support
Process design, implementation, ownership, and accountability (production acceptance, change management, etc.).
Support software installation and configuration.
Perform system maintenance as required.
Perform storage management functions.
Provide 24x7 on-call support.
Perform disaster-recovery drills.
Establish end-of-life plans to deactivate servers and applications.
Monitor system and network performance.
Provide online availability statistics.
Define and set standards to support mission-critical applications.
Perform problem determination and attempt resolution. After N minutes, as determined by the problem management process, the problem is escalated to third-level support.
Second-level support should do everything possible to resolve the problem before escalating to the third level: the senior gurus of the department. Senior system administrators and database administrators are worth their weight in gold. The entire organization needs to protect this valuable resource.
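The time-based escalation rule described above can be sketched in a few lines of Python. This is only an illustration, not a real ticketing system: the `Ticket` class, the `escalate_if_due` function, the tier names, and the 15- and 30-minute limits are all assumptions made for the example. The article says only that N is set by the problem management process.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical tier names, ordered from first to third level.
TIERS = ["first-level", "second-level", "third-level"]

# Assumed values of N per tier; the real values come from the
# problem management process. Third level has no limit.
ESCALATE_AFTER = {
    "first-level": timedelta(minutes=15),
    "second-level": timedelta(minutes=30),
}


@dataclass
class Ticket:
    summary: str
    opened_at: datetime
    tier: str = "first-level"
    tier_entered_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # A new ticket enters its starting tier when it is opened.
        if self.tier_entered_at is None:
            self.tier_entered_at = self.opened_at


def escalate_if_due(ticket: Ticket, now: datetime) -> bool:
    """Move the ticket up one tier if its current tier has used up its N minutes.

    Returns True when the ticket was escalated, False otherwise.
    """
    limit = ESCALATE_AFTER.get(ticket.tier)
    if limit is None:
        return False  # third level: nowhere further to escalate
    if now - ticket.tier_entered_at < limit:
        return False  # the current tier still has time to resolve it
    ticket.tier = TIERS[TIERS.index(ticket.tier) + 1]
    ticket.tier_entered_at = now  # the clock restarts at the new tier
    return True
```

A monitoring job could call `escalate_if_due` on every open ticket at a fixed interval. Restarting the clock on each escalation gives second-level support its full window before the problem reaches the senior staff at third level.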
Third-Level Support
Determining the physical location of the server, network connections, and sufficient power for all peripherals.
Running preventive maintenance diagnostics on all incoming equipment.
Partitioning the disks during OS installation.
Configuring the OS.
Applying patches to the OS as needed.
Assisting database administration with RDBMS installations.
Installing any unbundled products, such as tape management and disk mirroring, and applying patches to unbundled products as needed.
Installing all required support packages, such as the console server, auto-pager, preventive maintenance routines, and so on.
Supporting software installation and configuration.
Maintaining and configuring system security.
Performing system maintenance as required.
Providing 24x7 on-call support.
Performing disaster-recovery drills.
Monitoring system and network performance.
Tuning systems for peak performance.
Implementing capacity planning.
Performing security audits and monitoring security access.
Establishing system user accounts and root ownership.
Defining and setting standards to support mission-critical applications.
Resolving problems. The buck stops here; if third-level support can't fix the problem, no one can.
Designing and architecting infrastructure-related programs.