
This chapter is from the book

System Monitoring

This process aims to provide continuous knowledge of system availability, health, and status. It does so by monitoring all server, database, and application resources; responding to system- and application-generated requests and events; automating the handling of monitored events; and rapidly diagnosing and resolving availability problems.

Tasks

Monitor the health of enterprise systems

Determine when problems exist and escalate as required

Ensure optimal availability, using predefined procedures to recover systems when problems occur

Define processes and procedures to optimize the system monitoring process

Skills

Expertise with selected monitoring tools

Ability to identify basic Level 1 problems

Knowledge of management protocols (such as SNMP)

Knowledge of component behavior (operating systems, databases, middleware, and so on)
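The tasks of detecting problems, recovering with predefined procedures, and escalating what cannot be recovered can be pictured as a small event-handling loop. The sketch below is illustrative only: the threshold values, metric names such as cpu_percent, and the Event, Outcome, check, and handle names are all hypothetical, and a real installation would take its thresholds and recovery procedures from its own monitoring tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical thresholds: metric name -> maximum healthy value.
THRESHOLDS: Dict[str, float] = {"cpu_percent": 90.0, "disk_percent": 85.0}


@dataclass
class Event:
    resource: str  # e.g. a server or database name
    metric: str    # e.g. "cpu_percent"
    value: float


@dataclass
class Outcome:
    event: Event
    action: str    # "auto-recovered" or "escalated"


def check(resource: str, metrics: Dict[str, float]) -> List[Event]:
    """Return an event for every monitored metric that breaches its threshold.
    Metrics without a configured threshold are ignored."""
    return [Event(resource, name, value)
            for name, value in metrics.items()
            if name in THRESHOLDS and value > THRESHOLDS[name]]


def handle(event: Event,
           procedures: Dict[str, Callable[[Event], bool]]) -> Outcome:
    """Run the predefined recovery procedure for the event's metric, if one
    exists; escalate when there is no procedure or the procedure fails."""
    procedure = procedures.get(event.metric)
    if procedure is not None and procedure(event):
        return Outcome(event, "auto-recovered")
    return Outcome(event, "escalated")


def clear_temp_files(event: Event) -> bool:
    # Placeholder for a real recovery action; here it always reports success.
    return True


if __name__ == "__main__":
    procedures = {"disk_percent": clear_temp_files}
    events = check("db01", {"cpu_percent": 97.0, "disk_percent": 91.0,
                            "mem_percent": 99.0})
    for outcome in (handle(e, procedures) for e in events):
        print(outcome.event.resource, outcome.event.metric, outcome.action)
    # db01 cpu_percent escalated
    # db01 disk_percent auto-recovered
```

Every event that ends in "escalated" is one a person must handle, which is what the "percentage of events handled manually" metric later in this section counts.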

Staffing

Console specialist

Systems operations specialist

Availability specialist

Automation Technology

OEM-supplied tools

Instrumentation

Suites

Best Practices

Extremely high level of automated monitoring

Use of standard instrumentation provided by system suppliers

Ability to integrate event data across processes

Ability to integrate and present system information to different operational groups

Integration of system monitoring with automation, notification, and problem management systems

Integration of event data with service-level agreement reporting

Web-based user access to system management data

Metrics

Class and aggregate resource availability

Number of elements monitored per employee

Employees per 10,000 events

Unit cost of monitoring per 10,000 events

Percentage of events handled manually
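The metrics above are listed without formulas. One plausible way to compute them for a single reporting period is sketched below; the field names, the MonitoringPeriod type, and the choice to weight class availability by scheduled hours are assumptions, not definitions taken from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MonitoringPeriod:
    # All figures are assumed to cover the same reporting period.
    elements_monitored: int  # servers, databases, applications, ...
    employees: float         # monitoring staff, in full-time equivalents
    total_events: int
    manual_events: int       # events that needed human handling
    total_cost: float        # cost of the monitoring process


def process_metrics(p: MonitoringPeriod) -> Dict[str, float]:
    """Staffing, cost, and automation metrics for one period."""
    units_of_10k = p.total_events / 10_000
    return {
        "elements_per_employee": p.elements_monitored / p.employees,
        "employees_per_10k_events": p.employees / units_of_10k,
        "unit_cost_per_10k_events": p.total_cost / units_of_10k,
        "pct_events_manual": 100.0 * p.manual_events / p.total_events,
    }


def class_and_aggregate_availability(
        resources: List[Tuple[str, float, float]]
) -> Tuple[Dict[str, float], float]:
    """resources: (resource class, uptime hours, scheduled hours) per resource.
    Returns availability (%) per class and across all resources."""
    up: Dict[str, float] = defaultdict(float)
    scheduled: Dict[str, float] = defaultdict(float)
    for cls, uptime, sched in resources:
        up[cls] += uptime
        scheduled[cls] += sched
    by_class = {cls: 100.0 * up[cls] / scheduled[cls] for cls in up}
    aggregate = 100.0 * sum(up.values()) / sum(scheduled.values())
    return by_class, aggregate


if __name__ == "__main__":
    print(process_metrics(MonitoringPeriod(1200, 4, 50_000, 2_500, 20_000.0)))
    print(class_and_aggregate_availability([
        ("database", 990.0, 1000.0),
        ("database", 1000.0, 1000.0),
        ("web", 950.0, 1000.0),
    ]))
```

With the sample figures, 1,200 elements watched by 4 people gives 300 elements per employee, and 50,000 events at a cost of 20,000 gives a unit cost of 4,000 per 10,000 events. The availability figures come out to 99.5% for the database class, 95% for the web class, and 98% in aggregate.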

Process Integration

Performance management

Problem management

Futures

Further consolidation of resource-centric data related to monitoring (event, problem, asset, change)

Additional cross-platform integration, including integration with console automation, into business-process and application views

Capabilities for deriving the business impact of outages
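Deriving business impact from outages depends on knowing which business processes rely on which monitored resources. The sketch below assumes a hand-maintained dependency map and a per-process hourly cost of downtime; the process names, resource names, and cost figures are hypothetical. It also assumes that any failed dependency takes the whole process down, so redundancy is not modeled.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: business process -> resources it relies on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "order-entry": {"web01", "app01", "db01"},
    "payroll": {"app02", "db02"},
    "reporting": {"db01", "db02"},
}

# Hypothetical cost of one hour of downtime for each business process.
HOURLY_IMPACT: Dict[str, float] = {
    "order-entry": 5000.0,
    "payroll": 800.0,
    "reporting": 200.0,
}


def affected_processes(down: Set[str]) -> List[str]:
    """Business processes that depend on at least one failed resource."""
    return sorted(p for p, resources in DEPENDENCIES.items() if resources & down)


def outage_impact(down: Set[str], hours: float) -> float:
    """Estimated business cost of an outage lasting `hours`."""
    return sum(HOURLY_IMPACT[p] * hours for p in affected_processes(down))


if __name__ == "__main__":
    print(affected_processes({"db01"}))   # ['order-entry', 'reporting']
    print(outage_impact({"db01"}, 2.0))   # 10400.0
```

Keeping the dependency map alongside event, asset, and change data is one way the resource-centric consolidation listed above could feed this kind of calculation.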

