Deployment Considerations for Data Center Management Tools

Build a better management infrastructure by understanding more about the basic building blocks, architecture, and key design elements of a complete Systems Management Tools Framework, as presented in this first article in a two-part series.
Introduction

This article describes some of the main aspects to consider when deploying a data center management tools infrastructure (DCMTI). It also includes considerations to keep in mind when complementing this environment with a process management tool to facilitate the integration with other external processes such as, but not limited to, a help desk function.

This article is a prelude to a follow-on article that will describe an actual implementation of such a management architecture in one of Sun's iForce(SM) Ready Center programs.

The topics in this article are:

  • Main Considerations
  • Architecture
  • Other Considerations

The main considerations when designing and implementing a DCMTI are:

  • Create visibility at all layers for all aspects. The "FCAPS" section describes these aspects (fault, configuration, accounting, performance, and security).

  • Create a process management environment to facilitate interaction with other organizations and service request control.

Considering these aspects results in a management architecture that has five major components:

  • Agents
  • Management servers and consoles
  • Correlation and framework server
  • Consoles
  • Process management tool

The physical distribution within the management architecture can vary based on specific requirements. However, the natural separation points are:

  • Between the agents and the server

  • Between the management servers and the Framework server

  • Between the Framework server and the process server

We recommend a separate management network for performance, visibility and security reasons.

After reading this document, you should have a good understanding of some of the main aspects to consider when building a DCMTI, and you will be ready to begin deploying a DCMTI. A follow-on article will describe the details of a deployment that incorporates the suggestions in this article.

Main Considerations

A good DCMTI provides the information to support several different views into the managed environment. These views are often organized by layer: facilities, network, compute and storage, application infrastructure (Lightweight Directory Access Protocol (LDAP), Domain Name System (DNS), relational database management system (RDBMS), Network Time Protocol (NTP), and so forth), and, at the top, the business application.

In addition to these views, there should also be a Service Level Management (SLM) view. The main objective of this view is to show how the service provided measures against a predefined Service Level Agreement (SLA) and its associated Service Level Objectives (SLOs). The articles Service Level Management in the Data Center and Building a Service Level Agreement in the Data Center describe the main concepts of SLAs and SLOs, so no additional details are included herein.

The views by layer must provide information on all aspects that the operations staff deems important to keep the systems up and running. The International Organization for Standardization (ISO) has defined five areas (FCAPS) that completely address this requirement.

FCAPS

The FCAPS aspects are:

  • Fault
  • Configuration
  • Accounting
  • Performance
  • Security

Fault

This aspect looks at the status of the components and whether they are performing within set thresholds. It is event based. Broken disks and dead processes are examples of events.
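
As a rough illustration of event-based fault checking against set thresholds, the following minimal sketch compares readings to limits and emits one event per violation. The metric names, limits, and event layout are assumptions made for this example and do not come from any particular management tool.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str   # hypothetical metric name
    low: float    # lowest acceptable value
    high: float   # highest acceptable value


def check_thresholds(readings, thresholds):
    """Return one fault event for every metric that is missing or out of range."""
    events = []
    for t in thresholds:
        value = readings.get(t.metric)
        if value is None:
            # No reading at all, for example a dead process that stopped reporting.
            events.append({"metric": t.metric, "reason": "no data"})
        elif not (t.low <= value <= t.high):
            events.append({"metric": t.metric, "reason": "out of range", "value": value})
    return events


# Illustrative data: one disk is nearly full and one process reports nothing.
readings = {"disk_used_pct": 97.0, "cpu_load": 0.4}
thresholds = [
    Threshold("disk_used_pct", 0.0, 90.0),
    Threshold("cpu_load", 0.0, 8.0),
    Threshold("httpd_procs", 1.0, 500.0),
]
events = check_thresholds(readings, thresholds)
```

In this sketch a missing reading is treated as a fault in its own right, which is one simple way to surface a dead process.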

Configuration

This aspect manages the configuration of the IT components. It tracks the parameters and values of the IT components. Preferably, a history of configurations is maintained so that a bad change can be backed out easily.
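
The idea of keeping a configuration history so that a bad change can be backed out might look like the following sketch. The class name and the parameters in the usage lines are hypothetical; a real deployment would persist the history rather than hold it in memory.

```python
class ConfigHistory:
    """Keep every applied configuration so the latest change can be backed out."""

    def __init__(self, initial):
        self._versions = [dict(initial)]

    @property
    def current(self):
        # Return a copy so callers cannot alter the stored history.
        return dict(self._versions[-1])

    def apply(self, **changes):
        """Record a new version that overlays the changes on the current one."""
        new_version = dict(self._versions[-1])
        new_version.update(changes)
        self._versions.append(new_version)
        return self.current

    def back_out(self):
        """Drop the latest version; the initial configuration is never removed."""
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current


history = ConfigHistory({"mtu": 1500, "ntp_server": "ntp1"})
history.apply(mtu=9000)          # a change that turns out to be bad
restored = history.back_out()    # back to the previous known configuration
```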

Accounting

This aspect is an older concept that stems from the mainframe world. It is the ability to track usage of system resources and relate that to business units and/or customers to enable billing. An interesting side note is that, with the emerging ASP business models, accounting has received renewed interest.

Performance

This aspect manages the challenging task of monitoring how fast or slow a system responds and processes transactions. Key processes in this area are performance tuning and capacity planning, where historical data is submitted for analysis to discover trends or to model anticipated changes in the environment.

Security

This aspect manages the complete infrastructure from an authentication, authorization and access perspective. Security is very pervasive and should be addressed early in the architecture design and deployment phases.

As mentioned earlier, all of these aspects should be managed at all layers in the infrastructure. TABLE 1 shows that concept. An advantage of this representation is that it enables a quick overview to assess and identify areas that are candidates to be addressed by the management infrastructure.

TABLE 1 FCAPS Overview

Layer                                                  Fault  Configuration  Accounting  Performance  Security
Business application                                   5      2              2           3            2
Application infrastructure (RDBMS, LDAP and so forth)  5      2              1           1            1
Compute and storage platform                           5      3              1           3            2
Network                                                5      2              1           3            3
Facilities                                             5      2              2           1            3

The numbers in this example indicate a level of compliance. Five means "well covered" and zero means "not covered." The same table can be used to describe the requirements for a DCMTI. In that case, five could mean "important requirement" and zero could mean "no requirement."
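
One way to use such a table for a quick assessment is to hold it as data and list the weakly covered cells. The sketch below uses the values from TABLE 1; the cutoff of 1 for "weak" is an assumption chosen for illustration, not a figure from this article.

```python
ASPECTS = ("fault", "configuration", "accounting", "performance", "security")

# Coverage scores per layer, in ASPECTS order, copied from TABLE 1.
COVERAGE = {
    "business application": (5, 2, 2, 3, 2),
    "application infrastructure": (5, 2, 1, 1, 1),
    "compute and storage platform": (5, 3, 1, 3, 2),
    "network": (5, 2, 1, 3, 3),
    "facilities": (5, 2, 2, 1, 3),
}


def weak_spots(coverage, at_most=1):
    """Return (layer, aspect) pairs whose score is at or below the cutoff."""
    return [
        (layer, aspect)
        for layer, scores in coverage.items()
        for aspect, score in zip(ASPECTS, scores)
        if score <= at_most
    ]


candidates = weak_spots(COVERAGE)
```

With these values, the candidates are concentrated in accounting and in the application infrastructure layer, which is the kind of overview the table is meant to give.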

Interaction With Other Organizations

In addition to the views that represent the appropriate aspects organized by layer, a process management tool is a very important consideration.

A process management tool facilitates the transition of activities into other processes, and it facilitates the following main aspects:

Service Request Control

  • Status update (new, latest event and so on)

  • Progress enforcement (escalation, if needed)

  • Qualification and routing (where next?)

  • Closure (quality control surveys and so on)

Reporting

  • Periodic reports

    • Management

    • Service Performance

  • Exception reports

These functions are often provided by a help desk or customer care desk. However, in the context of this document, the management infrastructure is assumed to be capable of generating requests based on predefined rules. The rules to determine when to create a request are implemented and enforced at the alert consolidation and correlation layer in the management infrastructure. The "Architecture" section details this process.
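
Rule-based request generation of this kind could be sketched as a list of predicates evaluated against each consolidated event. Everything named here (the Event fields, the 1 to 5 severity scale, the two rules, and the request layout) is a hypothetical illustration rather than the behavior of a specific correlation product.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Event:
    source: str    # component that raised the event, e.g. a host name
    layer: str     # layer of the managed environment, e.g. "network"
    aspect: str    # one of the FCAPS aspects
    severity: int  # assumed scale: 1 (informational) to 5 (critical)


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Event], bool]
    category: str  # category given to the generated request


# Hypothetical rules; a real rule set would be agreed with operations staff.
RULES = [
    Rule("critical fault", lambda e: e.aspect == "fault" and e.severity >= 4, "incident"),
    Rule("security event", lambda e: e.aspect == "security" and e.severity >= 3, "security review"),
]


def request_for(event: Event, rules=RULES) -> Optional[dict]:
    """Return a request for the first matching rule, or None if no rule applies."""
    for rule in rules:
        if rule.matches(event):
            return {"source": event.source, "category": rule.category, "rule": rule.name}
    return None


incident = request_for(Event("db01", "compute and storage", "fault", 5))
ignored = request_for(Event("db01", "compute and storage", "performance", 5))
```

Events that match no rule stay in the management infrastructure and never become requests, which keeps the process tool free of noise.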

Service Request Process

FIGURE 1 is a high-level process view of how the process management tool would handle a ticket. The intent is to highlight key steps that you must consider when building such a process and mapping it to the tool's ticket.

FIGURE 1 Sample Request Process View of Ticket Handling

It is important to realize that there are multiple sources for action requests in the IT management environment. Four sources are given here as an example; other sources exist, depending on specific situations. Before the request enters the process it should be prioritized, localized (in case of multiple locations of activities) and categorized. Based on that information it will be qualified and assigned.

Typically, the assignee should be a generic name or group (not a person's name) to avoid constantly updating the configuration files that link this information. Depending on priority, location, and category, the ticket then follows a distinct process that tracks progress and key information for service performance and management reporting purposes.
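
Assignment to a generic group, keyed on category and location, can be pictured as a simple lookup. The group names, locations, and request fields below are invented for this sketch; the point is only that the table holds group names, so staffing changes do not touch it.

```python
# Hypothetical mapping from (category, location) to a generic group name.
ASSIGNMENT_GROUPS = {
    ("compute", "east"): "unix-ops-east",
    ("network", "east"): "network-ops-east",
    ("network", "west"): "network-ops-west",
}

# Assumed fallback for requests that match no entry.
DEFAULT_GROUP = "service-desk"


def assign(request):
    """Return a copy of the request with a generic group as the assignee."""
    key = (request["category"], request["location"])
    return {**request, "assigned_to": ASSIGNMENT_GROUPS.get(key, DEFAULT_GROUP)}


ticket = assign({"id": "T-100", "priority": "P2", "category": "network", "location": "west"})
unmatched = assign({"id": "T-101", "priority": "P3", "category": "facilities", "location": "east"})
```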

Essential considerations for prioritizing and routing a request are:

  • P—Priority

  • S—Skills needed that determine the routing

  • A—Action(s) to represent a distinct process

TABLE 2 Service and Management Reporting

Function     Driver                               Examples
Priority ->  Cost of downtime
             No. of users affected                P1–More than 10 users affected and/or business critical system is down during production hours
             System function (business critical)  P2–Fewer than 10 users affected and/or not a business critical system during any time of the day
             Time of day                          P3–Request for enhancement. Not business critical. No time pressure.
             Service Level Agreement              P4–Specific rules as per the agreement
                                                  ... and so on.
Routing ->   Skills needed
             What type of technology?             S1–Computer Sun hardware disk fault P1
             What type of alert (FCAPS)?          S2–Computer IBM operating system kernel performance P2
             What priority?                       S3–Network Cisco hardware router configuration P3
Process ->   Action needed
             Resolution time                      A1–Must be resolved ASAP S3 P1
             Skills needed                        A2–Should be resolved within 4 hours S1 P2
             Priority of request                  A3–Should be resolved within 2 hours S2 P4


When all three functions have been defined, you can create a matrix that relates the priority of a request and the skills needed to the appropriate process. This typically identifies which group is assigned. Based on the preceding table, TABLE 3 shows this priority request matrix.

TABLE 3 Priority Request Matrix

      P1   P2   P3   P4
S1    A1   A1   A6   A10
S2    A2   A4   A7   A10
S3    A3   A5   A8   A10
S4    A3   A5   A9   A10
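
The priority request matrix in TABLE 3 translates directly into a lookup keyed by skill and priority. The following sketch copies the cell values from the table; the function name and the choice to raise an error on unknown keys are assumptions for this example.

```python
PRIORITIES = ("P1", "P2", "P3", "P4")

# Rows are skill categories; columns follow PRIORITIES order (values from TABLE 3).
PRIORITY_REQUEST_MATRIX = {
    "S1": ("A1", "A1", "A6", "A10"),
    "S2": ("A2", "A4", "A7", "A10"),
    "S3": ("A3", "A5", "A8", "A10"),
    "S4": ("A3", "A5", "A9", "A10"),
}


def action_for(skill, priority):
    """Return the action (process) for a skill and priority.

    Unknown skills raise KeyError and unknown priorities raise ValueError,
    so a misclassified request fails loudly instead of being routed silently.
    """
    return PRIORITY_REQUEST_MATRIX[skill][PRIORITIES.index(priority)]


selected = action_for("S2", "P2")
```

Note that every skill maps to the same action, A10, at priority P4, so P4 requests follow one shared process regardless of the skills needed.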


It is important to realize that service request priorities do not influence the priorities or criticality at the system agent layer. The health of a system is independent of its impact on the business. The former is addressed in the DCMTI, the latter in process management.

Each specific process has a rule to allow for escalation and re-assignment. When all goes well, the request is fulfilled and the ticket is closed. The closing process can include activities like informing users, updating databases, and sometimes even initiating clearing of alarms in the DCMTI.

FIGURE 2 shows some key aspects to consider in the specialized resolution process of a trouble ticket. It illustrates the preceding considerations with more detail.

FIGURE 2 Sample Trouble Ticket Resolution Process

Most notable is the update of the process database at key steps in the process. Also, in the decision tree towards the end, there is an interesting example of how escalation can be achieved. Generally, an automated approach to escalation is not recommended because it would automatically reassign a ticket. The most common approach is to run daily reports or create alerts for supervisors who make the best decision for the next step, and generate ad-hoc reports (email, text page and so on) for high priority events that require immediate attention.
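
The report-based approach to escalation can be sketched as a function that lists overdue tickets for a supervisor and changes nothing itself. The resolution targets reuse the A2 and A3 examples from TABLE 2; the ticket fields, group names, and dates are assumptions made for this illustration.

```python
from datetime import datetime, timedelta

# Targets from the A2 and A3 examples in TABLE 2. A1 ("ASAP") has no numeric
# target in the article, so it is deliberately left out of this sketch.
RESOLUTION_TARGETS = {"A2": timedelta(hours=4), "A3": timedelta(hours=2)}


def overdue_report(tickets, now):
    """List open tickets that have exceeded their target, most overdue first.

    Nothing is reassigned here; the rows are for a supervisor to review.
    """
    rows = []
    for ticket in tickets:
        target = RESOLUTION_TARGETS.get(ticket["action"])
        if ticket["status"] != "open" or target is None:
            continue
        late_by = now - ticket["opened"] - target
        if late_by > timedelta(0):
            rows.append((ticket["id"], ticket["assigned_to"], late_by))
    return sorted(rows, key=lambda row: row[2], reverse=True)


now = datetime(2024, 1, 15, 12, 0)
tickets = [
    {"id": "T-1", "action": "A2", "status": "open", "assigned_to": "unix-ops",
     "opened": datetime(2024, 1, 15, 7, 0)},
    {"id": "T-2", "action": "A3", "status": "open", "assigned_to": "network-ops",
     "opened": datetime(2024, 1, 15, 11, 0)},
    {"id": "T-3", "action": "A3", "status": "closed", "assigned_to": "network-ops",
     "opened": datetime(2024, 1, 14, 9, 0)},
    {"id": "T-4", "action": "A3", "status": "open", "assigned_to": "network-ops",
     "opened": datetime(2024, 1, 15, 6, 0)},
]
report = overdue_report(tickets, now)
```

Keeping the decision with the supervisor matches the recommendation above: the tool surfaces the late tickets, and a person chooses the next step.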
