The reliability of hardware and software can be verified from customer references and industry analysts. Beyond that, you should consider performing an empirical component reliability analysis, which consists of the following steps:
1. Review and analyze problem management logs.
2. Review and analyze supplier logs.
3. Acquire feedback from operations personnel.
4. Acquire feedback from support personnel.
5. Acquire feedback from supplier repair personnel.
6. Compare experiences with other shops.
7. Study reports from industry analysts.
An analysis of problem logs should reveal any unusual patterns of failure. Study the logs by supplier, product, using department, day and time of failure, frequency of failure, and time to repair. Suppliers often keep onsite repair logs that can be examined in the same way.
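The kind of breakdown described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ProblemRecord` fields, the `summarize` helper, and the sample entries are all hypothetical, and a real problem management log will have its own schema and export format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class ProblemRecord:
    # Hypothetical fields; map these to whatever your problem log actually stores.
    supplier: str
    product: str
    department: str
    failed_at: datetime
    repaired_at: datetime

    @property
    def hours_to_repair(self) -> float:
        return (self.repaired_at - self.failed_at).total_seconds() / 3600


def summarize(records, key):
    """Group records by key(record); report failure count and mean hours to repair."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec.hours_to_repair)
    return {
        group: {"failures": len(hours), "mean_hours_to_repair": mean(hours)}
        for group, hours in groups.items()
    }


# Made-up entries, for illustration only.
sample_log = [
    ProblemRecord("SupplierA", "router", "finance",
                  datetime(2024, 1, 8, 7, 30), datetime(2024, 1, 8, 9, 30)),
    ProblemRecord("SupplierA", "router", "finance",
                  datetime(2024, 1, 15, 7, 45), datetime(2024, 1, 15, 8, 45)),
    ProblemRecord("SupplierB", "disk array", "payroll",
                  datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 18, 0)),
]

# Slice the same log along different dimensions.
by_supplier = summarize(sample_log, key=lambda r: r.supplier)
by_product = summarize(sample_log, key=lambda r: r.product)
# weekday() returns 0 for Monday through 6 for Sunday.
by_weekday_hour = summarize(
    sample_log, key=lambda r: (r.failed_at.weekday(), r.failed_at.hour)
)
```

Passing the grouping as a `key` function keeps one helper usable for every dimension in the list: supplier, product, using department, or day and time. A cluster such as two Monday failures in the 7 a.m. hour is the sort of pattern worth a closer look.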
Feedback from operations personnel, especially offsite operators, is often candid, and can be revealing as to how components truly perform. For example, operators may be doing numerous resets on a particular network component every morning prior to startup, but they may not bother to log these activities since the network always comes up. Similar conversations with various support personnel such as systems administrators, network administrators, and database administrators may elicit similar revelations.
You might worry about bias when canvassing a supplier's repair personnel about the true reliability of their products. In my experience, however, these people can be just as candid and revealing as the people using the product. This becomes another valuable source of information for evaluating component reliability. Yet another is comparing experiences with other shops. Shops that are closely aligned with your own in terms of platforms, configurations, services offered, and customers can be especially helpful. Reports from reputable industry analysts can also be used to predict component reliability.
Repairability is the relative ease with which service technicians can repair or replace failing components. Two common metrics used to evaluate this trait are how long the actual repair takes and how often the repair work needs to be repeated. In more sophisticated systems, initial repair work can be done from remote diagnostic centers, where failures are detected and circumvented and arrangements are made for permanent resolution with little or no involvement of operations personnel.
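The two repairability metrics can be computed from a simple list of repair records. The sketch below is one possible way to do it, not a prescribed method: the `Repair` fields, the `repairability` function, and the sample numbers are assumptions made for illustration, and how a shop decides that a repair counts as a repeat is left to local convention.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Repair:
    # Hypothetical record layout for a single repair visit.
    component_id: str
    hours_spent: float
    is_repeat: bool  # True if this visit redoes an earlier repair of the same fault


def repairability(repairs):
    """Return mean repair time and the fraction of repairs that were repeats."""
    if not repairs:
        raise ValueError("at least one repair record is required")
    return {
        "mean_repair_hours": mean([r.hours_spent for r in repairs]),
        "repeat_rate": sum(1 for r in repairs if r.is_repeat) / len(repairs),
    }


# Made-up records, for illustration only.
sample_repairs = [
    Repair("switch-01", 2.0, False),
    Repair("switch-01", 1.0, True),
    Repair("array-07", 3.0, False),
    Repair("array-07", 2.0, False),
]

metrics = repairability(sample_repairs)
```

Tracking both numbers together matters: a short average repair time means little if a large share of those repairs has to be done again.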