Cluster Building Process, Phase 2: Installation
Let's assume that you have just completed final solution design, the last step of the previous phase. The previously abstract pieces of the process are about to become more concrete, quite literally. The installation phase of the cluster building process comprises three steps (see Figure 3):
- Site preparation
- Physical hardware assembly
- Software installation and configuration
Figure 3 Cluster building process, phase 2: installation.
Site Preparation
After you place the orders for the cluster's hardware and software, there will be a delay before all of the physical components arrive. In most situations, the hardware and other components will trickle in from the various vendors over an unpredictable period. Now is a good point to recheck the cluster context diagram (refer to Figure 1). While it may seem a good time to relax and have a few beers, you should focus instead on site preparation, which might include steps such as these:
- Verify or wire all necessary electrical connections (use professional electricians for all wiring tasks).
- Verify or modify cooling equipment, airflow, and floor tile placement.
- Make space in the computer room for the cluster racks.
- Replace existing doors to make them tall enough for racks (I'm only partly joking).
Complex site preparation may become the critical item affecting the installation schedule and final acceptance of the cluster. After all, if you can't plug in the racks, or cool them, it's unlikely that you'll be able to proceed with the rest of the steps in the installation process. Careful evaluation of the site preparation requirements is essential to success.
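To make the power and cooling evaluation concrete, here is a minimal Python sketch that totals the electrical load per rack and converts it to a heat load. Everything in it is an assumption for illustration: the Rack fields, the per-node wattage figures, and the 80% circuit derating factor are hypothetical placeholders, not values from this article. The only fixed fact is the conversion of roughly 3.412 BTU/hr per watt. Your electrician and facilities staff have the final word on real limits.

```python
from dataclasses import dataclass
from typing import Dict, List

# 1 watt of dissipated power is about 3.412 BTU per hour of heat load.
WATTS_TO_BTU_PER_HOUR = 3.412


@dataclass
class Rack:
    """Hypothetical description of one rack's electrical load."""
    name: str
    node_count: int
    watts_per_node: float        # assumed measured or nameplate draw per node
    overhead_watts: float = 0.0  # switches, console gear, and so on

    def total_watts(self) -> float:
        return self.node_count * self.watts_per_node + self.overhead_watts


def site_budget(racks: List[Rack],
                circuit_capacity_watts: float,
                derating: float = 0.8) -> List[Dict[str, object]]:
    """Report power and heat load per rack.

    `derating` is an assumed safety margin: only this fraction of the
    circuit's rated capacity is treated as usable for a continuous load.
    """
    usable_watts = circuit_capacity_watts * derating
    report = []
    for rack in racks:
        watts = rack.total_watts()
        report.append({
            "rack": rack.name,
            "watts": watts,
            "btu_per_hour": watts * WATTS_TO_BTU_PER_HOUR,
            "fits_circuit": watts <= usable_watts,
        })
    return report


if __name__ == "__main__":
    racks = [Rack("rack1", 10, 300.0, 200.0), Rack("rack2", 20, 300.0)]
    for row in site_budget(racks, circuit_capacity_watts=5000.0):
        print(row)
```

A rack flagged with fits_circuit set to False is a signal to revisit the wiring plan or redistribute nodes before the hardware arrives, not after.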
Physical Hardware Assembly
Physical assembly of the hardware, involving racking and cabling, can be a tedious and repetitive task. The pile of boxes, manuals, rubber bands, twist ties, and plastic bags left over from a cluster construction project is truly amazing, and you do not want this detritus in the computer room if you can avoid it. Arranging a separate work area for hardware assembly, racking, and cabling is a very good idea. This area can also double as storage for the components until the complete set has arrived.
Some verification steps (such as network connectivity checks) require the hardware assembly and cabling to be complete. It's often best not to attempt installation or configuration of software across the whole cluster until the hardware assembly is completed. It's always best to keep the users away from the partially completed cluster until you're absolutely sure that the cluster is ready for them. You have been warned.
Software Installation and Configuration
Installation and configuration of the image installation server—such as the server hosting SystemImager—and a single compute slice is possible while the remainder of the hardware installation is being completed. With this approach, you can rapidly install operating system images on the cluster's compute slices when they're ready. Many hardware and network verification steps require an operating system to be present—this is certainly true if your cluster has a high-speed interconnect (HSI) or other network components.
It's very important to verify the proper operation of all hardware elements once the cluster's racks are assembled, moved into place, and powered on. Compute-slice BIOS settings, console switches, storage arrays, network switches, and other active hardware elements must be configured and verified. (Your cable diagram and labeling scheme will pay for themselves here.) Some verification may be possible on individual racks, if your cluster is that large.
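One piece of this verification, basic network reachability, is easy to automate once the nodes have an operating system. The following Python sketch is one possible approach, not a prescribed tool: it assumes each node accepts TCP connections on port 22 (SSH), and the host names in the example are hypothetical stand-ins for the names on your cable diagram. The probe function is passed in as a parameter so that a different check can be substituted.

```python
import socket
from typing import Callable, Dict, Iterable, List


def tcp_probe(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Port 22 is an assumption (SSH); use whatever service your nodes run.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def verify_nodes(hosts: Iterable[str],
                 probe: Callable[[str], bool] = tcp_probe) -> Dict[str, bool]:
    """Run the probe against every host and record the result."""
    return {host: probe(host) for host in hosts}


def unreachable(results: Dict[str, bool]) -> List[str]:
    """List the hosts that failed the probe, sorted by name."""
    return sorted(host for host, ok in results.items() if not ok)


if __name__ == "__main__":
    # Hypothetical node names; replace with the names from your cable plan.
    nodes = ["node01", "node02", "node03"]
    failed = unreachable(verify_nodes(nodes))
    print("Unreachable nodes:", failed if failed else "none")
```

Each name in the unreachable list maps back to a labeled cable and switch port, which is where the cable diagram earns its keep.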
Inevitably, dead hardware, missing components, cables of the wrong length, and other unexpected issues will crop up in such a complex project. Each issue needs to be dealt with, and you must determine whether it will affect testing of the cluster. Assigning a severity and priority to each issue is a good way to manage and track them, just as with software bug tracking. Once all of the individual cluster elements are verified, it's time to move on to the next phase in the process.
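As a rough illustration of tracking installation issues by severity and priority, here is a small Python sketch. The Severity levels, the Issue fields, and the rule that any unresolved issue marked as blocking holds up testing are all assumptions made for this example; a real project would more likely use an existing bug tracker.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class Severity(IntEnum):
    """Hypothetical severity scale: larger means more serious."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Issue:
    summary: str
    severity: Severity
    priority: int                 # 1 = fix first
    blocks_testing: bool = False  # does this issue hold up cluster testing?
    resolved: bool = False


def open_issues(issues: Iterable[Issue]) -> List[Issue]:
    """Unresolved issues, ordered by priority, then by descending severity."""
    pending = (issue for issue in issues if not issue.resolved)
    return sorted(pending, key=lambda i: (i.priority, -i.severity))


def ready_for_testing(issues: Iterable[Issue]) -> bool:
    """True when no unresolved issue is marked as blocking testing."""
    return not any(i.blocks_testing and not i.resolved for i in issues)


if __name__ == "__main__":
    issues = [
        Issue("Dead power supply in rack 2", Severity.HIGH, 1, blocks_testing=True),
        Issue("Console cable too short", Severity.LOW, 3),
        Issue("Missing rail kit", Severity.MEDIUM, 2, resolved=True),
    ]
    for issue in open_issues(issues):
        print(issue.priority, issue.severity.name, issue.summary)
    print("Ready for testing:", ready_for_testing(issues))
```

Keeping the blocking flag separate from severity reflects the point above: a severe problem on a spare component may not delay testing, while a minor one on a critical path might.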