

Ensuring High Availability of a Hyper-V Host Server

A common concern among IT administrators who consolidate their physical servers onto fewer virtual host systems is what happens when the host server fails, because a single host server failure can now take down several network servers at once. Instead of having 1 server down, the organization can have 4, 8, or 10 systems down at the same time. The downside of centralized servers is that all these systems go offline together; the upside is that, because so much is riding on a single server, it becomes easier to justify making that server highly available. Instead of clustering 10 physical servers, an organization may choose to cluster just the virtual host server, which then protects the guest sessions running on the host. Or, in an environment where redundancy and disaster recovery are part of the IT strategy, the organization can split server resources across multiple Hyper-V host systems.

In the SQL world, split server resources means mirroring databases across two or more servers; with virtualization, that means putting one SQL server on one host server and a mirror copy of the SQL server on a second host server. If either of the guest SQL sessions fails, or even if either of the virtual host servers fails, SQL mirroring still provides redundant storage and access from more than one system.

Significant improvements in Windows Server 2008 clustering, along with support for both host and guest session clustering, provide reliability and improved uptime for virtualized hosts and guest sessions. Because IT administrators are tasked with keeping the network operational 24 hours a day, 7 days a week, it becomes even more important that clustering works. Fortunately, hardware that supports clustering has become significantly less expensive; in fact, any server that meets the required specifications to run Windows Server 2008, Enterprise Edition can typically support Windows clustering. The basic standard server used for enterprise networking has the technologies for high availability built into the system. Windows Server 2008, Enterprise Edition or Datacenter Edition is required to run Windows 2008 clustering services.

No Single Point of Failure in Clustering

Clustering by definition should provide redundancy and high availability of server systems; however, in previous versions of Windows clustering, a "quorum drive" was required for the cluster systems to connect to as the point of validation for cluster operations. If at any point the quorum drive failed, the cluster would not be able to fail over from one system to another. Windows 2008 clustering removed this requirement of a static quorum drive. Two major technologies facilitate this elimination of a single or central point of failure: majority-based cluster membership verification and witness-based quorum validation.

The majority-based cluster membership allows the IT administrator to define which devices in the cluster get a vote to determine whether a cluster node is in a failed state (and so the cluster needs to fail over to another node). Instead of assuming the disk will always be available as in the previous quorum disk model, now nodes of the cluster and shared storage devices participate in the new enhanced quorum model in Windows 2008. Effectively, Windows 2008 server clusters have better information to determine whether it is appropriate to fail over a cluster in the event of a system or device failure.
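The majority rule described above can be sketched in a few lines. This is an illustration only, not Microsoft's implementation: the function name `has_quorum` and the voter names are hypothetical, and the sketch assumes each voting device carries exactly one vote.

```python
from typing import Dict


def has_quorum(voters: Dict[str, bool]) -> bool:
    """Return True when strictly more than half of the configured voters are online.

    `voters` maps a hypothetical voter name (a node or a shared disk)
    to whether that voter is currently reachable.
    """
    if not voters:
        raise ValueError("at least one voter is required")
    online = sum(1 for is_up in voters.values() if is_up)
    # Integer division: with 3 voters, 2 are needed; with 4 voters, 3 are needed.
    return online > len(voters) // 2


# Two nodes plus one shared disk give three votes in total.
cluster = {"node1": True, "node2": False, "shared_disk": True}
print(has_quorum(cluster))
```

With one node down, the surviving node and the shared disk still hold two of three votes, so the cluster keeps running. If the disk were also lost, the remaining node would hold only one of three votes and would not have quorum.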

The witness-based quorum eliminates the single quorum disk from the cluster-operation validation model. Instead, a completely separate node or file share can be set as the file share witness. In the case of a GeoCluster, where cluster nodes are in completely different locations, the ability to place the file share in a third site and even enable that file share to serve as the witness for multiple clusters becomes a benefit for organizations with distributed data centers and also provides more resiliency in the cluster-operation components.
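To see why a witness in a third site helps, consider a network split between two sites. The following sketch is a simplified model under stated assumptions, not the actual cluster service logic: the function `surviving_sites` and the site names are hypothetical, each node has one vote, and the witness adds one vote to whichever site can still reach it.

```python
from typing import Dict, List, Optional


def surviving_sites(nodes_per_site: Dict[str, int],
                    witness_reachable_from: Optional[str]) -> List[str]:
    """Return the sites that keep quorum after the sites lose contact with each other.

    The witness contributes one extra vote to the single site that can
    still reach it, or to no site when `witness_reachable_from` is None.
    """
    total_votes = sum(nodes_per_site.values()) + 1  # +1 for the witness
    survivors = []
    for site, node_votes in nodes_per_site.items():
        votes = node_votes + (1 if site == witness_reachable_from else 0)
        if votes > total_votes // 2:
            survivors.append(site)
    return survivors


# One node per site; the witness lives in a third location.
print(surviving_sites({"site_a": 1, "site_b": 1}, "site_a"))
```

In this model only one site can hold a majority at a time, so the two halves of a split cluster never both stay active. If neither site can reach the witness, neither has a majority and the function returns an empty list.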

The elimination of points of failure in clustering plus the ability to cluster across geographic distances allows the administrators of an organization to put one cluster server on one host system and another cluster server on another host system and have guest session redundancy without single points of failure.

Stretched Clusters for Hyper-V Hosts and Guests Across Sites

Windows 2008 also introduced the concept of stretched clusters to provide better server and site redundancy. Effectively, Microsoft has eliminated the need for cluster servers to remain on the same subnet, as was the case in Windows clustering in the past. Although organizations have used virtual local area networks (VLANs) to stretch a subnet across multiple locations, this was not always easy to do and, in many cases, was technologically not the right thing to do in IP networking design.

By allowing cluster nodes to reside on different subnets, and with the addition of a configurable heartbeat timeout, clusters can now be set up in ways that match an organization's disaster-failover and -recovery strategy. In the case of multiple host environments, one host with a cluster guest session can sit in one site, and another host with a cluster guest session can sit in another site. In the event that either the guest session fails or the entire site becomes unavailable, the virtualized cluster spanning multiple physical sites can provide extremely high-level redundancy in a Windows 2008 Hyper-V environment.
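The role of a configurable heartbeat timeout can be illustrated with a small sketch. The function and parameter names (`is_node_suspect`, `interval_s`, `missed_threshold`) and their default values are hypothetical and do not correspond to actual Windows cluster settings; the sketch assumes a node is treated as failed once more than a set number of heartbeat intervals pass without a heartbeat.

```python
def is_node_suspect(last_heartbeat_s: float, now_s: float,
                    interval_s: float = 1.0, missed_threshold: int = 5) -> bool:
    """Return True once more than `missed_threshold` intervals have passed silently.

    All times are in seconds. A longer interval or a higher threshold
    tolerates more delay before the node is treated as failed.
    """
    if interval_s <= 0 or missed_threshold < 1:
        raise ValueError("interval_s must be positive and missed_threshold at least 1")
    silence_s = now_s - last_heartbeat_s
    return silence_s > interval_s * missed_threshold


# Same six seconds of silence, judged with a tight and a relaxed setting.
print(is_node_suspect(100.0, 106.0))                  # tight, same-site style setting
print(is_node_suspect(100.0, 106.0, interval_s=2.0))  # relaxed, cross-site style setting
```

A link between sites usually has higher and less predictable latency than a local network, so a relaxed setting helps avoid failing over because of a brief delay rather than a real outage.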

Leveraging Storage Area Networks for Virtual Hosts and Guests

Windows 2008 has also improved its support for storage area networks (SANs) by providing enhanced mechanisms for connecting to SANs and switching between SAN nodes. In the past, a connection to a SAN was a static connection, meaning that a server was connected to a SAN just as if the server were physically connected to a direct attached storage system. However, the concept of a SAN is that if a SAN device fails, the server should reconnect to a SAN device that is still online. This could not be easily done with Windows 2003 or earlier, where SCSI bus resets were required to move a server from one SAN device to another.

With Windows 2008, a server can be associated with a SAN through a persistent reservation to access a specific shared disk; in the event that the SAN fails, however, the server session can be logically connected to another SAN target system without having to script the device resets that were complicated and disruptive in disaster-recovery scenarios.
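The idea of logically reconnecting to another SAN target can be sketched as a simple selection step. This is a conceptual illustration only and does not call any Windows or SAN vendor API: the function `select_san_target` and the target names are hypothetical, and the sketch assumes the server keeps an ordered list of preferred targets.

```python
from typing import Optional, Sequence, Set


def select_san_target(preferred_order: Sequence[str],
                      online: Set[str]) -> Optional[str]:
    """Return the first preferred SAN target that is currently online, or None."""
    for target in preferred_order:
        if target in online:
            return target
    return None


targets = ["san-primary", "san-secondary"]
print(select_san_target(targets, {"san-primary", "san-secondary"}))
print(select_san_target(targets, {"san-secondary"}))  # primary has failed
```

When the primary target drops out of the online set, the selection moves to the secondary without any reset step in this model, which mirrors the logical reconnection described above.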

All the SAN connect and disconnect associations, failover, and recovery are translated back to the Windows 2008 Hyper-V host server and to any of the guest sessions running on Hyper-V that are Windows 2008 server guests. With the inclusion of clustering along with SAN storage replication, an organization can design and implement a highly available network environment based on Hyper-V virtualization.
