Non-Disruptive Service Relocation
How can cloud service activity be temporarily or permanently relocated without causing service interruption?
There are circumstances under which redirecting cloud service activity or relocating an entire cloud service implementation is required or preferable. However, diverting service activity or relocating a cloud service implementation can cause an outage, thereby disrupting the availability of the cloud service.
A system can be established whereby cloud service redirection or relocation is carried out at runtime by temporarily creating a duplicate implementation before the original implementation is deactivated or removed.
Virtualization technology is used by the system to enable the duplication and migration of the cloud service implementation across different locations in real time.
Cloud Storage Device, Cloud Usage Monitor, Hypervisor, Live VM Migration, Pay-Per-Use Monitor, Resource Replication, SLA Management System, SLA Monitor, Virtual Infrastructure Manager (VIM), Virtual Server, Virtual Switch
A cloud service can become unavailable for a number of reasons, such as:
- The cloud service encounters more runtime usage demand than it has processing capacity to handle.
- The cloud service implementation needs to undergo a maintenance update that mandates a temporary outage.
- The cloud service implementation needs to be permanently migrated to a new physical server host.
Cloud service consumer requests are rejected if a cloud service becomes unavailable, which can potentially result in exception conditions. Rendering the cloud service temporarily unavailable to cloud consumers is not preferred even if the outage is planned.
A system is established by which a pre-defined event triggers the duplication or migration of a cloud service implementation at runtime, thereby avoiding any disruption in service for cloud consumers.
As an alternative to scaling cloud services in or out with redundant implementations, cloud service activity can be temporarily diverted to another hosting environment at runtime by adding a duplicate implementation onto a new host. Cloud service consumer requests can similarly be temporarily redirected to a duplicate implementation when the original implementation needs to undergo a maintenance outage. The relocation of the cloud service implementation and any cloud service activity can also be permanent to accommodate cloud service migrations to new physical server hosts.
A key aspect of the underlying architecture is that the system ensures that the new cloud service implementation is successfully receiving and responding to cloud service consumer requests before the original cloud service implementation is deactivated or removed.
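The ordering constraint described here (verify the duplicate first, deactivate the original second) can be illustrated with a minimal Python sketch. The `Implementation` and `Router` classes, the `cut_over` method, and the probe count are all hypothetical names chosen for illustration; they do not correspond to any particular product's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Implementation:
    """Hypothetical stand-in for one cloud service implementation."""
    name: str
    healthy: bool = True
    active: bool = True
    handled: List[str] = field(default_factory=list)

    def handle(self, request: str) -> str:
        if not (self.active and self.healthy):
            raise RuntimeError(f"{self.name} is unavailable")
        self.handled.append(request)
        return f"{self.name}:{request}"


class Router:
    """Sends each request to whichever implementation is the current target."""

    def __init__(self, origin: Implementation):
        self.target = origin

    def send(self, request: str) -> str:
        return self.target.handle(request)

    def cut_over(self, duplicate: Implementation, probes: int = 3) -> bool:
        """Switch to the duplicate only after it answers every probe request."""
        origin = self.target
        try:
            for i in range(probes):
                duplicate.handle(f"probe-{i}")
        except RuntimeError:
            return False         # the original stays active; nothing is disrupted
        self.target = duplicate  # redirect consumer requests first...
        origin.active = False    # ...and only then deactivate the original
        return True
```

If the duplicate fails a probe, `cut_over` returns `False` and requests continue to reach the original implementation, which is the property the pattern relies on.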
A common approach is to employ the live VM migration component to move the entire virtual server instance hosting the cloud service. The automated scaling listener and/or the load balancer mechanisms can be used to trigger a temporary redirection of cloud service consumer requests in response to scaling and workload distribution requirements. In this case either mechanism can contact the VIM to initiate the live VM migration process.
Figure 4.41 An example of a scaling-based application of the Non-Disruptive Service Relocation pattern (Part I).
Figure 4.42 An example of a scaling-based application of the Non-Disruptive Service Relocation pattern (Part II).
Figure 4.43 An example of a scaling-based application of the Non-Disruptive Service Relocation pattern (Part III).
- The automated scaling listener monitors the workload for a cloud service.
- As the workload increases, a pre-defined threshold within the cloud service is reached.
- The automated scaling listener signals the VIM to initiate the relocation.
- The VIM signals both the origin and destination hypervisors to carry out a runtime relocation via the use of a live VM migration program.
- A second copy of the virtual server and its hosted cloud service is created via the destination hypervisor on Physical Server B.
- The state of both virtual server instances is synchronized.
- The first virtual server instance is removed from Physical Server A after it is confirmed that cloud service consumer requests are being successfully exchanged with the cloud service on Physical Server B.
- From this point on, cloud service consumer requests are sent only to the cloud service on Physical Server B.
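The steps above can be traced in a small in-memory simulation. This is only a sketch under simplifying assumptions: `VirtualServer`, `PhysicalServer`, and `relocate` are hypothetical names, the runtime state is modeled as a plain dictionary, and the confirmation step is reduced to a state comparison rather than a real exchange of consumer requests.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VirtualServer:
    name: str
    state: Dict[str, int] = field(default_factory=dict)


@dataclass
class PhysicalServer:
    name: str
    vm: Optional[VirtualServer] = None


def relocate(origin: PhysicalServer, destination: PhysicalServer,
             workload: int, threshold: int, log: List[str]) -> bool:
    """Walk through the relocation steps; return True if the move completed."""
    # Steps 1-2: the listener monitors the workload against the threshold.
    if workload < threshold:
        return False
    if origin.vm is None:
        raise ValueError("no virtual server to relocate")
    # Step 3: the listener signals the VIM.
    log.append("listener -> VIM: initiate relocation")
    # Step 4: the VIM signals both hypervisors.
    log.append(f"VIM -> hypervisors on {origin.name} and {destination.name}")
    # Step 5: a second copy is created on the destination.
    destination.vm = VirtualServer(origin.vm.name)
    # Step 6: the state of both instances is synchronized.
    destination.vm.state = dict(origin.vm.state)
    # Step 7: remove the first instance only after the copy is confirmed
    # (a state comparison stands in for verifying real request traffic).
    if destination.vm.state != origin.vm.state:
        destination.vm = None
        return False
    origin.vm = None
    # Step 8: requests now go only to the destination.
    log.append(f"requests routed to {destination.name}")
    return True
```

Calling `relocate` with a workload below the threshold leaves both hosts untouched; at or above the threshold, the virtual server ends up on the destination with its state intact.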
Depending on the location of the virtual server’s disks and configuration, this migration can happen in one of two ways:
- If the virtual server disks are stored on a local storage device or on non-shared remote storage devices attached to the source host, then a copy of the virtual server disks is created on the destination host (either on a local or remote shared/non-shared storage device). After the copy has been created, both virtual server instances are synchronized and virtual server files are subsequently removed from the origin host.
- If the virtual server’s files are stored on a remote storage device shared between origin and destination hosts, there is no need to create the copy of virtual server disks. In this case, the ownership of the virtual server is simply transferred from the origin to the destination physical server host, and the virtual server’s state is automatically synchronized.
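The choice between the two migration paths depends only on whether the destination host can reach the storage device that holds the virtual server's disks. A minimal sketch of that decision follows; `DiskLocation`, `Strategy`, and `choose_strategy` are hypothetical names introduced for illustration, and the host and device identifiers are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Strategy(Enum):
    COPY_DISKS_THEN_SYNC = "copy disks, synchronize, remove origin files"
    TRANSFER_OWNERSHIP = "transfer ownership, synchronize state"


@dataclass(frozen=True)
class DiskLocation:
    """Hypothetical description of where a virtual server's disks live."""
    device: str
    attached_hosts: FrozenSet[str]  # hosts that can reach this storage device


def choose_strategy(disks: DiskLocation, origin: str, destination: str) -> Strategy:
    """Pick the migration path based on storage visibility."""
    if origin not in disks.attached_hosts:
        raise ValueError("origin host cannot reach the virtual server disks")
    if destination in disks.attached_hosts:
        # Shared storage: no disk copy is needed, only an ownership transfer.
        return Strategy.TRANSFER_OWNERSHIP
    # Local or non-shared storage: the disks must be copied first.
    return Strategy.COPY_DISKS_THEN_SYNC
```

For example, disks on a device attached only to the origin host yield `COPY_DISKS_THEN_SYNC`, while disks on a device attached to both hosts yield `TRANSFER_OWNERSHIP`.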
Note that this pattern conflicts with Direct I/O Access (169), and the two cannot be applied together. A virtual server with direct I/O access is locked into its physical server host and cannot be moved to other hosts in this fashion.
Furthermore, Persistent Virtual Network Configuration (227) may need to be applied in support of this pattern so that by moving the virtual server, its defined network configuration is not inadvertently lost, which would prevent cloud service consumers from being able to connect to the virtual server.
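These two considerations lend themselves to a pre-flight check before a relocation is attempted. The sketch below is illustrative only: `VirtualServerConfig` and `check_relocation` are hypothetical names, and the two boolean fields are an assumed simplification of how a real platform would expose these settings. Direct I/O access is treated as a hard blocker, whereas a missing persistent network configuration is reported as a warning, since the text says only that it may need to be applied.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VirtualServerConfig:
    """Hypothetical summary of the settings relevant to relocation."""
    name: str
    direct_io_access: bool
    persistent_network_config: bool


def check_relocation(vs: VirtualServerConfig) -> Tuple[List[str], List[str]]:
    """Return (blockers, warnings) for relocating the given virtual server."""
    blockers: List[str] = []
    warnings: List[str] = []
    if vs.direct_io_access:
        # Direct I/O access locks the virtual server to its physical host.
        blockers.append("direct I/O access is enabled")
    if not vs.persistent_network_config:
        # Without it, the network configuration could be lost after the move.
        warnings.append("network configuration is not persistent")
    return blockers, warnings
```

A caller would refuse to start the relocation when the first list is non-empty and surface the second list to an administrator.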
- Cloud Storage Device – This mechanism is fundamental to the Non-Disruptive Service Relocation pattern in how it provides the storage required to host data pertaining to the virtual servers in a central location.
- Cloud Usage Monitor – Cloud usage monitors are used to continuously track IT resource usage and activity of the system established by the Non-Disruptive Service Relocation pattern.
- Hypervisor – The hypervisor hosts the virtual servers on which the cloud services to be relocated are running. It is further used to transfer a virtual server’s ownership and runtime information, including CPU and memory state, from one hypervisor to another.
- Live VM Migration – This mechanism is responsible for transferring the ownership and runtime information of a virtual server from one hypervisor to another.
- Pay-Per-Use Monitor – The pay-per-use monitor is used to continuously collect the service usage costs of the IT resources at both their source and destination locations.
- Resource Replication – The resource replication mechanism is used to instantiate the shadow copy of the cloud service at its destination.
- SLA Management System – The SLA management system is responsible for acquiring SLA information from the SLA monitor in order to obtain cloud service availability assurances both while the cloud service is being copied or relocated and afterward.
- SLA Monitor – This monitoring mechanism collects the aforementioned information required by the SLA management system.
- Virtual Infrastructure Manager (VIM) – This mechanism is used to initiate relocation, which can be automated in response to a threshold being reached or to a monitoring event.
- Virtual Server – Virtual servers generally host the cloud services at the source and destination locations.
- Virtual Switch – The virtual switch mechanism keeps virtual servers connected to and accessible over the network.