Principles and Best Practices
Let's consider some best practices for migration, including clearing up some common misconceptions about these practices.
Best Practices for Migration on the Cloud
One misconception is that image capture is a problem-free way to migrate workloads. While image capture is a useful tool that's simple to use and employs standard features, it needs to be used only where it's most appropriate. Image capture makes a bit-for-bit copy of the operating system, all the applications installed on the machine, and all the data stored. Using image capture in a simple way for migration presents some disadvantages:
- Downtime required to save instances to images, as well as time to copy images between data centers
- The primary hostname and IP address are embedded in application servers, database configuration files, and other software-configuration files, which may then be frozen in virtual machine images
Figure 2 shows the steps for workload migration by image capture. The steps in red must be executed while the service is unavailable to users, illustrating the point just described—the image-capture method of migration can lead to excessive downtime.
Figure 2 Simple use of image capture for workload migration.
This downtime may be acceptable if you're the only user of the system, but it can be unacceptable for systems that support larger user populations. If you must avoid the downtime of image capture, an alternative, and the first best practice, is to base instances on standard images that are available at multiple data centers and to use an automated, repeatable install procedure to ensure portability. Some tools, such as the WebSphere wsadmin scripting tool or the WebSphere profile-management tools, can be leveraged for automated setup from a base image in a new location. We will discuss this option later in this article.
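To make the downtime cost of image capture more concrete, here is a rough back-of-the-envelope estimate in Python. It is only a sketch: the function name, its parameters, and the sample numbers are assumptions for illustration, and real capture and copy rates vary by provider.

```python
def image_capture_downtime_minutes(image_gb, capture_gb_per_min,
                                   link_mbit_per_s, restore_min):
    """Rough downtime: capture the image, copy it, then restore and restart."""
    capture_min = image_gb / capture_gb_per_min
    # 1 GB is 8,000 megabits in decimal units; divide by 60 for minutes.
    copy_min = image_gb * 8000 / link_mbit_per_s / 60
    return capture_min + copy_min + restore_min


# Illustrative numbers only: a 60 GB image, a 2 GB/min capture rate,
# a 100 Mbit/s link between data centers, and 10 minutes to restore.
estimate = image_capture_downtime_minutes(60, 2, 100, 10)
print(f"Estimated downtime: {estimate:.0f} minutes")
```

With these assumed numbers the copy between data centers dominates, and the total comes to about two hours during which the service is unavailable.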
The second and third best practices address the second issue associated with image capture: network configurations are frozen in the captured image.
The second best practice is to refer to servers via DNS aliases (CNAMEs). These aliases are more portable than primary hostnames and IP addresses. Multiple aliases can be added for a server and changed when needed, unlike the primary hostname and IP address, which are embedded in configuration files and therefore cannot be changed easily. For example, suppose we have the alias www.myserver.com (which users reach at http://www.myserver.com) and the primary hostname vhost12345.ihost.com. We can use the primary hostname for administrative purposes and provide the alias to end users.
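One way to apply this practice is to check configuration files for the primary hostname before migrating. The following Python sketch is illustrative only: the helper names and the sample configuration keys are made up, and a real audit would also need to look for embedded IP addresses.

```python
# Hostnames taken from the example above; the config keys are hypothetical.
PRIMARY_HOSTNAME = "vhost12345.ihost.com"  # tied to one instance; admin use only
SERVICE_ALIAS = "www.myserver.com"         # CNAME given to users and services


def find_hardcoded_hosts(config_text, primary_hostname):
    """Return the 1-based numbers of lines that embed the primary hostname."""
    return [
        number
        for number, line in enumerate(config_text.splitlines(), start=1)
        if primary_hostname in line
    ]


def use_alias(config_text, primary_hostname, alias):
    """Return a copy of the config that refers to the alias instead."""
    return config_text.replace(primary_hostname, alias)


sample_config = (
    "app.name=orders\n"
    "db.url=jdbc:db2://vhost12345.ihost.com:50000/ORDERS\n"
    "site.url=http://www.myserver.com/\n"
)

offending_lines = find_hardcoded_hosts(sample_config, PRIMARY_HOSTNAME)
portable_config = use_alias(sample_config, PRIMARY_HOSTNAME, SERVICE_ALIAS)
```

After the rewrite, the configuration depends only on the alias, so moving the workload requires a DNS change rather than edits to each file.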
The third best practice is to serve HTTP redirects and HTML maintenance messages from the server that is being taken offline. In addition, manage the DNS time to live (TTL) so that clients don't keep using cached DNS records that contain old name-to-IP mappings. Many Internet service providers (ISPs) offer economical DNS service, but they don't always allow you to manage the TTL.
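The TTL reasoning can be turned into a simple schedule: lower the TTL ahead of time, wait for the old TTL to expire, change the record, and then wait one more (short) TTL before retiring the old server. The Python sketch below assumes resolvers honor the TTL, which is not always the case; the function names and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at, old_ttl_seconds):
    """Caches may hold the record with the old TTL until this time."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def earliest_retirement(record_changed_at, current_ttl_seconds):
    """After the record changes, old answers may live this much longer."""
    return record_changed_at + timedelta(seconds=current_ttl_seconds)


# Illustrative plan: the TTL was one day and is lowered to five minutes.
lowered_at = datetime(2024, 1, 1, 12, 0, 0)
cutover = earliest_safe_cutover(lowered_at, old_ttl_seconds=86400)
retire = earliest_retirement(cutover, current_ttl_seconds=300)
print("Change the DNS record no earlier than", cutover)
print("Keep redirects on the old server until at least", retire)
```

Between the record change and the retirement time, the redirects and maintenance messages on the old server catch clients that still hold the old mapping.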
Finally, some considerations must be addressed when actually transitioning a workload from one location to another.
There's a tradeoff between quiescing the system and the goal of zero system downtime when making a transition. To avoid losing data entered by users at the point when a server is brought down for maintenance, having a quiescing period is a best practice. During that time, the server shouldn't allow any further transactions to begin, instead gracefully completing all ongoing transactions. If the goal is zero downtime, we want to transfer immediately from the primary to the secondary system. However, this cannot be done easily without losing ongoing transactions running on the primary server. If zero downtime is required when making the transition, we must implement a more advanced active-active configuration.
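The quiescing behavior described above can be sketched as a small gate that refuses new transactions and waits for ongoing ones to finish. This is a minimal, single-process illustration using Python's standard threading module; the class and method names are hypothetical, and a real application server would provide its own mechanism.

```python
import threading


class QuiescingGate:
    """Tracks in-flight transactions and supports a graceful drain."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._accepting = True

    def begin(self):
        """Try to start a transaction; returns False once quiescing starts."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Mark one transaction as finished and wake any waiting drain."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def quiesce(self, timeout=None):
        """Stop accepting work, then wait for in-flight work to complete.

        Returns True if everything drained before the timeout expired.
        """
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout
            )
```

The timeout makes the tradeoff explicit: a longer quiescing period loses fewer transactions but extends the time during which new users are turned away.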
Similarly, high-availability configurations involve a tradeoff in system portability. High-availability configurations are generally within a single data center and involve the addition of permanently deployed secondary systems, usually in clusters. If you need to migrate a whole set of primary and secondary servers to an alternate data center, the work to do that is considerably greater. In addition, there are complex dependencies between parts of the system, especially IP addresses embedded in firewall and load-balancer configuration files.
Best Practices for Tool Selection
Several principles help you select tools that are compatible with operating and migrating on the cloud.
Prefer tools and software that tolerate network latency when working across data centers. Some software, especially software made to manage clusters, needs low network latency. Examples of tools that don't need low-latency network connections include WebSphere Job Manager, DB2 HADR for keeping databases synchronized, and rsync for keeping directory trees synchronized.
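As an illustration of the rsync case, the following Python sketch builds and runs an rsync command that mirrors a directory tree to a server in another data center over SSH. The function names, paths, and host are hypothetical; the rsync options used (--archive, --compress, --partial, --delete, --dry-run, and -e) are standard, but check them against your installed version.

```python
import subprocess


def build_rsync_command(source_dir, remote_host, dest_dir, dry_run=False):
    """Build an rsync command that mirrors source_dir to remote_host."""
    cmd = [
        "rsync",
        "--archive",   # preserve permissions, times, and symlinks
        "--compress",  # compress data in transit over the slower link
        "--partial",   # keep partly transferred files so a retry can resume
        "--delete",    # remove remote files that no longer exist locally
        "-e", "ssh",   # run the transfer over SSH
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    # A trailing slash on the source copies its contents, not the directory.
    cmd += [source_dir.rstrip("/") + "/", f"{remote_host}:{dest_dir}"]
    return cmd


def sync_tree(source_dir, remote_host, dest_dir):
    """Run the sync and raise CalledProcessError if rsync fails."""
    subprocess.run(
        build_rsync_command(source_dir, remote_host, dest_dir), check=True
    )
```

Because rsync sends only the differences, it can be run repeatedly while the primary server is still live, leaving a short final pass for the cutover window.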
Prefer tools that can be used in a secure mode or can operate the whole system in a secure zone using virtual local area networks (VLANs). This is important for transferring data over the Internet.
Use tools that can reliably transfer large files or support long-running applications. Tunneling or proxying over Secure Shell (SSH) is convenient and can make the connection secure, but SSH tunnels can be prone to breakage. It's better to use application-specific SSL connections or a VPN to ensure a long-lived connection.
Prefer tools that reduce downtime. Image capture is an example of a tool that leads to system downtime. The WebSphere profile-management tools can migrate an entire application profile to another system, but you need to shut down WebSphere to do this. In contrast, the WebSphere wsadmin scripting tool allows more granular management without the need to shut down the application.
A Better Migration Approach: Maximizing Availability of Lightly Used Applications
In a traditional approach to high availability, we try to build and maintain redundant components for any part of the system that might fail, in order to avoid any single point of failure. This is expensive and requires a lot of expertise. In a cloud approach to high availability, we treat any node in a system as disposable and immediately replaceable by other nodes in a large cluster. This approach is not applicable to many business systems, including the lightly used applications we focus on in this article.
However, an approach based on portability, emphasizing automation and repeatability, is widely applicable, especially for single-server, lightly used applications. In this approach, we recognize that most downtime is caused by maintenance and is within our control. We're ready to reinstantiate the application and reload data when needed, using cloud-based automation techniques, which avoids most of the downtime associated with simple image capture. This approach can also function as a highly available system if some downtime can be tolerated in the event of an unexpected system failure, which might be acceptable in many business contexts. The disadvantage of this approach is that it requires greater administrator expertise or development expense. Figure 3 illustrates this general approach, which will be the focus of the case studies in the remainder of this article.
Figure 3 Migration based on automation for repeatable installation.
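A repeatable install procedure is easier to trust when each step can detect whether it has already been applied, so that the same procedure can be rerun safely in any data center. The Python sketch below shows that idea in its simplest form; the Step type, the runner, and the step names are hypothetical and stand in for real actions such as installing software, configuring it, and reloading data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    is_done: Callable[[], bool]  # checks whether the step is already applied
    apply: Callable[[], None]    # performs the step


def run_install(steps: List[Step]) -> List[str]:
    """Apply each pending step in order; return the names that were applied."""
    applied = []
    for step in steps:
        if step.is_done():
            continue  # already in place, so rerunning is harmless
        step.apply()
        if not step.is_done():
            raise RuntimeError(f"step {step.name!r} did not complete")
        applied.append(step.name)
    return applied


# Stand-in state; a real check would inspect the freshly provisioned instance.
completed = set()


def make_step(name):
    return Step(name, lambda: name in completed, lambda: completed.add(name))


install_steps = [make_step(n) for n in ("install", "configure", "load-data")]
first_run = run_install(install_steps)
second_run = run_install(install_steps)
```

The first run applies every step, and the second run applies none, which is the property that makes the procedure repeatable on a new base image.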