For many database administrators (DBAs) and developers, deploying large numbers of merge replication subscribers is a pain point. To replicate the database throughout the system, the merge replication process pushes a snapshot of the publisher's database tables to each merge replication subscriber. The heart of the problem is how to push the database snapshot (possibly multiple gigabytes in size) over low-bandwidth connections and deploy it to all of your subscribers, and do it within your maintenance window.
For example, consider the challenges of distributing your snapshot to 100 subscribers around the world over a weekend. When your California weekend begins, at 5:00 p.m. on Friday, it's already 10:00 a.m. on Saturday in Sydney, Australia, a time difference of 17 hours. So the weekend you thought was 48 hours long is actually only 31 hours (48 - 17 = 31). Even if you work around the clock and stagger your deployments, you have only 31 hours to pull off this snapshot distribution.
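The arithmetic above can be checked with a short Python sketch. It is only an illustration, under several assumptions that are not in the original text: a "weekend" is treated as the 48 hours starting at 5:00 p.m. local time on Friday, the UTC offsets are fixed at -7 hours for California and +10 hours for Sydney (which gives the 17-hour difference quoted above), and the date and the function name `shared_window_hours` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC offsets; real offsets shift with daylight saving time.
PACIFIC = timezone(timedelta(hours=-7))
SYDNEY = timezone(timedelta(hours=10))
WEEKEND = timedelta(hours=48)  # assumed length of a maintenance weekend


def shared_window_hours(start_a, start_b, length=WEEKEND):
    """Return the hours during which two equally long windows overlap."""
    latest_start = max(start_a, start_b)
    earliest_end = min(start_a + length, start_b + length)
    overlap = max(earliest_end - latest_start, timedelta(0))
    return overlap.total_seconds() / 3600


# Hypothetical example date: Friday, July 14, 2006, 5:00 p.m. local time.
california_start = datetime(2006, 7, 14, 17, 0, tzinfo=PACIFIC)
sydney_start = datetime(2006, 7, 14, 17, 0, tzinfo=SYDNEY)

# Local Sydney time at the moment the California weekend begins.
print(california_start.astimezone(SYDNEY).strftime("%Y-%m-%d %H:%M"))
# Hours that both sites are inside their weekend at the same time.
print(shared_window_hours(california_start, sydney_start))
```

With more than two regions, the same overlap calculation could be repeated per subscriber site to see how much shared window each one really has.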
The first time you create subscribers, you need to deploy your snapshots to those subscribers. You also may have to deploy individual subscribers occasionally in response to failure events; for example, metadata cleanup problems or hardware failure on the subscriber. Certain changes to your publication also can force unexpected topology-wide reinitialization, requiring snapshots to be sent down to all subscribers. To minimize downtime, you need to get the subscribers redeployed as quickly as possible.
The other problem is bandwidth. Your publisher might be connected to a network with a speed of 1 gigabit per second (Gbps), or in a data center with an OC3 pipe (that's 155 megabits per second, or Mbps). Assuming 100% efficiency of the network card, a snapshot of a 1 GB database on such a system takes 8 to 52 seconds to copy. However, subscribers typically use a connection with much less bandwidth. Some subscribers may connect via cable modem, some via frame relay, and others via WiFi. With frame relay or cable modems, you can expect speeds of up to 6 Mbps (750 kilobytes per second), which means that the same 1 GB snapshot takes about 22 minutes to copy.
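As a rough check on these figures, the following Python sketch estimates copy times for a 1 GB snapshot at the link speeds mentioned above: roughly 8 seconds at 1 Gbps, 52 seconds at 155 Mbps, and 22 minutes at 6 Mbps. It assumes decimal units (1 GB = 1,000,000,000 bytes, 1 Mbps = 1,000,000 bits per second) and ignores protocol overhead unless an `efficiency` factor is supplied; the function and variable names are illustrative only.

```python
def transfer_seconds(size_bytes, link_mbps, efficiency=1.0):
    """Estimate the seconds needed to copy size_bytes over a link.

    link_mbps is the nominal link speed in megabits per second;
    efficiency is the usable fraction of that speed (1.0 = 100%).
    """
    bits_to_send = size_bytes * 8
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_send / usable_bits_per_second


SNAPSHOT_BYTES = 1_000_000_000  # assumed 1 GB snapshot, decimal units

LINKS = [
    ("1 Gbps network", 1000),
    ("OC3 (155 Mbps)", 155),
    ("cable modem or frame relay (6 Mbps)", 6),
]

for label, mbps in LINKS:
    seconds = transfer_seconds(SNAPSHOT_BYTES, mbps)
    print(f"{label}: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Passing a lower `efficiency` value (for example, 0.7) gives a more pessimistic estimate for links that rarely reach their nominal speed.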
That's assuming broadband connectivity. I have U.S. clients that synchronize using phone lines; they don't even have DSL, let alone cable modems. Many businesses, especially in commercial parks, only have DSL. This problem is exacerbated when you distribute a snapshot to Third World countries that have only phone lines or low-bandwidth connections, or to subscribers with wireless or lossy connections, which experience frequent interruptions. As a result, your snapshot deployment process needs to cope with network interruptions. Microsoft SQL Server 2005 will resume snapshot deployment after a failure, but an interrupted snapshot process eats into your maintenance window.
How can we deploy these snapshots as fast as possible? This article considers five issues:
- Push subscriptions vs. pull subscriptions
- Compressed snapshots
- FTP vs. UNCs
- Dynamic snapshots
- Replication settings
Let's dive right in with a look at each of these issues.
Push Subscriptions Versus Pull Subscriptions
A push subscription is initiated by the publisher; all the merge agents run at the publisher. A pull subscription is initiated at the subscriber; all merge agents run at the subscriber.
Several factors make pull subscriptions a better choice than push subscriptions:
- Pull subscriptions consume fewer resources on the publisher, so more resources are available on the publisher for the synchronization process. The cutoff point is around 10 subscribers; however, this limit is really dictated by load.
- Pull subscribers are optimized for WAN and low-latency links, making them faster than their push counterparts.
- Dynamic snapshots (more about these later) offer some advantages with pull subscribers as opposed to push subscribers.
- Merge replication works best when it's scheduled, as opposed to running continuously, for two reasons:
  - Only the final state of the row is merged during the synchronization. For example, if a row is updated 200 times between synchronizations, this approach results in only one row having to travel across the wire, as opposed to the 200 rows that would have to travel using transactional replication.
  - Multiple subscriptions trying to synchronize at the same time can cause merge agents to lock with each other. To get around this problem, most merge architects either use merge hierarchies or limit the number of concurrent merge processes. You can set limits by using the publication property @max_concurrent_merge. To modify this property on the fly, issue a call like this: