Sams Teach Yourself Microsoft SQL Server 7 in 21 Days

Sams Teach Yourself Microsoft SQL Server 7 in 21 Days

By Richard Waymire and Rick Sawtell

How Does Replication Work?

How does replication actually work? I'm glad you asked. Replication might seem very complex on the surface, but when you break it down into its component parts, it's not too bad. Essentially, replication is handled by four different agents. Each agent has its own specialized job to do. When you put all the agents together, you get replication. The following are the agents:

  • Distribution agent— Moves information from the distribution database to the subscribers.
  • Log reader agent— Monitors the transaction log of all published databases that are using it for replication. When it finds transactions that are part of a publication, it copies them to the distribution database where they can then be applied to the subscribers by the distribution agent.
  • Merge agent— Merges modifications from multiple sites.
  • Snapshot agent— Moves a snapshot of the data before replication can begin. This is required. If a snapshot of the data does not exist at the subscriber, you cannot apply transactions to the subscriber. It is also used for the various types of snapshot replication.

Now take a closer look at how each agent differs in its task when faced with the different replication methodologies covered earlier. Keep in mind that a replication methodology is applied at the individual publication level and that each subscriber can take advantage of different methodologies that have been applied to different publications.

How Does Merge Replication Work?

In merge replication, the merge agent can live on the distribution server or on each subscription server. In a push scenario, the merge agent lives on the distribution server. In pull scenarios, the agent lives on each subscriber. Figure 17.7 outlines the replication process presented in the following steps:

  1. The snapshot agent (which lives on the distribution server) takes an initial snapshot of the data and moves it to the subscribers. Remember that the subscribers must first be synchronized with the publishers for replication to begin (with the exception of snapshot replication).
    17fig07.gif

    Figure 17.7 The merge replication process.

  2. A distribution working folder is created on the distribution server to handle merges.
  3. Replication now begins.
  4. The merge agent takes modifications from the publishers and applies them to the subscribers.
  5. The merge agent takes modifications from the subscribers and applies them to the publishers.
  6. The merge agent receives any update conflicts and takes the appropriate action.

For merge replication to work properly, some changes are made to the table schema as well as the distribution database. These changes are made to allow SQL Server to perform conflict resolution. Keep the following schema changes in mind when you decide to use merge replication:

  • System tables are added to the distribution working folder. These are used to track changes for use during synchronization as well as for conflict resolution.
  • SQL Server creates triggers on both the publishing servers and the subscription servers involved in merge replication. These triggers are fired when a data modification occurs in one of the tables involved in replication. Information about the change is stored in the system tables added to the distribution working folder. These saved changes allow you to track changes to each row or column of modified information.
  • SQL Server creates a new uniqueidentifier column for each row in tables being replicated. A GUID or ROWGUID is then added to uniquely identify that row. In this fashion, when a record is updated at different sources, the different updates can be differentiated.

When you allow updates to the same data to occur at multiple locations, you are going to run into conflicts. To resolve these conflicts, SQL Server does the following.

The system tables that are stored in the distribution working folder are used to track every change to a row. Each row's changes are listed and each entry has a ROWGUID. This history of modifications to a record is called the record's lineage.

Lineage is a history of changes made to a row involved in merge replication. Changes are identified by a ROWGUID attached to the row and stored in the distribution working folder on the distribution server.

Using the lineage, the merge agent can evaluate both the current values and the arriving values and automatically resolve conflicts based on priorities you have assigned. You can customize these priorities, which are stored as triggers, to create your own conflict resolution process. You learn more about this in tomorrow's lesson, Day 18, "Implementing Replication Methodologies."

After you pick a record, the synchronization process continues and the whole cycle repeats itself.

Snapshot Replication Internals

Remember that snapshot replication copies the entire article or publication wholesale from the publisher to the subscriber. This includes snapshot replication with updating subscribers. The updates are done at both the subscriber and the publisher, but when a synchronization occurs the subscriber's data is completely overwritten by the incoming replicated article.

In snapshot replication there is no merge agent; however, the distribution agent is used. If you are using a pull subscription, the distribution agent is found on the subscription server. If you are using a push subscription, the distribution agent is located on the distribution server. Using a push or pull subscription in this type of scenario depends on many factors, not the least of which is how busy your distribution server is and how you want to manage subscriptions. Do you want to manage subscriptions centrally (push subscription), or do you want to manage subscriptions at each subscriber (pull subscriptions)?

Figure 17.8 outlines the snapshot replication process presented in the following steps:

  1. The snapshot agent reads the published article and creates the table schema and data in the distribution working folder.
    17fig08.gif

    Figure 17.8 Snapshot replication process.

  2. The distribution agent reads these schema and rebuilds the tables on the subscriber.
  3. The distribution agent then moves the data into the newly created tables on the subscriber.
  4. Indexes (if used) are then re-created on the newly synchronized subscription database.

Transaction Replication Internals

Remember that transaction-based replication copies just the transactions that occurred in the published databases to the distribution database. The updates are then applied to the subscription database generally as they occur. This reduces latency. The subscription database should be thought of as read-only because this type of replication is one-way. Changes to the data can be made only at the publisher.

In transaction-based replication, there isn't a merge agent anymore; however, there is a log reader agent. Keep in mind that the snapshot agent is still around as well. You must have a basis for applying your transactions, and the snapshot agent accomplishes this for you. As in snapshot replication, the distribution agent is used. If you are using a pull subscription, the distribution agent is found on the subscription server. If you are using a push subscription, the distribution agent is located on the distribution server.

Figure 17.9 outlines the transaction replication process presented in the following steps:

  1. The snapshot agent reads the published article and creates the table schema and data in the distribution working folder in the same manner as shown in Figure 17.8.
  2. The distribution agent reads these schema and builds the tables on the subscriber.
    17fig09.gif

    Figure 17.9 Transaction replication process.

  3. The distribution agent then moves the data into the newly created tables on the subscriber.
  4. Indexes (if used) are then re-created on the newly synchronized subscription database.
  5. Now normal transactional replication can begin.
  6. The log reader agent watches the transaction logs of the publishing databases. When it finds a transaction, it moves the transaction to the distribution database, where it will be stored until the next synchronization process begins.
  7. When the synchronization process is called (either by a push from the distributor or a pull from a subscriber), the transaction is read by the distribution agent and then applied to the subscription database.

+ Share This