Monitoring and Troubleshooting Replication Performance
Replication Monitor has improved considerably when compared with previous releases of the software, and this is particularly true of monitoring replication performance. You saw that the Publications tab has two columns for tracking performance: Current Average Performance and Current Worst Performance. These columns display different values, depending on replication latency threshold, and can report one of the following:
- Excellent: latency is 0–34 percent of the threshold. For example, if the latency threshold is 30 seconds, and transactions are delivered within 3 seconds, performance is excellent.
- Good: Latency is 35–59 percent% of the threshold.
- Fair: Latency is 60–84 percent of the threshold.
- Poor: Latency is 85–99 percent of the threshold.
- Critical: Latency exceeds the threshold.
Replication latency is the amount of time it takes to commit a transaction in the subscription database after it is committed in the publication database. The default value for replication latency threshold is 30 seconds; you can view and change this setting by selecting a single publication in the left pane and then choosing the Warnings And Agents tab, as shown in the next figure.
If replication latency isn’t a huge concern in your environment you can turn off the warnings by un-checking the Enabled checkbox. When applying the initial snapshot, it is normal to expect the latency to go beyond 30 seconds; if you don’t want to see Critical performance reported, you can up the latency threshold to something more reasonable.
Replication Monitor enables you to configure alerts so that a database administrator can be notified about a particular performance condition. Alternatively, a job can be executed if such a condition occurs. The list of replication-related alerts available out of the box is shown in the following screen.
Clicking Configure brings up the dialog box for configuring alert properties. Replication alerts are configured much like any SQL Server alerts, so I won’t discuss them here.
Yet another improvement in monitoring replication performance is implemented through tracer tokens. If you click a particular publication in the left pane of the Replication Monitor, the right pane will have three tabs. The first tab simply shows you all subscriptions to the given publication and enables you to drill into details of a given subscription. The second tab, Tracer Tokens, enables you to insert a small amount of data (tracer token) in the transaction log of the published database and monitor how long it will take to deliver this data to the distributor and subscriber. As shown in the following figure, the particular subscription I examined had a total latency of five seconds (four seconds of which were spent delivering the transaction to the distributor, and the last second delivering the transaction from distributor to the subscriber).
Tracer tokens offer an easy way to determine the performance of your publications and to troubleshoot any performance-related issues. In this example, I saw that delivering data from publisher to distributor took four seconds; if that is above normal, I’d examine the connectivity between publisher and distributor. Furthermore, I might want to consider using a remote distributor or using pull subscriptions instead of push subscriptions to reduce the load on the publishing server. If the latency of delivering transactions from distributor to subscriber were unacceptable, I’d examine the connectivity between those servers and check any other activity on the subscriber that might have hindered this server from receiving the replicated commands.