Replication needs to be planned carefully. Setting things up is easy, but there is no magic UNDO button that will reverse all your actions. Therefore, be sure to test your plan thoroughly before implementing a replication solution. The following sections discuss some of the planning steps necessary for transactional replication.
Defining Server Roles
The first step is defining server roles using the publisher/subscriber/distributor model discussed earlier. Identifying publishers and subscribers is relatively simple: the publisher is the server on which data changes occur; the subscriber is the server to which those changes must be delivered.
Be careful when defining the distributor, however. It is often tempting to use the publishing server as the distributor because doing so saves the cost of purchasing a separate server. However, if you're implementing replication to reduce the load on the publisher or to separate read and write activity, you should strongly consider dedicating a separate server to distributing replicated transactions. Bear in mind that every transaction marked for replication must be recorded in the distribution database before it is delivered to the subscribers. Some shops opt to use one of the subscriber servers as the distributor, which can work if that subscriber is not heavily taxed by its existing load. Using the subscriber as the distributor is also common in test and quality assurance environments.
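As a rough sketch, configuring a distributor in SQL Server 2000 comes down to a pair of system stored procedure calls; the server name and password below are hypothetical:

```sql
-- Designate DISTSRV (hypothetical name) as the distributor;
-- run at the distributor, then at each publisher that uses it.
EXEC sp_adddistributor
    @distributor = N'DISTSRV',
    @password = N'StrongP@ssw0rd';   -- password for the distributor_admin login

-- Create the distribution database on the distributor.
EXEC sp_adddistributiondb
    @database = N'distribution';
```

The replication wizards issue essentially these calls for you; running them by hand is mainly useful for scripted, repeatable setups.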
Determining What Data Needs to Be Replicated
You might now be thinking: "Oh, that's easy...just replicate all database objects, right?" Not so fast! If you have a single database server with 20 users and a 120MB database, you can safely publish all data and subscribe to it from another server. But in the real world, you should replicate only data that you must have on the subscribers. SQL Server replication is a mature technology that can move many millions of transactions each day given sufficient hardware resources.
But I'd hesitate to put my production servers to such a test. To achieve the best possible performance without bringing your servers to their knees, consider carefully which tables, views, stored procedures, and user-defined functions you must replicate.
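To illustrate, publishing a single table as an article, and trimming it to just the columns subscribers need, might look like the following; the publication and object names are hypothetical:

```sql
-- Add one table, not the whole database, to the publication.
EXEC sp_addarticle
    @publication   = N'SalesPub',   -- hypothetical publication name
    @article       = N'Orders',
    @source_owner  = N'dbo',
    @source_object = N'Orders',
    @type          = N'logbased';   -- standard transactional article

-- Vertical filter: include only the columns you must replicate.
EXEC sp_articlecolumn
    @publication = N'SalesPub',
    @article     = N'Orders',
    @column      = N'OrderID',
    @operation   = N'add';
```

Repeat `sp_articlecolumn` for each column you want in the article; anything you leave out never crosses the wire.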
Replication Agents
Transactional replication involves three agents: snapshot, log reader, and distribution. The snapshot agent takes a snapshot of records on the publisher and copies the data out to the snapshot folder. The snapshot agent also generates the database schema scripts, including CREATE TABLE and CREATE INDEX statements.
The snapshot agent doesn't have to run continuously for transactional replication. If you get your replication working properly the first time, you might never have to run the snapshot agent again after the initial synchronization. However, if you do have problems with the subscriber servers missing data, the snapshot agent is there to help.
The log reader agent reads the transaction log on the published databases. This agent moves transactions that are marked for replication to the distribution database. The distribution agent delivers transactions from the distribution database to the subscribers. Log reader and distribution agents have to run continuously (or at scheduled intervals) to keep replication working.
In addition to the snapshot, log reader, and distribution agents, replication also uses a few other jobs (agents) to keep things organized. The history cleanup agent, for example, deletes transactions that have already been delivered from the distribution database. If this agent did not run, the distribution database would grow very large.
Agent Profile Settings
Replication agents are implemented as SQL Server jobs that call executable files with certain parameters. You should be aware that clicking through the replication wizards configures agents to run with default parameters. If you need to tweak agent parameters for troubleshooting or for performance reasons, you'll have to modify the replication agent's profile. (I'll discuss replication agents' parameters in the next article.)
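As a sketch of what such tweaking looks like, the replication system procedures let you create a custom agent profile and override individual parameters; the profile name and parameter value below are illustrative, not recommendations:

```sql
DECLARE @profile_id int;

-- Create a custom profile for the distribution agent (@agent_type = 3).
EXEC sp_add_agent_profile
    @profile_id   = @profile_id OUTPUT,
    @profile_name = N'Custom distribution profile',
    @agent_type   = 3,
    @profile_type = 0;

-- Override a single parameter in the new profile.
EXEC sp_add_agent_parameter
    @profile_id      = @profile_id,
    @parameter_name  = N'CommitBatchSize',
    @parameter_value = N'200';
```

You then assign the new profile to the agent (for example, through Enterprise Manager's agent profile dialog) so the job picks up the overridden parameter on its next run.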
Replication agents must have appropriate security permissions to read data on the publisher, move transactions to the distributor, and apply the data and transactions to the subscribers. You can allow replication agents to run under the security credentials of the SQL Server Agent service; alternatively, you can define a login that has a more limited set of permissions. Security is no joking matter: allow your replication agents too much freedom, and a hacker can destroy your data on publishers as well as subscribers. On the other hand, not granting sufficient permissions to the agents prevents replication from working properly.
Identity Columns
Identity columns present a special challenge for some replication scenarios. For example, consider what happens if you have multiple publishers sending transactions to a single subscriber. Suppose you want to split the user base between two servers, but many tables in your database have identity columns. How do you prevent identity values from colliding on the two servers? You must manage the identity ranges on your servers to avoid duplicate identity values on the subscriber. One solution is to allow only positive IDENTITY values on Server A and only negative IDENTITY values on Server B.
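A minimal sketch of the positive/negative approach, using a hypothetical Orders table: seed Server A's identity upward from 1 and Server B's downward from -1, and let a CHECK constraint (skipped for replicated rows via NOT FOR REPLICATION) keep each server inside its range:

```sql
-- On Server A: locally generated identities grow upward from 1.
CREATE TABLE dbo.Orders
(
    OrderID   int IDENTITY(1, 1) NOT NULL,
    OrderDate datetime NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (OrderID),
    CONSTRAINT CK_Orders_Range
        CHECK NOT FOR REPLICATION (OrderID > 0)
);

-- On Server B: locally generated identities grow downward from -1.
CREATE TABLE dbo.Orders
(
    OrderID   int IDENTITY(-1, -1) NOT NULL,
    OrderDate datetime NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (OrderID),
    CONSTRAINT CK_Orders_Range
        CHECK NOT FOR REPLICATION (OrderID < 0)
);
```

Because the constraint is marked NOT FOR REPLICATION, rows arriving from the other publisher are accepted even though they fall outside the local range.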
Collation
Collation is the combination of case sensitivity and sort order used for character data. Earlier releases of SQL Server limited collation settings to a particular server or database. With SQL Server 2000, you can define collation for each character column. Such freedom comes at the price of an administrative nightmare, however, if you have numerous character columns using different collations.
As far as replication is concerned, published and subscribed tables must have the same collation for corresponding columns. Having different collations on the publisher and subscriber might result in failure to replicate transactions. Furthermore, SELECT statements executed against the same table on the publisher and subscriber might return different data if tables have different collation.
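For example, you can pin a character column to an explicit collation when creating the table on both servers, so the publisher and subscriber are guaranteed to match; the table and collation choice here are purely illustrative:

```sql
-- Use the same explicit collation on publisher and subscriber.
CREATE TABLE dbo.Customers
(
    CustomerID int NOT NULL PRIMARY KEY,
    LastName   nvarchar(50)
        COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
);
```

Scripting the COLLATE clause explicitly, rather than relying on each server's default, removes the chance that differing server defaults silently produce mismatched columns.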
Triggers and Constraints
Yet another consideration is whether triggers and constraints should be enforced on the subscribing servers. Triggers are often used to maintain data integrity by executing INSERT, UPDATE, or DELETE statements when a corresponding data modification occurs in related tables. Such statements might already be replicated from the publisher, however; therefore, if you're not careful, you might end up running the same transaction twice: once by replicating the transaction from the publisher, and again through a trigger that resides on the subscriber. On the other hand, if you allow modifying data on subscribers, you might want to enforce data integrity through triggers. Constraints check for the existence of the related record prior to allowing the modification of a row in a table. However, depending on your replication scheme, the related table(s) might not be replicated.
The saving grace with triggers and constraints is the NOT FOR REPLICATION option, which allows SQL Server to disregard triggers and constraints for replicated transactions (but check them for all other transactions). If your subscriber databases are used for read-only purposes, you might wish to drop triggers and constraints from subscribing databases altogether.
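The option appears in both trigger and constraint definitions; here is a sketch with hypothetical tables:

```sql
-- Trigger fires for local changes but is skipped for replicated ones.
CREATE TRIGGER dbo.trg_Orders_Insert
ON dbo.Orders
FOR INSERT
NOT FOR REPLICATION
AS
    UPDATE s
    SET    LastOrderDate = GETDATE()
    FROM   dbo.CustomerSummary AS s
    JOIN   inserted AS i ON i.CustomerID = s.CustomerID;
GO

-- Foreign key is checked for local changes only.
ALTER TABLE dbo.Orders
ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID)
    REFERENCES dbo.Customers (CustomerID)
    NOT FOR REPLICATION;
```

With both objects marked this way, locally entered rows are still validated and audited, while rows applied by the distribution agent pass through untouched.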
TEXT and IMAGE Data
TEXT, NTEXT, and IMAGE are special data types for holding a large number of characters or images stored in binary format. Such data types are typically modified using the WRITETEXT and UPDATETEXT commands, which are not logged. You can also modify such data using INSERT and UPDATE statements, which are logged. Be sure to examine your application carefully to ensure that your text and image data changes are being logged. In addition, be aware that although you can replicate columns with the TEXT data type, SQL Server limits the amount of such data that will be replicated in a single operation.
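That limit is controlled by the max text repl size server option, specified in bytes; a sketch of checking and raising it (the 2MB value is just an example):

```sql
-- View and raise the maximum TEXT/IMAGE bytes replicated per statement.
EXEC sp_configure 'max text repl size';           -- show current setting
EXEC sp_configure 'max text repl size', 2097152;  -- e.g. 2 MB
RECONFIGURE;
```

Statements that modify more text or image data than this setting allows will fail on a published table, so size the option to your largest expected update.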
Immediate Updating and Queued Updating
When you want to use updatable subscribers, consider whether transactions need to be delivered back to the publisher immediately or whether some delay is acceptable. Typically, you should opt for queued updating if continuous network connectivity is not guaranteed.
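When you create the subscription, this choice surfaces as the @update_mode parameter of sp_addsubscription; the server and database names below are hypothetical:

```sql
-- Queued updating: subscriber changes are queued locally and applied
-- to the publisher when connectivity allows.
EXEC sp_addsubscription
    @publication    = N'SalesPub',    -- hypothetical publication name
    @subscriber     = N'SUBSRV',
    @destination_db = N'SalesCopy',
    @update_mode    = N'queued tran'; -- 'sync tran' = immediate updating
```

Immediate updating ('sync tran') applies the subscriber's change to the publisher as part of the same two-phase commit, which is why it demands a reliable connection.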