Microsoft's SQL Server Analysis Services (SSAS) is an enterprise class tool for building data analytics solutions. Although the product is feature-rich, flexible, and extensible, SSAS databases don't lend themselves very well to scaling up. The software is optimized for executing a single query extremely fast and to derive aggregated, summarized data easily — this is where Analysis Services shines and empowers users to make informed decisions quickly. However, once the number of concurrent requests exceeds a certain threshold (which can vary depending on the application complexity and your server's hardware), queries tend to take longer and cause the aggravation of the same users. Typically you can expect the aforementioned threshold to be between several dozen or several hundred users.
The performance issues worsen if your SSAS database objects must be frequently processed — meaning that you must frequently update dimensions and cubes to apply data changes which took place in the relational data source(s). You can upgrade your server hardware and there are a few configuration settings that might help tweak Analysis Services for multi-user scenarios.
However, by and large, Analysis Services solutions work best when scaled out: Query load is distributed across multiple, relatively inexpensive, servers. But how do you keep the data up to date on multiple servers without having to process data repeatedly from the same relational sources? Welcome to SSAS database synchronization.
Clearly, if you have a small SSAS environment with a handful of users you could easily manage all your processing tasks and query requests using a single instance. However, with Microsoft's vision of Business Intelligence for the masses, it is not uncommon to see SSAS applications serving thousands of users. This article lists various methods of synchronizing databases across multiple instances, and weighs the pros and cons of each approach. The article then focuses on the Analysis Services feature allowing database synchronization using a single XMLA (XML for Analysis) command.
Database administrators have several choices for ensuring that multiple Analysis Services instances have the same data:
- Use the built-in synchronization feature. This method allows the source as well as the target database to remain online while data files are copied. SSAS starts a transaction on the processing server so that you cannot process or alter the database while it is being synchronized. The synchronization target instance may or might not already have a copy of the database you are synchronizing. Synchronization is incremental: If the target database does not exist, then SSAS will transfer all files found in the source database; if the target database already exists, then SSAS will only copy those files that have changed since the last successful synchronization.
- Back up a database on the processing (source) instance and restore to query (target) instances. SSAS backup creates a single file containing the compressed copy of database files. The database remains online while it is being backed up; however, you cannot make any structural changes to the database, nor can you process any objects while the backup is collected. Furthermore, if source and target servers are geographically distant, you might have to copy the backup file to each query server before restoring from backup. SSAS doesn't support incremental or differential backups; you'll have to back up every file that makes up the database. Large database backups can be slow with Analysis Services 2005; the backup performance is a quantum leap faster with 2008 and later versions.
- Detach the database from the processing instance, copy files, and attach to each query instance. The detach/attach method became available with SSAS 2008 and allows us to choose our own method for copying files. As I discuss later in this article, the built-in synchronization's file copy is single-threaded, and SSAS copies only one file at a time. On the other hand, when attempting the detach/attach method, we could use Robocopy, which allows copying multiple files in parallel and can therefore be considerably faster than synchronization. The flexibility comes with additional work, however; unlike synchronization which handles file copying for us, we must write our own code for copying files when detaching and attaching database. Additionally, this method requires downtime; if the target database already exists on the query instance, we must detach it prior to attaching a new copy.