Importing and Exporting Data for SQL Azure
Last updated Mar 28, 2003.
One of the primary tasks a data professional performs for their organization is data movement — taking data from one system and moving it into another. This might be from text files, XML documents, or other database systems.
I've explained how to do this for SQL Server in other articles here on InformIT. Because this is such a common task, there are many tools you can use based on what you need to get done. Even though SQL Azure is based on the SQL Server database engine, there are differences — some subtle, some large — that make this system handle this task differently.
You can still use many of the tools you're familiar with to move data in and out of SQL Azure, and I'll cover those here. What you should keep in mind is that you have the least control in a distributed database environment over the networking layer between the client system and the database itself. Microsoft has some of the largest networking connections available for their datacenters, but the limiting factor is any link between your data source and SQL Azure.
The networking consideration has two implications. First, use the smallest batch of work that you can tolerate: smaller batches take longer to complete in total, but a dropped connection costs you less rework. Second, make sure your method of transfer handles connection breaks with retry logic. Latency in the connection between your data source and SQL Azure affects both.
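To make the retry idea concrete, here is a minimal sketch in Python. It assumes you already have some function that sends one batch and raises ConnectionError when the link drops. The names send_with_retry and send_batch, the attempt count, and the delays are all hypothetical choices for illustration, not part of any SQL Azure tool.

```python
import time


def send_with_retry(send_batch, batch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send_batch(batch), retrying on ConnectionError with exponential backoff.

    send_batch is a caller-supplied function (hypothetical here) that transfers
    one batch and raises ConnectionError if the connection breaks.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_batch(batch)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** (attempt - 1))
```

A retried batch is only safe if sending it twice does no harm, so in practice you would pair something like this with a per-batch transaction or a check for rows that have already arrived.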
Another consideration is data size. Obviously, the larger the data set, the more prominent the networking concerns become. But you also have to make sure that your data set will fit in the database size on SQL Azure that you've paid for. The details of the database sizes and prices can be found here: http://www.microsoft.com/windowsazure/features/database/
Finally, you need to consider compatibility issues. The biggest issue here is the same as moving into on-premise SQL Server — data types. If you're coming from an Excel spreadsheet or a text file, you're probably already familiar with handling dates and times properly, or currency. But if you're coming from a SQL Server on-premise database, you might think that it's a simple matter of using the copy database function or a right-click in SQL Server Management Studio (SSMS) to export or import data, but it isn't.
Some of the differences you'll see in this area have to do with the fact that you don't own a "server" in SQL Azure. That means you don't have a local drive you can upload data from, or push data onto. Those issues can be worked around as I'll show you in this article, but it is something to keep in mind.
The other important consideration in this area is that some features from on-premise SQL Server are not yet supported in SQL Azure, and some are supported differently. For instance, many shops use the SQL Server Replication feature to move data in near real-time between systems. The Replication setup has many server-dependent components, and high latencies are an obvious problem. That means that as of this writing, SQL Replication is not supported for SQL Azure. But there are workarounds that I'll show you here for keeping data in sync between an on-premise system and SQL Azure.
Even data types can be an issue. If you're using a Common Language Runtime (CLR)-based data type, SQL Azure does not yet support that construct.
You'll notice throughout this explanation I've said "not yet" and "as of this writing". One of the advantages of a distributed database platform like SQL Azure is that updates come frequently. While you should evaluate those updates carefully to ensure that they do not break your code (another article on that later), you will find that new features are added multiple times a year, instead of once every few years. The original database sizes available in SQL Azure were smaller than they are now, and new features and data types have been added. You should bookmark these pages to review what is and is not available at the time of your deployment: http://msdn.microsoft.com/en-us/library/ee336245.aspx and http://msdn.microsoft.com/en-us/library/ee336250.aspx and http://msdn.microsoft.com/en-us/library/ee621784.aspx
Also — keep your client tools up to date with the latest service packs. SSMS needs to be made "aware" of the changes in SQL Azure, and since it's an installed software package on your system it needs to have the latest patches available to keep up.
With that background, you can now start the process to move data in and out of SQL Azure.
Script and Check if Needed
If you're moving from a text file or Excel spreadsheet to SQL Azure, then you can simply treat SQL Azure like any other destination, because you're already aware that the differences in data types and features between them are dramatic. Just as in an on-premise exercise, create a data map between the systems and so on.
If you're coming from SQL Server and do not yet have the database in SQL Azure, then you can find out what structures will and won't transfer in a simpler manner than scanning those documents (which you should still do) and meticulously checking your data structure. You can script the database to a file to see what will and will not work.
You'll need the latest version of SSMS installed with all of the latest updates. Open SSMS, connect to your source database, and then right-click the database you want to copy or move. Select "Tasks" from the menu that appears, and then "Generate Scripts" to start the wizard.
Select "Next" at the start wizard page to move to the Choose Objects panel. On that panel you can pick and choose the objects you want to transfer. If you want to try this out, you can use the AdventureWorks sample database — don't worry, I'll stop before we move anything in to a SQL Azure database, so this test won't cost anything.
After you make your selections, select the "Next" button. The Set Scripting Options panel holds the most important selections for SQL Azure. Change the defaults of where to save the file or where to send the results as you wish, but select the "Advanced" button before you select "Next". In the Advanced Scripting Options panel, change the "Script for database engine type" setting to "SQL Azure Database". That's a critical step. Then click "OK".
Now, back on the Set Scripting Options panel, click the "Next" button. This final panel allows you to review the options you selected, and to click the "Previous" button if you want to change anything. If you expand the plus-sign next to Options | General, you'll see the engine with the SQL Azure type selected. If not, use the "Previous" button to go fix that. Click the "Next" button when you're ready for the process to start.
If you're using the AdventureWorks database as I am, you'll notice a failure, and the process will stop. Click the "Save Report" button and select a location to store an HTML file with the details. This report tells you where the system encountered its first error. True, it would be far nicer if it found all of the errors, but the first error usually represents a type of problem you'll see throughout your structures, and you can take out all of the common errors at one time with a script. It's also important to evaluate each compatibility problem carefully, because blindly converting one data type to another can be very dangerous. So even this "bump and go and repeat" method can be useful.
In this case, the scripting engine found an incompatibility with the CLR data type. That's especially important to know. You might be able to simply change the data type to another in a copy of the database and re-run this process. But there may be application logic that depends on that data type, so having this report handy is very useful.
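Since the wizard stops at the first error, it can help to find every other place the same type appears before you re-run it. The following is a rough sketch in Python that scans a saved script for a list of type names you supply. The function name find_type_usages and the type name MyClrType in the usage note are hypothetical; substitute whatever type the report flagged.

```python
import re


def find_type_usages(script_text, type_names):
    """Return (line_number, line) pairs where any given type name appears as a whole word."""
    if not type_names:
        return []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in type_names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (number, line.strip())
        for number, line in enumerate(script_text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

Calling find_type_usages(open("script.sql").read(), ["MyClrType"]) would list each line to review. This is a plain text match, so it will also flag the name inside comments or strings, and each hit still needs the careful evaluation described above.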
I'll assume now that the database is either cleansed and ready for transfer or that it is another type of data source. For this article, I'll use a Comma-Separated Value (CSV) text file as a source, and a destination table that already exists on SQL Azure. The table in question belongs to a veterinarian office, and the structure looks like this:
CREATE TABLE [dbo].[Subject] -- Table for the farm animals the vet treats
(
    [SubjectID] [int] IDENTITY(1,1) NOT NULL, -- Unique identity for the animal, auto-generated
    [SubjectIdentifier] [varchar](100) NOT NULL, -- ID tag used by the farmer
    [SubjectDetails] [varchar](255) NULL, -- Name of an XML file with details of the animal
    [SubjectVitals] [varchar](255) NULL, -- Name of an XML file with SOAP (medical) information
    CONSTRAINT [PK_Subject] PRIMARY KEY CLUSTERED ([SubjectID] ASC) -- Clustered primary key, required by SQL Azure
)
The text file I have follows this structure, with different data sets for each run, called "subjects.csv":
BOV1000, BovineDescriptorBOV1000, BovineVitalsBOV1000
BOV2000, BovineDescriptorBOV2000, BovineVitalsBOV2000
BOV3000, BovineDescriptorBOV3000, BovineVitalsBOV3000
BOV4000, BovineDescriptorBOV4000, BovineVitalsBOV4000
BOV5000, BovineDescriptorBOV5000, BovineVitalsBOV5000
More lines follow for each animal. I'll use this small data set to move the data in and out of SQL Azure.
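Before loading a file like this, it's worth checking that every line really has the three fields the table expects. Here is a small sketch in Python using the standard csv module. The function name read_subjects is hypothetical, and I'm assuming the space after each comma in the sample is formatting rather than data, so the sketch strips it.

```python
import csv
import io


def read_subjects(csv_text):
    """Parse subjects.csv-style text into (identifier, details, vitals) tuples."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        if len(record) != 3:
            raise ValueError(f"expected 3 fields, got {len(record)}: {record!r}")
        rows.append(tuple(field.strip() for field in record))
    return rows
```

If the spaces are significant in your own files, drop the skipinitialspace option and the strip call. Either way, decide deliberately, because a loader that splits only on the comma will carry a leading space into the column.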
The simplest method to use for data transfer is SQL Server Integration Services, or SSIS. In the latest release of SQL Server Management Studio, you can connect to the data source or destination of SQL Azure. Remember, you'll get errors if the data structure is not supported, so I recommend that you clean your sources and destinations prior to starting this step.
Opening the latest version of SSMS, I connect to the SQL Azure database (not instance) that I've got with the table I referenced earlier.
By and large, it's the same process to work with SSIS in SQL Azure as it is with almost any data source and destination, with the idea that you want to keep the batch size small (I use from 30-50MB per batch) and watch the data types.
You can read more about how to work with the latest version of SSIS here: http://msdn.microsoft.com/en-us/library/ms141026.aspx If you do not use the FAST LOAD option on the destination, you will force a row-by-row operation which is slower but will tolerate the latency over the Internet better.
You can see a screen-by-screen description of this process here: http://blogs.msdn.com/b/sqlazure/archive/2010/05/19/10014014.aspx and another here: http://sqlserverpedia.com/wiki/Migrating_Data_to_SQL_Azure_Using_SSIS
SSIS has all the advantages of a graphical tool, and you can save the package you create with it and even run it from various scripting languages. If you don't need or want a graphical environment you can use the BCP (Bulk Copy Program) to move data into SQL Azure as well.
In fact, if you're used to this command-line tool, it can be easier to use than others. Once again, make sure you have the latest client tools installed. The BCP tool and several layers underneath it need to be Azure-aware, and that's only in the latest versions.
This is actually my method of choice for this process. I can control the batch size more easily, and I already use it in many automated batch files.
The two things to be aware of are the format files and the batch-size commands. In my case, I have an IDENTITY column defined to maintain Referential Integrity in my database, but not in the text file I'm importing from. Whether or not you use an Identity value (a debatable practice indeed) you will probably still have fields in the database that are not mapped properly. A format file can help you state which columns go where, data types and so on. If you're not familiar with a format file, there's a good description here: http://support.microsoft.com/kb/67409
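As an illustration of what such a format file might contain for my table, here is a sketch in Python that writes a non-XML format file mapping the three fields in the text file to the second, third, and fourth table columns, which skips the IDENTITY column. The function name build_format_file is hypothetical, and the version number "10.0" (SQL Server 2008 tools), the SQLCHAR type, and the empty collation are my assumptions. Check the result against the bcp documentation for your client version, or compare it with a file produced by bcp's own format option.

```python
def build_format_file(columns, version="10.0", field_sep=",", row_sep="\\r\\n"):
    """Build the text of a non-XML bcp format file for character data.

    columns: list of (server_column_order, column_name, max_length) tuples,
    given in the order the fields appear in the data file.
    """
    lines = [version, str(len(columns))]
    for index, (server_order, name, length) in enumerate(columns, start=1):
        # The last field ends with the row terminator, the others with the separator.
        terminator = row_sep if index == len(columns) else field_sep
        lines.append(
            f'{index}\tSQLCHAR\t0\t{length}\t"{terminator}"\t{server_order}\t{name}\t""'
        )
    return "\n".join(lines) + "\n"


# Fields in subjects.csv map to table columns 2-4; column 1 (SubjectID) is skipped.
subject_format = build_format_file(
    [(2, "SubjectIdentifier", 100), (3, "SubjectDetails", 255), (4, "SubjectVitals", 255)]
)
```

You would save that text to a file and pass it to bcp with the -f switch in place of -c.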
If you do have a one-to-one mapping, you can simply use the bcp command like this:
bcp WAVS.dbo.Subject in c:\temp\subjects.csv -c -Uloginname@SQLAzureServerName -SNameofSQLAzureServer -t,
Let's dissect that a little.
bcp: The command itself
WAVS.dbo.Subject: The fully-qualified name of the database, schema, and table.
in: The direction. It sends from right to left, so the file that comes next is being sent to the SQL Azure database.
c:\temp\subjects.csv: The text file I described at the top of the article.
-c: Treat fields as characters, without asking for the field type of each one. Most likely you'll want more control than this, so you'll use the format file I mentioned a moment ago. In this table, all of the fields I imported were varchar() types, so this was acceptable.
-U: User name. Remember, SQL Azure uses SQL Server authentication, with the login in the format UserName@SqlAzureServerName
-S: The name of the SQL Azure account (server), normally given as the fully qualified server name (for example, servername.database.windows.net)
-t,: By default bcp uses the tab character to separate data. Since my source uses a comma, I have the t (for terminating separator) and then a comma.
Since I haven't used the -P switch, the password will be requested. Be careful storing this information in a batch file - it can bite you. Better to pass that as a parameter from the system that runs the batch, where you can protect it.
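One way to keep the password out of the batch file is to have the calling system supply it at run time. The sketch below, in Python, assembles the same bcp arguments shown above and adds -P only if a password is present in an environment variable. The function name build_bcp_import and the variable name SQLAZURE_PASSWORD are hypothetical, and the sketch only builds the argument list, it does not run bcp.

```python
import os


def build_bcp_import(table, data_file, login, server,
                     password_env="SQLAZURE_PASSWORD", field_sep=","):
    """Return the bcp argument list for a character-mode import into SQL Azure."""
    args = [
        "bcp", table, "in", data_file,
        "-c",                      # character mode, as in the article's example
        f"-U{login}@{server}",     # SQL Azure logins take the login@server form
        f"-S{server}",
        f"-t{field_sep}",          # field terminator
    ]
    password = os.environ.get(password_env)
    if password:
        # Supplied by the calling system; never written into the script itself.
        args.append(f"-P{password}")
    return args
```

The list can be handed to subprocess.run(args, check=True). Keep in mind that a password passed on a command line can still be visible to other processes on the same machine, so letting bcp prompt for it remains the more cautious option for interactive runs.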
What I don't have in this line is something you should experiment with: the -b "batch" switch. This switch sets how many rows will load up to SQL Azure before a commit operation. The lower you set this number, the more tolerant the transfer will be of connection problems, but the slower the complete transfer.
You might want to experiment with the -h switch instead, which is documented here: http://msdn.microsoft.com/en-us/library/ms188267.aspx. Its KILOBYTES_PER_BATCH hint allows you to set the kilobytes of data to transfer in a batch, rather than the number of rows. This can make it a lot easier to reach a number between 30 and 50MB per batch, which is what seems to work from my location.
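If you prefer to stay with a row count, you can estimate one from the file itself. This Python sketch averages the byte length of a sample of lines and works out how many rows fit in a target batch size. The function name rows_per_batch is hypothetical, and the 40MB default is simply the middle of the 30-50MB range that works for me, so treat it as a starting point.

```python
def rows_per_batch(sample_lines, target_bytes=40 * 1024 * 1024, encoding="utf-8"):
    """Estimate how many rows fit in target_bytes, based on a sample of file lines."""
    total = sum(len(line.encode(encoding)) for line in sample_lines)
    if total == 0:
        raise ValueError("need at least one non-empty sample line")
    average = total / len(sample_lines)
    # Always return at least one row, even if a single row exceeds the target.
    return max(1, int(target_bytes // average))
```

The result is only as good as the sample, so read a few hundred lines from different parts of the file if the row lengths vary a lot.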
The official documentation for the bcp command is here: http://msdn.microsoft.com/en-us/library/ms162802.aspx
But what about replication? What if you want to keep SQL Azure in synchronization with SQL Server on-premise?
Replication is not currently supported in SQL Azure. One of the reasons is that a Replication Scenario requires careful planning of the various servers that act as the source and destination of data, the routing of that data, and the fact that in SQL Azure you're at the database level, not the Server or Instance level. In other words, the architectures are too different to currently allow it.
So does that mean you can't keep them in sync? You do have options.
Of course, you can use code to connect to a data source and to the SQL Azure database, and then use various ADO.NET commands as you do with any other application. The keys here are once again to control the batch size, and handle retry logic correctly.
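To show the batch-control half of that idea, here is a sketch written in Python against the generic DB-API rather than ADO.NET, with the built-in sqlite3 module standing in for the real database so the example runs anywhere. The function name load_in_batches and the batch size are hypothetical. With SQL Azure you would open the connection through a SQL Server driver instead, and the parameter marker style may differ.

```python
import sqlite3


def load_in_batches(connection, insert_sql, rows, batch_size=1000):
    """Insert rows with one commit per batch; return the number of batches committed."""
    cursor = connection.cursor()
    batches = 0
    for start in range(0, len(rows), batch_size):
        cursor.executemany(insert_sql, rows[start:start + batch_size])
        connection.commit()  # a dropped connection loses at most the current batch
        batches += 1
    return batches


# Demonstration only: an in-memory SQLite table stands in for the SQL Azure table.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE Subject (SubjectIdentifier TEXT, SubjectDetails TEXT, SubjectVitals TEXT)"
)
demo_rows = [(f"BOV{n}000", f"BovineDescriptorBOV{n}000", f"BovineVitalsBOV{n}000")
             for n in range(1, 6)]
committed = load_in_batches(demo, "INSERT INTO Subject VALUES (?, ?, ?)", demo_rows, batch_size=2)
```

Because each batch commits on its own, a retry after a failure only needs to resend the batch that was in flight, provided your code records which batches have already been committed.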
This is often the approach you need to explore if the data needs to be transferred periodically. When I was new to databases years ago I had set up a rather sophisticated data transfer system between systems using various database tools on both sides. Needless to say, the system was rather fragile, and had multiple conflict issues. When I sat in on a code review (a rare thing for the DBA at that company) I brought up the issue. The senior developer said "we can just write that into the code, you know. It's just a connection string and some commits to us." What a revelation!
Of course, it's not always quite that simple, but since you can't guarantee the performance of the connection to SQL Azure from the client or source, you really need to start treating SQL Azure as part of a Service-Based Architecture rather than an on-premise data source. That means your developers will write code that will connect to the various sources of data and handle the proper transfers.
I have an example of this sort of code here: http://www.informit.com/guides/content.aspx?g=sqlserver&seqNum=392
BizTalk and SQL Azure Data Sync
If your developers don't own the application data you need to move, you still have options. The latest add-on packs for Microsoft BizTalk, which helps you develop a Service Architecture, can snap in an Azure or SQL Azure data stream. That means you can send "messages" which are made up of data from literally hundreds of data sources to SQL or Windows Azure. You can read more about that here: http://btsazureadapters.codeplex.com/
There's also a newer program, which as of this writing is still in Community Technology Preview (CTP), called SQL Azure Data Sync. This program, while not quite SQL Server Replication Services, allows you to keep data between an on-premise SQL Server and SQL Azure in sync. There is a lot of information on it here: http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-data-sync-overview.aspx