InformIT

Choosing Data Containers for .NET, Part 2

Date: Mar 21, 2003

Article is provided courtesy of Sams.


Jimmy Nilsson describes typed and untyped DataSets and tells why their built-in functionality makes them good data containers.

In the first installment of this series of articles, I discussed the DataReader. It's actually not what I call a data container, but it's a very good baseline for the tests because the other options use DataReader behind the scenes. In this second article, I discuss DataSets, both untyped and typed.

NOTE

Supporting code for this article can be found at http://www.jnsk.se/informit/container1.htm.

Keep the Fire Burning

In the first article, I posed the question of whether Microsoft or I have determined the best data container for most situations. I vote for custom classes, and Microsoft seems to vote for DataSets. My friend J.C. Oberholzer sent me some feedback:

I tend to think you are right—Microsoft writes ADO.NET and updates it as they go along. If you combine different components, you may end up supporting different versions of ADO.NET and DataSets when Microsoft decides to upgrade ADO.NET for any one of their reasons. When working with multiple components to create a system and maintaining different versions of these components, I would rather maintain my own than rely on Microsoft to make the decisions. The DataSet has a lot of built-in functionality that this guy needs, some that another guy needs, and so on. My custom goodie has the functionality that I need and can debug.

That's an interesting viewpoint, in my opinion. To balance it a bit, though, I've also heard that because Microsoft is pushing DataSets so much, they will see to it that we have a smooth upgrade path in the future.

Okay, let's get started with the topic of today: DataSets. (But be sure to read the first article, if you haven't already!)

Background on the DataSet

When I worked with ADO and VB6, I often used disconnected ADO Recordsets. They were nice, especially regarding marshaling, because they used custom marshaling and could "travel" between processes. But there were caveats, of course. One was that you needed to send several resultsets, for example, when you had used several SELECT statements within one stored procedure; you couldn't disconnect a Recordset with several resultsets. The solution I most often used was to move the resultsets from the first Recordset into an array of Recordsets. It was kind of a hack, but it was useful.

Another important occurrence in the dark ages before .NET was that Microsoft showed the in-memory database (IMDB) in the beta versions of COM+ 1.0. The idea was to have an in-memory cache of data to work with so that the ordinary database server wasn't touched as often and, consequently, performance and scalability would increase. For different reasons, IMDB was withdrawn before the final release of COM+ 1.0.

Rumors say that IMDB was withdrawn for one or more of the following reasons:

With the release of ADO.NET, we got the DataSet class, which addresses both the need for a disconnected Recordset with several resultsets and the need for an in-memory cache. In Figure 1, you can see the different classes from which the DataSet is aggregated.

Figure 1 Class model of the DataSet.

A DataSet can have one or more DataTables. (A DataTable can be thought of as a resultset.) Each DataTable can have DataRows, DataColumns, and Constraints. The DataSet also can have DataRelations between the different DataTables. With this model, you can build a complete representation of a database, but in memory.

When I discussed the DataReader, I said that you work with concrete DataReaders—for example, SqlDataReader and OleDbDataReader. That is not the case with the DataSet; it is independent of providers and data sources. You can create and fill a DataSet completely with simple code if you want to, without touching a database at all.
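As a minimal sketch of that idea (the table and column names here are mine, not from the article), the following builds a DataSet with two related DataTables entirely in memory:

```vbnet
' Build a DataSet in memory -- no connection, no provider involved.
Dim ds As New DataSet("OrderData")

Dim orders As DataTable = ds.Tables.Add("Orders")
orders.Columns.Add("Id", GetType(Integer))
orders.Columns.Add("CustomerId", GetType(Integer))
orders.PrimaryKey = New DataColumn() {orders.Columns("Id")}

Dim lines As DataTable = ds.Tables.Add("OrderLines")
lines.Columns.Add("OrderId", GetType(Integer))
lines.Columns.Add("ProductId", GetType(Integer))

' A DataRelation ties the DataTables together, much as a
' foreign key does in a database.
ds.Relations.Add("Order_OrderLines", _
    orders.Columns("Id"), lines.Columns("OrderId"))

orders.Rows.Add(New Object() {1, 42})
lines.Rows.Add(New Object() {1, 7})
```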

As a matter of fact, there is much to say about the DataSet—probably enough for complete books. Before moving to the next section, I'd like to summarize some of the other key built-in features of the DataSet:

Background on the Typed DataSet

So far, I've been talking about the untyped DataSet. You can also instruct Visual Studio .NET to create a typed DataSet for you. You do that by describing the schema in XML or graphically, as shown in Figure 2.

Figure 2 A schema for a typed DataSet.

The schema is used for generating a class in your project with help from the XSD utility. (That is done automatically for you in Visual Studio .NET.) The generated class inherits from DataSet and somewhat encapsulates the untyped DataSet.

NOTE

I use the phrase "somewhat encapsulates" for how the typed DataSet hides information about the untyped DataSet because you can always access the untyped DataSet instead. That is because the typed DataSet inherits an untyped DataSet. In my opinion, this is a major weakness.
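To see why the encapsulation is only partial, note that both of the following accesses compile and work against a typed DataSet (this sketch assumes the OrderDs typed DataSet used later in this article, with an Orders DataTable containing an Id column):

```vbnet
Dim anOrderDs As New OrderDs()

' Type-safe access through the generated members...
Dim id As Integer = anOrderDs.Orders(0).Id

' ...but the inherited untyped interface is still fully available,
' so nothing stops callers from bypassing the typed layer.
Dim sameId As Integer = _
    DirectCast(anOrderDs.Tables("Orders").Rows(0)("Id"), Integer)
```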

When you are using a typed DataSet, the schema of the DataSet is known at design time, so many of the tools in Visual Studio .NET can be used to increase productivity (for example, when you create the user interface). You also benefit by working with the DataSet in a type-safe manner when you write your code. Sure, you might still get runtime type exceptions if your typed DataSet and the database aren't in sync, but many of us really like to have IntelliSense.

NOTE

The first time I saw IntelliSense in VB, my reaction was to ask where I could disable it. But I quickly got accustomed to using it, and I'm now very sure that it increases my productivity a lot. The most irritating thing about IntelliSense is that it doesn't work everywhere, such as in Notepad.

All readers, in a chorus: "Show us some code! Please!"

Okay, I'll do that, but first let's go over a quick summary of pros and cons.

Pros and Cons of the DataSet

The DataSet is certainly a powerful thing. Let's take a look at some advantages it offers (some are more or less the same as those previously mentioned as key functionality of the DataSet):

Of course, some drawbacks are associated with DataSets:

Pros and Cons of the Typed DataSet

The typed DataSet inherits from the untyped DataSet, so the pros and cons are largely the same. The typed DataSet also has these benefits:

These disadvantages are associated with typed DataSets:

Those of you who have read Martin Fowler's Patterns of Enterprise Application Architecture (Addison-Wesley, 2002) will recognize that the patterns typically used when working with DataSets are the Table Module pattern and the Transaction Script pattern.

DataSet Code Examples

Now let's take a look at some code samples, both from the server side and from the client side. First we'll address the server side. In Listing 1, you find some code for fetching data from the database with help from a stored procedure.

Listing 1: Code for Filling a DataSet

Dim aCommand As New SqlCommand _
    (SprocOrder_FetchWithLines, _GetClosedConnection)
aCommand.CommandType = CommandType.StoredProcedure
aCommand.Parameters.Add("@id", SqlDbType.Int).Value = id

Dim anAdapter As New SqlDataAdapter(aCommand)

anAdapter.TableMappings.Add("Table", "Orders")
anAdapter.TableMappings.Add("Table1", "OrderLines")

anAdapter.TableMappings(OrderTables.Orders) _
    .ColumnMappings.Add("Customer_Id", "CustomerId")
anAdapter.TableMappings(OrderTables.OrderLines) _
    .ColumnMappings.Add("Orders_Id", "OrderId")
anAdapter.TableMappings(OrderTables.OrderLines) _
    .ColumnMappings.Add("Product_Id", "ProductId")

anAdapter.Fill(dataSet)

Note the basic pattern shown in Listing 1. First a Command is set up. Then comes a DataAdapter, and, finally, Fill is called on the DataAdapter.

NOTE

The code in Listing 1 doesn't show how the DataSet is instantiated. This is because the code in Listing 1 is from a utility method that can be used for filling both typed and untyped DataSets. For that to work, the DataSet is instantiated as a DataSet or an OrderDataSet, for example, outside of the utility method and is sent as a parameter.

Quite a lot of the code in Listing 1 relates to mappings. First is some mapping code for giving the first DataTable the Orders name and then for giving the second DataTable the OrderLines name. For typed DataSets, this is important: Without this, you will end up with four DataTables in the DataSet instead of two. For untyped DataSets, this is important only for creating meaningful names for the DataTables.

The second mapping section is for changing some of the column names used in the stored procedure. Again, for the typed DataSet, this is important, but for the untyped DataSet, this is merely for convenience.

Now let's look at some code from the client side. To browse the information in the DataSet, we could use the code in Listing 2. Note that here I'm browsing a DataSet with two resultsets (or, rather, DataTables).

Listing 2: Code for Browsing a DataSet

Dim anOrderDS As DataSet = _
    _service.FetchOrderAndLines(_GetRandomId())

Dim anOrder As DataRow = _
    anOrderDS.Tables(OrderTables.Orders).Rows(0)
_id = DirectCast(anOrder(OrderColumns.Id), Integer)
_customerId = DirectCast _
    (anOrder(OrderColumns.CustomerId), Integer)
_orderDate = DirectCast _
    (anOrder(OrderColumns.OrderDate), Date)

Dim anOrderLine As DataRow
For Each anOrderLine In _
    anOrderDS.Tables(OrderTables.OrderLines).Rows
    _productId = DirectCast _
        (anOrderLine(OrderLineColumns.ProductId), Integer)
    _priceForEach = CType _
        (anOrderLine(OrderLineColumns.PriceForEach), Decimal)
    _noOfItems = DirectCast _
        (anOrderLine(OrderLineColumns.NoOfItems), Integer)
    _comment = DirectCast _
        (anOrderLine(OrderLineColumns.Comment), String)
Next

NOTE

You might wonder about the idea of running a loop for the order lines and then just pushing the value of each column of each order line to a private variable, such as _productId. I do this so that the test runs end to end, all the way from the database to variables in the client. Therefore, I want to touch all columns in all rows of the data container.

Note in Listing 2 that I am referring to DataTables and DataColumns with enumerations. This is to make the code more readable than when magic integers are used and more efficient than when strings are used.
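Listing 2 relies on enumerations such as OrderTables, OrderColumns, and OrderLineColumns, whose definitions aren't shown in the article. A hypothetical sketch of what they might look like (the ordinal values are my guesses and must match the column order of the resultsets):

```vbnet
' Hypothetical definitions matching the names used in Listing 2.
' Enum members widen implicitly to Integer, so they can be used
' directly as DataTable and DataRow indexes.
Public Enum OrderTables
    Orders = 0
    OrderLines = 1
End Enum

Public Enum OrderColumns
    Id = 0
    CustomerId = 1
    OrderDate = 2
End Enum

Public Enum OrderLineColumns
    OrderId = 0
    ProductId = 1
    PriceForEach = 2
    NoOfItems = 3
    Comment = 4
End Enum
```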

Let's compare the browse code for an untyped DataSet (just shown) with similar code for a typed DataSet. The version for the typed DataSet is found in Listing 3.

Listing 3: Code for Browsing a Typed DataSet

Dim anOrderDs As OrderDs = _
    _service.FetchOrderAndLines(_GetRandomId())
Dim anOrder As OrderDs.OrdersRow = anOrderDs.Orders(0)

_id = anOrder.Id
_customerId = anOrder.CustomerId
_orderDate = anOrder.OrderDate

Dim anOrderLine As OrderDs.OrderLinesRow
For Each anOrderLine In anOrderDs.OrderLines
    _productId = anOrderLine.ProductId
    _priceForEach = anOrderLine.PriceForEach
    _noOfItems = anOrderLine.NoOfItems
    _comment = anOrderLine.Comment
Next

The code in Listing 3 is clearer and much shorter than the "same" code in Listing 2. This is because the schema is known at compile time, so you don't have to describe it over and over again in your code. Instead of referring to, for example, the generic DataRow class as in Listing 2, I'm programming against specific types. I can also skip all the casting and conversions because all columns already have the "correct" data type. That's definitely a way of reducing code bloat.

DataSet Tests

Time to discuss the test results. As with all the other test cases, there is a service-layer class for each test case. The service-layer classes for the DataSet test cases are shown in Figure 3.

Figure 3 One example of a service-layer class.

The service-layer classes inherit, as usual, from MarshalByRefObject. They should be suitable as root classes when used via remoting.

NOTE

Note that the second method in the class for the typed DataSet returns OrderDs2. That typed DataSet class has only an OrderLines DataTable. Otherwise, I would have had to use a workaround to avoid getting a constraint error when fetching only OrderLines from the database.

You might think that it would be more appropriate to send just a DataTable instead of a complete DataSet in this case. I will discuss that further in Part 5 of this series.

Result of the Tests

In the first part of this article series, I gave you a sneak peek at the throughput test results of the untyped DataSet. Now it's time to show you the results for all test cases discussed so far.

Once again, I will use DataReader as a baseline. Therefore, I have recalculated all the values so that I get value 1 for DataReader; the rest of the data containers will have a value that is relative to the DataReader value, for easy comparison. The higher the value, the better.

Table 1: Results for the First Test Case: Reading One Row

                   1 User,        5 Users,       1 User,          5 Users,
                   in AppDomain   in AppDomain   Cross-Machines   Cross-Machines
DataReader         1              1              1                1
Untyped DataSet    0.6            0.6            1.4              1.7
Typed DataSet      0.4            0.5            1                1.1


Table 2: Results for the Second Test Case: Reading Many Rows

                   1 User,        5 Users,       1 User,          5 Users,
                   in AppDomain   in AppDomain   Cross-Machines   Cross-Machines
DataReader         1              1              1                1
Untyped DataSet    0.6            0.6            6.9              9.7
Typed DataSet      0.5            0.5            6                8.6


Table 3: Results for the Third Test Case: Reading One Master Row and Many Detail Rows

                   1 User,        5 Users,       1 User,          5 Users,
                   in AppDomain   in AppDomain   Cross-Machines   Cross-Machines
DataReader         1              1              1                1
Untyped DataSet    0.5            0.5            6.1              8.5
Typed DataSet      0.4            0.4            5.1              6.9


As you might guess, the five-users test uses 100% of the CPU because I'm not using any think time. That goes for both the AppDomain test and the cross-machines test.

In the cross-machines test, I should really use several client machines, but I haven't done that yet. Perhaps I will rerun the tests that way in Part 5. On the other hand, the server in the five-users, cross-machines test runs at approximately 80% CPU, so the server would soon become the bottleneck anyway.

This reminds me that I need to mention the test equipment. Because my company is a small one (it's just me), I don't have a full-blown lab. Therefore, I have used three ordinary machines:

As you learned earlier, both the untyped DataSet and the typed DataSet have more overhead than the DataReader in the AppDomain. On the other hand, they perform better than the DataReader in the cross-machines tests, especially when several rows are fetched. This is just as expected. It's also expected that the typed DataSet carries more overhead than the untyped DataSet.

But some forthcoming results aren't as you might expect. I'll whet your appetite a bit by telling you that with custom classes for the third test—1 user and cross-machines—I get 16! (That is, it's 16 times more efficient to use custom classes than a DataReader for that specific test.) That is probably not what you expect from all talk about how efficient DataSets are. The untyped DataSet performs almost three times as poorly as custom classes when serialized across machines because DataSets are serialized as XML, even with a binary formatter. Test the code snippet in Listing 4, and open the results file in Notepad to see for yourself.

Listing 4: Code for Serializing a DataSet to a File

Dim fs As IO.FileStream = New IO.FileStream _
    ("c:\temp\ds.txt", IO.FileMode.Create)

Dim bf As New System.Runtime.Serialization. _
    Formatters.Binary.BinaryFormatter _
    (Nothing, New Runtime.Serialization.StreamingContext _
    (Runtime.Serialization.StreamingContextStates.Remoting))

bf.Serialize(fs, anOrderDS)
fs.Close()

NOTE

You can read more about serialization aspects of DataSets in Dino Esposito's article "Binary Serialization of ADO.NET Objects" and in his book Applied XML Programming for Microsoft .NET (Microsoft Press, 2002). There Dino also discusses some workarounds to this problem. I will discuss the test result involved when using a workaround in Part 5 of this series.

Highly Subjective Results

It's time to add some grades for untyped and typed DataSets to my list of "highly subjective results." In Table 4, you will find that I have assigned some grades according to the qualities discussed at the beginning of the article. A score of 5 is excellent, and a score of 1 is poor.

Table 4: Grades According to Qualities

                   Performance       Scalability       Productivity  Maintainability  Interoperability
                   (AppDomain/       (AppDomain/
                   Cross-Machines)   Cross-Machines)
DataReader         5/1               4/1               2             1                1
DataSet            3/3               3/3               4             3                4
Typed DataSet      2/2               2/2               5             4                5


I'd like to say a few words about each quality grade next.

Performance

Unlike the DataReader, both types of DataSets are marshaled by value. Therefore, performance is okay cross-machines, too.

Scalability

As I said last time, in this specific test I think performance and scalability go hand in hand, as those qualities were defined for this series of articles. It's important to note that DataSets won't hold open connections against the database, so using them entails less risk of killing scalability from holding on to connections too long.

Productivity

DataSets are great for productivity because you get a lot of functionality built in, debugged, and ready to use. Productivity is especially good for typed DataSets because there is a lot of design-time support for them in Visual Studio .NET.

In my opinion, the DataSet is very much about rapid application development (RAD) and does a good job regarding that.

Maintainability

I believe that maintainability will be pretty good for both types of DataSets. It's especially good for the typed DataSet because you have a strong contract between it and the code accessing it. On the other hand, I really like the idea of keeping behavior together with data, as in classic object-oriented solutions, which makes a very high degree of encapsulation possible. DataSets instead favor a more data-centric or document-centric approach, where the behavior acts on the data in the DataSets from the outside. That works very well, of course, but, in my opinion, long-term maintainability suffers in many situations.

Also worth mentioning is the loosely coupled model that typed DataSets use. That is, with an event-based model, you can use a specific typed DataSet in many situations, using different rules for each situation. You put the rules in event procedures in other classes instead of within the typed DataSet itself.

Interoperability

Finally, interoperability is pretty good for both types of DataSets. They serialize themselves to XML, and the DataSet also has a built-in WriteXml() method that can be used to produce a format other than the diffgram format you get from the ordinary serialization of DataSets.
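A quick sketch of that difference (the file paths here are just examples): WriteXml() emits plain XML by default, while passing XmlWriteMode.DiffGram produces the same diffgram format that ordinary serialization gives you:

```vbnet
' Plain XML -- suitable for non-.NET consumers.
anOrderDS.WriteXml("c:\temp\order.xml")

' Include the schema inline if the consumer needs it.
anOrderDS.WriteXml("c:\temp\orderWithSchema.xml", _
    XmlWriteMode.WriteSchema)

' The diffgram format, which also carries row states and
' original values -- what remoting serialization produces.
anOrderDS.WriteXml("c:\temp\orderDiffgram.xml", _
    XmlWriteMode.DiffGram)
```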

I decided to give the typed DataSet a score of 5 instead of 4 for interoperability because the XSD means a stronger contract with the client. In my opinion, that is desirable when it comes to interoperability.

Conclusion

In the first article in this series, I discussed the DataReader and concluded that it isn't meant to be used as a data container. That's hardly surprising. In this article, I discussed untyped and typed DataSets, which are very nice data containers. DataSets are especially good thanks to all their built-in functionality. If you can benefit from that, DataSets are a winning choice. But I also raised the point that DataSets aren't great regarding performance and maintainability—more about that in an upcoming article.

Don't miss the third article in this series, about wrapped containers and generic containers.

800 East 96th Street, Indianapolis, Indiana 46240