InformIT

Data Containers for .NET, Part 3

Date: Apr 4, 2003

Article is provided courtesy of Sams.


In Part 3 of his series on finding the best data container, Jimmy Nilsson sets his sights on two more test subjects: the wrapped DataSet and the Hashtable.

Now we have reached Part 3 in this series of articles. So far I have discussed the DataReader, the untyped DataSet, and the typed DataSet. In this article, I focus on wrapping a container datatype and directly using a generic container datatype. In the wrapper discussion, I will wrap a DataSet in my examples; to illustrate the direct use of a generic container datatype, I will use a Hashtable.

Let's get started with the first topic, a wrapped container datatype. (But please start by reading Part 1 and Part 2, if you haven't already.)

Reuse by Inheritance or Containment

In Part 2 of this series, I discussed untyped and typed DataSets. The typed DataSet "is-an" untyped DataSet. The typed DataSet is derived from (or inherits from) the untyped DataSet. As a result, the typed DataSet gets all its functionality from the untyped DataSet and only has to add the extra functionality. Another good thing is that because of the inheritance relationship, you can treat and work with a typed DataSet as if it were an untyped DataSet. That can be convenient, but it also makes encapsulation too weak. If you remember the old days before .NET, Microsoft always told us that reuse by containment was to be preferred over reuse by inheritance. COM didn't support inheritance, but that wasn't the only reason for that recommendation.

I'm not trying to say that inheritance is bad. But containment is preferred in some situations. To make a long story short, I'm not fond of the fact that typed DataSets inherit from untyped DataSets. By using containment instead, it's possible to increase encapsulation a lot by providing just a minimal interface—black box reuse instead of white box reuse.

NOTE

According to Robert C. Martin's great book "Agile Software Development: Principles, Patterns, and Practices" (Prentice Hall, 2002), "is-a" is too broad for explaining subtypes in an inheritance relationship. "Substitutable" is better. Regardless, my main objection regarding typed DataSets still stands.

Wrapping a DataSet

Imagine that you want to provide a certain interface for your data container, but you want to do it with as little up-front work as possible. Wrapping a generic container datatype is one neat solution to that problem. The datatype I have chosen to use in the following discussion is an untyped DataSet. Thanks to that, I get all the functionality of the untyped DataSet for free, while still maintaining a very small interface so that I can change the wrapped datatype when I want to, without affecting all the consumer code.

NOTE

Other possible options are to wrap, for example, an XML DOM, an Array, an ArrayList, and so on.

In Figure 1, you can see a sample Order class that wraps an untyped DataSet.

Figure 1: Class model of a wrapped DataSet.


In the Order class, there are properties for getting the ID of the Order, the CustomerId, and the OrderDate. To get to the data of a specific order line, you need to specify an index as a parameter to the property that you want to inspect.

NOTE

The design of this Order class is simplistic, just for getting the job done and so that I have something for my tests. If you want to go this route, you should, of course, spend some more time designing the class.

Then it's very easy to add custom functionality that will live together with the data. I have added only a Validate() method in Figure 1, but when you start adding custom behavior this solution will take off. (That also goes for custom behavior in the property methods.)
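To make the idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not the class used in the tests. The member names follow Figure 1 and Listing 3, and the table and column names follow Listing 1, but the "Id" column name, the Boolean return type of Validate(), and the validation rule itself are assumptions made for illustration.

```vbnet
Imports System
Imports System.Data

' Hypothetical sketch of a wrapper around an untyped DataSet.
' Consumers see only the small interface, never the DataSet itself.
Public Class OrderWrap
    Private ReadOnly _dataSet As DataSet

    Public Sub New(ByVal filledDataSet As DataSet)
        _dataSet = filledDataSet
    End Sub

    ' The single master row; assumes the Orders table holds exactly one order.
    Private ReadOnly Property OrderRow() As DataRow
        Get
            Return _dataSet.Tables("Orders").Rows(0)
        End Get
    End Property

    Public ReadOnly Property Id() As Integer
        Get
            ' "Id" is an assumed column name.
            Return CInt(OrderRow.Item("Id"))
        End Get
    End Property

    Public ReadOnly Property CustomerId() As Integer
        Get
            Return CInt(OrderRow.Item("CustomerId"))
        End Get
    End Property

    Public ReadOnly Property OrderDate() As Date
        Get
            Return CDate(OrderRow.Item("OrderDate"))
        End Get
    End Property

    Public ReadOnly Property NoOfLines() As Integer
        Get
            Return _dataSet.Tables("OrderLines").Rows.Count
        End Get
    End Property

    ' Order line data is reached by index, as described above.
    Public ReadOnly Property ProductId(ByVal index As Integer) As Integer
        Get
            Return CInt(_dataSet.Tables("OrderLines").Rows(index).Item("ProductId"))
        End Get
    End Property

    ' Custom behavior living together with the data.
    ' The rule is an assumption: an order must have at least one line.
    Public Function Validate() As Boolean
        Return NoOfLines > 0
    End Function
End Class
```

Because the consumer code touches only these properties, the wrapped datatype could later be swapped for something else without changing the callers.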

Direct Use of a Generic Container Datatype

As you probably have noticed, the .NET Framework has a rich set of capable container datatypes, including ArrayList, Hashtable, and XML DOM. One option for a data container is to directly use one of those datatypes. In the old days before .NET, this was a pretty common solution because it was so hard to write components in COM that were marshaled by value. Therefore, we most often used disconnected ADO Recordsets or arrays for our data containers.

Using an array is still a possible solution. It's very lightweight and efficient, but it lacks expressiveness and the code isn't very clear—trade-offs all the time.

For the tests of directly using a container datatype, I have chosen to use a Hashtable. You can find a simplified UML diagram in Figure 2, describing the Hashtable.

Figure 2: Class model of a Hashtable.

The basic idea of a Hashtable is to store values by keys, making it very efficient to retrieve the values again as long as you know the keys. You use Add() for storing and Item() for retrieving.
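As a small illustration of storing and retrieving by key, consider the sketch below. The string keys and the values are made up for the example (the listings later in this article use enumeration values as keys instead).

```vbnet
Imports System
Imports System.Collections

Module HashtableBasics
    Sub Main()
        Dim anOrder As New Hashtable()

        ' Store values under keys; keys and values here are arbitrary examples.
        anOrder.Add("CustomerId", 42)
        anOrder.Add("OrderDate", New Date(2003, 4, 4))

        ' Item returns Object, so the caller has to cast back to the stored type.
        Dim customerId As Integer = _
            DirectCast(anOrder.Item("CustomerId"), Integer)

        Console.WriteLine(customerId)

        ' ContainsKey tells whether a key has been stored at all.
        Console.WriteLine(anOrder.ContainsKey("Comment"))
    End Sub
End Module
```

Note that nothing in the Hashtable itself records which type was stored under which key; the consumer simply has to know.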

A problem with this solution arises when you need to store more than just one collection of rows. For example, I needed to store both the main information about an order and a collection with all the order lines of that specific order. The trick (or hack) I decided to use was to add a factor to the keys for all the order lines. (More about this when we get to the code.)

Some Pros and Cons with the Wrap Solution

The advantages and disadvantages depend on what datatype you wrap. Because I'm wrapping a DataSet here, the pros and cons of the DataSet itself apply to the wrapper too. But there are more. These are the pros:

And here are the cons:

Likewise, when directly using a generic container datatype, the pros and cons are highly dependent on what datatype you choose. Here are the pros:

And here are the cons:

Wrapped DataSet Code Examples

As usual, we are going to take a look at some code samples, both from the server side and from the client side. First we'll look at some from the server side. In Listing 1, you find some code for when data is fetched from the database with the help of a stored procedure.

Listing 1: Code for Filling a DataSet

    ' Set up the stored procedure call that fetches one order and its lines.
    Dim aCommand As New SqlCommand( _
      SprocOrder_FetchWithLines, GetClosedConnection)
    aCommand.CommandType = CommandType.StoredProcedure
    aCommand.Parameters.Add("@id", SqlDbType.Int).Value = id

    Dim anAdapter As New SqlDataAdapter(aCommand)

    ' Map the default result-set names to meaningful DataTable names.
    anAdapter.TableMappings.Add("Table", "Orders")
    anAdapter.TableMappings.Add("Table1", "OrderLines")

    ' Map database column names to the names used in the DataSet.
    anAdapter.TableMappings(OrderTables.Orders).ColumnMappings.Add( _
      "Customer_Id", "CustomerId")
    anAdapter.TableMappings(OrderTables.OrderLines).ColumnMappings.Add( _
      "Orders_Id", "OrderId")
    anAdapter.TableMappings(OrderTables.OrderLines).ColumnMappings.Add( _
      "Product_Id", "ProductId")

    anAdapter.Fill(dataSet)

NOTE

It's less important what names you use for the DataTables and Columns when you wrap the DataSet and thereby hide the names. I changed the names in Listing 1 because I was "lazy" and reused the same routine for filling a DataSet as I did for untyped and typed DataSets from Part 2.

The filled DataSet is then sent to the constructor of the wrapper class, as you can see in Listing 2.

Listing 2: Sending the Filled DataSet to the Constructor of the Wrapper Class

    Dim anOrder As New OrderWrap(anOrderDataSet)

NOTE

I know, using a DataSet in the constructor violates encapsulation. This is a simple solution for my tests, but in a real app you should definitely guard encapsulation better.

And now some code from the client side. To browse the information in the wrapped class, the code could look like it does in Listing 3. Note that here I'm browsing both an order and all its order lines, as usual. (Hopefully you find it useful to see the "same" code for all the different data container options so that you can compare the code side by side.)

Listing 3: Code for Browsing a Wrapped DataSet

    Dim anOrder As OrderWrap = _
      _service.FetchOrderAndLines(_GetRandomId())

    _id = anOrder.Id
    _customerId = anOrder.CustomerId
    _orderDate = anOrder.OrderDate

    Dim i As Integer
    For i = 0 To anOrder.NoOfLines - 1
      _productId = anOrder.ProductId(i)
      _priceForEach = anOrder.PriceForEach(i)
      _noOfItems = anOrder.NoOfItems(i)
      _comment = anOrder.Comment(i)
    Next

The code in Listing 3 is short and compact, and I think it's pretty clear. As you saw, my solution was to have a property that returns the number of order lines (NoOfLines). That is used for finding the upper bound of an ordinary indexed For loop.

Another, more appealing solution would be to hand out an IEnumerator for the order lines, but here I'm only concerned with creating something that is good enough to test.

Now let's turn to the server-side code for the Hashtable.

Hashtable Code Examples

Listing 4 is some code for filling a Hashtable with one order and its order lines. In this case, the method receives a DataReader and uses that for filling the Hashtable.

Listing 4: Code for Filling a Hashtable

    Dim aDataReader As SqlDataReader = _
  DbHelper.FetchOrderAndLines(id)

    Dim anOrder As New Hashtable()
    aDataReader.Read()
    anOrder.Add(OrderColumns.Id, _
  aDataReader.GetInt32(OrderColumns.Id))
    anOrder.Add(OrderColumns.CustomerId, _
  aDataReader.GetInt32(OrderColumns.CustomerId))
    anOrder.Add(OrderColumns.OrderDate, _
  aDataReader.GetDateTime(OrderColumns.OrderDate))

    aDataReader.NextResult()
    Dim i As Integer
    While aDataReader.Read
      anOrder.Add(OrderLineColumns.ProductId * _
 Hack.Factor + i, aDataReader.GetInt32 _
 (OrderLineColumns.ProductId))
      anOrder.Add(OrderLineColumns.PriceForEach * _
 Hack.Factor + i, aDataReader.GetSqlMoney _
 (OrderLineColumns.PriceForEach).ToDecimal)
      anOrder.Add(OrderLineColumns.NoOfItems * _
 Hack.Factor + i, aDataReader.GetInt32 _
 (OrderLineColumns.NoOfItems))
      anOrder.Add(OrderLineColumns.Comment * _
 Hack.Factor + i, aDataReader.GetString _
 (OrderLineColumns.Comment))
      i += 1
    End While

    aDataReader.Close()
    Return anOrder

As I said earlier in this article, I need a little hack for storing both the main information about an order and the order lines in the same Hashtable. To solve that, I multiplied the column identifier by a factor (Hack.Factor) and added the line index to build the keys for the order lines. Of course, choosing that value is problematic. In the code in Listing 4, I chose 1,000,000. (You can't see that in the code; that value is in a public enumeration.) If I chose to store several orders in the same Hashtable, that would mean that I couldn't have more than 1,000,000 orders. You probably don't want to have that many orders in memory anyway, but again, this is a hack.
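The key arithmetic from Listing 4 can be isolated in a small sketch. The enumeration values below are hypothetical; the article only states that the factor is 1,000,000 and that it lives in a public enumeration. LineKey is a helper name invented for this example.

```vbnet
Imports System

Module KeyFactorSketch
    ' The factor from the article, declared in a public enumeration.
    Public Enum Hack
        Factor = 1000000
    End Enum

    ' Hypothetical column identifiers, for illustration only.
    Public Enum OrderLineColumns
        ProductId = 1
        PriceForEach = 2
        NoOfItems = 3
        Comment = 4
    End Enum

    ' Builds the Hashtable key for one column of one order line:
    ' column * factor + line index, as in Listing 4.
    Function LineKey(ByVal column As OrderLineColumns, _
                     ByVal lineIndex As Integer) As Integer
        Return CInt(column) * CInt(Hack.Factor) + lineIndex
    End Function

    Sub Main()
        Console.WriteLine(LineKey(OrderLineColumns.ProductId, 0))  ' 1000000
        Console.WriteLine(LineKey(OrderLineColumns.Comment, 2))    ' 4000002
    End Sub
End Module
```

The keys stay distinct only as long as the line index is smaller than the factor, and only if no other key in the same Hashtable happens to fall in the same range. That is exactly why it feels like a hack.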

NOTE

My friend Joe Cleland asked why I didn't use a separate Hashtable for each order's details, added to the first Hashtable. That's a good question and one that made me blush. I guess I could have said that I wanted to have just one Hashtable instead of two for each order, but I can't lie to my readers, can I?

Joe's suggestion is, of course, a much better one than my fast hack. I will add it to my tests for Part 5 in this series.
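One possible reading of Joe's suggestion is sketched below: the order lines go into a second Hashtable, keyed by line index, and that Hashtable is stored in the order's Hashtable under a single key. This is my interpretation for illustration, not the code that will be tested in Part 5, and the string keys and sample values are made up.

```vbnet
Imports System
Imports System.Collections

Module NestedHashtableSketch
    Sub Main()
        ' Main order information, stored as before.
        Dim anOrder As New Hashtable()
        anOrder.Add("Id", 1)
        anOrder.Add("CustomerId", 42)

        ' A separate Hashtable for the order lines, keyed by line index.
        ' Each line is itself a small Hashtable of column values.
        Dim theLines As New Hashtable()
        Dim i As Integer
        For i = 0 To 1
            Dim aLine As New Hashtable()
            aLine.Add("ProductId", 100 + i)
            aLine.Add("NoOfItems", 1 + i)
            theLines.Add(i, aLine)
        Next
        anOrder.Add("OrderLines", theLines)

        ' Browsing: no key arithmetic is needed anymore.
        Dim fetched As Hashtable = _
            DirectCast(anOrder.Item("OrderLines"), Hashtable)
        For i = 0 To fetched.Count - 1
            Dim aLine As Hashtable = DirectCast(fetched.Item(i), Hashtable)
            Console.WriteLine(DirectCast(aLine.Item("ProductId"), Integer))
        Next
    End Sub
End Module
```

The price is more Hashtable instances per order, which may affect the size of the serialized stream; that would have to be measured.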

Finally, Listing 5 gives the code for when the client is inspecting the order information and navigating through all the order lines.

Listing 5: Code for Browsing the Hashtable

    Dim anOrder As Hashtable = _
  _service.FetchOrderAndLines(_GetRandomId())

    _id = DirectCast _
  (anOrder.Item(OrderColumns.Id), Integer)
    _customerId = DirectCast _
    (anOrder.Item(OrderColumns.CustomerId), Integer)
    _orderDate = DirectCast _
  (anOrder.Item(OrderColumns.OrderDate), Date)

    Dim i As Integer
    While anOrder.ContainsKey _
    (OrderLineColumns.ProductId * Hack.Factor + i)
      _productId = DirectCast(anOrder.Item _
 (OrderLineColumns.ProductId * _
 Hack.Factor + i), Integer)
      _priceForEach = DirectCast(anOrder.Item _
 (OrderLineColumns.PriceForEach * _
 Hack.Factor + i), Decimal)
      _noOfItems = DirectCast(anOrder.Item _
 (OrderLineColumns.NoOfItems * _
 Hack.Factor + i), Integer)
      _comment = DirectCast(anOrder.Item _
 (OrderLineColumns.Comment * _
 Hack.Factor + i), String)
      i += 1
    End While

The code in Listing 5 is much messier than the "same" code for the wrapped DataSet. This is the price you pay when you choose to go with something generic instead of something specific. The way I see it, in the generic case, you pay with more client-side code (and also with quite a lot of server-side code). In my opinion, having much client-side code is a very high price to pay. On the other hand, you will soon see that the Hashtable has good performance. Trade-offs, trade-offs....

Tests of the Week

Now for what many of you surely consider the highlight of each article: the test results. As usual, I have a service-layer class (business facade layer) for each test. You can see the service-layer classes in Figure 3.

Figure 3: The two examples of the service-layer class.

The service-layer classes inherit, as usual, from MarshalByRefObject. This is because they should be suitable as root classes when used via Remoting.

NOTE

Note that the second method in the class for the wrapped DataSet returns OrderWrap2. That wrapped DataSet class has only an OrderLines DataTable. That's one solution to the problem when fetching only OrderLines from the database.

Result of the Tests

I've added the tests of the data container options to the results tables.

Again, I'm using the DataReader as a baseline. Therefore, I have recalculated all the values so that I get a value of 1 for the DataReader. The rest of the data containers have a value that is relative to the DataReader value. That makes it very easy to compare; the higher the value, the better.

Table 1: Results for the First Test Case: Reading One Row

                  1 User,        5 Users,       1 User,          5 Users,
                  in AppDomain   in AppDomain   Cross-Machines   Cross-Machines
DataReader        1              1              1                1
Untyped DataSet   0.6            0.6            1.4              1.7
Typed DataSet     0.4            0.5            1                1.1
Wrapped DataSet   0.5            0.6            1.3              1.7
Hashtable         0.9            1              3.5              3.9


Table 2: Results for the Second Test Case: Reading Many Rows

                  1 User,        5 Users,       1 User,          5 Users,
                  in AppDomain   in AppDomain   Cross-Machines   Cross-Machines
DataReader        1              1              1                1
Untyped DataSet   0.5            0.6            6.9              9.7
Typed DataSet     0.5            0.5            6                8.6
Wrapped DataSet   0.5            0.6            6.6              9.6
Hashtable         0.8            0.9            17               23.5


Table 3: Results for the Third Test Case: Reading One Master Row and Many Detail Rows

                  1 User,        5 Users,       1 User,          5 Users,
                  in AppDomain   in AppDomain   Cross-Machines   Cross-Machines
DataReader        1              1              1                1
Untyped DataSet   0.5            0.5            6.1              8.5
Typed DataSet     0.4            0.4            5.1              6.9
Wrapped DataSet   0.5            0.5            5.8              8
Hashtable         0.8            0.9            16.2             19.6


As you can see in these tables, the wrapped DataSet is very close to the untyped DataSet in performance. The wrapper I have created is very thin, so that similarity in performance is to be expected.

What might be more surprising is that the Hashtable is so close to the DataReader in the tests within the AppDomain; there's very little overhead. Even more surprising might be to see how efficient the Hashtable is across machines; it's compact and it's lightweight. But again, performance isn't everything. Let's compare a couple of other factors as well.

Highly Subjective Results

My "highly subjective results" table needs to be expanded with the two options we have tested in this article. In Table 4, I have assigned some grades according to the qualities discussed in the first part of this series. A score of 5 is excellent, and a score of 1 is poor.

Table 4: Grades According to Qualities

                  Performance       Scalability
                  (in AppDomain/    (in AppDomain/
                  Cross-Machines)   Cross-Machines)   Productivity   Maintainability   Interoperability
DataReader        5/1               4/1               2              1                 1
DataSet           3/3               3/3               4              3                 4
Typed DataSet     2/2               2/2               5              4                 5
Wrapped DataSet   3/3               3/3               3              5                 4
Hashtable         5/5               5/5               2              2                 2


As usual, I'd like to say a few words about each quality grade.

Performance

I've already talked about the performance behavior of the two options in this article. The wrapped DataSet is very similar to the untyped DataSet. The Hashtable, on the other hand, is much more efficient and lightweight. It's almost twice as efficient as the wrapped DataSet in an AppDomain and almost three times as efficient across machines. Again, remember that these tests are end to end. If you isolate, for example, the serialization/deserialization, the difference is even bigger.

NOTE

As you might remember from Part 2, I discussed a workaround for bad performance because of how DataSets are serialized. Using a wrapped DataSet is a great way of dealing with the problem; I'll get back to this in Part 5 of this series. I will also show results for wrapping a Hashtable in that article. That's a very good way to avoid the hack feeling with a Hashtable and also the serialization issue with the DataSet.

Scalability

According to the definition of the scalability quality that I established in Part 1, scalability goes pretty much hand in hand with performance here. That's the reason for the same grades.

Productivity

When you wrap a DataSet, you get a lot of functionality for free, but you do have to create (and support) the interface on your own. A tool will help tremendously here. Without it, your productivity will suffer. That's the reason for the score of only 3. Pretty much the same is true for the Hashtable, but you get a little less for free and there is more code to write.

Maintainability

If you succeed in creating a good small interface for the wrapped DataSet, then maintainability will be very good. In the case of the Hashtable, the loose typing doesn't help in this aspect. It's not just the columns that are loosely coupled; it's how to reach related rows, too.

Interoperability

If the wrapping solution wraps a DataSet, you can easily use the functionality of the DataSet for interoperability. (But that might hurt encapsulation.) The Hashtable isn't very good at interoperability, and you have no control over it. The only thing that keeps it from getting a score of 1 here is that you can probably expect the receiver to have a Hashtable implementation to which the receiver can rehydrate the stream.

NOTE

As a matter of fact, I think the Hashtable was very close to a lower grade for both maintainability and interoperability.

If you just add the grades (and choose AppDomain or cross-machines), you will find that the wrapped DataSet is the best so far (not by far, but still best). On the other hand, I again want to stress that you should apply your own weights according to which qualities you find most important.

Conclusion

In Part 4 of this series, I'll focus on my current favorite option: custom classes and collections. Take it from me, you can't wait to hear that discussion.
