Choosing Data Containers for .NET: Part 4

Date: May 2, 2003

Article is provided courtesy of Sams.

In his continuing quest to find the best data container, Jimmy Nilsson puts custom classes to the test.

In the previous parts of this article series, I discussed the DataReader, the untyped DataSet, the typed DataSet, the wrapped DataSet, and the Hashtable. Now we come to my favorite part: in this fourth installment, I discuss using custom classes.

As usual, I'd like to recommend that you read the previous parts in this series, if you haven't already, before proceeding.

Not Only Read-Only

To start, let's clarify one thing. When I recently spoke at a conference about data containers, I was asked afterward whether using custom classes was just for read-only data. This misunderstanding might have arisen because I'm discussing performance tests for fetching collections of data, and that is all I measure. Of course, the whole data container discussion is about updateable data, too.

With that out of the way, let's focus on our current topic.

A Somewhat Unusual Choice in the .NET Platform

As you probably remember, I started this series of articles by saying that although Microsoft often pushes DataSets for data containers and says that custom classes often give bad performance, I like custom classes a lot. I used to think the same way Microsoft did—for example, I followed that philosophy when I wrote my book .NET Enterprise Design with Visual Basic .NET and SQL Server 2000 (Sams, 2001), but I changed my mind when I started to investigate using a classic object-oriented domain model.

I previously believed in not using custom classes for the domain model, mostly because of a lot of practical problems with it in the old world of COM and COM+. For example, it was pretty hard to write marshal-by-value (MBV) components in COM, and it was impossible in VB6; we had to go for another language, typically C++. It was also pretty expensive to instantiate objects. In the case of COM+, configuring domain classes made the context overhead very expensive because it's common to work with a couple of hundred domain objects in a single request. With that knowledge, my first choice was to continue with a data-centric approach in .NET and to use DataSets.

After a while, I wanted to try a more classic object-oriented approach in .NET. My idea was that I could stand a small overhead with an object-oriented domain model approach because I saw other advantages with it, such as chances for higher maintainability. Therefore, I executed some quick and dirty tests to find out about the amount of overhead. To my surprise, my initial tests showed a lower overhead for my object-oriented domain model approach.

After thinking some more about it and investigating it a bit further, I began to see that it's quite natural for an object-oriented domain model to be capable of good performance. After all, DataSets use many DataRow objects, too.

Regarding the three examples of problems with an object-oriented domain model in COM, those problems are actually nonproblems in .NET. First, if you want to write MBV components in .NET, the answer is simple (at least for simple cases): just use the <Serializable()> attribute on the class. There is a lot more to serialization for advanced situations, but quite often that attribute is enough.
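
As a minimal sketch (the Customer class here is just a made-up illustration, not part of the test code), marking the class with the attribute is often all that is needed to get marshal-by-value behavior:

' The plain serialization that comes for free is enough for simple cases.
<Serializable()> _
Public Class Customer
    Public Id As Integer
    Public Name As String
End Class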

Regarding the overhead for instantiation, which is very important if you need to create many objects, it's much smaller in .NET than for COM and VB6. Try it yourself by executing the code in Listings 1 and 2. The code in Listing 1 is for VB6, and the code in Listing 2 is for .NET. According to my quick and dirty tests, the .NET code is executed approximately 100 times faster. Normally (and hopefully), you do a lot more in an application than instantiating objects, but a difference this big will be noticeable in many applications.

Listing 1: VB6 Code for Instantiating 10 Million Dummy Objects

Dim i As Long
Dim theTest As Test
  
For i = 0 To 10000000
  Set theTest = New Test
  theTest.DoStuff
Next

Listing 2: VB.NET Code for Instantiating 10 Million Dummy Objects

Dim i As Integer = 0 
Dim theTest As Test

For i = 0 To 10000000
  theTest = New Test()
  theTest.DoStuff()
Next

GC.Collect()
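
The Test class used in Listings 1 and 2 isn't shown. A trivial VB.NET version along these lines is all that is needed (the VB6 version would be a class module with a corresponding DoStuff method); the _counter member is just my guess at a suitable dummy implementation:

Public Class Test
    Private _counter As Integer

    Public Sub DoStuff()
        ' Do a token amount of work so that the call isn't optimized away.
        _counter += 1
    End Sub
End Class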

Finally, the context overhead from configuring domain classes in COM+ is, in my opinion, more a misuse of COM+ than a problem with the technology itself. If you deal with the configuration aspect at the service-layer level (or business facade-layer level, or application-layer level, or whatever you call it), you will find that what goes on "behind" that layer incurs much less instantiation and call overhead than configuring the domain model classes does.
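
As a rough VB.NET sketch of the idea (the OrderFacade class and its SaveOrder method are made-up names for illustration, and the snippet assumes a reference to System.EnterpriseServices), only the facade is a configured COM+ component, while the domain objects behind it stay plain:

' Only this facade class is registered with COM+ and gets a context;
' the domain objects it works with behind the scenes are plain classes.
<Transaction(TransactionOption.Required)> _
Public Class OrderFacade
    Inherits ServicedComponent

    Public Sub SaveOrder(ByVal anOrder As Order)
        ' Hundreds of plain domain objects can be created and touched here
        ' without paying COM+ context overhead per object.
    End Sub
End Class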

NOTE

You will find much coverage of this subject in my book .NET Enterprise Design with Visual Basic .NET and SQL Server 2000 (Sams, 2001).

It has now been almost a year since my initial tests with a classic object-oriented domain model in .NET, but better late than never to write about the design implications for choosing data containers in .NET, right?

Data Transfer Objects or Smart Objects?

One major aspect that I always end up thinking about when I consider the custom classes approach is whether it's okay to send a subset of the domain model to rich clients; that is, to send smart objects with business rules. Another approach is to send data transfer objects (DTOs), which are filled at the application server by copying data from the domain model and are then sent over the wire.

There are problems with both approaches (as always, of course). If we send domain model objects to the client, the client is tightly coupled to the domain model. On the other hand, we avoid the overhead of copying data between objects, and we lessen the development overhead by having less code to write and maintain. Furthermore, we keep the advantage of having business rules checked at the client, so the application server receives fewer unsavable objects.
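
To make the distinction concrete, here is a hypothetical sketch of the DTO approach for the Order example (the OrderDto class and the ToOrderDto function are mine, not part of the test code):

' A DTO carries state only; the business rules stay in the domain model.
<Serializable()> _
Public Class OrderDto
    Public Id As Integer
    Public CustomerId As Integer
    Public OrderDate As DateTime
End Class

' At the application server, state is copied from the smart object into
' the DTO, and the DTO is then sent over the wire instead of the Order.
Public Function ToOrderDto(ByVal anOrder As Order) As OrderDto
    Dim aDto As New OrderDto()
    aDto.Id = anOrder.Id
    aDto.CustomerId = anOrder.CustomerId
    aDto.OrderDate = anOrder.OrderDate
    Return aDto
End Function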

Versioning Issue

Another problem with letting the client talk directly to the domain model is, of course, that the client must use the correct version of the domain model so that the correct rules are checked. .NET has a lot of functionality for dealing with versioning, such as automatic download. And even if the wrong business rules are checked at the client without a version conflict being detected, you should check at the server side as well so that you don't accept any incorrect data. Most often, you shouldn't trust the client, because you don't know what has been done to your data.

Of course, this doesn't solve the problem that an incorrect version of the domain model at the client might be too restrictive . . . .

Whether to use DTOs is actually a huge topic on its own; I just wanted to touch on it briefly here. Anyway, no matter how important this design decision is, it doesn't really affect today's discussion, because I'm mostly discussing data (not behavior) in this article. Even if you prefer not to send smart objects to your rich clients, the performance implications are just as important for your data transfer objects.

NOTE

Early on in this article, I said that data containers are about updating data, too. I want to strongly emphasize that the approach of using custom classes is actually even more attractive when it comes to writing data, because the objects that travel are smart and know which business rules to check.

A Simplistic Custom Classes Example

So, how can we write an implementation of the custom classes to get a solution that is somewhat comparable to the examples shown so far? Of course, there are numerous solutions to choose from. I'll show you a very simplistic solution here. In Figure 1, you can see a UML diagram of this solution.

Figure 1: UML diagram for the custom classes example.

What is simplistic about this solution, you might wonder? Well, many things. For example:

The focus of this article series is to give you an idea of how different data container solutions behave compared to each other, not to show you production-ready frameworks. You will also find that even though I am showing you a simplistic solution, there is plenty of room to add what you need and still get better performance than that for DataSets.

To create the type-safe collection class for holding OrderLine objects (that class is called OrderLines), I inherited from the CollectionBase class. That is a simple solution that is okay for this specific example, but it has its own pros and cons, as usual. I will get back to this subject in Part 5, when I discuss several other options.

Furthermore, each of the classes (Order, OrderLine, and OrderLines) has the <Serializable()> attribute, and no custom serialization is used—just the plain serialization that comes for free.
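
To give you a feel for the shape of these classes, here is a stripped-down sketch. The member names follow the usage in Listings 3 and 4, but the details (public fields rather than properties, no business rules shown) are simplifications of mine rather than the actual test code:

<Serializable()> _
Public Class Order
    Public Id As Integer
    Public CustomerId As Integer
    Public OrderDate As DateTime
    Public OrderLines As New OrderLines()
End Class

<Serializable()> _
Public Class OrderLine
    Public ProductId As Integer
    Public PriceForEach As Decimal
    Public NoOfItems As Integer
    Public Comment As String
End Class

' A type-safe collection of OrderLine objects, built on CollectionBase.
<Serializable()> _
Public Class OrderLines
    Inherits CollectionBase

    Public Sub Add(ByVal anOrderLine As OrderLine)
        List.Add(anOrderLine)
    End Sub

    Default Public ReadOnly Property Item(ByVal index As Integer) As OrderLine
        Get
            Return DirectCast(List(index), OrderLine)
        End Get
    End Property
End Class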

NOTE

To follow Microsoft's naming conventions, OrderLines technically should be called OrderLineCollection.

Also note that, unlike previous options that I discussed (except for the wrapped DataSet), this time there is custom behavior within the classes that also hold the data.

NOTE

Maybe you think that I'm retro and old-fashioned because I'm talking so much about object orientation. I've heard that OO is from the 1980s and that the trend nowadays is that we should be message- or document-oriented. (By this, I mean the emphasis on passing state around, typically in a portable format between heterogeneous systems, with data sent in a coarse-grained format; that is, as complete "documents.")

I can agree with message and document orientation. My point is that, under the surface of those applications, object orientation is still going very strong. Period.

Some Pros and Cons with the Custom Classes Solution

As you might guess, I have gathered a lot of advantages of using custom classes. But so as not to sound like a salesman, I have also found some drawbacks.

Some advantages are listed here:

Some disadvantages are as follows:

NOTE

The disadvantage of the extra work is actually a big one. For starters, it's very complex to create a solid persistence framework for a domain model. And when you are done with that, you might find that your productivity suffers because you have to write a lot of code by hand; therefore, you also need to write tools.

The problems regarding complexity and productivity are well known—see, for example, Martin Fowler's book Patterns of Enterprise Application Architecture (Addison Wesley Professional, 2002). The common recommendation is to buy a product.

Custom Classes Code Examples

As usual, let's take a look at some code samples, both from the server side and from the client side. First, here's some from the server side. In Listing 3, you find code that uses a DataReader to fill the custom classes with data.

Listing 3: Code for Filling the Custom Classes

Dim aDataReader As SqlDataReader = DbHelper.FetchOrderAndLines(id)

Dim anOrder As New Order()
aDataReader.Read()
anOrder.Id = aDataReader.GetInt32(OrderColumns.Id)
anOrder.CustomerId = aDataReader.GetInt32(OrderColumns.CustomerId)
anOrder.OrderDate = aDataReader.GetDateTime(OrderColumns.OrderDate)

aDataReader.NextResult()
While aDataReader.Read
  Dim anOrderLine As New OrderLine()
  anOrderLine.ProductId = _
    aDataReader.GetInt32(OrderLineColumns.ProductId)
  anOrderLine.PriceForEach = _
    aDataReader.GetDecimal(OrderLineColumns.PriceForEach)
  anOrderLine.NoOfItems = _
    aDataReader.GetInt32(OrderLineColumns.NoOfItems)
  anOrderLine.Comment = _
    aDataReader.GetString(OrderLineColumns.Comment)
  anOrder.OrderLines.Add(anOrderLine)
End While

aDataReader.Close()
Return anOrder

NOTE

The code for DbHelper.FetchOrderAndLines() was actually shown in the first article in this series. See Listing 1 in that article.

If you have worked with the DataReader at all, I think you'll find the code in Listing 3 quite simple. Worth mentioning is that, in this case, I know that there will only be one row (one order) in the first resultset, so I have no loop. Then I have a loop for the second resultset so that I can grab all the order lines. As you see, I use enums to address the correct columns in the DataReader.
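
The column enums themselves aren't shown in this part. They would look something like this; note that the ordinal values below are my assumptions based on the column order implied by Listing 3, so check them against the actual SELECT statements from the first article:

' Maps column names to ordinal positions in the resultsets, so that
' GetInt32() and friends can be called without magic numbers.
Public Enum OrderColumns
    Id = 0
    CustomerId = 1
    OrderDate = 2
End Enum

Public Enum OrderLineColumns
    ProductId = 0
    PriceForEach = 1
    NoOfItems = 2
    Comment = 3
End Enum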

NOTE

Now for a note that is a bit off-topic but still interesting: I always use Option Strict in my VB.NET projects. Therefore, it came as a surprise to me to find that the C# equivalent of the VB.NET code shown in Listing 3 won't compile directly. In C#, the enum values must be cast to int when used in GetInt32(), for example; that is not needed in VB.NET.

Now, as usual, for some code from the client side. To browse the information in the custom classes, the code could look like it does in Listing 4.

Listing 4: Code for Browsing the Custom Classes

Dim anOrder As Order = _service.FetchOrderAndLines(_GetRandomId())

_id = anOrder.Id
_customerId = anOrder.CustomerId
_orderDate = anOrder.OrderDate

Dim anOrderLine As OrderLine
For Each anOrderLine In anOrder.OrderLines
  _productId = anOrderLine.ProductId
  _priceForEach = anOrderLine.PriceForEach
  _noOfItems = anOrderLine.NoOfItems
  _comment = anOrderLine.Comment
Next

In Listing 4, you find the smallest amount of client-side code shown in any of these articles, and it's the clearest, too. I've said it before: I think it's a good deal to buy a simple API for the client programmer and pay for it with some more server-side code.

Test of the Week

It's time to take a look at the test results, but let's take a quick look at the remoting end point first. As usual, I have a service-layer (business facade–layer) class for the test (and it's used in the test within a single AppDomain, too). In Figure 2, you can see the service-layer class.

Figure 2: The service-layer class of the week.

As you saw in Figure 2, the interface of the service-layer class is more granular than in the earlier approaches. Here, an Order is returned if that is what is needed, just the OrderLines if that is all that is needed, and so on.
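
Since Figure 2 isn't reproduced here, the following sketch shows roughly what that end point looks like. The class name OrderService, the MarshalByRefObject base class, and the FetchOrderLines method are my assumptions; only FetchOrderAndLines() is confirmed by the client code in Listing 4:

' A sketch of the remoting end point: each method returns exactly the
' type the client asks for, which is what makes the interface granular.
Public Class OrderService
    Inherits MarshalByRefObject

    Public Function FetchOrderAndLines(ByVal id As Integer) As Order
        ' Placeholder; the real body is the server-side code in Listing 3.
        Return Nothing
    End Function

    Public Function FetchOrderLines(ByVal orderId As Integer) As OrderLines
        ' Placeholder; fetches only the order lines when the complete
        ' Order isn't needed.
        Return Nothing
    End Function
End Class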

Result of the Tests

Let's add the tests of the custom classes to the results tables.

Once again, I will use the DataReader as a baseline. Therefore, I have recalculated all the values so that I get a value of 1 for the DataReader and so that the rest of the data containers have a value that is relative to the DataReader value. That makes it easy to compare. The higher the value is, the better.

Table 1: Results for the First Test Case: Reading One Row

                    1 User,         5 Users,        1 User,          5 Users,
                    in AppDomain    in AppDomain    Cross-Machines   Cross-Machines
DataReader          1               1               1                1
Untyped DataSet     0.6             0.6             1.4              1.7
Typed DataSet       0.4             0.5             1                1.1
Wrapped DataSet     0.5             0.6             1.3              1.7
Hashtable           0.9             1               3.5              3.9
Custom Classes      1               1               4                4.2


Table 2: Results for the Second Test Case: Reading Many Rows

                    1 User,         5 Users,        1 User,          5 Users,
                    in AppDomain    in AppDomain    Cross-Machines   Cross-Machines
DataReader          1               1               1                1
Untyped DataSet     0.5             0.6             6.9              9.7
Typed DataSet       0.5             0.5             6                8.6
Wrapped DataSet     0.5             0.6             6.6              9.6
Hashtable           0.8             0.9             17               23.5
Custom Classes      1               1               15.9             22.1


Table 3: Results for the Third Test Case: Reading One Master Row and Many Detail Rows

                    1 User,         5 Users,        1 User,          5 Users,
                    in AppDomain    in AppDomain    Cross-Machines   Cross-Machines
DataReader          1               1               1                1
Untyped DataSet     0.5             0.5             6.1              8.5
Typed DataSet       0.4             0.4             5.1              6.9
Wrapped DataSet     0.5             0.5             5.8              8
Hashtable           0.8             0.9             16.2             19.6
Custom Classes      0.9             1               16               19.2


As you saw in the result tables, the custom classes are very close to the Hashtable in performance. For the tests within an AppDomain, the custom classes come out about equal to the DataReader, but that is mostly a rounding effect. The DataReader is actually a little faster, which is to be expected because there is less work to do in that case: there are no intermediate objects to instantiate and no extra movement of data.

Highly Subjective Results

My "highly subjective results" table needs to be expanded with the last option. In Table 4, you will find that I have assigned some grades according to the qualities discussed in the first part of this series. A score of 5 is excellent, and a score of 1 is poor.

Table 4: Grades According to Qualities

                    Performance in     Scalability in     Productivity   Maintainability   Interoperability
                    AppDomain/         AppDomain/
                    Cross-Machines     Cross-Machines
DataReader          5/1                4/1                2              1                 1
Untyped DataSet     3/3                3/3                4              3                 4
Typed DataSet       2/2                2/2                5              4                 5
Wrapped DataSet     3/3                3/3                3              5                 4
Hashtable           5/5                5/5                2              2                 2
Custom Classes      5/5                5/5                2              5                 3


As usual, I'd like to say a few words about each quality grade.

Performance

In the first article of this series, I told you about Microsoft's warnings that custom classes might give bad performance compared to DataSets. Because of that, today's test results might be very surprising: the performance of custom classes seems to be just fine.

NOTE

I'm pretty sure that what Microsoft meant is that because it's hard to create a solution based on custom classes, the performance might turn out to be the opposite of what I have shown above. I can understand that, but I also dislike it. You can get bad performance by misusing anything.

Am I too picky with Microsoft? Don't worry—I like their work with .NET very much. Even so, I think I should tease them a little bit.

Scalability

Once again, I think that the scalability goes pretty much hand in hand with performance here. (This is in accordance with the definition of the scalability quality in Part 1.)

Productivity

If you decide to roll your own solution with custom classes, your short-term productivity will suffer. There are solutions for that, but out of the box, I have to give a low grade for productivity because of this.

Maintainability

My main reason for starting to investigate a domain model once again is that I believe maintainability might gain a lot. In my opinion, the prime goal and effect of object orientation is increased maintainability. For example, it's a good way to deal with complexity.

NOTE

As you might have noticed, I wrote "once again" above. I have experimented with something like a domain model several times, but I've never been very happy with the result. For example, I mentioned some of the problems I had with a domain model in VB6 and COM at the beginning of this article.

Interoperability

Interoperability is okay because you automatically get XML serialization. On the other hand, it will take some extra work to get the exact XML structure that you want, and you have to take extra precautions to ensure that the XML you send and receive follows the schema you have agreed upon with the heterogeneous client.
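
For example, something along these lines gives you an XML representation of an Order (a minimal sketch: it assumes the classes expose public, XML-serializable members, and it uses the standard XmlSerializer and StringWriter types from System.Xml.Serialization and System.IO; anOrder is an Order instance such as the one returned in Listing 3):

' Serialize an Order to an XML string. The shape of the XML follows the
' class design unless you control it with attributes such as <XmlElement()>.
Dim aSerializer As New XmlSerializer(GetType(Order))
Dim aWriter As New StringWriter()
aSerializer.Serialize(aWriter, anOrder)
Dim theXml As String = aWriter.ToString()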

Before this round of tests, the wrapped DataSet was the best if you just added up the grades (picking either the in-AppDomain or the cross-machines grade). Now custom classes take over the lead.

Conclusion

In this article, the last of the promised options was discussed. In my opinion, using custom classes is often the best choice, but, of course, it's not a cure-all.

Even though I discussed only one option today, I mentioned several subjects that will be covered in Part 5, the last one in this series of articles. Next time we'll discuss several disparate subjects and wrap up the whole series.
