Choosing Data Containers for .NET: Part 4
Date: May 2, 2003
Article is provided courtesy of Sams.
In the previous parts of this article series, I discussed the DataReader, the untyped DataSet, the typed DataSet, the wrapped DataSet, and the Hashtable. Now we come to my favorite part: in this fourth part, I discuss using custom classes.
As usual, I'd like to recommend that you read the previous parts in this series, if you haven't already, before proceeding.
Not Only Read-Only
To start, let's clarify one thing. When I recently spoke at a conference about data containers, I was asked afterward whether using custom classes was just for read-only data. This misunderstanding might have arisen because I'm discussing performance tests for fetching collections of data, and that is all I measure. Of course, the whole data container discussion is about updateable data, too.
With that out of the way, let's focus on our current topic.
A Somewhat Unusual Choice in the .NET Platform
As you probably remember, I started this series of articles by saying that although Microsoft often pushes DataSets as data containers and says that custom classes often give bad performance, I like custom classes a lot. I used to think the same way Microsoft did. For example, I followed that philosophy when I wrote my book .NET Enterprise Design with Visual Basic .NET and SQL Server 2000 (Sams, 2001). But I changed my mind when I started to investigate using a classic object-oriented domain model.
I previously avoided custom classes for the domain model mostly because of the many practical problems with them in the old world of COM and COM+. For example, it was pretty hard to write Marshalling By Value (MBV) components in COM, and it was not doable at all with VB6. We had to go for another language, typically C++. It was also pretty expensive to instantiate objects. In the case of COM+, configuring domain classes made the context overhead very expensive because it's not atypical to work with a couple of hundred domain objects in one request. With that knowledge, my first choice was to continue with a data-centric approach in .NET and to use DataSets.
After a while, I wanted to try a more classic object-oriented approach in .NET. My idea was that I could stand a small overhead with an object-oriented domain model approach because I saw other advantages with it, such as chances for higher maintainability. Therefore, I executed some quick and dirty tests to find out about the amount of overhead. To my surprise, my initial tests showed a lower overhead for my object-oriented domain model approach.
After thinking some more about it and investigating it a bit more, I began thinking that it's quite natural for the object-oriented domain model to show the possibility of good performance. For example, DataSets are using many DataRow objects, too.
Regarding the three examples of problems with an object-oriented domain model in COM, those problems are actually nonproblems in .NET. First, if you consider writing MBV components in .NET, the answer is simple (at least for simple cases): Just use the <Serializable()> attribute for the class. There is a lot more to serialization for advanced situations, but quite often using that attribute is enough.
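As a minimal sketch of the simple case, the following marks a class with the <Serializable()> attribute and round-trips an instance through a BinaryFormatter, which is roughly what happens when such an object is marshalled by value. The Customer class and its fields are hypothetical and used only for illustration. The sketch assumes the .NET Framework; BinaryFormatter is deprecated in current .NET versions.

```vb
Imports System
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

' Hypothetical domain class (not from the article). The attribute is all
' that is needed for the default, field-by-field serialization.
<Serializable()> _
Public Class Customer
    Public Id As Integer
    Public Name As String
End Class

Module SerializableSketch
    Sub Main()
        Dim original As New Customer()
        original.Id = 42
        original.Name = "Test customer"

        ' Serialize to an in-memory stream, then read a copy back.
        Dim formatter As New BinaryFormatter()
        Dim stream As New MemoryStream()
        formatter.Serialize(stream, original)
        stream.Position = 0

        Dim copy As Customer = _
            DirectCast(formatter.Deserialize(stream), Customer)

        ' The copy is a separate object that carries the same state.
        Console.WriteLine("{0}: {1}", copy.Id, copy.Name)
    End Sub
End Module
```

Note that the attribute is not inherited, so every class in the object graph that should travel by value needs it.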
Regarding the overhead for instantiation, which is very important if you need to create many objects, it's much smaller in .NET than for COM and VB6. Try it yourself by executing the code in Listings 1 and 2. The code in Listing 1 is for VB6, and the code in Listing 2 is for .NET. According to my quick and dirty tests, the .NET code is executed approximately 100 times faster. Normally (and hopefully), you do a lot more in an application than instantiating objects, but a difference this big will be noticeable in many applications.
Listing 1: VB6 Code for Instantiating 10 Million Dummy Objects
Dim i As Long
Dim theTest As Test

For i = 0 To 10000000
    Set theTest = New Test
    theTest.DoStuff
Next
Listing 2: VB.NET Code for Instantiating 10 Million Dummy Objects
Dim i As Integer = 0
Dim theTest As Test

For i = 0 To 10000000
    theTest = New Test()
    theTest.DoStuff()
Next

GC.Collect()
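If you want to try the .NET measurement yourself, one possible harness is sketched below. The Test class is not shown in this article, so the one here is a hypothetical stand-in with a trivial DoStuff() method, and Environment.TickCount is only a coarse timer. Treat the numbers you get as rough indications, not as a reproduction of my figures.

```vb
Imports System

' Hypothetical stand-in for the Test class used in the listings.
Public Class Test
    Private _counter As Integer

    Public Sub DoStuff()
        _counter += 1
    End Sub
End Class

Module InstantiationTiming
    Sub Main()
        ' Millisecond tick count before the loop starts.
        Dim start As Integer = Environment.TickCount

        Dim i As Integer
        Dim theTest As Test
        For i = 0 To 10000000
            theTest = New Test()
            theTest.DoStuff()
        Next

        ' Include the cost of cleaning up, as in the listing.
        GC.Collect()

        Dim elapsedMs As Integer = Environment.TickCount - start
        Console.WriteLine("Elapsed: {0} ms", elapsedMs)
    End Sub
End Module
```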
Finally, the context overhead from configuring domain classes in COM+ is, in my opinion, more of a misuse of COM+ than a problem with the technology itself. If you deal with the configuration aspect on a service-layer level (or a business facade layer level or application-layer level, or whatever you call it), you will find that what goes on "behind" that layer incurs much smaller instantiation and call overhead compared to configuring the domain model classes.
NOTE
You will find much coverage of this subject in my book .NET Enterprise Design with Visual Basic .NET and SQL Server 2000 (Sams, 2001).
It now has been almost a year since my initial tests with a classic object-oriented domain model in .NET, but better late than never to write about the design implications for choosing data containers for .NET, right?
Data Transfer Objects or Smart Objects?
One major aspect that I always end up thinking about when I consider the custom classes approach is whether it's okay to send a subset of the domain model to rich clients, that is, sending smart objects with business rules. Another approach is to send data transfer objects (DTOs), which are filled by copying data from the domain model at the application server. Then the DTOs are sent over the wire.
There are problems with both approaches (as always, of course). If we send domain model objects to the client, the client is tightly coupled to the domain model. On the other hand, we don't add to the overhead by having to copy data between objects, and we also lessen the development overhead by having less code to write and maintain. Furthermore, we don't lose the advantage of having business rules checked at the client, so the application server receives fewer unsaveable objects.
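To make the difference concrete, here is a minimal sketch of the DTO alternative. The simplified Order class, the OrderDto class, the OrderAssembler helper, and the rule inside Validate() are all hypothetical and exist only for this illustration. The point is the extra copy step and the extra class to maintain.

```vb
Imports System

' Smart domain object: data plus business rules (simplified, hypothetical).
Public Class Order
    Public Id As Integer
    Public CustomerId As Integer
    Public OrderDate As DateTime

    Public Function Validate() As Boolean
        ' Example rule only: an order must belong to a customer.
        Return CustomerId > 0
    End Function
End Class

' Hypothetical DTO: data only, no behavior, safe to send over the wire.
<Serializable()> _
Public Class OrderDto
    Public Id As Integer
    Public CustomerId As Integer
    Public OrderDate As DateTime
End Class

' Hypothetical helper that copies state from the domain object to the DTO.
Public Class OrderAssembler
    Public Shared Function ToDto(ByVal source As Order) As OrderDto
        Dim dto As New OrderDto()
        dto.Id = source.Id
        dto.CustomerId = source.CustomerId
        dto.OrderDate = source.OrderDate
        Return dto
    End Function
End Class

Module DtoSketch
    Sub Main()
        Dim anOrder As New Order()
        anOrder.Id = 1
        anOrder.CustomerId = 17
        anOrder.OrderDate = DateTime.Today

        ' The client receives only the data; Validate() stays on the server.
        Dim dto As OrderDto = OrderAssembler.ToDto(anOrder)
        Console.WriteLine("{0} {1}", dto.Id, dto.CustomerId)
    End Sub
End Module
```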
Versioning Issue
Another problem with letting the client talk directly to the domain model is, of course, that the client must use the correct version of the domain model so that the correct rules are checked. .NET has a lot of functionality for dealing with versioning, such as automatic download. Even so, the wrong business rules might be checked at the client without any version conflict being detected, so you should check the rules at the server side as well to avoid accepting incorrect data. Most often, you shouldn't trust the client anyway because you don't know what has been done to your data.
Of course, this doesn't solve the problem that an incorrect version of the domain model at the client might be too restrictive . . . .
Whether to use DTOs is actually a huge topic on its own. I just wanted to briefly touch upon it here. Anyway, no matter how important this design decision is, it doesn't actually affect our discussion of today because I'm mostly discussing data (not behavior) in this article. Even if you prefer not to send smart objects to your rich clients, the performance implications are important for your data transfer objects.
NOTE
Early on in this article, I said that data containers are about updating data also. I want to strongly emphasize that the approach of using custom classes is actually even more positive when it comes to writing because the objects that travel are smart and know about the business rules to check.
A Simplistic Custom Classes Example
So, how can we write an implementation of the custom classes to get a solution that is somewhat comparable to the examples shown so far? Of course, there are numerous solutions to choose from. I'll show you a very simplistic solution here. In Figure 1, you can see a UML diagram of this solution.
Figure 1 UML diagram for the custom classes example.
What is simplistic with this solution, you might wonder? Well, many things. For example:
How to fill the objects with data from the storage: I am just using public properties for both getting and setting the values. In a real application, you typically need another way of setting the internal fields when grabbing values from the database than by using public property setters.
Method parameters should often be objects: As you see in Figure 2, I use primitive parameters, for example, the order ID. This is because it is a similar solution to the previously discussed options. In a more advanced scenario, we might find that we get pretty long parameter lists. In a real situation, it would probably be more natural to send complex objects instead as parameters.
Richer behavior is "expected": In Figure 1, only Validate() methods are used. That is all behavior that is exposed. The biggest value of custom classes as data containers comes when you can add a lot of behavior also. In a real application, you typically will find a lot of custom methods and, of course, a lot of code in the property setters.
Subcollection is insufficiently encapsulated: As you will see, I expose too much of the implementation of the Order class because the consumer can get directly to the OrderLines (which is the collection class that holds OrderLine instances). A better approach is most often to use the Encapsulate Collection refactoring.
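The last point in the list above can be sketched as follows. This is not the code behind Figure 1; the member names AddOrderLine, OrderLineCount, and GetOrderLines, as well as the rule that is checked, are assumptions made for illustration. The idea of Encapsulate Collection is that the Order controls every change to its lines and hands out only a read-only view.

```vb
Imports System
Imports System.Collections

Public Class OrderLine
    Public ProductId As Integer
    Public NoOfItems As Integer
End Class

Public Class Order
    ' The underlying list is never handed out directly.
    Private _lines As New ArrayList()

    ' All additions go through the Order, so rules can be checked here.
    Public Sub AddOrderLine(ByVal line As OrderLine)
        If line Is Nothing Then
            Throw New ArgumentNullException("line")
        End If
        If line.NoOfItems <= 0 Then
            Throw New ArgumentException("NoOfItems must be positive.", "line")
        End If
        _lines.Add(line)
    End Sub

    Public ReadOnly Property OrderLineCount() As Integer
        Get
            Return _lines.Count
        End Get
    End Property

    ' Consumers get a read-only wrapper; Add or Remove on it will throw.
    Public Function GetOrderLines() As IList
        Return ArrayList.[ReadOnly](_lines)
    End Function
End Class

Module EncapsulateCollectionSketch
    Sub Main()
        Dim anOrder As New Order()
        Dim aLine As New OrderLine()
        aLine.ProductId = 1
        aLine.NoOfItems = 2
        anOrder.AddOrderLine(aLine)

        Console.WriteLine(anOrder.OrderLineCount)
        Console.WriteLine(anOrder.GetOrderLines().IsReadOnly)
    End Sub
End Module
```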
The focus of this article series is to give you an idea of how different data container solutions behave compared to each other, not to show you production-ready frameworks. You will also find that even though I am showing you a simplistic solution, there is plenty of room to add what you need and still get better performance than that for DataSets.
To create the type-safe collection class for holding OrderLine objects (that class is called OrderLines), I inherited from the CollectionBase class. That is a simple solution that is okay for this specific example, but it has its own pros and cons, as usual. I will get back to this subject in Part 5, when I discuss several other options.
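A type-safe collection built on CollectionBase might look roughly like the sketch below. It is a minimal version under my own assumptions, not the exact class used in the tests: the members shown (Add and a default Item property) are the usual minimum, and the OrderLine fields are reduced to two.

```vb
Imports System
Imports System.Collections

<Serializable()> _
Public Class OrderLine
    Public ProductId As Integer
    Public NoOfItems As Integer
End Class

' Type-safe wrapper: CollectionBase stores Object references internally,
' and the public members below accept and return OrderLine only.
<Serializable()> _
Public Class OrderLines
    Inherits CollectionBase

    Public Function Add(ByVal value As OrderLine) As Integer
        ' List is the protected IList view that CollectionBase exposes.
        Return List.Add(value)
    End Function

    Default Public Property Item(ByVal index As Integer) As OrderLine
        Get
            Return DirectCast(List(index), OrderLine)
        End Get
        Set(ByVal Value As OrderLine)
            List(index) = Value
        End Set
    End Property
End Class

Module OrderLinesSketch
    Sub Main()
        Dim lines As New OrderLines()
        Dim first As New OrderLine()
        first.ProductId = 1
        first.NoOfItems = 3
        lines.Add(first)

        ' For Each works because CollectionBase implements IEnumerable.
        Dim aLine As OrderLine
        For Each aLine In lines
            Console.WriteLine("{0} x {1}", aLine.NoOfItems, aLine.ProductId)
        Next
    End Sub
End Module
```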
Furthermore, each of the classes (Order, OrderLine, and OrderLines) has the <Serializable()> attribute, and no custom serialization is used, just the plain serialization that comes for free.
NOTE
To follow Microsoft's naming conventions, OrderLines technically should be called OrderLineCollection.
Also note that, unlike previous options that I discussed (except for the wrapped DataSet), this time there is custom behavior within the classes that also hold the data.
NOTE
Maybe you think that I'm retro and old-fashioned because I'm talking a lot about object orientation. I've heard that OO is from the 1980s. Nowadays the trend is that we should be message- or document-oriented. (By this, I think the emphasis is on passing state around, typically in a portable format between heterogeneous systems. Data is sent in a coarse-grained format, that is, as complete "documents.")
I can agree on message and document orientation. My point is that, for building those applications, under the surface, I think object orientation is still going very strong. Period.
Some Pros and Cons with the Custom Classes Solution
As you might guess, I have gathered a lot of advantages of using custom classes. But so as not to sound like a salesman, I have also found some drawbacks.
Some advantages are listed here:
Self-contained classes, so the client might (if desired) receive business rules, too.
Object orientation, with all its advantages. For example, it is pretty direct to apply classic design patterns, such as the GoF ones.
Full flexibility.
Typically type-safe operations.
Minimal interface.
Good performance.
Some disadvantages are as follows:
A lot of extra work required if you do it by hand
Somewhat weak for integration as is with heterogeneous clients
NOTE
The disadvantage of the extra work is actually a big one. For starters, it's very complex to create a solid framework for a domain model regarding persistence. And when you are done with that, you might find that your productivity is bad because you have to write a lot of code by hand; therefore, you need to write tools.
The problems regarding complexity and productivity are well known. See, for example, Martin Fowler's book Patterns of Enterprise Application Architecture (Addison Wesley Professional, 2002). The common recommendation is to buy a product.
Custom Classes Code Examples
As usual, I take a look at some code samples, both from the server side and from the client side. First, here's some from the server side. In Listing 3, you find some code for using a DataReader to add the data to the custom classes.
Listing 3: Code for Filling the Custom Classes
Dim aDataReader As SqlDataReader = DbHelper.FetchOrderAndLines(id)
Dim anOrder As New Order()

' First resultset: exactly one order row is expected, so no loop
aDataReader.Read()
anOrder.Id = aDataReader.GetInt32(OrderColumns.Id)
anOrder.CustomerId = aDataReader.GetInt32(OrderColumns.CustomerId)
anOrder.OrderDate = aDataReader.GetDateTime(OrderColumns.OrderDate)

' Second resultset: one row per order line
aDataReader.NextResult()
While aDataReader.Read()
    Dim anOrderLine As New OrderLine()
    anOrderLine.ProductId = aDataReader.GetInt32(OrderLineColumns.ProductId)
    anOrderLine.PriceForEach = aDataReader.GetDecimal(OrderLineColumns.PriceForEach)
    anOrderLine.NoOfItems = aDataReader.GetInt32(OrderLineColumns.NoOfItems)
    anOrderLine.Comment = aDataReader.GetString(OrderLineColumns.Comment)
    anOrder.OrderLines.Add(anOrderLine)
End While

aDataReader.Close()
Return anOrder
NOTE
The code for DbHelper.FetchOrderAndLines() was actually shown in the first article in this series. See Listing 1 in that article.
If you have worked with the DataReader at all, I think you'll find the code in Listing 3 quite simple. Worth mentioning is that, in this case, I know that there will only be one row (one order) in the first resultset, so I have no loop. Then I have a loop for the second resultset so that I can grab all the order lines. As you see, I use enums to address the correct columns in the DataReader.
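The enums mentioned above are not shown in this article. A plausible sketch follows, assuming that each resultset returns its columns in the order in which Listing 3 reads them; the real ordinals depend on the SELECT statements behind DbHelper.FetchOrderAndLines(), so treat the values as assumptions.

```vb
Option Strict On
Imports System

' Hypothetical ordinals for the first resultset (the order).
Public Enum OrderColumns
    Id = 0
    CustomerId = 1
    OrderDate = 2
End Enum

' Hypothetical ordinals for the second resultset (the order lines).
Public Enum OrderLineColumns
    ProductId = 0
    PriceForEach = 1
    NoOfItems = 2
    Comment = 3
End Enum

Module ColumnEnumSketch
    Sub Main()
        ' Enum-to-Integer is a widening conversion in VB.NET, so the enum
        ' member can be passed where an Integer ordinal is expected, even
        ' with Option Strict On.
        Dim ordinal As Integer = OrderLineColumns.NoOfItems
        Console.WriteLine(ordinal)
    End Sub
End Module
```

Keeping the ordinals in one place means that a change in the column order of the query needs to be reflected only in the enum, not in every GetInt32() call.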
NOTE
Now for a note that is a bit off-topic but still interesting: I always use Option Strict for my VB.NET projects. Therefore, it came as a surprise to me when I found out that the equivalent C# code to the VB.NET code shown in Listing 3 won't work directly. When using C#, the enum values must be cast to int when used in the GetInt32() function, for example. That is not needed for VB.NET.
Now, as usual, for some code from the client side. To browse the information in the custom classes, the code could look like it does in Listing 4.
Listing 4: Code for Browsing the Custom Classes
Dim anOrder As Order = _service.FetchOrderAndLines(_GetRandomId())

_id = anOrder.Id
_customerId = anOrder.CustomerId
_orderDate = anOrder.OrderDate

Dim anOrderLine As OrderLine
For Each anOrderLine In anOrder.OrderLines
    _productId = anOrderLine.ProductId
    _priceForEach = anOrderLine.PriceForEach
    _noOfItems = anOrderLine.NoOfItems
    _comment = anOrderLine.Comment
Next
In Listing 4, you find the smallest amount of client-side code shown in all the articles, and it's the clearest, too. I think I've said it before, but I think it's a great deal to buy a simple API for the client programmer and to pay with some more server-side code.
Test of the Week
It's time to take a look at the test results, but let's take a quick look at the remoting end point first. As usual, I have a service-layer (business facade layer) class for the test (and it's used in the test within a single AppDomain, too). In Figure 2, you can see the service-layer class.
Figure 2 The service-layer class of the week.
As you saw in Figure 2, the interface of the service-layer class is more granular compared to earlier approaches. Here, Order is returned if that is what is needed, or OrderLines is returned if that is what is needed, and so on.
Result of the Tests
Let's add the tests of the custom classes to the results tables.
Once again, I will use the DataReader as a baseline. Therefore, I have recalculated all the values so that I get a value of 1 for the DataReader and so that the rest of the data containers have a value that is relative to the DataReader value. That makes it easy to compare. The higher the value is, the better.
Table 1: Results for the First Test Case: Reading One Row
| Data container | 1 User, in AppDomain | 5 Users, in AppDomain | 1 User, Cross-Machines | 5 Users, Cross-Machines |
| --- | --- | --- | --- | --- |
| DataReader | 1 | 1 | 1 | 1 |
| Untyped DataSet | 0.6 | 0.6 | 1.4 | 1.7 |
| Typed DataSet | 0.4 | 0.5 | 1 | 1.1 |
| Wrapped DataSet | 0.5 | 0.6 | 1.3 | 1.7 |
| Hashtable | 0.9 | 1 | 3.5 | 3.9 |
| Custom Classes | 1 | 1 | 4 | 4.2 |
Table 2: Results for the Second Test Case: Reading Many Rows
| Data container | 1 User, in AppDomain | 5 Users, in AppDomain | 1 User, Cross-Machines | 5 Users, Cross-Machines |
| --- | --- | --- | --- | --- |
| DataReader | 1 | 1 | 1 | 1 |
| Untyped DataSet | 0.5 | 0.6 | 6.9 | 9.7 |
| Typed DataSet | 0.5 | 0.5 | 6 | 8.6 |
| Wrapped DataSet | 0.5 | 0.6 | 6.6 | 9.6 |
| Hashtable | 0.8 | 0.9 | 17 | 23.5 |
| Custom Classes | 1 | 1 | 15.9 | 22.1 |
Table 3: Results for the Third Test Case: Reading One Master Row and Many Detail Rows
| Data container | 1 User, in AppDomain | 5 Users, in AppDomain | 1 User, Cross-Machines | 5 Users, Cross-Machines |
| --- | --- | --- | --- | --- |
| DataReader | 1 | 1 | 1 | 1 |
| Untyped DataSet | 0.5 | 0.5 | 6.1 | 8.5 |
| Typed DataSet | 0.4 | 0.4 | 5.1 | 6.9 |
| Wrapped DataSet | 0.5 | 0.5 | 5.8 | 8 |
| Hashtable | 0.8 | 0.9 | 16.2 | 19.6 |
| Custom Classes | 0.9 | 1 | 16 | 19.2 |
As you saw above in the result tables, the custom classes are very close to the Hashtable in performance. For the tests within an AppDomain, the custom classes appear to be level with the DataReader, but that is partly an effect of rounding. The DataReader is actually a little faster, which is to be expected because there is less work to do in that case. There are no intermediate objects to instantiate and no extra movement of data.
Highly Subjective Results
My "highly subjective results" table needs to be expanded with the last option. In Table 4, you will find that I have assigned some grades according to the qualities discussed in the first part of this series. A score of 5 is excellent, and a score of 1 is poor.
Table 4: Grades According to Qualities
| Data container | Performance in AppDomain/Cross-Machines | Scalability in AppDomain/Cross-Machines | Productivity | Maintainability | Interoperability |
| --- | --- | --- | --- | --- | --- |
| DataReader | 5/1 | 4/1 | 2 | 1 | 1 |
| Untyped DataSet | 3/3 | 3/3 | 4 | 3 | 4 |
| Typed DataSet | 2/2 | 2/2 | 5 | 4 | 5 |
| Wrapped DataSet | 3/3 | 3/3 | 3 | 5 | 4 |
| Hashtable | 5/5 | 5/5 | 2 | 2 | 2 |
| Custom Classes | 5/5 | 5/5 | 2 | 5 | 3 |
As usual, I'd like to say a few words about each quality grade.
Performance
In the first article of this series, I told you about Microsoft's warnings that custom classes might give bad performance compared to DataSets. Because of that, the test results of today might be very surprising because the performance of custom classes seems to be just fine.
NOTE
I'm pretty sure that what Microsoft meant is that because it's hard to create a solution based on custom classes, the performance might be the opposite of what I have shown above. I can understand that, but I also dislike it. You can get bad performance out of misuse of everything.
Am I too picky with Microsoft? Don't worry. I like their work with .NET very much. Even so, I think I should tease them a little bit.
Scalability
Once again, I think that the scalability goes pretty much hand in hand with performance here. (This is in accordance with the definition of the scalability quality in Part 1.)
Productivity
If you decide to roll your own solution with custom classes, your short-term productivity will suffer. There are solutions for that, but out of the box, I have to give a low grade for productivity because of this.
Maintainability
My main reason for once again starting to investigate a domain model is that I believe maintainability might gain a lot. In my opinion, the prime goal and effect of object orientation is increased maintainability. For example, it's a good way to deal with complexity.
NOTE
As you might have noticed, I wrote "once again" above. I have experimented with something like a domain model several times, but I've never been very happy with the result. For example, I mentioned some of the problems I had with a domain model in VB6 and COM at the beginning of this article.
Interoperability
Interoperability is okay because you will automatically get XML serialization. On the other hand, it will take some extra work to get the XML structure that you want, and you have to take extra precautions that sent and received XML follows the XML schema that you have agreed upon with the heterogeneous client.
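As a small illustration of the "extra work" involved in shaping the XML, the sketch below uses XmlSerializer together with attributes from System.Xml.Serialization to control element and attribute names. The XML names chosen here (orderLine, productId, price, quantity, comment) are invented for the example; in practice they would come from the schema agreed upon with the heterogeneous client.

```vb
Imports System
Imports System.IO
Imports System.Xml.Serialization

' The attributes map the class to the XML names of a hypothetical schema.
<XmlRoot("orderLine")> _
Public Class OrderLine
    <XmlAttribute("productId")> _
    Public ProductId As Integer

    <XmlElement("price")> _
    Public PriceForEach As Decimal

    <XmlElement("quantity")> _
    Public NoOfItems As Integer

    <XmlElement("comment")> _
    Public Comment As String
End Class

Module XmlShapeSketch
    Sub Main()
        Dim line As New OrderLine()
        line.ProductId = 7
        line.PriceForEach = 12.5D
        line.NoOfItems = 3
        line.Comment = "Gift wrap"

        ' XmlSerializer needs a public class with a default constructor.
        Dim serializer As New XmlSerializer(GetType(OrderLine))
        Dim writer As New StringWriter()
        serializer.Serialize(writer, line)

        ' Print the resulting XML document.
        Console.WriteLine(writer.ToString())
    End Sub
End Module
```

Without the attributes, the serializer falls back to the class and member names, which ties the XML format to the internal naming of the domain model.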
Before this round of tests, the wrapped DataSet was the best if you just added the grades (and chose in AppDomain or cross machines). Now custom classes take over the lead.
Conclusion
In this article, the last of the promised options was discussed. In my opinion, using custom classes often is the best choice, but, of course, it's not a cure-all solution.
Even though I discussed only one option today, I mentioned several subjects that will be covered in Part 5, the last one in this series of articles. Next time we'll discuss several disparate subjects and wrap up the whole series.