
Choosing Data Containers, Part 5

Date: Jul 25, 2003

Article is provided courtesy of Sams.


In this final part of his series on finding the best data container, Jimmy Nilsson looks at how the new architecture he works on behaves in relation to data container characteristics.

The previous article in this series discussed custom classes and the good performance they showed in the tests. But performance isn't everything, and the custom classes I used in the examples were very simple. (For example, in that incarnation they did far less than they could have to help maintainability.)

In this fifth and last article in the series, I discuss how the new architecture that I currently work on behaves when it comes to data container characteristics. The new architecture is also based on custom classes, or the Domain Model pattern, to use the de facto pattern name. (See Martin Fowler, Patterns of Enterprise Application Architecture. Addison-Wesley, 2002, ISBN: 0-321-12742-0.)

Will my new architecture, which has a lot more functionality, fall far behind the results in Part 4? And what tweaks have been used to get decent performance? These are the main issues dealt with here.

As usual, I recommend that you read the previous articles in this series if you haven't before: Choosing Data Containers for .NET, Part 1, Part 2, Part 3, and Part 4.

NOTE

What happened to the plan? I was going to discuss a lot of different things that I mentioned in Parts 1–4. When I started writing this fifth article, I decided instead to not cover a lot of ground in a shallow way, but to "dig one single hole a little more deeply."

A Step Toward Realism

As I said, my new architecture is based on custom classes. It is, in my opinion, a lot more realistically usable than the code for custom classes that I discussed last time. The new architecture is still a work in progress, but it is already more realistic.

NOTE

You can read more about my work with the new architecture in another article series (A Pure Object Oriented Domain Model by a DB-Guy: Part 1: Introduction, Part 2: One Base Class, Part 3: The Consumer Perspective, Part 4: The Application Layer Perspective, and Part 6: The Persistence Access Layer Perspective). I also discuss my work in my blog. So I will give you only a brief overview here.

Don't worry: I'll also talk about several things that I haven't discussed before. All these things are related to serialization, but before that, some basics about the architecture.

First, I have a base class called EntityBase that entity classes such as Order and OrderLine typically inherit from. The EntityBase class is an implementation of the Layer Supertype pattern. (See Fowler, Patterns of Enterprise Application Architecture.) In Figure 1, you see a UML diagram describing the relationship between the classes.

Figure 1 UML diagram for the custom classes example.

As shown in Figure 1, Order and OrderLine (but here called NewArchOrder and NewArchOrderLine) are very similar to what was shown in the previous article. The main difference here is that they inherit from EntityBase, and therefore get a lot of functionality for free.

Some of the functionalities that EntityBase provides are the following:

- Status tracking for each instance, such as dirty, deleted, and expanded
- Validation support; IsValid() checks the fields and subcollections (and their subcollections, and so on) without explicit work in the subclasses
- Support for the collapsed and expanded object states
- Storage of all field values in an internal object vector
- Support for being initialized from persistence without triggering dirty tracking or validation

EntityBase is actually only a productivity booster. Everything regarding persistence, for example, can be done by implementing a couple of custom interfaces instead. In this article, however, we assume that EntityBase is used (and that those interfaces are therefore implemented by EntityBase).

Overview

The EntityBase is found in a layer called the Domain Model. In Figure 2, you see an overview of the different layers in the new architecture.

Figure 2 Overview of new architecture.

As shown in the figure, one way to use the Domain Model is to let the client work directly with it. But there is nothing stopping you from letting the client work only with Data Transfer Objects instead. (See Fowler, Patterns of Enterprise Application Architecture.) The point is that this discussion matters no matter what you let the client interact with.

As Figure 2 suggests, this time there is a lot more code cooperating to provide the result (actually, too much for these simple tests, but beneficial for real applications). So what is the extra code? Stay tuned.

New Architecture Code Examples

As usual, we will take a look at some code samples, both from the server-side and from the client-side. Also as usual, I start from the server-side, but I have a lot more code to cover. First is the code in the class for the Service layer. As you can see in Figure 3, the class is called FetchingOrderInformation__NewArch.

Figure 3 The Service layer class of the week.

NOTE

Do you remember my saying last time that I thought I should use entity objects for parameters when doing a fetch by key? As a matter of fact, I have changed my mind. One reason is that each time an object is deserialized, you get a new object. Another reason is that it feels pretty wasteful to send a complete but empty object over the wire just to let the server-side receive the key for the row to be fetched in the database. I therefore find it okay to use plain values instead. But it's nice not to use them as, say, raw Integer values, but as a custom value type, so that you encapsulate the data type if possible (see the sketch after this note).

Also, an attribute such as CustomerId shouldn't really be used in an Order class; instead, the Order class should have a Customer instance. Anyway, I haven't done that in the object model used in this article because it would make the tests too much like comparing apples and oranges. Also, using full-blown objects gives you more functionality, and if you want that functionality, you probably find it okay to pay for it, too.
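To illustrate the point about custom value types, here is a minimal sketch of what such a wrapper could look like. This is only my illustration; OrderId is a hypothetical type, not part of the architecture discussed in this article.

<Serializable()> _
Public Structure OrderId
  'Wraps the plain Integer key so that the datatype is encapsulated.
  Private ReadOnly _value As Integer

  Public Sub New(ByVal value As Integer)
    _value = value
  End Sub

  Public ReadOnly Property Value() As Integer
    Get
      Return _value
    End Get
  End Property
End Structure

A service method could then accept an OrderId instead of an Integer, which lets you change the underlying key type in one place later.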

Listing 1 shows the code for FetchOrderAndLines() from the Service layer.

Listing 1—Service Layer Code

Public Function FetchOrderAndLines _
(ByVal id As Integer) As NewArchOrder
  Return POrder.FetchOrderAndLines(id)
End Function

If you compare the code in Listing 1 with code shown in the previous articles, you find that I delegate all work regarding fetching and instantiating objects to a class called POrder, which lives in the Persistence Access layer. The responsibility of POrder is to encapsulate everything about the database schema regarding orders so that no knowledge about it is found in the Service layer or the Domain Model. The POrder is an implementation of the Data Mapper pattern. (See Fowler, Patterns of Enterprise Application Architecture.)

The FetchOrderAndLines() method of the POrder class is shown in Listing 2.

Listing 2—Persistence Access Layer Code

Public Shared Function FetchOrderAndLines _
(ByVal id As Integer) As NewArchOrder
  Dim aUnitOfWork As New CommandUnitOfWork()

  _AddFetchOrderAndLines2UnitOfWork(id, aUnitOfWork)

  Return _InstantiateOrderAndLines(aUnitOfWork)
End Function

The code in Listing 2 is again very different from what was used in previous articles. What's going on here is that I use an implementation of the Unit of Work pattern. (See Fowler, Patterns of Enterprise Application Architecture.)

You can think of the aUnitOfWork instance as a collector for collecting all information about what should be done against the database. That collecting work has been factored out to the _AddFetchOrderAndLines2UnitOfWork(), which we will look at in a minute. The second task for aUnitOfWork in Listing 2 is to execute statements against the database, and that is factored out to _InstantiateOrderAndLines().

NOTE

The biggest benefit of the Unit of Work pattern shows up when I execute updates against the database and need an explicit transaction, because the transaction will be "compressed" to the shortest possible timeframe, without delays within it. Even so, I benefit from the Unit of Work here, too, because it also acts as a helper for database access that encapsulates a lot of ADO.NET details from the bulk of the code.
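To give a feeling for what such a helper might look like inside, here is a hedged, minimal sketch. It is not the real CommandUnitOfWork; it borrows the method names from the listings in this article, handles only a single stored procedure call, and takes the connection string explicitly for simplicity.

Imports System.Collections
Imports System.Data
Imports System.Data.SqlClient

Public Class MiniUnitOfWork
  'Collects stored procedure calls for later execution.
  Private _commands As New ArrayList()
  Private _current As SqlCommand

  Public Sub AddSprocCall(ByVal sprocName As String)
    _current = New SqlCommand(sprocName)
    _current.CommandType = CommandType.StoredProcedure
    _commands.Add(_current)
  End Sub

  Public Sub AddParameter(ByVal name As String, ByVal value As Object)
    _current.Parameters.Add(New SqlParameter(name, value))
  End Sub

  Public Function ExecuteReturnDataReader _
  (ByVal connectionString As String) As IDataReader
    'This sketch executes only the first collected call.
    Dim aConnection As New SqlConnection(connectionString)
    aConnection.Open()

    Dim aCommand As SqlCommand = _
    DirectCast(_commands(0), SqlCommand)
    aCommand.Connection = aConnection

    'Close the connection when the consumer closes the reader.
    Return aCommand.ExecuteReader(CommandBehavior.CloseConnection)
  End Function
End Class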

In Listing 3, you find the implementation of _AddFetchOrderAndLines2UnitOfWork().

Listing 3—_AddFetchOrderAndLines2UnitOfWork()

Private Shared Sub _AddFetchOrderAndLines2UnitOfWork _
(ByVal id As Integer, ByVal unitOfWork As UnitOfWork)

  With unitOfWork
    .AddSprocCall(Sprocs.FetchOrderWithLines.Name)

    .AddParameter(Sprocs.Parameters.Id.Name, id, _
    Sprocs.Parameters.Id.Size, False)
  End With
End Sub

In Listing 3, you see that a stored procedure call was added to the unitOfWork and a parameter was set. Not much information was needed in this case, but in any case, it's clean and simple.

Listing 4 shows the implementation of _InstantiateOrderAndLines().

Listing 4—_InstantiateOrderAndLines()

Private Shared Function _InstantiateOrderAndLines _
(ByVal unitOfWork As UnitOfWork) As NewArchOrder
  Dim anOrder As NewArchOrder

  Dim aResult As IDataReader = _
  unitOfWork.ExecuteReturnDataReader()

  Try
    aResult.Read()
    anOrder = _InstantiateOrderHelper(aResult, False)

    aResult.NextResult()

    Do While aResult.Read
      anOrder.AddOrderLine _
      (_InstantiateOrderLineHelper(aResult))
    Loop

    _EndInitializeFromPersistence(anOrder)
  Finally
    aResult.Close()
  End Try

  Return anOrder
End Function

In Listing 4, a lot more work is done. Here, the unitOfWork is told to execute the statement against the database. Then, once again, more work has been factored out. The work for instantiating a single order and a single order line is found in _InstantiateOrderHelper() and _InstantiateOrderLineHelper(). So, the DataReader is first sent to _InstantiateOrderHelper(); then it's sent to _InstantiateOrderLineHelper() to get the second resultset processed. When the DataReader (and its two resultsets) has been processed, the DataReader is closed, and the newly instantiated Order is returned.

_InstantiateOrderHelper() and _InstantiateOrderLineHelper() are pretty similar to each other, so I think it's enough to look at _InstantiateOrderHelper(), shown in Listing 5.

Listing 5—_InstantiateOrderHelper()

Private Shared Function _InstantiateOrderHelper _
(ByVal dataReader As IDataReader, _
ByVal expanded As Boolean) As NewArchOrder
  Dim anOrder As New NewArchDomain.NewArchOrder _
  (Guid.NewGuid, expanded)

  _StartInitializeFromPersistence(anOrder)

  With anOrder
    .Id = dataReader.GetInt32(Sprocs.OrderColumns.Id)
    .CustomerId = dataReader.GetInt32 _
    (Sprocs.OrderColumns.CustomerId)
    .OrderDate = dataReader.GetDateTime _
    (Sprocs.OrderColumns.OrderDate)
  End With

  Return anOrder
End Function

In Listing 5, I put the anOrder instance in a mode in which it is being initialized from persistence, so that the dirty support isn't "started" and possible validation rules aren't checked. Then, I set the properties by moving the data from the DataReader to the entity instance. Note that I don't end the initialization mode in Listing 5. It isn't ended until the complete order instance, including its order lines, is done (at the end of Listing 4).

As shown in Listing 5, I used Guid.NewGuid instead of fetching a Guid from the database. The reason is that the architecture currently supports only Guids for primary keys, but the database that I used for the prior tests uses INT+IDENTITY for primary keys. Therefore, I just fake a value here.

Also note that I used IDataReader in the code above instead of SqlDataReader as before. That feels much cleaner and better!

The Usual Type of Client-Side Code

And now, as usual, some code from the client-side. To browse through the information in the custom classes, the code could look like Listing 6.

Listing 6—Code for browsing through an Order and its OrderLines

Dim anOrder As NewArchOrder = _
_service.FetchOrderAndLines(_GetRandomId())

_id = anOrder.Id
_customerId = anOrder.CustomerId
_orderDate = anOrder.OrderDate

Dim anOrderLine As NewArchOrderLine
For Each anOrderLine In anOrder.OrderLines
  _productId = anOrderLine.ProductId
  _priceForEach = anOrderLine.PriceForEach
  _noOfItems = anOrderLine.NoOfItems
  _comment = anOrderLine.Comment
Next

In Listing 6, you find exactly the same client-side code as for custom classes (discussed last time). The only differences are the type names: NewArchOrder and NewArchOrderLine instead of Order and OrderLine. So, the simplicity at the client-side is still there, and that's very important, in my opinion.

Support Rich Databinding

One major advantage of the DataSet is its built-in support for rich databinding. You get rudimentary databinding automatically when inheriting from CollectionBase or using ArrayList, for example; but that is much weaker than what the DataSet provides. For example, assume that you bind a grid to a DataSet in a Windows Form. The user edits one of the rows, but then presses Escape. The DataSet then knows how to translate that event and cancel the edit. That's just one example of the way a DataSet can give a rich user interface experience.

I haven't put any energy at all into adding rich databinding support in my new architecture yet, but my plan is to use the Decorator pattern. (See Gamma, Helm, Johnson, Vlissides, Design Patterns. Addison-Wesley, 1995, ISBN: 0201633612.) I will decorate (wrap) collections and entity instances at the client side with rich databinding support.
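As a taste of what such a decorator could provide, here is a hedged sketch of the Escape-cancels-edit behavior described above, using the standard IEditableObject interface from System.ComponentModel. The class and its single property are hypothetical; this is not code from the architecture.

Imports System.ComponentModel

'Grids bound to a class that implements IEditableObject can cancel
'an in-progress row edit when the user presses Escape.
Public Class EditableOrderLineDecorator
  Implements IEditableObject

  Private _comment As String
  Private _commentBackup As String
  Private _editing As Boolean

  Public Property Comment() As String
    Get
      Return _comment
    End Get
    Set(ByVal Value As String)
      _comment = Value
    End Set
  End Property

  Public Sub BeginEdit() Implements IEditableObject.BeginEdit
    If Not _editing Then
      _commentBackup = _comment
      _editing = True
    End If
  End Sub

  Public Sub CancelEdit() Implements IEditableObject.CancelEdit
    If _editing Then
      _comment = _commentBackup
      _editing = False
    End If
  End Sub

  Public Sub EndEdit() Implements IEditableObject.EndEdit
    _editing = False
  End Sub
End Class

A real decorator would wrap an entity instance and forward the property calls to it, but the mechanism is the same.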

If you find rich databinding important, you should read Rockford Lhotka's great book (Rockford Lhotka, Visual Basic .NET Business Objects. APress, 2003.) He discusses how to add support for rich databinding to a Domain Model at length.

A Couple of Tweaks that Help Performance

As always, when you do something, you have to find a balance among competing requirements. One of the requirements for my new architecture is to get decent performance while also saving the client-side programmer and the business-tier programmer from writing a lot of code. I have made a couple of performance-related tweaks that I want to tell you about. One thing that applies to all the tweaks is that I want to avoid custom serialization (implementing ISerializable) if I can. Supporting custom serialization with hand-written code is tedious and error-prone. With code generation this is less of an issue, but we still have one task fewer to support in our generator if we don't need custom serialization. So, let's see how far we can get while avoiding it.

The first tweak is "collapsed."

NOTE

Before we get going, please note that this article is based on version 1.0 of the .NET Framework. Some of what I discuss here might have changed in 1.1.

Collapsed Objects

The objects can be in one of two states: collapsed or expanded. When an object is collapsed, it typically has only a few properties and is read-only. When an object is expanded, it typically has many properties, is writable, and remembers the values found in the database at the time of instantiation. By using collapsed objects when appropriate, you reduce not only serialization size but also memory footprint. The typical situation for collapsed is when you want to fill lists, but you can also use it in other situations where it works. Because collapsed works for today's tests, I think it's fair to use it here. To fulfill the tests, I only need to fetch data, not prepare for updates. Therefore, I need only one set of values for each object, and collapsed is ideal.
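To make the idea concrete, here is a small, self-contained sketch of how a collapsed instance could guard an expanded-only property. All names are hypothetical, and the expansion is faked rather than fetched from the database.

Public Class CollapsedDemo
  Private _isExpanded As Boolean
  Private _name As String     'Collapsed field: always loaded.
  Private _comment As String  'Expanded-only field.

  Public ReadOnly Property Name() As String
    Get
      Return _name
    End Get
  End Property

  Public ReadOnly Property Comment() As String
    Get
      'Implicit expansion on first access to an expanded-only field.
      If Not _isExpanded Then
        _Expand()
      End If
      Return _comment
    End Get
  End Property

  Private Sub _Expand()
    'In the real architecture, an expander would fetch the
    'remaining fields from the database here.
    _comment = ""
    _isExpanded = True
  End Sub
End Class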

Statuses in One Single Byte

I need to hold a couple of statuses for each instance, such as dirty, deleted, and expanded. Instead of having one Boolean field for each, I have one Byte field and let one bit in that Byte describe each status. To keep the code in EntityBase from getting ugly, I also have private properties for setting and getting the statuses. According to my tests, I saved 35 bytes of serialization size for one instance by doing this.
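A minimal sketch of the technique could look like the following. The mask names are hypothetical, and only the dirty status is shown; deleted and expanded work the same way.

Public Class StatusByteDemo
  Private Const DirtyMask As Byte = 1
  Private Const DeletedMask As Byte = 2
  Private Const ExpandedMask As Byte = 4

  'One Byte holds all statuses; one bit per status.
  Private _statuses As Byte

  Private Property IsDirty() As Boolean
    Get
      Return (_statuses And DirtyMask) > 0
    End Get
    Set(ByVal Value As Boolean)
      If Value Then
        _statuses = _statuses Or DirtyMask
      Else
        _statuses = _statuses And Not DirtyMask
      End If
    End Set
  End Property
End Class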

Object Vector

All fields—such as CustomerId and OrderDate—are internally stored in an object vector in EntityBase. The main reason for this isn't performance, but to let the EntityBase take care of standard tasks without the subclasses (Order and OrderLine, for example) having to think about it at all. One example is IsValid(). The EntityBase can check the fields and subcollections, the subcollections of the subcollections, and so on without the subclasses having to do anything explicitly on their own (except checking their specific rules, of course).

But as it turns out, this detail helps reduce serialization size a little bit because there is only one variable name for several fields instead of one for each field.
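A hedged sketch of the core of the idea follows. The member names resemble those in Listings 7 and 8, but this is simplified; for example, the real _SetProperty takes more parameters.

Public MustInherit Class ObjectVectorDemoBase
  'One vector holds all field values; subclasses index it
  'with their own Properties enum.
  Private _fieldValues As Object()

  Protected Sub New(ByVal fieldCount As Integer)
    _fieldValues = New Object(fieldCount - 1) {}
  End Sub

  Protected Function _GetProperty(ByVal index As Integer) As Object
    Return _fieldValues(index)
  End Function

  Protected Sub _SetProperty _
  (ByVal value As Object, ByVal index As Integer)
    _fieldValues(index) = value
  End Sub
End Class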

NonSerialized()

There's another thing—which isn't first and foremost a performance optimization—that can help when it comes to reducing serialization size: to mark context-aware fields with <NonSerialized()>. For EntityBase, an example is the expander instance, which helps to expand a collapsed instance, implicitly or explicitly. I don't want that instance to travel over the network anyway, so it's just fine to mark it as <NonSerialized()>.

Another trick is to cache values in EntityBase and mark them as <NonSerialized()> (an example is the offsets I need for navigating in the object vector). Those values can be found by asking the subclasses, but because EntityBase needs them very frequently, I cache them in EntityBase and access them only via private read-only properties. Once again, the fields in EntityBase are not serialized, to save on serialization space. This is a tweak that can probably be used in many situations.
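Both tricks can be sketched in a few lines. The member names below are hypothetical. Note that non-serialized fields come back as Nothing or 0 after deserialization, so the cache must be rebuilt lazily; this sketch assumes that real offsets are always positive, so 0 can mark "not cached yet".

<Serializable()> _
Public MustInherit Class NonSerializedDemoBase
  'Context-bound helper: should never travel over the wire.
  <NonSerialized()> Private _expander As Object

  'Cached value; reset to 0 by deserialization.
  <NonSerialized()> Private _offsetCache As Integer

  Protected MustOverride Function _CalculateOffset() As Integer

  Private ReadOnly Property Offset() As Integer
    Get
      If _offsetCache = 0 Then
        _offsetCache = _CalculateOffset()
      End If
      Return _offsetCache
    End Get
  End Property
End Class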

Using "Dumb" Collections

When I started the work on the new architecture, I let the collections maintain information about their instances, such as whether any of the instances were dirty. To make that work, I had to let the instances tell the collections about it. Unfortunately, it proved very slow to use AddHandler to let the collection receive events from the instances, so I used an old-fashioned callback instead. Later, I decided that this bookkeeping wasn't really needed in my applications, so I save a bit in serialization size, too, and serialization becomes much cleaner. As a matter of fact, I now use ArrayList quite a lot instead of my custom collections. That saves on dumb code that has to be written, although it also makes it harder to go for cool tweaks with custom serialization in the future. But when necessary, I can of course always shift to custom collection classes instead.
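For completeness, here is a hedged sketch of the callback style that replaced AddHandler before the bookkeeping was dropped altogether. The interface and class names are hypothetical.

Public Interface IDirtyCallback
  Sub NotifyDirty(ByVal child As Object)
End Interface

Public Class CallbackChildDemo
  'An old-fashioned callback through a plain interface instead of
  'an event: no delegate plumbing, and much faster than AddHandler.
  Private _callback As IDirtyCallback

  Public Sub SetCallback(ByVal callback As IDirtyCallback)
    _callback = callback
  End Sub

  Public Sub MarkDirty()
    If Not _callback Is Nothing Then
      _callback.NotifyDirty(Me)
    End If
  End Sub
End Class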

Another Struggle with Guids

I don't know what it is with me and Guids. I actually like Guids and their characteristics a lot, but they surprise me every now and then. One big surprise was what I wrote about in a previous article for InformIT: I found a huge slowdown when inserting rows into large tables with Guids (or rather UNIQUEIDENTIFIERs) for primary keys.

This time, I found that having a Guid field in an instance added 107 bytes to the serialization size! When I added one more Guid to the same instance, the second one took 40 bytes. When I changed the code and stored the Guid as a byte array instead, each Guid took 35 bytes of serialization size (still a lot, but less than the ordinary Guid type).
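A hedged sketch of the byte-array storage could look like this (hypothetical names; the property materializes the Guid on demand):

<Serializable()> _
Public Class GuidFieldDemo
  'Stored as 16 raw bytes to keep the serialized size down.
  Private _idBytes As Byte()

  Public Property Id() As Guid
    Get
      Return New Guid(_idBytes)
    End Get
    Set(ByVal Value As Guid)
      _idBytes = Value.ToByteArray()
    End Set
  End Property
End Class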

My new architecture currently assumes a Guid as primary key for each table, but that isn't the case with the tables used for the tests here. Therefore, I used <NonSerialized()> for the Guid variable in order not to make the tests unfair.

Oops, Maintainability Must Be Lost!

As discussed in the last article, I consider maintainability to increase when using custom classes (at least when used well). And now I'm talking about obscure tweaks that feel a lot like premature optimizations. That must mean that maintainability loses a lot!

Nope, I don't think so. All those tweaks are done in EntityBase, and that's not where you'll spend a lot of maintenance time once it is reasonably finished. The bulk of the maintenance work goes into the subclasses, for example when a requirement changes. As a matter of fact, one advantage of the new architecture is that you have to write very little code in the subclasses.

Currently, the only thing you have to write in the subclasses is what is shown in Listing 7 and Listing 8.

Listing 7—Declarations and Code for Dealing with the Object Vector

Private Enum Properties
  PriceForEach
  NoOfItems
  ProductId
  Comment
End Enum

Private Const LastCollapsedField As Properties = Properties.Comment
Private Const LastExpandableField As Properties = Properties.Comment
Private Const FirstComplexField As Properties = Properties.Comment

Protected Overrides Function _
_PositionForLastCollapsedField() As Integer
  Return LastCollapsedField
End Function

Protected Overrides Function _
_PositionForLastExpandableField() As Integer
  Return LastExpandableField
End Function

Protected Overrides Function _
_PositionForFirstComplexField() As Integer
  Return FirstComplexField
End Function

As shown in Listing 7, you have to define what each property in the object vector in the EntityBase means. You also have to define offsets in the object vector and answer questions from the EntityBase about the offsets.

Listing 8—An Example of a Property

Public Property PriceForEach() As Decimal
  Get
    Return DirectCast(_GetProperty _
    (Properties.PriceForEach), Decimal)
  End Get
  Set(ByVal Value As Decimal)
    _SetProperty(Value, Properties.PriceForEach, Nothing)
  End Set
End Property

Listing 8 shows an example of a property get/set. You have to write one such piece of code for each property.

Beyond this, the only code you have to write in your subclasses is the interesting stuff, such as custom behavior. And that's what you really want to spend your time on.

Even if you decide to use code generation and something like the Generation Gap pattern (see John Vlissides, Pattern Hatching. Addison-Wesley, 1998, ISBN: 0-201-43293-5) for generating your subclasses, it's pretty nice not to have more code generated than necessary. Generating the code shown above is pretty simple.

The Generation Gap Pattern

If you apply the Generation Gap, you make the members shown in Listing 8 Overridable, and you never write custom code in the generated classes; you generate and regenerate them (again and again, whenever you need to). Instead, you inherit from them and write the custom code in the new subclasses.
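A hedged sketch of that layering, with hypothetical class names:

'Regenerated by the code generator; never edited by hand.
Public Class OrderLineGenerated
  Private _comment As String

  Public Overridable Property Comment() As String
    Get
      Return _comment
    End Get
    Set(ByVal Value As String)
      _comment = Value
    End Set
  End Property
End Class

'Hand-written; survives regeneration of the base class.
Public Class CustomOrderLine
  Inherits OrderLineGenerated

  Public Overrides Property Comment() As String
    Get
      Return MyBase.Comment
    End Get
    Set(ByVal Value As String)
      'An example of a custom rule layered on generated code.
      If Value Is Nothing Then
        Throw New ArgumentNullException("Value")
      End If
      MyBase.Comment = Value
    End Set
  End Property
End Class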

And a Few Tweaks Not Applied Yet

Of course, there are more tweaks that could be used. For example, implementing custom serialization (ISerializable) after all, or shifting from ArrayList to custom collection classes with tweaked serialization of their own.

A New Test Application

The test results shown in the previous articles were collected with a scriptable console test application. That works well, but it looks a bit boring and isn't very visual to use in demonstrations. Therefore, I wrote a GUI application to use for a presentation (see Figure 4).

Figure 4 A GUI for running the tests—quick-and-dirty or for demonstration.

NOTE

The numbers shown in Figure 4 are just an example from one test round. The results about to be discussed are an average of three test rounds, after the extreme best and worst values were skipped. (By the way, you can find the GUI, as well as the supplemental code for this article, here.)

Results of the Throughput Tests

I re-executed all tests with the GUI, but this time with a setup different from that used in the previous articles. I used one single machine for everything, so "cross machines" actually means "cross AppDomains" instead. The tests themselves are the same as before, but what is really important is that you run tests of your own, with real stuff. Results from simple tests (such as those following) give only a hint about what you can expect. Nothing more.

Okay, here goes... As usual, all throughput values are recalculated with the values for the DataReader as the base. The higher the value, the better. The values within parentheses are from the tests executed in Parts 1–4.

Table 1—Read One Row

                   One User in AppDomain   One User, Cross Machines
DataReader         1                       1
Untyped DataSet    0.6 (0.6)               1.5 (1.4)
Typed DataSet      0.5 (0.4)               1 (1)
Wrapped DataSet    0.6 (0.5)               1.5 (1.3)
Hashtable          1 (0.9)                 3.7 (3.5)
Custom Classes     1 (1)                   4.1 (4)
New Architecture   1                       3.8


Table 2—Read Many Rows

                   One User in AppDomain   One User, Cross Machines
DataReader         1                       1
Untyped DataSet    0.7 (0.5)               9.1 (6.9)
Typed DataSet      0.5 (0.5)               8 (6)
Wrapped DataSet    0.6 (0.5)               8.9 (6.6)
Hashtable          0.9 (0.8)               24.4 (17)
Custom Classes     1 (1)                   22.3 (15.9)
New Architecture   0.9                     20


Table 3—Read One Master Row and Many Detail Rows

                   One User in AppDomain   One User, Cross Machines
DataReader         1                       1
Untyped DataSet    0.6 (0.5)               8.2 (6.1)
Typed DataSet      0.5 (0.4)               6.6 (5.1)
Wrapped DataSet    0.6 (0.5)               7.9 (5.8)
Hashtable          0.9 (0.8)               24.4 (16.2)
Custom Classes     1 (0.9)                 23.4 (16)
New Architecture   0.9                     20.8


NOTE

Something happened to one of the machines I used when running the tests for Parts 1–4 six months ago. It still works, but it has become painfully slow, so I would have had to swap that machine in the test setup. And if I did that, the environment would have changed anyway (together with a lot of small things, of course, such as new versions of this and that). Therefore, I thought it was more interesting to show results from a completely different test environment instead.

To summarize the test results: It's pretty obvious that the New Architecture has performance characteristics that are so far pretty close to those of the simplified Custom Classes tests. That is very good news. I'm satisfied with that for the moment.

Also worth mentioning is that all containers except the DataReader gain from the setup used here (one single machine instead of several cooperating machines). It's especially obvious in the cross machines (here, cross AppDomains) tests. My conclusion is that going from cross machines to cross AppDomains cuts the cost of marshalling a chunk of data more than it cuts the cost of making many small calls (which is what happens when the DataReader is used).

Conclusion

Without having to change to custom serialization, we get pretty good performance in the new architecture. With a couple of tweaks, we can save a lot of space when the serialized objects are moved over the wire, and that translates directly into better throughput. And thanks to the Layer Supertype pattern (see Fowler, Patterns of Enterprise Application Architecture), the tweaks won't cost that much when it comes to maintainability.
