
The Cost of Contexts

Now that you understand the relationship between contexts and interface pointers, it is time to focus on the relationship between contexts and objects. Whenever the SCM is asked to instantiate a new instance of a configured class, it consults the catalog to see what services the class requires. The SCM uses this information to decide whether the current context can meet the environmental needs of the new object without additional interception. If it can, the SCM decides to put the new object in its creator's context. If it cannot, the SCM puts the new object in a new context of its own. As it turns out, the SCM makes the latter choice the vast majority of the time. In practice, that means that there are likely to be lots of contexts in a COM+ process.

A Context for Every Object

Most configured classes' declarative attribute settings are such that each new instance of the class has to live in a new context of its own. This is certainly the case for classes that use the default settings assigned by the catalog when they are initially deployed. This "context for every object" approach is not mandated by the context programming model itself; it is simply a result of how the runtime's services are currently implemented.

Consider the seemingly innocuous class attribute described in Table 3-1, EventTrackingEnabled, which controls the COM+ statistics service. If a class is registered with this option set to true (the default), the interception plumbing gathers statistics about its usage, including how many instances of the class exist, how many method calls those objects are currently processing, and the aggregate time that processing is taking in milliseconds. These bits of information are fed to the Component Services Explorer, which displays them so that an administrator can monitor ongoing work in a system. To keep these counters accurate for each class, the interception plumbing has to know when each call enters and leaves each object. The only way to get this information is to intercept every call. The only way to guarantee that every call will be intercepted is to put each object in a context of its own and never let anyone have direct access to it. This is exactly what the SCM does for classes deployed with EventTrackingEnabled = true.

Table 3-1 The EventTrackingEnabled class attribute

Value             Meaning
True (default)    Runtime tracks work in progress
False             Runtime does not track work in progress

The revelation that the mechanism COM+ uses to gather usage statistics forces each object into a context of its own may prompt you to turn this option off. However, this will not resolve the issue. The declarative transaction and just-in-time activation services require that objects live in contexts of their own as well. Further, even if you turn these services off for a particular class, its application's security settings may force each instance into its own context anyway. If an application is configured to support component-level security, regardless of whether it is enabled, every instance of every class in that application will be in its own context. In this case, it does not matter how individual classes are configured.

The Cost of Contexts in Time

If every object lives in its own context, every call to every object will be intercepted. How much time does all this interception take? Not surprisingly, many factors influence the speed of interception, including the types of the arguments being passed to a method, the sort of marshaler (type library or proxy/stub DLL) being used, whether the call stack has to be moved to another thread, the services the destination context is using, and, of course, the hardware resources COM+ has at its disposal. The only way to know how long interception takes in a particular situation is to test it.

It is possible, however, to talk about the relative performance of interception in a general way. The left column of Table 3-2 lists the three possible degrees of interception. The right column lists the order of magnitude of the number of calls per second you can expect to make at each level, assuming the methods do nothing other than return S_OK. If a caller has a raw pointer to an object, no interception occurs. This is degree 0. Each method call is nothing more than a C++ virtual invocation. In this case, you can expect to make on the order of ten million calls per second.

Table 3-2 Degrees of interception and relative throughput

Degree of Interception                 Order of Magnitude of Calls per Second
None, raw pointer to object            10,000,000
Interception without thread switch     10,000
Interception with thread switch*       1,000 or less

If a caller has a pointer to a proxy that forwards calls without a thread switch, interception occurs, and performance is three orders of magnitude lower. This is degree 1. In this case, you can expect to make on the order of ten thousand calls per second. The cost here, in addition to services being invoked, is in making a deep copy of each call's stack frame and translating each context-relative interface pointer, as required. For example, if a caller invokes this method:

HRESULT DoStuff([in] IUnknown *pUnk, [in] BSTR bstr, [in] long *pn);

the entire stack frame will be duplicated, as shown in Figure 3-9, so that the IUnknown pointer, pUnk, can be converted to a pointer to a proxy that is appropriate for the destination context. Copying the entire stack frame may seem like overkill when a call is going to be serviced on the caller's thread. Interface pointers are the only arguments that really need to be marshaled in this situation; in theory, everything else could be left as is. The type library marshaler, which is used by interfaces marked with either the oleautomation or dual keyword, actually makes some optimizations along these lines. Unfortunately, they apply only to primitive types like longs and shorts, not to more complex types like BSTRs and SAFEARRAYs, as shown in Figure 3-10. This makes the type library marshaler marginally faster in some cases, but it turns out to be marginally slower in others; therefore, in general, it does not offer a consistent performance advantage. Hopefully some future version of COM+ will further optimize the behavior of cross-context, same-thread calls.

Figure 3-9
A stack frame for a cross-context, same-thread call

Figure 3-10
An optimized stack frame for a cross-context, same-thread call

If a caller has a pointer to a proxy that forwards calls with a thread switch, performance drops at least one more order of magnitude. This is degree 2. In this last case, you can expect to make on the order of one thousand calls per second. In these situations, a deep copy of the stack frame is always necessary to move a call to another thread. The additional reduction in performance reflects the additional overhead of the thread switch and whatever cross-process or cross-machine communication is necessary.

The Cost of Contexts in Space

Contexts exact a price in space as well as time. Each context is represented by a set of data structures maintained in OLE32.DLL, and they consume memory. Each proxy, channel, and stub also consumes memory. Again, the exact overhead depends on several factors, including what exactly a context is configured to do and whether the interception plumbing has to worry about moving call stacks to other threads or processes. COM+ uses entirely different channel implementations depending on whether a thread switch is required. As a general guideline, the price of a context is between 2,048 and 3,072 bytes, or 2K and 3K. Contexts accessed through channels that switch threads use approximately 3K each. Contexts accessed through channels that do not switch threads are closer to 2K. By contrast, each instance of a generic wizard-generated ATL class—without additional data members—consumes 32 bytes of memory (a minimal ATL class can whittle this down to 8 bytes). Each instance of a generic class implemented in Visual Basic 6 consumes approximately 165 bytes.

Here is a formula for estimating the memory footprint of a COM+ process. It is based on the assumption that each object resides in its own context and is used by a single client (i.e., you are following the object-per-client model). It accounts for the difference in memory consumption of the two types of channels. It ignores the additional overhead introduced by other DLLs your objects might be using.

((n2 x (s2 + 3,072)) + (n1 x (s1 + 2,048))) ÷ 1,024 = k KB.

The variables n and s refer to the number and average size of objects being used. The subscripts 2 and 1 indicate the degree of interception necessary to access those objects. Specifically, the variable n2 is the number of objects in contexts accessed via a proxy and a thread switch—degree 2 interception. The variable n1 is the total number of objects in contexts accessed via a proxy but no thread switch—degree 1 interception. The variable s2 is the average size of the n2 objects in bytes. The variable s1 is the average size of the n1 objects in bytes. The result, k, is the estimated memory footprint of the process in kilobytes (KB).

For example, if you have 500 clients in other processes accessing one object each, n2 = 500. If each of those 500 objects uses three additional objects in contexts that can be reached without a thread switch, n1 = (500 x 3) = 1,500. Altogether, in this case, there are 2,000 objects in 2,000 contexts. If the average size of these objects is s2 = s1 = 32 bytes (the size of a generic ATL object), the memory consumed by the process is approximately k = 4,560 KB, or just under 4.5 megabytes (MB). The actual memory consumed by the objects themselves is 62 KB, about 1.5 percent of the total.

Figure 3-11 shows the memory consumption statistics for this same scenario with four different average sizes for objects. The vertical axis measures memory consumption in kilobytes. The horizontal axis measures average object size. The shaded area at the bottom of each bar represents the space consumed by contexts and interception plumbing. It is the same in all four cases because there are always 2,000 contexts. The white area at the top of each bar represents the space consumed by the objects themselves. It varies across the four cases because each case represents a different average object size. The numbers above the bars indicate the percentage of overall memory the objects consume in each case.

Figure 3-11
Memory consumed by 2,000 objects in 2,000 contexts

The leftmost bar represents the case just described. The other three bars represent the same usage scenario with larger objects. If the average size of the objects is 165 bytes (the size of a generic VB object), the process uses close to k = 4,820 KB, or 4.7 MB, of which 320 KB is consumed by the objects, or 7 percent. If the average size of the objects is 500 bytes, the process consumes k = 5,470 KB, or 5.4 MB. The objects themselves account for around 970 KB, or 18 percent of the total. In this example, objects would have to average 2,300 bytes apiece to account for 50 percent of the memory consumed by their processes.

Are Contexts Worth the Price?

If these numbers are depressing, remember that contexts and interception are not all pain and no gain. You get runtime services in return. How much overhead in time and space would your own version of these services incur? More important, how long would it take to implement and maintain them? Unless you are prepared to integrate your objects with transactions by hand, build your own security framework, and so on, the benefit of contexts and interception cannot be overlooked.

That being said, however, you also have to remember that a COM+ process has a limited set of hardware resources. There is a limit to the number of threads a COM+ process can use to handle client requests effectively without introducing contention for CPU cycles. Every COM+ process also has access to a finite amount of physical memory. Its allotment is assigned by the operating system, which is doing its best to share physical memory—a very precious resource—among all the processes running on a machine. If your COM+ system is accessed via the Web, for instance, you have to share memory with Internet Information Server (IIS), which wants lots of it to cache files. If a process's virtual memory consumption outstrips its physical memory allotment, the operating system starts swapping the process's virtual memory pages out to disk. A virtual page will be swapped back into physical memory when an attempt to reference it generates a page fault. The more virtual memory a process consumes, the more likely it is to generate page faults. The cost of paging virtual memory into and out of physical memory is very high because disks are very slow, at least compared to RAM. In other words, the cost of memory consumption in space ultimately becomes a cost in time, because more and more time is spent waiting for virtual pages to move to and from disk.

If the goal is scalability, you need to use each middle-tier server's hardware resources as efficiently as possible. You need to free threads as fast as you can so they can be used to process other requests. You certainly want to minimize the time threads spend waiting for virtual memory to be paged in from disk. Obviously, as Chapter 1 explains, efficient use of these resources will take you only so far; eventually you will need more hardware. But the thriftier you are, the more throughput you will squeeze out of each middle-tier server and the longer you will postpone the acquisition of additional machines. Interception takes time, and contexts take space. You should take care to use contexts and interception only when you really need them.


*Note: Cross-thread, cross-process, and cross-machine calls are lumped into one category for simplicity's sake. Obviously there are performance differences among these three cases, but they are not always intuitive. Some cross-thread calls in a process run more slowly than calls across processes, for instance. In general, they all impose roughly the same degree of performance degradation, so treating them as one is not unreasonable.
