Exploring the CLR

MSIL

I mentioned earlier that the two major contents of a .NET assembly are metadata and MSIL code. Now that I have beaten the subject of .NET metadata to death, let's move the discussion along to MSIL. You can think of MSIL as a virtual assembly language. It defines a set of assembly language-like operations that are easily translated into the native instruction set of most modern CPUs.

Common Intermediate Language (CIL)

In the documentation for the .NET Framework SDK, you will see MSIL referred to as CIL. You may also hear the CLR referred to as the CLI. The difference between these names is that the CLI and CIL (as well as the CTS) are part of a specification that Microsoft has submitted to the European Computer Manufacturers Association (ECMA) for ratification as a standardized platform. The CLR and MSIL are Microsoft's proprietary implementations of the CLI and CIL, respectively. Other vendors are now free to implement the CLI or CIL, although it remains to be seen if any will actually do so. You can find out everything you want to know about MSIL in Partition III of the CLI documentation, which you can find in the Tool Developer's Guide documentation.

The CLR uses the VES to compile the MSIL code into native code on the fly as you run a managed code executable. This process of converting MSIL code into native code is called JIT compilation. The VES does not compile the code all at once; it compiles it method by method as the code is used. As soon as the MSIL for a method is compiled into native code, the CLR replaces the MSIL code for that method with the compiled native code so that the CLR can simply run the native code the next time the method is called.

NOTE

It would be very easy to think that MSIL is equivalent to Java bytecodes and that the CLR is equivalent to the Java runtime. Although MSIL is conceptually similar to Java bytecodes, which also define a virtual assembly language, the CLR does not interpret MSIL code as the Java runtime does. MSIL code is always compiled into native code and then executed. Just remember that MSIL is never interpreted, and the CLR is not an interpreter.

MSIL implements a stack-based instruction set. What this means is that you execute an instruction by first loading its arguments into a Last In First Out (LIFO) data structure. This data structure is called a stack, and loading arguments into it is usually referred to as pushing arguments onto the stack. You then execute the instruction, which pops its arguments off the stack and pushes the result of the operation onto the top of the stack. The CIL documentation uses the following stack transition diagram to illustrate this type of operation.

value1, value2 → …, result

This diagram indicates that two values (value1 and value2) must be pushed onto the stack prior to executing this operation. The values are pushed left to right in this notation so value2 is at the top of the stack. The instruction will then perform some operation on the stack operands, pop the operands off the stack and leave the result on the top of the stack. An example of an operation that would have a stack transition diagram like this is the multiplication (mul) instruction. Here is the stack transition diagram for a unary instruction (one that only requires a single operand). The neg instruction, which simply negates a number, is an example of such an instruction.

value → …, result

Notice that I only push one argument onto the stack; the operation then pops the argument off the stack and leaves the result at the top of the stack. Some instructions require no stack operands at all. The best examples of this are the instructions that push values onto the stack. Here is the stack transition diagram for the ldarg instruction, which you will use to push method arguments onto the stack.

→ …, value

This simply indicates that no arguments need to be pushed onto the stack prior to executing the instruction, and the result of the operation is that the specified value will reside on the top of the stack.
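The three stack transition diagrams above can be mimicked with a toy evaluator. This is only an illustrative sketch: the class and method names (StackDemo, Mul, Neg, LdArg, Evaluate) are hypothetical, and the CLR does not execute MSIL this way (it compiles it to native code). The sketch just makes the push and pop behavior of each kind of instruction concrete.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy evaluator that imitates the stack transitions described above.
static class StackDemo
{
    // Binary instruction: value1, value2 -> ..., result
    public static void Mul(Stack<int> stack)
    {
        int value2 = stack.Pop(); // value2 was pushed last, so it is on top
        int value1 = stack.Pop();
        stack.Push(value1 * value2);
    }

    // Unary instruction: value -> ..., result
    public static void Neg(Stack<int> stack)
    {
        stack.Push(-stack.Pop());
    }

    // Instruction with no stack operand: -> ..., value
    public static void LdArg(Stack<int> stack, int[] args, int num)
    {
        stack.Push(args[num]);
    }

    // Computes -(a * b) using only the stack operations above.
    public static int Evaluate(int a, int b)
    {
        int[] args = { a, b };
        Stack<int> stack = new Stack<int>();
        LdArg(stack, args, 0); // stack: a
        LdArg(stack, args, 1); // stack: a, b
        Mul(stack);            // stack: a*b
        Neg(stack);            // stack: -(a*b)
        return stack.Pop();    // the result is the only item left
    }

    static void Main()
    {
        Console.WriteLine(Evaluate(3, 4)); // prints -12
    }
}
```

Notice that each operation leaves the stack exactly as its diagram predicts: mul consumes two items and produces one, neg consumes one and produces one, and the load consumes nothing.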

There are two major categories of instructions in MSIL: (1) base instructions and (2) object model instructions. The base instructions are the instructions that you use to move data on and off the stack; perform essential arithmetic like add, subtract, multiply, and divide; and perform bitwise operations like AND, OR, NOT, XOR, and left and right shifts. The base instructions also include the branching operations that most programming languages use to implement flow of control, such as branch on false, branch on not false, branch if equal, branch if greater than, unconditional branch, and so forth. The base instruction set also includes comparison instructions like compare equal, compare greater than, and compare less than, as well as instructions for copying and initializing memory blocks and for calling and returning from methods. The base instruction set forms a Turing-complete set of operations, meaning that, with these instructions, you can perform all the calculations expected of a modern computer. The MSIL instruction set closely mirrors the instruction set of a modern microprocessor. This makes it simple to implement an MSIL-to-native-code compiler.

Table 3-9 lists some of the instructions in the base instruction set for moving data on and off the stack.

Table 3–9 Instructions for moving data on and off the stack

| Instruction | Description |
| --- | --- |
| ldarg num | Pushes argument num onto the stack. |
| ldarg.0, ldarg.1, ldarg.2, ldarg.3 | Short forms for pushing argument 0, 1, 2, or 3 onto the stack. |
| ldarga num | Pushes the address of argument num onto the stack. The ldarga instruction should only be used for by-ref parameter passing. In most cases, you should use ldarg. |
| ldloc indx | Pushes the local variable identified by index (indx) onto the stack. |
| ldloc.0, ldloc.1, ldloc.2, ldloc.3 | Short forms for pushing the local variable with index 0, 1, 2, or 3 onto the stack. |
| ldloca indx | Pushes the address of the local variable identified by index (indx) onto the stack. |
| ldc.<type> value | Pushes the numeric constant value of type <type> onto the stack. For example, ldc.i4 5 pushes the value 5 as a 4-byte integer, ldc.i8 3 pushes the value 3 as an 8-byte integer, ldc.r4 3.5 pushes the 4-byte float value 3.5, and ldc.r8 4.8 pushes the 8-byte float (double) value 4.8. |
| starg num | Stores the item at the top of the stack (pops the stack) into argument num. |
| stloc indx | Pops the stack into the local variable identified by index (indx). |
| stloc.0, stloc.1, stloc.2, stloc.3 | Short forms for popping the stack into the local variable with index 0, 1, 2, or 3. |
| stind.<type> | Stores the value val of type <type> at the address identified by addr; for example, stind.i4 stores a 4-byte integer at the specified address. The stack transition diagram for this instruction is: addr, val → … |
| pop | Removes the top element from the stack. |


Table 3–10 lists some of the instructions in the base instruction set for performing arithmetic operations on data.

Table 3–10 Instructions for performing arithmetic operations

| Instruction | Description | Stack Transition Diagram |
| --- | --- | --- |
| add | Adds value1 and value2. | value1, value2 → …, result |
| sub | Subtracts value2 from value1. | value1, value2 → …, result |
| mul | Multiplies value1 by value2. | value1, value2 → …, result |
| div | Divides value1 by value2 and returns the quotient or floating-point result. | value1, value2 → …, result |
| rem | Calculates the remainder of value1 divided by value2. | value1, value2 → …, result |


Table 3–11 lists some of the branching and flow control instructions in the base instruction set.

Table 3–11 Branching and flow control instructions

| Instruction | Description | Stack Transition Diagram |
| --- | --- | --- |
| beq target | Branches to instruction (target) if value1 = value2. | value1, value2 → … |
| bne.un target | Branches to instruction (target) if value1 is not equal to value2 or if the comparison is unordered. | value1, value2 → … |
| bge target | Branches to instruction (target) if value1 >= value2. | value1, value2 → … |
| bgt target | Branches to instruction (target) if value1 > value2. | value1, value2 → … |
| ble target | Branches to instruction (target) if value1 <= value2. | value1, value2 → … |
| brfalse target | Branches to instruction (target) if value is false, null, or zero. | value → … |
| brtrue target | Branches to instruction (target) if value is non-false (nonzero) or non-null. | value → … |
| br target | Unconditional branch. | → … |
| call method | Calls the method identified by method. | arg1, arg2, …, argN → …, retVal (not always returned) |
| ret | Returns from a method. The method's stack must be empty except for the return value (if there is one). This return value is copied from the method's stack to the stack of its caller. | return value on method's stack (not always present) → …, return value on caller's stack (not always present) |


Table 3-12 contains some of the bitwise instructions in the base instruction set.

Table 3–12 Bit manipulation instructions

| Instruction | Description | Stack Transition Diagram |
| --- | --- | --- |
| or | Computes the bitwise OR of value1 and value2. | value1, value2 → …, result |
| and | Computes the bitwise AND of value1 and value2. | value1, value2 → …, result |
| xor | Computes the bitwise XOR of value1 and value2. | value1, value2 → …, result |
| not | Computes the bitwise complement of the value on the top of the stack. | value → …, result |
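One way to experiment with the base instructions is to emit them yourself. The following is a minimal sketch that assumes a runtime that provides System.Reflection.Emit.DynamicMethod (it is not part of the version 1.0 Framework discussed in this chapter). The class name EmitDemo and the method name MulAdd are hypothetical. Because the emitted method is static, argument 0 is the first declared argument rather than a "this" pointer.

```csharp
using System;
using System.Reflection.Emit;

static class EmitDemo
{
    // Emits a method equivalent to: static int MulAdd(int a, int b, int c) { return a * b + c; }
    public static Func<int, int, int, int> BuildMulAdd()
    {
        DynamicMethod method = new DynamicMethod(
            "MulAdd", typeof(int), new Type[] { typeof(int), typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0); // stack: a
        il.Emit(OpCodes.Ldarg_1); // stack: a, b
        il.Emit(OpCodes.Mul);     // stack: a*b
        il.Emit(OpCodes.Ldarg_2); // stack: a*b, c
        il.Emit(OpCodes.Add);     // stack: a*b+c
        il.Emit(OpCodes.Ret);     // return the single remaining value

        return (Func<int, int, int, int>)method.CreateDelegate(typeof(Func<int, int, int, int>));
    }

    static void Main()
    {
        Func<int, int, int, int> mulAdd = BuildMulAdd();
        Console.WriteLine(mulAdd(3, 4, 5)); // prints 17
    }
}
```

The comments track the execution stack after each instruction, which is the same bookkeeping you do when reading the stack transition diagrams in the tables.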


The object model instructions are built on the base instructions and provide a common set of services that simplify the development of high-level, object-oriented programming languages. These services include accessing and updating the fields of an object, making late-bound (virtual) method calls, boxing and unboxing objects, creating arrays and accessing and updating the elements of an array, instantiating and type-casting objects, and throwing exceptions. Table 3–13 contains a partial list of the instructions in the object model instruction set.

Table 3–13 Object model instructions

| Instruction | Description | Stack Transition Diagram |
| --- | --- | --- |
| newobj ctor | Creates a new, uninitialized object or value type and calls its constructor. | arg1, …, argN → …, obj |
| ldfld field | Pushes a field of an object onto the stack. | obj → …, value |
| callvirt method | Calls a late-bound method on an object. | obj, arg1, …, argN → …, return value (optional) |
| stfld field | Updates the value of a field of a specified object with a new value. | obj, value → … |
| box valueTypeToken | Converts a value type object to a reference type object. | valueObj → …, refObj |
| unbox valueTypeToken | Converts the boxed (reference type) representation of a value type back to its value type form. | refObj → …, valueObj |
| castclass class | Casts an object to a specified class. | obj → …, obj2 |
| initobj classtoken | Initializes all the fields of the value object to null or to a 0 of the appropriate type. | addrOfValueObj → … |
| cpobj classtoken | Copies a value object of the type indicated by classtoken from srcValueObj to destValueObj. | destValueObj, srcValueObj → … |
| ldobj classtoken | Loads an instance of the value type indicated by classtoken onto the stack. | addrOfValueObj → …, valueObj |
| stobj classtoken | Copies an instance of the type indicated by classtoken from the stack into memory. | addr, valueObj → … |
| newarr etype | Creates a new array of the type indicated by etype. | numElems → …, array |
| ldelem.<type> | Pushes the array element of type <type> at the given index onto the stack. ldelem.i4 pushes the element as a 4-byte integer, ldelem.i8 pushes the element as an 8-byte integer, and so forth. | array, index → …, value |
| stelem.<type> | Stores the value on the stack of type <type> into the array element indicated by index. | array, index, value → … |
| ldlen | Pushes the number of elements of an array onto the stack. | array → …, length |
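To connect these instructions to source code, here is a short C# sketch in which each statement is annotated with the object model instruction that a C# compiler typically emits for it. The Point class and the ObjectModelDemo class are hypothetical, and the exact MSIL varies by compiler version and settings, so treat the comments as a rough guide and confirm them with ildasm.

```csharp
using System;

// Hypothetical class used only for this illustration.
class Point
{
    public int X;
}

static class ObjectModelDemo
{
    public static int Run()
    {
        Point p = new Point();        // newobj
        p.X = 5;                      // stfld
        int x = p.X;                  // ldfld
        object boxed = x;             // box
        int unboxed = (int)boxed;     // unbox (later compilers may emit unbox.any)
        int[] numbers = new int[3];   // newarr
        numbers[0] = unboxed;         // stelem.i4
        int first = numbers[0];       // ldelem.i4
        int length = numbers.Length;  // ldlen
        object o = p;
        Point q = (Point)o;           // castclass
        string name = o.ToString();   // callvirt
        return first + length + q.X;  // 5 + 3 + 5
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 13
    }
}
```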


Now that you know more about MSIL than you probably wanted to know, let's take a look at some MSIL code. I'll start by taking a look at the MSIL code for the GetSalary method in the Manager class. To view the MSIL, perform the following steps:

  1. Open a Visual Studio .NET command prompt by making the following selection from the Start menu: Programs | Microsoft Visual Studio .NET | Visual Studio .NET Tools | Visual Studio .NET Command Prompt.

  2. Change directories to the location where you built the multifile assembly that you have been using throughout this chapter.

  3. Enter the following command at the Visual Studio .NET command prompt:

ildasm Manager.mod

It's important that you run ildasm on the Manager.mod module because this is the file that contains the implementation of the Manager class. In the ildasm main window, find the GetSalary method in the Manager class as shown in Figure 3–6.

Figure 3–6 The Manager class as viewed in ildasm.


Now double-click the GetSalary method, and you should see a window that contains the following code. I did clean up the code slightly to make it a little more readable.

.method public hidebysig virtual instance valuetype [mscorlib]System.Decimal
    GetSalary() cil managed
{
 // Code size    22 (0x16)
 .maxstack 2
 .locals init (valuetype [mscorlib]System.Decimal V_0)
 IL_0000: ldarg.0
 IL_0001: call    instance valuetype System.Decimal
              [.module Employee.mod]Employee::GetSalary()
 IL_0006: ldarg.0
 IL_0007: ldfld   valuetype System.Decimal Manager::mBonus
 IL_000c: call    valuetype System.Decimal
              System.Decimal::op_Addition(valuetype System.Decimal,
                                          valuetype System.Decimal)
 IL_0011: stloc.0
 IL_0012: br.s    IL_0014
 IL_0014: ldloc.0
 IL_0015: ret
} // end of method Manager::GetSalary

The C# source code for this method is as follows:

public override decimal GetSalary()
{
  return base.GetSalary()+mBonus;
}

The first two lines in the MSIL code declare the maximum stack size for the method and initialize a local variable for the method:

.maxstack 2
 .locals init (valuetype [mscorlib]System.Decimal V_0)

You don't explicitly declare a local variable in this method, but the compiler has to create one in the generated MSIL to store the return value of this method temporarily before you return it. To understand the next few lines, you have to first know that every nonstatic method of a class has an implied zeroth argument that contains the "this" pointer for the current instance:

IL_0000: ldarg.0
 IL_0001: call    instance valuetype System.Decimal 
 [.module Employee.mod]Employee::GetSalary()

Therefore, the previous code pushes the "this" pointer on the stack and then calls the GetSalary method of the base class of Manager (Employee). This method returns the Salary (without the Manager's bonus) of the Employee. The return value of the method will be left on the top of the stack. This is the convention that MSIL uses for return values of methods.

On the next three lines, you load the "this" pointer for the current object and then use the ldfld instruction from the Object Model instruction set to load the mBonus field from the Manager instance:

IL_0006: ldarg.0
IL_0007: ldfld   valuetype System.Decimal Manager::mBonus
IL_000c: call    valuetype System.Decimal
             System.Decimal::op_Addition(valuetype System.Decimal,
                                         valuetype System.Decimal)

The ldfld instruction will pop the "this" pointer off the stack. Therefore, after the ldfld instruction, the top two items on the stack will be the Salary for the current instance (without the Manager's bonus) and the bonus amount (in the mBonus field) for the current Manager instance. The next line of code calls the op_Addition method in the System.Decimal class to add these two values together, yielding the total salary for the Manager. Remember that a Decimal is a value type that is declared in the base class library; the regular add instruction will not work for this type. After the op_Addition method call, the result will be at the top of the stack.

The next four lines store the return value from the op_Addition method in the local variable for this method and then return it:

IL_0011: stloc.0
IL_0012: br.s    IL_0014
IL_0014: ldloc.0
IL_0015: ret

I'll just have to come clean and admit that I am not certain why the compiler generated the line of code labeled IL_0012. Notice that this line of code is just an unconditional branch to the next line. This store, branch, and reload sequence appears to be typical of unoptimized (debug) builds, in which the compiler routes every return statement through a single exit point. The next line after this oddity simply loads the local variable onto the stack. This line is pushing the return value onto the stack. You must push the return value of a method onto the stack (if there is one) before you return. The final line is the ret instruction, which will return from the method and put the return value at the top of the stack of the calling method.
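To relate this MSIL back to source code, here is a sketch of what GetSalary would look like if the compiler-generated local variable (V_0) were written out explicitly. The Employee class shown here, its mSalary field, and both constructors are hypothetical stand-ins so that the example compiles on its own; only the GetSalary override and the mBonus field come from the listing above.

```csharp
using System;

// Hypothetical base class; the real Employee class lives in Employee.mod.
class Employee
{
    protected decimal mSalary;

    public Employee(decimal salary)
    {
        mSalary = salary;
    }

    public virtual decimal GetSalary()
    {
        return mSalary;
    }
}

class Manager : Employee
{
    private decimal mBonus;

    public Manager(decimal salary, decimal bonus) : base(salary)
    {
        mBonus = bonus;
    }

    public override decimal GetSalary()
    {
        // IL_0000 to IL_0011: call the base method, load mBonus,
        // call Decimal.op_Addition, and store the sum in local 0.
        decimal result = base.GetSalary() + mBonus;

        // IL_0014 and IL_0015: reload local 0 and return it.
        return result;
    }
}

static class SalaryDemo
{
    static void Main()
    {
        Console.WriteLine(new Manager(1000m, 250m).GetSalary()); // prints 1250
    }
}
```

Note that the + operator on decimal values is what becomes the call to op_Addition, and that base.GetSalary() becomes a call instruction rather than callvirt because the target method is fixed at compile time.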

Do you see how simple it is to read MSIL after you understand the base and object model instruction sets? Let's try out your newly acquired knowledge on some system code. The base class library in the .NET Framework contains a full-featured set of collection classes. The collection classes include stack, queue, hashtable, arraylist, and sorted list classes. You can learn a lot about how the stack collection class is implemented by examining the implementation of the push method.

To do this, navigate to the latest version of your .NET Framework directory. On my machine, the directory is D:\WINNT\Microsoft.NET\Framework\v1.0.3705. The collection classes are found in an assembly called mscorlib.dll. Enter the following command at a command prompt in this directory:

ildasm mscorlib.dll

Now find the stack class as shown in Figure 3-7 and double-click the push method within this class.

Figure 3–7 The System.Collections.Stack class as viewed in ildasm.


The following code shows the MSIL code for the Push method in the System.Collections.Stack class. Once again, I did clean up the code a little to enhance its readability. Let's break down this code section by section.

.method public hidebysig newslot virtual 
    instance void Push(object obj) cil managed
{
 // Code size    100 (0x64)
 .maxstack 5
 .locals (object[] V_0, int32 V_1)
 IL_0000: ldarg.0
 IL_0001: ldfld   int32 Stack::_size
 IL_0006: ldarg.0
 IL_0007: ldfld   object[] Stack::_array
 IL_000c: ldlen
 IL_000d: conv.i4
 IL_000e: bne.un.s  IL_003c
 IL_0010: ldc.i4.2
 IL_0011: ldarg.0
 IL_0012: ldfld   object[] Stack::_array
 IL_0017: ldlen
 IL_0018: conv.i4
 IL_0019: mul
 IL_001a: conv.ovf.u4
 IL_001b: newarr   System.Object
 IL_0020: stloc.0
 IL_0021: ldarg.0
 IL_0022: ldfld   object[] Stack::_array
 IL_0027: ldc.i4.0
 IL_0028: ldloc.0
 IL_0029: ldc.i4.0
 IL_002a: ldarg.0
 IL_002b: ldfld   int32 Stack::_size
 IL_0030: call    void System.Array::Copy(class System.Array, int32,
                                          class System.Array, int32, int32)
 IL_0035: ldarg.0
 IL_0036: ldloc.0
 IL_0037: stfld   object[] Stack::_array
 IL_003c: ldarg.0
 IL_003d: ldfld   object[] Stack::_array
 IL_0042: ldarg.0
 IL_0043: dup
 IL_0044: ldfld   int32 Stack::_size
 IL_0049: dup
 IL_004a: stloc.1
 IL_004b: ldc.i4.1
 IL_004c: add
 IL_004d: stfld   int32 Stack::_size
 IL_0052: ldloc.1
 IL_0053: ldarg.1
 IL_0054: stelem.ref
 IL_0055: ldarg.0
 IL_0056: dup
 IL_0057: ldfld   int32 Stack::_version
 IL_005c: ldc.i4.1
 IL_005d: add
 IL_005e: stfld   int32 Stack::_version
 IL_0063: ret
} // end of method Stack::Push

The first line of code declares two local variables called V_0 and V_1, which are typed as an object array and a 4-byte integer, respectively.

 .locals (object[] V_0, int32 V_1)

I will call these two local variables newArray and currSize. You will see why shortly. The next seven lines of code are simply an "if" statement.

 IL_0000: ldarg.0
 IL_0001: ldfld   int32 Stack::_size
 IL_0006: ldarg.0
 IL_0007: ldfld   object[] Stack::_array
 IL_000c: ldlen
 IL_000d: conv.i4
 IL_000e: bne.un.s  IL_003c

If you recall that the instruction ldarg.0 loads the "this" pointer for the current object and that the ldfld instruction loads a field of an object, you can see that lines IL_0000 through IL_0007 are simply loading two private variables of the stack object onto the execution stack: (1) the current number of elements in the stack collection (_size) and (2) an object array (_array) that holds the contents of the stack collection. The next two lines (IL_000c and IL_000d) will replace the top item on the execution stack with the declared length of the internal array and convert the length to a 4-byte integer. Line IL_000e will branch to line IL_003c if the top two items on the execution stack are not equal to each other, in other words, if the number of elements in the stack collection is not equal to the declared length of the internal array. If I were to convert this MSIL code into C#, it would look as follows.

if (this._size == this._array.Length)
{
    // Lines IL_0010 through IL_0037 go here
}

Therefore, the lines of MSIL code shown previously (IL_0000 through IL_000e) are testing to see if the internal storage that was allocated for the stack contents is full. Let's look at lines IL_0010 through IL_0037 to see how the stack collection expands its internal storage if there are too many elements to fit within the current storage buffer. In other words, you are looking to see what happens when the "if" statement evaluates to true. These lines are as follows:

 IL_0010: ldc.i4.2
 IL_0011: ldarg.0
 IL_0012: ldfld   object[] Stack::_array
 IL_0017: ldlen
 IL_0018: conv.i4
 IL_0019: mul
 IL_001a: conv.ovf.u4
 IL_001b: newarr   System.Object
 IL_0020: stloc.0
 IL_0021: ldarg.0
 IL_0022: ldfld   object[] Stack::_array
 IL_0027: ldc.i4.0
 IL_0028: ldloc.0
 IL_0029: ldc.i4.0
 IL_002a: ldarg.0
 IL_002b: ldfld   int32 Stack::_size
 IL_0030: call    void System.Array::Copy(class System.Array, int32,
                                          class System.Array, int32, int32)
 IL_0035: ldarg.0
 IL_0036: ldloc.0
 IL_0037: stfld   object[] Stack::_array

Line IL_0010 loads the number 2 onto the stack. Lines IL_0011 through IL_0018 load the current length of the array that contains the stored elements of the stack. Line IL_0019 multiplies this length by 2, and line IL_001a converts the multiplication result to an unsigned integer. Line IL_001b instantiates a new array of System.Object instances whose length is the multiplication result, and line IL_0020 stores this new array in the local variable with index zero. Remember I had called this local variable newArray earlier. Do you see why? This local variable is used to store the new expanded array. So now, if you converted lines IL_0010 through IL_0020 into C#, you would get the following code:

newArray=new System.Object[(uint)(2*this._array.Length)];

In other words, if the internal storage array for a stack collection is full when you attempt to add a new item to the stack collection, the stack collection will create a new array that is double the size of the existing array. On lines IL_0021 through IL_0030, the MSIL is setting up a call to the Copy method in the Array class to copy the contents of the existing array (_array) into the new, larger array (newArray). The Copy method takes five parameters in order from left to right: (1) the source array, (2) the starting index in the source array to copy from, (3) the destination array, (4) the starting index in the destination array to copy to, and (5) the number of elements to copy from the source to the destination array. You must push all of these arguments, in order from left to right, onto the execution stack before calling the Copy method. In other words, you push the first argument (the source array) first and the last argument (the number of elements to copy) last.

Lines IL_0021 and IL_0022 push the _array field of the stack collection instance onto the execution stack; this is the source array. Line IL_0027 pushes 0 for the starting index in the source array. Line IL_0028 pushes the destination array onto the execution stack; remember that this array is stored in the local variable at index 0. Line IL_0029 pushes 0 for the starting index in the destination array, and lines IL_002a and IL_002b push the current size of the stack collection onto the stack, which is the number of elements that you want to copy from the source to the destination. Line IL_0030 makes the actual call to the Copy method.

The next three lines push the "this" pointer and the newArray local variable onto the execution stack and then store newArray into the _array field.

IL_0035: ldarg.0
IL_0036: ldloc.0
IL_0037: stfld   object[] Stack::_array

This will remove the only outstanding reference to the old internal array and thereby free it to be garbage-collected the next time the garbage collection algorithm runs.
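The grow-and-copy step described above can be sketched in isolation. This is not the library's source code; GrowDemo and Grow are hypothetical names, and the helper takes the array and element count as parameters instead of reading the _array and _size fields. It uses the same Array.Copy overload with the five arguments in the order listed earlier.

```csharp
using System;

static class GrowDemo
{
    // Returns a new array twice as long as 'array' that contains
    // the first 'size' elements of 'array' in the same positions.
    public static object[] Grow(object[] array, int size)
    {
        object[] newArray = new object[2 * array.Length];

        // (source, source start index, destination, destination start index, count)
        Array.Copy(array, 0, newArray, 0, size);
        return newArray;
    }

    static void Main()
    {
        object[] storage = { "a", "b" };
        storage = Grow(storage, 2); // the old array is now unreferenced
        Console.WriteLine(storage.Length); // prints 4
    }
}
```

Once the caller overwrites its only reference with the returned array, the old array becomes eligible for garbage collection, just as it does when Push overwrites the _array field.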

So far, the code that you have looked at is executed only if the internal storage for the stack needs to be expanded. The next section of code contains the instructions in the Push method of the stack collection class that actually place the new element onto the stack. Let's take a look at this code, which runs from IL_003c to IL_0063:

 IL_003c: ldarg.0
 IL_003d: ldfld   object[] Stack::_array
 IL_0042: ldarg.0
 IL_0043: dup
 IL_0044: ldfld   int32 Stack::_size
 IL_0049: dup
 IL_004a: stloc.1
 IL_004b: ldc.i4.1
 IL_004c: add
 IL_004d: stfld   int32 Stack::_size
 IL_0052: ldloc.1
 IL_0053: ldarg.1
 IL_0054: stelem.ref
 IL_0055: ldarg.0
 IL_0056: dup
 IL_0057: ldfld   int32 Stack::_version
 IL_005c: ldc.i4.1
 IL_005d: add
 IL_005e: stfld   int32 Stack::_version
 IL_0063: ret

The lines from IL_003c to IL_004c are difficult to follow unless you understand how the execution stack works. Just keep in mind that most instructions require one, two, or three arguments to be on the stack. The instruction uses those arguments, pops them off the stack when it is done, and pushes the result of the operation if there is one. Line IL_003c pushes the "this" pointer for the Stack collection onto the execution stack. Line IL_003d pops the "this" pointer and pushes the _array member field of the Stack collection in its place. Line IL_0042 pushes the "this" pointer onto the execution stack again, and the next line, IL_0043, duplicates it, so there are now two copies of the "this" pointer on the stack. Line IL_0044 uses one of those copies to load the _size field, and line IL_0049 duplicates the _size value. Therefore, after line IL_0049, the execution stack holds, from bottom to top, the _array reference, the "this" pointer, and two copies of _size, as shown in Figure 3–8.

Figure 3–8 The execution stack after instruction IL_0049.

Line IL_004a will store the top item on the execution stack into the local variable with index 1. Remember that I called this variable currSize. This instruction will also pop one of the _size values off the execution stack (see Figure 3–8). Line IL_004b pushes the literal value 1 onto the execution stack, and line IL_004c will add the value 1 to the remaining _size value (popping both operands off the execution stack and replacing them with the result of the operation). The top of the execution stack will now contain the value _size + 1. Line IL_004d will store this value into the _size field and pop the remaining "this" pointer off the execution stack. The top of the execution stack will now contain the _array field. Line IL_0052 will push the currSize local variable, which contains the number of elements that were in the stack collection before this call, onto the execution stack. This value will be used as the index position for the new element in the stack collection. Line IL_0053 will load the argument to this method onto the execution stack. In this case, the argument is the object that you are pushing onto the stack collection.

NOTE

Remember argument 0 is the "this" pointer. Argument 1 is the first real argument.

Line IL_0054 will store the object into the internal array at the index pushed on line IL_0052. It will also pop the _array field, the currSize value, and the object (argument 1) off the execution stack. Lines IL_0055 and IL_0056 load the "this" pointer onto the stack and duplicate it. Line IL_0057 will push the _version field onto the execution stack (and pop one of the "this" pointers off the stack). Line IL_005c will load the literal value 1 onto the stack, line IL_005d will add 1 to _version, and line IL_005e stores this new value back to the _version field. In other words, lines IL_0055 through IL_005e are equivalent to the following C# code:

this._version = this._version + 1;

It's not clear to me what the _version field is used for. I looked at the code for the Pop method in the stack collection class, and the _version field is incremented each time you call the pop method also. The _version field appears to track the number of times you perform an operation on the stack collection. It's not clear why this is necessary because I could not find any properties or methods that return or use this information. The final line of MSIL code, IL_0063, is obviously just a return instruction. The push method in the System.Collections.Stack class has no return value (it's typed as a void), so the execution stack is left empty when you return. Therefore, now I can show you the complete, decompiled C# code for the Push method in the System.Collections.Stack class.

public void Push(System.Object arg1)
{
  System.Object[] newArray;
  int currSize;
  if (this._size == this._array.Length)
  {
    newArray = new System.Object[(uint)(2 * this._array.Length)];
    System.Array.Copy(_array, 0, newArray, 0, this._size);
    _array = newArray;
  }
  currSize = this._size;
  this._size = this._size + 1;
  this._array[currSize] = arg1;
  this._version = this._version + 1;
}
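As for the _version field, one plausible use for a counter like this is detecting changes made to a collection while it is being enumerated. I have not traced this through the MSIL, so treat it as an assumption about the implementation; what the class documentation does state is that an enumerator for a Stack becomes invalid when the collection is modified and that its MoveNext method then throws an InvalidOperationException. The following sketch (VersionDemo and ModificationDetected are hypothetical names) exercises that documented behavior.

```csharp
using System;
using System.Collections;

static class VersionDemo
{
    // Returns true if changing the stack during enumeration is detected.
    public static bool ModificationDetected()
    {
        Stack stack = new Stack();
        stack.Push("a");
        stack.Push("b");

        try
        {
            foreach (object item in stack)
            {
                stack.Push("c"); // modify the collection while enumerating it
            }
        }
        catch (InvalidOperationException)
        {
            // The enumerator noticed that the collection changed.
            return true;
        }
        return false;
    }

    static void Main()
    {
        Console.WriteLine(ModificationDetected());
    }
}
```

If the enumerator records the value of _version when it is created and compares it on each step, then every Push and Pop would invalidate outstanding enumerators, which would explain why both methods increment the field.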

I think you can see how easy that was. The key difference between a .NET assembly and a regular Win32 DLL is that the Win32 DLL contains machine code, an almost unintelligible encoding of 0s and 1s. Sure, you can disassemble this code into x86 assembly language code, but native, x86 assembly language still does not contain the kind of high-level instructions (particularly the object model instructions) that make MSIL so easy to decipher. Moreover, it is much easier to read assembly language code if it uses only stack-based instructions like MSIL does. When an instruction set has lots of CPU registers, it is harder to figure out what's going on because different compilers will use these registers in different ways.

The fact that MSIL code is so easy to read is both good news and bad news. The good news is that you can always understand exactly how a class or method is implemented, even if you do not have the source code for the class. I have used my ability to decipher MSIL code a number of times while writing this book to gain insight into how certain aspects of the .NET Framework's class library are implemented.

NOTE

Now that you know how the push method in the stack collection class is implemented, you can be smarter about how you use it. Because you know that the push method will double the size of the internal storage every time it has to grow the stack, you know that it is important to try to initialize the stack to the largest size that you think you might need. One of the constructors for the stack collection class allows you to specify the initial size for the stack's internal storage.
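The following sketch illustrates the point of this note. The Stack(int) constructor, which takes an initial capacity, is part of the class library; the CountGrowths helper is hypothetical and simply models the doubling rule from the decompiled Push method, and the starting capacity of 10 used in the comparison is an assumption made for illustration.

```csharp
using System;
using System.Collections;

static class CapacityDemo
{
    // Counts how many times a buffer that doubles when full must grow
    // to hold 'count' items, starting from 'initialCapacity' (must be > 0).
    public static int CountGrowths(int initialCapacity, int count)
    {
        int capacity = initialCapacity;
        int growths = 0;
        while (capacity < count)
        {
            capacity *= 2; // each growth allocates a new array and copies the old one
            growths++;
        }
        return growths;
    }

    static void Main()
    {
        // Presize the stack so that pushing 1000 items never triggers a resize.
        Stack presized = new Stack(1000);
        for (int i = 0; i < 1000; i++)
        {
            presized.Push(i);
        }
        Console.WriteLine(presized.Count);           // prints 1000

        Console.WriteLine(CountGrowths(10, 1000));   // prints 7 (10, 20, 40, ..., 1280)
        Console.WriteLine(CountGrowths(1000, 1000)); // prints 0
    }
}
```

Each growth costs an allocation plus a copy of every element stored so far, so a good initial capacity trades a little memory up front for fewer copies later.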

The bad news, of course, is that, if you write your code and ship it as a .NET assembly, other people will be able to decipher your code. Your competitors can easily gain access to your intellectual property, and a savvy programmer can easily steal your proprietary algorithms. At first, when I realized this, I was horrified, and I questioned whether this fact alone would cause people to avoid using the .NET Framework. After thinking about it more, I realized that it's probably not a big deal as long as you know that this issue exists. For a start, obfuscation technology already exists that makes it nearly impossible for someone to decipher the MSIL code in your .NET assemblies.

NOTE

Some of the .NET obfuscation products that are available include Salamander, which is made by a company called Remote Soft (see http://www.remotesoft.com for more information), DotFuscator, from preEmptive Solutions (see http://www.preemptive.com), and Demeanor from Wise Owl (see http://www.wiseowl.com). Unfortunately, these obfuscators cost anywhere from a few hundred to more than a thousand dollars. Desaware makes an open source obfuscator called QND-Obfuscator, which is available for "free" if you purchase an e-book for $39.95. See http://www.desaware.com for more information.

If you're worried that determined intellectual property thieves may someday find a way to subvert these obfuscators (a valid concern), you can still hide your most sensitive intellectual property by implementing key algorithms as an unmanaged COM server using Visual C++ or VB6 and then use COM Interop (which I discuss extensively in this book) to call the methods in this COM server.
