Physical Layout of a .NET Assembly
The principal unit of deployment and versioning in the .NET Framework is the assembly. One of the most important features of an assembly is that it is self-describing: it contains metadata, that is, data that describes the data. The .NET Framework provides two APIs for extracting the metadata from an assembly. The first is a very well-documented unmanaged COM interface, and the second is the runtime's reflection services. A third method, aided by the documentation provided with the .NET SDK, requires knowledge of the physical layout of the assembly. This article is about that third method.
Portable Executable Format
When you compile a Java program, you typically get a .class file, which is run with java.exe; java.exe loads your program into the JVM and starts it running. When you compile a program in .NET, you get an assembly. If it is a library, you get a .DLL file. If it is an executable, you get an .EXE file. To make a .NET program runnable, Microsoft has taken the extra step of incorporating the .NET assembly into a standard Windows PE file. For example, take the simple Hello World program shown in Listing 1:
Listing 1: C# Hello World
using System;

class Hello
{
    public static void Main()
    {
        System.Console.WriteLine("Hello world!");
    }
}
Compiling this program with 'csc HelloWorld.cs' generates a HelloWorld.exe. This file is a .NET assembly, but it is also a standard PE file.
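To make "a standard PE file that is also a .NET assembly" a little more concrete, here is a minimal sketch in Python (chosen only for brevity; it is not part of the .NET SDK or of the tools discussed in this article). It walks the fixed PE header fields and reports the entry point, the image base, and the CLR header data directory, which is slot 14 of the optional header's data directories. The function name read_pe_summary and the dictionary keys are hypothetical names invented for this illustration. The offsets follow the published PE/COFF layout.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # data-directory slot reserved for the CLR header


def read_pe_summary(data):
    """Return entry point, image base, and CLR header directory from raw PE bytes.

    Hypothetical helper for illustration only; it performs minimal validation.
    """
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) signature")
    # The DOS header field at offset 0x3C holds the file offset of "PE\0\0".
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # Skip the 4-byte signature and the 20-byte COFF file header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)

    if magic == 0x10B:      # PE32: 32-bit image base, directories at +96
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
        dirs = opt + 96
    elif magic == 0x20B:    # PE32+: 64-bit image base, directories at +112
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
        dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")

    # NumberOfRvaAndSizes sits immediately before the directory array.
    (dir_count,) = struct.unpack_from("<I", data, dirs - 4)
    clr_rva = clr_size = 0
    if dir_count > CLR_DIRECTORY_INDEX:
        clr_rva, clr_size = struct.unpack_from(
            "<II", data, dirs + 8 * CLR_DIRECTORY_INDEX)

    return {
        "entry_point_rva": entry_rva,
        "image_base": image_base,
        "clr_header_rva": clr_rva,
        "clr_header_size": clr_size,
        "is_managed": clr_rva != 0,  # a nonzero CLR directory marks an assembly
    }
```

Assuming HelloWorld.exe is in the current directory, calling read_pe_summary(open("HelloWorld.exe", "rb").read()) should report a nonzero CLR header directory, whereas an ordinary native executable would report zero.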
There have been numerous articles written about the PE file format. Two of my favorites are both by Matt Pietrek. He first wrote about the PE file format in March of 1994: "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format." Then, he updated the pedump.exe utility in his article that began in the February 2002 edition of MSDN Magazine: "Inside Windows: An In-Depth Look into the Win32 Portable Executable File Format."
You can view the contents of an assembly as a PE file using Matt Pietrek's PEDump utility, or you can use the dumpbin utility that has been shipping with every development environment for some time now. Dumpbin is more readily available, but PEDump has the advantage of coming with source code and with Matt Pietrek's discussion of the PE file format. With VC7, dumpbin is in \Program Files\Microsoft Visual Studio .Net\vc7\bin. You can easily set up your environment to follow along with the rest of this article by executing \Program Files\Microsoft Visual Studio .Net\Common7\Tools\VSVARS32.bat. Running 'dumpbin /ALL HelloWorld.exe' generates something like the output shown in Listing 2.
Listing 2: Dumpbin Output
Dump of file HelloWorld.exe
PE signature found
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
. . .
OPTIONAL HEADER VALUES
. . .
            22DE entry point (004022DE)
clr Header:
              48 cb
            2.00 runtime version
            207C [     214] RVA [size] of MetaData Directory
               1 flags
         6000001 entry point token
. . .
Section contains the following imports:
    mscoree.dll
          402000 Import Address Table
          4022B8 Import Name Table
               0 time date stamp
               0 Index of first forwarder reference
               0 _CorExeMain
To save space, the listing has been truncated considerably to include only the portions of the file that pertain to .NET. Starting at the end of the listing, the imports make sure that mscoree.dll is loaded into the process. This is the standard way for a PE file to specify which DLLs it depends on. For .NET assemblies, there is only one dependency: mscoree.dll. If you remove or rename mscoree.dll, you will receive the standard "DLL cannot be found" error. In the OPTIONAL HEADER VALUES section, one entry in particular is interesting: the entry point. For this file, the entry point is at relative address 22DE. If the image loads at its preferred base address without conflicts, that translates to the address 004022DE. Looking at that address in the dumpbin output:
004022C0: 00 00 5F 43 6F 72 45 78 65 4D 61 69 6E 00 6D 73  .._CorExeMain.ms
004022D0: 63 6F 72 65 65 2E 64 6C 6C 00 00 00 3B 00 FF 25  coree.dll...;.ÿ%
004022E0: 00 20 40 00                                      . @.
At address 004022DE, there is the following set of bytes: FF 25 00 20 40 00. These bytes translate roughly into an indirect jump through the address stored at 00402000, which is the Import Address Table entry for _CorExeMain. So the only native code so far is a hook that starts the code running in the CLR. When the CLR starts to run, it examines the CLR header, loads the metadata, and then starts running at the method identified by the entry point token. Once the metadata is decoded, the CLR can look up the token in the metadata tables and start managed execution at that point. In this case, the token is 0x06000001: the high byte, 06, selects table #6, the MethodDef table, and the remaining bytes select its first row, which corresponds to Main (more on tables, tokens, and metadata later). This has been a really brief discussion of how an assembly tunnels in behind the PE format to be executed. Of primary concern here is the metadata. How do you get at the metadata associated with an assembly?
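The arithmetic in the walkthrough above can be checked with a short Python sketch. It is only an illustration, not how the loader or the CLR is implemented. The helper names rva_to_va, decode_indirect_jmp, and split_token are hypothetical, and the image base of 0x00400000 is an assumption inferred from the listing (004022DE minus 22DE). The token split of a one-byte table number followed by a three-byte row number follows the metadata token layout.

```python
import struct

# Assumed preferred load address, inferred from 004022DE - 22DE in the listing.
IMAGE_BASE = 0x00400000


def rva_to_va(rva, image_base=IMAGE_BASE):
    """Convert a relative virtual address to a virtual address."""
    return image_base + rva


def decode_indirect_jmp(stub):
    """Return the address operand of an x86 'FF 25 <addr32>' indirect jump.

    The processor jumps to the pointer stored AT this address, not to the
    address itself. Here that location is the import slot for _CorExeMain.
    """
    if len(stub) < 6 or stub[0:2] != b"\xFF\x25":
        raise ValueError("not an FF 25 indirect jump")
    (slot_address,) = struct.unpack_from("<I", stub, 2)  # little-endian operand
    return slot_address


def split_token(token):
    """Split a metadata token into (table number, row number)."""
    return token >> 24, token & 0x00FFFFFF


stub = bytes.fromhex("FF 25 00 20 40 00")  # the six bytes at 004022DE
print(f"entry point VA : {rva_to_va(0x22DE):08X}")          # 004022DE
print(f"jump via slot  : {decode_indirect_jmp(stub):08X}")   # 00402000
print(f"token (table, row): {split_token(0x06000001)}")      # (6, 1)
```

The decoded slot address, 00402000, matches the Import Address Table address that dumpbin reports for mscoree.dll, and the token splits into table 6, row 1.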