
Physical Layout of a .NET Assembly

Contents

  1. Portable Executable Format
  2. Tables and Metadata
  3. The #~ Stream
The principal unit of deployment and versioning in the .NET Framework is the assembly. For a general overview of the physical layout of the assembly, and to learn what is contained in it, read this article by Kevin Burton.
Much of this article was derived from .NET Common Language Runtime Unleashed (Sams, 2002).

The principal unit of deployment and versioning in the .NET Framework is the assembly. One of the most important features of an assembly is that it is self-describing: it contains data that describes its own contents, otherwise known as metadata. The .NET Framework provides two APIs for extracting the metadata from an assembly. The first is a very well-documented unmanaged COM interface, and the second is the runtime's reflection services. With the help of the documentation provided with the .NET SDK, there is also a third method: reading the metadata directly, which requires knowledge of the physical layout of the assembly. This article is about the third method.

Portable Executable Format

When you compile a Java program, you typically get a .class file, which in turn is run using java.exe; java.exe loads your program into the JVM and starts it running. When you compile a program in .NET, you get an assembly. If it is a library, you get a .DLL file. If it is an executable, you get an .EXE file. So that a .NET program can be run directly, Microsoft has taken the extra step of incorporating a .NET assembly into a standard Windows PE (Portable Executable) file. For example, take the simple Hello World program shown in Listing 1:

Listing 1—C# Hello World

using System;
class Hello 
{
  public static void Main()
  {
    System.Console.WriteLine("Hello world!");
  }
}

Compiling this program with 'csc HelloWorld.cs' generates HelloWorld.exe. This file is a .NET assembly, but it is also a standard PE file.
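To make "a standard PE file" concrete, here is a minimal Python sketch of the check a tool can perform before looking any deeper: the file starts with the DOS "MZ" marker, and the 4-byte field at offset 0x3C points at the "PE\0\0" signature. The function names is_pe_image and make_fake_image are hypothetical, and the test buffer is synthetic; it is not a real HelloWorld.exe.

```python
import struct


def is_pe_image(data: bytes) -> bool:
    """Return True if data begins with an MZ header that points at a PE signature."""
    # The DOS header is 64 bytes long; its last field (offset 0x3C, e_lfanew)
    # holds the file offset of the 4-byte PE signature.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\0\0"


def make_fake_image(pe_offset: int = 0x80) -> bytes:
    """Build a synthetic buffer holding only the fields checked above (illustration only)."""
    buf = bytearray(pe_offset + 4)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\0\0"
    return bytes(buf)


print(is_pe_image(make_fake_image()))
```

To try it on a real assembly, read the file in binary mode and pass its bytes to is_pe_image. The check says only that the file is a PE image; it says nothing yet about whether the image carries .NET metadata.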

Numerous articles have been written about the PE file format. My two favorites are both by Matt Pietrek. He first wrote about the PE file format in March of 1994: "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format." He later updated the pedump.exe utility in an article that began in the February 2002 edition of MSDN Magazine: "Inside Windows: An In-Depth Look into the Win32 Portable Executable File Format."

You can view the contents of an assembly as a PE file using Matt Pietrek's PEDump utility, or you can use the dumpbin utility that has been shipping with every development environment for some time now. Dumpbin is more readily available, but PEDump has the advantage of coming with source code and with Matt Pietrek's discussion of the PE file format. With VC7, dumpbin is in \Program Files\Microsoft Visual Studio .Net\vc7\bin. You can easily set up your environment to follow along with the rest of this article by executing \Program Files\Microsoft Visual Studio .Net\Common7\Tools\VSVARS32.bat. Running 'dumpbin /ALL HelloWorld.exe' generates something like the output shown in Listing 2.

Listing 2—Dumpbin Output

Dump of file HelloWorld.exe

PE signature found

File Type: EXECUTABLE IMAGE

FILE HEADER VALUES
. . .

OPTIONAL HEADER VALUES
. . .
      22DE entry point (004022DE)

 clr Header:

       48 cb
      2.00 runtime version
      207C [   214] RVA [size] of MetaData Directory
        1 flags
     6000001 entry point token
. . .

 Section contains the following imports:

  mscoree.dll
        402000 Import Address Table
        4022B8 Import Name Table
           0 time date stamp
           0 Index of first forwarder reference

          0 _CorExeMain

To save space, Listing 2 has been truncated considerably to include only the portions of the file that pertain to .NET. Starting at the end of the listing, the imports section makes sure that mscoree.dll is loaded into the process. This is the standard way for a PE file to specify which DLLs it depends on. For .NET assemblies, there is only one dependency: mscoree.dll. If you remove or rename mscoree.dll, you will receive the standard "DLL cannot be found" error. In the OPTIONAL HEADER VALUES section, one entry is particularly interesting: the entry point. For this file, the entry point is at relative virtual address (RVA) 22DE; if the image loads at its preferred base address of 00400000 without conflicts, that translates to the address 004022DE. Looking at that address in the dumpbin output:

 004022C0: 00 00 5F 43 6F 72 45 78 65 4D 61 69 6E 00 6D 73 .._CorExeMain.ms
 004022D0: 63 6F 72 65 65 2E 64 6C 6C 00 00 00 3B 00 FF 25 coree.dll...;.ÿ%
 004022E0: 00 20 40 00                   . @.

At address 004022DE, there is the following set of bytes: FF 25 00 20 40 00. These bytes roughly translate into an indirect jump through the pointer stored at 00402000, which Listing 2 shows to be the Import Address Table of mscoree.dll, whose only import is _CorExeMain. So the only native code so far is a hook that starts the code running in the CLR. When the CLR starts to run, it examines the CLR Header section, loads the metadata, and then starts running at the token specified as the entry point token. Once the metadata is decoded, the CLR can determine from the tables which method the entry point token refers to and start managed execution at that point. In this case, the token is 0x06000001, which corresponds to the first entry of table #6, the MethodDef table; that entry corresponds to Main (more on tables, tokens, and metadata later).

This has been a very brief discussion of how an assembly tunnels in behind the PE format to be executed. Of primary concern here is the metadata. How do you get at the metadata associated with an assembly?
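The arithmetic in the last two paragraphs can be sketched in a few lines of Python. The sketch assumes an image base of 0x00400000 (inferred from the addresses dumpbin printed for this file), treats FF 25 as the x86 opcode for an indirect jump through a 32-bit pointer, and splits a token into a table number (top byte) and a row number (low three bytes). All three function names are hypothetical.

```python
import struct
from typing import Tuple

# Assumption: the preferred load address implied by the dumpbin output above.
IMAGE_BASE = 0x00400000


def rva_to_va(rva: int, image_base: int = IMAGE_BASE) -> int:
    """Translate a relative virtual address into a virtual address."""
    return image_base + rva


def decode_indirect_jump(code: bytes) -> int:
    """Return the address of the pointer that an 'FF 25 <addr32>' jump reads."""
    if len(code) < 6 or code[:2] != b"\xFF\x25":
        raise ValueError("not an FF 25 indirect jump")
    (pointer_address,) = struct.unpack_from("<I", code, 2)
    return pointer_address


def split_token(token: int) -> Tuple[int, int]:
    """Split a metadata token into (table number, row number)."""
    return token >> 24, token & 0x00FFFFFF


stub = bytes([0xFF, 0x25, 0x00, 0x20, 0x40, 0x00])  # bytes found at 004022DE
print(f"{rva_to_va(0x22DE):08X}")
print(f"{decode_indirect_jump(stub):08X}")
print(split_token(0x06000001))
```

The jump target is not 00402000 itself but whatever address the loader writes into that Import Address Table slot, which is why the stub ends up in _CorExeMain inside mscoree.dll.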
