Common Language Runtime
In this section, we delve more deeply into the structure of .NET by examining the CLR. We look at the design goals of the CLR and discuss the rationale for using managed code and a runtime. We outline the design of the CLR, including the concepts of MSIL, metadata, and JIT compilation. We compare the CLR with the Java Virtual Machine. We discuss the assembly, a key .NET concept that provides a logical grouping of code. We explore the central role of types in .NET and look at the Common Type System (CTS). We explain the role of managed data and garbage collection. Finally, we use the Intermediate Language Disassembler (ILDASM) tool to gain some insight into the structure of assemblies.
Design Goals of the CLR
The CLR has the following design goals:
Simplify application development
Support multiple programming languages
Provide a safe and reliable execution environment
Simplify deployment and administration
Provide good performance and scalability
Simple Application Development
With more than 2,500 classes, the .NET Framework class library provides enormous functionality that the programmer can reuse. The object-oriented and component features of .NET enable organizations to create their own reusable code. In contrast to COM, the programmer does not have to implement any plumbing code to gain the advantages of components. Automatic garbage collection greatly simplifies memory management in applications. The CLR facilitates powerful tools such as Visual Studio .NET that can provide common functionality and the same UI for multiple languages.
Multiple Languages
The CLR was designed from the ground up to support multiple languages. This feature is the most significant difference between .NET and Java, which share a great deal in philosophy.
The CTS makes interoperability between languages virtually seamless. The same built-in data types can be used in multiple languages. Classes defined in one language can be used in another language. A class in one language can even inherit from a class in another language. Exceptions can be thrown from one language to another.
Programmers do not have to learn a new language in order to use .NET. The same tools can work for all .NET languages. You can debug from one language into another.
Safe Execution Environment
With the CLR, a compiler generates MSIL instructions, not native code. It is this managed code that runs. Hence, the CLR can perform runtime validations on this code before it is translated into native code. Types are verified. Subscripts are verified to be in range. Unsafe casts and uninitialized variables are prevented.
The CLR performs memory management. Verifiable managed code cannot access memory directly, and raw pointers are not allowed in it. This means that your code cannot inadvertently write over memory that does not belong to it, possibly causing a crash or other bad behavior.
The CLR can enforce strong security. One of the challenges of the software world of third-party components and downloadable code is that you open your system to damage by executing code from unknown sources. You might want to restrict Word macros from accessing anything other than the document that contains them. You want to stop potentially malicious Web scripts. You even want to shield your system from bugs in software from known vendors. To handle these situations, .NET security includes Code Access Security (CAS).
Simpler Deployment and Administration
With the CLR, the unit of deployment becomes an assembly, which is typically an EXE or a DLL. The assembly contains a manifest, which records information about the assembly itself, such as its identity, its version, and the other assemblies it depends on.
An assembly is completely self-describing. No information needs to be stored in the registry. All the information is in one place, and the code cannot get out of sync with information stored elsewhere, such as in the registry, a type library, or a header file.
The assembly is the unit of versioning, so that multiple versions can be deployed side by side in different folders. These different versions can execute at the same time without interfering with each other.
Assemblies can be private or shared. For private assembly deployment, the assembly is copied to the same directory as the client program that references it. No registration is needed, and no fancy installation program is required. When the component is removed, no registry cleanup is needed, and no uninstall program is required. Just delete it from the hard drive.
In shared assembly deployment, an assembly is installed in the Global Assembly Cache (GAC). The GAC contains shared assemblies that are globally accessible to all .NET applications on the machine. A download assembly cache is accessible to applications such as Internet Explorer that automatically download assemblies over the network.
Performance
You may like the safety and ease-of-use features of managed code, but you may be concerned about performance. This concern is somewhat analogous to that of early assembly language programmers when high-level languages came out.
The CLR is designed with high performance in mind. JIT compilation is designed into the CLR. The first time a method is encountered, the CLR performs verifications and then compiles the method into native code (which will contain safety features, such as array bounds checking). The next time the method is encountered, the native code executes directly.
Memory management is designed for high performance. Allocation is almost instantaneous, just taking the next available storage from the managed heap. Deallocation is done by the garbage collector, which Microsoft has tweaked for efficiency.
Why Use a CLR?
Why did Microsoft create a CLR for .NET? Let's look at how well the goals just discussed could have been achieved without a CLR, focusing on the two main goals of safety and performance. Basically, there are two philosophies. The first is compile-time checking and fast native code at runtime. The second is runtime checking.
Without a CLR, we must rely on the compiler to achieve safety. This places a heavy burden on the compiler. Typically, there are many compilers for a system, including third-party compilers. It is not robust to trust that every compiler from every vendor will adequately perform all safety checking. Not every language has features supporting adequate safety checking. Complex compile-time checking slows compilation. Compilers cannot conveniently optimize code based on enhanced instructions available on some platforms but not on others. What's more, many conditions (such as security violations) cannot be detected until runtime.
Design of Common Language Runtime
So we want a runtime. How do we design it? One extreme is to use an interpreter and not a compiler at all. All the work is done at runtime. We have safety and fast builds, but runtime performance is very slow. Modern systems divide the load between the front-end compiler and the back-end runtime.
The front-end compiler does all the checking it can do and generates an intermediate language. Examples include
P-code for Pascal
Bytecode for Java
The runtime does further verification based on the actual runtime characteristics, including security checking.
With JIT compilation, native code can be generated when needed and subsequently reused. Runtime performance becomes much better. The native code generated by the runtime can be more efficient, because the runtime knows the precise characteristics of the target machine.
Microsoft Intermediate Language
All managed code compilers for Microsoft .NET generate MSIL. MSIL is machine-independent and can be efficiently compiled into native code.
MSIL has a wide variety of instructions:
Standard operations such as load, store, arithmetic and logic, branch, etc.
Calling methods on objects
Before executing on a CPU, MSIL must be translated by a JIT compiler. There is a JIT compiler for each machine architecture supported. The same MSIL will run on any supported machine.
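The instruction categories listed above can be made concrete with a minimal sketch of a stack-based interpreter, written here in Python. The opcode names are simplified forms loosely modeled on MSIL mnemonics, and the run function and its instruction encoding are hypothetical. Note also that the CLR compiles MSIL to native code rather than interpreting it; the interpreter is used only because it keeps the example short.

```python
def run(program, num_locals=0):
    """Interpret a list of (opcode, operand) pairs on an evaluation stack."""
    stack = []
    local_vars = [0] * num_locals
    pc = 0  # index of the next instruction to execute
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "ldc":          # load a constant onto the stack
            stack.append(arg)
        elif op == "ldloc":      # load a local variable onto the stack
            stack.append(local_vars[arg])
        elif op == "stloc":      # pop the stack into a local variable
            local_vars[arg] = stack.pop()
        elif op == "add":        # arithmetic
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "clt":        # compare less-than, push 1 or 0
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a < b else 0)
        elif op == "brtrue":     # branch if the popped value is nonzero
            if stack.pop():
                pc = arg
        elif op == "ret":        # return the value on top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")


def sum_to(n):
    """Sum the integers 1..n (assumes n >= 1). Local 0 is i, local 1 is total."""
    program = [
        ("ldc", 1), ("stloc", 0),                                  # 0-1:   i = 1
        ("ldc", 0), ("stloc", 1),                                  # 2-3:   total = 0
        ("ldloc", 1), ("ldloc", 0), ("add", None), ("stloc", 1),   # 4-7:   total += i
        ("ldloc", 0), ("ldc", 1), ("add", None), ("stloc", 0),     # 8-11:  i += 1
        ("ldloc", 0), ("ldc", n + 1), ("clt", None),               # 12-14: i < n + 1 ?
        ("brtrue", 4),                                             # 15:    loop if so
        ("ldloc", 1), ("ret", None),                               # 16-17: return total
    ]
    return run(program, num_locals=2)
```

Because the program is plain data with no reference to any CPU, the same list of instructions could be handed to an implementation on any machine, which is the property the text attributes to MSIL.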
Metadata
Besides generating MSIL, a managed code compiler emits metadata. Metadata contains very complete information about the code module, including the following:
Version and locale information
All the types
Details about each type, including name, visibility, etc.
Details about the members of each type, such as methods, the signatures of methods, etc.
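The CLR's metadata format is specific to .NET, but the idea of a type that describes itself can be illustrated by analogy with Python's runtime introspection. In the sketch below, the Account class and the describe helper are hypothetical names invented for the example; only the inspect module calls are part of the standard library.

```python
import inspect


class Account:
    """Hypothetical type used only for illustration."""

    def __init__(self, owner: str, balance: float = 0.0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: float) -> None:
        self.balance += amount


def describe(cls):
    """Collect a class's name, module, and public method signatures."""
    methods = {
        name: str(inspect.signature(member))
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")  # skip private and special methods
    }
    return {"name": cls.__name__, "module": cls.__module__, "methods": methods}
```

A tool that calls describe(Account) learns the method names and parameter lists without reading the source code, which is roughly the service that metadata provides to compilers, debuggers, and browsers in .NET.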
Types are at the heart of the programming model for the CLR. A type is analogous to a class in most object-oriented programming languages, providing an abstraction of data and behavior, grouped together. A type in the CLR contains the following:
Fields (data members)
There are also built-in primitive types, such as integer and floating point numeric types, strings, etc. In the CLR, there are no functions outside of types; all behavior is provided via methods or other members. We discuss types under the guise of classes and value types when we cover VB.NET.
Metadata is the "glue" that binds together the executing code, the CLR, and tools such as compilers, debuggers, and browsers. On Windows, MSIL and metadata are packaged together in a standard Windows PE file. Metadata enables IntelliSense in Visual Studio. In .NET, you can call from one language to another, and metadata enables types to be converted transparently. Metadata is ubiquitous in the .NET environment.
JIT Compilation
Before executing on the target machine, MSIL is translated by a JIT compiler to native code. Some code typically will never be executed during a program run. Hence, it may be more efficient to translate MSIL as needed during execution, storing the native code for reuse.
When a type is loaded, the loader attaches a stub to each method of the type. On the first call, the stub passes control to the JIT, which translates to native code and modifies the stub to save the address of the translated native code. On subsequent calls to the method, the native code is called directly.
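The stub mechanism just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CLR's implementation: the Method class is hypothetical, a string of Python source stands in for MSIL, and Python's built-in compile and eval stand in for the JIT compiler (they produce Python bytecode, not native code).

```python
class Method:
    """A method whose entry point starts as a stub and is patched on first call."""

    def __init__(self, name, source):
        self.name = name
        self.source = source        # stands in for the method's MSIL
        self.compile_count = 0      # how many times "JIT compilation" ran
        self.entry = self._stub     # the loader attaches a stub to the method

    def _stub(self, *args):
        # First call only: translate the source, patch the entry point so
        # later calls bypass the stub, then run the translated code.
        self.compile_count += 1
        translated = eval(compile(self.source, self.name, "eval"))
        self.entry = translated
        return translated(*args)

    def __call__(self, *args):
        return self.entry(*args)


square = Method("square", "lambda x: x * x")
first = square(3)    # goes through the stub and triggers translation
second = square(4)   # calls the translated code directly
```

After both calls, compile_count is still 1, showing that the translation cost is paid once per method rather than once per call.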
As part of JIT compilation, code goes through a verification process. Type safety is verified, using both the MSIL and metadata. Security restrictions are checked.
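One way to picture type-safety verification is as a dry run that tracks the types on the evaluation stack instead of the values. The sketch below, in Python, is a deliberately small assumption-laden model: the verify function, its four opcodes, and its error messages are hypothetical, it handles only straight-line code, and it does not attempt the security checks mentioned above.

```python
def verify(program, local_types):
    """Check straight-line code by tracking types, not values.

    Returns None if no problem is found, or a message describing the first
    problem. local_types lists the declared type name of each local variable.
    """
    stack = []  # holds type names such as "int" or "str"
    for index, (op, arg) in enumerate(program):
        if op == "ldc":
            stack.append(type(arg).__name__)
        elif op == "ldloc":
            stack.append(local_types[arg])
        elif op == "stloc":
            if not stack:
                return f"instruction {index}: stack underflow"
            found = stack.pop()
            if found != local_types[arg]:
                return (f"instruction {index}: cannot store {found} "
                        f"in {local_types[arg]} local")
        elif op == "add":
            if len(stack) < 2:
                return f"instruction {index}: stack underflow"
            b, a = stack.pop(), stack.pop()
            if a != "int" or b != "int":
                return (f"instruction {index}: add requires two ints, "
                        f"found {a} and {b}")
            stack.append("int")
        else:
            return f"instruction {index}: unknown opcode {op}"
    return None


good = [("ldc", 1), ("ldc", 2), ("add", None), ("stloc", 0)]
bad = [("ldc", 1), ("ldc", "x"), ("add", None)]
```

Here verify(good, ["int"]) finds nothing to report, while verify(bad, ["int"]) rejects the code before any of it runs, because the add instruction would receive an int and a str.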
Common Type System
At the heart of the CLR is the Common Type System (CTS). The CTS provides a wide range of types and operations that are found in many programming languages. The CTS is shared by the CLR and by compilers and other tools.
The CTS provides a framework for cross-language integration and addresses a number of issues:
Similar, but subtly different, types (for example, Integer is 16 bits in VB6, but int is 32 bits in C++; strings in VB6 are represented as BSTRs and in C++ as char pointers or a string class of some sort; and so on)
Limited code reuse (for example, you can't define a new type in one language and import it into another language)
Inconsistent object models
Not all CTS types are available in all languages. The Common Language Specification (CLS) establishes rules that must be followed for cross-language integration, including which types must be supported by a CLS-compliant language. Built-in types can be accessed through the System namespace in the Base Class Library (BCL) and through reserved keywords in the .NET languages.
In Chapter 4, we begin our discussion of data types with the simple data types. We continue the discussion of types in Chapter 11, where we introduce reference types such as class and interface. At all times, you should bear in mind that there is a mapping between types in VB.NET, represented by keywords, and the types defined by the CTS, as implemented by the CLR.
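The mapping between language keywords and CTS types can be pictured as a lookup table. The Python sketch below lists a few well-known built-in types for VB.NET and C#; the table is a small sample rather than a complete list, and the CTS_TYPES and same_cts_type names are hypothetical.

```python
# A sample of keyword-to-CTS mappings for two .NET languages.
CTS_TYPES = {
    "VB.NET": {
        "Short": "System.Int16",
        "Integer": "System.Int32",
        "Long": "System.Int64",
        "Single": "System.Single",
        "Double": "System.Double",
        "Boolean": "System.Boolean",
        "String": "System.String",
    },
    "C#": {
        "short": "System.Int16",
        "int": "System.Int32",
        "long": "System.Int64",
        "float": "System.Single",
        "double": "System.Double",
        "bool": "System.Boolean",
        "string": "System.String",
    },
}


def same_cts_type(lang_a, keyword_a, lang_b, keyword_b):
    """Report whether two language keywords name the same CTS type."""
    return CTS_TYPES[lang_a][keyword_a] == CTS_TYPES[lang_b][keyword_b]
```

For example, same_cts_type("VB.NET", "Integer", "C#", "int") is true because both keywords name System.Int32, which is why a value of that type can pass between the two languages without conversion.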
Managed Data and Garbage Collection
Managed code is only part of the story of the CLR. A significant simplification of the programming model is provided through managed data. When an application domain is initialized, the CLR reserves a contiguous block of storage known as the managed heap. Allocation from the managed heap is extremely fast. The next available space is simply returned, in contrast to the C runtime, which must search its heap for space that is large enough.
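The reason allocation is so fast can be seen in a minimal sketch of "next available space" allocation, written in Python. The ManagedHeap class is a hypothetical model that tracks only byte offsets; it is not how the CLR is implemented.

```python
class ManagedHeap:
    """Models a contiguous heap where allocation only advances a pointer."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0  # everything below this offset is in use

    def allocate(self, nbytes):
        """Return the offset of a new block of nbytes bytes."""
        if self.next_free + nbytes > self.size:
            # A real runtime would trigger garbage collection at this point.
            raise MemoryError("managed heap exhausted")
        address = self.next_free
        self.next_free += nbytes  # no search for a large-enough free block
        return address


heap = ManagedHeap(64)
first_block = heap.allocate(16)   # placed at offset 0
second_block = heap.allocate(8)   # placed immediately after, at offset 16
```

Each allocation costs one comparison and one addition regardless of how many objects are already on the heap. The price is that free space must be kept contiguous, which is why the collector compacts the heap.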
Deallocation is not performed by the user program but by the CLR, using a process known as garbage collection. The CLR tracks the use of memory allocated on the managed heap. When memory is low, or in response to an explicit call from a program, the CLR "garbage collects": it frees all unreferenced memory and compacts the space that is now free into a large contiguous block.
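The two phases just described, finding unreferenced memory and compacting what remains, can be sketched in Python as a simple mark-and-compact pass. The Obj class and the collect function are hypothetical, and a production collector is considerably more sophisticated than this model.

```python
class Obj:
    """A heap object with a size and references to other objects."""

    def __init__(self, name, size, refs=()):
        self.name = name
        self.size = size
        self.refs = list(refs)  # other Obj instances this object points to
        self.address = None


def collect(heap_objects, roots):
    """Return (surviving objects in heap order, next free offset)."""
    # Mark: find every object reachable from the roots.
    marked = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if id(obj) not in marked:
            marked.add(id(obj))
            pending.extend(obj.refs)

    # Compact: slide survivors toward offset 0, preserving their order,
    # so that the free space ends up as one contiguous block.
    next_free = 0
    survivors = []
    for obj in heap_objects:
        if id(obj) in marked:
            obj.address = next_free
            next_free += obj.size
            survivors.append(obj)
    return survivors, next_free


c = Obj("c", 8)
b = Obj("b", 16, [c])
c.refs.append(b)        # a cycle does not prevent collection or marking
a = Obj("a", 4)         # unreferenced
d = Obj("d", 32)        # unreferenced
survivors, next_free = collect([a, b, c, d], roots=[b])
```

Only b and c survive, relocated to offsets 0 and 16, and allocation can resume at offset 24. Because reachability is traced from the roots, the cycle between b and c causes no trouble, unlike in a reference-counting scheme.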