Introduction
Source code that is written for and executed under the .NET Common Language Runtime (CLR) is referred to as managed code. A managed compiler translates the associated source code files into low-level .NET CIL code, an assembly manifest, and type metadata. CIL (also known as MSIL) is itself one of the .NET Framework's supported programming languages: we can create, build, and compile .NET applications using standalone CIL code. Moreover, CIL code is the backbone of every .NET assembly, and the deeper you dig into the CIL instruction set, the better you'll understand the inner workings of advanced .NET application development. This article introduces CIL code semantics by presenting a simple program written with CIL opcodes, and discusses the role of the CIL compiler, ILASM.EXE, in building and executing the resulting .NET assembly without the typical Visual Studio IDE build process.
Essentials
Programming directly in CIL can be a challenge because CIL developers work with the CLR's built-in grammar, a set of instructions referred to as opcodes, instead of friendlier languages such as C#, F#, and VB.NET, which use a more English-like syntax. It is advisable to have the following tools installed when programming in CIL directly.
- Visual Studio IDE 2010, or later
- .NET Framework 4.0, or later
- ILDASM.EXE, ILASM.EXE utilities
- Notepad++
- SharpDevelop (optional)
- Xamarin Studio (optional)
Although CIL code can be authored in a simple editor such as Notepad, it is recommended that you write CIL code using a full-fledged editor, such as SharpDevelop.
MSIL Internals
A .NET assembly contains CIL code, which is conceptually similar to Java bytecode in that it is not compiled to platform-specific instructions until it is about to be executed. The .NET CLR provides a just-in-time (JIT) compiler for each CPU targeted by the runtime, each optimized for the underlying platform. .NET binaries also contain metadata that describes the characteristics of every type within the binary. In addition, the assembly itself is described by metadata officially termed a manifest, which contains information about the current version of the assembly, a list of all externally referenced assemblies, and culture information.
Figure 1 illustrates that .NET source code is compiled into CIL rather than directly to native machine instructions. Because all .NET languages are compiled to a common intermediate language, components written in different .NET languages can interact with each other. Furthermore, CIL code provides the same platform-independence benefits that Java professionals have become accustomed to.
Figure 1: The .NET Compilation Life-Cycle
Each supported .NET programming language is translated to CIL mnemonics. Intermediate language (IL) code tends to be cryptic and difficult to read. For instance, to load a string onto the stack we don't use a user-friendly opcode name such as StringLoading, but rather ldstr. Consider the C# program in Listing 1. This simple program adds two numeric values using the testCalculation method. The resulting .NET binary does not contain platform-specific instructions, but rather platform-agnostic IL code, which is generated by the C# compiler (CSC.EXE) during the build process.
Listing 1: Simple C# console application
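using System;

class Program
{
    static void Main(string[] args)
    {
        // Call the demo method with two sample values
        testCalculation(20, 40);
        // Keep the console window open until a key is pressed
        Console.ReadKey();
    }

    // Demo static method: adds its two parameters and prints the sum
    static void testCalculation(int iPar1, int iPar2)
    {
        int Result;
        Result = iPar1 + iPar2;
        Console.WriteLine("Calculation Output :: {0}", Result);
    }
}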
Once you compile this code, you end up with a single *.EXE assembly that contains a manifest, metadata, and CIL instructions; when the program is run, the CLR locates the .NET binary and loads it into memory. Fortunately, the .NET Framework SDK tools include an excellent utility to disassemble any .NET binary into its corresponding IL code: ILDASM.EXE.
We can use the ILDASM.EXE utility to disassemble the IL code either in command-prompt mode or through its GUI. If you open this assembly using ILDASM.EXE in GUI mode, you can see how each C# statement is represented by the corresponding CIL opcodes, as shown in Figure 2.
Figure 2: The CIL Type.exe Assembly
ILDASM.EXE loads any .NET assembly and displays its contents, including CIL code, manifest, and metadata. It can also dump the entire contents of a .NET binary as CIL text. Double-click the testCalculation method to examine its generated CIL, as shown in Figure 3.
Figure 3: CIL Code
In addition, you can press Ctrl+M to explore the type metadata for the currently loaded assembly. This will display the metadata for testCalculation, as shown in Figure 4.
Figure 4: Metadata
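As a rough textual counterpart to the ILDASM view, the following is a minimal hand-written sketch of how Listing 1 could be expressed in CIL. It is not the exact compiler output (a debug build would also contain nop instructions and IL_xxxx labels); the assembly name CalcSketch and the empty mscorlib reference block are assumptions made to keep the example short.

```il
// Hypothetical assembly name; the empty extern block leaves version resolution to the runtime.
.assembly extern mscorlib {}
.assembly CalcSketch {}

.class private auto ansi beforefieldinit Program extends [mscorlib]System.Object
{
  // C#: static void testCalculation(int iPar1, int iPar2)
  .method private hidebysig static void testCalculation(int32 iPar1, int32 iPar2) cil managed
  {
    .maxstack 2
    .locals init (int32 Result)
    ldarg.0                       // push iPar1
    ldarg.1                       // push iPar2
    add                           // iPar1 + iPar2
    stloc.0                       // Result = sum
    ldstr "Calculation Output :: {0}"
    ldloc.0                       // push Result
    box [mscorlib]System.Int32    // WriteLine(string, object) needs an object reference
    call void [mscorlib]System.Console::WriteLine(string, object)
    ret
  }

  // C#: static void Main(string[] args)
  .method private hidebysig static void Main(string[] args) cil managed
  {
    .entrypoint
    .maxstack 2
    ldc.i4.s 20
    ldc.i4.s 40
    call void Program::testCalculation(int32, int32)
    call valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
    pop                           // discard the unused ConsoleKeyInfo result
    ret
  }
}
```

Assembled with ILASM.EXE and run, this sketch should print Calculation Output :: 60 and then wait for a key press, just like the C# version.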
IL Opcode Grammar
CIL is a full-fledged, object-oriented programming language like C#. It includes the typical OOP features such as classes, inheritance, and interfaces, along with control-flow instructions and much more. As previously mentioned, we can author .NET applications directly in MSIL without even using the Visual Studio IDE. But why is CIL programming so important to understand? Because it helps developers write, debug, and maintain code more effectively. Table 1 gives a brief list, with descriptions, of typical Common Intermediate Language (CIL) instructions.
Table 1: IL opcode meanings
| Opcode | Description |
|---|---|
| nop | No operation is performed |
| sub, div, add, mul, rem | Perform basic math operations |
| add.ovf | Adds signed integer values with an overflow check |
| add.ovf.un | Adds unsigned integer values with an overflow check |
| box, unbox | Convert a value type to an object reference (box) and back again (unbox) |
| br.s, br | Unconditionally jump to a label (short and long form) |
| castclass | Casts an object instance to a different class |
| bgt | Branches to the target if value1 is greater than value2 |
| beq | Branches to the target if value1 is equal to value2 |
| ble | Branches to the target if value1 is less than or equal to value2 |
| break | Signals a debugger breakpoint |
| brnull | Branches to the target if the value is null (alias for brfalse) |
| dup | Duplicates the value on top of the stack |
| call | Calls the specified method |
| ceq | If value1 is equal to value2, pushes 1, otherwise 0 |
| cgt | If value1 is greater than value2, pushes 1, otherwise 0 |
| clt | If value1 is less than value2, pushes 1, otherwise 0 |
| ldc | Loads a constant value onto the stack |
| ldobj | Copies the value-type object stored at an address onto the stack |
| ldstr | Loads a string literal onto the stack |
| ldarg | Loads the value of a method argument onto the stack (ldarga loads its address) |
| arglist | Returns the argument list handle for the current method |
| readonly | Prefix specifying that the subsequent array address operation performs no type check at runtime |
| starg | Stores the value on top of the stack into a method argument |
| stloc | Pops the value on top of the stack and stores it in a local variable |
| stobj | Stores a value of a given type at a memory address |
| callvirt | Calls a late-bound (virtual) method on an object instance |
| brtrue.s | Branches to the target if the value is non-zero (true), short form |
| brfalse.s | Branches to the target if the value is zero (false), short form |
| pop | Removes the value on top of the stack |
| ret | Returns from the current method, passing back a return value if there is one |
| throw | Throws an exception |
| rethrow | Rethrows the current exception |
| tail | Prefix specifying that the subsequent call terminates the current method (tail call) |
| volatile | Prefix specifying that the subsequent pointer reference is volatile |
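To make a few of these opcodes concrete, here is a small sketch that combines ldarg, bgt.s, ldc, call, and ret. The assembly name OpcodeSketch, the class OpcodeDemo, and the method Larger are hypothetical names chosen for illustration only.

```il
// Hypothetical names throughout; assemble with: ilasm /exe OpcodeSketch.il
.assembly extern mscorlib {}
.assembly OpcodeSketch {}

.class private auto ansi beforefieldinit OpcodeDemo extends [mscorlib]System.Object
{
  // Returns the larger of two int32 values using a conditional branch.
  .method private hidebysig static int32 Larger(int32 x, int32 y) cil managed
  {
    .maxstack 2
    ldarg.0                 // push x
    ldarg.1                 // push y
    bgt.s FirstIsLarger     // pops both; jumps if x > y
    ldarg.1                 // otherwise return y
    ret
  FirstIsLarger:
    ldarg.0                 // return x
    ret
  }

  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 2
    ldc.i4.7                // push the constant 7
    ldc.i4.3                // push the constant 3
    call int32 OpcodeDemo::Larger(int32, int32)
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 7
    ret
  }
}
```

Note that the only label in the sketch is the one used as a branch target; the comparison and the jump are fused into the single bgt.s instruction.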
Similarly, Table 2 illustrates how common C# data types map to corresponding CIL types.
Table 2: CIL Data Types Mapping
| CIL Data Type | C# Counterpart |
|---|---|
| int32 | int |
| unsigned int32 | uint |
| int64 | long |
| float32 | float |
| float64 | double |
| bool | bool |
| string | string |
| object | object |
| char | char |
| unsigned int8 | byte |
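The mapping shows up most visibly in method signatures. The sketch below declares a method that a C# developer would write as static double Half(int number); in CIL the same signature uses float64 and int32. The names TypeSketch, TypeDemo, and Half are assumptions for illustration.

```il
// Hypothetical names throughout.
.assembly extern mscorlib {}
.assembly TypeSketch {}

.class private auto ansi beforefieldinit TypeDemo extends [mscorlib]System.Object
{
  // C#: static double Half(int number)
  .method private hidebysig static float64 Half(int32 number) cil managed
  {
    .maxstack 2
    ldarg.0          // push the int32 argument
    conv.r8          // convert it to float64 (C# double)
    ldc.r8 2.0       // push the float64 constant 2.0
    div
    ret              // return the float64 result
  }

  // C#: static void Main()
  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 1
    ldc.i4.7
    call float64 TypeDemo::Half(int32)
    call void [mscorlib]System.Console::WriteLine(float64)
    ret
  }
}
```

Because CIL signatures are matched exactly, the call sites must spell out int32 and float64 as well; writing int or double there would not assemble.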
Creating Your First IL Program
So, ready to take up the challenge? Authoring pure IL code is rather cumbersome compared to authoring in a language like C#. We can develop any type of application in IL, for instance console, Windows, and Web-based applications, but developers might feel restricted while coding without IntelliSense support. IL code can be written in any plain text editor such as Notepad, which is part of the beauty of IL coding. We are going to write a simple "Hello World!" program in Notepad and later compile it using the ILASM.EXE utility. To do this, open Notepad, enter the code shown in Listing 2, and save it with an *.IL extension, such as "HelloWorld.il".
Listing 2: First “Hello World” program coding in IL
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
  .ver 4:0:0:0
}
.assembly cilHelloWorld
{
  .hash algorithm 0x00008004
  .ver 1:0:0:0
}
.module cilHelloWorld.exe
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003
.corflags 0x00020003
// =============== CLASS MEMBERS DECLARATION ===================//
.class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object
{
  .method private hidebysig static void Main(string[] args) cil managed
  {
    .entrypoint
    .maxstack 8
    IL_0000: nop
    IL_0001: ldstr "First CIL program, Hello World!"
    IL_0006: call void [mscorlib]System.Console::WriteLine(string)
    IL_000b: nop
    IL_000c: call string [mscorlib]System.Console::ReadLine()
    IL_0011: pop
    IL_0012: ret
  }
  //=================Constructor================================//
  .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
  {
    .maxstack 8
    IL_0000: ldarg.0
    IL_0001: call instance void [mscorlib]System.Object::.ctor()
    IL_0006: ret
  } // end of constructor
}
// ======================End of Class================================//
As mentioned, CIL supports .NET classes, methods, namespaces, and types using attributes and directives. The important thing to remember is that CIL directives are always written with a dot prefix (for example, .assembly, .class, and .method), whereas the attributes that qualify a directive (such as public, static, or extends) are written without one.
To build this program, open the Visual Studio command prompt and compile with ILASM.EXE, which assembles the HelloWorld.il file and produces a corresponding executable file.
HelloWorld.il Compilation Process using ILASM.EXE
E:\>ilasm /exe HelloWorld.il /output=CompileHelloWorld.exe
Microsoft (R) .NET Framework IL Assembler. Version 4.0.30319.17929
Copyright (c) Microsoft Corporation. All rights reserved.
Assembling 'HelloWorld.il' to EXE --> 'CompileHelloWorld.exe'
Source file is UTF-8
Assembled method cilHelloWorld.Program::Main
Creating PE file
Emitting classes:
Class 1: cilHelloWorld.Program
Emitting fields and methods:
Global
Class 1 Methods: 1;
Emitting events and properties:
Global
Class 1
Writing PE file
Operation completed successfully
After finishing the IL coding, or after making any modification to the code, it is recommended that you verify each build using the PEVERIFY.EXE command-line utility, which checks the CIL and metadata of the specified assembly for validity and type safety.
CompileHelloWorld.exe Verification
E:\>peverify CompileHelloWorld.exe
Microsoft (R) .NET Framework PE Verifier. Version 4.0.30319.18020
Copyright (c) Microsoft Corporation. All rights reserved.
All Classes and Methods in CompileHelloWorld.exe Verified.
Finally, it is time to test the generated .NET assembly (executable) file, to confirm it produces the desired output. The program can be run by entering its name at the command prompt.
CompileHelloWorld.exe Execution
E:\>CompileHelloWorld.exe
First CIL program, Hello World!
Programmers usually don't need to be too concerned with the binary opcodes unless they write some extremely low-level .NET software. Instead, CIL code is of particular interest to those who reverse engineer .NET software, attempt to fix buggy software, or look for subtle vulnerabilities by disassembling an executable. Coding glitches that inadvertently make it into the final executable can sometimes be exploited later by malicious hackers. Reverse engineers also tend to use CIL code to add or remove features in existing software when the source code is not available.
Code Analysis
The HelloWorld.il file begins with the .assembly extern directive, which references the MSCORLIB.DLL file. The .publickeytoken directive specifies the public key token value of MSCORLIB.DLL, and the .ver directive specifies the version of the referenced assembly (here 4.0.0.0, the mscorlib that ships with .NET Framework 4).
.assembly extern mscorlib
{
.publickeytoken = (B7 7A 5C 56 19 34 E0 89)
.ver 4:0:0:0
}
The next section defines the assembly name as "cilHelloWorld", followed by its hashing algorithm and its version number, 1.0.0.0.
.assembly cilHelloWorld
{
.hash algorithm 0x00008004
.ver 1:0:0:0
}
Then the .module directive declares the name of the module, here cilHelloWorld.exe. Whether an executable or a DLL is actually produced is controlled by ILASM.EXE's /exe or /dll option.
.module cilHelloWorld.exe
Thereafter, the .imagebase directive is set to 0x00400000, which establishes the base address where the binary is to be loaded.
.imagebase 0x00400000
The .file alignment directive specifies the alignment of the sections within the generated PE file, here 0x200 (512) bytes.
.file alignment 0x00000200
The .stackreserve directive configures the default stack size to 0x00100000.
.stackreserve 0x00100000
The .subsystem directive indicates if the application is a console or GUI-based program. Here, it is set to 3 for a console-based program. Set it to 2 for GUI-based programs.
.subsystem 0x0003
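The .corflags directive sets the runtime flags stored in the CLI header of the file, which record, for example, whether the image contains only IL code and whether it should run as a 32-bit process.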
.corflags 0x00020003
After defining all essential directives such as .module, .corflags, .imagebase and so on, we define the Program class type, which extends the System.Object type. Here, beforefieldinit tells the runtime that the type's initializer only needs to run at some point before the first access to one of the type's static fields, rather than at a precisely defined moment.
.class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object
I will discuss all .NET type definitions in detail, in terms of IL coding, in an upcoming article but here, it is essential to mention the definition of a default class constructor in the IL file.
.method public hidebysig specialname rtspecialname instance void .ctor() cil managed
The Program class contains the definition of the application entry point method, void Main. Here, the hidebysig attribute indicates that the method hides base class members with the same name and signature, rather than every member with the same name.
.method private hidebysig static void Main(string[] args) cil managed
The method that serves as the entry point of a program always contains the following directive.
.entrypoint
The .maxstack directive, set here to the default value of 8, specifies the maximum number of values that may be on the evaluation stack at any one time while the method executes.
.maxstack 8
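To see how a .maxstack value relates to the instructions in a method, the following sketch evaluates (2 + 3) * 4 and tracks the stack depth in comments. The assembly and class names are hypothetical; the point is only that the depth never exceeds 2, so .maxstack 2 is sufficient for this method.

```il
// Hypothetical names throughout.
.assembly extern mscorlib {}
.assembly StackSketch {}

.class private auto ansi beforefieldinit StackDemo extends [mscorlib]System.Object
{
  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 2     // at most two values are on the stack at once
    ldc.i4.2        // stack depth: 1
    ldc.i4.3        // stack depth: 2
    add             // stack depth: 1 (value 5)
    ldc.i4.4        // stack depth: 2
    mul             // stack depth: 1 (value 20)
    call void [mscorlib]System.Console::WriteLine(int32)   // stack depth: 0, prints 20
    ret
  }
}
```

Declaring a larger value than needed, such as the 8 used in Listing 2, is harmless; declaring a smaller one than the method actually requires makes the method invalid.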
Now, the real implementation starts in the Main() method body. The tokens at the start of each line (such as IL_0001 and IL_0006) are called code labels. In fact, these code labels are completely optional and can be removed unless an instruction is the target of a branch.
IL_0000: nop
IL_0001: ldstr "First CIL program, Hello World!"
IL_0006: call void [mscorlib]System.Console::WriteLine(string)
IL_000b: nop
IL_000c: call string [mscorlib]System.Console::ReadLine()
IL_0011: pop
IL_0012: ret
The code above starts with nop (no operation). Then the ldstr instruction loads the string "First CIL program, Hello World!" onto the stack, and the call instruction invokes the Console.WriteLine() method to print that string. After another nop, a second call invokes Console.ReadLine(), which waits for input and pushes the returned string onto the stack. Because that value is not needed, the pop instruction removes it from the top of the stack and discards it, and the method then returns using the ret instruction.
Conclusion
As we have seen, .NET assemblies contain CIL code, which is compiled to platform-specific instructions by the JIT compiler. In addition, we have explored assembly metadata and manifest contents by examining CIL opcodes with the ILDASM.EXE utility, and reviewed the keywords typically used when writing CIL code. Using only IL directives, opcodes, and labels, we have created a simple "Hello World!" program in genuine IL code, and come to better understand IL code in the process.
Note: This article is the first of a two-part series on MSIL programming. Also see MSIL Programming Part 2.
Author Information
Ajay Yadav
Ajay Yadav is an author, Cyber Security Specialist, Subject-Matter-Expert, Software Engineer, and System Programmer with more than eight years of work experience in diverse technology domains. He earned Master's and Bachelor's degrees in Computer Science, along with numerous premier professional certifications from Microsoft, EC-council, and Red-hat. For several years, he has been researching Reverse Engineering, Secure Source Coding, Advanced Software Debugging, Vulnerability Assessment, System Programming, and Exploit Development. He is a regular contributor to various international programming journals and assists the developer community by writing blogs, research articles, tutorials, training material, and books on sophisticated technology. His spare-time activities include tourism, movies, and meditation. He can be reached at om.ajay007@gmail.com.