

MSIL Programming Part 1

By Ajay Yadav on 10/24/2014
Language: CIL
Technology: .NET
Platform: Windows
License: CPOL


Introduction

Source code that is written for and executed under the .NET Common Language Runtime (CLR) is referred to as managed code. A managed compiler translates the associated source code files into low-level .NET CIL code, an assembly manifest, and type metadata. CIL (also known as MSIL) is itself one of the programming languages the .NET Framework supports: we can create, build, and run a .NET application from standalone CIL code. Moreover, CIL is the backbone of every .NET assembly, and the deeper you dig into the CIL instruction set, the better you will understand the inner workings of advanced .NET application development. This article introduces CIL code semantics through a simple program written with CIL opcodes, and discusses the role of the CIL assembler, ILASM.EXE, in building and executing the resulting .NET assembly without the typical Visual Studio IDE build process.

Essentials

Programming directly in CIL can be a challenge because the developer works with the CLR's native instruction set. These instructions are referred to as opcodes, and they are far terser than user-friendly languages such as C#, F#, and VB.NET, which use a more English-like syntax. It is advisable to have the following tools installed when programming in CIL directly.

  • Visual Studio IDE 2010, or later
  • .NET Framework 4.0, or later
  • ILDASM.EXE, ILASM.EXE utilities
  • Notepad++
  • SharpDevelop (optional)
  • Xamarin Studio (optional)

Although CIL code can be authored in a simple text editor such as Notepad, it is recommended that you write CIL code using a full-fledged editor, like SharpDevelop.

MSIL Internals

A .NET assembly contains CIL code, which is conceptually similar to Java bytecode in that it is not compiled to platform-specific instructions until it is about to be executed. The .NET CLR provides a just-in-time (JIT) compiler for each CPU targeted by the runtime, each optimized for the underlying platform. .NET binaries also contain type metadata that describes the characteristics of every type within the binary. In addition, the assembly itself is described by metadata officially termed a manifest, which contains information about the current version of the assembly, a list of all externally referenced assemblies, and culture information.

Figure 1 illustrates that .NET source code is eventually compiled into CIL, rather than directly to native machine instructions. Because all .NET languages are compiled to a common intermediate language, components written in different .NET languages can interact with each other. Furthermore, CIL code provides the same benefit Java professionals have become accustomed to with bytecode: a binary that is not tied to a single platform.

Figure 1: The .NET Compilation Life-Cycle


.NET compilers translate each supported programming language to CIL mnemonics. Intermediate language (IL) code tends to be cryptic and difficult to understand. For instance, to load a string onto the stack, we don't employ a user-friendly opcode name such as StringLoading, but rather ldstr. Consider the C# program in Listing 1. This simple program adds two numeric values using the testCalculation method. The resulting .NET binary does not contain platform-specific instructions, but rather platform-agnostic IL code, which is generated by the C# compiler (CSC.EXE) during the build process.

Listing 1: Simple C# console application

using System;

class Program
{
    static void Main(string[] args)
    {
        // Method Calling
        testCalculation(20,40);
        Console.ReadKey(); 
    }
        
    // Demo static Method
    static void testCalculation(int iPar1, int iPar2)
    {
        int Result;
        Result = iPar1 + iPar2;
        Console.WriteLine("Calculation Output :: {0}",Result);  
    }
}

Once you compile this code, you end up with a single *.EXE assembly that contains a manifest, metadata, and CIL instructions. At run time, the CLR locates this .NET binary and loads it into memory. Fortunately, the .NET framework ships with an excellent utility to disassemble any .NET binary into its corresponding IL code: ILDASM.EXE.

We can use the ILDASM.EXE utility to disassemble the IL code either in command-prompt mode or through its GUI. If you open this assembly using ILDASM.EXE in GUI mode, you can see how each C# statement is represented by the corresponding CIL opcodes, as shown in Figure 2.

Figure 2: The CIL Type.exe Assembly


ILDASM.EXE loads any .NET assembly and displays its contents, including CIL code, manifest, and metadata. It is also capable of dumping the entire contents of a .NET binary as CIL text. Let's double-click the testCalculation method to examine its generated CIL, as shown in Figure 3.
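As a textual companion to the disassembly, the following is a hedged sketch of roughly what a debug build of Listing 1 contains. The exact instructions and the .maxstack value depend on the compiler version and build settings, so treat it as an approximation rather than the precise ILDASM.EXE output. The assembly name CalcDemo and the empty mscorlib reference are assumptions made so that the file can be assembled on its own with ILASM.EXE.

```cil
// Approximate CIL for Listing 1 (assumed names: CalcDemo; labels omitted)
.assembly extern mscorlib {}
.assembly CalcDemo {}

.class private auto ansi beforefieldinit Program extends [mscorlib]System.Object
{
    .method private hidebysig static void testCalculation(int32 iPar1, int32 iPar2) cil managed
    {
        .maxstack 2
        .locals init (int32 Result)       // local slot 0
        nop
        ldarg.0                           // push iPar1
        ldarg.1                           // push iPar2
        add                               // iPar1 + iPar2
        stloc.0                           // Result = sum
        ldstr "Calculation Output :: {0}"
        ldloc.0                           // push Result
        box [mscorlib]System.Int32        // WriteLine(string, object) expects an object
        call void [mscorlib]System.Console::WriteLine(string, object)
        nop
        ret
    }

    .method private hidebysig static void Main(string[] args) cil managed
    {
        .entrypoint
        .maxstack 2
        ldc.i4.s 20
        ldc.i4.s 40
        call void Program::testCalculation(int32, int32)
        call valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
        pop                               // discard the unused ConsoleKeyInfo
        ret
    }
}
```

Assembled and run, this sketch should print "Calculation Output :: 60" and then wait for a key press, matching the behavior of the C# program.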

Figure 3: CIL Code


In addition, you can press Ctrl+M to explore the type metadata for the currently loaded assembly. This displays the metadata for testCalculation, among other members, as shown in Figure 4.

Figure 4: Metadata


IL Opcode Grammar

CIL is a full-fledged, object-oriented programming language like C#. It supports all the typical OOP features such as classes, inheritance, interfaces, control flow, and much more. As previously mentioned, we can author a .NET application directly in MSIL without even using the Visual Studio IDE. But why is CIL programming so important to understand? Because it helps developers write, debug, and maintain code better. Table 1 gives a brief list, with descriptions, of typical Common Intermediate Language (CIL) instructions.

Table 1: IL opcode meanings

Opcode                    Description
nop                       No operation is performed
sub, div, add, mul, rem   Perform basic arithmetic operations
add.ovf                   Adds two signed integer values with an overflow check
add.ovf.un                Adds two unsigned integer values with an overflow check
box, unbox                Convert a value type to an object reference (box) and back (unbox)
br.s, br                  Unconditionally branch to the target label
castclass                 Casts an object reference to the specified class
bgt                       Branches to the target if value1 is greater than value2
beq                       Branches to the target if value1 is equal to value2
ble                       Branches to the target if value1 is less than or equal to value2
break                     Signals a breakpoint to an attached debugger
brnull                    Branches to the target if the value is null
dup                       Duplicates the value on top of the stack
call                      Calls the specified method
ceq                       Pushes 1 if value1 is equal to value2, otherwise 0
cgt                       Pushes 1 if value1 is greater than value2, otherwise 0
clt                       Pushes 1 if value1 is less than value2, otherwise 0
ldc                       Loads a constant value onto the stack
ldobj                     Copies the value stored at an address onto the stack
ldstr                     Loads a string literal onto the stack
ldarg                     Loads the value of a method argument onto the stack
arglist                   Returns the argument list handle for the current method
readonly                  Prefix: the following array address operation performs no type check at run time
starg                     Pops a value from the stack and stores it in a method argument
stloc                     Pops a value from the stack and stores it in a local variable
stobj                     Stores a value of the given type at a memory address
callvirt                  Calls a method on an object, resolving virtual methods at run time
brtrue.s                  Branches to the target if the value is non-zero or true
brfalse.s                 Branches to the target if the value is zero or false
pop                       Removes the value on top of the stack
ret                       Returns from the current method
throw                     Throws an exception
rethrow                   Rethrows the current exception
tail                      Prefix: the following call terminates the current method (tail call)
volatile                  Prefix: the following pointer reference is volatile
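
To see a few of these opcodes working together, here is a minimal sketch that combines ldarg, a conditional branch, ldc, call, and ret. The assembly name OpcodeDemo, the method Max, and the label ReturnA are hypothetical names chosen for illustration. The short form bgt.s is used in place of bgt because the branch target is nearby.

```cil
// Sketch: returning the larger of two integers (all names are illustrative)
.assembly extern mscorlib {}
.assembly OpcodeDemo {}

.class public auto ansi OpcodeDemo.Program extends [mscorlib]System.Object
{
    .method public static int32 Max(int32 a, int32 b) cil managed
    {
        .maxstack 2
        ldarg.0               // push a (value1)
        ldarg.1               // push b (value2)
        bgt.s ReturnA         // pop both; jump if a > b
        ldarg.1               // otherwise return b
        ret
    ReturnA:
        ldarg.0               // return a
        ret
    }

    .method public static void Main() cil managed
    {
        .entrypoint
        .maxstack 2
        ldc.i4.s 20           // load the constant 20
        ldc.i4.s 40           // load the constant 40
        call int32 OpcodeDemo.Program::Max(int32, int32)
        call void [mscorlib]System.Console::WriteLine(int32)
        ret
    }
}
```

With the arguments 20 and 40, the branch is not taken, so the program should print 40.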

Similarly, Table 2 illustrates how common C# data types map to their corresponding CIL types.

Table 2: CIL Data Types Mapping

CIL Data Type     C# Counterpart
int32             int
unsigned int32    uint
int64             long
float32           float
float64           double
bool              bool
string            string
object            object
char              char
unsigned int8     byte
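
The mapping becomes concrete in method signatures. The sketch below shows how a C# method declared as static bool IsLonger(string text, int limit) might be written by hand using the CIL type names string, int32, and bool. The names TypeDemo and IsLonger are hypothetical, and the body is one possible hand-written form, not compiler output.

```cil
// Sketch: CIL type names in a method signature (all names are illustrative)
.assembly extern mscorlib {}
.assembly TypeDemo {}

.class public auto ansi TypeDemo.Program extends [mscorlib]System.Object
{
    // C# equivalent: static bool IsLonger(string text, int limit)
    .method public static bool IsLonger(string text, int32 limit) cil managed
    {
        .maxstack 2
        ldarg.0                                                     // push text
        callvirt instance int32 [mscorlib]System.String::get_Length()
        ldarg.1                                                     // push limit
        cgt                                                         // 1 if Length > limit, else 0
        ret
    }

    .method public static void Main() cil managed
    {
        .entrypoint
        .maxstack 2
        ldstr "CIL"
        ldc.i4.2
        call bool TypeDemo.Program::IsLonger(string, int32)
        call void [mscorlib]System.Console::WriteLine(bool)
        ret
    }
}
```

Because "CIL" has three characters and the limit is 2, this should print True.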

Creating Your First IL Program

So, ready to take up the challenge? Authoring pure IL code is a rather cumbersome task compared to authoring in a language like C#. We can develop any type of application in IL, including console, Windows, and Web-based applications, but developers might feel restricted while coding without IntelliSense support. IL coding can be done in any plain text editor such as Notepad; this is the real beauty of IL coding. We are going to write a simple “Hello World!” program in Notepad and later assemble that code using the ILASM.EXE utility. To do this, open Notepad, enter the code shown in Listing 2, and save it with an *.IL extension, such as "HelloWorld.il".

Listing 2: First “Hello World” program coding in IL

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )        
  .ver 4:0:0:0
}
.assembly cilHelloWorld
{
  .hash algorithm 0x00008004
  .ver 1:0:0:0
}
.module cilHelloWorld.exe

.imagebase           0x00400000
.file alignment      0x00000200
.stackreserve        0x00100000
.subsystem           0x0003       
.corflags            0x00020003   

// =============== CLASS MEMBERS DECLARATION ===================//

.class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object
{
    .method private hidebysig static void  Main(string[] args) cil managed
    {
        .entrypoint
        .maxstack  8
        IL_0000:  nop
        IL_0001:  ldstr      "First CIL program, Hello World!"
        IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_000b:  nop
        IL_000c:  call       string [mscorlib]System.Console::ReadLine()
        IL_0011:  pop
        IL_0012:  ret
    } 
    //=================Constructor================================//
    .method public hidebysig specialname rtspecialname instance void  .ctor() cil managed
    {
        .maxstack  8
        IL_0000:  ldarg.0
        IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
        IL_0006:  ret
    } // end of constructor
}

// ======================End of Class================================//

As mentioned, CIL describes .NET classes, methods, namespaces, and types using directives and attributes. The important thing to remember is that CIL directives, such as .assembly, .class, and .method, always begin with a dot prefix, whereas attributes, such as public, static, and hidebysig, qualify a directive and carry no prefix.

To build this program, open the Visual Studio command prompt and run ILASM.EXE, which assembles the HelloWorld.il file and produces the corresponding executable file.

HelloWorld.il Compilation Process using ILASM.EXE

E:\>ilasm /exe HelloWorld.il /output=CompileHelloWorld.exe

Microsoft (R) .NET Framework IL Assembler.  Version 4.0.30319.17929
Copyright (c) Microsoft Corporation.  All rights reserved.
Assembling 'HelloWorld.il'  to EXE --> 'CompileHelloWorld.exe'
Source file is UTF-8

Assembled method cilHelloWorld.Program::Main
Creating PE file

Emitting classes:
Class 1:        cilHelloWorld.Program

Emitting fields and methods:
Global
Class 1 Methods: 1;

Emitting events and properties:
Global
Class 1
Writing PE file
Operation completed successfully

After finishing with the IL coding or making any type of modification to the code, it is recommended that you verify each build using the PEVERIFY.EXE command-line utility, which checks the metadata of the specified assembly and verifies that its CIL code is valid and type safe.

CompileHelloWorld.exe Verification

E:\>peverify CompileHelloWorld.exe

Microsoft (R) .NET Framework PE Verifier.  Version  4.0.30319.18020
Copyright (c) Microsoft Corporation.  All rights reserved.

All Classes and Methods in CompileHelloWorld.exe Verified.

Finally, it is time to test the generated .NET assembly (executable) file, to confirm it produces the desired output. The program can be run by entering its name at the command prompt.

CompileHelloWorld.exe Execution

E:\>CompileHelloWorld.exe
First CIL program, Hello World!

Programmers usually don't need to be too concerned with the binary opcodes unless they write extremely low-level .NET software. Instead, CIL code might be of particular interest to those trying to reverse engineer .NET software, fix buggy software, or detect subtle vulnerabilities by disassembling the executable. Coding glitches that are missed and inadvertently make it into the final executable can be exploited later by malicious hackers. Reverse engineers also tend to use CIL code for adding or removing features in existing software when the source code is not available.

Code Analysis

The HelloWorld.il file begins with the .assembly extern directive, which references the MSCORLIB.DLL assembly. The .publickeytoken directive specifies the public key token value of MSCORLIB.DLL, and the .ver directive specifies the version of that referenced assembly, which in turn corresponds to the version of the .NET platform that the application targets.

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89)        
  .ver 4:0:0:0
}

The next section defines the assembly name as "cilHelloWorld", followed by its hashing algorithm (0x00008004 identifies SHA-1) and its version number, 1.0.0.0.

.assembly cilHelloWorld
{
  .hash algorithm 0x00008004
  .ver 1:0:0:0
}

Then the .module directive names the module that makes up the assembly, here cilHelloWorld.exe. Whether the output is an executable or a DLL file is decided by the /exe or /dll option passed to ILASM.EXE.

.module cilHelloWorld.exe

Thereafter, the .imagebase directive is set to 0x00400000, which establishes the preferred base address at which the binary is loaded.

.imagebase   0x00400000

The .file alignment directive sets the alignment of sections within the output PE file, here 0x00000200 (512 bytes).

.file alignment      0x00000200

The .stackreserve directive configures the default stack size to 0x00100000 (1 MB).

.stackreserve        0x00100000

The .subsystem directive indicates if the application is a console or GUI-based program. Here, it is set to 3 for a console-based program. Set it to 2 for GUI-based programs.

.subsystem           0x0003

The .corflags directive sets the run-time flags stored in the CLI header of the image. Here, 0x00020003 combines the ILONLY flag (0x1) with the two 32-bit flags (0x2 and 0x00020000).

.corflags            0x00020003

After defining all essential directives such as .module, .corflags, .imagebase and so on, we define the Program class type, which extends the System.Object type. Here, the beforefieldinit attribute tells the runtime that it may initialize the type at any time before the first access to one of its static fields, rather than at one precisely defined moment.

.class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object

I will discuss .NET type definitions in detail, in terms of IL coding, in an upcoming article, but it is essential to mention here the definition of the default class constructor in the IL file.

.method public hidebysig specialname rtspecialname instance void .ctor() cil managed

The Program class contains the definition of the application entry point method, void Main. Here, the hidebysig attribute (hide by signature) indicates that the method hides any base class member with the same name and signature, as follows.

.method private hidebysig static void Main(string[] args) cil managed

The method that serves as the entry point of a program always contains the following directive.

.entrypoint

The .maxstack directive, set here to the default value of 8, specifies the maximum number of items that may be on the evaluation stack at any one time while the method executes.

.maxstack 8
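
To make the stack depth concrete, the following sketch counts the items on the evaluation stack instruction by instruction. It is a hand-written illustration with a hypothetical assembly name, MaxStackDemo, and it uses a global method as the entry point to keep the file short. The value 3 is the smallest .maxstack that covers this particular body.

```cil
// Sketch: deriving .maxstack from the peak stack depth (names are illustrative)
.assembly extern mscorlib {}
.assembly MaxStackDemo {}

.method static void Main() cil managed
{
    .entrypoint
    .maxstack 3                       // peak depth: format string plus two boxed values
    ldstr "{0} + {1}"                 // depth 1
    ldc.i4.s 20                       // depth 2
    box [mscorlib]System.Int32        // still 2: the int32 is replaced by an object reference
    ldc.i4.s 40                       // depth 3
    box [mscorlib]System.Int32        // still 3
    call void [mscorlib]System.Console::WriteLine(string, object, object)   // depth 0
    ret
}
```

Run as written, this should print "20 + 40". Declaring a larger value, such as the default 8, is also valid; declaring less than the peak depth makes the method invalid.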

Now, the real implementation starts in the Main() method body, where each instruction is preceded by a token. These tokens are called code labels (IL_0001, IL_0006). In fact, these code labels are completely optional and can be removed wherever no branch instruction refers to them.

IL_0000:  nop
IL_0001:  ldstr      "First CIL program, Hello World!"
IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
IL_000b:  nop
IL_000c:  call       string [mscorlib]System.Console::ReadLine()
IL_0011:  pop
IL_0012:  ret

The code above starts with nop (no operation). Then the ldstr instruction loads the string "First CIL program, Hello World!" onto the stack, and the call instruction invokes the Console.WriteLine() method to print that string. After another nop, a second call invokes Console.ReadLine(), which waits for input and pushes the returned string onto the stack. The pop instruction then discards that unused value from the top of the stack, and the method returns using the ret instruction.
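
For contrast, here is a hedged variant in which the value returned by Console.ReadLine() is kept rather than discarded: stloc moves it from the stack into a local variable, and ldloc pushes it back when it is needed. The assembly name EchoDemo and the local name line are hypothetical, and the IL_xxxx labels are left out because nothing branches to them.

```cil
// Sketch: storing a return value with stloc instead of discarding it with pop
.assembly extern mscorlib {}
.assembly EchoDemo {}

.method static void Main() cil managed
{
    .entrypoint
    .maxstack 2
    .locals init (string line)        // local slot 0
    ldstr "Type something and press Enter:"
    call void [mscorlib]System.Console::WriteLine(string)
    call string [mscorlib]System.Console::ReadLine()   // pushes the typed text
    stloc.0                           // pop it into 'line'
    ldstr "You typed: {0}"
    ldloc.0                           // push 'line' back onto the stack
    call void [mscorlib]System.Console::WriteLine(string, object)
    ret
}
```

Every value pushed by a call must be consumed before ret in a void method, which is why the original listing needs pop when it ignores the input.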

Conclusion

As we have seen, .NET assemblies contain CIL code, which the JIT compiler translates to platform-specific instructions. In addition, we have explored assembly metadata and manifest contents by examining CIL opcodes with the ILDASM.EXE utility, and described the keywords typically used when writing CIL code. Using only IL directives, attributes, and opcodes, we have created a simple "Hello World!" program in genuine IL code, and come to better understand IL code in the process.

Note: This article is the first of a two-part series on MSIL programming. Also see MSIL Programming Part 2.


End-User License

Use of this article and any related source code or other files is governed by the terms and conditions of The Code Project Open License.

Author Information

Ajay Yadav

Ajay Yadav is an author, Cyber Security Specialist, Subject-Matter-Expert, Software Engineer, and System Programmer with more than eight years of work experience in diverse technology domains. He holds master's and bachelor's degrees in Computer Science, along with numerous premier professional certifications from Microsoft, EC-council, and Red-hat. For several years, he has been researching Reverse Engineering, Secure Source Coding, Advanced Software Debugging, Vulnerability Assessment, System Programming, and Exploit Development. He is a regular contributor to various international programming journals and assists the developer community by writing blogs, research articles, tutorials, training material, and books on sophisticated technology. His spare-time activities include tourism, movies, and meditation. He can be reached at om.ajay007@gmail.com.