VMRay Analyzer is still in its beta phase, but we plan to finish our first product release soon. In the past few weeks, we not only fixed bugs but also improved our software by adding a number of new features. As you may already know, our new analyzer is not only capable of dissecting different kinds of usermode malware but can also offer comprehensive analysis of 64-bit kernelmode rootkits, such as TDL4, Gapz, and Uroburos. Since dynamic rootkit analysis is a brand new subject, and kernel land monitoring is completely different from dealing with usermode components, we needed some new concepts to process and display the results. In the following sections, we describe how Code Blocks, Execution Paths, and Continuations can assist us in understanding the semantics and control flow of a kernel rootkit.
Code Blocks and Execution Paths
A code block (CB) is a sequence of instructions that starts at a certain memory address and is then executed sequentially. In contrast to a basic block, a code block may contain multiple branch operations and may hence be executed in the form of different execution paths (EP). Usually a code block is simply a function that is first called by the operating system or a driver, then performs some subfunction calls on its own, and then returns. In some cases, it can be an APC, a DPC, or just a thread routine. In other cases, it is either an interrupt handler or a callback function. While the instructions of a code block are normally stored contiguously, this is not necessary. In fact, the instructions can be placed in arbitrary locations. The only important thing is that they are connected by direct control flow instructions, e.g., CALL or RET operations.
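To make the relationship between code blocks and execution paths concrete, the following minimal sketch groups recorded traces by their start address (the code block) and counts each distinct sequence of executed instruction addresses (the execution path). The trace format, the addresses, and the function name group_execution_paths are assumptions made purely for illustration; they do not reflect the internal data model of VMRay Analyzer.

```python
from collections import Counter, defaultdict


def group_execution_paths(traces):
    """Group traces into code blocks and count their distinct execution paths.

    Each trace is a sequence of executed instruction addresses
    (a hypothetical format chosen for this example).
    """
    blocks = defaultdict(Counter)
    for trace in traces:
        path = tuple(trace)
        start = path[0]            # a code block is identified by its start address
        blocks[start][path] += 1   # same start, different sequence: another execution path
    return blocks


# Hypothetical traces: the block at 0x1000 is executed along two different paths.
traces = [
    [0x1000, 0x1004, 0x1010],
    [0x1000, 0x1004, 0x1010],
    [0x1000, 0x1004, 0x1008, 0x1020],
    [0x2000, 0x2004],
]

blocks = group_execution_paths(traces)
for start, paths in blocks.items():
    total = sum(paths.values())
    print(f"CB at {start:#x}: executed {total} times, {len(paths)} execution path(s)")
```

Only paths that were actually observed are counted, which mirrors the limitation of any dynamic analysis: branches that never executed do not show up.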
During our analysis we identify all executed code blocks, record all different observable execution paths, and display them appropriately. The following figure shows an example output of this:
In this example, code block CB #6 is executed 2,222 times in the form of the three different execution paths EP #10 (2,190 times), EP #12 (once), and EP #17 (31 times). Please note that there may be more possible execution paths than shown; only these three were executed during our analysis. If we take a look at the different execution paths, we see the same starting address and the same starting instructions in all three cases. Let’s start with EP #10:
The return value of each of these checks is 0 if a module other than kernel32.dll has been loaded. In our example, this is the case for the 2,190 other modules that have been loaded. However, there are two more execution paths for this code block in which a different behavior can be seen. Let’s first look at EP #17, which is executed 31 times within 31 processes:
Within this execution path the check for *kernel32.dll is successful and thus, the resulting sequence differs from EP #10. This time the malware sets up an Asynchronous Procedure Call (APC) and then also returns. The third execution path of this code block, EP #12, is executed only once. This happens within the process smss.exe, where the rootkit behaves slightly differently, but in the end it also sets up the APC and then returns:
Code Block Chaining and Continuations
The concept of code blocks and execution paths already helps us a lot with understanding the semantics of unknown kernel code. However, we have combined it with an additional concept called Continuations. Kernel rootkits are normally not a monolithic piece of code. Rather, they interact with the system in various places and use different mechanisms to gain control whenever it is needed. In order to understand the correlation and interaction between the different parts of the rootkit, we identify related code blocks and chain them together. This time, in contrast to code blocks, we focus on instructions that are connected by indirect control flow. If, for example, one code block creates a new thread and another code block constitutes the executed thread routine, a continuation relationship is created between the creator of the thread and its target function.
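The chaining idea can be sketched as follows: whenever a code block hands a routine address to the system (by creating a thread, for example), that address is remembered together with the registering block; when a later code block starts at a remembered address, an edge is drawn between the two. The class ContinuationTracker, its method names, the block labels, and the addresses are hypothetical and only illustrate the principle, not the analyzer's actual implementation.

```python
class ContinuationTracker:
    """Links code blocks that are connected only by indirect control flow."""

    def __init__(self):
        self.pending = {}  # routine address -> list of (registering block, mechanism)
        self.edges = []    # (source block, mechanism, target block)

    def on_register(self, source_block, mechanism, routine_address):
        # Called when a code block passes a routine address to the system.
        self.pending.setdefault(routine_address, []).append((source_block, mechanism))

    def on_block_start(self, block, start_address):
        # Called when a code block begins executing at start_address.
        for source_block, mechanism in self.pending.get(start_address, []):
            edge = (source_block, mechanism, block)
            if edge not in self.edges:
                self.edges.append(edge)


tracker = ContinuationTracker()
tracker.on_register("CB#A", "thread creation", 0xFFFFF88000AB1000)
tracker.on_block_start("CB#B", 0xFFFFF88000AB1000)  # the thread routine starts running
tracker.on_block_start("CB#C", 0xFFFFF88000AB2000)  # unrelated block, no edge is created
print(tracker.edges)
```

Keeping a list per routine address allows several code blocks to register the same target, in which case their chains merge at that target.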
The following example shows the interaction between the different initial stages of the TDL4 rootkit in the form of code block continuations:
In the picture above, you can see the different steps that were chained together:
Everything starts when the system is booted up and the kernel debugger initialization routine within kdcom.dll is called.
TDL4 overwrites this routine during the bootloader stage to gain control during system boot; hence, the code block CB#1 is triggered when this function is called.
CB#1 has only one execution path; hence, it is collapsed with EP#1 in the graph and no branches exist.
Within CB#1 a work item is created, which is later called internally by the kernel. Although there is no direct control flow between CB#1 and CB#2, we detect the indirect relation that was set up by calling ExQueueWorkItem.
CB#2 creates two different continuations: by calling PsSetLoadImageNotifyRoutine, it first registers a callback routine that is called each time a new module is mapped into memory, and then it creates another work item. Accordingly, there are two edges in the graph originating from CB#2/EP#2.
The “load image notification” callback routine is constituted by CB#6, which we already explained in the first part of this text. Please note that the example described above was generated on a different target operating system and, therefore, different numbers were used.
As described before, both EP#12 and EP#15 create an APC before returning. Since the same function address is used in both cases, the two execution paths merge again in CB#7.
All the following stages use the same concepts as before.
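The steps above can also be written down as a small edge list, which makes the fan-out at CB#2 and the merge at CB#7 easy to spot. This is only a sketch of the graph as described in the text: the tuple layout is an assumption, and the target of the second work item created by CB#2 is not named above, so it appears as a placeholder.

```python
from collections import defaultdict

# Edges taken from the walk-through: (source, mechanism, target).
# "<later stage>" is a placeholder, since the text does not name that target.
edges = [
    ("CB#1/EP#1", "work item (ExQueueWorkItem)", "CB#2"),
    ("CB#2/EP#2", "load image notify routine (PsSetLoadImageNotifyRoutine)", "CB#6"),
    ("CB#2/EP#2", "work item", "<later stage>"),
    ("CB#6/EP#12", "APC", "CB#7"),
    ("CB#6/EP#15", "APC", "CB#7"),
]

outgoing = defaultdict(list)
incoming = defaultdict(list)
for source, mechanism, target in edges:
    outgoing[source].append((mechanism, target))
    incoming[target].append(source)

# Several outgoing edges mean a fan-out; several incoming edges mean a merge point.
fan_out = [node for node, targets in outgoing.items() if len(targets) > 1]
merges = [node for node, sources in incoming.items() if len(sources) > 1]
print("fan-out:", fan_out)
print("merge points:", merges)
```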
Conclusion
Dynamic analysis of kernelmode code is a sophisticated and demanding task. In this scenario, the code under observation has the highest system privileges and can do whatever it wants. For instance, it may directly access and reconfigure the hardware, modify loaded system drivers, and even replace parts of the operating system. Therefore, one cannot rely on the traditional control flow mechanisms and transitions that are valid, and hence expected, in the regular mode of operation. Besides that, the malicious code is, in most cases, heavily fragmented and distributed over the entire system. There is no sense in simply listing all the monitored malicious operations in a row or grouping the observations per process and thread, as we are used to doing for usermode malware. Instead, we need new concepts and ideas on how we can aggregate and present the collected data in a way that constitutes a real benefit to the malware analyst. We are confident that we have succeeded in that task!