Introduction
In April 2022 we observed new Emotet samples that implemented considerable changes to the way they store and decode their configuration.
For Emotet, the relevant information stored in the configuration is an IP address and a port number. Each of them is stored in the form of a DWORD. Previously, those DWORDs were placed as an encrypted list in the data section of the PE file, and it was fairly easy to locate and extract the encrypted blob.
However, with the new changes this is no longer the case. In this blog post we’ll look at the introduced changes and explore an idea of how we can extract the configuration fully statically.
At the beginning of November 2022 Emotet restarted its spamming campaign after a longer break. The samples being distributed show only minor changes and the approach described in this blog post still applies.
The changes
In the newer version of Emotet, the configuration is no longer stored in an encrypted form in the data section. Instead, each IP and port pair is only obfuscated using data transform obfuscation like encoding literals (Figure 1-a).
The final value is only produced at runtime, when the instructions are executed. The code uses various arithmetic operations and random constants to obfuscate the actual data. However, if we look closely, most of it is just junk code. It is fairly easy to extract the values via dynamic analysis or emulation.
Of course ‘easy’ is very subjective here. You still need to set up the appropriate context, i.e., correct memory regions, stack alignment and CPU registers might be required. However, if that can be taken care of, you then just have to identify the relevant functions and emulate them. A fully static approach doesn’t require a context, but correct instruction processing is more difficult. Since the data is generated from arithmetic operations, you have to be able to resolve them correctly.
One option is to transform the instructions into a smaller, limited set of other instructions: an intermediate representation (IR) to which a series of compiler optimizations can be applied. This has the advantage that the deobfuscation logic can be implemented in a platform-independent way.
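To make the obfuscation described in this section more concrete, here is a minimal sketch of what an encoded literal looks like when rewritten in Python. The constants and the junk operations are hypothetical and not taken from a real sample; only the idea (a value assembled from arithmetic on random constants, surrounded by code that never reaches the output) comes from the samples discussed here.

```python
MASK32 = 0xFFFFFFFF  # keep every result inside the DWORD range


def encoded_literal() -> int:
    # Hypothetical random constants, not taken from a real sample.
    a = 0x5A3C9E71
    b = 0x45AC9E70

    # Junk: computed, but never used for the returned value.
    junk = (a * 0x1337 + 0x7F) & MASK32
    junk ^= b >> 3

    # The only operation that contributes to the output.
    return (a ^ b) & MASK32


print(hex(encoded_literal()))
```

An emulator simply runs all of this, junk included. A static approach has to work out that only the final XOR matters and then evaluate it.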
The IR Idea
We start our investigation by looking at the sample in IDA. Although IDA’s disassembly is very hard to understand since it’s just a bunch of arithmetic and boolean operations, the decompilation of the same function becomes very straightforward to read (Figure 1-b).
This is because IDA’s decompiler also works on an IR, which Hex-Rays calls microcode. Internally, the decompiler uses it to perform compiler optimizations. This step eliminates all the dead code, performs constant propagation and applies various other compiler optimization techniques.
They also have multiple levels of optimizations which one can explore using, e.g., the Lucid plugin. Since it works this well in IDA we are also going to attempt this route.
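The two optimizations named above can be sketched on a toy three-address program. This is not IDA’s microcode and not its optimizer; the statement format, the variable names and the constants are assumptions made up for illustration, and the program is assumed to be in SSA form (every name is assigned once).

```python
import operator

MASK32 = 0xFFFFFFFF
OPS = {"add": operator.add, "xor": operator.xor, "shl": operator.lshift}

# Toy program: (destination, operation, operands). All values are hypothetical.
PROGRAM = [
    ("v0", "const", (0x5A3C9E71,)),
    ("v1", "const", (0x45AC9E70,)),
    ("j0", "shl", ("v0", 3)),      # junk
    ("j1", "add", ("j0", 0x7F)),   # junk
    ("v2", "xor", ("v0", "v1")),   # the value we care about
]


def eliminate_dead_code(program, output):
    """Walk backwards and keep only statements the output depends on."""
    needed, kept = {output}, []
    for dest, op, operands in reversed(program):
        if dest in needed:
            kept.append((dest, op, operands))
            needed.update(o for o in operands if isinstance(o, str))
    return kept[::-1]


def propagate_constants(program):
    """Evaluate statements in order; assumes every operand is already known."""
    values = {}
    for dest, op, operands in program:
        args = [values[o] if isinstance(o, str) else o for o in operands]
        result = args[0] if op == "const" else OPS[op](*args)
        values[dest] = result & MASK32
    return values


live = eliminate_dead_code(PROGRAM, "v2")
print([dest for dest, _, _ in live])
print(hex(propagate_constants(live)["v2"]))
```

After the backward pass only `v0`, `v1` and `v2` survive, and folding them leaves a single constant, which is the effect we see in the decompiler output.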
IR overview
An intermediate representation (IR) is a representation (an abstract machine language) that sits between the source and the target language. The translation should happen without loss of information. It’s used, e.g., by compilers to perform optimizations that are independent of the target machine. Later, we’ll use this property to extract the configuration data.
For our purpose, we need something that is able to lift a binary without access to source code, is mature, well tested, and actively maintained. We chose VEX, which is the intermediate representation used by the Valgrind framework. It’s also used in the binary analysis framework angr, whose developers also maintain Python bindings called pyvex. VEX is able to lift binary code, doesn’t lose information, and is side-effect free.
Since we’ve chosen VEX we have to understand how it represents instructions. VEX divides the code into code blocks (superblocks, IRSB), which roughly correspond to a basic block. It then translates the instructions to statements (operations with side-effects) and expressions (operations without side-effects).
All the intermediate values are stored in temporary variables called tX, where X is a counter. Each translated block of code starts with an IMark. An IMark is in fact also a statement that just describes the location and length of the corresponding assembly instruction. The documentation can be found in this header file and some examples of such conversions in Table 1.
Table 1: Examples of x64 instructions and their VEX representation

Assembly: mov [rsp+18h+arg_0], 9A97h
VEX representation:
    ------ IMark(0x400004, 8, 0) ------
    t132 = Add64(t0,0x0000000000000020)
    STle(t132) = 0x00009a97
    PUT(rip) = 0x000000000040000c

Assembly: mov r8, rcx
VEX representation:
    ------ IMark(0x400014, 3, 0) ------
    t1 = GET:I64(rcx)
    PUT(r8) = t1
    PUT(rip) = 0x0000000000400017

Assembly: mov [rdx], ecx
VEX representation:
    ------ IMark(0x4000c8, 2, 0) ------
    t74 = GET:I64(rdx)
    t255 = 64to32(t253)
    STle(t74) = t255
    PUT(rip) = 0x00000000004000ca
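As a small sanity check of how to read an IMark, the sketch below parses the textual form of the three IMark statements from Table 1 and confirms that address plus length equals the value written to rip at the end of each translated instruction. It works on the printed text only and does not call pyvex; the helper name `next_address` is ours.

```python
import re

# IMark(address, length, delta); the third field is ignored here.
IMARK = re.compile(r"IMark\((0x[0-9a-fA-F]+),\s*(\d+),\s*(\d+)\)")
PUT_RIP = re.compile(r"PUT\(rip\) = (0x[0-9a-fA-F]+)")


def next_address(imark: str) -> int:
    """Return the address of the instruction following the marked one."""
    match = IMARK.search(imark)
    if match is None:
        raise ValueError(f"not an IMark statement: {imark!r}")
    address, length = int(match.group(1), 16), int(match.group(2))
    return address + length


# (IMark, final PUT) pairs copied from Table 1.
ROWS = [
    ("IMark(0x400004, 8, 0)", "PUT(rip) = 0x000000000040000c"),
    ("IMark(0x400014, 3, 0)", "PUT(rip) = 0x0000000000400017"),
    ("IMark(0x4000c8, 2, 0)", "PUT(rip) = 0x00000000004000ca"),
]

for imark, put in ROWS:
    rip = int(PUT_RIP.search(put).group(1), 16)
    print(hex(next_address(imark)), hex(rip), next_address(imark) == rip)
```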
Diving Deeper
Emotet implements the obfuscation of a single IP and port pair inside one unique function which consists of a single basic block with one input and one exit.
We can utilize pyvex to translate each function (basic block) into an intermediate representation which we can then work on. This gives us the advantage that we don’t have to worry too much about the differences between each function and each build of Emotet.
We always work on an IR and not on a complex instruction set. It also allows us to approach this problem in a more generic way. We start our deobfuscation journey by comparing the actual disassembly of an Emotet function, holding the encoded IP and port, with the VEX IR produced by parsing it (Figure 2).
As you might have noticed, the VEX output is very verbose and one assembly instruction corresponds to multiple statements and expressions.
Figure 2-a: IDA disassembly of an Emotet function
Figure 2-b: the IR representation
By first analyzing the disassembled function, we can identify certain points of interest. For example, since this is 64-bit code, we know that according to the Microsoft x64 calling convention the first four arguments are passed in the RCX, RDX, R8, and R9 registers. The function responsible for deobfuscating the IP and port takes two arguments, which means that RCX and RDX are used to pass them.
We also notice that the final, deobfuscated values are being written to the memory locations that the function arguments point to (Figure 3). So the arguments are pointers and the data that we are interested in is being written to these locations. This is once again confirmed by looking at the output produced by IDA’s decompiler.
If we investigate other functions which also encode the IP and port DWORDs, we realize that the decoded values are always stored at the location the arguments are pointing to. This should be our starting point when analyzing the IRSB block.
Figure 3: Disassembly snippet showing memory writes.
The following x64 instructions, which write the 4 bytes held in ECX to the memory location that R8 or RDX points to:
mov [r8], ecx
mov [rdx], ecx
translate into a get expression and a store statement (Table 1). Such memory moves are our candidates that should hold the final values. We can iterate over the IRSB block looking for a combination of a get followed by a store. An example of that can be found in lines 176 and 178 (Figure 2-b).
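The search for a get followed by a store can be sketched as follows. To keep the example runnable without pyvex, the IR is modelled as plain tuples; this tuple format is an assumption of ours and not the pyvex object model. The three statements mirror the `mov [rdx], ecx` row of Table 1.

```python
# Toy statements: (kind, destination, source). Format is hypothetical.
STMTS = [
    ("wrtmp", "t74", ("get", "rdx")),              # t74 = GET:I64(rdx)
    ("wrtmp", "t255", ("unop", "64to32", "t253")), # t255 = 64to32(t253)
    ("store", "t74", "t255"),                      # STle(t74) = t255
    ("put", "rip", 0x4000CA),
]


def find_pointer_stores(stmts):
    """Return (register, data temporary) for stores through a register value."""
    from_register = {}  # temporary -> register it was read from
    hits = []
    for kind, dst, src in stmts:
        if kind == "wrtmp" and src[0] == "get":
            from_register[dst] = src[1]
        elif kind == "store" and dst in from_register:
            hits.append((from_register[dst], src))
    return hits


print(find_pointer_stores(STMTS))
```

Each hit names the temporary that holds the stored data, which is where the backward resolution starts.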
However, that’s only the first step. If we look at the IR at line 178, where the store operation that we are interested in is performed, we notice that a temporary register is used (t255). This register was assigned at line 177 and its source was the register t253. The assignment to register t253 was done on line 172 and the source for that is t71. This goes on and on. Since we are interested in the final value assigned to the memory locations passed in as arguments, we need to find the starting point and do the necessary calculations ourselves.
At this point it’s worth noting that the currently distributed samples made some minor adjustments. Instead of two arguments, the decoding function now takes only one. This is a pointer to a structure with two DWORD elements. The only thing that changes in the approach described above is that we need to additionally find a memory write to a location shifted by 4 bytes (Figure 4).
Figure 4: Disassembly snippet showing memory writes in a sample from November 2022.
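For the single-argument variant, the same search needs to follow one extra step: an addition of a constant to the pointer before the store. A minimal sketch under the same toy tuple model (our assumption, not pyvex objects) is shown below; the temporaries `t10`, `t11`, `t20` and `t21` are hypothetical.

```python
# Toy statements: (kind, destination, source). Names are hypothetical.
STMTS = [
    ("wrtmp", "t10", ("get", "rcx")),                  # pointer to the structure
    ("store", "t10", "t20"),                           # write to offset 0
    ("wrtmp", "t11", ("binop", "Add64", "t10", 4)),    # pointer + 4
    ("store", "t11", "t21"),                           # write to offset 4
]


def find_struct_stores(stmts):
    """Map (register, byte offset) to the temporary stored at that location."""
    pointer = {}  # temporary -> (register, byte offset)
    stores = {}
    for kind, dst, src in stmts:
        if kind == "wrtmp" and src[0] == "get":
            pointer[dst] = (src[1], 0)
        elif kind == "wrtmp" and src[0] == "binop" and src[1] == "Add64":
            base, offset = src[2], src[3]
            if base in pointer and isinstance(offset, int):
                register, current = pointer[base]
                pointer[dst] = (register, current + offset)
        elif kind == "store" and dst in pointer:
            stores[pointer[dst]] = src
    return stores


print(find_struct_stores(STMTS))
```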
The overall approach could be the following:
First, we can perform a sweep and create a mapping of registers that we are interested in. Those registers should be the ones related to our output value. Some of those might be the result of certain arithmetic operations as can be seen in line 171. So in addition to the registers of interest we also need to track those operations and which registers are being used as input there. In some other cases, the value is loaded from the stack or from another register that holds the stack pointer (line 166) and we need to track those loads and stack locations too.
The approach can be summarized by the following steps:
Create a mapping between temporary registers
If a register is the result of a binary or unary operation on other register(s), add the operation and the registers to the mapping
If a register is assigned a value from memory, add the load and the location to the mapping
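The three steps above can be sketched as a single pass over the statements. As before, the IR is modelled as plain tuples rather than pyvex objects, and the chain of temporaries is a simplified, hypothetical stand-in for the one in Figure 2-b (only t71, t253, t255 and t74 are named in the text; the rest is made up).

```python
# Toy statements: (kind, destination, source). Simplified and hypothetical.
STMTS = [
    ("wrtmp", "t66", ("load", "rsp+0x20")),
    ("wrtmp", "t68", ("load", "rsp+0x28")),
    ("wrtmp", "t70", ("binop", "Xor32", "t66", "t68")),
    ("wrtmp", "t71", ("rdtmp", "t70")),
    ("wrtmp", "t253", ("unop", "32Uto64", "t71")),
    ("wrtmp", "t74", ("get", "rdx")),
    ("wrtmp", "t255", ("unop", "64to32", "t253")),
    ("store", "t74", "t255"),
]

TRACKED = {"rdtmp", "unop", "binop", "load"}


def build_mapping(stmts):
    """Map each temporary to the expression that produced it."""
    mapping = {}
    for kind, dst, src in stmts:
        # Plain copies, unary/binary operations and memory loads are tracked;
        # register reads and stores are handled elsewhere.
        if kind == "wrtmp" and src[0] in TRACKED:
            mapping[dst] = src
    return mapping


def inputs(expr):
    """Temporaries an expression reads (tN names only, not stack locations)."""
    return [a for a in expr[1:] if isinstance(a, str) and a[0] == "t" and a[1:].isdigit()]


mapping = build_mapping(STMTS)
print(mapping["t255"], inputs(mapping["t70"]))
```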
After we’ve collected the necessary information and created the mapping we can start reconstructing the values. We need to resolve the registers, propagate all values and perform necessary arithmetic operations.
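Resolving the mapping then amounts to a recursive walk from the stored temporary back to the constants. The sketch below is self-contained and therefore repeats a hypothetical mapping; the stack constants are made up, and only the handful of VEX operations needed for this example are implemented.

```python
# Hypothetical mapping: temporary -> expression that produced it.
MAPPING = {
    "t66": ("load", "rsp+0x20"),
    "t68": ("load", "rsp+0x28"),
    "t70": ("binop", "Xor32", "t66", "t68"),
    "t71": ("rdtmp", "t70"),
    "t253": ("unop", "32Uto64", "t71"),
    "t255": ("unop", "64to32", "t253"),
}

# Hypothetical constants the function placed on the stack earlier.
STACK = {"rsp+0x20": 0x5A3C9E71, "rsp+0x28": 0x45AC9E70}

BINOPS = {
    "Xor32": lambda a, b: (a ^ b) & 0xFFFFFFFF,
    "Add32": lambda a, b: (a + b) & 0xFFFFFFFF,
}
UNOPS = {
    "32Uto64": lambda a: a & 0xFFFFFFFF,  # zero extension
    "64to32": lambda a: a & 0xFFFFFFFF,   # truncation
}


def resolve(tmp, mapping, stack):
    """Compute the concrete value of a temporary by following its sources."""
    expr = mapping[tmp]
    tag = expr[0]
    if tag == "load":
        return stack[expr[1]]
    if tag == "rdtmp":
        return resolve(expr[1], mapping, stack)
    if tag == "unop":
        return UNOPS[expr[1]](resolve(expr[2], mapping, stack))
    if tag == "binop":
        left = resolve(expr[2], mapping, stack)
        right = resolve(expr[3], mapping, stack)
        return BINOPS[expr[1]](left, right)
    raise ValueError(f"unsupported expression: {expr!r}")


print(hex(resolve("t255", MAPPING, STACK)))
```

A real implementation would need many more operations and would have to recover the stack constants from the earlier store statements instead of taking them as input.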
In the following, we present an example of a manual reconstruction of values for the function in Appendix A. The example is presented in the form of a graph where all intermediate registers and operations are converted to nodes. We then eliminate all intermediate nodes that don’t change the actual value, greatly simplifying the representation (Figure 5):
Figure 5: A graph containing the dependence of registers on other registers or operations and removal of unneeded intermediate steps
In the graph form, all registers are marked blue and operations green. The random constants that are loaded from the stack are marked as white. We start with register t74. The location that the value in that register points to is being written to in line 178 (Figure 2-b). From the graph, we know that registers t255 and t71 don’t make any changes and we can safely eliminate them. We can apply the same rule to the other registers in the graph, removing most of the blue nodes.
We also know that behind the Load operations are constant values placed on the stack, so we can eliminate them too after retrieving the constant. That leaves us with a reduced graph seen on the right-hand side. What’s left is to perform the XOR operation on the two constant values. After that we are presented with the final value encoding the port number and a flag (0x1F900001).
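Once a DWORD has been recovered, it still has to be split into its parts. The sketch below assumes that the port occupies the upper 16 bits and the flag the lower 16 bits, and that the IP DWORD reads as a big-endian IPv4 address; neither layout is spelled out above, so treat both as assumptions to verify against a sample. The IP value used here is a documentation address, not a real C2.

```python
import ipaddress


def split_port_dword(value: int):
    """Assumed layout: port in the upper 16 bits, flag in the lower 16 bits."""
    port = (value >> 16) & 0xFFFF
    flag = value & 0xFFFF
    return port, flag


def dword_to_ip(value: int) -> str:
    """Assumed layout: the DWORD is the IPv4 address as a big-endian integer."""
    return str(ipaddress.IPv4Address(value))


print(split_port_dword(0x1F900001))
print(dword_to_ip(0xC0000201))  # hypothetical value from the 192.0.2.0/24 range
```

Under that assumption, 0x1F900001 splits into port 8080 (0x1F90) and flag 1.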
By applying the same approach to the other value (IP address) we are able to reduce it to just the final value in the same manner that IDA’s decompiler is doing it. What’s left at this point is to extract all other functions which are obfuscating the C2 information and apply the steps discussed above. We’ve used this method to extract all C2 addresses (Figure 6) that Emotet uses as can be seen in this report.
Figure 6: VMRay Analyzer – Report with extracted C2 addresses utilizing IR approach.
Conclusions
As we have seen, it’s possible to extract Emotet’s configuration statically and in a generic way by utilizing an approach similar to what compilers use. This has the advantage that
our algorithms can be abstracted away from the very complex instruction set and
are more robust to changes in the underlying obfuscation.
This method can also be applied to other malware families that utilize similar obfuscation techniques.
The Malware Configuration Extraction feature of VMRay Analyzer is able to automatically extract relevant C2 information from Emotet samples and present them as actionable IOCs in the report.
Mateusz Lukaszewski
Mateusz is a Threat Researcher at VMRay Labs. His recent projects cover in-depth analysis of emerging and evolving malware.
References
https://www.bleepingcomputer.com/news/security/emotet-botnet-starts-blasting-malware-again-after-4-month-break/
https://www.eecis.udel.edu/~cavazos/cisc471-672-spring2018/lectures/Lecture-16.pdf
https://research.openanalysis.net/emotet/emulation/config/dumpulator/malware/2022/05/19/emotet_x64_emulation.html
https://hex-rays.com/blog/hex-rays-microcode-api-vs-obfuscating-compiler/
https://angr.io/
https://github.com/angr/pyvex
https://github.com/gaasedelen/lucid
https://github.com/angr/vex/blob/master/pub/libvex_ir.h
Appendix A:
IOCs
Initial DLL
SHA256:
d5415fac6b6576702e52afae605748b6e5a72a920bf381f365d35b3411aa9fbf
19fcf233637e0ca65c4eef3b234d3c79ad1604b524da1b1f292cf7e7dcaf13aa
Payload
SHA256:
dc0489d026530618569f6fd4e082401b1c3d31c7aaad3685ebea0454fc15be12
Appendix B:
View IR representation