Opcode Coverage-Guided Virtualization Deobfuscation Based on Symbolic Execution


Obfuscation is a common concept centered around the idea of semantically equivalent transformations that increase the difficulty of analyzing a binary. It can be deemed necessary in different circumstances to hide sensitive parts of a program. However, deobfuscation by inverting the introduced program transformations is always only a matter of time. Nonetheless, virtualization obfuscators are a state of the art technique and can significantly slow down the process of reverse engineering. Sophisticated deobfuscation approaches for virtualized code exist. Many different tech- niques such as symbolic execution, dynamic taint analysis and binary instrumentation are employed to assist devirtualization. One common requirement of existing solutions is instruction tracing. The completeness of code coverage from instruction traces is question- able as soon as conditional branches occur in the control flow of the program. Therefore, the proposed approach utilizes solely symbolic execution and its path exploration abilities for deobfuscation of virtualization. Code coverage is maximized by prioritizing not yet executed regions of the virtualized code. A categorization strategy is necessary to divide the obfuscated areas in sections to explore and sections to avoid. This enables the discovery of new paths by guiding symbolic exploration. Given enough computational resources, all possible paths in the virtualization obfuscation are covered. The dispatch jumps are concretized in an iterative way to find and map all virtual opcodes. Additionally, control flow that occurs within the virtualized code is extracted without intervening context switches from native code and fetching of byte-code. The results of the deobfuscation are virtual graphs that contain these simplified represen- tations of control flow. They allow an easy recognition of high-level structures such as loops within the virtual code and aid a reverse engineer in conducting further analysis of a virtualized binary. The automated recognition of such patterns requires a high code coverage. An extensive evaluation shows that sufficiently high coverage is achieved by the implemented approach and encourages different directions of future work. Furthermore, many interesting insights about virtualization obfuscation and symbolic execution have been gathered.

TU Wien