Embedded Binary Rewriting - Utilizing Ghidra and LLVM

Abstract

A suitable strategy for patching and updating applications is an essential cornerstone of a modern IT environment. In an open-source context, vulnerable or outdated systems can be patched easily, but this is not the case for closed-source systems. Binary rewriting frameworks are therefore beneficial, especially for IoT applications, which are often closed-source. In this work, a prototype binary rewriting framework was developed to explore the possibilities of using Ghidra and the LLVM framework to handle ELF binaries and embedded system images for ARM processors. Relying on a binary reverse engineering framework such as Ghidra is beneficial for processing binaries and embedded system images, as such platforms already provide analyzers for a range of architectures. However, transforming Ghidra’s internal representation (P-code) into sound LLVM IR code is non-trivial, since not all language constructs map directly onto each other. Therefore, this thesis discusses the transformation of various language constructs such as phi-nodes, type representations, and pointer arithmetic before highlighting important pitfalls that can arise when transforming embedded system images. Furthermore, the prototype was evaluated on a few selected binaries to show that the transformation process does not produce any noteworthy runtime overhead. The current limitations of the prototype and the transformation process, such as dealing with misidentified code sections or types and the build process, are briefly demonstrated using images of the Zephyr and FreeRTOS embedded systems.

Publication
TU Wien