As defined in a seminal taxonomy by Chikofsky and Cross (Chikofsky and Cross II 1990), software reverse engineering is:

“the process of analyzing a subject system to (i) identify the system’s components and their inter-relationships and (ii) create representations of the system in another form or at a higher level of abstraction.”

Such an analysis process pertains to artifacts that can be extremely low-level, e.g., binary files or system execution traces, or at an intermediate level, e.g., source code, patches, and defects, as well as the history and discussions associated with these artifacts.

Software reverse engineering is a mature research field with high practical relevance: often, the only way to gain an understanding of a large and complex software system is through these lower-level artifacts, especially when higher-level artifacts are absent or outdated. In general, almost every time one has to evolve an existing software system, and especially when the source code or the binary is the only reliable source of documentation, reverse engineering becomes a key element of software evolution (Canfora and Di Penta 2007). Indeed, in such cases reverse engineering is recommended by IEEE Std. 1219 (IEEE 1999) for software maintenance.

This special section features six papers on the topic of reverse engineering, using an array of techniques and data sources: some use very low-level artifacts, while others use higher-level ones and historical information.

The first article in this special section, “On the detection of custom memory allocators in C binaries” (Chen et al. 2016), focuses on binary analysis. The article describes a technique to detect custom memory allocators and deallocators, which is vital to properly detect and track data structures in performance-critical applications. The tool implementing the technique, MemBrush, has been evaluated on a large number of real-world applications and was found to be highly accurate. MemBrush can then feed the detected allocation routines to existing reverse engineering tools.
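To give a flavor of what such a technique must recognize, the sketch below shows a minimal region (arena) allocator in C: once an application hands out memory by bumping a pointer inside a single malloc’d block, individual objects become invisible to analyses that only track malloc() and free(). The allocator, and every name in it, is an illustrative assumption and is not taken from the article.

    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical custom allocator: a region ("arena") that hands out memory
     * by bumping a pointer inside one large block and releases everything at
     * once. Individual allocations never reach malloc()/free(), which is why
     * detecting such routines matters for tracking data structures. */
    typedef struct {
        char  *base;   /* start of the region      */
        size_t size;   /* total capacity in bytes  */
        size_t used;   /* bytes already handed out */
    } region_t;

    static region_t *region_create(size_t size) {
        region_t *r = malloc(sizeof *r);
        if (!r) return NULL;
        r->base = malloc(size);
        if (!r->base) { free(r); return NULL; }
        r->size = size;
        r->used = 0;
        return r;
    }

    static void *region_alloc(region_t *r, size_t n) {
        if (r->used + n > r->size) return NULL;   /* region exhausted */
        void *p = r->base + r->used;
        r->used += n;
        return p;
    }

    static void region_destroy(region_t *r) {
        free(r->base);
        free(r);
    }

    int main(void) {
        region_t *r = region_create(1024);
        if (!r) return 1;
        int *a = region_alloc(r, sizeof *a);   /* custom allocation #1 */
        int *b = region_alloc(r, sizeof *b);   /* custom allocation #2 */
        if (a && b) { *a = 1; *b = 2; }
        region_destroy(r);                     /* everything freed at once */
        return 0;
    }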

The article “Scalable data structure detection and classification for C/C++ binaries” (Haller et al. 2016) also deals with binary analysis. The tool presented in the article, MemPick, analyzes the links between objects in memory in order to detect higher-level data structures. MemPick can detect several commonly used data structures, including various types of linked lists, trees, and graphs. The tool was evaluated on 30 different systems.
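As an illustration of why the links between objects, rather than object layout alone, carry the signal, consider the two C record types below: they have the same memory footprint but form different shapes on the heap. The code and the simple invariant check are only a sketch of the general idea; the actual heuristics used by MemPick may differ.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stddef.h>

    /* Two heap records with an identical footprint (three pointer-sized
     * fields) but different link shapes; telling them apart requires looking
     * at how instances point to one another. */
    typedef struct dlist_node {
        struct dlist_node *prev;
        struct dlist_node *next;
        void              *payload;
    } dlist_node;

    typedef struct tree_node {
        struct tree_node *left;
        struct tree_node *right;
        void             *payload;
    } tree_node;

    /* One simple shape invariant: in a doubly linked list, following next and
     * then prev leads back to the starting node; a binary tree built from
     * same-sized records does not satisfy this. */
    static bool looks_like_doubly_linked(const dlist_node *n) {
        for (; n && n->next; n = n->next)
            if (n->next->prev != n)
                return false;
        return true;
    }

    int main(void) {
        dlist_node a = { NULL, NULL, NULL }, b = { NULL, NULL, NULL };
        a.next = &b;
        b.prev = &a;
        printf("doubly linked? %d\n", looks_like_doubly_linked(&a));  /* prints 1 */
        return 0;
    }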

In “Inferring extended finite state machine models from software executions” (Walkinshaw et al. 2016), the focus shifts to execution traces. More specifically, the goal of the article is to infer Extended Finite State Machines (EFSMs) from execution traces. EFSMs can blend the control and data aspects of the software system under study. The technique presented to infer EFSMs is based on machine learning, and is evaluated quantitatively and qualitatively on three software systems.
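To illustrate what an EFSM adds over a plain state machine, the toy model below guards its transitions with a data register, so that control flow and data must be inferred together. The model, its states, and its events are invented purely for exposition and are not taken from the article.

    #include <stdio.h>
    #include <stddef.h>

    /* A toy EFSM, invented for illustration: a session that accepts at most
     * three PUSH events before closing. Unlike a plain FSM, transitions carry
     * guards and updates over a data register (count), blending the control
     * and data aspects mentioned above. */
    typedef enum { CLOSED, OPEN } state_t;
    typedef enum { EV_OPEN, EV_PUSH, EV_CLOSE } event_t;

    typedef struct {
        state_t state;
        int     count;   /* data register referenced by guards and updates */
    } efsm_t;

    static void efsm_step(efsm_t *m, event_t ev) {
        switch (m->state) {
        case CLOSED:
            if (ev == EV_OPEN) { m->state = OPEN; m->count = 0; }
            break;
        case OPEN:
            if (ev == EV_PUSH && m->count < 3)
                m->count++;                  /* guard satisfied: update data  */
            else if (ev == EV_PUSH)
                m->state = CLOSED;           /* guard violated: change state  */
            else if (ev == EV_CLOSE)
                m->state = CLOSED;
            break;
        }
    }

    int main(void) {
        efsm_t m = { CLOSED, 0 };
        event_t trace[] = { EV_OPEN, EV_PUSH, EV_PUSH, EV_PUSH, EV_PUSH };
        for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++)
            efsm_step(&m, trace[i]);
        printf("final state: %s, count: %d\n",
               m.state == OPEN ? "OPEN" : "CLOSED", m.count);
        return 0;
    }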

The remainder of the special section, starting with “Mining architectural violations from version history” (Maffort et al. 2016), exploits higher-level information. The article combines static and historical analysis in the context of checking the architectural conformance of a software system. Since detecting architectural violations in this way is a challenging problem, the article also proposes an iterative process for experts to verify the conformance semi-automatically.

In the article “Evaluating the impact of design pattern and anti-pattern dependencies on changes and faults” (Jaafar et al. 2016), the focus is on classes depending on design pattern (and anti-pattern) elements, under the assumption that the good and/or bad characteristics of patterns and anti-patterns may propagate to the classes that depend on them. The article puts that assumption to the test by analyzing the fault-proneness and change-proneness of the dependencies of six design patterns and ten anti-patterns, across the sequence of releases of three software systems.

Finally, this special section concludes with “Investigating technical and non-technical factors influencing modern code review” (Baysal et al. 2016), which studies the code review process of two large open-source projects in order to better understand the factors that influence whether code submissions are evaluated in a timely manner. Reverse engineering techniques are applied to reconstruct the patch submission process from the information in the issue tracking and code review systems.

These six articles exemplify the breadth, depth, and quality of the research performed in the field of software reverse engineering.