Shingled Graph Disassembly: Finding the Undecideable Path

  • Richard Wartell
  • Yan Zhou
  • Kevin W. Hamlen
  • Murat Kantarcioglu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8443)

Abstract

A probabilistic finite state machine approach to statically disassembling x86 machine language programs is presented and evaluated. Static disassembly is a crucial prerequisite for software reverse engineering and has many applications in computer security and binary analysis. The general problem is provably undecidable because of the heavy use of unaligned instruction encodings and dynamically computed control flows in the x86 architecture. Little work in machine learning and data mining has addressed this problem. This paper shows that the semantics of opcode sequences can be leveraged to infer similarities between groups of opcode and operand sequences. This enables a probabilistic finite state machine to learn statistically significant opcode and operand sequences from a training corpus of disassemblies. These similarities reveal the statistical significance of opcodes and operands within their surrounding context, which makes it possible to disassemble new binaries more accurately. Empirical results demonstrate that the algorithm is more efficient and effective than comparable approaches used by state-of-the-art disassembly tools.
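The abstract does not spell out the structure of the model, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a first-order (bigram) probabilistic finite state machine whose state is the previous opcode mnemonic, trained with add-alpha (Laplace) smoothing on opcode sequences that have already been decoded. The class name `BigramOpcodeModel`, the toy corpus, and the two candidate decodings are hypothetical. The sketch shows how such a model can rank competing decodings of the same bytes, which is the situation unaligned x86 encodings create: the decoding whose opcode sequence looks most like the training corpus wins.

```python
import math
from collections import defaultdict


class BigramOpcodeModel:
    """Toy probabilistic finite state machine over opcode mnemonics (hypothetical).

    The state is the previous opcode (or a start state); each transition emits the
    next opcode with a probability estimated from a training corpus using
    add-alpha smoothing. A simplified first-order stand-in, not the paper's model.
    """

    START = "<s>"

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # smoothing constant (assumed value)
        # counts[prev][op] = number of times `op` directly followed `prev`
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, sequences):
        """Count opcode-to-opcode transitions in already-decoded sequences."""
        for seq in sequences:
            prev = self.START
            for op in seq:
                self.counts[prev][op] += 1
                self.vocab.add(op)
                prev = op

    def prob(self, prev: str, op: str) -> float:
        """Smoothed P(op | prev). Any unseen opcode is scored as one extra
        'unknown' outcome, so the denominator counts len(vocab) + 1 outcomes."""
        row = self.counts.get(prev, {})  # .get avoids inserting empty rows
        total = sum(row.values())
        outcomes = len(self.vocab) + 1
        return (row.get(op, 0) + self.alpha) / (total + self.alpha * outcomes)

    def avg_log_prob(self, seq) -> float:
        """Per-opcode average log-likelihood, so sequences of different
        lengths can be compared."""
        if not seq:
            raise ValueError("sequence must be non-empty")
        prev, total = self.START, 0.0
        for op in seq:
            total += math.log(self.prob(prev, op))
            prev = op
        return total / len(seq)


# Hypothetical training corpus of compiler-style function bodies.
corpus = [
    ["push", "mov", "sub", "mov", "call", "leave", "ret"],
    ["push", "mov", "mov", "call", "leave", "ret"],
    ["push", "mov", "sub", "call", "leave", "ret"],
]
model = BigramOpcodeModel()
model.train(corpus)

# Hypothetical opcode sequences from decoding the same bytes at two start offsets.
candidates = {
    0: ["push", "mov", "sub", "call", "leave", "ret"],
    1: ["ret", "push", "leave", "sub", "ret", "call"],
}
best_offset = max(candidates, key=lambda off: model.avg_log_prob(candidates[off]))
print(best_offset)  # 0
```

A bigram model was chosen only because it is the smallest probabilistic finite state machine that still captures local opcode context; a real disassembler would also need an x86 decoder to produce the candidate sequences and would likely model operands as well.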

Keywords

Binary analysis · Disassembly · Reverse-engineering · Probabilistic finite state machines

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Richard Wartell (1)
  • Yan Zhou (2)
  • Kevin W. Hamlen (2)
  • Murat Kantarcioglu (2)

  1. Mandiant, China
  2. Computer Science Department, The University of Texas at Dallas, USA