Metamorphic code generation from LLVM bytecode
Abstract
Metamorphic software changes its internal structure across generations while its functionality remains unchanged. Metamorphism has been employed by malware writers as a means of evading signature detection and other advanced detection strategies. However, code morphing also has potential security benefits, since it can serve to increase the “genetic diversity” of software. We have created a metamorphic code generator within the LLVM compiler framework. LLVM is a three-phase compiler that supports multiple source languages and target architectures. It uses a common intermediate representation (IR) bytecode in its optimizer. Consequently, any supported high-level programming language is transformed to this IR bytecode as part of the LLVM compilation process. Our metamorphic generator functions at the IR bytecode level, which provides many advantages over morphing at the assembly or source code level. The morphing techniques that we employ include dead code insertion and transposition. The inserted dead code is actually executed within the morphed code, which makes its detection and removal more challenging. We have verified the effectiveness of our code morphing using hidden Markov model analysis.
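To make the two morphing techniques named in the abstract concrete, the following is a minimal sketch in Python. It does not operate on LLVM IR and is not the generator described in the paper. It assumes a hypothetical toy three-address instruction format, and the names `run`, `insert_dead_code`, `transpose`, and the `dead` register prefix are illustrative only. Dead instructions write only to fresh registers that the original program never reads, so they are executed without affecting the result. Transposition swaps adjacent instructions that have no data dependence on each other.

```python
import random

# Toy three-address instructions: (dest, op, left, right).
# Operands are register names (str) or integer constants.
# This format is an assumption for illustration; it is not LLVM IR.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def run(program, inputs, result="r"):
    """Interpret the toy program and return the value of the result register."""
    regs = dict(inputs)
    for dest, op, left, right in program:
        a = regs[left] if isinstance(left, str) else left
        b = regs[right] if isinstance(right, str) else right
        regs[dest] = OPS[op](a, b)
    return regs[result]


def insert_dead_code(program, inputs, rng, count=3):
    """Insert instructions that execute but write only to fresh registers.

    Assumes the original program uses no register named 'dead<N>'.
    """
    morphed = list(program)
    for i in range(count):
        pos = rng.randrange(len(morphed) + 1)
        # Registers that are certainly defined before position pos.
        defined = list(inputs) + [ins[0] for ins in morphed[:pos]]
        src = rng.choice(defined)
        op = rng.choice(list(OPS))
        morphed.insert(pos, (f"dead{i}", op, src, rng.randrange(1, 100)))
    return morphed


def independent(first, second):
    """True if two adjacent instructions can be swapped safely."""
    def reads(ins):
        return {x for x in ins[2:] if isinstance(x, str)}
    return (first[0] != second[0]
            and first[0] not in reads(second)
            and second[0] not in reads(first))


def transpose(program, rng, attempts=10):
    """Randomly swap adjacent instructions that do not depend on each other."""
    morphed = list(program)
    for _ in range(attempts):
        if len(morphed) < 2:
            break
        i = rng.randrange(len(morphed) - 1)
        if independent(morphed[i], morphed[i + 1]):
            morphed[i], morphed[i + 1] = morphed[i + 1], morphed[i]
    return morphed


base = [
    ("t1", "add", "x", "y"),
    ("t2", "mul", "x", 3),
    ("r", "sub", "t1", "t2"),
]
inputs = {"x": 7, "y": 5}
rng = random.Random(2013)
morphed = transpose(insert_dead_code(base, inputs, rng), rng)

# The morphed program has a different instruction sequence but the same result.
print(run(base, inputs), run(morphed, inputs))
```

Each run with a different seed yields a structurally different instruction sequence that computes the same value. A real generator working on IR would also have to respect control flow, memory side effects, and types, none of which this sketch models.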
Keywords
Hidden Markov Model, Intermediate Representation, Base File, Dead Code, Hidden Markov Model Classifier