A large study on the effect of code obfuscation on the quality of java code

Abstract

Context: Obfuscation is a common technique used to protect software against malicious reverse engineering. Obfuscators manipulate the source code to make it harder for an attacker to analyze and understand. Although different obfuscation algorithms and implementations are available, they have never been directly compared in a large-scale study.

Aim: This paper aims to evaluate and quantify the effect of several different obfuscation implementations (both open source and commercial), to help developers and project managers decide which algorithms to use.

Method: In this study we applied 44 obfuscations to 18 subject applications covering a total of 4 million lines of code. The effectiveness of these source code obfuscations has been measured using 10 code metrics, considering modularity, size and complexity of code.

Results: Results show that some of the considered obfuscations are effective in making code metrics change substantially from original to obfuscated code, although this change (called potency of the obfuscation) is different on different metrics. In the paper we recommend which obfuscations to select, given the security requirements of the software to be protected.

Notes

  1. http://sandmark.cs.arizona.edu/downloads.html

  2. http://www.allatori.com/

  3. http://www.zelix.com/Klassmaster/

  4. On the downside, Sandmark is quite old and it cannot handle the newest Java constructs, from Java version 1.5 onwards.

  5. Most of these switches are self-explanatory, but http://www.zelix.com/Klassmaster/docs/obfuscateStatement.html provides a full description.

  6. http://sourceforge.net

  7. As provided in the “Recently updated” section of the Java applications, http://sourceforge.net/directory/language:java/os:linux/freshness:recently-updated/.

  8. http://www.scitools.com/

  9. Available at http://www.spinellis.gr/sw/ckjm/

  10. A detailed analysis, not reported for reasons of space, shows that the majority of them differ from each other.

  11. As suggested by Collberg (Collberg et al. 2003), we use the potency to measure the magnitude of the difference of a specific metric between clear and obfuscated code.

  12. Available obfuscation tools include ProGuard, yGuard, JODE, JavaGuard, RetroGuard, jarg, etc.
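
Note 11 describes potency as the magnitude of the difference of a metric between clear and obfuscated code. As a minimal sketch, assuming the ratio-based definition from the Collberg, Thomborson and Low taxonomy, $E(P')/E(P) - 1$, potency could be computed per metric as below. The function name, metric names and numbers are hypothetical illustrations, not the study's tooling or data, and the paper's exact formulation is not shown in this excerpt.

```python
def potency(metric_clear: float, metric_obfuscated: float) -> float:
    """Relative change of one metric from clear to obfuscated code.

    Assumes the ratio-based definition E(P') / E(P) - 1, where E(P) is the
    metric on the clear program and E(P') on the obfuscated one.
    """
    if metric_clear == 0:
        raise ValueError("potency is undefined when the clear-code metric is zero")
    return metric_obfuscated / metric_clear - 1.0


# Hypothetical metric values for one application (not data from the study).
clear = {"cyclomatic_complexity": 200.0, "lines_of_code": 1000.0}
obfuscated = {"cyclomatic_complexity": 300.0, "lines_of_code": 1000.0}

# One potency value per metric: positive means the metric grew after obfuscation.
potencies = {name: potency(clear[name], obfuscated[name]) for name in clear}

for name, value in potencies.items():
    print(f"{name}: {value:+.2f}")
```

Under this assumed definition a potency of 0 means the obfuscation left the metric unchanged, which is why the same transformation can look effective on one metric and ineffective on another.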

References

  • Anckaert B, Madou M, De Sutter B, De Bus B, De Bosschere K, Preneel B (2007) Program obfuscation: a quantitative approach. In: Proceedings of the 2007 ACM workshop on quality of protection, QoP ’07, pp 15–20. ACM, New York, NY, USA. doi:10.1145/1314257.1314263

  • Basili V, Briand L, Melo W (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761


  • Ceccato M, Capiluppi A, Falcarin P, Boldyreff C (2013) A large study on the effect of code obfuscation on the quality of java code: Detailed analysis of data. Tech. rep., FBK, TR-FBK-SE-2013-3. http://se.fbk.eu/sites/se.fbk.eu/files/TR-FBK-SE-2013-3.pdf

  • Ceccato M, Di Penta M, Nagra J, Falcarin P, Ricca F, Torchiano M, Tonella P (2008) Towards experimental evaluation of code obfuscation techniques. In: Proceedings of the 4th ACM workshop on quality of protection, QoP ’08, pp 39–46. ACM, New York, NY, USA. doi:10.1145/1456362.1456371

  • Ceccato M, Di Penta M, Falcarin P, Ricca F, Torchiano M, Tonella P (2013) A family of experiments to assess the effectiveness and efficiency of source code obfuscation techniques. Empir Softw Eng, pp 1–35. doi:10.1007/s10664-013-9248-x

  • Ceccato M, Di Penta M, Nagra J, Falcarin P, Ricca F, Torchiano M, Tonella P (2009) The effectiveness of source code obfuscation: An experimental assessment. In: ICPC. IEEE Comput Soc:178–187

  • Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20:476–493. doi:10.1109/32.295895. http://dl.acm.org/citation.cfm?id=630808.631131

  • Cohen FB (1993) Operating system protection through program evolution. Comput Secur 12:565–584. doi:10.1016/0167-4048(93)90054-9. http://dl.acm.org/citation.cfm?id=179007.179012

  • Collberg C, Myles G, Huntwork A (2003) Sandmark–a tool for software protection research. IEEE Secur Priv 1:40–49. doi:10.1109/MSECP.2003.1219058. http://dl.acm.org/citation.cfm?id=939830.939941

  • Collberg C, Thomborson C, Low D (1997) A taxonomy of obfuscating transformations. Tech Rep:148. http://www.cs.auckland.ac.nz/%7Ecollberg/Research/Publications/CollbergThomborsonLow97a/index.html

  • Collberg CS, Thomborson C (2002) Watermarking, tamper-proofing, and obfuscation: tools for software protection. IEEE Trans Softw Eng 28:735–746. doi:10.1109/TSE.2002.1027797. http://dl.acm.org/citation.cfm?id=636196.636198

  • Falcarin P, Collberg C, Atallah M, Jakubowski M (2011) Guest editors’ introduction: software protection. IEEE Softw 28:24–27. doi:10.1109/MS.2011.34

  • Goto H, Mambo M, Matsumura K, Shizuya H (2000) An approach to the objective and quantitative evaluation of tamper-resistant software. In: Proceedings of the third international workshop on information security, ISW ’00. Springer-Verlag, London, UK, pp 82–96. http://dl.acm.org/citation.cfm?id=648024.744206

  • Heffner K, Collberg C (2004) The obfuscation executive. In: Information security. Springer, pp 428–440

  • Hosking AL, Nystrom N, Whitlock D, Cutts Q, Diwan A (2001) Partial redundancy elimination for access path expressions. Software: practice and experience, vol 31. doi:10.1002/spe.371

  • Jakubowski MH, Saw CW, Venkatesan R (2009) Iterated transformations and quantitative metrics for software protection. In: SECRYPT

  • Jureczko M, Spinellis D (2010) Using object-oriented design metrics to predict software defects, monographs of system dependability, vol. models and methodology of system dependability, pp. 69–81. Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, Poland

  • Karnick M, MacBride J, McGinnis S, Tang Y, Ramachandran R (2006) A qualitative analysis of java obfuscation. In: Proceedings of 10th IASTED international conference on software engineering and applications, Dallas TX, USA

  • Kouznetsov P Jad - the fast JAva Decompiler. http://www.kpdus.com/jad.html

  • Linn C, Debray S (2003) Obfuscation of executable code to improve resistance to static disassembly. In: Proceedings of the 10th ACM conference on computer and communications security, CCS ’03, pp 290–299. ACM, New York, NY, USA. doi:10.1145/948109.948149

  • Lv Z, Ri S, Uhvhdufk DE, Dw D, Wkh Y, Ri X, Srsxodu W, Zrun QDS, Vkrzhg ZH (2005) On the relationship between cyclomatic complexity and oo ness, 9th ECOOP workshop on quantitative approaches in ObjectOriented software engineering

  • McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng:308–320

  • Sheskin D (2007) Handbook of parametric and nonparametric statistical procedures (4th Ed.). Chapman & Hall

  • Simon F, Steinbrückner F, Lewerentz C (2001) Metrics based refactoring. In: Proceedings of the Fifth European Conference on software maintenance and Reengineering, CSMR ’01, pp. 30-. IEEE Computer Society, Washington, DC, USA. http://dl.acm.org/citation.cfm?id=794203.795287

  • Sutherland I, Kalb GE, Blyth A, Mulley G (2006) An empirical examination of the reverse engineering process for binary files. Comput & Secur 25(3):221–228


  • Udupa SK, Debray SK, Madou M (2005) Deobfuscation: reverse engineering obfuscated code. In: Proceedings of the 12th Working conference on reverse engineering, pp 45–54. IEEE Computer Society, Washington, DC, USA. http://dl.acm.org/citation.cfm?id=1107841.1108171

  • Vasa R, Schneider JG (2003) Evolution of cyclomatic complexity in object oriented software. Proceedings of 7th ECOOP workshop on quantitative approaches in Object-Oriented software engineering QAOOSE, vol 03. http://www.it.swin.edu.au/personal/jschneider/Pub/qaoose03.pdf

  • Visaggio CA, Pagin GA, Canfora G (2013) An empirical study of metric-based methods to detect obfuscated code. Int J Secur & Appl 7(2):59

  • Wohlin C, Runeson P, Höst M, Ohlsson M, Regnell B, Wesslén A (2000) Experimentation in software engineering - an introduction, Kluwer Academic Publishers

  • Wyseur B (2009) White-box cryptography. Ph.D. thesis, Katholieke Universiteit Leuven. http://www.cosic.esat.kuleuven.be/publications/talk-98.pdf

  • Zeng Y, Liu F, Luo X, Yang C (2011) Software watermarking through obfuscated interpretation: Implementation and analysis. J Multimed 6(4):329–339


Acknowledgments

The authors would like to thank Marco Torchiano for the interesting discussion on the analysis procedure and the Zelix Klassmaster™ developers for the full evaluation copy of their tool and the feedback provided.

Author information


Corresponding author

Correspondence to Mariano Ceccato.

Additional information

Communicated by: Andrea De Lucia


About this article


Cite this article

Ceccato, M., Capiluppi, A., Falcarin, P. et al. A large study on the effect of code obfuscation on the quality of java code. Empir Software Eng 20, 1486–1524 (2015). https://doi.org/10.1007/s10664-014-9321-0


  • DOI: https://doi.org/10.1007/s10664-014-9321-0

Keywords

  • Code Metrics
  • Cyclomatic Complexity
  • Subject Application
  • Holm Correction
  • Software Protection