Improved Malware Classification through Sensor Fusion Using Disjoint Union

  • Charles LeDoux
  • Andrew Walenstein
  • Arun Lakhotia
Part of the Communications in Computer and Information Science book series (CCIS, volume 285)


In malware classification, an open research question is how to combine similar data extracted from program analyzers so that the advantages of the analyzers accrue and their errors are minimized. We propose an approach to fusing multiple program analysis outputs by abstracting the features to a common form and applying a disjoint union fusion function. The approach is evaluated in an experiment measuring classification accuracy on fused dynamic trace data from over 18,000 malware files. The results indicate that a naïve fusion approach can yield improvements over non-fused results, but that the disjoint union fusion function outperforms naïve union by a statistically significant amount in three of the four classification methods applied.
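The difference between the two fusion functions named in the abstract can be illustrated with a minimal sketch. It assumes each analyzer's output has already been abstracted to a set of feature strings, and it uses the standard set-theoretic reading of disjoint union (each element tagged with its source); the abstract does not give the paper's exact feature representation. The analyzer names and features below are hypothetical.

```python
from typing import Dict, Set, Tuple


def naive_union(outputs: Dict[str, Set[str]]) -> Set[str]:
    """Merge all features into one set; identical features collapse into one."""
    fused: Set[str] = set()
    for features in outputs.values():
        fused |= features
    return fused


def disjoint_union(outputs: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Tag every feature with its source, so identical features stay distinct."""
    return {
        (source, feature)
        for source, features in outputs.items()
        for feature in features
    }


# Hypothetical abstracted features from two analyzers (assumed names and values).
outputs = {
    "analyzer_a": {"CreateFile", "WriteFile", "RegSetValue"},
    "analyzer_b": {"CreateFile", "Connect"},
}

print(len(naive_union(outputs)))     # 4: the shared "CreateFile" is merged
print(len(disjoint_union(outputs)))  # 5: "CreateFile" is kept once per analyzer
```

Under this reading, the disjoint union preserves which analyzer reported a feature, so a classifier can weigh the same observation differently depending on its source.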


Keywords: Disjoint Union, Data Fusion, System Call, Sensor Fusion, Fusion Function





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Charles LeDoux (1)
  • Andrew Walenstein (2)
  • Arun Lakhotia (1)
  1. Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, U.S.A.
  2. School of Computing and Informatics, Computer Science Program, University of Louisiana at Lafayette, Lafayette, U.S.A.
