Finding Non-trivial Malware Naming Inconsistencies

  • Federico Maggi
  • Andrea Bellini
  • Guido Salvaneschi
  • Stefano Zanero
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7093)

Abstract

Malware analysts, and in particular antivirus vendors, have never agreed on a single naming convention for malware specimens. This leads to confusion and difficulty (more for researchers than for practitioners), for example, when comparing the coverage of different antivirus engines, when integrating and systematizing known threats, or when comparing the classifications given by different detectors. Clearly, solving naming inconsistencies is a very difficult task, as it requires that vendors agree on a unified naming convention. More importantly, solving inconsistencies is impossible without knowing exactly where they are. Therefore, in this paper we take a step back and concentrate on the problem of finding inconsistencies. To this end, we first represent each vendor’s naming convention with a graph-based model. Second, we give a precise definition of inconsistency with respect to these models. Third, we define two quantitative measures to calculate the overall degree of inconsistency between vendors. In addition, we propose a fast algorithm that finds non-trivial (i.e., beyond syntactic differences) inconsistencies. Our experiments on four major antivirus vendors and 98,798 real-world malware samples confirm anecdotal observations that different vendors name the same malware differently. More importantly, we were able to find inconsistencies that cannot be inferred at all by looking solely at the syntax of the names.
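
To make the notion of a non-syntactic inconsistency concrete, the following is a minimal sketch, not the paper's graph-based model, measures, or algorithm. It assumes each vendor's output can be reduced to a mapping from sample identifier to a single label, and it flags every group of samples that one vendor places under a single label while another vendor spreads it over several labels. The function names, sample identifiers, and labels are all hypothetical.

```python
from collections import defaultdict


def group_by_label(labels):
    """Map each label to the set of sample ids that carry it."""
    groups = defaultdict(set)
    for sample, label in labels.items():
        groups[label].add(sample)
    return dict(groups)


def find_splits(labels_a, labels_b):
    """Return {label_a: sorted vendor-B labels} for every vendor-A group
    whose samples receive more than one distinct label from vendor B."""
    splits = {}
    for label_a, samples in group_by_label(labels_a).items():
        # Only samples that vendor B also labeled are compared.
        labels_seen = {labels_b[s] for s in samples if s in labels_b}
        if len(labels_seen) > 1:
            splits[label_a] = sorted(labels_seen)
    return splits


# Hypothetical labels for three samples (illustration only, not real data).
vendor_a = {"s1": "Worm.Foo", "s2": "Worm.Foo", "s3": "Trojan.Bar"}
vendor_b = {"s1": "W32/Foo", "s2": "Backdoor/Qux", "s3": "Troj/Bar"}

print(find_splits(vendor_a, vendor_b))
```

In this toy data, no string comparison of "Worm.Foo" against "W32/Foo" would reveal that the two vendors disagree about sample s2; the disagreement only appears when the groupings of the same samples are compared.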

Keywords

Naming Tree, Pattern Class, Simple Pattern, Malicious Code, Scatter Measure

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Federico Maggi (1)
  • Andrea Bellini (1)
  • Guido Salvaneschi (1)
  • Stefano Zanero (1)

  1. Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
