International Journal of Information Security, Volume 14, Issue 1, pp 15–33

The MALICIA dataset: identification and analysis of drive-by download operations

  • Antonio Nappa
  • M. Zubair Rafique
  • Juan Caballero
Regular Contribution

Abstract

Drive-by downloads are the preferred distribution vector for many malware families. In the drive-by ecosystem, many exploit servers run the same exploit kit, and it is challenging to determine whether a given exploit server is part of a larger operation. In this paper, we propose a technique to identify exploit servers managed by the same organization. We track over time how exploit servers are configured, which exploits they use, and what malware they distribute, and we group servers with similar configurations into operations. Our operational analysis reveals that although individual exploit servers have a median lifetime of 16 h, long-lived operations exist that run for several months. To sustain long-lived operations, miscreants are turning to the cloud, with 60% of the exploit servers hosted by specialized cloud hosting services. We also observe operations that distribute multiple malware families, and we find that pay-per-install affiliate programs manage exploit servers for their affiliates to convert traffic into installations. Furthermore, we analyze the exploit polymorphism problem, measuring the repacking rate for different exploit types. To understand how difficult it is to take down exploit servers, we analyze the abuse reporting process and issue abuse reports for 19 long-lived servers. We describe the interaction with ISPs and hosting providers and monitor the outcome of each report. We find that 61% of the reports are not even acknowledged. On average, an exploit server still lives for 4.3 days after a report. Finally, we detail the Malicia dataset we have collected and are making available to other researchers.

Keywords

Drive-by download operations, Malicia dataset, Malware distribution, Cybercrime

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Antonio Nappa (1, 2)
  • M. Zubair Rafique (3)
  • Juan Caballero (1)

  1. IMDEA Software Institute, Madrid, Spain
  2. Universidad Politécnica de Madrid, Madrid, Spain
  3. iMinds-DistriNet, KU Leuven, Leuven, Belgium
