On the Performance of Online Learning Methods for Detecting Malicious Executables

  • Marcus A. Maloof
Chapter

We present results from an empirical study of seven online-learning methods on the task of detecting previously unseen malicious executables. Malicious software has disrupted computer and network operation and has compromised or destroyed sensitive information. Methods of machine learning, which build predictive models that generalize from training data, have proven useful for detecting previously unseen malware. In previous studies, batch methods detected malicious and benign executables with high true-positive and true-negative rates, but doing so required significant time and space, which may limit their applicability. Online methods of learning can update models quickly from a single example, but the potential trade-offs in performance are not well understood for this task. The accuracy of the best-performing online methods was 93%, which was 3-4% lower than that of batch methods. For applications that require immediate model updates, this may be an acceptable trade-off. Our study characterizes these trade-offs, thereby giving researchers and practitioners insight into the performance of online methods of machine learning on the task of detecting malicious executables.
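To make the batch/online distinction concrete, the following is an illustrative sketch (not the chapter's own implementation) of Winnow, a classic online linear-threshold learner that updates its model from one labeled example at a time. The binary feature vectors, promotion factor, and threshold used here are conventional defaults chosen only for illustration.

```python
def winnow_update(weights, x, label, alpha=2.0, theta=None):
    """Predict on one example and update the weights only on a mistake."""
    n = len(weights)
    if theta is None:
        theta = n / 2  # common default threshold
    score = sum(w * xi for w, xi in zip(weights, x))
    prediction = 1 if score >= theta else 0
    if prediction != label:
        # Promote weights on a false negative, demote on a false positive,
        # but only for the features active in this example.
        factor = alpha if label == 1 else 1.0 / alpha
        weights = [w * factor if xi else w for w, xi in zip(weights, x)]
    return prediction, weights

# One pass over a toy stream; the (hypothetical) target concept is
# "feature 0 is set". Each example triggers at most one cheap update,
# which is what makes online methods attractive for immediate retraining.
weights = [1.0] * 4
stream = [([1, 0, 1, 0], 1), ([0, 1, 1, 0], 0), ([1, 1, 0, 0], 1)]
for x, y in stream:
    _, weights = winnow_update(weights, x, y)
```

Each update touches only the weights of active features, so the cost per example is independent of the number of examples seen so far, in contrast to batch methods that must revisit the full training set.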

Keywords

Ensemble Method · Base Learner · Concept Drift · Batch Method · Online Method



Copyright information

© Springer-Verlag US 2009

Authors and Affiliations

  1. Department of Computer Science, Georgetown University, Washington, USA