We present results from an empirical study of seven online-learning methods on the task of detecting previously unseen malicious executables. Malicious software has disrupted computer and network operation and has compromised or destroyed sensitive information. Methods of machine learning, which build predictive models that generalize from training data, have proven useful for detecting previously unseen malware. In previous studies, batch methods detected malicious and benign executables with high true-positive and true-negative rates, but doing so required significant time and space, which may limit their applicability. Online methods of learning can update models quickly with only a single example, but potential trade-offs in performance are not well understood for this task. Accuracy of the best-performing online methods was 93%, which was 3-4% lower than that of batch methods. For applications that require immediate updates of models, this may be an acceptable trade-off. Our study characterizes these trade-offs, thereby giving researchers and practitioners insights into the performance of online methods of machine learning on the task of detecting malicious executables.
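To make the idea of updating a model from a single example concrete, here is a minimal sketch of an online learner. It is only an illustration: the abstract does not name the seven methods studied, so the choice of naive Bayes, the class name `OnlineNaiveBayes`, the boolean feature vectors, and the toy stream are all assumptions rather than the chapter's actual setup.

```python
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Incremental naive Bayes over boolean features (hypothetical illustration)."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.class_counts = defaultdict(int)
        # feature_counts[label][i]: examples of `label` in which feature i was present
        self.feature_counts = defaultdict(lambda: [0] * n_features)

    def update(self, features, label):
        """Fold one labeled example into the model; no earlier examples are revisited."""
        self.class_counts[label] += 1
        counts = self.feature_counts[label]
        for i, present in enumerate(features):
            if present:
                counts[i] += 1

    def predict(self, features):
        """Return the most probable label using Laplace-smoothed estimates."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            counts = self.feature_counts[label]
            for i, present in enumerate(features):
                p = (counts[i] + 1) / (n + 2)
                score += math.log(p if present else 1.0 - p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stream of (features, label) pairs; the values are made up for illustration.
model = OnlineNaiveBayes(n_features=3)
stream = [
    ([1, 0, 1], "malicious"),
    ([0, 1, 0], "benign"),
    ([1, 1, 1], "malicious"),
    ([0, 1, 1], "benign"),
]
for x, y in stream:
    model.update(x, y)  # the model is usable immediately after each update
```

Each call to `update` touches only a few counters, which is the property that lets an online method absorb a newly labeled executable at once. A batch method would instead rebuild its model from the full training set.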
© 2009 Springer-Verlag US
About this chapter
Cite this chapter
Maloof, M.A. (2009). On the Performance of Online Learning Methods for Detecting Malicious Executables. In: Machine Learning in Cyber Trust. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-88735-7_5
DOI: https://doi.org/10.1007/978-0-387-88735-7_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-88734-0
Online ISBN: 978-0-387-88735-7
eBook Packages: Computer Science, Computer Science (R0)