Event Detection Based on Nonnegative Matrix Factorization: Ceasefire Violation, Environmental, and Malware Events
Event detection is an important and broadly applicable problem that spans many disciplines within engineering systems. In this paper, we focus on improving the user’s ability to quickly identify threat events such as malware, military policy violations, and natural environmental disasters. For the latter two cases, the information used for detection is extracted from text data sets. Malware threats are important because they compromise computer system integrity and can allow the collection of sensitive information. Violations of military policies such as ceasefires are important to monitor because they disrupt the daily lives of many people in countries torn apart by social violence or civil war. The threat of environmental disasters takes many forms, is an ever-present danger worldwide, and is indiscriminate in whom it harms or kills. We address all three of these threat event types using the same underlying technology for mining the information that leads to detecting such events. We approach malware event detection as a binary classification problem, i.e., one class for the threat mode and another for the non-threat mode. We extend our novel classifier, which uses constrained low-rank approximation as its core algorithmic innovation, and apply our Nonnegative Generalized Moody-Darken Architecture (NGMDA) hybrid method using various combinations of input and output layer algorithms. The new algorithm solves a nonconvex optimization problem, the nonnegative matrix factorization (NMF), for the hidden layer of a single-layer perceptron and uses a nonnegatively constrained adaptive filter for the output layer estimator. We first show the utility of the core NMF technology for both ceasefire violation and environmental disaster event detection. Next, NGMDA is applied to the problem of malware threat events, again with NMF as the core computational tool.
We also demonstrate that an algorithm should be selected to suit the data generation process, which has critical implications for the design of solutions to important threat and event detection scenarios. Lastly, we present experimental results on foreign-language text for ceasefire violation and environmental disaster events, and experimental results on a KDD competition data set for malware classification using our new NGMDA classifier.
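The hybrid structure described in the abstract, an NMF hidden layer feeding a nonnegatively constrained output estimator, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`nmf`, `hidden_layer`, `fit_output_weights`) are hypothetical, the NMF is solved with Lee-Seung multiplicative updates, a batch nonnegative least-squares fit stands in for the adaptive filter, and the data are synthetic.

```python
import numpy as np

def nmf(X, k, n_iter=300, eps=1e-9, seed=0):
    """Approximate nonnegative X (features x samples) as W @ H with W, H >= 0.
    Solver choice (multiplicative updates) is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def hidden_layer(X, W, n_iter=300, eps=1e-9):
    """Hidden-layer activations: nonnegative coefficients of X on a fixed basis W."""
    H = np.ones((W.shape[1], X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return H

def fit_output_weights(H, y, n_iter=500, eps=1e-9):
    """Output layer: minimize ||H.T @ v - y|| subject to v >= 0.
    A batch stand-in (assumed) for a nonnegatively constrained adaptive filter."""
    A = H.T                              # samples x k
    AtA, Aty = A.T @ A, A.T @ y
    v = np.ones(A.shape[1])
    for _ in range(n_iter):
        v *= Aty / (AtA @ v + eps)
    return v

# Synthetic two-class data (hypothetical): each class loads mainly on one
# block of five features, plus a small amount of nonnegative noise.
rng = np.random.default_rng(1)
basis = np.zeros((10, 2))
basis[:5, 0] = 1.0
basis[5:, 1] = 1.0
y = np.repeat([0.0, 1.0], 20)            # 0 = non-threat, 1 = threat
coeff = np.vstack([1.0 - y, y]) + 0.1 * rng.random((2, 40))
X = basis @ coeff + 0.01 * rng.random((10, 40))

W, H = nmf(X, k=2)                       # learn the hidden-layer basis
v = fit_output_weights(H, y)             # learn nonnegative output weights
scores = hidden_layer(X, W).T @ v        # forward pass through both layers
pred = (scores > 0.5).astype(int)        # threshold into threat / non-threat
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

print(f"relative reconstruction error: {rel_err:.3f}")
print(f"training accuracy: {(pred == y).mean():.2f}")
```

The sketch keeps both layers nonnegative, so each hidden unit can be read as an additive part of the input and each output weight as that part's contribution to the threat score. A recursive, sample-by-sample update could replace the batch output fit where streaming data make an adaptive filter more appropriate.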
Keywords: Malware detection, Event detection, Perceptron, Clustering, Nonnegative matrix factorization, Adaptive filtering, Hybrid classifier, Topic modeling, Classification
The authors thank Dr. Richard Boyd for his significant contributions to SmallK and LEAN, both of which were essential tools used in our topic refinement process. We would also like to thank Dr. Elizabeth Whitaker for the opportunity to present this work and the Information and Communication Laboratory (ICL) at Georgia Tech Research Institute for supporting this research. This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) XDATA program grant FA8750-12-2-0309. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA.