The Artificial Immune Ecosystem: A Bio-Inspired Meta-Algorithm for Boosting Time Series Anomaly Detection with Expert Input

  • Fabio Guigou
  • Pierre Collet
  • Pierre Parrend
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10199)

Abstract

One of the challenges in machine learning, especially in the Big Data era, is obtaining labeled data sets. The difficulty of labeling large amounts of data has led to an increasing reliance on unsupervised classifiers, such as deep autoencoders. In this paper, we study the problem of involving a human expert in the training of a classifier instead of using labeled data. We use anomaly detection in network monitoring as the field of application. We demonstrate that using crude, already existing monitoring software as a heuristic to choose which points to label can boost the classification rate relative to both the monitoring software alone and the classifier trained on a fully labeled data set, at a very low computational cost. We introduce the Artificial Immune Ecosystem meta-algorithm as a generic framework integrating the expert, the heuristic and the classifier.
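The interplay of heuristic, expert and classifier described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed threshold standing in for the monitoring software, the simulated expert, the 1-nearest-neighbour classifier and all names and values below are assumptions chosen only to keep the example short.

```python
from typing import Callable, List, Tuple

Labeled = List[Tuple[float, bool]]  # (value, is_anomaly) pairs

def threshold_heuristic(value: float, limit: float = 80.0) -> bool:
    """Hypothetical crude monitoring rule: flag any value above a fixed limit."""
    return value > limit

def simulated_expert(value: float) -> bool:
    """Stand-in for the human expert (assumption): only values above 95 are real anomalies."""
    return value > 95.0

def collect_labels(series: List[float],
                   heuristic: Callable[[float], bool],
                   expert: Callable[[float], bool]) -> Labeled:
    """Ask the expert to label only the points that the heuristic flags."""
    return [(x, expert(x)) for x in series if heuristic(x)]

def classify(labeled: Labeled, value: float) -> bool:
    """1-nearest-neighbour on the expert-labeled points; no labels means 'normal'."""
    if not labeled:
        return False
    _, label = min(labeled, key=lambda pair: abs(pair[0] - value))
    return label

def detect(labeled: Labeled, value: float) -> bool:
    """Raise an alert only if the heuristic fires and the classifier confirms it."""
    return threshold_heuristic(value) and classify(labeled, value)

# Toy load measurements: four of eight points are flagged and sent to the expert.
series = [20.0, 35.0, 85.0, 40.0, 99.0, 30.0, 86.0, 97.0]
labeled = collect_labels(series, threshold_heuristic, simulated_expert)

for value in (50.0, 88.0, 96.0):
    print(value, detect(labeled, value))
```

In this toy setting the expert labels only the four flagged points, and the classifier then suppresses the heuristic's false alarm at 88 while confirming the alert at 96.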

Keywords

Artificial immune system, Boosting, Anomaly detection, Time series, Machine learning

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Fabio Guigou (1, 2, 4)
  • Pierre Collet (2, 4)
  • Pierre Parrend (2, 3, 4)

  1. IPLine, Caluire-et-Cuire, France
  2. ICube Laboratory, Université de Strasbourg, Strasbourg, France
  3. ECAM Strasbourg-Europe, Schiltigheim, France
  4. Complex System Digital Campus (UNESCO Unitwin), Paris, France
