Online Influence Forest for Streaming Anomaly Detection

  • Conference paper
Advances in Intelligent Data Analysis XXI (IDA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13876)

Abstract

As the digital world grows, data is collected continuously, at high speed, and in real time. Learning from the resulting imbalanced and evolving data streams remains a challenge, and the field still lacks consistent strategies for assessing the properties of continuous, evolving data. This paper proposes an unsupervised, online, and incremental anomaly detection ensemble of influence trees that implement adaptive mechanisms to deal with inactive or saturated leaves. The proposal uses the fourth standardized moment, also known as kurtosis, as the splitting criterion, and the isolation score, Shannon’s information content, and the influence function of an instance as the anomaly score. In addition to improving interpretability, the proposal is evaluated on publicly available datasets, with a detailed discussion of the results.
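
The two standard quantities named in the abstract can be illustrated with a minimal Python sketch. It shows only the textbook definitions of kurtosis (the fourth standardized moment) and Shannon's information content; it is not the paper's algorithm, whose details are not given in this preview. The helper `pick_split_feature`, and its rule of choosing the feature with the largest kurtosis, are hypothetical and assumed here purely for illustration.

```python
import math


def kurtosis(values):
    """Fourth standardized moment: E[(x - mean)^4] / variance^2 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        return 0.0  # constant feature: no spread, treated as uninformative
    return m4 / (m2 ** 2)


def information_content(probability):
    """Shannon information content in bits: -log2(p). Rarer events score higher."""
    return -math.log2(probability)


def pick_split_feature(columns):
    """Hypothetical helper (an assumption, not the paper's rule):
    return the index of the feature column with the largest kurtosis."""
    scores = [kurtosis(col) for col in columns]
    return max(range(len(columns)), key=lambda i: scores[i])
```

A heavy-tailed feature, such as one containing a single extreme value, has higher kurtosis than an evenly spread one, which is why kurtosis is a natural candidate for deciding where to split when looking for outliers. Likewise, an instance that falls in a region of probability 0.25 carries 2 bits of information, while one in a region of probability 1 carries none.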

Acknowledgements

This work has been supported by Fundação para a Ciência e Tecnologia (FCT), Portugal (2021.04908.BD), NOVA LINCS (UIDB/04516/2020), CityCatalyst (POCI-01-0247-FEDER-046119), financed by FEDER, and by the CHIST-ERA grant CHIST-ERA-19-XAI-012 and project CHIST-ERA/0004/2019, funded by FCT. This work is also financed by the ERDF (European Regional Development Fund) through the Operational Programme for Competitiveness and Internationalisation, COMPETE 2020 Programme, under the Portugal 2020 Partnership Agreement, within project City Analyser, with reference POCI-01-0247-FEDER-039924.

All of the support mentioned above is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Inês Martins.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Martins, I., Resende, J.S., Gama, J. (2023). Online Influence Forest for Streaming Anomaly Detection. In: Crémilleux, B., Hess, S., Nijssen, S. (eds) Advances in Intelligent Data Analysis XXI. IDA 2023. Lecture Notes in Computer Science, vol 13876. Springer, Cham. https://doi.org/10.1007/978-3-031-30047-9_22

  • DOI: https://doi.org/10.1007/978-3-031-30047-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30046-2

  • Online ISBN: 978-3-031-30047-9

  • eBook Packages: Computer Science, Computer Science (R0)
