Mondrian forest for data stream classification under memory constraints

Khannouz, Martin; Glatard, Tristan

doi:10.1007/s10618-023-00970-4

Published: 17 October 2023

Data Mining and Knowledge Discovery, Volume 38, pages 569–596 (2024)

Abstract

Supervised learning algorithms generally assume that enough memory is available to store data models during the training and test phases. However, this assumption is unrealistic when data comes in the form of infinite data streams, or when learning algorithms are deployed on devices with limited memory. In this paper, we adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams. In particular, we design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached. Moreover, we design node trimming mechanisms to make Mondrian trees more robust to concept drifts under memory constraints. We evaluate our algorithms on a variety of real and simulated datasets, and we conclude with recommendations on their use in different situations: the Extend Node strategy appears to be the best out-of-memory strategy in all configurations, whereas the choice of node trimming mechanism should depend on whether a concept drift is expected. All our methods are implemented in the OrpailleCC open-source library and are ready to be used on embedded systems and connected objects.
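The abstract names the out-of-memory strategies without describing them. As a rough illustration of what such a strategy means for an online Mondrian tree, here is a minimal Python sketch. It assumes the standard Mondrian extension step of Lakshminarayanan et al. (2014) and our own reading of "Extend Node": once the node budget is exhausted, a new point is absorbed by growing the bounding box of an existing node instead of allocating a new split. The names `BudgetedMondrianTree`, `max_nodes`, `lifetime` and `partial_fit` are hypothetical and are not the OrpailleCC API. Leaf splitting, hierarchical posterior smoothing and node trimming from the full method are omitted.

```python
import math
import random
from collections import Counter


class Node:
    """Mondrian tree node: bounding box, split time, optional split, leaf class counts."""

    def __init__(self, lower, upper, tau, counts=None):
        self.lower, self.upper = list(lower), list(upper)
        self.tau = tau                  # split time; leaves use the lifetime
        self.dim = self.val = None      # split dimension and threshold (internal nodes)
        self.left = self.right = None
        self.counts = Counter(counts or {})

    def is_leaf(self):
        return self.left is None


class BudgetedMondrianTree:
    """Hypothetical online Mondrian tree holding at most `max_nodes` nodes.

    Assumed out-of-memory rule (our reading of "Extend Node"): when a split
    would need new nodes but the budget is exhausted, the point only enlarges
    the bounding box of the node it reaches and keeps descending.
    """

    def __init__(self, max_nodes, lifetime=math.inf, seed=0):
        self.max_nodes, self.lifetime = max_nodes, lifetime
        self.rng = random.Random(seed)
        self.root, self.n_nodes = None, 0

    def _new_leaf(self, x, y):
        self.n_nodes += 1
        return Node(x, x, self.lifetime, {y: 1})

    def partial_fit(self, x, y):
        if self.root is None:
            self.root = self._new_leaf(x, y)
        else:
            self.root = self._extend(self.root, 0.0, x, y)

    def _extend(self, node, parent_tau, x, y):
        # How far x lies outside the node's box, per dimension.
        e_low = [max(l - v, 0.0) for l, v in zip(node.lower, x)]
        e_up = [max(v - u, 0.0) for u, v in zip(node.upper, x)]
        extents = [a + b for a, b in zip(e_low, e_up)]
        rate = sum(extents)
        E = self.rng.expovariate(rate) if rate > 0 else math.inf
        has_memory = self.n_nodes + 2 <= self.max_nodes  # new parent + new leaf
        if parent_tau + E < node.tau and has_memory:
            # Insert a new split above `node` that separates it from x.
            d = self.rng.choices(range(len(x)), weights=extents)[0]
            if x[d] > node.upper[d]:
                val = self.rng.uniform(node.upper[d], x[d])
            else:
                val = self.rng.uniform(x[d], node.lower[d])
            parent = Node([min(l, v) for l, v in zip(node.lower, x)],
                          [max(u, v) for u, v in zip(node.upper, x)],
                          parent_tau + E)
            parent.dim, parent.val = d, val
            leaf = self._new_leaf(x, y)
            self.n_nodes += 1  # account for the new internal node
            if x[d] <= val:
                parent.left, parent.right = leaf, node
            else:
                parent.left, parent.right = node, leaf
            return parent
        # No split sampled, or out of memory: grow the box to contain x.
        node.lower = [min(l, v) for l, v in zip(node.lower, x)]
        node.upper = [max(u, v) for u, v in zip(node.upper, x)]
        if node.is_leaf():
            node.counts[y] += 1
        elif x[node.dim] <= node.val:
            node.left = self._extend(node.left, node.tau, x, y)
        else:
            node.right = self._extend(node.right, node.tau, x, y)
        return node

    def predict(self, x):
        node = self.root
        while not node.is_leaf():
            node = node.left if x[node.dim] <= node.val else node.right
        return node.counts.most_common(1)[0][0]
```

Each inserted split costs exactly two nodes, so the budget check stays a constant-time comparison. In this sketch, a tree that has hit its budget still updates bounding boxes and leaf counts, so it keeps learning class frequencies without allocating memory. The node trimming mechanisms mentioned in the abstract would free nodes so that splitting can resume; they are not shown here.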

Similar content being viewed by others

Energy-aware very fast decision tree (Article, open access, 20 March 2021)

Recurring concept memory management in data streams: exploiting data stream concept evolution to improve performance and transparency (Article, 14 February 2021)

Dynamic Forest for Learning from Data Streams with Varying Feature Spaces

Funding

This work was funded by a Strategic Project Grant of the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, Canada
Martin Khannouz & Tristan Glatard

Corresponding authors

Correspondence to Martin Khannouz or Tristan Glatard.

Ethics declarations

Conflict of interest

The computing platform was obtained with funding from the Canada Foundation for Innovation. The authors have no conflicts of interest.

Additional information

Responsible editor: Joao Gama.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Khannouz, M., Glatard, T. Mondrian forest for data stream classification under memory constraints. Data Min Knowl Disc 38, 569–596 (2024). https://doi.org/10.1007/s10618-023-00970-4

Received: 20 September 2022
Accepted: 24 July 2023
Published: 17 October 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s10618-023-00970-4
