Abstract
Dynamic, data-driven methods are key to deploying accurate and efficient data stream mining (DSM) systems that invoke suitably configured queries in real time on streams of input data. With the proliferation of technologies for big data analytics, application areas for stream mining are numerous and continually expanding; representative examples include healthcare, climate monitoring, surveillance, and network security. Because data sources and computational resources are typically physically separated, it is often necessary to deploy such stream mining systems in a distributed fashion, where local learners have access to disjoint subsets of the data to be mined and forward their intermediate results to ensemble learners that combine the results from the local learners. This also holds in edge computing, where computation can take place at the data source. In such a distributed data stream mining context, the principles of Dynamic Data Driven Applications Systems (DDDAS) must be incorporated strategically at all levels of the design and implementation process to effectively manage trade-offs among stream mining accuracy, resource requirements, performance, and energy efficiency. This chapter presents methodologies for such DDDAS-integrated design and implementation of data stream mining systems. These methods, referred to as Dataflow- and DDDAS-integrated Adaptive DSM system Design (DDADD), combine dataflow-based signal processing system design with the DDDAS paradigm.
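The local-learner and ensemble-learner arrangement described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the ensemble learner combines results, so a weighted-majority vote with a multiplicative penalty is assumed here purely for illustration. All names (`make_threshold_learner`, `EnsembleLearner`, `run_stream`, `beta`) are hypothetical and are not taken from the chapter.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical local learner: reads only its own feature and emits a +1/-1 label.
LocalLearner = Callable[[Sequence[float]], int]


def make_threshold_learner(index: int, threshold: float) -> LocalLearner:
    """Return a toy learner that thresholds a single feature (illustrative only)."""
    def predict(features: Sequence[float]) -> int:
        return 1 if features[index] > threshold else -1
    return predict


class EnsembleLearner:
    """Combines local predictions with a weighted-majority vote (assumed rule)."""

    def __init__(self, num_learners: int, beta: float = 0.5) -> None:
        self.weights: List[float] = [1.0] * num_learners
        self.beta = beta  # multiplicative penalty applied to a wrong learner

    def combine(self, local_predictions: Sequence[int]) -> int:
        score = sum(w * p for w, p in zip(self.weights, local_predictions))
        return 1 if score >= 0 else -1

    def update(self, local_predictions: Sequence[int], truth: int) -> None:
        # Data-driven adaptation: down-weight learners that were wrong.
        self.weights = [
            w * self.beta if p != truth else w
            for w, p in zip(self.weights, local_predictions)
        ]


def run_stream(
    learners: Sequence[LocalLearner],
    ensemble: EnsembleLearner,
    stream: Iterable[Tuple[Sequence[float], int]],
) -> List[int]:
    """Process (features, true_label) pairs one at a time, adapting as labels arrive."""
    outputs = []
    for features, truth in stream:
        local = [learner(features) for learner in learners]  # intermediate results
        outputs.append(ensemble.combine(local))
        ensemble.update(local, truth)
    return outputs
```

Each toy learner reads a different feature index, which stands in for the disjoint data subsets. The weight update is one simple way for feedback from the stream to reconfigure the system at run time.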
Acknowledgements
This work is supported by the US Air Force Office of Scientific Research under the Dynamic Data and Information Processing Program.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Xu, J., Sudusinghe, K., Schaar, M.v.d., Bhattacharyya, S.S. (2023). Adaptive Data Stream Mining (DSM) Systems. In: Darema, F., Blasch, E.P., Ravela, S., Aved, A.J. (eds) Handbook of Dynamic Data Driven Applications Systems. Springer, Cham. https://doi.org/10.1007/978-3-031-27986-7_26
DOI: https://doi.org/10.1007/978-3-031-27986-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27985-0
Online ISBN: 978-3-031-27986-7
eBook Packages: Computer Science, Computer Science (R0)