PAS3-HSID: a Dynamic Bio-Inspired Approach for Real-Time Hot Spot Identification in Data Streams

Tickle, Rebecca; Triguero, Isaac; Figueredo, Grazziela P.; Mesgarpour, Mohammad; John, Robert I.

doi:10.1007/s12559-019-09638-y

Published: 10 April 2019

Volume 11, pages 434–458 (2019)

Cognitive Computation

Rebecca Tickle ORCID: orcid.org/0000-0001-9274-0409¹,
Isaac Triguero¹,
Grazziela P. Figueredo²,
Mohammad Mesgarpour³ &
…
Robert I. John¹

Abstract

Hot spot identification is a relevant problem in a wide variety of areas such as health care, energy and transportation. A hot spot is defined as a region with a high likelihood of occurrence of a particular event. To identify hot spots, location data for those events is required, which is typically collected by telematics devices. These sensors are constantly gathering information, generating very large volumes of data. Current state-of-the-art solutions are capable of identifying hot spots from big static batches of data by means of variations of clustering or instance selection techniques that pre-process the original input data, providing the most relevant locations. However, these approaches do not address changes in hot spots over time. This paper presents a dynamic bio-inspired approach to detect hot spots in big data streams. This computational intelligence method is designed and applied to the transportation sector as a case study to identify incidents on the roads caused by heavy goods vehicles. We adapt an immune-based algorithm to account for the temporary aspect of hot spots, inspired by the idea of pheromones, and implement it using Apache Spark Streaming. Experimental results on real datasets with up to 4.5 million data points (provided by a telematics company) show that the algorithm is capable of quickly processing large streaming batches of data, as well as successfully adapting over time to detect hot spots. The outcome of this method is twofold, both reducing data storage requirements and demonstrating resilience to sudden changes in the input data (concept drift).
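The pheromone idea mentioned in the abstract can be illustrated with a small sketch. This is not the PAS3-HSID algorithm itself, whose details are not given on this page; it is a simplified illustration assuming that each hot spot carries a strength that evaporates with every incoming batch and is reinforced by nearby events. The names `HotSpot` and `process_batch`, and the parameters `radius_km`, `decay` and `min_strength`, are hypothetical, and plain Python lists stand in for the Spark Streaming batches.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class HotSpot:
    """Hypothetical hot spot record: a location plus a pheromone-like strength."""
    lat: float
    lon: float
    strength: float = 1.0


def process_batch(hot_spots, batch, radius_km=1.0, decay=0.5, min_strength=0.2):
    """Update hot spots with one batch of (lat, lon) events.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    # Evaporation: every existing hot spot loses strength on each batch.
    for hs in hot_spots:
        hs.strength *= decay

    # Reinforcement: an event strengthens the nearest hot spot within the
    # radius, otherwise it starts a new hot spot at its own location.
    for lat, lon in batch:
        nearest = min(
            hot_spots,
            key=lambda h: haversine_km(lat, lon, h.lat, h.lon),
            default=None,
        )
        if nearest is not None and haversine_km(lat, lon, nearest.lat, nearest.lon) <= radius_km:
            nearest.strength += 1.0
        else:
            hot_spots.append(HotSpot(lat, lon))

    # Suppression: hot spots that have faded below the threshold are removed,
    # which keeps the stored set small and lets it follow changes in the data.
    return [hs for hs in hot_spots if hs.strength >= min_strength]
```

In this sketch, a hot spot that keeps receiving nearby events stays in the set, while one that stops receiving events is halved on every batch and is eventually discarded. That is one simple way to obtain the two properties the abstract highlights: bounded storage and adaptation after a sudden change in the input.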

Acknowledgements

The authors would like to thank the Soft Computing and Intelligent Information Systems research group from the University of Granada for allowing us to use their big data infrastructure to carry out the experiments.

Author information

Authors and Affiliations

The Automated Scheduling Optimisation and Planning Research Group, School of Computer Science, University of Nottingham, Nottingham, NG8 1BB, UK
Rebecca Tickle, Isaac Triguero & Robert I. John
The Advanced Data Analysis Centre, School of Computer Science, University of Nottingham, Nottingham, NG8 1BB, UK
Grazziela P. Figueredo
Microlise, Farrington Way, Eastwood, Nottingham, NG16 3AG, UK
Mohammad Mesgarpour

Corresponding author

Correspondence to Isaac Triguero.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tickle, R., Triguero, I., Figueredo, G.P. et al. PAS3-HSID: a Dynamic Bio-Inspired Approach for Real-Time Hot Spot Identification in Data Streams. Cogn Comput 11, 434–458 (2019). https://doi.org/10.1007/s12559-019-09638-y

Received: 11 May 2018
Accepted: 10 March 2019
Published: 10 April 2019
Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s12559-019-09638-y
