Dynamic clustering with binary social spider algorithm for streaming dataset

Abstract

Technical advances in fields such as social networks, health instruments and astronomical devices have brought massive capturing and sensing capacity, which in turn generates huge volumes of data. This demands substantial storage space and large data-processing capacity. Streaming data clustering offers an efficient way to handle such data by extracting the significant information. In this article, dynamic estimation of clusters in an evolving data stream is designed by incorporating a swarm optimization technique. A recently reported algorithm inspired by the social behavior of spiders living in large colonies is reformulated in the binary domain. The main contribution is the use of binary social spider optimization (BSSO) for dynamic clustering of evolving data streams (DSC-BSSO). The proposed method is shown to be efficient and effective compared with other recent algorithms. BSSO is tested on various benchmark unimodal, multimodal and binary optimization functions, and results are reported in terms of parametric and nonparametric tests. DSC-BSSO is also tested on various streaming datasets in terms of time and memory complexity. The proposed method obtains compact and well-separated clusters in less than one-fourth of a minute (under 15 s) for about 10,000 samples.
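To illustrate what reformulating a continuous swarm optimizer in the binary domain can look like, the following is a minimal Python sketch. It assumes a sigmoid transfer function, a common choice in binary swarm optimizers, and assumes that each bit switches one candidate centroid on or off; the abstract does not state which transfer function or encoding BSSO actually uses. The names `sigmoid`, `binarize` and `active_centroids` are hypothetical and introduced only for this example.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real value to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position, rng):
    """Map a continuous position vector to a bit vector.

    Assumption: each bit is set to 1 with probability sigmoid(x),
    as in many binary swarm optimizers (not confirmed for BSSO).
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def active_centroids(bits, candidates):
    """Keep only the candidate centroids whose bit is 1.

    Assumption: the number of 1-bits determines the number of clusters.
    """
    return [c for b, c in zip(bits, candidates) if b == 1]


# Hypothetical data: four candidate centroids in 2-D.
rng = random.Random(0)
position = [4.0, -4.0, 0.5, -0.2]
candidates = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 2.0)]

bits = binarize(position, rng)
centroids = active_centroids(bits, candidates)
print(bits, centroids)
```

Under this assumed encoding, the number of clusters is not fixed in advance: it is the number of active bits, so the optimizer can grow or shrink the cluster set as the stream evolves.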

Acknowledgements

This research was funded by an institute fellowship from the Ministry of HRD, Govt. of India, awarded to Urvashi P. Shukla to pursue her PhD at MNIT Jaipur.

Author information

Corresponding author

Correspondence to Urvashi Prakash Shukla.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with animals or humans performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shukla, U.P., Nanda, S.J. Dynamic clustering with binary social spider algorithm for streaming dataset. Soft Comput 23, 10717–10737 (2019). https://doi.org/10.1007/s00500-018-3627-6

  • DOI: https://doi.org/10.1007/s00500-018-3627-6

Keywords

  • Dynamic clustering
  • Page-Hinkley statistical test
  • Social spider optimization
  • Wilcoxon’s pair test