Soft Computing, Volume 21, Issue 9, pp 2269–2281

Classification with boosting of extreme learning machine over arbitrarily partitioned data

Methodologies and Application


Machine learning-based computational intelligence methods are widely used to analyze large-scale data sets in this age of big data. Extracting useful predictive models from such data sets is challenging because of their high complexity. Analyzing large amounts of streaming data that can be leveraged to derive business value is another complex problem. With high levels of data availability (i.e., big data), automatic classification of these data has become an important and complex task. Hence, we explore the power of applying MapReduce-based distributed AdaBoosting of extreme learning machine (ELM) to build a predictive bag of classification models. Accordingly, (1) data set ensembles are created; (2) the ELM algorithm is used to build weak learners (classifier functions); and (3) a strong learner is built from the set of weak learners. We applied this training model to the benchmark knowledge discovery and data mining data sets.
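The three steps in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the abstract does not give the algorithmic details, so the sketch assumes binary labels in {-1, +1}, handles AdaBoost sample weights through weighted least squares, and simulates the reduce step by pooling every weighted weak learner into one vote. All names (`ELM`, `adaboost_elm`, `train_distributed`, `predict`) are hypothetical.

```python
import numpy as np


class ELM:
    """Single-hidden-layer ELM: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Sigmoid activations of the random (untrained) hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, sample_weight):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Assumption: sample weights enter via weighted least squares,
        # i.e. rows of H and y are scaled by sqrt(weight).
        s = np.sqrt(sample_weight)[:, None]
        self.beta = np.linalg.pinv(H * s) @ (y[:, None] * s)
        return self

    def predict(self, X):
        return np.where(self._hidden(X) @ self.beta >= 0, 1, -1).ravel()


def adaboost_elm(X, y, n_rounds, n_hidden, rng):
    """Discrete AdaBoost with ELM weak learners on one data partition."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        clf = ELM(n_hidden, rng).fit(X, y, w)
        pred = clf.predict(X)
        err = np.sum(w[pred != y])
        if err >= 0.5:
            continue  # discard a learner that is no better than chance
        err = max(err, 1e-10)  # avoid division by zero for a perfect learner
        alpha = 0.5 * np.log((1.0 - err) / err)
        ensemble.append((alpha, clf))
        # Increase the weight of misclassified samples, then renormalize.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
    return ensemble


def train_distributed(partitions, n_rounds=10, n_hidden=20, seed=0):
    """Simulated map step (one boosted ensemble per partition) and
    reduce step (pool all weighted weak learners into one strong learner)."""
    rng = np.random.default_rng(seed)
    mapped = [adaboost_elm(X, y, n_rounds, n_hidden, rng) for X, y in partitions]
    return [pair for ensemble in mapped for pair in ensemble]


def predict(strong, X):
    """Weighted vote of all pooled weak learners."""
    score = sum(alpha * clf.predict(X) for alpha, clf in strong)
    return np.where(score >= 0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy data: two Gaussian blobs, split into four partitions.
    X = np.vstack([rng.normal(-2, 1, (300, 2)), rng.normal(2, 1, (300, 2))])
    y = np.array([-1] * 300 + [1] * 300)
    idx = rng.permutation(len(y))
    parts = [(X[i], y[i]) for i in np.array_split(idx, 4)]
    strong = train_distributed(parts)
    print("training accuracy:", np.mean(predict(strong, X) == y))
```

In a real MapReduce job each call to `adaboost_elm` would run inside a mapper on its own data split, and only the pairs of learner weight and ELM parameters would be sent to the reducer, which keeps the data itself on the worker nodes.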


Extreme learning machine, AdaBoost, Ensemble methods, MapReduce


Compliance with ethical standards

Conflict of interest

The author declares no conflict of interest.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. TÜBİTAK BİLGEM, Cyber Security Institute, Kocaeli, Turkey
