# Classification with boosting of extreme learning machine over arbitrarily partitioned data

## Abstract

Machine learning-based computational intelligence methods are widely used to analyze large-scale data sets in this age of big data. Extracting useful predictive models from such data sets is challenging because of their high complexity. Analyzing large amounts of streaming data that can be leveraged to derive business value is another complex problem. With high levels of data availability (i.e., Big Data), automatic classification of these data has become an important and complex task. Hence, we explore the power of applying MapReduce-based distributed AdaBoosting of extreme learning machine (ELM) to build a predictive bag of classification models. Accordingly, (1) data set ensembles are created; (2) the ELM algorithm is used to build weak learners (classifier functions); and (3) a strong learner is built from the set of weak learners. We applied this training model to the benchmark knowledge discovery and data mining data sets.
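The three steps listed in the abstract can be illustrated with a rough, single-process sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the tanh activation, the weighted-resampling form of AdaBoost, the binary labels in {-1, +1}, and every name below (`ELM`, `adaboost_elm`, `map_reduce_train`, `predict`) are assumptions made for illustration, and the "map" and "reduce" phases are simulated with plain Python loops rather than a real MapReduce cluster.

```python
import numpy as np


class ELM:
    """Basic ELM: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Hidden-layer output; tanh is an assumed activation choice.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Input weights and biases are drawn at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return np.where(self._hidden(X) @ self.beta >= 0.0, 1.0, -1.0)


def adaboost_elm(X, y, n_rounds, n_hidden, rng):
    """AdaBoost on one partition, with ELMs as weak learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # sample weights, uniform at the start
    ensemble = []
    for _ in range(n_rounds):
        # Resample according to the current weights (assumed variant).
        idx = rng.choice(n, size=n, p=w)
        model = ELM(n_hidden, rng).fit(X[idx], y[idx])
        pred = model.predict(X)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1.0 - 1e-10)
        if err >= 0.5:
            continue  # skip learners that are no better than chance
        alpha = 0.5 * np.log((1.0 - err) / err)
        ensemble.append((alpha, model))
        # Increase the weight of misclassified samples, then renormalize.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
    return ensemble


def map_reduce_train(X, y, n_partitions, n_rounds=5, n_hidden=20, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: split the data into partitions (the "data set ensembles").
    parts = np.array_split(rng.permutation(len(y)), n_partitions)
    # Step 2 ("map"): boost ELM weak learners independently per partition.
    per_part = [adaboost_elm(X[p], y[p], n_rounds, n_hidden, rng) for p in parts]
    # Step 3 ("reduce"): pool all weighted weak learners into one ensemble.
    return [pair for ens in per_part for pair in ens]


def predict(ensemble, X):
    # Strong learner: sign of the alpha-weighted vote of all weak learners.
    score = sum(alpha * model.predict(X) for alpha, model in ensemble)
    return np.where(score >= 0.0, 1.0, -1.0)


if __name__ == "__main__":
    # Synthetic two-class data, used only to exercise the sketch.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(2.0, 1.0, (300, 2)), rng.normal(-2.0, 1.0, (300, 2))])
    y = np.concatenate([np.ones(300), -np.ones(300)])
    strong = map_reduce_train(X, y, n_partitions=3)
    print("weak learners:", len(strong))
    print("training accuracy:", np.mean(predict(strong, X) == y))
```

In this sketch each partition is boosted without seeing the others, so the map phase needs no communication between workers; only the small `(alpha, model)` pairs are gathered in the reduce phase. Whether the paper combines the per-partition learners in exactly this way is not stated in the abstract.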

## Keywords

Extreme learning machine, AdaBoost, Ensemble methods, MapReduce

## Notes

## Compliance with ethical standards

## Conflict of interest

The author declares that they have no conflicts of interest.
