Evolving learners’ behavior in data mining

Pise, Nitin; Kulkarni, Parag

doi:10.1007/s12530-016-9167-3

Original Paper
Published: 21 September 2016

Volume 8, pages 243–259 (2017)
Evolving Systems

Nitin Pise¹ &
Parag Kulkarni¹

Abstract

The evaluation and choice of learning algorithms is an active research area in data mining, artificial intelligence, pattern recognition and related fields. Supervised learning is one of the most frequently used tasks in data mining. Many learning algorithms are available in the machine learning field, and new ones continue to be added to the literature, so there is a need to select the most suitable learning algorithm for a given dataset. With the proliferation of learning algorithms and constantly changing data scenarios, a smart learning system is needed. This paper presents an approach in which past learning experience is used to suggest the most suitable learner, based on three types of meta-features: simple, statistical and information-theoretic. The system is tested on 38 UCI benchmark datasets from various domains using nine classifiers from different categories. For 29 datasets, i.e., 76 % of the datasets, the predicted and actual accuracies match directly, so the proposed approach selects the correct algorithm for these datasets. A new equation for estimating classifier accuracy from meta-features is proposed and validated. The study also compares the supervised learning algorithms using a tenfold cross-validation paired t test. The work supports a critical step in data mining: selecting a suitable data mining algorithm.
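The abstract names three families of meta-features and a tenfold cross-validation paired t test without defining them. The sketch below illustrates both under assumptions of its own: the specific features (instance, attribute and class counts; mean absolute skewness; class entropy, mean attribute entropy and mean mutual information between each attribute and the class, computed over equal-width bins) are common choices in meta-learning, not necessarily the ones used in the paper, and the helper names `meta_features`, `paired_t_statistic` and `differs_significantly` are hypothetical. The constant 2.262 is the two-sided 5 % critical value of Student's t with 9 degrees of freedom, which matches ten folds.

```python
import math
from collections import Counter

# Assumed significance threshold: two-sided alpha = 0.05, 9 degrees of freedom
# (tenfold cross-validation gives k - 1 = 9).
T_CRIT_9DF = 2.262


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(Y) - H(Y | X) for two equal-length discrete sequences."""
    n = len(xs)
    h_cond = 0.0
    for value, count in Counter(xs).items():
        subset = [y for x, y in zip(xs, ys) if x == value]
        h_cond += (count / n) * entropy(subset)
    return entropy(ys) - h_cond


def discretize(column, bins=3):
    """Equal-width binning so numeric attributes can be passed to entropy()."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]


def skewness(column):
    """Population skewness of a numeric column (0.0 for a constant column)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in column) / n / var ** 1.5


def meta_features(X, y):
    """Hypothetical simple, statistical and information-theoretic meta-features.

    X is a list of instances with numeric attributes; y holds the class labels.
    """
    columns = [list(col) for col in zip(*X)]
    binned = [discretize(col) for col in columns]
    return {
        # simple
        "n_instances": len(X),
        "n_attributes": len(columns),
        "n_classes": len(set(y)),
        # statistical
        "mean_abs_skewness": sum(abs(skewness(c)) for c in columns) / len(columns),
        # information-theoretic
        "class_entropy": entropy(y),
        "mean_attr_entropy": sum(entropy(b) for b in binned) / len(binned),
        "mean_mutual_info": sum(mutual_information(b, y) for b in binned) / len(binned),
    }


def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic on the per-fold accuracy differences of two classifiers."""
    d = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(d)
    mean_d = sum(d) / k
    var_d = sum((x - mean_d) ** 2 for x in d) / (k - 1)
    if var_d == 0:
        return 0.0 if mean_d == 0 else math.copysign(math.inf, mean_d)
    return mean_d / math.sqrt(var_d / k)


def differs_significantly(acc_a, acc_b, t_crit=T_CRIT_9DF):
    """True if |t| exceeds t_crit; the default threshold assumes exactly 10 folds."""
    return abs(paired_t_statistic(acc_a, acc_b)) > t_crit


if __name__ == "__main__":
    X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
    y = ["a", "a", "b", "b"]
    print(meta_features(X, y))
    fold_acc_a = [0.82, 0.84] * 5  # made-up per-fold accuracies
    fold_acc_b = [0.80] * 10
    print(paired_t_statistic(fold_acc_a, fold_acc_b),
          differs_significantly(fold_acc_a, fold_acc_b))
```

In a meta-learning setup of this kind, one such feature vector would be computed per training dataset and paired with the classifier that performed best under cross-validation; a new dataset's vector is then used to recommend a learner. The tenfold cross-validation paired t test is simple but is known to have an elevated Type I error, because the training sets of different folds overlap.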

Acknowledgments

The authors wish to thank the editors and the anonymous reviewers for their detailed comments and suggestions, which significantly contributed to the improvement of the manuscript. The authors also acknowledge the support and help of Suhas Gore, a P.G. student, during this work.

Author information

Authors and Affiliations

College of Engineering, Shivajinagar, Pune, 411 005, India
Nitin Pise & Parag Kulkarni

Corresponding author

Correspondence to Nitin Pise.

About this article

Cite this article

Pise, N., Kulkarni, P. Evolving learners’ behavior in data mining. Evolving Systems 8, 243–259 (2017). https://doi.org/10.1007/s12530-016-9167-3

Received: 29 September 2014
Accepted: 01 September 2016
Published: 21 September 2016
Issue Date: December 2017
DOI: https://doi.org/10.1007/s12530-016-9167-3
