Abstract
Supervised learning algorithms rely on labeled data to learn how to categorize new samples. In the present work, we use both parametric and nonparametric classifiers and compare the performance of four popular machine learning classification algorithms (naïve Bayes, decision trees, logistic regression, and random forest) on two popular benchmark datasets: the wine quality dataset and the glass identification dataset. To get a broad view of the performance of these algorithms, we incorporated both binary and multi-class classification, which also addressed the problem of imbalance in the datasets. The performance of the algorithms was measured using accuracy, recall, precision, and F1-score. We observed that nonparametric algorithms such as the random forest and decision tree classifiers outperformed parametric algorithms such as logistic regression and naïve Bayes. Moreover, because the datasets were imbalanced, we identified which algorithm performs better under which circumstances. In particular, random forest achieved the best performance on all considered metrics, with accuracies of 82% and 83% on the wine datasets and 79% on the glass identification dataset.
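The four metrics named in the abstract can be illustrated with a short sketch. It is not the authors' evaluation code: the function name `classification_metrics` and the choice of macro averaging (an unweighted mean over classes, which is one common choice for imbalanced multi-class data) are assumptions made here for illustration, since the abstract does not say how per-class scores were combined.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per class.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # A class that is never predicted (or never present) scores 0.0 here.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n_classes = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n_classes,
        "macro_recall": sum(recalls) / n_classes,
        "macro_f1": sum(f1_scores) / n_classes,
    }


if __name__ == "__main__":
    # Toy imbalanced example: four samples of class 0, two of class 1.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 0]
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

On the toy labels above, accuracy (4 of 6 correct) is higher than the macro-averaged scores, because the minority class is classified correctly only half the time. This is why precision, recall, and F1-score are reported alongside accuracy for imbalanced datasets.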
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Khire, S., Ganorkar, P., Apastamb, A., Panicker, S. (2021). Investigating the Impact of Data Analysis and Classification on Parametric and Nonparametric Machine Learning Techniques: A Proof of Concept. In: Smys, S., Palanisamy, R., Rocha, Á., Beligiannis, G.N. (eds) Computer Networks and Inventive Communication Technologies. Lecture Notes on Data Engineering and Communications Technologies, vol 58. Springer, Singapore. https://doi.org/10.1007/978-981-15-9647-6_17
DOI: https://doi.org/10.1007/978-981-15-9647-6_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9646-9
Online ISBN: 978-981-15-9647-6
eBook Packages: Engineering, Engineering (R0)