Investigating the Impact of Data Analysis and Classification on Parametric and Nonparametric Machine Learning Techniques: A Proof of Concept

Khire, Sarvesh; Ganorkar, Pushkar; Apastamb, Aseem; Panicker, Suja

doi:10.1007/978-981-15-9647-6_17

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 58)

Abstract

Supervised algorithms depend on the given data for categorization. In the present work, we use both parametric and nonparametric classifiers and compare the performance of four popular machine learning classification algorithms (naïve Bayes, decision trees, logistic regression, and random forest) on two popular benchmark datasets: the wine quality dataset and the glass identification dataset. To obtain a broad view of how these algorithms perform, we incorporated both binary and multi-class classification, which also addressed the problem of imbalance in the datasets. Performance was measured using accuracy, recall, precision, and F1-score. We observed that nonparametric algorithms such as the random forest and decision tree classifiers outperformed parametric algorithms such as logistic regression and naïve Bayes. Moreover, because the datasets were imbalanced, we identified which algorithm performs better under which circumstances. In particular, random forest achieved the best performance on all considered metrics, with accuracies of 82% and 83% on the wine datasets and 79% on the glass identification dataset.
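The four metrics named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code: the macro averaging over classes, the function names (`per_class_counts`, `safe_div`, `evaluate`), and the toy labels are all assumptions made for illustration. The abstract does not state how the multi-class scores were averaged.

```python
def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (unweighted mean over classes) is an assumption here;
    it gives rare classes the same weight as common ones.
    """
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precisions, recalls, f1s = [], [], []
    for tp, fp, fn in per_class_counts(y_true, y_pred).values():
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(safe_div(2 * precision * recall, precision + recall))
    n_classes = len(precisions)
    return {
        "accuracy": safe_div(correct, len(y_true)),
        "precision": safe_div(sum(precisions), n_classes),
        "recall": safe_div(sum(recalls), n_classes),
        "f1": safe_div(sum(f1s), n_classes),
    }


# Hypothetical imbalanced toy labels: 8 "common" samples, 2 "rare" samples.
y_true = ["common"] * 8 + ["rare"] * 2
# A trivial classifier that always predicts the majority class.
y_pred = ["common"] * 10

for name, value in evaluate(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

On this toy input, the majority-class predictor reaches an accuracy of 0.8 while its macro recall is only 0.5, which suggests why accuracy alone can be misleading on imbalanced data and why precision, recall, and F1-score are reported alongside it.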

Author information

Authors and Affiliations

Department of Computer Engineering, MAEER’s Maharashtra Institute of Technology, Pune, Maharashtra, India
Sarvesh Khire, Pushkar Ganorkar & Aseem Apastamb
Assistant Professor, School of Computer Engineering and Technology, MIT World Peace University, Pune, Maharashtra, India
Suja Panicker

Corresponding author

Correspondence to Suja Panicker.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, RVS Technical Campus, Coimbatore, India
S. Smys
Gerald Schwartz School of Business, St. Francis Xavier University, Antigonish, Canada
Ram Palanisamy
University of Lisbon, Lisbon, Portugal
Álvaro Rocha
Department of Business Administration of Food and Agricultural Enterprises, Agrinio Campus, University of Patras, Patras, Greece
Grigorios N. Beligiannis

About this paper

Cite this paper

Khire, S., Ganorkar, P., Apastamb, A., Panicker, S. (2021). Investigating the Impact of Data Analysis and Classification on Parametric and Nonparametric Machine Learning Techniques: A Proof of Concept. In: Smys, S., Palanisamy, R., Rocha, Á., Beligiannis, G.N. (eds) Computer Networks and Inventive Communication Technologies. Lecture Notes on Data Engineering and Communications Technologies, vol 58. Springer, Singapore. https://doi.org/10.1007/978-981-15-9647-6_17

DOI: https://doi.org/10.1007/978-981-15-9647-6_17
Published: 03 June 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9646-9
Online ISBN: 978-981-15-9647-6
eBook Packages: Engineering, Engineering (R0)
