Investigating the Impact of Data Analysis and Classification on Parametric and Nonparametric Machine Learning Techniques: A Proof of Concept

  • Conference paper
Computer Networks and Inventive Communication Technologies

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 58)

Abstract

Supervised algorithms depend on the given data for categorization. In the present work, we use both parametric and nonparametric classifiers and compare the performance of four popular machine learning classification algorithms (naïve Bayes, decision trees, logistic regression, and random forest) on two popular benchmark datasets: the wine quality dataset and the glass identification dataset. To get a broad view of how these algorithms perform, we incorporated both binary and multi-class classification, which also addressed the problem of imbalance in the datasets. The performance of the algorithms was measured using accuracy, recall, precision, and F1-score. We observed that nonparametric algorithms such as the random forest and decision tree classifiers outperformed parametric algorithms such as logistic regression and naïve Bayes. Moreover, because the datasets were imbalanced, we identified which algorithm performs better under which circumstances. In particular, random forest achieved the best performance on all considered metrics, with accuracies of 82% and 83% on the wine datasets and 79% on the glass identification dataset.
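The four metrics named in the abstract can be illustrated with a minimal sketch. The labels below are hypothetical and are not taken from the paper's datasets, and the function names are illustrative assumptions; macro averaging is one common choice for imbalanced data, not necessarily the one the authors used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    metrics = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of (precision, recall, f1) across classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Hypothetical imbalanced binary labels: 8 "good" samples, 2 "bad" samples.
y_true = ["good"] * 8 + ["bad"] * 2
# A trivial classifier that always predicts the majority class.
y_pred = ["good"] * 10

acc = accuracy(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(per_class_metrics(y_true, y_pred))
print(f"accuracy={acc:.2f} precision={macro_p:.2f} "
      f"recall={macro_r:.2f} f1={macro_f1:.2f}")
```

In this toy case the majority-class predictor reaches an accuracy of 0.80 while its macro-averaged recall is only 0.50, which shows why accuracy alone can be misleading on imbalanced data and why precision, recall, and F1-score are reported alongside it.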

Corresponding author

Correspondence to Suja Panicker.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Khire, S., Ganorkar, P., Apastamb, A., Panicker, S. (2021). Investigating the Impact of Data Analysis and Classification on Parametric and Nonparametric Machine Learning Techniques: A Proof of Concept. In: Smys, S., Palanisamy, R., Rocha, Á., Beligiannis, G.N. (eds) Computer Networks and Inventive Communication Technologies. Lecture Notes on Data Engineering and Communications Technologies, vol 58. Springer, Singapore. https://doi.org/10.1007/978-981-15-9647-6_17

  • DOI: https://doi.org/10.1007/978-981-15-9647-6_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-9646-9

  • Online ISBN: 978-981-15-9647-6

  • eBook Packages: Engineering, Engineering (R0)
