A Comparative Study of Famous Classification Techniques and Data Mining Tools

  • Conference paper
Proceedings of ICRIC 2019

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 597)

Abstract

Data mining is the process of extracting facts and patterns hidden in large volumes of data and converting them into a readable, understandable form. Data mining has four main modules: classification, association rule analysis, clustering, and sequence analysis. Classification is the major module and is applied to classification problems in many different areas. The classification process summarizes a data investigation in a form that can be used to build models or structures that describe different classes or predict future data trends, giving a better understanding of the data. In this survey, various data mining classification techniques and some important data mining tools are presented along with their advantages and disadvantages. Data classification techniques are grouped into three categories: eager learners, lazy learners, and other classification techniques. Decision trees, Bayesian classification, rule-based classification, Support Vector Machines (SVM), association rule mining, and backpropagation (neural networks) are eager learners. K-Nearest Neighbor (KNN) classification and Case-Based Reasoning (CBR) are lazy learners. Other classification techniques include genetic algorithms, fuzzy logic, and the rough set approach. The aim of this article is to survey six well-known data mining tools together with these eager learner, lazy learner, and other classification techniques.

Author information

Correspondence to Yash Paul.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Paul, Y., Kumar, N. (2020). A Comparative Study of Famous Classification Techniques and Data Mining Tools. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tanwar, S. (eds) Proceedings of ICRIC 2019. Lecture Notes in Electrical Engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-030-29407-6_45

  • DOI: https://doi.org/10.1007/978-3-030-29407-6_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29406-9

  • Online ISBN: 978-3-030-29407-6

  • eBook Packages: Engineering, Engineering (R0)
