A Comparative Study of Famous Classification Techniques and Data Mining Tools

Paul, Yash; Kumar, Neerendra

doi:10.1007/978-3-030-29407-6_45

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 597)

Abstract

Data mining is the process of extracting facts and patterns hidden in large amounts of data and converting them into a readable, understandable form. Data mining has four main modules: classification, association rule analysis, clustering, and sequence analysis. Classification is the major module and is applied to classification problems in many different areas. The classification process summarizes the data under investigation; this summary can be used to build models that describe the different classes or to predict future data trends, giving a better understanding of the data. In this survey, various data mining classification techniques and some important data mining tools are presented along with their advantages and disadvantages. Classification techniques are grouped into three categories: eager learners, lazy learners, and other classification techniques. Decision trees, Bayesian classification, rule-based classification, Support Vector Machines (SVM), association rule mining and backpropagation (neural networks) are eager learners. K-Nearest Neighbor (KNN) classification and Case-Based Reasoning (CBR) are lazy learners. Other classification techniques include genetic algorithms, fuzzy logic and the rough set approach. The aim of this article is to survey six well-known data mining tools together with the basic eager, lazy and other techniques used for data classification.
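The split between eager and lazy learners is about when the work happens: an eager learner builds a model from the training data before any query arrives, while a lazy learner stores the training data and defers generalization until it is asked to classify. The following is a minimal sketch to illustrate that distinction, not code from the chapter. It assumes a tiny hypothetical two-feature dataset, uses a depth-one decision tree (a decision stump) as the eager learner and k-nearest neighbours as the lazy learner, and the class names `DecisionStump` and `KNearestNeighbors` are illustrative only.

```python
from collections import Counter
from math import dist


class DecisionStump:
    """Eager learner: a depth-one decision tree built entirely inside fit()."""

    def fit(self, X, y):
        best = None  # (errors, feature, threshold, left_label, right_label)
        for f in range(len(X[0])):
            # Try every observed value of feature f as a split threshold.
            for threshold in sorted({row[f] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
                right = [lab for row, lab in zip(X, y) if row[f] > threshold]
                left_label = Counter(left).most_common(1)[0][0]
                # The largest threshold leaves the right branch empty.
                right_label = Counter(right).most_common(1)[0][0] if right else left_label
                errors = sum(lab != left_label for lab in left) + sum(
                    lab != right_label for lab in right
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, left_label, right_label)
        # Keep only the learned rule; the training data is no longer needed.
        _, self.feature, self.threshold, self.left_label, self.right_label = best
        return self

    def predict(self, x):
        return self.left_label if x[self.feature] <= self.threshold else self.right_label


class KNearestNeighbors:
    """Lazy learner: fit() only stores the data; the work is deferred to predict()."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        # Rank all stored training points by Euclidean distance to the query.
        nearest = sorted(range(len(self.X)), key=lambda i: dist(self.X[i], x))[: self.k]
        # Majority vote among the labels of the k nearest points.
        return Counter(self.y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy data: two numeric features, two classes.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
y = ["A", "A", "A", "B", "B", "B"]

stump = DecisionStump().fit(X, y)
knn = KNearestNeighbors(k=3).fit(X, y)
for query in [(1.8, 1.2), (6.2, 5.5)]:
    print(query, stump.predict(query), knn.predict(query))
```

On this toy data both classifiers label the first query A and the second B. The contrast lies in where the cost sits: `DecisionStump.fit` searches every candidate split up front so that `predict` is a single comparison, whereas `KNearestNeighbors.fit` merely stores the data and every call to `predict` scans the whole training set.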

Author information

Authors and Affiliations

Ph.D. School of Informatics, Eötvös Loránd University, Budapest, Hungary
Yash Paul
John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary
Neerendra Kumar
Department of Computer Science & IT, Central University of Jammu, Jammu, India
Neerendra Kumar

Corresponding author

Correspondence to Yash Paul.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat, Himachal Pradesh, India
Pradeep Kumar Singh
Indian Institute of Technology Delhi, New Delhi, Delhi, India
Arpan Kumar Kar
Central University of Jammu, Jammu, Jammu and Kashmir, India
Yashwant Singh
Indian Institute of Technology Patna, Patna, Bihar, India
Maheshkumar H. Kolekar
Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
Sudeep Tanwar

About this paper

Cite this paper

Paul, Y., Kumar, N. (2020). A Comparative Study of Famous Classification Techniques and Data Mining Tools. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tanwar, S. (eds) Proceedings of ICRIC 2019. Lecture Notes in Electrical Engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-030-29407-6_45

DOI: https://doi.org/10.1007/978-3-030-29407-6_45
Published: 22 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29406-9
Online ISBN: 978-3-030-29407-6
eBook Packages: Engineering, Engineering (R0)
