Abstract
Data mining is the process of extracting the facts and patterns hidden in large volumes of data and converting them into a readable, understandable form. It has four main modules: classification, association rule analysis, clustering, and sequence analysis. Classification is the major module and is applied to classification problems in many different areas. The classification process summarizes the analyzed data in a form that can be used to build models or structures that describe the different classes or predict future data trends, giving a better understanding of the data. This survey presents various data mining classification techniques and some important data mining tools, along with their advantages and disadvantages. The classification techniques are grouped into three categories: eager learners, lazy learners, and other classification techniques. Decision trees, Bayesian classification, rule-based classification, Support Vector Machines (SVM), association rule mining, and backpropagation (neural networks) are eager learners. K-Nearest Neighbor (KNN) classification and Case-Based Reasoning (CBR) are lazy learners. The other classification techniques include genetic algorithms, fuzzy logic, and the rough set approach. The aim of this article is to survey six well-known data mining tools together with the basic eager-learner, lazy-learner, and other classification techniques used for data classification.
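The distinction between lazy and eager learners in the abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter. The class names `LazyKNN` and `EagerCentroid`, the toy data, and the choice of k = 3 are all assumptions made for illustration, and the nearest-centroid classifier is only a simple stand-in for an eager learner; it is not one of the techniques the abstract lists. The lazy learner does no work at training time beyond storing the data, whereas the eager learner builds a compact model up front and discards the training set.

```python
import math
from collections import Counter


class LazyKNN:
    """Lazy learner: fit() only stores the data; the work happens in predict()."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # No model is built here; the training set is simply memorized.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Rank all stored points by Euclidean distance to the query.
        order = sorted(
            range(len(self.points)),
            key=lambda i: math.dist(self.points[i], query),
        )
        # Majority vote among the k nearest neighbors.
        votes = Counter(self.labels[i] for i in order[: self.k])
        return votes.most_common(1)[0][0]


class EagerCentroid:
    """Eager learner (illustrative stand-in): fit() builds one centroid per class."""

    def fit(self, points, labels):
        groups = {}
        for point, label in zip(points, labels):
            groups.setdefault(label, []).append(point)
        # The model is the per-class mean; the training data is not kept.
        self.centroids = {
            label: tuple(sum(coord) / len(members) for coord in zip(*members))
            for label, members in groups.items()
        }
        return self

    def predict(self, query):
        # Assign the class whose centroid is closest to the query.
        return min(
            self.centroids,
            key=lambda label: math.dist(self.centroids[label], query),
        )


# Hypothetical toy data: two well-separated classes in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
labels = ["a", "a", "a", "b", "b", "b"]

lazy = LazyKNN(k=3).fit(points, labels)
eager = EagerCentroid().fit(points, labels)
print(lazy.predict((1.2, 1.4)), eager.predict((8.7, 9.0)))
```

In this sketch the trade-off is visible in where the cost sits: `LazyKNN.predict` scans every stored point on each query, while `EagerCentroid.predict` compares the query against only one centroid per class.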
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Paul, Y., Kumar, N. (2020). A Comparative Study of Famous Classification Techniques and Data Mining Tools. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tanwar, S. (eds) Proceedings of ICRIC 2019. Lecture Notes in Electrical Engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-030-29407-6_45
DOI: https://doi.org/10.1007/978-3-030-29407-6_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29406-9
Online ISBN: 978-3-030-29407-6
eBook Packages: Engineering, Engineering (R0)