Abstract
Analyzing, predicting, and discovering previously unknown knowledge from a large data set requires an effective input dataset, a valid pattern-spotting capability, and sound evaluation of the discovered patterns. The criteria of significance, novelty, and usefulness must be fulfilled when evaluating the performance of prediction and classification. Data mining, an important step in this knowledge-discovery process, extracts hidden and non-trivial information from raw data through methods such as decision tree classification. However, owing to the enormous size, high dimensionality, and heterogeneous nature of many data sets, traditional decision tree classification algorithms sometimes perform poorly in terms of computation time. This paper proposes a framework that uses a parallel strategy to optimize the performance of decision tree induction and cross-validation for classifying data. Moreover, an existing pruning method is incorporated into the framework to overcome overfitting, enhance generalization ability, and reduce cost and structural complexity. Experiments on ten benchmark data sets show significant improvement in computation time and better classification accuracy with the optimized classification framework.
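The core step of the decision tree induction mentioned above is choosing the split that best separates the classes. The following is a minimal sketch of that idea using entropy-based information gain, the criterion behind classic algorithms such as ID3/C4.5; it is not the authors' parallel implementation, and the toy feature values and function names are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Gain from splitting a numeric feature at `threshold` (x <= t vs x > t)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# Toy data: one numeric feature with binary class labels.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = ["a", "a", "a", "b", "b", "b"]

# Decision tree induction evaluates every candidate threshold and keeps the best.
best = max(information_gain(xs, ys, t) for t in xs[:-1])
print(best)  # 1.0: splitting at 3.0 separates the classes perfectly
```

A full induction algorithm applies this search recursively to each resulting partition; the parallel strategy in the paper distributes exactly this per-split evaluation work, which dominates computation time on large, high-dimensional data sets.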
© 2013 Springer-Verlag Berlin Heidelberg
Alam, F.I., Bappee, F.K., Rabbani, M.R., Islam, M.M. (2013). An Optimized Formulation of Decision Tree Classifier. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication, and Control. ICAC3 2013. Communications in Computer and Information Science, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36321-4_10
Print ISBN: 978-3-642-36320-7
Online ISBN: 978-3-642-36321-4