
An Optimized Formulation of Decision Tree Classifier

  • Conference paper
Advances in Computing, Communication, and Control (ICAC3 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 361)

Abstract

Analyzing, predicting, and discovering previously unknown knowledge from a large data set requires an effective input dataset, a valid pattern-spotting ability, and sound evaluation of the discovered patterns. To assess the performance of prediction and classification, the criteria of significance, novelty, and usefulness must be fulfilled. Data mining, an important step in this knowledge discovery process, extracts hidden and non-trivial information from raw data through methods such as decision tree classification. However, due to the enormous size, high dimensionality, and heterogeneous nature of many data sets, traditional decision tree classification algorithms sometimes perform poorly in terms of computation time. This paper proposes a framework that uses a parallel strategy to optimize the performance of decision tree induction and cross-validation for classifying data. Moreover, an existing pruning method is incorporated into the framework to overcome the overfitting problem and enhance generalization ability while reducing cost and structural complexity. Experiments on ten benchmark data sets suggest a significant improvement in computation time and better classification accuracy with the optimized classification framework.
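The full text is not available in this preview, but the general approach the abstract describes, decision tree induction evaluated with k-fold cross-validation where the folds are processed in parallel, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the ID3-style tree, the toy dataset, and the round-robin fold assignment are all assumptions made here for demonstration.

```python
# Sketch of parallel k-fold cross-validation of a decision tree classifier.
# The tiny ID3-style tree (information-gain splits on categorical features)
# and the toy dataset are illustrative assumptions only.
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels, n_features):
    """Pick the feature with the highest information gain (None if no gain)."""
    base = entropy(labels)
    best_feat, best_gain = None, 0.0
    for f in range(n_features):
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[f], []).append(y)
        remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in parts.values())
        if base - remainder > best_gain:
            best_feat, best_gain = f, base - remainder
    return best_feat

def build_tree(rows, labels):
    if len(set(labels)) == 1:            # pure node: leaf
        return labels[0]
    feat = best_split(rows, labels, len(rows[0]))
    if feat is None:                     # no informative split: majority vote
        return Counter(labels).most_common(1)[0][0]
    node = {"feat": feat, "children": {},
            "default": Counter(labels).most_common(1)[0][0]}
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feat], ([], []))
        groups[row[feat]][0].append(row)
        groups[row[feat]][1].append(y)
    for value, (sub_rows, sub_labels) in groups.items():
        node["children"][value] = build_tree(sub_rows, sub_labels)
    return node

def predict(node, row):
    while isinstance(node, dict):        # descend until a leaf label is reached
        node = node["children"].get(row[node["feat"]], node["default"])
    return node

def eval_fold(args):
    """Train on all folds but one, return accuracy on the held-out fold."""
    rows, labels, fold, k = args
    train_r = [r for i, r in enumerate(rows) if i % k != fold]
    train_l = [y for i, y in enumerate(labels) if i % k != fold]
    test = [(r, y) for i, (r, y) in enumerate(zip(rows, labels)) if i % k == fold]
    tree = build_tree(train_r, train_l)
    return sum(predict(tree, r) == y for r, y in test) / len(test)

def parallel_cv(rows, labels, k=3):
    """Each fold is an independent train/test job, so folds run in parallel."""
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(eval_fold, [(rows, labels, f, k) for f in range(k)]))
    return sum(scores) / k

if __name__ == "__main__":
    # Toy weather-style dataset: (outlook, humidity) -> play?, each sample
    # repeated three times so every fold sees every distinct case.
    base = [(("sunny", "high"), "no"), (("sunny", "low"), "yes"),
            (("rain", "high"), "no"), (("rain", "low"), "yes"),
            (("overcast", "high"), "yes"), (("overcast", "low"), "yes")]
    rows = [r for r, _ in base for _ in range(3)]
    labels = [y for _, y in base for _ in range(3)]
    print(f"mean CV accuracy: {parallel_cv(rows, labels, k=3):.2f}")
```

Because each cross-validation fold trains and tests independently, the folds parallelize trivially; the speedups reported in the paper presumably come from a more fine-grained parallel strategy (e.g. within tree induction itself), which the preview does not reveal.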





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alam, F.I., Bappee, F.K., Rabbani, M.R., Islam, M.M. (2013). An Optimized Formulation of Decision Tree Classifier. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication, and Control. ICAC3 2013. Communications in Computer and Information Science, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36321-4_10

  • DOI: https://doi.org/10.1007/978-3-642-36321-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36320-7

  • Online ISBN: 978-3-642-36321-4
