On Efficient Construction of Decision Trees From Large Databases

  • Conference paper
Rough Sets and Current Trends in Computing (RSCTC 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2005)

Abstract

The main task in decision tree construction algorithms is to find the “best partition” of the set of objects. In this paper, we investigate the problem of optimal binary partition of a continuous attribute for large data sets stored in relational databases. The critical factor for the time complexity of algorithms solving this problem is the number of simple SQL queries needed to construct such partitions. The straightforward approach to optimal partition selection needs at least O(N) queries, where N is the number of pre-assumed partitions of the search space. We show some properties of optimization measures related to discernibility between objects that allow a partition very close to optimal to be constructed using only O(log N) simple queries.
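The cost model in the abstract counts simple SQL queries. A small sketch can make it concrete, under the following assumptions:

* It uses Python's built-in `sqlite3` and a hypothetical table with a numeric attribute column and a `decision` column.
* The discernibility measure of a cut c is taken to be the number of object pairs with different decisions that c separates. If l_d and r_d objects of class d lie left and right of c, and L and R are the side totals, then W(c) = L·R - Σ_d l_d·r_d.
* The names `CutSearch`, `W`, `exhaustive` and `unimodal`, and the toy table `t` with column `a`, are all hypothetical.

The O(N) routine simply scores every candidate cut. The O(log N) routine is a plain binary search that assumes W is strictly unimodal over the sorted cuts. That is an illustrative simplification, not the method of the paper.

```python
import sqlite3


class CutSearch:
    """Scores candidate cuts on one numeric attribute and counts SQL queries.

    Hypothetical schema: `table` has a numeric column `attr` and a decision
    column named `decision`. Table/column names are pasted into the SQL text,
    so they must come from trusted code, never from user input.
    """

    def __init__(self, conn, table, attr):
        self.conn, self.table, self.attr = conn, table, attr
        self.queries = 0
        # One query for the class distribution of the whole table.
        self.total = self._counts(
            f"SELECT decision, COUNT(*) FROM {table} GROUP BY decision"
        )

    def _counts(self, sql, params=()):
        """Run one 'simple query' returning {decision: count}."""
        self.queries += 1
        return dict(self.conn.execute(sql, params).fetchall())

    def W(self, cut):
        """Discernibility of `cut`: pairs with different decisions split by it.

        With l_d / r_d objects of class d left / right of the cut,
        W = L*R - sum_d l_d*r_d, where L and R are the side totals.
        """
        left = self._counts(
            f"SELECT decision, COUNT(*) FROM {self.table} "
            f"WHERE {self.attr} < ? GROUP BY decision",
            (cut,),
        )
        L = sum(left.values())
        R = sum(self.total.values()) - L
        # Classes absent from `left` have l_d = 0 and contribute nothing.
        same = sum(l * (self.total[d] - l) for d, l in left.items())
        return L * R - same

    def exhaustive(self, cuts):
        """Straightforward search: one query per candidate cut, N + 1 in total."""
        return max(cuts, key=self.W)

    def unimodal(self, cuts):
        """Binary search on the sign of W(cuts[i+1]) - W(cuts[i]).

        Uses roughly 2*log2(N) queries. ASSUMPTION: W is strictly unimodal
        over the sorted `cuts`; this does not hold in general and is not
        the method of the paper.
        """
        lo, hi = 0, len(cuts) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.W(cuts[mid]) < self.W(cuts[mid + 1]):
                lo = mid + 1  # W still rising: the peak lies right of mid
            else:
                hi = mid  # W not rising: the peak is at mid or left of it
        return cuts[lo]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a REAL, decision INTEGER)")
    # Toy data: values 0..9, class 0 below 5 and class 1 from 5 upward.
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)", [(v, 0 if v < 5 else 1) for v in range(10)]
    )
    cuts = [k + 0.5 for k in range(9)]  # midpoints between consecutive values

    ex = CutSearch(conn, "t", "a")
    print(ex.exhaustive(cuts), ex.queries)  # 4.5 10
    bs = CutSearch(conn, "t", "a")
    print(bs.unimodal(cuts), bs.queries)  # 4.5 7
```

On this toy table both routines pick the cut 4.5. The exhaustive scan issues 10 queries and the binary search issues 7. The gap widens as N grows, because one count is linear in N and the other logarithmic.

When W has several local maxima, this binary search can stop at a local peak. The sketch makes no claim to the near-optimality guarantee described in the abstract.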

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Son, N.H. (2001). On Efficient Construction of Decision Trees From Large Databases. In: Ziarko, W., Yao, Y. (eds) Rough Sets and Current Trends in Computing. RSCTC 2000. Lecture Notes in Computer Science, vol 2005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45554-X_43

  • DOI: https://doi.org/10.1007/3-540-45554-X_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43074-2

  • Online ISBN: 978-3-540-45554-7

  • eBook Packages: Springer Book Archive
