On Efficient Construction of Decision Trees From Large Databases

Son, Nguyen Hung

doi:10.1007/3-540-45554-X_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2005)

Included in the following conference series:

International Conference on Rough Sets and Current Trends in Computing

Abstract

The main task in decision tree construction algorithms is to find the “best partition” of the set of objects. In this paper, we investigate the problem of optimal binary partition of a continuous attribute for large data sets stored in relational databases. The critical factor for the time complexity of algorithms solving this problem is the number of simple SQL queries necessary to construct such partitions. The straightforward approach to optimal partition selection needs at least O(N) queries, where N is the number of pre-assumed partitions of the search space. We show some properties of optimization measures related to discernibility between objects that allow the construction of a partition very close to optimal using only O(log N) simple queries.
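
The straightforward O(N) baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the table name `objects`, the columns `a` (continuous attribute) and `d` (decision class), all function names, and the toy data are assumptions made for illustration. The score used here is one common discernibility-style measure, assumed rather than taken from the abstract: the number of pairs of objects from different decision classes that a cut separates. Each call to `class_counts` stands for one "simple SQL query", so scanning N candidate cuts costs 2N queries.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (a REAL, d TEXT)")
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?)",
        [(1.0, "x"), (2.0, "x"), (3.0, "y"), (4.0, "y")],
    )
    return conn


def class_counts(conn, cut, side):
    """One simple query: class distribution on one side of the cut."""
    op = "<" if side == "left" else ">="
    rows = conn.execute(
        f"SELECT d, COUNT(*) FROM objects WHERE a {op} ? GROUP BY d",
        (cut,),
    ).fetchall()
    return dict(rows)


def discernibility(left, right):
    """Count pairs from different classes that lie on opposite sides."""
    all_pairs = sum(left.values()) * sum(right.values())
    same_class_pairs = sum(n * right.get(cls, 0) for cls, n in left.items())
    return all_pairs - same_class_pairs


def best_cut_naive(conn, cuts):
    """Score every candidate cut; return (cut, score, queries issued)."""
    if not cuts:
        raise ValueError("at least one candidate cut is required")
    best_cut, best_score, queries = None, -1, 0
    for cut in cuts:
        left = class_counts(conn, cut, "left")
        right = class_counts(conn, cut, "right")
        queries += 2
        score = discernibility(left, right)
        if score > best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score, queries


if __name__ == "__main__":
    print(best_cut_naive(make_demo_db(), [1.5, 2.5, 3.5]))  # (2.5, 4, 6)
```

On the toy data, the cut at 2.5 separates all four cross-class pairs and is found after six queries. The query counter makes the linear cost visible; this is the cost the paper aims to reduce to O(log N).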

Author information

Authors and Affiliations

Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Nguyen Hung Son

Editor information

Editors and Affiliations

Department of Computer Science, University of Regina, Regina, S4S 0A2, Saskatchewan, Canada
Wojciech Ziarko & Yiyu Yao

About this paper

Cite this paper

Son, N.H. (2001). On Efficient Construction of Decision Trees From Large Databases. In: Ziarko, W., Yao, Y. (eds) Rough Sets and Current Trends in Computing. RSCTC 2000. Lecture Notes in Computer Science, vol 2005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45554-X_43

DOI: https://doi.org/10.1007/3-540-45554-X_43
Published: 18 December 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43074-2
Online ISBN: 978-3-540-45554-7
eBook Packages: Springer Book Archive
