Approximating Decision Trees with Multiway Branches

  • Conference paper
Automata, Languages and Programming (ICALP 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5555)

Abstract

We consider the problem of constructing decision trees for entity identification from a given table. The input is a table containing information about a set of entities over a fixed set of attributes. The goal is to construct a decision tree that identifies each entity unambiguously by testing the attribute values such that the average number of tests is minimized. The previously best known approximation ratio for this problem was O(log² N). In this paper, we present a new greedy heuristic that yields an improved approximation ratio of O(log N).
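To make the problem statement concrete, the following is a minimal sketch of entity identification with multiway branches. It is not the heuristic from the paper, which the abstract does not specify. It assumes a generic greedy rule that picks, at each node, the attribute leaving the fewest pairs of entities undistinguished. The table contents and the names `build_tree`, `total_tests`, and `unseparated_pairs` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical input table: entity -> {attribute: value}.
table = {
    "e1": {"color": "red",  "size": "S", "shape": "round"},
    "e2": {"color": "red",  "size": "M", "shape": "square"},
    "e3": {"color": "blue", "size": "S", "shape": "round"},
    "e4": {"color": "blue", "size": "L", "shape": "square"},
}
attributes = ["color", "size", "shape"]


def unseparated_pairs(groups):
    """Count entity pairs that still share a branch after a split."""
    return sum(len(g) * (len(g) - 1) // 2 for g in groups.values())


def build_tree(table, entities, attributes):
    """Greedy multiway tree: a leaf is an entity name, an internal
    node is a tuple (attribute, {value: subtree})."""
    if len(entities) == 1:
        return entities[0]
    best = None  # (score, attribute, groups)
    for a in attributes:
        groups = defaultdict(list)
        for e in entities:
            groups[table[e][a]].append(e)
        if len(groups) < 2:
            continue  # this attribute separates nothing here
        score = unseparated_pairs(groups)
        if best is None or score < best[0]:
            best = (score, a, groups)
    if best is None:
        raise ValueError("some entities cannot be told apart")
    _, attr, groups = best
    return (attr, {v: build_tree(table, g, attributes)
                   for v, g in groups.items()})


def total_tests(tree, depth=0):
    """Sum of leaf depths, i.e. tests needed over all entities."""
    if not isinstance(tree, tuple):
        return depth
    return sum(total_tests(child, depth + 1) for child in tree[1].values())


tree = build_tree(table, list(table), attributes)
average = total_tests(tree) / len(table)
print(tree[0], average)
```

On this table the sketch tests `size` first, which identifies two entities after one test and the other two after a second test, for an average of 1.5 tests. Testing `color` first would need two tests for every entity, which illustrates why the choice of attribute at each node affects the objective.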

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chakaravarthy, V.T., Pandit, V., Roy, S., Sabharwal, Y. (2009). Approximating Decision Trees with Multiway Branches. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds) Automata, Languages and Programming. ICALP 2009. Lecture Notes in Computer Science, vol 5555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02927-1_19

  • DOI: https://doi.org/10.1007/978-3-642-02927-1_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02926-4

  • Online ISBN: 978-3-642-02927-1

  • eBook Packages: Computer Science, Computer Science (R0)
