Experiences Using Decision Trees for Knowledge Discovery

  • Chapter
Fuzzy Sets, Rough Sets, Multisets and Clustering

Part of the book series: Studies in Computational Intelligence (SCI, volume 671)

Abstract

Knowledge discovery is the process of identifying useful patterns in large data sets. Two families of approaches can be used for knowledge discovery: clustering, when the classes of the domain objects are not known; and inductive learning algorithms, when the classes are known and the goal is to construct a domain model that can identify new, unseen objects. Clustering algorithms have also been proposed to analyze data when the classes are known. However, to our knowledge, inductive learning methods are used only for prediction, not to analyze the available data. We propose a methodology, called FTree, that uses a decision tree to analyze the available data, identifying both patterns and some important aspects of the domain (at least of the part of the domain represented by the data at hand), such as similarity between classes, separability, characterization of classes, and even possible errors in the data.
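The idea of reading a decision tree as an analysis tool rather than as a predictor can be illustrated with a small sketch. This is not the FTree methodology itself, whose details are not given in the abstract; it is a plain information-gain tree over a made-up toy data set. The attribute names, class labels, and the helper functions `entropy`, `best_attribute`, and `leaves` are all assumptions introduced for illustration. Pure leaves give a candidate characterization of a class, while leaves that mix several classes hint at classes that the available attributes cannot separate, or at possible labeling errors.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Return the attribute with the highest information gain."""
    base = entropy(labels)

    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(attributes, key=gain)


def leaves(rows, labels, attributes, path=()):
    """Grow the tree recursively and yield (path, class_counts) per leaf."""
    # Stop when the node is pure or no attributes are left to split on.
    if len(set(labels)) == 1 or not attributes:
        yield path, Counter(labels)
        return
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    for value in sorted(set(row[attr] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        yield from leaves(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            remaining,
            path + ((attr, value),),
        )


# Hypothetical toy data: two attributes, four classes.
rows = [
    {"shape": "round", "color": "red"},
    {"shape": "round", "color": "red"},
    {"shape": "round", "color": "orange"},
    {"shape": "long", "color": "yellow"},
    {"shape": "round", "color": "red"},
    {"shape": "long", "color": "yellow"},
]
labels = ["apple", "apple", "orange", "banana", "cherry", "banana"]

result = list(leaves(rows, labels, ["shape", "color"]))

# Leaves holding more than one class: the attributes cannot tell them apart.
mixed = [(path, counts) for path, counts in result if len(counts) > 1]

for path, counts in result:
    kind = "mixed" if len(counts) > 1 else "pure"
    print(kind, dict(path), dict(counts))
```

In this toy data the "apple" and "cherry" examples share every attribute value, so they end up in the same leaf. An analyst could read that either as a sign that the two classes are similar with respect to the recorded attributes or as a cue to check those records for errors.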

Acknowledgements

The authors thank Susana Puig for helpful comments and suggestions, and the Taller Jeroni de Moragas. This research is partially funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 689176 (SYSMICS project), the projects RASO (TIN2015-71799-C2-1-P) and RPREF (CSIC Intramural 201650E044), and the grants 2014-SGR-118 and 2014-SGR-788 from the Generalitat de Catalunya.

Author information

Corresponding author

Correspondence to Eva Armengol.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Armengol, E., García-Cerdaña, À., Dellunde, P. (2017). Experiences Using Decision Trees for Knowledge Discovery. In: Torra, V., Dahlbom, A., Narukawa, Y. (eds) Fuzzy Sets, Rough Sets, Multisets and Clustering. Studies in Computational Intelligence, vol 671. Springer, Cham. https://doi.org/10.1007/978-3-319-47557-8_11

  • DOI: https://doi.org/10.1007/978-3-319-47557-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47556-1

  • Online ISBN: 978-3-319-47557-8

  • eBook Packages: Engineering, Engineering (R0)
