Learning Classifiers from Distributed, Ontology-Extended Data Sources

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4081)

Abstract

There is an urgent need for sound approaches to integrative and collaborative analysis of large, autonomous (and hence, inevitably semantically heterogeneous) data sources in several increasingly data-rich application domains. In this paper, we precisely formulate and solve the problem of learning classifiers from such data sources, in a setting where each data source has a hierarchical ontology associated with it and semantic correspondences between data source ontologies and a user ontology are supplied. The proposed approach yields algorithms for learning a broad class of classifiers (including Bayesian networks, decision trees, etc.) from semantically heterogeneous distributed data with strong performance guarantees relative to their centralized counterparts. We illustrate the application of the proposed approach in the case of learning Naive Bayes classifiers from distributed, ontology-extended data sources.
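
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how the Naive Bayes case might look, assuming each data source answers count queries locally and the user-supplied correspondences are flat term-to-term mappings (hierarchical ontologies and partially specified values are ignored here). All names, the example vocabulary, and the Laplace smoothing are illustrative assumptions, not the authors' implementation. It shows why counts gathered from distributed sources can reproduce the counts a centralized learner would see.

```python
import math
from collections import Counter

def local_counts(rows, mapping):
    """Count class labels and (attribute, value, label) triples at one source.

    `mapping` translates source-ontology terms into user-ontology terms;
    unmapped values pass through unchanged (an assumption of this sketch).
    """
    class_counts, joint_counts = Counter(), Counter()
    for attrs, label in rows:
        class_counts[label] += 1
        for attr, value in attrs.items():
            joint_counts[(attr, mapping.get(value, value), label)] += 1
    return class_counts, joint_counts

def merge(stats):
    """Add up the per-source counts; no raw data leaves a source."""
    total_class, total_joint = Counter(), Counter()
    for class_counts, joint_counts in stats:
        total_class.update(class_counts)
        total_joint.update(joint_counts)
    return total_class, total_joint

def predict(attrs, class_counts, joint_counts, values_per_attr):
    """Naive Bayes prediction from counts, with Laplace smoothing."""
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, c in class_counts.items():
        score = math.log(c / n)
        for attr, value in attrs.items():
            k = values_per_attr[attr]  # number of user-ontology values
            score += math.log((joint_counts[(attr, value, label)] + 1) / (c + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical sources that describe the same attribute with different terms.
source_a = [({"status": "Grad"}, "yes"), ({"status": "Undergrad"}, "no")]
map_a = {"Grad": "Graduate", "Undergrad": "Undergraduate"}
source_b = [({"status": "PhD"}, "yes"), ({"status": "MS"}, "yes"),
            ({"status": "BS"}, "no")]
map_b = {"PhD": "Graduate", "MS": "Graduate", "BS": "Undergraduate"}

merged = merge([local_counts(source_a, map_a), local_counts(source_b, map_b)])

# The same data, pooled and already expressed in the user ontology.
pooled = ([({"status": "Graduate"}, "yes")] * 3
          + [({"status": "Undergraduate"}, "no")] * 2)
centralized = local_counts(pooled, {})

prediction = predict({"status": "Graduate"}, *merged, {"status": 2})
```

In this toy setting the merged counts equal the centralized ones, so any classifier derived purely from those counts is identical in both cases.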

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Caragea, D., Zhang, J., Pathak, J., Honavar, V. (2006). Learning Classifiers from Distributed, Ontology-Extended Data Sources. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_35

  • DOI: https://doi.org/10.1007/11823728_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37736-8

  • Online ISBN: 978-3-540-37737-5

  • eBook Packages: Computer Science, Computer Science (R0)
