Learning Classifiers from Distributed, Ontology-Extended Data Sources

Caragea, Doina; Zhang, Jun; Pathak, Jyotishman; Honavar, Vasant

doi:10.1007/11823728_35

Doina Caragea¹⁸,
Jun Zhang¹⁸,
Jyotishman Pathak¹⁸ &
…
Vasant Honavar¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4081))

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

769 Accesses
2 Citations

Abstract

There is an urgent need for sound approaches to integrative and collaborative analysis of large, autonomous (and hence, inevitably semantically heterogeneous) data sources in several increasingly data-rich application domains. In this paper, we precisely formulate and solve the problem of learning classifiers from such data sources, in a setting where each data source has a hierarchical ontology associated with it and semantic correspondences between data source ontologies and a user ontology are supplied. The proposed approach yields algorithms for learning a broad class of classifiers (including Bayesian networks, decision trees, etc.) from semantically heterogeneous distributed data with strong performance guarantees relative to their centralized counterparts. We illustrate the application of the proposed approach in the case of learning Naive Bayes classifiers from distributed, ontology-extended data sources.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American (2001)
Google Scholar
Zhang, J., Caragea, D., Honavar, V.: Learning ontology-aware classifiers. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds.) DS 2005. LNCS (LNAI), vol. 3735, pp. 308–321. Springer, Heidelberg (2005)
Chapter Google Scholar
Caragea, D., Silvescu, A., Honavar, V.: A framework for learning from distributed data using sufficient statistics and its application to learning decision trees. International Journal of Hybrid Intelligent Systems 1 (2004)
Google Scholar
Bonatti, P., Deng, Y., Subrahmanian, V.: An ontology-extended relational algebra. In: Proceedings of the IEEE Conference on Information Integration and Reuse, pp. 192–199. IEEE Press, Los Alamitos (2003)
Google Scholar
Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
MATH Google Scholar
Caragea, D., Pathak, J., Honavar, V.: Learning classifiers from semantically heterogeneous data. In: Proceedings of the International Conference on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems (2004)
Google Scholar
Kearns, M.: Efficient noise-tolerant learning from statistical queries. Journal of the ACM 45, 983–1006 (1998)
Article MATH MathSciNet Google Scholar
Zhang, J., Honavar, V.: AVT-NBL: An algorithm for learning compact and accurate naive bayes classifiers from attribute value taxonomies and data. In: Proceedings of the Fourth IEEE International Conference on Data Mining, Brighton, UK (2004)
Google Scholar
Casella, G., Berger, R.: Statistical Inference. Duxbury Press, Belmont (2001)
Google Scholar
Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Machine Learning 29 (1997)
Google Scholar
Kargupta, H., Chan, P.: Advances in Distributed and Parallel Knowledge Discovery. AAAI/MIT (2000)
Google Scholar
Doan, A., Halevy, A.: Semantic Integration Research in the Database Community: A Brief Survey. AI Magazine, Special Issue on Semantic Integration 26, 83–94 (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

AI Research Lab, Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, IA, 50011, USA
Doina Caragea, Jun Zhang, Jyotishman Pathak & Vasant Honavar

Authors

Doina Caragea
View author publications
You can also search for this author in PubMed Google Scholar
Jun Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Jyotishman Pathak
View author publications
You can also search for this author in PubMed Google Scholar
Vasant Honavar
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, A-1040, Wien, Austria
A Min Tjoa
Department of Software and Computing Systems, University of Alicante, Spain
Juan Trujillo

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Caragea, D., Zhang, J., Pathak, J., Honavar, V. (2006). Learning Classifiers from Distributed, Ontology-Extended Data Sources. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_35

Download citation

DOI: https://doi.org/10.1007/11823728_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37736-8
Online ISBN: 978-3-540-37737-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics