Learning Classifiers from Semantically Heterogeneous Data

Caragea, Doina; Pathak, Jyotishman; Honavar, Vasant G.

doi:10.1007/978-3-540-30469-2_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3291)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

Semantically heterogeneous and distributed data sources are common in application domains such as bioinformatics and security informatics. In such a setting, each data source has an associated ontology. Different users or applications need to be able to query such data sources for statistics of interest (e.g., statistics needed to learn a predictive model from data). Because no single ontology meets the needs of all applications or users in every context, or, for that matter, even of a single user in different contexts, there is a need for principled approaches to acquiring statistics from semantically heterogeneous data. In this paper, we introduce ontology-extended data sources and define a user perspective consisting of an ontology and a set of interoperation constraints between data source ontologies and the user ontology. We show how these constraints can be used to derive mappings from source ontologies to the user ontology. We observe that most learning algorithms use only certain statistics computed from data in the process of generating the hypothesis that they output. We show how the ontology mappings can be used to answer statistical queries needed by algorithms for learning classifiers from data viewed from a certain user perspective.
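
The idea in the abstract, that a learner needs only certain statistics and that those statistics can be gathered through mappings into a user ontology, can be illustrated with a small sketch. This is not the authors' system: the two sources, their vocabularies, the mappings, and the function names are all hypothetical, and the mappings are given by hand rather than derived from interoperation constraints. Each source answers a count query in user-ontology terms, and the combined counts are enough to estimate the conditional probabilities a naive Bayes classifier would use.

```python
from collections import Counter

# Hypothetical sources: (attribute value, class label) pairs, each source
# describing the same attribute with its own vocabulary.
source_a = [("sunny", "yes"), ("rainy", "no"), ("sunny", "yes"), ("rainy", "yes")]
source_b = [("clear", "yes"), ("showers", "no"), ("drizzle", "no")]

# Hypothetical mappings from each source ontology to the user ontology.
# In this sketch they are written by hand.
map_a = {"sunny": "dry", "rainy": "wet"}
map_b = {"clear": "dry", "showers": "wet", "drizzle": "wet"}


def local_counts(records, mapping):
    """Answer a joint-count query at one source, in user-ontology terms."""
    return Counter((mapping[value], label) for value, label in records)


def combined_counts(sources):
    """Add up the per-source counts; raw records never leave their source."""
    total = Counter()
    for records, mapping in sources:
        total += local_counts(records, mapping)
    return total


def conditional_probability(counts, value, label):
    """Estimate P(value | label) from joint counts (no smoothing)."""
    label_total = sum(c for (_, lab), c in counts.items() if lab == label)
    return counts[(value, label)] / label_total


counts = combined_counts([(source_a, map_a), (source_b, map_b)])
print(counts[("dry", "yes")])                         # 3
print(conditional_probability(counts, "dry", "yes"))  # 0.75
```

Only the counts cross source boundaries here, which is why the same query can be answered from a different user perspective by swapping in different mappings.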

Author information

Authors and Affiliations

Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA, 50011-1040, USA
Doina Caragea, Jyotishman Pathak & Vasant G. Honavar

Editor information

Editors and Affiliations

STARLab, Vrije Universiteit Brussel (VUB), Bldg G/10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
School of Computer Science and Information Technology, RMIT University, Bld 10.10, 376-392 Swanston Street, VIC 3001, Melbourne, Australia
Zahir Tari

About this paper

Cite this paper

Caragea, D., Pathak, J., Honavar, V.G. (2004). Learning Classifiers from Semantically Heterogeneous Data. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE. OTM 2004. Lecture Notes in Computer Science, vol 3291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30469-2_9

DOI: https://doi.org/10.1007/978-3-540-30469-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23662-7
Online ISBN: 978-3-540-30469-2
eBook Packages: Springer Book Archive
