Learning Classifiers from Semantically Heterogeneous Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3291)

Abstract

Semantically heterogeneous and distributed data sources are common in application domains such as bioinformatics and security informatics. In such a setting, each data source has an associated ontology. Different users or applications need to be able to query such data sources for statistics of interest (e.g., statistics needed to learn a predictive model from data). Because no single ontology meets the needs of all applications or users in every context, or, for that matter, even of a single user in different contexts, there is a need for principled approaches to acquiring statistics from semantically heterogeneous data. In this paper, we introduce ontology-extended data sources and define a user perspective consisting of an ontology and a set of interoperation constraints between data source ontologies and the user ontology. We show how these constraints can be used to derive mappings from source ontologies to the user ontology. We observe that most learning algorithms use only certain statistics computed from data in the process of generating the hypothesis that they output. We show how the ontology mappings can be used to answer statistical queries needed by algorithms for learning classifiers from data viewed from a certain user perspective.
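
As a rough illustration of the idea in the abstract, the following minimal sketch answers one statistical query (joint counts of an attribute value and a class label) over two sources that use different vocabularies. It is not the paper's method: the source names, the value mappings, the records, and the helper functions `local_counts` and `joint_counts` are all hypothetical, and the mappings are written by hand here rather than derived from interoperation constraints.

```python
from collections import Counter

# Hypothetical mappings from each source's attribute values to the user's terms.
MAPPINGS = {
    "source_a": {"rainy": "wet", "drizzle": "wet", "sunny": "dry"},
    "source_b": {"precipitation": "wet", "clear": "dry"},
}

# Hypothetical records held by each source: (attribute value, class label).
SOURCES = {
    "source_a": [("rainy", "stay"), ("drizzle", "stay"), ("sunny", "go")],
    "source_b": [("precipitation", "stay"), ("clear", "go"), ("clear", "stay")],
}


def local_counts(records, mapping):
    """Count (user-level value, class label) pairs at a single source."""
    counts = Counter()
    for value, label in records:
        # Translate the source's own term into the user's vocabulary.
        counts[(mapping[value], label)] += 1
    return counts


def joint_counts(sources, mappings):
    """Answer the count query by summing the per-source counts."""
    total = Counter()
    for name, records in sources.items():
        total += local_counts(records, mappings[name])
    return total


counts = joint_counts(SOURCES, MAPPINGS)
for key in sorted(counts):
    print(key, counts[key])
```

Counts of this kind are the sort of statistic a naive Bayes or decision tree learner needs, so in this sketch only the translated counts, not the raw records, have to leave each source.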

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Caragea, D., Pathak, J., Honavar, V.G. (2004). Learning Classifiers from Semantically Heterogeneous Data. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE. OTM 2004. Lecture Notes in Computer Science, vol 3291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30469-2_9

  • DOI: https://doi.org/10.1007/978-3-540-30469-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23662-7

  • Online ISBN: 978-3-540-30469-2

  • eBook Packages: Springer Book Archive
