Learning Classifiers from Distributed, Ontology-Extended Data Sources

  • Doina Caragea
  • Jun Zhang
  • Jyotishman Pathak
  • Vasant Honavar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4081)


There is an urgent need for sound approaches to integrative and collaborative analysis of large, autonomous (and hence, inevitably, semantically heterogeneous) data sources in several increasingly data-rich application domains. In this paper, we precisely formulate and solve the problem of learning classifiers from such data sources, in a setting where each data source has a hierarchical ontology associated with it and semantic correspondences between data source ontologies and a user ontology are supplied. The proposed approach yields algorithms for learning a broad class of classifiers (including Bayesian networks, decision trees, etc.) from semantically heterogeneous distributed data with strong performance guarantees relative to their centralized counterparts. We illustrate the application of the proposed approach in the case of learning Naive Bayes classifiers from distributed, ontology-extended data sources.
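One way to see how guarantees relative to a centralized learner can arise is through Naive Bayes counts: each source translates its attribute values into the user ontology and reports only counts, and the counts are summed. The following is a minimal sketch under stated assumptions, not the authors' algorithm: records are flat attribute dictionaries, class labels already agree across sources, and the supplied semantic correspondences are represented by a hypothetical `value_map` that sends each source value to exactly one user-ontology value. The function names (`local_counts`, `combine`, `predict`) and the toy data are invented for illustration.

```python
from collections import Counter
from math import log

def local_counts(records, value_map):
    """Naive Bayes sufficient statistics computed at a single data source.

    records: list of (attrs, label) pairs, where attrs maps attribute -> source value.
    value_map: hypothetical stand-in for the supplied semantic correspondences,
    mapping attribute -> {source value: user-ontology value}.
    """
    counts = Counter()
    for attrs, label in records:
        counts[("class", label)] += 1
        for attr, value in attrs.items():
            # Translate into the user ontology before counting.
            counts[(attr, value_map[attr][value], label)] += 1
    return counts

def combine(per_source_counts):
    """Sum per-source counts; counts are additive, so this equals counting
    over the pooled, translated data."""
    total = Counter()
    for c in per_source_counts:
        total.update(c)
    return total

def predict(counts, attrs, domains, labels):
    """Return the label maximizing log P(c) + sum_a log P(a = v | c),
    with Laplace (add-one) smoothing. attrs uses user-ontology values;
    domains maps attribute -> list of its user-ontology values."""
    n = sum(counts[("class", c)] for c in labels)
    best, best_score = None, float("-inf")
    for c in labels:
        n_c = counts[("class", c)]
        score = log((n_c + 1) / (n + len(labels)))
        for attr, v in attrs.items():
            score += log((counts[(attr, v, c)] + 1) / (n_c + len(domains[attr])))
        if score > best_score:
            best, best_score = c, score
    return best

# Invented toy example: two sources describe "outlook" at different granularity.
source_a = [({"outlook": "sunny"}, "yes"), ({"outlook": "rain"}, "no"),
            ({"outlook": "overcast"}, "yes")]
source_b = [({"outlook": "clear"}, "yes"), ({"outlook": "storm"}, "no"),
            ({"outlook": "drizzle"}, "no")]
map_a = {"outlook": {"sunny": "sunny", "overcast": "overcast", "rain": "rain"}}
# Source B's finer values "drizzle" and "storm" both map up to the user's "rain".
map_b = {"outlook": {"clear": "sunny", "cloudy": "overcast",
                     "drizzle": "rain", "storm": "rain"}}

counts = combine([local_counts(source_a, map_a), local_counts(source_b, map_b)])
domains = {"outlook": ["sunny", "overcast", "rain"]}
print(predict(counts, {"outlook": "rain"}, domains, ["yes", "no"]))  # no
```

Because only counts leave each source, the combined statistics match those obtained by pooling the translated records in one place. The sketch assumes every source value maps to a single user-level value; a source that records an attribute only at a coarser level than the user ontology would need additional handling that is not shown here.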







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Doina Caragea (1)
  • Jun Zhang (1)
  • Jyotishman Pathak (1)
  • Vasant Honavar (1)

  1. AI Research Lab, Department of Computer Science, Iowa State University, Ames, USA
