Learning Classifiers from Distributed, Ontology-Extended Data Sources
There is an urgent need for sound approaches to integrative and collaborative analysis of large, autonomous (and hence, inevitably semantically heterogeneous) data sources in several increasingly data-rich application domains. In this paper, we precisely formulate and solve the problem of learning classifiers from such data sources, in a setting where each data source has a hierarchical ontology associated with it and semantic correspondences between data source ontologies and a user ontology are supplied. The proposed approach yields algorithms for learning a broad class of classifiers (including Bayesian networks, decision trees, etc.) from semantically heterogeneous distributed data with strong performance guarantees relative to their centralized counterparts. We illustrate the application of the proposed approach in the case of learning Naive Bayes classifiers from distributed, ontology-extended data sources.
KeywordsBayesian Network User Perspective Statistical Query Student Program Heterogeneous Data Source
Unable to display preview. Download preview PDF.
- 1.Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American (2001)Google Scholar
- 3.Caragea, D., Silvescu, A., Honavar, V.: A framework for learning from distributed data using sufficient statistics and its application to learning decision trees. International Journal of Hybrid Intelligent Systems 1 (2004)Google Scholar
- 4.Bonatti, P., Deng, Y., Subrahmanian, V.: An ontology-extended relational algebra. In: Proceedings of the IEEE Conference on Information Integration and Reuse, pp. 192–199. IEEE Press, Los Alamitos (2003)Google Scholar
- 6.Caragea, D., Pathak, J., Honavar, V.: Learning classifiers from semantically heterogeneous data. In: Proceedings of the International Conference on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems (2004)Google Scholar
- 8.Zhang, J., Honavar, V.: AVT-NBL: An algorithm for learning compact and accurate naive bayes classifiers from attribute value taxonomies and data. In: Proceedings of the Fourth IEEE International Conference on Data Mining, Brighton, UK (2004)Google Scholar
- 9.Casella, G., Berger, R.: Statistical Inference. Duxbury Press, Belmont (2001)Google Scholar
- 10.Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Machine Learning 29 (1997)Google Scholar
- 11.Kargupta, H., Chan, P.: Advances in Distributed and Parallel Knowledge Discovery. AAAI/MIT (2000)Google Scholar
- 12.Doan, A., Halevy, A.: Semantic Integration Research in the Database Community: A Brief Survey. AI Magazine, Special Issue on Semantic Integration 26, 83–94 (2005)Google Scholar