Abstract
We address the problem of learning predictive models from multiple large, distributed, autonomous, and hence almost invariably semantically disparate relational data sources, from a user’s point of view. We show, under fairly general assumptions, how to exploit data sources annotated with relevant metadata to build predictive models (e.g., classifiers) from a collection of distributed relational data sources without a centralized data warehouse, while offering strong guarantees of exactness of the learned classifiers relative to their centralized relational learning counterparts. We demonstrate the approach by learning link-based Naïve Bayes classifiers, and present experimental results on a text classification task that show its feasibility.
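The exactness claim can be illustrated with a small sketch. It is not the paper's algorithm: it uses a plain multinomial Naïve Bayes over words rather than a link-based one, and it assumes the sources already share one vocabulary and one label set, so the metadata-driven mapping between ontologies is left out. The function names (`local_counts`, `merge_counts`, `train_from_counts`, `predict`) and the toy data are hypothetical. The idea shown is that each source only reports counts, and a classifier built from the summed counts is identical to one built from the pooled data.

```python
import math
from collections import Counter


def local_counts(records):
    """Compute count statistics at a single data source.

    Each record is a (label, list_of_words) pair. Only counts leave
    the source; the raw records stay where they are.
    """
    class_counts = Counter()
    word_counts = Counter()  # keyed by (label, word)
    for label, words in records:
        class_counts[label] += 1
        for word in words:
            word_counts[(label, word)] += 1
    return class_counts, word_counts


def merge_counts(per_source_stats):
    """Sum the count statistics reported by all sources."""
    total_class, total_word = Counter(), Counter()
    for class_counts, word_counts in per_source_stats:
        total_class.update(class_counts)
        total_word.update(word_counts)
    return total_class, total_word


def train_from_counts(class_counts, word_counts, alpha=1.0):
    """Build a multinomial Naive Bayes model from counts alone.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    vocab = sorted({word for (_, word) in word_counts})
    n_records = sum(class_counts.values())
    model = {}
    for label, count in class_counts.items():
        total = sum(word_counts[(label, w)] for w in vocab)
        denom = total + alpha * len(vocab)
        log_prior = math.log(count / n_records)
        log_lik = {
            w: math.log((word_counts[(label, w)] + alpha) / denom)
            for w in vocab
        }
        model[label] = (log_prior, log_lik)
    return model


def predict(model, words):
    """Return the label with the highest log-posterior score."""
    def score(label):
        log_prior, log_lik = model[label]
        # Words outside the training vocabulary are ignored.
        return log_prior + sum(log_lik[w] for w in words if w in log_lik)
    return max(model, key=score)


# Two hypothetical sources holding disjoint sets of labeled documents.
source_a = [("db", ["query", "index"]), ("ml", ["model", "bayes"])]
source_b = [("ml", ["bayes", "classifier"]), ("db", ["index", "join"])]

distributed = train_from_counts(
    *merge_counts([local_counts(source_a), local_counts(source_b)])
)
centralized = train_from_counts(*local_counts(source_a + source_b))
```

In this sketch `distributed` and `centralized` hold the same parameters, because Naïve Bayes depends on the data only through counts, and counts add across sources. Handling semantically disparate sources would additionally require translating each source's terms into the user's ontology before counting, which this example does not attempt.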
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Caragea, C., Caragea, D., Honavar, V. (2009). Learning Link-Based Naïve Bayes Classifiers from Ontology-Extended Distributed Data. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2009. OTM 2009. Lecture Notes in Computer Science, vol 5871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05151-7_28
DOI: https://doi.org/10.1007/978-3-642-05151-7_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05150-0
Online ISBN: 978-3-642-05151-7
eBook Packages: Computer Science, Computer Science (R0)