Abstract
The rapidly increasing number of scientific documents available publicly on the Internet creates the challenge of efficiently organizing and indexing these documents. Due to the time consuming and tedious nature of manual classification and indexing, there is a need for better methods to automate this process. This thesis proposes an approach which leverages encyclopedic background knowledge for enriching domain-specific ontologies with textual and structural information about the semantic vicinity of the ontologies’ concepts. The proposed approach aims to exploit this information for improving both ontology-based methods for classifying and indexing documents and methods based on supervised machine learning.
Chapter PDF
References
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: Dbpedia: A nucleus for a web of open data. In: Aberer, K., et al. (eds.) ISWC/ASWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007)
Blei, D.M., McAuliffe, J.D.: Supervised topic models. In: Neural Information Processing Systems (2007)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. The Journal of Machine Learning Research (2003)
de Melo, G., Siersdorfer, S.: Multilingual text classification using ontologies. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECiR 2007. LNCS, vol. 4425, pp. 541–548. Springer, Heidelberg (2007)
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. Journal of the American Society for Information Science (1990)
Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (2007)
Gupta, R., Ratinov, L.A.: Text categorization with knowledge transfer from heterogeneous data sources. In: Proceedings of the 23rd Conference on Artificial Intelligence (2008)
Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1999)
Jonquet, C., Shah, N.H., Musen, M.A.: The open biomedical annotator. Summit on translational bioinformatics (2009)
Lancaster, F.W.: Indexing and abstracting in theory and practice. University of Illinois Press (1991)
Medelyan, O., Milne, D., Legg, C., Witten, I.H.: Mining meaning from wikipedia. International Journal of Human-Computer Studies (2009)
Oren, E., Möller, K., Scerri, S., Handschuh, S., Sintek, M.: What are semantic annotations. Technical report (2006)
Ramage, D., Hall, D., Nallapati, R., Manning, C.D.: Labeled lda: A supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language (2009)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing & Management (1988)
Studer, R., Benjamins, R., Fensel, D.: Knowledge engineering: principles and methods. Data & Knowledge Engineering (1998)
Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web. ACM (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Posch, L. (2014). Enriching Ontologies with Encyclopedic Background Knowledge for Document Indexing. In: Mika, P., et al. The Semantic Web – ISWC 2014. ISWC 2014. Lecture Notes in Computer Science, vol 8797. Springer, Cham. https://doi.org/10.1007/978-3-319-11915-1_36
Download citation
DOI: https://doi.org/10.1007/978-3-319-11915-1_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11914-4
Online ISBN: 978-3-319-11915-1
eBook Packages: Computer ScienceComputer Science (R0)