Abstract
Both named entities and keywords are important in defining the content of the text in which they occur. In particular, people often use named entities when searching for information. However, named entities have ontological features, namely their aliases, classes, and identifiers, which are hidden from their textual appearance. We propose ontology-based extensions of the traditional Vector Space Model that explore different combinations of those latent ontological features with keywords for text retrieval. Our experiments on benchmark datasets show that the proposed models achieve better search quality than the purely keyword-based model, and demonstrate their advantages for both text retrieval and the representation of documents and queries.
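As a rough illustration of the idea summarized above, the following sketch scores a document against a query by combining a keyword vector with a vector of named-entity features. It is a minimal sketch under stated assumptions, not the models evaluated in the article: the linear weighting by `alpha`, the raw term-frequency weights, the `class:`/`alias:`/`id:` feature strings, and the names `cosine` and `combined_similarity` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # An empty vector has no direction, so treat its similarity as zero.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(doc, query, alpha=0.5):
    """Weighted sum of entity-feature similarity and keyword similarity.

    Assumption: a simple linear combination with weight `alpha`;
    raw term frequencies stand in for a real weighting scheme.
    """
    kw = cosine(Counter(doc["keywords"]), Counter(query["keywords"]))
    ne = cosine(Counter(doc["entities"]), Counter(query["entities"]))
    return alpha * ne + (1 - alpha) * kw


# Hypothetical data: entity features are encoded as prefixed strings.
query = {"keywords": ["population"], "entities": ["class:City"]}
doc_a = {"keywords": ["population", "growth"],
         "entities": ["class:City", "id:E1"]}
doc_b = {"keywords": ["population", "growth"], "entities": []}

score_a = combined_similarity(doc_a, query)
score_b = combined_similarity(doc_b, query)
print(score_a > score_b)  # doc_a also matches the query on an entity class
```

Both documents match the query equally on keywords, so only the entity-class feature separates them. Setting `alpha=0` reduces the score to a purely keyword-based comparison.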
About this article
Cite this article
Cao, T.H., Ngo, V.M. Semantic Search by Latent Ontological Features. New Gener. Comput. 30, 53–71 (2012). https://doi.org/10.1007/s00354-012-0104-0
DOI: https://doi.org/10.1007/s00354-012-0104-0