Semantic Search by Latent Ontological Features

New Generation Computing

Abstract

Both named entities and keywords are important in defining the content of a text in which they occur. In particular, people often use named entities in information search. However, named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. We propose ontology-based extensions of the traditional Vector Space Model that explore different combinations of those latent ontological features with keywords for text retrieval. Our experiments on benchmark datasets show better search quality of the proposed models as compared to the purely keyword-based model, and their advantages for both text retrieval and representation of documents and queries.
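
As a rough illustration of how keywords and latent ontological features might be combined in a Vector Space Model, the sketch below represents each document and query as one bag-of-features vector per feature space (keyword, alias, class, identifier) and scores a pair by a weighted sum of per-space cosine similarities. This is only one possible reading of the idea, not the authors' actual model: the weights, the raw term-frequency weighting, the feature-space names, and the `ent:` identifiers are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(doc: dict, query: dict, weights: dict) -> float:
    """Weighted sum of cosine similarities, one per feature space."""
    return sum(
        w * cosine(doc.get(space, Counter()), query.get(space, Counter()))
        for space, w in weights.items()
    )


# Hypothetical weights; a real system would tune these on held-out queries.
WEIGHTS = {"keyword": 0.5, "alias": 0.2, "class": 0.15, "identifier": 0.15}

# Two toy documents with identical keywords but different named entities.
doc_a = {
    "keyword": Counter(["city", "visit"]),
    "alias": Counter(["saigon"]),
    "class": Counter(["City"]),
    "identifier": Counter(["ent:hcmc"]),   # assumed identifier scheme
}
doc_b = {
    "keyword": Counter(["city", "visit"]),
    "alias": Counter(["paris"]),
    "class": Counter(["City"]),
    "identifier": Counter(["ent:paris"]),  # assumed identifier scheme
}

# The query names the entity by a different alias than doc_a does,
# but both resolve to the same (assumed) identifier.
query = {
    "keyword": Counter(["city"]),
    "alias": Counter(["ho chi minh city"]),
    "class": Counter(["City"]),
    "identifier": Counter(["ent:hcmc"]),
}

score_a = combined_similarity(doc_a, query, WEIGHTS)
score_b = combined_similarity(doc_b, query, WEIGHTS)
print(score_a > score_b)
```

In this toy setup a purely keyword-based score cannot separate the two documents, since their keyword vectors are identical; the identifier space is what ranks `doc_a` higher even though its surface alias differs from the one in the query.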

Author information

Corresponding author

Correspondence to Tru H. Cao.

About this article

Cite this article

Cao, T.H., Ngo, V.M. Semantic Search by Latent Ontological Features. New Gener. Comput. 30, 53–71 (2012). https://doi.org/10.1007/s00354-012-0104-0

  • DOI: https://doi.org/10.1007/s00354-012-0104-0
