Boosting a Semantic Search Engine by Named Entities

  • Annalina Caputo
  • Pierpaolo Basile
  • Giovanni Semeraro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5722)

Abstract

Traditional Information Retrieval (IR) systems are based on a bag-of-words representation. This approach retrieves relevant documents by lexical matching between query and document terms. Because of synonymy and polysemy, lexical methods produce imprecise or incomplete results. In this paper we present SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels, which integrate (and do not simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and about named entities. This paper focuses on the named entity level. Our aim is to show that named entities help improve retrieval performance. We exploit a model able to capture relationships between entities even when they are not explicit in the document text. Experiments on the CLEF dataset support our hypothesis.
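The abstract does not say how the keyword level and the semantic levels are merged into one ranking. As a purely illustrative sketch, and not necessarily the method used in SENSE, the idea of levels that integrate rather than replace each other can be pictured as score fusion: each level scores documents independently, the scores are min-max normalized, and a weighted sum gives the final rank. The function names, the level names, the weights, and the toy scores below are all assumptions made for this example.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one level's scores to [0, 1] so that levels are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally at this level: treat them as full matches.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_levels(level_scores: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of normalized per-level scores (a CombSUM-style fusion).

    A document missing from a level simply contributes 0 for that level.
    """
    fused: Dict[str, float] = {}
    for level, scores in level_scores.items():
        w = weights.get(level, 1.0)
        for doc, s in min_max(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return fused


# Hypothetical toy scores: d1 matches keywords only, d4 matches entities only.
level_scores = {
    "keyword": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "entity": {"d2": 0.8, "d3": 0.4, "d4": 0.0},
}
weights = {"keyword": 1.0, "entity": 1.0}  # assumed equal weighting

fused = fuse_levels(level_scores, weights)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

In this toy case a document supported by both levels (d2) overtakes the best keyword-only match (d1), which is the kind of effect an additional entity level is meant to produce.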

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Annalina Caputo (1)
  • Pierpaolo Basile (1)
  • Giovanni Semeraro (1)
  1. Department of Computer Science, University of Bari, Bari, Italy
