Semantic Search over Documents and Ontologies

  • Kalina Bontcheva
  • Valentin Tablan
  • Hamish Cunningham

Abstract

Semantic search over documents is about finding information based not only on the presence of words, but also on their meaning [1, 2]. The task is a modification of classical Information Retrieval (IR) in which documents are retrieved on the basis of their relevance to ontology concepts as well as to words. The basic assumption nevertheless remains quite similar: a document is characterized by the bag of tokens that constitute its content, disregarding its structure. While the basic IR approach treats word stems as tokens, there has been considerable effort towards using word senses or lexical concepts (see [3, 4]) for indexing and retrieval. In semantic search, what is indexed is typically a combination of words, ontological concepts conveying the meaning of some of these words (e.g. Cambridge is a location), and optionally relations between such concepts (e.g. Cambridge is in the UK) [1]. These relations enable somebody searching for documents about the UK to also find documents that mention Cambridge.
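The indexing idea described above can be illustrated with a minimal sketch. It is not the authors' system: the toy ontology, the token prefixes (`word:`, `class:`, `loc:`), and the function names are assumptions made only for illustration. Each document is indexed under its words, under the ontology class of any word the ontology knows, and under every location that contains a mentioned location.

```python
from collections import defaultdict

# Hypothetical toy ontology: mention -> class and the location that contains it.
ONTOLOGY = {
    "cambridge": {"class": "Location", "part_of": "uk"},
    "uk": {"class": "Location", "part_of": None},
}

def semantic_tokens(word):
    """Return concept tokens for a word: its class plus itself and all containing locations."""
    entry = ONTOLOGY.get(word)
    if entry is None:
        return []
    tokens = ["class:" + entry["class"]]
    current = word
    while current is not None:
        tokens.append("loc:" + current)
        current = ONTOLOGY[current]["part_of"]  # follow the containment relation upwards
    return tokens

def build_index(docs):
    """Build an inverted index over both plain words and concept tokens."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:!?")
            if not word:
                continue
            index["word:" + word].add(doc_id)
            for token in semantic_tokens(word):
                index[token].add(doc_id)
    return index

DOCS = {
    "d1": "The conference takes place in Cambridge.",
    "d2": "UK universities publish open data.",
    "d3": "Stemming reduces words to tokens.",
}
INDEX = build_index(DOCS)

def search(token):
    """Return the sorted ids of documents indexed under the given token."""
    return sorted(INDEX.get(token, set()))
```

In this sketch a plain keyword lookup such as `search("word:uk")` matches only the document that contains the word, whereas `search("loc:uk")` also returns the document mentioning Cambridge, because the containment relation was expanded at indexing time.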

References

  1. Kiryakov, A., Popov, B., Ognyanoff, D., Manov, D., Kirilov, A., Goranov, M.: Semantic annotation, indexing and retrieval. Journal of Web Semantics, ISWC 2003 Special Issue 1(2), 671–680 (2004)
  2. Cunningham, H., Tablan, V., Roberts, I., Greenwood, M.A., Aswani, N.: Information Extraction and Semantic Annotation for Multi-Paradigm Information Management. In: Lupu, M., Mayer, K., Tait, J., Trippe, A.J. (eds.) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol. 29, pp. 307–327. Springer, Heidelberg (2011)
  3. Mahesh, K., Kud, J., Dixon, P.: Oracle at TREC8: A Lexical Approach. In: Proceedings of the Eighth Text Retrieval Conference, TREC-8 (1999)
  4. Voorhees, E.: Using WordNet for Text Retrieval. In: Fellbaum, C. (ed.) WordNet: An Electronic Lexical Database. MIT Press (1998)
  5. Gruber, T.R.: A Translation Approach to Portable Ontologies. Knowledge Acquisition 5(2), 199–220 (1993)
  6. Singhal, A.: Introducing the knowledge graph: things, not strings (May 2012)
  7. Pound, J., Mika, P., Zaragoza, H.: Ad-hoc object retrieval in the web of data. In: Proceedings of the 19th International Conference on World Wide Web, pp. 771–780. ACM (2010)
  8. Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 147–155. Association for Computational Linguistics (2009)
  9. Bontcheva, K., Cunningham, H.: Semantic annotations and retrieval: Manual, semiautomatic, and automatic generation. In: Domingue, J., Fensel, D., Hendler, J. (eds.) Handbook of Semantic Web Technologies, pp. 77–116. Springer, Heidelberg (2011)
  10. Milne, D., Witten, I.H.: Learning to link with Wikipedia. In: Proc. of the 17th Conf. on Information and Knowledge Management (CIKM), pp. 509–518 (2008)
  11. Rao, D., McNamee, P., Dredze, M.: Entity linking: Finding extracted entities in a knowledge base. In: Multi-source, Multi-lingual Information Extraction and Summarization. Springer (2013)
  12. Ji, H., Grishman, R.: Knowledge base population: Successful approaches and challenges. In: Proc. of ACL 2011, pp. 1148–1158 (2011)
  13. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: Shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, pp. 1–8 (2011)
  14. Shen, W., Wang, J., Luo, P., Wang, M.: LINDEN: Linking named entities with knowledge base via semantic knowledge. In: Proceedings of the 21st Conference on World Wide Web, pp. 449–458 (2012)
  15. Gruhl, D., Nagarajan, M., Pieper, J., Robson, C., Sheth, A.: Context and Domain Knowledge Enhanced Entity Spotting in Informal Text. In: Bernstein, A., Karger, D.R., Heath, T., Feigenbaum, L., Maynard, D., Motta, E., Thirunarayan, K. (eds.) ISWC 2009. LNCS, vol. 5823, pp. 260–276. Springer, Heidelberg (2009)
  16. Kiryakov, A., Ognyanoff, D., Velkov, R., Tashev, Z., Peikov, I.: Ldsr: Materialized reason-able view to the web of linked data. In: OWL: Experiences and Directions workshop (OWLED 2009) (2009)
  17. Klyne, G., Carroll, J.: Resource description framework (RDF): Concepts and abstract syntax. W3C recommendation, W3C (2004), http://www.w3.org/TR/rdf-concepts/
  18. Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D.L., Patel-Schneider, P.F., Stein, L.A.: OWL web ontology language reference. W3C recommendation, W3C (February 2004), http://www.w3.org/
  19. Prud’hommeaux, E., Seaborne, A.: SPARQL Query Language for RDF. W3C recommendation, 15 January 2008, W3C (2008), http://www.w3.org/TR/rdf-sparql-query/
  20. Bast, H., Bäurle, F., Buchhold, B., Haussmann, E.: A case for semantic full-text search. In: Proceedings of the 1st Joint International Workshop on Entity-Oriented and Semantic Search, JIWES 2012, pp. 4:1–4:3. ACM (2012)
  21. Bast, H., Bäurle, F., Buchhold, B., Haussmann, E.: Broccoli: Semantic full-text search at your fingertips. CoRR abs/1207.2615 (2012)
  22. Kieniewicz, J., Sudlow, A., Newbold, E.: Coordinating improved environmental information access and discovery: Innovations in sharing environmental observations and information. In: Pillman, W., Schade, S., Smits, P. (eds.) Proceedings of the 25th International EnviroInfo Conference (2011)
  23. Kieniewicz, J., Wallis, M.: User requirements. Technical Report, EnviLOD project deliverable (2012), http://gate.ac.uk/projects/envilod/EnviLOD-WP2-User-Requirements.pdf
  24. Lupu, M., Hanbury, A.: Patent retrieval. Foundations and Trends in Information Retrieval 7(1), 1–97 (2013)
  25. Haas, K., Mika, P., Tarjan, P., Blanco, R.: Enhanced results for web search. In: Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 725–734 (2011)
  26. Cunningham, H., Tablan, V., Roberts, A., Bontcheva, K.: Getting more out of biomedical documents with gate’s full lifecycle open source text analytics. PLoS Computational Biology 9(2), e1002854 (2013)
  27. Li, Y., Bontcheva, K., Cunningham, H.: Hierarchical, Perceptron-like Learning for Ontology Based Information Extraction. In: 16th International World Wide Web Conference (WWW 2007), pp. 777–786 (May 2007)
  28. McDowell, L.K., Cafarella, M.: Ontology-Driven Information Extraction with OntoSyphon. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 428–444. Springer, Heidelberg (2006)
  29. Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V.: Gate: an architecture for development of robust hlt applications. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL 2002, July 7-12, pp. 168–175. Association for Computational Linguistics, Stroudsburg (2002)
  30. Bontcheva, K., Cunningham, H.: Semantic annotation and retrieval: Manual, semi-automatic and automatic generation. In: Domingue, J., Fensel, D., Hendler, J.A. (eds.) Handbook of Semantic Web Technologies. Springer (2011)
  31. Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R.S., Peng, Y., Reddivari, P., Doshi, V.C., Sachs, J.: Swoogle: A Search and Metadata Engine for the Semantic Web. In: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management (2004)
  32. Hildebrand, M., van Ossenbruggen, J., Hardman, L.: /facet: A Browser for Heterogeneous Semantic Web Repositories. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 272–285. Springer, Heidelberg (2006)
  33. Zhang, L., Liu, Q., Zhang, J., Wang, H., Pan, Y., Yu, Y.: Semplore: An IR approach to scalable hybrid query of semantic web data. In: Aberer, K., et al. (eds.) ISWC/ASWC 2007. LNCS, vol. 4825, pp. 652–665. Springer, Heidelberg (2007)
  34. Fernández, M., Cantador, I., López, V., Vallet, D., Castells, P., Motta, E.: Semantically enhanced information retrieval: An ontology-based approach. Web Semantics 9(4), 434–452 (2011)
  35. Wang, H., Tran, T., Liu, C., Fu, L.: Lightweight integration of ir & db for scalable hybrid search with integrated ranking support. Web Semantics: Science, Services and Agents on the World Wide Web 9(4) (2011)
  36. Fazzinga, B., Gianforme, G., Gottlob, G., Lukasiewicz, T.: Semantic web search based on ontological conjunctive queries. Web Semantics: Science, Services and Agents on the World Wide Web 9(4) (2011)
  37. Bikakis, N., Giannopoulos, G., Dalamagas, T., Sellis, T.: Integrating keywords and semantics on document annotation and search. In: Meersman, R., Dillon, T., Herrero, P. (eds.) OTM 2010. LNCS, vol. 6427, pp. 921–938. Springer, Heidelberg (2010)
  38. Popov, B., Kiryakov, A., Kirilov, A., Manov, D., Ognyanoff, D., Goranov, M.: KIM – Semantic Annotation Platform. In: Fensel, D., Sycara, K., Mylopoulos, J. (eds.) ISWC 2003. LNCS, vol. 2870, pp. 834–849. Springer, Heidelberg (2003)
  39. Kiryakov, A.: OWLIM: balancing between scalable repository and light-weight reasoner. In: Proceedings of the 15th International World Wide Web Conference (WWW 2006), Edinburgh, Scotland, May 23-26 (2006)
  40. Boldi, P., Vigna, S.: MG4J at TREC 2005. In: Voorhees, E.M., Buckland, L.P. (eds.) Proceedings of the Fourteenth Text REtrieval Conference (TREC 2005), November 15-18. Special Publications, NIST, vol. 500, pp. 266–271 (2005), http://mg4j.dsi.unimi.it/
  41. Maynard, D., Greenwood, M.A.: Large Scale Semantic Annotation, Indexing and Search at The National Archives. In: Proceedings of LREC 2012, Turkey (2012)
  42. Tablan, V., Roberts, I., Cunningham, H., Bontcheva, K.: Gatecloud.net: a platform for large-scale, open-source text processing on the cloud. Philosophical Transactions of the Royal Society A 371(1983) (2013)
  43. Damljanovic, D., Agatonovic, M., Cunningham, H.: Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-Based Lookup through the User Interaction. In: Aroyo, L., Antoniou, G., Hyvönen, E., ten Teije, A., Stuckenschmidt, H., Cabral, L., Tudorache, T. (eds.) ESWC 2010, Part I. LNCS, vol. 6088, pp. 106–120. Springer, Heidelberg (2010)
  44. Lopez, V., Uren, V., Motta, E., Pasin, M.: AquaLog: An Ontology-driven Question Answering System for Organizational Semantic Intranets. Web Semantics: Science, Services and Agents on the World Wide Web 5(2), 72–105 (2007)
  45. Kaufmann, E., Bernstein, A.: How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users? In: Aberer, K., et al. (eds.) ISWC/ASWC 2007. LNCS, vol. 4825, pp. 281–294. Springer, Heidelberg (2007)
  46. Funk, A., Tablan, V., Bontcheva, K., Cunningham, H., Davis, B., Handschuh, S.: CLOnE: Controlled Language for Ontology Editing. In: Aberer, K., et al. (eds.) ISWC/ASWC 2007. LNCS, vol. 4825, pp. 142–155. Springer, Heidelberg (2007)
  47. Bernstein, A., Kaufmann, E.: GINO - A Guided Input Natural Language Ontology Editor. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 144–157. Springer, Heidelberg (2006)
  48. Damljanovic, D., Bontcheva, K.: Enhanced Semantic Access to Software Artefacts. In: Workshop on Semantic Web Enabled Software Engineering (SWESE), Karlsruhe, Germany (October 2008)
  49. Lei, Y., Uren, V.S., Motta, E.: SemSearch: A Search Engine for the Semantic Web. In: Staab, S., Svátek, V. (eds.) EKAW 2006. LNCS (LNAI), vol. 4248, pp. 238–245. Springer, Heidelberg (2006)
  50. Blanco, R., Halpin, H., Herzig, D.M., Mika, P., Pound, J., Thompson, H.S., Tran, T.: Repeatable and reliable semantic search evaluation. Web Semantics: Science, Services and Agents on the World Wide Web (in press)
  51. Halpin, H., Lavrenko, V.: Relevance feedback between hypertext and semantic web search: Frameworks and evaluation. Web Semantics: Science, Services and Agents on the World Wide Web 9(4) (2011)
  52. Bontcheva, K., Kieniewicz, J., Aswani, N., Wallis, M., Andrews, S.: User feedback report on the envilod semantic search interface. Technical Report, EnviLOD project deliverable (2012), http://gate.ac.uk/projects/envilod/EnviLOD-user-feedback-report.pdf

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Kalina Bontcheva (1)
  • Valentin Tablan (1)
  • Hamish Cunningham (1)
  1. University of Sheffield, UK
