Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9630)

Abstract

In document search, documents are typically treated as a flat list of keywords. To address syntactic interoperability, i.e., the use of different keywords to refer to the same real-world entity, entity linkage has been used to replace keywords in the text with the unique identifier of the entity to which they refer. Yet a flat list of entities fails to capture the actual relationships among the entities, information that matters for more effective document search. In this work we propose to go one step beyond entity linkage in text and model each document as a set of structures that describe the relationships among the entities mentioned in the text. We show that this kind of representation significantly improves the effectiveness of document search. We describe the implementation of this idea and present an extensive set of experimental results that support this claim.
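
The contrast between a flat list of entities and a set of relationship structures can be illustrated with a minimal sketch. It is not the chapter's implementation: the `Statement` triple, the `ALIASES` table standing in for an entity-linkage step, the identifiers `E1` to `E3`, and the Jaccard scoring are all assumptions made for illustration only.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Statement(NamedTuple):
    """Hypothetical relationship structure: two entity identifiers joined by a verb."""
    subject: str
    verb: str
    obj: str


# Hypothetical alias table standing in for a real entity-linkage step.
ALIASES = {
    "obama": "E1",
    "barack obama": "E1",
    "white house": "E2",
    "congress": "E3",
}


def link(mention: str) -> str:
    """Map a surface mention to an entity identifier (falls back to the lowercased mention)."""
    key = mention.lower()
    return ALIASES.get(key, key)


def to_statements(triples: Iterable[Tuple[str, str, str]]) -> Set[Statement]:
    """Turn (subject, verb, object) mentions into a set of linked statements."""
    return {Statement(link(s), v.lower(), link(o)) for s, v, o in triples}


def flat_entities(statements: Set[Statement]) -> Set[str]:
    """Collapse statements into the flat set of entity identifiers they mention."""
    return {e for st in statements for e in (st.subject, st.obj)}


def jaccard(a: set, b: set) -> float:
    """Set overlap used here as a stand-in similarity score (an assumption)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two documents mention the same entities but state opposite relationships.
doc_a = to_statements([("Barack Obama", "visited", "Congress")])
doc_b = to_statements([("Congress", "visited", "Obama")])
query = to_statements([("Obama", "visited", "Congress")])

flat_scores = (
    jaccard(flat_entities(doc_a), flat_entities(query)),
    jaccard(flat_entities(doc_b), flat_entities(query)),
)
structured_scores = (jaccard(doc_a, query), jaccard(doc_b, query))
```

Under these assumptions the flat entity view scores both documents identically against the query, while the statement view matches only `doc_a`, which is the kind of distinction the relationship-based representation is meant to preserve.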

Notes

  1. The term “noun phrase” is used in its linguistic sense.

  2. Note that a “raw document” is the document as provided by the user, while a “document” is a set of statements containing identifiers and verbs.

Author information

Corresponding author

Correspondence to Yannis Velegrakis.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sartori, E., Velegrakis, Y., Guerra, F. (2016). Entity-Based Keyword Search in Web Documents. In: Nguyen, N.T., Kowalczyk, R., Rupino da Cunha, P. (eds) Transactions on Computational Collective Intelligence XXI. Lecture Notes in Computer Science, vol 9630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49521-6_2

  • DOI: https://doi.org/10.1007/978-3-662-49521-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49520-9

  • Online ISBN: 978-3-662-49521-6

  • eBook Packages: Computer Science, Computer Science (R0)
