A Case Study in Natural Language Based Web Search

  • Giovanni Marchisio
  • Navdeep Dhillon
  • Jisheng Liang
  • Carsten Tusk
  • Krzysztof Koperski
  • Thien Nguyen
  • Dan White
  • Lubos Pochman

Abstract

Is there a public for natural language based search? This study, based on our experience with a Web portal, addresses criticisms of the scalability and usability of natural language approaches to search. Our solution is built on InFact®, a natural language search engine that combines the speed of keyword search with the power of natural language processing. InFact performs clause-level indexing and offers a full spectrum of functionality, ranging from Boolean keyword operators to real-time linguistic pattern matching, including recognition of syntactic roles (such as subject and object) and semantic categories (such as people and places). A user of our search engine can navigate and retrieve information based on an understanding of actions, roles, and relationships. In developing InFact, we ported the functionality of a deep text analysis platform to a modern search engine architecture. Our distributed indexing and search services are designed to scale to large document collections and large numbers of users. We tested the operational viability of InFact as a search platform by powering a live search on the Web. Site statistics and user logs demonstrate that a statistically significant segment of the user population relies on natural language search functionality. Going forward, we will focus on promoting this functionality to an even greater percentage of users through a series of creative interfaces.
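The clause-level indexing described above can be illustrated with a minimal sketch: clauses are stored as subject–verb–object triples in an inverted index keyed by (role, term), so a query can constrain the syntactic role of each term. All names below are hypothetical illustrations, not the InFact API.

```python
# Minimal sketch of clause-level (subject-action-object) indexing,
# in the spirit of the approach described in the abstract.
# Class and method names are illustrative assumptions, not InFact's.
from collections import defaultdict


class ClauseIndex:
    """Inverted index keyed by (role, term), so a query can require
    that a term appear in a specific syntactic role (subject vs. object)."""

    def __init__(self):
        # (role, term) -> set of document ids containing a matching clause
        self.postings = defaultdict(set)

    def add_clause(self, doc_id, subject, verb, obj):
        # Index each clause constituent under its syntactic role.
        for role, term in (("subj", subject), ("verb", verb), ("obj", obj)):
            self.postings[(role, term.lower())].add(doc_id)

    def search(self, **roles):
        """e.g. search(subj="microsoft", verb="acquire") -> set of doc ids."""
        results = None
        for role, term in roles.items():
            docs = self.postings.get((role, term.lower()), set())
            results = docs if results is None else results & docs
        return results or set()


idx = ClauseIndex()
idx.add_clause("d1", "Microsoft", "acquire", "company")
idx.add_clause("d2", "company", "acquire", "Microsoft")

# Role-aware retrieval distinguishes agent from patient,
# which plain keyword search cannot do:
print(idx.search(subj="microsoft", verb="acquire"))
print(idx.search(obj="microsoft"))
```

Unlike a bag-of-words index, the two queries above return different documents for the same keywords, which is the key retrieval capability a clause-level index adds.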

Keywords

Noun Phrase · Keyword Search · Parse Tree · Inverted Index · Computational Linguistics
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  • Giovanni Marchisio 1
  • Navdeep Dhillon 1
  • Jisheng Liang 1
  • Carsten Tusk 1
  • Krzysztof Koperski 1
  • Thien Nguyen 1
  • Dan White 1
  • Lubos Pochman 1

  1. Insightful Corporation, Seattle, USA
