A Case Study in Natural Language Based Web Search
Abstract
Is there an audience for natural language based search? This study, based on our experience with a Web portal, addresses criticisms that natural language approaches to search lack scalability and usability. Our solution is based on InFact®, a natural language search engine that combines the speed of keyword search with the power of natural language processing. InFact performs clause-level indexing and offers a spectrum of functionality that ranges from Boolean keyword operators to real-time linguistic pattern matching, including recognition of syntactic roles, such as subject and object, and semantic categories, such as people and places. A user of our search can navigate and retrieve information based on an understanding of actions, roles and relationships. In developing InFact, we ported the functionality of a deep text analysis platform to a modern search engine architecture. Our distributed indexing and search services are designed to scale to large document collections and large numbers of users. We tested the operational viability of InFact as a search platform by powering a live search on the Web. Site statistics and user logs demonstrate that a statistically significant segment of the user population is relying on natural language search functionality. Going forward, we will focus on promoting this functionality to an even greater percentage of users through a series of creative interfaces.
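To make the idea of clause-level indexing concrete, here is a minimal sketch in Python. It is not InFact's implementation: the `Clause` and `ClauseIndex` names, the hand-written subject/action/object triples, and the sample sentences are all assumptions made for illustration, and a real system would obtain the triples from a parser. The sketch only shows how postings keyed by syntactic role allow a query to combine role constraints with plain keyword matching.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Clause:
    """One parsed clause; the triple is assumed to come from an upstream parser."""
    doc_id: str
    subject: str
    action: str
    obj: str


class ClauseIndex:
    """Toy inverted index whose postings point at clauses, not whole documents."""

    def __init__(self) -> None:
        self.clauses: List[Clause] = []
        # (role, term) -> ids of the clauses in which the term fills that role
        self.postings: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add(self, clause: Clause) -> None:
        clause_id = len(self.clauses)
        self.clauses.append(clause)
        fields = (
            ("subject", clause.subject),
            ("action", clause.action),
            ("object", clause.obj),
        )
        for role, text in fields:
            for term in text.lower().split():
                self.postings[(role, term)].add(clause_id)
                # The "any" role supports ordinary keyword search.
                self.postings[("any", term)].add(clause_id)

    def search(
        self,
        subject: Optional[str] = None,
        action: Optional[str] = None,
        obj: Optional[str] = None,
        keyword: Optional[str] = None,
    ) -> List[Clause]:
        """Return clauses that satisfy every supplied constraint (Boolean AND)."""
        constraints = (
            ("subject", subject),
            ("action", action),
            ("object", obj),
            ("any", keyword),
        )
        matched: Optional[Set[int]] = None
        for role, value in constraints:
            if value is None:
                continue
            for term in value.lower().split():
                ids = self.postings.get((role, term), set())
                matched = set(ids) if matched is None else matched & ids
        if matched is None:
            return []
        return [self.clauses[i] for i in sorted(matched)]


# Hypothetical sample data, written by hand for this sketch.
index = ClauseIndex()
index.add(Clause("doc1", "the committee", "approved", "the budget"))
index.add(Clause("doc2", "the budget", "surprised", "the committee"))

# Role-aware query: only clauses where "committee" is the subject.
print(index.search(subject="committee"))
# Keyword query: any clause mentioning "committee", regardless of role.
print(index.search(keyword="committee"))
```

The role-aware query distinguishes who acted from who was acted upon, which a document-level keyword index cannot do; the keyword query falls back to ordinary term matching over the same postings.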
Keywords
Noun Phrase, Keyword Search, Parse Tree, Inverted Index, Computational Linguistics