Abstract
Extracting nuggets (pieces of an answer) is a key step in question answering systems, especially for definition questions. Despite advances in nugget extraction, it remains difficult to find patterns that are general and flexible enough to produce as many useful definition nuggets as possible. Currently, patterns are obtained manually or automatically and then matched against sentences. In contrast to this traditional approach, we propose a method that uses information gain and machine learning instead of pattern matching: we classify sentences according to whether they are likely to contain nuggets. We also analyze separately the nuggets that appear to the left and to the right of the target term (the term to define) within a sentence. We performed experiments with the question collections from TREC 2002, 2003 and 2004, and the F-measures obtained are comparable with those of the participating systems.
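As a rough illustration of the two ideas named in the abstract, the sketch below computes the information gain of a term with respect to binary sentence labels (nugget-bearing or not) and splits a sentence into its left and right contexts around the target term. It is not the authors' implementation: the function names, the whitespace tokenization, the single-token `TARGET` placeholder, and the toy sentences and labels are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(sentences, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs in a sentence."""
    # Assumption: sentences are tokenized by whitespace only.
    with_term = [y for s, y in zip(sentences, labels) if term in s.split()]
    without_term = [y for s, y in zip(sentences, labels) if term not in s.split()]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def split_on_target(sentence, target):
    """Return (left_tokens, right_tokens) around the first occurrence of `target`."""
    tokens = sentence.split()
    if target not in tokens:
        return None
    i = tokens.index(target)
    return tokens[:i], tokens[i + 1:]


# Hypothetical toy data: 1 = sentence likely contains a nugget, 0 = it does not.
sentences = [
    "TARGET is a small moon",
    "TARGET known as the red planet",
    "they visited TARGET yesterday",
    "reports about TARGET were mixed",
]
labels = [1, 1, 0, 0]

if __name__ == "__main__":
    for term in ("is", "TARGET"):
        print(term, round(information_gain(sentences, labels, term), 4))
    print(split_on_target(sentences[2], "TARGET"))
```

In a setup like this, terms with high information gain on each side of the target could serve as features for a sentence classifier; which features and learners the paper actually uses is not stated in the abstract.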
Copyright information
© 2008 International Federation for Information Processing
About this paper
Cite this paper
Martínez-Gil, C., López-López, A. (2008). Answer Extraction for Definition Questions using Information Gain and Machine Learning. In: Bramer, M. (eds) Artificial Intelligence in Theory and Practice II. IFIP AI 2008. IFIP – The International Federation for Information Processing, vol 276. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09695-7_14
DOI: https://doi.org/10.1007/978-0-387-09695-7_14
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09694-0
Online ISBN: 978-0-387-09695-7
eBook Packages: Computer Science, Computer Science (R0)