QA@INEX Track 2011: Question Expansion and Reformulation Using the REG Summarization System

  • Jorge Vivaldi
  • Iria da Cunha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7424)

Abstract

This paper presents our strategy and results for the QA@INEX 2011 question-answering task. In this task, the Indri search engine returns a set of 50 documents for each query. The initial queries are titles associated with tweets. We reformulate these queries using terminological and named-entity information, in two steps: (a) both titles and tweets are POS-tagged, and (b) queries are expanded or reformulated using terms and named entities included in the title, related terms and named entities found in the tweet, and Wikipedia redirects of the terms and named entities included in the title. The automatic summarization system REG then summarizes the 50 documents retrieved with these queries. Its algorithm models a document as a graph to obtain weighted sentences. A single document is generated and taken as the answer to the query. This strategy, combining summarization and question reformulation, obtains good results in terms of informativeness and readability.
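As an illustration only, the two ideas in the abstract (query expansion from three sources, and graph-based sentence weighting) might be sketched as follows. This is not the authors' implementation: the function names, the `related` predicate, the `redirects` mapping, and the word-overlap edge weight are all assumptions made for the example, and REG's actual weighting scheme is not described here.

```python
from collections import defaultdict
from itertools import combinations


def expand_query(title_terms, tweet_terms, related, redirects):
    """Build an expanded query as an ordered, de-duplicated list of terms.

    title_terms: terms / named entities found in the title
    tweet_terms: terms / named entities found in the tweet
    related:     hypothetical predicate related(a, b) -> bool
    redirects:   hypothetical mapping from a title term to redirect names
    """
    expanded = list(title_terms)
    # Tweet terms are kept only if related to at least one title term.
    for term in tweet_terms:
        if any(related(term, title) for title in title_terms):
            expanded.append(term)
    # Add redirect names for the title terms (assumed to be given).
    for title in title_terms:
        expanded.extend(redirects.get(title, []))
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(expanded))


def weight_sentences(sentences):
    """Weight each sentence by the words it shares with other sentences.

    Sentences are graph nodes; the edge weight between two sentences is
    the number of shared lowercase tokens (an assumption, not REG's
    measure). A sentence's weight is the sum of its edge weights.
    """
    bags = [set(s.lower().split()) for s in sentences]
    weights = defaultdict(int)
    for i, j in combinations(range(len(bags)), 2):
        shared = len(bags[i] & bags[j])
        weights[i] += shared
        weights[j] += shared
    return [weights[i] for i in range(len(sentences))]
```

In such a sketch, the highest-weighted sentences would be selected to form the single answer document; the selection strategy and any length limit are left open here.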

Keywords

INEX, Question-Answering, Terms, Named Entities, Wikipedia, Automatic Summarization, REG

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jorge Vivaldi (1)
  • Iria da Cunha (1)
  1. Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra, Barcelona, Spain
