The REG Summarization System with Question Reformulation at QA@INEX Track 2010

  • Jorge Vivaldi
  • Iria da Cunha
  • Javier Ramírez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6932)

Abstract

In this paper we present REG, a graph-based approach to a fundamental problem of Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph in order to obtain weighted sentences. We applied this approach to the QA@INEX 2010 task (question answering). To do so, we extracted the terms and named entities from the queries, in order to obtain a list of terms and named entities related to the main topic of each question. Using this strategy, REG obtained good results in both performance (measured with the automatic evaluation system FRESA) and readability (measured by human evaluation), ranking among the seven best systems in the task.
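
The idea of modelling a document as a graph to weight its sentences can be illustrated with a minimal Python sketch. This is not the REG algorithm itself, whose details are not given in this abstract. It assumes that nodes are sentences, that an edge weight is the number of words two sentences share, and that shared words appearing among the query terms count double. The function names (split_sentences, tokenize, sentence_weights, summarize) and the doubling rule are hypothetical choices made only for illustration.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption of this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return the set of lowercased words; no stemming or stop-word filtering."""
    return set(re.findall(r"\w+", sentence.lower()))


def sentence_weights(sentences, query_terms=()):
    """Weight each sentence by the summed weight of its graph edges.

    Nodes are sentences. The edge between two sentences weighs the number
    of words they share; shared words that are also query terms count
    double (a hypothetical way to favour the topic of the question).
    """
    query = {t.lower() for t in query_terms}
    bags = [tokenize(s) for s in sentences]
    weights = [0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        shared = bags[i] & bags[j]
        edge = len(shared) + len(shared & query)
        weights[i] += edge
        weights[j] += edge
    return weights


def summarize(text, k=2, query_terms=()):
    """Return the k heaviest sentences, kept in document order."""
    sentences = split_sentences(text)
    weights = sentence_weights(sentences, query_terms)
    top = sorted(range(len(sentences)), key=lambda i: (-weights[i], i))[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling summarize(text, k=2, query_terms=[...]) with the terms and named entities extracted from a question would bias the extract toward sentences connected through those terms. A real system would likely replace the naive splitter and tokenizer with proper linguistic preprocessing.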

Keywords

INEX, Automatic Summarization System, Question-Answering System, REG

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jorge Vivaldi (1)
  • Iria da Cunha (1)
  • Javier Ramírez (2)
  1. Instituto Universitario de Lingüística Aplicada - UPF, Barcelona, Spain
  2. Universidad Autónoma Metropolitana-Azcapotzalco, Mexico
