Text Summarization by Sentence Extraction Using Unsupervised Learning

  • René Arnulfo García-Hernández
  • Romyna Montiel
  • Yulia Ledeneva
  • Eréndira Rendón
  • Alexander Gelbukh
  • Rafael Cruz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5317)

Abstract

The main problem in generating an extractive automatic text summary is detecting the most relevant information in the source document. Although some approaches claim to be domain- and language-independent, they rely on knowledge that is highly domain- or language-dependent, such as key phrases or gold-standard samples for machine-learning approaches. In this work, we propose a language- and domain-independent automatic text summarization approach by sentence extraction using an unsupervised learning algorithm. Our hypothesis is that an unsupervised algorithm can help cluster similar ideas (sentences). Then, to compose the summary, the most representative sentence is selected from each cluster. Several experiments on the standard DUC-2002 collection show that the proposed method obtains more favorable results than other approaches.
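
The cluster-then-select idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm, sentence representation, or similarity measure is used, so everything below is an assumption chosen for illustration: a k-means-style loop, bag-of-words term-frequency vectors, cosine similarity, and seeding with the first k sentences. The function names (vectorize, cosine, centroid, summarize) are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def vectorize(sentence):
    """Bag-of-words term-frequency vector (an assumed representation)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def summarize(sentences, k, iterations=10):
    """Cluster sentences into k groups and return one sentence per group."""
    vectors = [vectorize(s) for s in sentences]
    # Assumption: deterministic seeding with the first k sentences.
    centroids = vectors[:k]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every sentence to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    # The most representative sentence is the one closest to its centroid.
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(
                max(members, key=lambda i: cosine(vectors[i], centroids[c]))
            )
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling summarize(sentences, k) on a list of sentence strings returns at most k sentences in document order. The sketch uses no language-specific resources (no stop-word list, stemmer, or training data), which is in the spirit of the language- and domain-independence claimed in the abstract, but it should not be read as the authors' implementation.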

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • René Arnulfo García-Hernández (1)
  • Romyna Montiel (1)
  • Yulia Ledeneva (1)
  • Eréndira Rendón (1)
  • Alexander Gelbukh (1)
  • Rafael Cruz (1)
  1. Pattern Recognition Laboratory, Toluca Institute of Technology, Mexico; Autonomous University of the State of Mexico, Mexico; Center for Computing Research, National Polytechnic Institute, Mexico; SoNet RC, University of Center Europe in Skalica, Slovakia
