Automatic Text Summarization Using a Machine Learning Approach

Neto, Joel Larocca; Freitas, Alex A.; Kaestner, Celso A. A.

doi:10.1007/3-540-36127-8_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2507)

Included in the following conference series:

Brazilian Symposium on Artificial Intelligence

Abstract

In this paper we address the automatic summarization task. Recent work on extractive summary generation employs heuristics, but few studies indicate how to select the relevant features. We present a summarization procedure based on trainable Machine Learning algorithms that employs a set of features extracted directly from the original text. These features are of two kinds: statistical, based on the frequency of certain elements in the text; and linguistic, extracted from a simplified argumentative structure of the text. We also present computational results obtained by applying our summarizer to some well-known text databases, and we compare these results to some baseline summarization procedures.
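To make the idea of a trainable extractive summarizer concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not name the features or the classifier, so everything below is an illustrative assumption: the three boolean features (high mean term frequency, leading position, minimum length), the naive sentence splitter, and the small hand-written Naive Bayes classifier. The linguistic features drawn from argumentative structure are not modeled at all.

```python
import math
import re
from collections import Counter, defaultdict


def split_sentences(text):
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", sentence.lower())


def sentence_features(text):
    """Return (sentences, features), one tuple of booleans per sentence."""
    sentences = split_sentences(text)
    doc_tf = Counter(w for s in sentences for w in tokenize(s))
    mean_tfs = []
    for s in sentences:
        words = tokenize(s)
        mean_tfs.append(sum(doc_tf[w] for w in words) / len(words) if words else 0.0)
    median = sorted(mean_tfs)[len(mean_tfs) // 2] if mean_tfs else 0.0
    features = []
    for i, (s, mean_tf) in enumerate(zip(sentences, mean_tfs)):
        features.append((
            mean_tf >= median,       # statistical: frequent terms
            i == 0,                  # statistical: leading position
            len(tokenize(s)) >= 5,   # statistical: not too short
        ))
    return sentences, features


class NaiveBayes:
    """Tiny Naive Bayes over boolean features, with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.totals = Counter(y)
        self.prior = {c: self.totals[c] / len(y) for c in self.classes}
        self.counts = defaultdict(Counter)  # (class, feature index) -> value counts
        for x, c in zip(X, y):
            for j, v in enumerate(x):
                self.counts[(c, j)][v] += 1
        return self

    def log_prob(self, x, c):
        lp = math.log(self.prior[c])
        for j, v in enumerate(x):
            # +1 / +2: Laplace smoothing over the two boolean values.
            lp += math.log((self.counts[(c, j)][v] + 1) / (self.totals[c] + 2))
        return lp

    def predict(self, x):
        return max(self.classes, key=lambda c: self.log_prob(x, c))


def summarize(text, model):
    # Keep, in original order, the sentences the model labels as summary-worthy.
    sentences, features = sentence_features(text)
    return [s for s, f in zip(sentences, features) if model.predict(f)]
```

In use, the model would be fitted on feature vectors from documents whose sentences are labeled as belonging to a reference summary or not, and `summarize` would then be applied to unseen text. Any real system would need a proper sentence splitter, richer features, and evaluation against baselines, as the abstract describes.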

Author information

Authors and Affiliations

Pontifical Catholic University of Parana (PUCPR), Rua Imaculada Conceicao, 1155 Curitiba - PR, 80.215-901, Brazil
Joel Larocca Neto, Alex A. Freitas & Celso A. A. Kaestner

Editor information

Editors and Affiliations

Departamento de Automação e Sistemas, Universidade Federal de Santa Catarina, 88040-900, Florianópolis, SC, Brazil
Guilherme Bittencourt
Centro de Informática, Universidade Federal de Pernambuco, Cx. Postal 7851, 50732-970, Recife, PE, Brazil
Geber L. Ramalho

About this paper

Cite this paper

Neto, J.L., Freitas, A.A., Kaestner, C.A.A. (2002). Automatic Text Summarization Using a Machine Learning Approach. In: Bittencourt, G., Ramalho, G.L. (eds) Advances in Artificial Intelligence. SBIA 2002. Lecture Notes in Computer Science, vol 2507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36127-8_20

DOI: https://doi.org/10.1007/3-540-36127-8_20
Published: 16 January 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00124-9
Online ISBN: 978-3-540-36127-5
eBook Packages: Springer Book Archive
