Selecting a Feature Set to Summarize Texts in Brazilian Portuguese

  • Daniel Saraiva Leite
  • Lucia Helena Machado Rino
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4140)


This paper presents a novel approach to combining features for training an automatic extractive summarizer of texts written in Brazilian Portuguese. The approach aims both to reduce the effort of classifying features that are representative for Automatic Summarization and to give the summarizer more information for deciding which text spans to include in an extract. We use WEKA to search for a balanced set of features. We discuss several ways of modifying the feature set and show how automatic feature selection may be useful for customizing the summarizer.
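
As a rough illustration of what automatic feature selection over sentence-level features can look like, the sketch below runs a greedy forward search with a correlation-based merit score: subsets are rewarded for correlating with the class (sentence in the extract or not) and penalized for internal redundancy. The abstract does not say which WEKA evaluator or search strategy was used, so this is an assumption, and plain Pearson correlation stands in for whatever measure the toolkit applies. The feature names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def merit(subset, columns, labels):
    """Correlation-based merit: high feature-class, low feature-feature correlation."""
    k = len(subset)
    r_cf = mean(abs(pearson(columns[f], labels)) for f in subset)
    if k == 1:
        return r_cf
    r_ff = mean(abs(pearson(columns[a], columns[b]))
                for a, b in combinations(subset, 2))
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def select_features(columns, labels):
    """Greedy forward search: add the best feature until merit stops improving."""
    selected, best = [], 0.0
    remaining = sorted(columns)
    while remaining:
        score, feat = max((merit(selected + [f], columns, labels), f)
                          for f in remaining)
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected


# Hypothetical toy data: one value per sentence; label 1 = sentence is in the extract.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "sentence_position":  [1, 1, 1, 0, 1, 0, 0, 0],
    "paragraph_position": [1, 1, 1, 0, 1, 0, 0, 0],  # duplicates sentence_position
    "proper_noun":        [1, 0, 1, 1, 0, 0, 1, 0],
    "constant":           [1, 1, 1, 1, 1, 1, 1, 1],  # carries no information
}

print(select_features(columns, labels))
```

On this toy data the search keeps two complementary features and leaves out both the duplicated feature and the constant one, which is the kind of balance between informativeness and redundancy that a feature selection step is meant to find.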


Keywords: Feature Selection, Feature Subset, Source Text, Sentence Length, Proper Noun
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Daniel Saraiva Leite (1)
  • Lucia Helena Machado Rino (1)
  1. Departamento de Computação, UFSCar, Núcleo Interinstitucional de Lingüística Computacional, São Carlos, Brazil
