Automatic Text Summarization Using Unsupervised and Semi-supervised Learning

  • Massih-Reza Amini
  • Patrick Gallinari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2168)


This paper investigates a new approach for unsupervised and semi-supervised learning. We show that this method is an instance of the Classification EM algorithm in the case of Gaussian densities. Its originality is that it relies on a discriminant approach, whereas classical methods for unsupervised and semi-supervised learning rely on density estimation. This idea is used to improve a generic document summarization system, which is evaluated on the Reuters news-wire corpus and compared to other strategies.



Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Massih-Reza Amini (1)
  • Patrick Gallinari (1)
  1. LIP6, University of Paris 6, Paris cedex 05, France
