Integer Linear Programming for Dutch Sentence Compression

  • Jan De Belder
  • Marie-Francine Moens
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6008)

Abstract

Sentence compression is a valuable task in the framework of text summarization. In this paper we compress sentences from Dutch-language news articles, taken from Dutch and Flemish newspapers, using an integer linear programming approach. We rely on the Alpino parser for Dutch and on the Latent Words Language Model. We demonstrate that the integer linear programming approach yields good results for compressing Dutch sentences, despite the considerable freedom in Dutch word order.
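
To make the idea of compression as a 0/1 optimization problem concrete, here is a minimal sketch. It is not the authors' formulation, which this abstract does not spell out. It assumes deletion-based compression with one binary keep/drop variable per word, a length limit, and a constraint that a kept word's syntactic head is kept as well. The sentence, the integer importance scores, and the head indices are hypothetical toy values, not output from Alpino or the Latent Words Language Model, and exhaustive enumeration stands in for a real ILP solver, which is only feasible for very short sentences.

```python
from itertools import product


def compress(words, scores, heads, max_len):
    """Exhaustively solve a tiny 0/1 program for deletion-based compression.

    keep[i] = 1 means word i stays in the compressed sentence.
    Objective: maximize the summed score of the kept words.
    """
    best_value, best_keep = -1, None
    for keep in product((0, 1), repeat=len(words)):
        # Length constraint: keep at most max_len words.
        if sum(keep) > max_len:
            continue
        # Dependency constraint: a kept word needs its head kept too
        # (head index -1 marks the root, which has no head).
        if any(k and h >= 0 and not keep[h] for k, h in zip(keep, heads)):
            continue
        value = sum(s for s, k in zip(scores, keep) if k)
        if value > best_value:
            best_value, best_keep = value, keep
    # Original word order is preserved: words are only deleted, never moved.
    return [w for w, k in zip(words, best_keep) if k]


# Hypothetical toy input: "The minister presented a new plan yesterday."
words = ["De", "minister", "heeft", "gisteren", "een", "nieuw", "plan", "voorgesteld"]
scores = [5, 20, 10, 3, 5, 4, 20, 15]   # made-up importance scores
heads = [1, 2, -1, 7, 6, 6, 7, 2]       # made-up dependency heads

result = compress(words, scores, heads, max_len=6)
print(" ".join(result))
```

With these toy values the two lowest-scoring modifiers are dropped, because removing them violates no head constraint. A practical system would hand the same variables and constraints to an ILP solver and derive the scores and heads from a parser and a language model.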

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jan De Belder (1)
  • Marie-Francine Moens (1)
  1. Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
