Advertisement

Knowledge and Information Systems

, Volume 38, Issue 3, pp 691–716 | Cite as

Comparative news summarization using concept-based optimization

  • Xiaojiang Huang
  • Xiaojun WanEmail author
  • Jianguo Xiao
Regular Paper

Abstract

Comparative news summarization aims to highlight the commonalities and differences between two comparable news topics by using human-readable sentences. The summary ought to focus on the salient comparative aspects of both topics, and at the same time, it should describe the representative properties of each topic appropriately. In this study, we propose a novel approach for generating comparative news summaries. We consider cross-topic pairs of semantic-related concepts as evidences of comparativeness and consider topic-related concepts as evidences of representativeness. The score of a summary is estimated by summing up the weights of evidences in the summary. We formalize the summarization task as an optimization problem of selecting proper sentences to maximize this score and address the problem by using a mixed integer programming model. The experimental results demonstrate the effectiveness of our proposed model.

Keywords

Comparative news summarization Comparative text mining  Multi-document summarization Mixed integer programming 

Notes

Acknowledgments

This work is supported by NSFC (61170166), Beijing Nova Program (2008B03), and National High Technology Research and Development Program of China (2012AA011101). We thank IBM for making the CPLEX optimizer freely access for academic use.

References

  1. 1.
    Allan J (ed) (2002) Topic detection and tracking: event-based information organization. Kluwer Academic Publishers, DordrechtGoogle Scholar
  2. 2.
    Anttila R (1989) Historical and comparative linguistics (current issues in linguistic theory). John Benjamins Pub Co., AmsterdamGoogle Scholar
  3. 3.
    Balahur A, Steinberger R, Kabadjov M et al. (2010) Sentiment analysis in the news. In: Proceedings of the seventh international conference on language resources and evaluationGoogle Scholar
  4. 4.
    Bao S, Li R, Yu Y et al. (2008) Competitor mining with the web. IEEE Trans Knowl Data Eng 20(10): 1297–1310Google Scholar
  5. 5.
    Barzilay R, McKeown KR (2005) Sentence fusion for multidocument news summarization. Comput Linguist 31(3):297–328CrossRefGoogle Scholar
  6. 6.
    Black CE (1966) Dynamics of modernization. A study in comparative history. Harper & Row, New YorkGoogle Scholar
  7. 7.
    Carbonell J and Goldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, USA, pp 335–336Google Scholar
  8. 8.
    Dang HT and Owczarzak K (2008) Overview of the TAC 2008 update summarization task. In: Proceedings of the first text analysis conference (TAC 2008). National Institute of Standards and Technology, Gaithersburg, MD, USAGoogle Scholar
  9. 9.
    Dantzig GB (1949) Programming of interdependent activities: II mathematical model. Econometrica 17(3):200–211CrossRefMathSciNetGoogle Scholar
  10. 10.
    Das AS, Datar M, Garg A et al. (2007) Google news personalization: scalable online collaborative filtering. In: Proceedings of the 16th international conference on World Wide Web. AMC, New York, NY, USA, pp 271–280Google Scholar
  11. 11.
    Gillick D, Favre B, Hakkani-Tur D (2008) The ICSI Summarization System at TAC 2008. In: Proceedings of the first text analysis conference (TAC 2008). National Institute of Standards and Technology, Gaithersburg, MD, USAGoogle Scholar
  12. 12.
    Gunes E, Radev DR (2004) LexPageRank: prestige in multi-document text summarization. In: Proceedings of the 2004 conference on empirical methods in natural language processing. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 365–371Google Scholar
  13. 13.
    Hovy E, and Lin C-Y (1997) Automated text summarization in SUMMARIST. In: Proceedings of ACL’1997/EACL’1997 workshop on intelligent scalable text summarizationGoogle Scholar
  14. 14.
    Jain A and Pantel P (2009) How do they compare? Automatic identification of comparable entities on the web. In: Proceedings of the 18th ACM conference on information and knowledge management. ACM, New York, NY, USA, pp 1661–1664Google Scholar
  15. 15.
    Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Inc, Upper Saddle RiverzbMATHGoogle Scholar
  16. 16.
    Jindal N and Liu B (2006a) Identifying comparative sentences in text documents. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, USA, pp 244–251Google Scholar
  17. 17.
    Jindal N and Liu B (2006b) Mining comparative sentences and relations. In: Proceedings of the 21st national conference on, artificial intelligence (AAAI-06). pp 1331–1336Google Scholar
  18. 18.
    Karmarkar N (1984) A new polynomial-time algorithm for linear programming. Combinatorica 4(4): 373–395Google Scholar
  19. 19.
    Kawai Y, Kumamoto T, Tanaka K (2007) Fair news reader: recommending news articles with different sentiments based on user preference. Knowl Intell Inf Eng Syst 4692:612–622Google Scholar
  20. 20.
    Kennedy C (2005) Comparatives, semantics of. In: Allen K (ed) Enclycopedia of language and linguistics, 2nd edn. Elsevier, OxfordGoogle Scholar
  21. 21.
    Khachian LG (1979) A polynomial algorithm in linear programming. Doklady Akademiia Nauk SSSR 244:1093–1096Google Scholar
  22. 22.
    Kim HD, Zhai C (2009) Generating comparative summaries of contradictory opinions in text. In: Proceedings of the 18th ACM conference on information and knowledge management. ACM, New York, NY, USA, pp 385–394Google Scholar
  23. 23.
    Lerman K, McDonald R (2009) Contrastive summarization: an experiment with consumer reviews. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 113–116Google Scholar
  24. 24.
    Lerner J-Y and Pinkal Ms (2004). Comparatives and nested quantifications. In: Semantics: critical concepts in linguistics vol V, operators and sentence types, pp 70–87Google Scholar
  25. 25.
    Li R, Bao S, Wang J et al (2006) CoMiner: an effective algorithm for mining competitors from the web. In: Proceedings of the sixth IEEE international conference on data mining (ICDM’06). IEEE Computer Society, Washington, DC, USA, pp 948–952Google Scholar
  26. 26.
    Lijphart A (1971) Comparative politics and the comparative method. Am Polit Sci Rev 65(3):682–693CrossRefGoogle Scholar
  27. 27.
    Lin C-Y and Hovy E (2003) Automatic evaluation of summaries using N-gram co-occurrence statistics. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 71–78Google Scholar
  28. 28.
    Lin C-Y and Och FJ (2004) Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In: Proceedings of the 42nd annual meeting on association for computational linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 605Google Scholar
  29. 29.
    Liu B, Hu M, Cheng J (2005) Opinion observer: analyzing and comparing opinions on the web. In: Proceedings of the14th international conference on world wide web. ACM, New York, NY, USA, pp 342–351Google Scholar
  30. 30.
    Liu J, Wagner E, Birnbaum L (2007) Compare & contrast: using the web to discover comparable cases for news stories. In: Proceedings of the 16th international conference on World Wide Web. ACM, New York, NY, USA, pp 541–550Google Scholar
  31. 31.
    Liu Q, Li S (2002) Word similarity computing based on How-net. Comput Linguist Chinese Lang Process 7(2):59–76Google Scholar
  32. 32.
    Ma J (1898) Ma Shi Wen Tong. Commercial Press, ShanghaiGoogle Scholar
  33. 33.
    Mani I (2001) Automatic summarization, vol 3. John Benjamins Pub Co., AmsterdamCrossRefzbMATHGoogle Scholar
  34. 34.
    Mcdonald R (2007) A study of global inference algorithms in multi-document summarization. In: Proceedings of the 29th European conference on IR research. Springer, Berlin, Heidelberg, pp 557–564Google Scholar
  35. 35.
    Mei J-P, Chen L (2012) SumCR: a new subtopic-based extractive approach for text summarization. Knowl Inf Syst 31(3):527–545CrossRefMathSciNetGoogle Scholar
  36. 36.
    Mihalcea R (2004) Graph-based ranking algorithms for sentence extraction, applied to text summarization. In: Proceedings of the ACL 2004 on interactive poster and demonstration sessions. Association for Computational Linguistics, Stoudsburg, PA, USA, pp 20Google Scholar
  37. 37.
    Montes-y-Gómez M, Gelbukh A, López-López A (2001) Mining the news: trends, associations, and deviations. Computación y Sistemas 5(1):14–24Google Scholar
  38. 38.
    Nigam K, McCallum AK, Thrun S et al (2000) Text classification from labeled and unlabeled documents using EM. Mach Learn 39(2–3):103–134CrossRefzbMATHGoogle Scholar
  39. 39.
    NIST (2002) The 2002 topic detection and tracking (TDT2002) task definition and evaluation planGoogle Scholar
  40. 40.
    Nomoto T, Matsumoto Y (2001) A new approach to unsupervised text summarization. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. AMC, New York, NY, USA, pp 26–34Google Scholar
  41. 41.
    Pablo-Sánchez Cd, Segura-Bedmar I, Martínez P et al (2012) Lightly supervised acquisition of named entities and linguistic patterns for multilingual text mining. Knowl Inf Syst, pp 1–23Google Scholar
  42. 42.
    Paul MJ, Zhai C, Girju R (2010) Summarizing contrastive viewpoints in opinionated text. In: Proceedings of the 2010 conference on empirical methods in natural language processing. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 66–76Google Scholar
  43. 43.
    Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet:Similarity: measuring the relatedness of concepts. In: Demonstration papers at HLT-NAACL 2004. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 38–41Google Scholar
  44. 44.
    Rau LF, Jacobs PS, Zernik U (1989) Information extraction and text summarization using linguistic knowledge acquisition. Inf Process Manag 25(4):419–428CrossRefGoogle Scholar
  45. 45.
    Roy JR (2000) Compare and contrast, Grades 5-6: using comparisons and contrasts to build comprehension. Instructional fair, Inc., Grand Rapids, MIGoogle Scholar
  46. 46.
    Salton G, Singhal A, Mitra M et al (1997) Automatic text structuring and summarization. Inf Process Manag 33(2):193–207CrossRefGoogle Scholar
  47. 47.
    Shen D, Sun J-T, Li H et al (2007) Document summarization using conditional random fields. In: Proceedings of the 20th international joint conference on artifical intelligence. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA, pp 2862–2867Google Scholar
  48. 48.
    Sun A, Grishman R, Sekine S (2011) Semi-supervised relation extraction with large-scale word clustering. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, vol 1 . Association for Computational Linguistics, Stroudsburg, PA, USA, pp 521–529Google Scholar
  49. 49.
    Tsai F, Zhang Y (2011) D2S: document-to-sentence framework for novelty detection. Knowl Inf Syst 29(2):419–433CrossRefGoogle Scholar
  50. 50.
    Wan X, Jia H, Huang S et al (2011) Summarizing the differences in multilingual news. In: Proceedings of the 34th annual ACM SIGIR conference (SIGIR 2011). ACM, New York, NY, USA, pp 735–744Google Scholar
  51. 51.
    Wan X, Yang J, Xiao J (2007) Manifold-ranking based topic-focused multi-document summarization. In: Proceedings of the 20th international joint conference on artifical, intelligence, pp 2903–2908Google Scholar
  52. 52.
    Wang D, Zhu S, Li T et al (2009) Comparative document summarization via discriminative sentence selection. In: Proceedings of the 18th ACM conference on information and knowledge management. ACM, New York, NY, USA, pp 1963–1966Google Scholar
  53. 53.
    Wayne CL (1998) Topic detection & tracking (TDT) overview & perspective. In: Proceedings of the broadcast news transcription and understanding workshopGoogle Scholar
  54. 54.
    Wei F, Li W, Lu Q et al (2010) A document-sensitive graph model for multi-document summarization. Knowl Inf Syst 22(2):245–259CrossRefGoogle Scholar
  55. 55.
    Weisstein U (1974) Comparative literature and literary theory: survey and introduction. Indiana University Press, IndianaGoogle Scholar
  56. 56.
    Witte R, Bergler S (2007) Next-generation summarization: contrastive, focused, and update summaries. In: International conference on recent advances in natural language processing (RANLP 2007)Google Scholar
  57. 57.
    Wood MK, Dantzig GB (1949) Programming of interdependent activities: I general discussion. Econometrica 17(3):193–199Google Scholar
  58. 58.
    Yeh J-Y, Ke H-R, Yang W-P et al (2005) Text summarization using a trainable summarizer and latent semantic analysis. Inf Process Manag 41(1):75–95CrossRefGoogle Scholar
  59. 59.
    Zhai C, Velivelli A, Yu B (2004) A cross-collection mixture model for comparative text mining, In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining . ACM, New York, NY, USA, pp 743–748Google Scholar

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. 1.Institute of Computer Science and TechnologyPeking UniversityBeijingChina
  2. 2.Key Laboratory of Computational LinguisticsPeking University, MOEBeijingChina

Personalised recommendations