Abstract
Automatic multi-document summarization (MDS) is the process of extracting the most important information, such as events and entities, from multiple natural language texts that focus on the same topic. In this paper, we examine the effects of different groups of information, such as events and named entities, on generic and update MDS. Our generic MDS system outperforms the best recent generic MDS systems on DUC 2004 in terms of ROUGE-1 recall and \(f_1\)-measure. Update summarization is a newer form of MDS in which novel yet salient sentences are selected as summary sentences, on the assumption that the user has already read a given set of documents. We present an event-based update summarization approach in which novelty is detected from the temporal ordering of events, and saliency is ensured by the distribution of events and entities. To our knowledge, no other study has investigated in depth the effect of novelty information derived from the temporal ordering of events (assuming that a sentence contains one or more events) on update multi-document summarization. Our update MDS system outperforms the state-of-the-art update MDS system in terms of ROUGE-2 and ROUGE-SU4 recall. All our MDS systems also generate high-quality summaries, as judged by manual evaluation against widely used criteria.
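As a rough illustration of how the two signals named above (saliency from event and entity distribution, novelty from temporal ordering) might be combined, the Python sketch below ranks sentences of an update set by salience multiplied by novelty. It is not the authors' system: the `Sentence` representation, the frequency-based salience, the novelty test (an event counts as novel only if its normalized time falls after the latest event time in the already-read set), and the product used to combine the two are all hypothetical simplifications.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class Sentence:
    """Hypothetical representation: one sentence with its events and named entities."""
    text: str
    events: List[Tuple[str, date]]  # (event trigger, normalized time of the event)
    entities: List[str] = field(default_factory=list)


def rank_update_sentences(read: List[Sentence], update: List[Sentence]) -> List[Tuple[float, str]]:
    """Toy ranking of update-set sentences by salience x novelty (not the paper's scoring)."""
    # Latest event time the user has already seen in the earlier document set.
    cutoff = max((t for s in read for _, t in s.events), default=date.min)
    # Salience: frequency of each event trigger and entity across the update set.
    freq = Counter(e for s in update for e, _ in s.events)
    freq.update(ent for s in update for ent in s.entities)
    total = sum(freq.values()) or 1
    ranked = []
    for s in update:
        units = [e for e, _ in s.events] + s.entities
        salience = sum(freq[u] for u in units) / total
        # Novelty: share of the sentence's events ordered after everything already read.
        novelty = sum(t > cutoff for _, t in s.events) / len(s.events) if s.events else 0.0
        ranked.append((salience * novelty, s.text))
    return sorted(ranked, reverse=True)


read = [Sentence("Insurgents attacked a convoy.", [("attacked", date(2005, 1, 1))], ["Baghdad"])]
update = [
    Sentence("An earlier attack was recalled.", [("attack", date(2004, 12, 31))], ["Baghdad"]),
    Sentence("Iraqis voted on Sunday.", [("voted", date(2005, 1, 30))], ["Iraq"]),
]
ranked = rank_update_sentences(read, update)
print(ranked)
```

Both update sentences are equally salient here, but only the second describes an event ordered after what the reader has already seen, so it ranks first.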
Notes
- 1.
Here, ‘22 years’ denotes a time period rather than a point in time. Time periods do not carry useful information for detecting novelty.
- 2.
- 3.
- 4.
- 5.
The Document Creation Time (DCT) can be derived from the document name.
- 6.
A total of four topics are taken into account, i.e., K = 4.
- 7.
ROUGE runtime arguments for DUC 2004:
ROUGE -a -c 95 -b 665 -m -n 4 -w 1.2
- 8.
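Note 1 distinguishes time periods from time points. In the TimeML specification, a TIMEX3 expression carries a type of DATE, TIME, DURATION, or SET, and a period such as "22 years" is a DURATION (normalized value P22Y). A minimal sketch of discarding periods before novelty detection, assuming temporal expressions are already available as hypothetical (text, type, value) tuples, for example converted from the output of a TimeML tagger:

```python
from typing import List, Tuple

# Hypothetical input format: (surface text, TIMEX3 type, normalized value).
Timex = Tuple[str, str, str]


def drop_time_periods(timexes: List[Timex]) -> List[Timex]:
    """Remove TIMEX3 expressions of type DURATION (time periods such as '22 years')."""
    return [t for t in timexes if t[1] != "DURATION"]


timexes = [("22 years", "DURATION", "P22Y"), ("Jan. 30, 2005", "DATE", "2005-01-30")]
print(drop_time_periods(timexes))
```

Only the DATE expression survives, since it can be placed on a timeline and compared with earlier events.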
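Note 5 states that the Document Creation Time can be derived from the document name. A minimal sketch, assuming names that embed the publication date as YYYYMMDD after an uppercase source prefix, as in newswire identifiers of the form APW19981016.0240 or AFP_ENG_20050104.0001; the regular expression and the helper name document_creation_time are assumptions about that naming scheme, not a documented format:

```python
import re
from datetime import date

# Assumed naming scheme: uppercase source prefix, optional "_XXX_" language tag,
# then the publication date as YYYYMMDD, then "." and a sequence number.
DOC_NAME = re.compile(r"^[A-Z]+(?:_[A-Z]+_)?(\d{4})(\d{2})(\d{2})\.\d+$")


def document_creation_time(doc_name: str) -> date:
    """Derive the DCT from a newswire document name (hypothetical helper)."""
    match = DOC_NAME.match(doc_name)
    if match is None:
        raise ValueError(f"unrecognized document name: {doc_name!r}")
    year, month, day = (int(g) for g in match.groups())
    return date(year, month, day)


print(document_creation_time("APW19981016.0240"))  # 1998-10-16
```

Raising on unrecognized names, rather than guessing, keeps a malformed identifier from silently shifting every event anchored to the DCT.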
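The options in note 7 are those of the ROUGE-1.5.5 Perl script: -a evaluates all systems in the configuration file, -c 95 reports 95% confidence intervals, -b 665 uses only the first 665 bytes of each peer summary, -m applies Porter stemming, -n 4 computes ROUGE-1 through ROUGE-4, and -w 1.2 sets the weight for ROUGE-W. The Python sketch below shows only how ROUGE-N recall, precision, and \(f_1\) are computed for one candidate against a single reference, with the 665-byte truncation applied to the candidate. It deliberately omits stemming, multiple references, and confidence intervals, and its tokenization is a simplification, so its numbers will not match the official script.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple


def tokenize(text: str, byte_limit: Optional[int] = None) -> List[str]:
    # Like "-b": keep only the first byte_limit bytes (dropping a split multi-byte char).
    if byte_limit is not None:
        text = text.encode("utf-8")[:byte_limit].decode("utf-8", errors="ignore")
    # Simplified tokenization: lowercase alphanumeric runs; no Porter stemming ("-m").
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1,
            byte_limit: Optional[int] = 665) -> Tuple[float, float, float]:
    """Return (recall, precision, f1) of ROUGE-N for one candidate vs one reference."""
    cand = ngram_counts(tokenize(candidate, byte_limit), n)  # peer summary is truncated
    ref = ngram_counts(tokenize(reference), n)               # reference kept whole
    overlap = sum((cand & ref).values())  # clipped matches: min of the two counts
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


r, p, f = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1)
print(round(r, 3), round(p, 3), round(f, 3))
```

Recall divides by the reference n-gram count, which is why the abstract reports recall alongside \(f_1\): under a fixed byte budget, recall rewards covering more of the reference content.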
References
Allen, J.F.: Maintaining knowledge about temporal intervals. Commun. ACM 26(11), 832–843 (1983)
Bethard, S.: ClearTK-TimeML: a minimalist approach to TempEval. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), vol. 2, pp. 10–14 (2013)
Boudin, F., El-Bèze, M., Torres-Moreno, J.-M.: A scalable MMR approach to sentence scoring for multi-document update summarization. In: COLING (2008)
Cer, D.M., de Marneffe, M.-C., Jurafsky, D., Manning, C.D.: Parsing to Stanford dependencies: trade-offs between speed and accuracy. In: LREC (2010)
Chang, A.X., Manning, C.D.: SUTime: a library for recognizing and normalizing time expressions. In: LREC (2012)
Christensen, J., Mausam, Soderland, S., Etzioni, O.: Towards coherent multi-document summarization. In: Proceedings of NAACL-HLT, pp. 1163–1173 (2013)
Delort, J.-Y., Alfonseca, E.: DualSum: a topic-model based approach for update summarization. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 214–223 (2012)
Denis, P., Muller, P.: Predicting globally-coherent temporal structures from texts via endpoint inference and graph decomposition. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, vol. 3, pp. 1788–1793. AAAI Press (2011)
Du, P., Guo, J., Zhang, J., Cheng, X.: Manifold ranking with sink points for update summarization. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1757–1760 (2010)
Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. (JAIR) 22(1), 457–479 (2004)
Filatova, E., Hatzivassiloglou, V.: Event-based extractive summarization. In: Proceedings of ACL Workshop on Summarization, vol. 111 (2004)
Fisher, S., Roark, B.: Query-focused supervised sentence ranking for update summaries. In: Proceedings of TAC 2008 (2008)
Gillick, D., Favre, B., Hakkani-Tur, D., Bohnet, B., Liu, Y., Xie, S.: The ICSI/UTD summarization system at TAC. In: Proceedings of the Second Text Analysis Conference, Gaithersburg, Maryland, USA. NIST (2009)
Haghighi, A., Vanderwende, L.: Exploring content models for multi-document summarization. In: Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 362–370. Association for Computational Linguistics (2009)
Kullback, S.: The Kullback–Leibler distance (1987)
Li, J., Li, S., Wang, X., Tian, Y., Chang, B.: Update summarization using a multi-level hierarchical Dirichlet process model. In: COLING (2012)
Li, L., Heng, W., Jia, Y., Liu, Y., Wan, S.: CIST system report for ACL MultiLing 2013 – Track 1: multilingual multi-document summarization. In: MultiLing 2013, p. 39 (2013)
Li, P., Wang, Y., Gao, W., Jiang, J.: Generating aspect-oriented multi-document summarization with event-aspect model. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1137–1146 (2011)
Li, W., Wu, M., Lu, Q., Xu, W., Yuan, C.: Extractive summarization using inter- and intra-event relevance. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 369–376. Association for Computational Linguistics (2006)
Li, X., Du, L., Shen, Y.-D.: Graph-based marginal ranking for update summarization. In: SDM, pp. 486–497. SIAM (2011)
Li, X., Du, L., Shen, Y.-D.: Update summarization via graph-based sentence ranking. IEEE Trans. Knowl. Data Eng. 25(5), 1162–1174 (2013)
Lin, C.-Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-2004 Workshop, pp. 74–81 (2004)
Lin, H., Bilmes, J.: A class of submodular functions for document summarization. In: ACL, pp. 510–520 (2011)
Mani, I.: Automatic Summarization, vol. 3. John Benjamins Publishing, Amsterdam (2001)
Mani, I., Schiffman, B., Zhang, J.: Inferring temporal ordering of events in news. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Companion Volume of the Proceedings of HLT-NAACL 2003-Short Papers, vol. 2, pp. 55–57. Association for Computational Linguistics (2003)
Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Proceedings of EMNLP, vol. 4, p. 275, Barcelona, Spain (2004)
Ng, J.-P., Kan, M.-Y.: Improved temporal relation classification using dependency parses and selective crowdsourced annotations. In: COLING, pp. 2109–2124 (2012)
Ng, J.-P., Kan, M.-Y., Lin, Z., Feng, W., Chen, B., Jian, S., Tan, C.L.: Exploiting discourse analysis for article-wide temporal classification. In: EMNLP, pp. 12–23 (2013)
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web (1999)
Porter, M.F.: An algorithm for suffix stripping. Program Electr. Libr. Inf. Syst. 14(3), 130–137 (1980)
Pustejovsky, J., Castaño, J.M., Ingria, R., Saurí, R., Gaizauskas, R.J., Setzer, A., Katz, G., Radev, D.R.: TimeML: robust specification of event and temporal expressions in text. In: New Directions in Question Answering, vol. 3, pp. 28–34 (2003)
Steinberger, J., Ježek, K.: Update summarization based on novel topic distribution. In: Proceedings of the 9th ACM Symposium on Document Engineering, pp. 205–213 (2009)
Steinberger, J., Kabadjov, M., Steinberger, R., Tanev, H., Turchi, M., Zavarella, V.: JRC's participation at TAC: guided and multilingual summarization tasks. In: Proceedings of the Text Analysis Conference (TAC) (2011)
Takamura, H., Okumura, M.: Text summarization model based on maximum coverage problem and its variant. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 781–789. Association for Computational Linguistics (2009)
Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 101, 1566–1581 (2006)
Li, W., Wei, F., Lu, Q., He, Y.: PNR²: ranking sentences with positive and negative reinforcement for query-oriented update summarization. In: Proceedings of the 22nd International Conference on Computational Linguistics, pp. 489–496 (2008)
Zhang, R., Li, W., Qin, L.: Sentence ordering with event-enriched semantics and two-layered clustering for multi-document news summarization. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 1489–1497 (2010)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chali, Y., Uddin, M. (2016). Multi-document Summarization Based on Atomic Semantic Events and Their Temporal Relationships. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol. 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_27
DOI: https://doi.org/10.1007/978-3-319-30671-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science, Computer Science (R0)