Event Phase Extraction and Summarization

Wang, Chengyu; Zhang, Rong; He, Xiaofeng; Zhou, Guomin; Zhou, Aoying

doi:10.1007/978-3-319-48740-3_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10041)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Text summarization aims to generate a single, concise representation for documents. For Web applications, documents related to an event retrieved by search engines usually describe several event phases implicitly, making it difficult for existing approaches to identify, extract and summarize these phases. In this paper, we aim to mine and summarize event phases automatically from a stream of news data on the Web. We model the semantic relations of news via a graph model called Temporal Content Coherence Graph. A structural clustering algorithm EPCluster is designed to separate news articles corresponding to event phases. After that, we calculate the relevance of news articles based on a vertex-reinforced random walk algorithm and generate event phase summaries in a relevance maximum optimization framework. Experiments on news datasets illustrate the effectiveness of our approach.
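The relevance step named in the abstract can be illustrated with a small vertex-reinforced random walk. The following is a minimal sketch in the spirit of DivRank-style reinforcement, not the authors' exact formulation: the function name, the damping value, the uniform prior and the toy graph are all assumptions made for illustration.

```python
import numpy as np


def vertex_reinforced_walk(weights, damping=0.85, iters=200, tol=1e-10):
    """Illustrative vertex-reinforced random walk (hypothetical helper).

    weights[i, j] is a nonnegative edge weight from node i to node j.
    Returns a probability vector of relevance scores over the nodes.
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]

    # Row-normalise into a base transition matrix; rows without
    # out-edges fall back to a uniform distribution.
    row_sums = w.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    p0 = np.where(row_sums > 0, w / safe_sums, 1.0 / n)

    prior = np.full(n, 1.0 / n)  # assumed uniform prior
    scores = prior.copy()
    for _ in range(iters):
        # Reinforcement: a transition into node j is scaled by j's
        # current score, so already-visited nodes attract more mass.
        reinforced = p0 * scores[np.newaxis, :]
        reinforced /= reinforced.sum(axis=1, keepdims=True)

        new_scores = (1.0 - damping) * prior + damping * (scores @ reinforced)
        converged = np.abs(new_scores - scores).sum() < tol
        scores = new_scores
        if converged:
            break
    return scores


# Toy star graph: node 0 is linked to nodes 1, 2 and 3.
star = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])
relevance = vertex_reinforced_walk(star)
```

On this toy graph the hub node receives the largest score. In a real setting the edge weights would come from the coherence graph rather than from a hand-written matrix.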

Notes

1. See background information at https://en.wikipedia.org/wiki/Egyptian_Revolution_of_2011.
2. Because our dataset is relatively large and each cluster for an event phase contains more than k news articles, we set a uniform parameter k for all event phases. The definition can also be modified so that k varies across event phases, without changing our algorithm.
3. In the implementation, we set one day as a time slot and compute \(w_t(\cdot )\) based on the difference in publication dates. See Fig. 1(a) and (b).
4. By definition, each news article \(d_i\) corresponds one-to-one to a node \(v_i\). In the following, where no ambiguity arises, we use \(d_i\) to denote both a node and a news article.
5. Many other methods focus on timeline generation. However, the summaries we generate are headlines and dates, which makes it difficult to compare our method with them. We will investigate how to adapt these algorithms to our task in future work.
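Note 3 can be made concrete with a small sketch of a temporal weight computed over one-day time slots. The page does not give the functional form of \(w_t(\cdot )\), so the exponential decay, the `decay` parameter and both function names below are assumptions chosen purely for illustration. Only the use of publication date differences measured in days comes from the note.

```python
import math
from datetime import date


def temporal_weight(pub_a: date, pub_b: date, decay: float = 0.5) -> float:
    """Hypothetical temporal weight between two articles.

    The gap is counted in one-day time slots, as in Note 3. The
    exponential decay is an assumed form, not taken from the paper.
    """
    gap_in_slots = abs((pub_a - pub_b).days)
    return math.exp(-decay * gap_in_slots)


def temporal_weight_matrix(pub_dates, decay: float = 0.5):
    """Pairwise weights; index i stands for both article d_i and node v_i."""
    return [[temporal_weight(a, b, decay) for b in pub_dates] for a in pub_dates]


# Example publication dates (illustrative values only).
dates = [date(2011, 1, 25), date(2011, 1, 26), date(2011, 1, 28)]
matrix = temporal_weight_matrix(dates)
```

Articles published on the same day get weight 1, and the weight shrinks as the publication dates move apart. Any other monotonically decreasing function of the day gap could be substituted without changing the structure of the sketch.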

Acknowledgements

This work is partially supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000904, Shanghai Agriculture Science Program (2015) Number 3-2 and NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization under Grant No. U1509219.

Author information

Authors and Affiliations

Institute for Data Science and Engineering, East China Normal University, Shanghai, China
Chengyu Wang, Rong Zhang, Xiaofeng He & Aoying Zhou
Zhejiang Police College, Hangzhou, Zhejiang, China
Guomin Zhou

Corresponding author

Correspondence to Xiaofeng He.

Editor information

Editors and Affiliations

Poznań University of Economics, Poznan, Poland
Wojciech Cellary
University of Minnesota, Minneapolis, Minnesota, USA
Mohamed F. Mokbel
Tsinghua University, Beijing, China
Jianmin Wang
Victoria University, Melbourne, Victoria, Australia
Hua Wang
Victoria University, Melbourne, Victoria, Australia
Rui Zhou
Victoria University, Melbourne, Victoria, Australia
Yanchun Zhang

About this paper

Cite this paper

Wang, C., Zhang, R., He, X., Zhou, G., Zhou, A. (2016). Event Phase Extraction and Summarization. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. Lecture Notes in Computer Science, vol 10041. Springer, Cham. https://doi.org/10.1007/978-3-319-48740-3_35

DOI: https://doi.org/10.1007/978-3-319-48740-3_35
Published: 02 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48739-7
Online ISBN: 978-3-319-48740-3
eBook Packages: Computer Science, Computer Science (R0)
