Abstract
With the dramatic growth in the number of social media users, microblogs are created and shared at an unprecedented rate. The high velocity and large volume of short text posts (microblogs) bring redundancy and noise, making it hard for users and analysts to elicit useful information. In this paper, we formalize the problem from a summarization angle as Continuous Summarization over Microblog Threads (CSMT), which considers three facets: the information gain of the microblog dialogue, diversity, and temporal information. This summarization problem differs from classic ones in two respects: (i) it is defined over large-scale, dynamic data with a high update frequency; (ii) the context between microblogs is taken into account. We first prove that the CSMT problem is NP-hard. We then propose a greedy algorithm with a \((1-1/\mathrm{e})\) performance guarantee. Finally, we extend the greedy algorithm to a sliding window to continuously summarize microblogs for threads. Our experimental results on large-scale datasets show that our method is superior to two other baselines in summary diversity and information gain, with a time cost close to that of the best-performing baseline.
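To make the greedy-plus-sliding-window idea concrete, here is a minimal sketch. The abstract does not define the CSMT objective, so the sketch assumes a hypothetical stand-in: weighted term coverage, which is monotone and submodular. That is the classic setting in which picking the post with the largest marginal gain, k times, achieves a \((1-1/\mathrm{e})\) approximation. The post tuple layout, the names `greedy_summary` and `SlidingWindowSummarizer`, the default term weight of 1.0, and the time-based window are all illustrative assumptions, not the authors' implementation. In particular, the sketch ignores thread context and the diversity and temporal facets, and it re-runs greedy from scratch on every arrival instead of updating the summary incrementally.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

# Hypothetical post layout: (post_id, timestamp, set of terms).
Post = Tuple[str, float, Set[str]]


def greedy_summary(posts: List[Post], k: int,
                   weights: Optional[Dict[str, float]] = None) -> List[Post]:
    """Pick up to k posts, each time taking the one with the largest marginal
    gain in weighted term coverage (a monotone submodular stand-in objective)."""
    weights = weights or {}
    candidates = list(posts)
    summary: List[Post] = []
    covered: Set[str] = set()
    while len(summary) < k and candidates:
        best_idx, best_gain = -1, 0.0
        for i, (_, _, terms) in enumerate(candidates):
            # Marginal gain: total weight of terms not yet covered (default weight 1.0).
            gain = sum(weights.get(t, 1.0) for t in terms - covered)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx < 0:  # no remaining post adds new coverage
            break
        chosen = candidates.pop(best_idx)
        summary.append(chosen)
        covered |= chosen[2]
    return summary


class SlidingWindowSummarizer:
    """Keeps posts whose timestamps fall within the last `window` time units
    and re-runs the greedy selection on each arrival (assumes in-order arrival)."""

    def __init__(self, window: float, k: int,
                 weights: Optional[Dict[str, float]] = None) -> None:
        self.window = window
        self.k = k
        self.weights = weights
        self.buffer: Deque[Post] = deque()

    def add(self, post: Post) -> List[Post]:
        self.buffer.append(post)
        # Evict posts that have slid out of the window ending at the newest timestamp.
        while self.buffer and self.buffer[0][1] <= post[1] - self.window:
            self.buffer.popleft()
        return greedy_summary(list(self.buffer), self.k, self.weights)


posts: List[Post] = [
    ("p1", 0.0, {"quake", "city"}),
    ("p2", 1.0, {"quake", "city", "damage"}),
    ("p3", 2.0, {"rescue", "team"}),
    ("p4", 3.0, {"quake"}),
]
summarizer = SlidingWindowSummarizer(window=10.0, k=2)
for p in posts:
    summary = summarizer.add(p)
print([pid for pid, _, _ in summary])  # ['p2', 'p3']
```

With a window wide enough to hold every post, greedy first takes `p2` (three new terms) and then `p3`, because `p1` and `p4` add nothing once `p2` is chosen. Shrinking the window to 1.5 time units evicts `p1` and `p2` by the time `p4` arrives, so the summary becomes `p3` and `p4`. A real continuous summarizer would likely avoid recomputing from scratch at every step; this version trades efficiency for clarity.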
Acknowledgement
This work was partially supported by ARC DP170102726 and DP170102231 and by the National Natural Science Foundation of China (NSFC) 91646204.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Song, L., Zhang, P., Bao, Z., Sellis, T. (2017). Continuous Summarization over Microblog Threads. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_31
DOI: https://doi.org/10.1007/978-3-319-55699-4_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55698-7
Online ISBN: 978-3-319-55699-4
eBook Packages: Computer Science, Computer Science (R0)