Towards Effective Event Detection, Tracking and Summarization on Microblog Data
Microblogging has become one of the most popular social Web applications in recent years. Posting short messages (i.e., a maximum of 140 characters) to the Web at any time and from any place lowers the usage barrier, accelerates information diffusion, and makes instant publication possible. Among the posts users publish every day, many relate to recent or real-time events in daily life. Microblog sites usually display on their homepages a list of words representing the trending topics of a time period (e.g., 24 hours, a week or even longer), but these topical words alone do not give users a comprehensive view of a topic, especially users without background knowledge. Moreover, to learn the details of a topic, users have no option but to open each post in the associated list. In this paper, we propose a unified workflow of event detection, tracking and summarization on microblog data. In particular, we introduce novel features that take the characteristics of microblog data into account for topical word selection, and thus for event detection. In the tracking phase, a bipartite graph is constructed to capture the relationship between events occurring in two adjacent time periods. Matched event pairs are grouped into event chains. Furthermore, inspired by diversity theory in Web search, we are the first to summarize event chains by considering both content coverage and evolution over time. The experimental results show the effectiveness of our approach on microblog data.
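The tracking phase described above can be illustrated with a minimal sketch. The abstract does not specify how edges are weighted or how the matching is computed, so everything here is an assumption made for illustration: each event is represented as a set of topical words, edge weights are Jaccard similarities, a greedy one-to-one matching stands in for the matching step, and the names `jaccard`, `match_adjacent`, `build_chains` and the `threshold` parameter are hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two word sets (assumed edge weight)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_adjacent(prev_events, next_events, threshold=0.3):
    """Greedy one-to-one matching on the bipartite graph between the
    events of two adjacent time windows. Returns (i, j) index pairs.
    The greedy strategy and the threshold are illustrative assumptions."""
    edges = [
        (jaccard(p, n), i, j)
        for (i, p), (j, n) in product(enumerate(prev_events), enumerate(next_events))
    ]
    # Keep only sufficiently similar pairs, strongest edges first.
    edges = [e for e in edges if e[0] >= threshold]
    edges.sort(key=lambda e: (-e[0], e[1], e[2]))
    used_prev, used_next, pairs = set(), set(), []
    for _, i, j in edges:
        if i not in used_prev and j not in used_next:
            pairs.append((i, j))
            used_prev.add(i)
            used_next.add(j)
    return pairs


def build_chains(windows, threshold=0.3):
    """Group matched events across consecutive windows into chains.
    Each chain is a list of (window_index, event_index) tuples."""
    if not windows:
        return []
    chains = [[(0, i)] for i in range(len(windows[0]))]
    open_chain = {i: chains[i] for i in range(len(windows[0]))}
    for t in range(1, len(windows)):
        matched = {j: i for i, j in match_adjacent(windows[t - 1], windows[t], threshold)}
        new_open = {}
        for j in range(len(windows[t])):
            if j in matched:
                # Extend the chain of the matched event from the previous window.
                chain = open_chain[matched[j]]
                chain.append((t, j))
            else:
                # No match: this event starts a new chain.
                chain = [(t, j)]
                chains.append(chain)
            new_open[j] = chain
        open_chain = new_open
    return chains


windows = [
    [{"earthquake", "japan", "tsunami"}, {"election", "vote"}],
    [{"earthquake", "japan", "rescue"}, {"concert", "music"}],
]
chains = build_chains(windows)
```

In this toy input the two earthquake events share two of four distinct words, so they are linked into one chain, while the remaining events each form a chain of length one. An optimal assignment could replace the greedy step if a globally best matching is wanted.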
Keywords: Event Detection, Event Cluster, Document Frequency, Computational Linguistics, Event Summary