Abstract
Advances in event detection and tracking make it feasible to gather text from many sources and organize it into events that are constructed automatically online and updated over time. An event is usually described in several diverse versions, and users are often eager to know all of them, yet the sheer volume of documents makes it almost impossible to read everything. In this paper, we formally define the problem of discovering the diversified versions of an event and introduce a novel, principled model, called DVD, to address it. Unlike traditional clustering methods, DVD applies an iterative algorithm on a bipartite graph that integrates co-occurrence and semantics to select popular words and filter them, reducing the tight correlation between documents within a specific event. Hybrid link structures between words are used to find hierarchical relationships. We then employ a web community discovery algorithm to construct virtual documents, each consisting of a bag of words that indicates one of the diversified versions. Under the Rocchio classification framework, documents are classified into these versions. Using our novel evaluation method, experiments on two real datasets show that DVD is effective and outperforms several related algorithms, including classic K-means and LDA.
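The final step named in the abstract, assigning documents to versions under a Rocchio framework, can be illustrated with a small sketch. This is not the authors' implementation: it assumes each virtual document acts as a class prototype, that documents and prototypes are plain term-frequency vectors, and that cosine similarity is the comparison measure. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(document_words, virtual_documents):
    """Return the label of the virtual document most similar to the document.

    Assumption: each virtual document (a bag of words) serves as the
    prototype vector of one version, as in a Rocchio-style classifier.
    """
    doc = Counter(document_words)
    scores = {
        label: cosine(doc, Counter(words))
        for label, words in virtual_documents.items()
    }
    return max(scores, key=scores.get)


# Hypothetical virtual documents for two versions of one event.
virtual_documents = {
    "mechanical": ["brake", "failure", "engine"],
    "weather": ["storm", "rain", "fog"],
}

document = "heavy rain and fog before the crash".split()
print(classify(document, virtual_documents))
```

In this toy case the document shares two terms with the "weather" prototype and none with the "mechanical" one, so it is assigned to "weather". A fuller version would likely use weighted terms (for example TF-IDF) rather than raw counts.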
Supported by NSFC under Grant No. 61073081, the National Key Technology R&D Pillar Program in the 11th Five-year Plan of China under Research No. 2009BAH47B00, and the ZTE University Partnership Fund.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kong, L., Yan, R., He, Y., Zhang, Y., Zhang, Z., Fu, L. (2011). DVD: A Model for Event Diversified Versions Discovery. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_18
DOI: https://doi.org/10.1007/978-3-642-20291-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20290-2
Online ISBN: 978-3-642-20291-9
eBook Packages: Computer Science, Computer Science (R0)