
DVD: A Model for Event Diversified Versions Discovery

  • Conference paper
Web Technologies and Applications (APWeb 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6612)


Abstract

With advances in event detection and tracking techniques, it is now feasible to gather text from many sources and organize it into events that are constructed automatically online and updated over time. An event is usually described in several diversified versions, and users are often eager to know all of them. Given the huge number of documents, however, it is almost impossible for users to read them all. In this paper, we formally define the problem of event diversified versions discovery and introduce a novel, principled model, called DVD, for discovering the diversified versions of an event. Unlike traditional clustering methods, DVD applies an iterative algorithm on a bipartite graph that integrates co-occurrence and semantic information to select the popular words, and then filters them to reduce the tight correlation between documents within a specific event. Hybrid link structures between words are used to find hierarchical relationships among them. We then employ a web community discovery algorithm to construct virtual documents, each consisting of a bag of words that represents one of the diversified versions. Within the Rocchio classification framework, documents are classified into these diversified versions. Using a novel evaluation method, experiments on two real datasets show that DVD is effective and outperforms several related algorithms, including classic K-means and LDA.
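The abstract describes the pipeline without giving formulas, so the following is only a minimal sketch of two of its stages under clearly stated assumptions. First, word selection is approximated here by a HITS-style mutual-reinforcement iteration on a plain document-word bipartite graph weighted by term frequency; the paper's graph also integrates semantic links, which this sketch omits. Second, the virtual documents that DVD builds with a web community discovery algorithm are supplied by hand, and documents are assigned to the version whose Rocchio prototype (class centroid) is most cosine-similar. All function names, the `top_k` parameter, and the toy data are hypothetical.

```python
import math
from collections import Counter


def popular_words(docs, iterations=20):
    """HITS-style scoring on a document-word bipartite graph (assumed, simplified).

    docs: list of token lists. Edge weight = term frequency of a word in a document.
    Returns a dict mapping each word to an L2-normalised popularity score.
    """
    tf = [Counter(doc) for doc in docs]
    vocab = sorted({w for c in tf for w in c})
    word_score = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        # A document scores high if it contains high-scoring words.
        doc_score = [sum(n * word_score[w] for w, n in c.items()) for c in tf]
        norm = math.sqrt(sum(s * s for s in doc_score)) or 1.0
        doc_score = [s / norm for s in doc_score]
        # A word scores high if it appears in high-scoring documents.
        word_score = {w: 0.0 for w in vocab}
        for c, ds in zip(tf, doc_score):
            for w, n in c.items():
                word_score[w] += n * ds
        norm = math.sqrt(sum(s * s for s in word_score.values())) or 1.0
        word_score = {w: s / norm for w, s in word_score.items()}
    return word_score


def filter_popular(docs, scores, top_k):
    """Drop the top_k event-wide popular words (hypothetical cut-off rule)."""
    popular = set(sorted(scores, key=lambda w: (-scores[w], w))[:top_k])
    return [[w for w in doc if w not in popular] for doc in docs], popular


def tf_vector(tokens):
    """Unit-length term-frequency vector as a sparse dict."""
    c = Counter(tokens)
    norm = math.sqrt(sum(v * v for v in c.values())) or 1.0
    return {w: v / norm for w, v in c.items()}


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Rocchio prototype: the mean of a class's unit-normalised vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {w: v / len(vectors) for w, v in total.items()}


def rocchio_classify(docs, versions):
    """Assign each document to the version whose prototype is most similar.

    versions: dict mapping a version name to a list of virtual documents (token lists).
    """
    prototypes = {name: centroid([tf_vector(b) for b in bags])
                  for name, bags in versions.items()}
    return [max(prototypes, key=lambda n: cosine(tf_vector(d), prototypes[n]))
            for d in docs]


# Toy event: every document shares "quake" and "city", which blur the versions.
docs = [
    ["quake", "city", "rescue", "team", "rescue"],
    ["quake", "city", "aid", "donation", "aid"],
    ["quake", "city", "rescue", "survivor"],
    ["quake", "city", "donation", "fund"],
]
scores = popular_words(docs)
filtered, popular = filter_popular(docs, scores, top_k=2)
# Hand-made virtual documents standing in for the community discovery output.
versions = {
    "rescue": [["rescue", "survivor", "team"]],
    "relief": [["aid", "donation", "fund"]],
}
labels = rocchio_classify(filtered, versions)
print(sorted(popular))  # ['city', 'quake']
print(labels)           # ['rescue', 'relief', 'rescue', 'relief']
```

Removing the words shared by every document before classification is what lets the smaller, version-specific vocabularies decide the assignment. Without that step, "quake" and "city" would contribute equally to every document's similarity with every version.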

Supported by NSFC under Grant No. 61073081, the National Key Technology R&D Pillar Program in the 11th Five-Year Plan of China under Research No. 2009BAH47B00, and the ZTE University Partnership Fund.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kong, L., Yan, R., He, Y., Zhang, Y., Zhang, Z., Fu, L. (2011). DVD: A Model for Event Diversified Versions Discovery. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_18

  • DOI: https://doi.org/10.1007/978-3-642-20291-9_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20290-2

  • Online ISBN: 978-3-642-20291-9

  • eBook Packages: Computer Science, Computer Science (R0)