The VLDB Journal, Volume 23, Issue 2, pp 175–199

Dense subgraph maintenance under streaming edge weight updates for real-time story identification

  • Albert Angel
  • Nick Koudas
  • Nikos Sarkas
  • Divesh Srivastava
  • Michael Svendsen
  • Srikanta Tirthapura
Special Issue Paper

Abstract

Recent years have witnessed an unprecedented proliferation of social media. People around the globe author, every day, millions of blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories and events that capture popular attention. Stories can be identified via groups of tightly coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point in time. The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightly coupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DynDens, which outperforms adaptations of existing techniques to this setting and yields meaningful, intuitive results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.

Keywords

Dense subgraphs, Story identification, Entity graph

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Albert Angel (1)
  • Nick Koudas (2)
  • Nikos Sarkas (2)
  • Divesh Srivastava (3)
  • Michael Svendsen (4)
  • Srikanta Tirthapura (4)

  1. Google Inc., Zurich, Switzerland
  2. University of Toronto, Toronto, Canada
  3. AT&T Labs-Research, Austin, USA
  4. Iowa State University, Ames, USA
