World Wide Web, Volume 8, Issue 2, pp 159–178

On the Bursty Evolution of Blogspace

  • Ravi Kumar
  • Jasmine Novak
  • Prabhakar Raghavan
  • Andrew Tomkins

Abstract

We propose two new tools to address the evolution of hyperlinked corpora. First, we define time graphs to extend the traditional notion of an evolving directed graph, capturing link creation as a point phenomenon in time. Second, we develop definitions and algorithms for time-dense community tracking, to crystallize the notion of community evolution.
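As a rough illustration of the time graph idea, the sketch below stores each link together with the moment it was created and recovers an ordinary directed graph for any point in time. The names `TimedEdge`, `TimeGraph`, `add_link`, and `prefix`, and the use of integer time units, are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class TimedEdge:
    """A directed link with the point in time at which it was created."""
    source: str
    target: str
    time: int  # hypothetical unit, e.g. a month index


@dataclass
class TimeGraph:
    """Illustrative container: a directed graph whose edges carry creation times."""
    edges: List[TimedEdge] = field(default_factory=list)

    def add_link(self, source: str, target: str, time: int) -> None:
        self.edges.append(TimedEdge(source, target, time))

    def prefix(self, t: int) -> Set[Tuple[str, str]]:
        """Return the plain directed graph formed by links created at or before t."""
        return {(e.source, e.target) for e in self.edges if e.time <= t}
```

Calling `prefix(t)` for increasing values of `t` yields a sequence of growing snapshots, which is the view an evolution study of this kind would typically work from.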

We develop these tools in the context of Blogspace, the space of weblogs (or blogs). Our study involves approximately 750K links among 25K blogs. We create a time graph on these blogs by an automatic analysis of their internal time stamps. We then study the evolution of connected component structure and microscopic community structure in this time graph.
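One simple way to follow connected component structure over time is to process links in timestamp order with a union-find structure and record the size of the largest component after each time step. The sketch below assumes links are given as `(source, target, time)` tuples and treats them as undirected (weak connectivity); both choices, and the function name `largest_component_by_time`, are assumptions for illustration rather than the authors' procedure.

```python
def largest_component_by_time(edges):
    """Return [(time, largest weakly connected component size)] in time order.

    edges: iterable of (source, target, time) tuples (assumed input format).
    """
    parent = {}
    size = {}

    def find(x):
        # Register unseen nodes as singleton components.
        parent.setdefault(x, x)
        size.setdefault(x, 1)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return size[ra]
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]
        return size[ra]

    largest = 0
    history = {}  # insertion-ordered: time -> largest component so far
    for source, target, time in sorted(edges, key=lambda e: e[2]):
        largest = max(largest, union(source, target))
        history[time] = largest
    return list(history.items())
```

Dividing each recorded size by the number of nodes seen so far would give the fraction of blogs in the largest component, which is the usual way to look for the emergence of a giant component.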

We show that Blogspace underwent a transition around the end of 2001 and has been expanding rapidly, not just in metrics of scale but also in metrics of community structure and connectedness.

By randomizing link destinations in Blogspace, but retaining sources and timestamps, we introduce a concept of randomized Blogspace. Herein, we observe similar evolution of a giant component, but no corresponding increase in community structure.
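The randomization can be sketched as follows: every link keeps its source and timestamp, and only its destination is redrawn. The abstract does not say how destinations are drawn, so the sketch assumes a uniform choice over all blogs that appear in the data; the function name `randomize_destinations` and the `(source, target, time)` tuple format are likewise assumptions.

```python
import random


def randomize_destinations(edges, seed=0):
    """Return a copy of edges with each destination redrawn at random.

    Sources and timestamps are retained. Destinations are drawn uniformly
    from all observed nodes (an assumption, not stated in the abstract).
    """
    edges = list(edges)
    rng = random.Random(seed)  # seeded for reproducibility
    nodes = sorted({n for s, d, _ in edges for n in (s, d)})
    return [(s, rng.choice(nodes), t) for s, _, t in edges]
```

Running the same component and community measurements on the output gives a baseline that shares the original link volume and timing but has no deliberate choice of destination.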

Having demonstrated the formation of micro-communities over time, we then turn to the ongoing activity within active communities. We extend recent work of Kleinberg (2002) to discover dense periods of “bursty” intra-community link creation. Furthermore, we find that the blogs that give rise to these communities are significantly more enduring than an average blog.
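To give a feel for burst detection of this kind, the sketch below implements a simplified two-state version of Kleinberg's model: gaps between consecutive link creations are explained either by a base rate or by a higher burst rate, and entering the burst state carries a fixed cost. It is not the extension developed in the paper. The function name `burst_states`, the parameters `s` and `gamma`, and their default values are assumptions for illustration.

```python
import math


def burst_states(times, s=2.0, gamma=1.0):
    """Label each gap between consecutive events as 0 (base) or 1 (burst).

    times: increasing event times. s: ratio of burst rate to base rate.
    gamma: scales the cost of moving from the base to the burst state.
    """
    gaps = [b - a for a, b in zip(times, times[1:])]
    n = len(gaps)
    if n == 0:
        return []
    total = sum(gaps)
    if total <= 0 or any(g < 0 for g in gaps):
        raise ValueError("times must be increasing and span a positive interval")
    rates = [n / total, s * n / total]
    up_cost = gamma * math.log(n)  # moving back down is free

    def emit(state, gap):
        # Negative log of the exponential density rate * exp(-rate * gap).
        return rates[state] * gap - math.log(rates[state])

    cost = [0.0, float("inf")]  # the sequence starts in the base state
    back = []
    for gap in gaps:
        new_cost, pointers = [], []
        for j in (0, 1):
            candidates = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + emit(j, gap))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards from the last gap.
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Maximal runs of 1s in the returned list mark candidate bursty periods. Raising `gamma` makes the detector more reluctant to declare a burst, and raising `s` requires the burst to be more intense relative to the base rate.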

Keywords

weblogs, blogs, evolution, burst analysis, time graphs

References

  1. R. Agrawal and R. Srikant, “Fast algorithms for mining association rules in large databases,” in Proc. 20th Internat. Conference on Very Large Data Bases, 1994, pp. 487–499.
  2. Z. Bar-Yossef and S. Rajagopalan, “Template detection via data mining and its applications,” in Proc. 11th Internat. World-Wide Web Conference, 2002, pp. 580–591.
  3. N. Z. Bear, “The TTLB Blogosphere Ecosystem,” January 2004, http://www.truthlaidbear.com/ecosystem.php
  4. K. Bharat, B. Chang, M. Henzinger, and M. Ruhl, “Who links to whom: Mining linkage between web sites,” in IEEE Internat. Conference on Data Mining, 2001, pp. 51–58.
  5. B. E. Brewington and G. Cybenko, “Keeping Up with the Changing Web,” Computer 33(5), 2000, 52–58.
  6. D. Eppstein, Z. Galil, and G. Italiano, “Dynamic graph algorithms,” in CRC Handbook of Algorithms and Theory of Computation, ed. M. J. Atallah, CRC Press, 1999, Chapter 8.
  7. P. Erdös and A. Rényi, “On the evolution of random graphs,” Magy. Tud. Akad. Mat. Kut. Intez. Kozl. 5, 1960, 17–61.
  8. U. Feige, D. Peleg, and G. Kortsarz, “The dense k-subgraph problem,” Algorithmica 29(3), 2001, 410–421.
  9. D. Fetterly, M. Manasse, M. Najork, and J. Wiener, “Crawling towards light: A large scale study of the evolution of Web pages,” in Proc. 1st Workshop on Algorithms for the Web Graph, 2002.
  10. G. W. Flake, S. Lawrence, and C. L. Giles, “Efficient identification of web communities,” in Proc. 6th ACM SIGKDD Internat. Conference on Knowledge Discovery and Data Mining, 2000, pp. 150–160.
  11. D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins, “Information Diffusion through Blogspace,” in Proc. 13th Internat. World-Wide Web Conference, 2004.
  12. J. Kleinberg, “Authoritative sources in a hyperlinked environment,” J. ACM 46(5), 2000, 604–632.
  13. J. Kleinberg, “Bursty and hierarchical structure in streams,” in Proc. 8th ACM SIGKDD Internat. Conference on Knowledge Discovery and Data Mining, 2002, pp. 373–397.
  14. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “Extracting large scale knowledge bases from the Web,” in Proc. 27th Internat. Conference on Very Large Data Bases, 1999, pp. 639–650.
  15. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “Trawling the Web for cyber communities,” WWW8/Computer Networks 31(11–16), 1999, 1481–1493.
  16. T. Mitchell, Machine Learning, McGraw-Hill, 1997.
  17. Perseus Development Corporation, “The blogging iceberg,” 2004, http://www.perseusdevelopment.com/blogsurvey/thebloggingiceberg.html

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Ravi Kumar (1)
  • Jasmine Novak (1)
  • Prabhakar Raghavan (2)
  • Andrew Tomkins (3)

  1. IBM Almaden Research Center, San Jose, USA
  2. Verity Inc., Sunnyvale, USA
  3. IBM Almaden Research Center, San Jose, USA
