We propose two new tools to address the evolution of hyperlinked corpora. First, we define time graphs to extend the traditional notion of an evolving directed graph, capturing link creation as a point phenomenon in time. Second, we develop definitions and algorithms for time-dense community tracking, to crystallize the notion of community evolution.
We develop these tools in the context of Blogspace, the space of weblogs (or blogs). Our study involves approximately 750 K links among 25 K blogs. We create a time graph on these blogs by an automatic analysis of their internal time stamps. We then study the evolution of connected component structure and microscopic community structure in this time graph.
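To make the idea of a time graph concrete, here is a minimal sketch in Python. It assumes that each link is stored as a (source, destination, timestamp) triple and that the graph "at time t" consists of all links created at or before t. The class name `TimeGraph` and its methods are hypothetical names chosen for illustration, not the authors' implementation, and the component measure used here is weak connectivity.

```python
from dataclasses import dataclass, field


@dataclass
class TimeGraph:
    """Directed graph whose edges carry a creation timestamp (illustrative only)."""
    edges: list = field(default_factory=list)  # (source, destination, timestamp)

    def add_link(self, src, dst, t):
        """Record that src linked to dst at time t."""
        self.edges.append((src, dst, t))

    def prefix(self, t):
        """Return the (source, destination) pairs created at or before time t."""
        return [(u, v) for (u, v, ts) in self.edges if ts <= t]

    def largest_component_size(self, t):
        """Size of the largest weakly connected component of the prefix at time t."""
        parent = {}

        def find(x):
            # Union-find lookup with path halving.
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for u, v in self.prefix(t):
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

        sizes = {}
        for node in list(parent):
            root = find(node)
            sizes[root] = sizes.get(root, 0) + 1
        return max(sizes.values(), default=0)


# Toy data: five hypothetical blogs with links created at times 1, 2, 3, and 5.
g = TimeGraph()
g.add_link("a", "b", 1)
g.add_link("b", "c", 2)
g.add_link("d", "e", 3)
g.add_link("c", "d", 5)
sizes_over_time = [g.largest_component_size(t) for t in (0, 1, 3, 5)]
```

Evaluating `largest_component_size` at increasing values of t gives a simple picture of how the largest component grows as links accumulate.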
We show that Blogspace underwent a marked transition around the end of 2001 and has since been expanding rapidly, not just in metrics of scale but also in metrics of community structure and connectedness.
By randomizing link destinations in Blogspace, but retaining sources and timestamps, we introduce a concept of randomized Blogspace. Herein, we observe similar evolution of a giant component, but no corresponding increase in community structure.
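The randomization described above can be sketched as follows. The abstract does not specify how destinations are drawn, so this sketch assumes each destination is replaced by a blog chosen uniformly at random from all blogs that appear in the data (self-links are therefore possible). The function name `randomize_destinations` and the `seed` parameter are hypothetical.

```python
import random


def randomize_destinations(edges, seed=0):
    """Return a copy of edges with destinations re-drawn at random.

    edges is a list of (source, destination, timestamp) triples.
    Sources and timestamps are kept; only destinations change.
    Assumption: destinations are drawn uniformly from all blogs seen in edges.
    """
    rng = random.Random(seed)
    blogs = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    randomized = []
    for src, _, ts in edges:
        randomized.append((src, rng.choice(blogs), ts))
    return randomized


# Toy data: hypothetical blogs and timestamps.
original = [("a", "b", 1), ("b", "c", 2), ("d", "e", 3), ("c", "d", 5)]
shuffled = randomize_destinations(original, seed=1)
```

Because sources and timestamps are preserved, the randomized graph has the same number of links per blog and per time step as the original, so any difference in community structure can be attributed to the choice of destinations.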
Having demonstrated the formation of micro-communities over time, we then turn to the ongoing activity within active communities. We extend recent work of Kleinberg (2002) to discover dense periods of “bursty” intra-community link creation. Furthermore, we find that the blogs that give rise to these communities are significantly more enduring than an average blog.
Keywords: weblogs, blogs, evolution, burst analysis, time graphs
- R. Agrawal and R. Srikant, “Fast algorithms for mining association rules in large databases,” in Proc. 20th Internat. Conference on Very Large Data Bases, 1994, pp. 487–499.
- Z. Bar-Yossef and S. Rajagopalan, “Template detection via data mining and its applications,” in Proc. 11th Internat. World-Wide Web Conference, 2002, pp. 580–591.
- N. Z. Bear, “The TTLB Blogosphere Ecosystem,” January 2004, http://www.truthlaidbear.com/ecosystem.php
- K. Bharat, B. Chang, M. Henzinger, and M. Ruhl, “Who links to whom: Mining linkage between web sites,” in IEEE Internat. Conference on Data Mining, 2001, pp. 51–58.
- B. E. Brewington and G. Cybenko, “Keeping Up with the Changing Web,” Computer 33(5), 2000, 52–58.
- D. Eppstein, Z. Galil, and G. Italiano, “Dynamic graph algorithms,” in CRC Handbook of Algorithms and Theory of Computation, ed. M. J. Atallah, CRC Press, 1999, Chapter 8.
- P. Erdös and A. Rényi, “On the evolution of random graphs,” Magy. Tud. Akad. Mat. Kut. Intez. Kozl. 5, 1960, 17–61.
- U. Feige, D. Peleg, and G. Kortsarz, “The dense k-subgraph problem,” Algorithmica 29(3), 2001, 410–421.
- D. Fetterly, M. Manasse, M. Najork, and J. Wiener, “Crawling towards light: A large scale study of the evolution of Web pages,” in Proc. 1st Workshop on Algorithms for the Web Graph, 2002.
- G. W. Flake, S. Lawrence, and C. L. Giles, “Efficient identification of web communities,” in Proc. 6th ACM SIGKDD Internat. Conference on Knowledge Discovery and Data Mining, 2000, pp. 150–160.
- D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins, “Information Diffusion through Blogspace,” in Proc. 13th Internat. World-Wide Web Conference, 2004.
- J. Kleinberg, “Authoritative sources in a hyperlinked environment,” J. ACM 46(5), 2000, 604–632.
- J. Kleinberg, “Bursty and hierarchical structure in streams,” in Proc. 8th ACM SIGKDD Internat. Conference on Knowledge Discovery and Data Mining, 2002, pp. 373–397.
- R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “Extracting large scale knowledge bases from the Web,” in Proc. 27th Internat. Conference on Very Large Data Bases, 1999, pp. 639–650.
- R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “Trawling the Web for cyber communities,” WWW8/Computer Networks 31(11–16), 1999, 1481–1493.
- T. Mitchell, Machine Learning, McGraw-Hill, 1997.
- Perseus Development Corporation, “The blogging iceberg,” 2004, http://www.perseusdevelopment.com/blogsurvey/thebloggingiceberg.html