Abstract
Citation graphs representing a body of scientific literature convey measures of scholarly activity and productivity. In this work we present a study of the structure of the citation graph of the computer science literature. Using a web robot we built several topic-specific citation graphs and their union graph from the digital library ResearchIndex. After verifying that the degree distributions follow a power law, we applied a series of graph theoretical algorithms to elicit an aggregate picture of the citation graph in terms of its connectivity. We discovered the existence of a single large weakly-connected and a single large biconnected component, and confirmed the expected lack of a large strongly-connected component. The large components remained even after removing the strongest authority nodes or the strongest hub nodes, indicating that such tight connectivity is widespread and does not depend on a small subset of important nodes. Finally, minimum cuts between authority papers of different areas did not result in a balanced partitioning of the graph into areas, pointing to the need for more sophisticated algorithms for clustering the graph.
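The connectivity analysis summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy edge list, the function names, and the choice of Kosaraju's algorithm for strongly connected components are all assumptions made for illustration. Each edge is a pair (citing paper, cited paper). Because papers usually cite only earlier work, directed cycles are rare, so strongly connected components stay small even when one weakly connected component covers most of the graph.

```python
def weakly_connected_components(edges):
    """Group nodes into components, ignoring edge direction."""
    neighbours = {}
    for citing, cited in edges:
        neighbours.setdefault(citing, set()).add(cited)
        neighbours.setdefault(cited, set()).add(citing)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        components.append(component)
    return components


def strongly_connected_components(edges):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    forward, backward, nodes = {}, {}, set()
    for citing, cited in edges:
        forward.setdefault(citing, []).append(cited)
        backward.setdefault(cited, []).append(citing)
        nodes.update((citing, cited))

    # Pass 1: record nodes in order of depth-first finishing time.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(forward.get(root, [])))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward.get(nxt, []))))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing finishing time.
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        stack, component = [root], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in backward.get(node, []):
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical toy citation graph: (citing paper, cited paper).
edges = [("C", "B"), ("B", "A"), ("D", "A"), ("E", "F"),
         ("G", "H"), ("H", "G")]  # G and H cite each other
wcc = weakly_connected_components(edges)
scc = strongly_connected_components(edges)
print("largest weakly connected component:", max(len(c) for c in wcc))
print("largest strongly connected component:", max(len(c) for c in scc))
```

On a graph of realistic size, the same comparison of largest component sizes would show whether a single weakly connected component dominates while strongly connected components remain small.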
Cite this article
An, Y., Janssen, J. & Milios, E. Characterizing and Mining the Citation Graph of the Computer Science Literature. Know. Inf. Sys. 6, 664–678 (2004). https://doi.org/10.1007/s10115-003-0128-3
DOI: https://doi.org/10.1007/s10115-003-0128-3