Characterizing and Mining the Citation Graph of the Computer Science Literature


Citation graphs representing a body of scientific literature convey measures of scholarly activity and productivity. In this work we present a study of the structure of the citation graph of the computer science literature. Using a web robot, we built several topic-specific citation graphs, and their union graph, from the digital library ResearchIndex. After verifying that the degree distributions follow a power law, we applied a series of graph-theoretical algorithms to obtain an aggregate picture of the citation graph in terms of its connectivity. We discovered a single large weakly connected component and a single large biconnected component, and confirmed the expected lack of a large strongly connected component. The large components remained even after removing the strongest authority nodes or the strongest hub nodes, indicating that such tight connectivity is widespread and does not depend on a small subset of important nodes. Finally, minimum cuts between authority papers of different areas did not result in a balanced partitioning of the graph into areas, pointing to the need for more sophisticated algorithms for clustering the graph.
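The connectivity analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy edge list, and the use of raw in-degree as a crude stand-in for authority scores are all assumptions made for illustration. An edge `(u, v)` is taken to mean "paper u cites paper v".

```python
from collections import Counter, defaultdict


def weakly_connected_components(edges):
    """Components of the graph when edge direction is ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps


def strongly_connected_components(edges):
    """Kosaraju's two-pass algorithm, written iteratively."""
    fwd, rev, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
        nodes.update((u, v))
    # Pass 1: record nodes in order of DFS completion on the forward graph.
    seen, order = set(), []
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(fwd[s]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()
    # Pass 2: explore the reversed graph in decreasing completion order.
    seen, comps = set(), []
    for s in reversed(order):
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in rev[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps


def drop_top_cited(edges, k):
    """Remove the k most-cited nodes (in-degree as a rough authority proxy)."""
    indeg = Counter(v for _, v in edges)
    top = {node for node, _ in indeg.most_common(k)}
    return [(u, v) for u, v in edges if u not in top and v not in top]


# Hypothetical toy citation graph: later papers cite earlier ones.
toy_edges = [("d", "c"), ("c", "b"), ("b", "a"), ("d", "a"), ("f", "e")]
largest_wcc = max(len(c) for c in weakly_connected_components(toy_edges))
largest_scc = max(len(c) for c in strongly_connected_components(toy_edges))
```

Because citations almost always point backward in time, a citation graph is close to acyclic, which is why a large strongly connected component is not expected even when the weakly connected component is large. Calling `drop_top_cited` and recomputing the components mirrors, in spirit only, the robustness check of removing the strongest authorities.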




Author information



Corresponding author

Correspondence to Evangelos E. Milios.


About this article

Cite this article

An, Y., Janssen, J. & Milios, E. Characterizing and Mining the Citation Graph of the Computer Science Literature. Know. Inf. Sys. 6, 664–678 (2004).

Keywords


  • Citation graph
  • Graph connectivity
  • Networked information spaces
  • Power law
  • Small worlds