Knowledge and Information Systems, Volume 6, Issue 6, pp 664–678

Characterizing and Mining the Citation Graph of the Computer Science Literature


DOI: 10.1007/s10115-003-0128-3

Cite this article as:
An, Y., Janssen, J. & Milios, E. Know. Inf. Sys. (2004) 6: 664. doi:10.1007/s10115-003-0128-3


Citation graphs representing a body of scientific literature convey measures of scholarly activity and productivity. In this work, we present a study of the structure of the citation graph of the computer science literature. Using a web robot, we built several topic-specific citation graphs and their union graph from the digital library ResearchIndex. After verifying that the degree distributions follow a power law, we applied a series of graph-theoretical algorithms to elicit an aggregate picture of the citation graph in terms of its connectivity. We discovered the existence of a single large weakly connected component and a single large biconnected component, and confirmed the expected lack of a large strongly connected component. The large components remained even after removing the strongest authority nodes or the strongest hub nodes, indicating that such tight connectivity is widespread and does not depend on a small subset of important nodes. Finally, minimum cuts between authority papers of different areas did not result in a balanced partitioning of the graph into areas, pointing to the need for more sophisticated algorithms for clustering the graph.
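The connectivity measurements described in the abstract can be illustrated with a minimal sketch on a toy graph. Everything here is an assumption made for illustration and not the authors' code or data: the six papers and their citations are invented, the function names are hypothetical, Kosaraju's algorithm stands in for whatever strongly connected component method the study used, and in-degree is a crude proxy for authority score, since the abstract does not say how authorities were ranked.

```python
from collections import defaultdict, deque


def weak_components(nodes, edges):
    """Connected components when citation direction is ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def strong_components(nodes, edges):
    """Strongly connected components via Kosaraju's two-pass algorithm."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    # Pass 1: record nodes in order of DFS completion (iterative DFS).
    seen, order = set(), []
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(fwd[s]))]
        while stack:
            u, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(fwd[w])))
                    break
            else:  # all successors explored: u is finished
                order.append(u)
                stack.pop()
    # Pass 2: sweep the reversed graph in reverse finishing order.
    seen, comps = set(), []
    for s in reversed(order):
        if s in seen:
            continue
        seen.add(s)
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in rev[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def without_top_cited(nodes, edges, k):
    """Drop the k most-cited papers (in-degree as an assumed authority proxy)."""
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    drop = set(sorted(nodes, key=lambda n: -indeg[n])[:k])
    kept_nodes = [n for n in nodes if n not in drop]
    kept_edges = [(u, v) for u, v in edges if u not in drop and v not in drop]
    return kept_nodes, kept_edges


# Invented toy data: an edge (u, v) means paper u cites paper v.
papers = ["A", "B", "C", "D", "E", "F"]
citations = [("B", "A"), ("C", "A"), ("C", "B"),
             ("D", "A"), ("D", "C"), ("E", "D")]

largest_weak = max(len(c) for c in weak_components(papers, citations))
largest_strong = max(len(c) for c in strong_components(papers, citations))
pruned_nodes, pruned_edges = without_top_cited(papers, citations, k=1)
largest_weak_pruned = max(len(c) for c in weak_components(pruned_nodes, pruned_edges))
print(largest_weak, largest_strong, largest_weak_pruned)
```

Because papers generally cite only earlier work, a citation graph is close to acyclic, which is why a large strongly connected component is not expected even when one weakly connected component spans most of the graph. In this toy example the most-cited paper can be removed without breaking up the large weak component, which mirrors the kind of robustness check the abstract reports.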


Citation graph, Graph connectivity, Networked information spaces, Power law, Small worlds

Copyright information

© Springer-Verlag 2004

Authors and Affiliations

  • Yuan An (1)
  • Jeannette Janssen (2)
  • Evangelos E. Milios (3)

  1. Department of Computer Science, University of Toronto, Canada
  2. Department of Mathematics and Statistics, Dalhousie University, Canada
  3. Faculty of Computer Science, Dalhousie University, Halifax, Canada