A Comparison of On-Line Computer Science Citation Databases
- Vaclav Petricek, University College London
- Ingemar J. Cox, University College London
- Hui Han, Yahoo! Inc.
- Isaac G. Councill, The School of Information Sciences and Technology, The Pennsylvania State University
- C. Lee Giles, The School of Information Sciences and Technology, The Pennsylvania State University
This paper examines the differences and similarities between two on-line computer science citation databases, DBLP and CiteSeer. The database entries in DBLP are inserted manually, while the CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer's autonomous citation database can be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation information is used to assess the performance of authors, institutions and funding bodies.
We show that the CiteSeer database contains considerably fewer single-author papers than DBLP. This bias can be modeled by an exponential process with an intuitive explanation. The model permits us to predict that the DBLP database covers approximately 24% of the entire literature of Computer Science. CiteSeer is also biased against low-cited papers.
Despite their differences, the two databases exhibit citation distributions that are similar to each other but significantly different from those found in previous analyses of the Physics community. In both databases, we also observe that the number of authors per paper has been increasing over time.
- Book Title: Research and Advanced Technology for Digital Libraries
- Book Subtitle: 9th European Conference, ECDL 2005, Vienna, Austria, September 18-23, 2005. Proceedings
- Pages: 438-449
- Series Title: Lecture Notes in Computer Science
- Publisher: Springer Berlin Heidelberg
- Copyright Holder: Springer-Verlag Berlin Heidelberg
- Editor Affiliations
- Vienna University of Technology
- Laboratory of Distributed Multimedia Information Systems and Applications, Technical University of Crete (MUSIC/TUC), Chania
- Institute of Software Technology and Interactive Systems, Vienna University of Technology
- Author Affiliations
- University College London, Gower Street, London WC1E 6BT, United Kingdom
- Yahoo! Inc., 701 First Avenue, Sunnyvale, CA, 94089
- The School of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, 16802, USA