Mining citation information from CiteSeer data

Scientometrics

Abstract

The CiteSeer digital library is a useful source of bibliographic information. It allows for retrieving citations, co-authorships, addresses, and affiliations of authors and publications. Despite this, it has rarely been used for automated citation analyses. This article describes our findings from extensive mining of the CiteSeer data. We explored citations between authors and determined rankings of influential scientists using various evaluation methods, including citation and in-degree counts, HITS, PageRank, and variations of PageRank based on both the citation and collaboration graphs. We compare the resulting rankings with lists of computer science award winners and find that award recipients are almost always ranked high. We conclude that CiteSeer is a valuable, yet not fully appreciated, repository of citation data and is appropriate for testing novel bibliometric methods.
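To make two of the ranking methods named in the abstract concrete, the following is a minimal sketch of in-degree counting and standard PageRank on an author citation graph. It is not the article's implementation: the toy edge list, the function names `in_degree` and `pagerank`, and the damping factor of 0.85 are assumptions chosen for illustration, and the PageRank variations that also use the collaboration graph are not reproduced here.

```python
from collections import defaultdict


def in_degree(edges):
    """Count the distinct authors citing each author (self-citations ignored)."""
    citers = defaultdict(set)
    for citing, cited in edges:
        if citing != cited:
            citers[cited].add(citing)
    return {author: len(sources) for author, sources in citers.items()}


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Standard PageRank by power iteration on a directed author graph.

    An edge (a, b) means "author a cites author b". Rank held by authors
    who cite nobody (dangling nodes) is spread evenly over all authors.
    """
    nodes = sorted({author for edge in edges for author in edge})
    cited_by = {v: set() for v in nodes}  # v -> authors that v cites
    for citing, cited in edges:
        if citing != cited:
            cited_by[citing].add(cited)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not cited_by[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if cited_by[v]:
                share = damping * rank[v] / len(cited_by[v])
                for w in cited_by[v]:
                    new_rank[w] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy data: (citing author, cited author) pairs.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("A", "B"), ("D", "C")]

degrees = in_degree(edges)
ranks = pagerank(edges)
for author in sorted(ranks, key=ranks.get, reverse=True):
    print(author, degrees.get(author, 0), round(ranks[author], 4))
```

In this toy graph, author C is cited by three distinct authors and also receives the highest PageRank score. On real data the two measures can diverge, because PageRank weights a citation by the rank of the citing author rather than counting all citations equally.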

Notes

  1. http://citeseer.ist.psu.edu.

  2. http://portal.acm.org.

  3. http://www.scopus.com.

  4. http://apps.isiknowledge.com.

  5. http://dblp.uni-trier.de.

  6. http://scholar.google.com.

Acknowledgments

This work was supported in part by the Ministry of Education of the Czech Republic under Grant 2C06009. The related software may be found at http://textmining.zcu.cz/downloads/sciento.php. Many thanks go to the anonymous reviewers for their useful hints and comments and to Karel Ježek for his support of this project.

Author information

Corresponding author

Correspondence to Dalibor Fiala.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 106 kb)

About this article

Cite this article

Fiala, D. Mining citation information from CiteSeer data. Scientometrics 86, 553–562 (2011). https://doi.org/10.1007/s11192-010-0326-1

  • DOI: https://doi.org/10.1007/s11192-010-0326-1