Mining citation information from CiteSeer data

Fiala, Dalibor

doi:10.1007/s11192-010-0326-1

Published: 30 November 2010

Scientometrics, Volume 86, pages 553–562 (2011)

Dalibor Fiala¹

Abstract

The CiteSeer digital library is a useful source of bibliographic information. It allows for retrieving citations, co-authorships, addresses, and affiliations of authors and publications. In spite of this, it has rarely been used for automated citation analyses. This article describes our findings from extensive mining of the CiteSeer data. We explored citations between authors and determined rankings of influential scientists using various evaluation methods, including citation and in-degree counts, HITS, PageRank, and variations of PageRank based on both the citation and collaboration graphs. We compared the resulting rankings with lists of computer science award winners and found that award recipients are almost always ranked high. We conclude that CiteSeer is a valuable, yet not fully appreciated, repository of citation data and is appropriate for testing novel bibliometric methods.
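
As a rough illustration of two of the ranking methods named in the abstract, the following Python sketch computes in-degree counts and a basic PageRank on a toy author citation graph. The author labels, the graph itself, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration. This is not the article's implementation and it does not use CiteSeer data.

```python
def in_degree(graph):
    """Count incoming citation links for each author."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def pagerank(graph, damping=0.85, iterations=100):
    """Basic PageRank by power iteration (damping value is an assumption)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by authors who cite nobody is spread evenly over all authors.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source in nodes:
            targets = graph[source]
            if targets:
                # Each citing author splits its rank equally among cited authors.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy data: keys are citing authors, values are the authors they cite.
# Every cited author must also appear as a key.
citations = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "A"],
}

degrees = in_degree(citations)
ranks = pagerank(citations)
for author in sorted(ranks, key=ranks.get, reverse=True):
    print(author, degrees[author], round(ranks[author], 4))
```

In-degree only counts direct citations, while PageRank also weighs who the citing authors are, so the two orderings can differ on larger graphs.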

Acknowledgments

This work was supported in part by the Ministry of Education of the Czech Republic under Grant 2C06009. The related software may be found at http://textmining.zcu.cz/downloads/sciento.php. Many thanks go to the anonymous reviewers for their useful hints and comments and to Karel Ježek for his support of this project.

Author information

Authors and Affiliations

University of West Bohemia, Univerzitní 8, 30614, Plzeň, Czech Republic
Dalibor Fiala

Corresponding author

Correspondence to Dalibor Fiala.

About this article

Cite this article

Fiala, D. Mining citation information from CiteSeer data. Scientometrics 86, 553–562 (2011). https://doi.org/10.1007/s11192-010-0326-1

Received: 09 April 2010
Published: 30 November 2010
Issue Date: March 2011
