Researchers’ Publication Patterns and Their Use for Author Disambiguation

  • Vincent Larivière
  • Benoit Macaluso


In recent years, there has been a growing need for advanced bibliometric indicators for individual researchers and research groups, and computing such indicators requires author disambiguation. Using the complete population of university professors and researchers in the Canadian province of Québec (N = 13,479), together with their papers and the papers authored by their homonyms, this paper provides evidence of regularities in researchers’ publication patterns. It shows how these patterns can be used to automatically assign papers to individuals and to remove papers authored by their homonyms. Two types of patterns were found: (1) at the level of individual researchers and (2) at the level of disciplines. Taken together, these patterns allow the construction of an algorithm that assigns at least one paper to 11,105 (82.4 %) of the 13,479 researchers, with a very low percentage of false positives (3.2 %).
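The abstract does not spell out the assignment rules, so what follows is only a minimal Python sketch of how researcher-level and discipline-level regularities might be combined to accept or reject papers that match a researcher's name. Everything in it is an assumption for illustration: the `Paper` and `Researcher` records, the normalized name key, the three signals (institutional affiliation, discipline match, co-author overlap), their weights and the acceptance threshold are hypothetical and are not the authors' algorithm.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: records, signals, weights and threshold are all assumptions,
# not the method described in the paper.


@dataclass
class Paper:
    """A paper on which a name-matched author appears (hypothetical record)."""
    paper_id: str
    author_key: str          # normalized name, e.g. "tremblay-m" (assumed format)
    discipline: str
    affiliations: set[str]
    coauthors: set[str]


@dataclass
class Researcher:
    """A researcher from the reference population (hypothetical record)."""
    researcher_id: str
    author_key: str
    discipline: str          # discipline of the researcher's department
    institution: str
    known_coauthors: set[str] = field(default_factory=set)


def score(paper: Paper, researcher: Researcher) -> int:
    """Add up evidence that `researcher` wrote `paper`; weights are illustrative."""
    s = 0
    if researcher.institution in paper.affiliations:
        s += 2  # researcher-level signal: the paper lists the researcher's institution
    if paper.discipline == researcher.discipline:
        s += 1  # discipline-level signal: the paper fits the researcher's field
    if paper.coauthors & researcher.known_coauthors:
        s += 1  # researcher-level signal: the paper shares a known co-author
    return s


def assign(papers: list[Paper], researchers: list[Researcher],
           threshold: int = 2) -> dict[str, str]:
    """Map paper_id -> researcher_id, skipping papers without a clear owner."""
    by_key: dict[str, list[Researcher]] = {}
    for r in researchers:
        by_key.setdefault(r.author_key, []).append(r)

    assignments: dict[str, str] = {}
    for p in papers:
        # Score every namesake of the paper's author, best first.
        scored = sorted(
            ((score(p, r), r.researcher_id) for r in by_key.get(p.author_key, [])),
            reverse=True,
        )
        scored = [(s, rid) for s, rid in scored if s >= threshold]
        if not scored:
            continue  # no convincing candidate: treat as a homonym's paper
        if len(scored) > 1 and scored[0][0] == scored[1][0]:
            continue  # two namesakes tie: leave unassigned rather than guess
        assignments[p.paper_id] = scored[0][1]
    return assignments


def coverage(assignments: dict[str, str], n_researchers: int) -> float:
    """Share of researchers who received at least one paper."""
    return len(set(assignments.values())) / n_researchers
```

Leaving a paper unassigned when no candidate reaches the threshold, or when two namesakes tie, gives up some coverage in exchange for fewer false positives, mirroring the balance between coverage and false positives reported in the abstract. The `coverage` helper also reproduces the headline arithmetic: 11,105 / 13,479 ≈ 0.824, i.e. 82.4 %.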





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. École de bibliothéconomie et des sciences de l’information, Université de Montréal, Montréal, Canada
  2. Observatoire des sciences et des technologies (OST), Centre interuniversitaire de recherche sur la science et la technologie (CIRST), Université du Québec à Montréal, Montréal, Canada
