
Researchers’ Publication Patterns and Their Use for Author Disambiguation

Abstract

Recent years have seen a growing need for advanced bibliometric indicators for individual researchers and research groups, which in turn requires author disambiguation. Using the complete population of university professors and researchers in the Canadian province of Québec (N = 13,479), their papers, and the papers authored by their homonyms, this paper provides evidence of regularities in researchers’ publication patterns. It shows how these patterns can be used to automatically assign papers to individuals and remove papers authored by their homonyms. Two types of patterns were found: (1) at the level of individual researchers and (2) at the level of disciplines. Together, these patterns allow the construction of an algorithm that provides assignment information for at least one paper for 11,105 (82.4%) of the 13,479 researchers, with a very low percentage of false positives (3.2%).
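The coverage figure quoted in the abstract can be checked with one line of arithmetic. The sketch below only recomputes the share from the two counts given above; the variable names are illustrative and not taken from the chapter.

```python
# Counts as reported in the abstract.
total_researchers = 13_479
researchers_with_assignment = 11_105

# Share of researchers for whom at least one paper receives
# assignment information.
coverage = researchers_with_assignment / total_researchers

print(f"coverage: {coverage:.1%}")
```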

Notes

  1. The recent collection of Scientometrics papers dealing with individual researchers published by Akadémiai Kiadó (Braun, 2006) illustrates this trend: the study with the highest number of researchers included has fewer than 200. Similarly, notable studies in the sociology of science by Cole and Cole (1973), Merton (1973), and Zuckerman (1977) analyzed small datasets.

  2. Physics journals, for instance, often have very long author lists and provide only the initial(s) of authors’ given name(s).

  3. See for example Gingras, Larivière, Macaluso, and Robitaille (2008) and Larivière, Macaluso, Archambault, and Gingras (2010) for some results based on this population.

  4. The bibliometric part of their paper used the Scopus database, which, contrary to Thomson Reuters’ databases, links names of authors with institutional addresses for papers published since 1996.

  5. More details on the classification scheme can be found at: http://www.nsf.gov/statistics/seind06/c5/c5s3.htm#sb1

  6. For more details on the CIP, see: http://nces.ed.gov/pubs2002/cip2000/.

  7. Thus, 2,256 of Québec’s researchers did not publish any papers during that period, and neither did any of their homonyms.

  8. The similarity threshold (MinSimilarity) was set at 0.25 in Microsoft SQL Server Integration Services (SSIS). More details on the system can be found at: http://technet.microsoft.com/en-US/library/ms345128(v=SQL.90).aspx
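The last note mentions a minimum similarity threshold of 0.25 for fuzzy matching. The following is a minimal sketch of threshold-based fuzzy matching in general, not a reproduction of the SSIS scoring function: it swaps in Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure, and the function names and example strings are hypothetical.

```python
from difflib import SequenceMatcher

# Threshold value reported in the note; everything else here is illustrative.
MIN_SIMILARITY = 0.25


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] (stand-in, not the SSIS score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(query, candidates, min_similarity=MIN_SIMILARITY):
    """Keep candidates scoring at or above the threshold, best first."""
    scored = [(c, similarity(query, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= min_similarity]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical strings, invented for this example.
for name, score in candidate_matches(
    "Universite de Montreal",
    ["Univ Montreal", "Universite de Montreal, Dept Physique", "QQQ"],
):
    print(f"{score:.2f}  {name}")
```

A low threshold such as 0.25 keeps many loose candidates, so a step like this would normally be followed by stricter checks before a match is accepted.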

References

  • Aksnes, D. W. (2008). When different persons have an identical author name. How frequent are homonyms? Journal of the American Society for Information Science and Technology, 59(5), 838–841.

  • Aswani, N., Bontcheva, K., & Cunningham, H. (2006). Mining information for instance unification. Lecture Notes in Computer Science, 4273, 329–342.

  • Barnett, G. A., & Fink, E. L. (2008). Impact of the internet and scholar age distribution on academic citation age. Journal of the American Society for Information Science and Technology, 59(4), 526–534.

  • Boyack, K. W., & Klavans, R. (2008). Measuring science–technology interaction using rare inventor–author names. Journal of Informetrics, 2(3), 173–182.

  • Braun, T. (Ed.). (2006). Evaluations of Individual Scientists and Research Institutions: Scientometrics Guidebooks Series. Budapest, Hungary: Akadémiai Kiadó.

  • Campbell, D., Picard-Aitken, M., Côté, G., Caruso, J., Valentim, R., Edmonds, S., … & Archambault, É. (2010). Bibliometrics as a performance measurement tool for research evaluation: The case of research funded by the National Cancer Institute of Canada. American Journal of Evaluation, 31(1), 66–83.

  • Cole, J. R., & Cole, S. (1973). Social stratification in science. Chicago, IL: University of Chicago Press.

  • Cota, R. G., Ferreira, A. A., Nascimento, C., Gonçalves, M. A., & Laender, A. H. F. (2010). An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations. Journal of the American Society for Information Science and Technology, 61(9), 1853–1870.

  • Egghe, L. (2006). Theory and practice of the g-index. Scientometrics, 69(1), 131–152.

  • Enserink, M. (2009). Are you ready to become a number? Science, 323, 1662–1664.

  • Gingras, Y., Larivière, V., Macaluso, B., & Robitaille, J. P. (2008). The effects of aging on researchers’ publication and citation patterns. PLoS One, 3(12), e4048.

  • Gurney, T., Horlings, E., & van den Besselaar, P. (2012). Author disambiguation using multi-aspect similarity measures. Scientometrics, 91(2), 435–449.

  • Han, H., Zha, H., & Giles, C. L. (2005). Name disambiguation in author citations using a K-way spectral clustering method. Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital libraries (pp. 334–343). Retrieved from http://clgiles.ist.psu.edu/papers/JCDL-2005-K-Way-Spectral-Clustering.pdf.

  • Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences, 102(46), 16569–16572.

  • Jensen, P., Rouquier, J. B., Kreimer, P., & Croissant, Y. (2008). Scientists who engage in society perform better academically. Science and Public Policy, 35(7), 527–541.

  • Kang, I. S., Na, S. H., Lee, S., Jung, H., Kim, P., Sung, W. K., & Lee, J. H. (2009). On co-authorship for author disambiguation. Information Processing and Management, 45(1), 84–97.

  • Larivière, V., Macaluso, B., Archambault, É., & Gingras, Y. (2010). Which scientific elites? On the concentration of research funds, publications and citations. Research Evaluation, 19(1), 45–53.

  • Levin, M., Krawczyk, S., Bethard, S., & Jurafsky, D. (2012). Citation-based bootstrapping for large-scale author disambiguation. Journal of the American Society for Information Science and Technology, 63(5), 1030–1047.

  • Lewison, G. (1996). The frequencies of occurrence of scientific papers with authors of each initial letter and their variation with nationality. Scientometrics, 37(3), 401–416.

  • Merton, R. K. (1973). The sociology of science: Theoretical and empirical investigations. Chicago, IL: University of Chicago Press.

  • Reijnhoudt, L., Costas, R., Noyons, E., Börner, K., & Scharnhorst, A. (2013). “Seed + Expand”: A validated methodology for creating high quality publication oeuvres of individual researchers. arXiv preprint arXiv:1301.5177.

  • Schreiber, M. (2008). A modification of the h-index: The hm-index accounts for multi-authored manuscripts. Journal of Informetrics, 2(3), 211–216.

  • Smalheiser, N. R., & Torvik, V. I. (2009). Author name disambiguation. In B. Cronin (Ed.), Annual review of information science and technology (Vol. 43, pp. 287–313). Medford, NJ: ASIST and Information Today.

  • Torvik, V. I., Weeber, M., Swanson, D. R., & Smalheiser, N. R. (2005). Probabilistic similarity metric for Medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158.

  • Wang, J., Berzins, K., Hicks, D., Melkers, J., Xiao, F., & Pinheiro, D. (2012). A boosted-trees method for name disambiguation. Scientometrics, 93(2), 391–411.

  • Wooding, S., Wilcox-Jay, K., Lewison, G., & Grant, J. (2006). Co-author inclusion: A novel recursive algorithmic method for dealing with homonyms in bibliometric analysis. Scientometrics, 66(1), 11–21.

  • Zhang, C. T. (2009). The e-index, complementing the h-index for excess citations. PLoS One, 4(5), e5429.

  • Zuckerman, H. A. (1977). Scientific elite: Nobel laureates in the United States. New York, NY: Free Press.

Author information

Corresponding author

Correspondence to Vincent Larivière.

Appendices

Appendix 1: List of Disciplines Assigned to Journals

  • Arts

    • Fine Arts & Architecture

    • Performing Arts

  • Biology

    • Agricultural & Food Sciences

    • Botany

    • Dairy & Animal Science

    • Ecology

    • Entomology

    • General Biology

    • General Zoology

    • Marine Biology & Hydrobiology

    • Miscellaneous Biology

    • Miscellaneous Zoology

  • Biomedical Research

    • Anatomy & Morphology

    • Biochemistry & Molecular Biology

    • Biomedical Engineering

    • Biophysics

    • Cellular Biology, Cytology & Histology

    • Embryology

    • General Biomedical Research

    • Genetics & Heredity

    • Microbiology

    • Microscopy

    • Miscellaneous Biomedical Research

    • Nutrition & Dietetics

    • Parasitology

    • Physiology

    • Virology

  • Chemistry

    • Analytical Chemistry

    • Applied Chemistry

    • General Chemistry

    • Inorganic & Nuclear Chemistry

    • Organic Chemistry

    • Physical Chemistry

    • Polymers

  • Clinical Medicine

    • Addictive Diseases

    • Allergy

    • Anesthesiology

    • Arthritis & Rheumatology

    • Cancer

    • Cardiovascular System

    • Dentistry

    • Dermatology & Venereal Disease

    • Endocrinology

    • Environmental & Occupational Health

    • Fertility

    • Gastroenterology

    • General & Internal Medicine

    • Geriatrics

    • Hematology

    • Immunology

    • Miscellaneous Clinical Medicine

    • Nephrology

    • Neurology & Neurosurgery

    • Obstetrics & Gynecology

    • Ophthalmology

    • Orthopedics

    • Otorhinolaryngology

    • Pathology

    • Pediatrics

    • Pharmacology

    • Pharmacy

    • Psychiatry

    • Radiology & Nuclear Medicine

    • Respiratory System

    • Surgery

    • Tropical Medicine

    • Urology

    • Veterinary Medicine

  • Earth and Space

    • Astronomy & Astrophysics

    • Earth & Planetary Science

    • Environmental Science

    • Geology

    • Meteorology & Atmospheric Science

    • Oceanography & Limnology

  • Engineering and Technology

    • Aerospace Technology

    • Chemical Engineering

    • Civil Engineering

    • Computers

    • Electrical Engineering & Electronics

    • General Engineering

    • Industrial Engineering

    • Materials Science

    • Mechanical Engineering

    • Metals & Metallurgy

    • Miscellaneous Engineering & Technology

    • Nuclear Technology

    • Operations Research

  • Health

    • Geriatrics & Gerontology

    • Health Policy & Services

    • Nursing

    • Public Health

    • Rehabilitation

    • Social Sciences, Biomedical

    • Social Studies of Medicine

    • Speech-Language Pathology and Audiology

  • Humanities

    • History

    • Language & Linguistics

    • Literature

    • Miscellaneous Humanities

    • Philosophy

    • Religion

  • Mathematics

    • Applied Mathematics

    • General Mathematics

    • Miscellaneous Mathematics

    • Probability & Statistics

  • Physics

    • Acoustics

    • Applied Physics

    • Chemical Physics

    • Fluids & Plasmas

    • General Physics

    • Miscellaneous Physics

    • Nuclear & Particle Physics

    • Optics

    • Solid State Physics

  • Professional Fields

    • Communication

    • Education

    • Information Science & Library Science

    • Law

    • Management

    • Miscellaneous Professional Field

    • Social Work

  • Psychology

    • Behavioral Science & Complementary Psychology

    • Clinical Psychology

    • Developmental & Child Psychology

    • Experimental Psychology

    • General Psychology

    • Human Factors

    • Miscellaneous Psychology

    • Psychoanalysis

    • Social Psychology

  • Social Sciences

    • Anthropology and Archaeology

    • Area Studies

    • Criminology

    • Demography

    • Economics

    • General Social Sciences

    • Geography

    • International Relations

    • Miscellaneous Social Sciences

    • Planning & Urban Studies

    • Political Science and Public Administration

    • Science Studies

    • Sociology

Appendix 2: List of Disciplines Assigned to Departments

  • Basic Medical Sciences

    • General Medicine

    • Laboratory Medicine

    • Medical Specialties

    • Surgical Specialties

  • Business & Management

  • Education

  • Engineering

    • Chemical Engineering

    • Civil Engineering

    • Electrical & Computer Engineering

    • Mechanical & Industrial Engineering

    • Other Engineering

  • Health Sciences

    • Dentistry

    • Kinesiology/Physical Education

    • Nursing

    • Other Health Sciences

    • Public Health & Health Administration

    • Rehabilitation Therapy

  • Humanities

    • Fine & Performing Arts

    • Foreign Languages, Literature, & Linguistics

    • Area Studies

    • French/English

    • History

    • Philosophy

    • Religious Studies & Vocations

  • Non-health Professional

    • Law & Legal Studies

    • Library & Information Sciences

    • Media & Communication Studies

    • Planning & Architecture

    • Social Work

  • Sciences

    • Agricultural & Food Sciences

    • Biology & Botany

    • Chemistry

    • Computer & Information Science

    • Earth & Ocean Sciences

    • Mathematics

    • Physics & Astronomy

    • Resource Management & Forestry

  • Social Sciences

    • Anthropology, Archaeology, & Sociology

    • Economics

    • Geography

    • Other Social Sciences & Humanities

    • Political Science

    • Psychology
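Appendices 1 and 2 classify journals and departments under two separate schemes, so any discipline-level check needs a mapping between them. Below is a minimal sketch, assuming a hand-built compatibility table. The pairings in `COMPATIBLE`, the `Paper` type, and `plausible_papers` are hypothetical illustrations, not the rules used in the chapter, which are not given on this page.

```python
from dataclasses import dataclass

# Hypothetical pairings: department discipline (Appendix 2) mapped to the
# journal disciplines (Appendix 1) treated as plausible for it.
# These pairings are invented for illustration only.
COMPATIBLE: dict[str, set[str]] = {
    "Physics & Astronomy": {"Physics", "Earth and Space", "Mathematics"},
    "Chemistry": {"Chemistry", "Physics", "Biomedical Research"},
    "Psychology": {"Psychology", "Clinical Medicine", "Health"},
}


@dataclass(frozen=True)
class Paper:
    title: str
    journal_discipline: str


def plausible_papers(department: str, papers: list[Paper]) -> list[Paper]:
    """Keep papers whose journal discipline is compatible with the department."""
    allowed = COMPATIBLE.get(department, set())
    return [p for p in papers if p.journal_discipline in allowed]


# Hypothetical candidate papers sharing one author name.
papers = [
    Paper("Paper A", "Physics"),
    Paper("Paper B", "Clinical Medicine"),
]
kept = plausible_papers("Physics & Astronomy", papers)
print([p.title for p in kept])
```

In this sketch a department missing from the table keeps no papers; a real pipeline might prefer to route such cases to manual review rather than reject them outright.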

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Larivière, V., Macaluso, B. (2014). Researchers’ Publication Patterns and Their Use for Author Disambiguation. In: Ding, Y., Rousseau, R., Wolfram, D. (eds) Measuring Scholarly Impact. Springer, Cham. https://doi.org/10.1007/978-3-319-10377-8_7

  • DOI: https://doi.org/10.1007/978-3-319-10377-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10376-1

  • Online ISBN: 978-3-319-10377-8

  • eBook Packages: Computer Science, Computer Science (R0)
