Robust analysis of bibliometric data

Statistical Methods & Applications

Abstract

This work stems from the idea of describing the scientific productivity of Italian statisticians. Achieving this goal raises several questions: What data should be used? Have the data been cleaned? What techniques can be used? We propose using multiple sources and multiple metrics to build a complete information base. We check the correctness of the data using multivariate outlier identification techniques, transform the data appropriately, and apply robust clustering to verify the existence of homogeneous groups. We suggest using the forward search to establish a ranking among scholars. The proposed methodology, which in this case allowed us to group scholars into four homogeneous groups and to sort them according to multidimensional data, can be applied to similar problems in bibliometrics.
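To make the forward search idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes two already transformed and centred metrics per scholar; the data, the function names (`mahalanobis_sq`, `forward_search`), and the starting rule (the `m0` units closest to the coordinate-wise medians, a crude stand-in for a properly robust start) are all hypothetical. The search grows a subset one unit at a time, always keeping the units with the smallest Mahalanobis distances from the current fit, and records the order in which units first enter.

```python
from statistics import mean, median


def mahalanobis_sq(points, subset):
    """Squared Mahalanobis distance of every point from the mean and
    covariance estimated on the units in `subset` (2-D data only)."""
    xs = [points[i][0] for i in subset]
    ys = [points[i][1] for i in subset]
    mx, my = mean(xs), mean(ys)
    n = len(subset)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # zero if the subset is collinear
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Closed-form inverse of the 2x2 covariance matrix.
        dists.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return dists


def forward_search(points, m0=4):
    """Return unit indices in the order they first enter the subset."""
    n = len(points)
    med_x = median(p[0] for p in points)
    med_y = median(p[1] for p in points)
    # Assumed starting rule: the m0 units closest to the medians.
    by_closeness = sorted(
        range(n),
        key=lambda i: (points[i][0] - med_x) ** 2 + (points[i][1] - med_y) ** 2,
    )
    subset = by_closeness[:m0]
    entry_order = list(subset)
    while len(subset) < n:
        d2 = mahalanobis_sq(points, subset)
        # Refit on the m + 1 units with the smallest distances.
        subset = sorted(range(n), key=lambda i: d2[i])[: len(subset) + 1]
        for i in subset:
            if i not in entry_order:
                entry_order.append(i)
    return entry_order


# Hypothetical scores: nine units near the origin and one gross outlier.
scores = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
          (1, 1), (-1, -1), (1, -1), (-1, 1), (8, 8)]
order = forward_search(scores, m0=4)
print(order)
```

In this toy data set the unit at index 9 lies far from the bulk and is the last to enter, which is the behaviour that makes the entry order useful for spotting anomalous records. Real bibliometric data would have more than two metrics and would need a general covariance inverse and a more careful choice of the initial subset.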



Author information

Corresponding author

Correspondence to Silvia Salini.

About this article

Cite this article

De Battisti, F., Salini, S. Robust analysis of bibliometric data. Stat Methods Appl 22, 269–283 (2013). https://doi.org/10.1007/s10260-012-0217-0

  • DOI: https://doi.org/10.1007/s10260-012-0217-0
