Abstract
This paper proposes a cluster analysis methodology for measuring the performance of research activities in terms of productivity, visibility, quality, prestige and international collaboration. The methodology is based on bibliometric techniques and permits a robust multi-dimensional cluster analysis at different levels. The main goal is to form clusters that maximize within-cluster homogeneity and between-cluster heterogeneity. The methodology has been applied to Spanish public universities and their academic staff in computer science. Results show that Spanish public universities fall into four clusters, whereas academic staff fall into six. Each cluster characterizes the research activity of the universities or academic staff it contains, identifying both their strengths and weaknesses. The resulting clusters could have implications for research policy: proposing collaborations and alliances among universities, supporting institutions in strategic planning, and verifying the effectiveness of research policies, among others.
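The clustering goal stated above (homogeneous clusters that are well separated from each other) can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm the authors use, so plain k-means is swapped in here purely for illustration. The data points, the two indicator names, and the helper functions `kmeans` and `scatter` are all hypothetical. The sketch relies on the standard identity that total scatter equals within-cluster scatter plus between-cluster scatter, so for a fixed number of clusters a small within-cluster value implies a large between-cluster value.

```python
import random

def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (illustrative only; not necessarily the paper's method)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def scatter(points, labels, centers):
    """Within-cluster and between-cluster sums of squares."""
    grand = mean(points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    between = sum(sq_dist(centers[lab], grand) for lab in labels)
    return within, between

# Hypothetical units described by two made-up indicators:
# (documents per year, citations per document).
points = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2),
          (5.0, 8.0), (5.2, 7.9), (4.8, 8.1)]

labels, centers = kmeans(points, k=2)
within, between = scatter(points, labels, centers)
print("labels:", labels)
print("within-cluster scatter:", within)
print("between-cluster scatter:", between)
```

On this toy data the two groups are clearly separated, so the between-cluster scatter dominates the within-cluster scatter. Real bibliometric indicators would normally need to be standardized first, since they are measured on very different scales.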
Acknowledgements
This work was partially supported by the Spanish Ministry of Science and Innovation, grants TIN2010-20900-C04-04, Cajal Blue Brain and Consolider Ingenio 2010-CSD2007-00018.
Cite this article
Ibáñez, A., Larrañaga, P. & Bielza, C. Cluster methods for assessing research performance: exploring Spanish computer science. Scientometrics 97, 571–600 (2013). https://doi.org/10.1007/s11192-013-0985-9
DOI: https://doi.org/10.1007/s11192-013-0985-9