Abstract
Social networks play a significant role in sharing knowledge. Online scientific collaboration networks allow researchers to share scientific articles and research results, and to interact and potentially collaborate with one another. These networks have many users and store varied data about each of them, some of which can be used to characterize and group similar users. The number of attributes available for each instance (user) can reach several hundred, which makes this a high-dimensional problem. Dimensionality reduction is therefore indispensable: removing redundant and irrelevant attributes improves the performance of machine learning algorithms and makes models more understandable. When building an efficient recommendation system for collaborative research, one of the main challenges of dimensionality reduction is guaranteeing that the information in the original data is still represented in the reduced dataset. We used Factor Analysis for dimensionality reduction, as it preserves the relationships between the variables. In this study, we characterize the profiles of ResearchGate users after applying dimensionality reduction to two different datasets: a dataset of continuous attributes composed of profile metrics, and a dataset of dichotomous attributes containing interest topics. We evaluated our methodology using two recommendation applications: (1) identifying groups of researchers through a global profile extraction process; and (2) identifying profiles similar to a reference profile. For both applications, we used hierarchical clustering techniques to identify the groups of user profiles. Our experiments show that the Factor Analysis transformation was able to preserve the relevant information in the data, resulting in an effective clustering process for the recommendation system for collaborative networks of researchers.
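The two recommendation applications described above can be illustrated with a minimal sketch. It assumes the Factor Analysis step has already produced a small table of factor scores per researcher; the scores below are hypothetical, and the choice of average linkage and Euclidean distance is an assumption for illustration, not necessarily the configuration used in the study.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Agglomerative (bottom-up) clustering with average linkage.

    Starts with one cluster per profile and repeatedly merges the two
    clusters with the smallest mean pairwise Euclidean distance until
    n_clusters remain. Returns a list of sorted index lists.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            mean = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or mean < best[0]:
                best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]


def most_similar(points, ref_index, k):
    """Indices of the k profiles closest to the reference profile."""
    others = [i for i in range(len(points)) if i != ref_index]
    return sorted(others, key=lambda i: dist(points[i], points[ref_index]))[:k]


# Hypothetical factor scores (two factors) for six researcher profiles.
scores = [
    (0.1, 0.2), (0.0, 0.3), (0.2, 0.1),
    (2.0, 2.1), (2.2, 1.9), (1.9, 2.0),
]

# Application (1): identify groups of researchers.
groups = average_linkage(scores, n_clusters=2)

# Application (2): find the profiles most similar to reference profile 0.
neighbors = most_similar(scores, ref_index=0, k=2)

print(groups)
print(neighbors)
```

With real data, the factor scores would come from the reduced continuous or dichotomous dataset, and a library implementation of hierarchical clustering would be preferable to this quadratic-per-merge loop.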
Availability of data and material
(data transparency).
Acknowledgements
The authors acknowledge the financial support received from the CNPq (Brazilian National Council for Scientific and Technological Development), CAPES (Coordination for the Improvement of Higher Education Personnel), FAPEMIG (Foundation for Research Support of the State of Minas Gerais), and Pontifical Catholic University of Minas Gerais, Brazil.
Funding
This research was financed by: CNPq (Brazilian National Council for Scientific and Technological Development); CAPES (Coordination for the Improvement of Higher Education Personnel); FAPEMIG (Foundation for Research Support of the State of Minas Gerais); Pontifical Catholic University of Minas Gerais, Brazil.
Author information
Contributions
Marcos Wander Rodrigues was responsible for the data collection, preprocessing and analysis, the creation of the models, and the writing of the article. Mark A. Junho Song was responsible for writing and revising the article. Luis Enrique Zárate Gálvez was responsible for the structuring, the creation of the models, and the writing and revising of the article.
Ethics declarations
Conflict of interest
The authors have no conflict of interest to declare.
Code availability
(software application or custom code).
About this article
Cite this article
Rodrigues, M.W., Song, M.A.J. & Zárate, L.E. Effectively clustering researchers in scientific collaboration networks: case study on ResearchGate. Soc. Netw. Anal. Min. 11, 71 (2021). https://doi.org/10.1007/s13278-021-00781-9
DOI: https://doi.org/10.1007/s13278-021-00781-9