Clustering research group website homepages


Most early exploratory webometrics studies used simple network methods or multi-dimensional scaling to identify hyperlink- or text-based relationships between collections of related academic websites. This paper uses unsupervised machine learning techniques to identify groups of computer science departments with similar interests through co-word occurrences in the homepages of their departmental research groups. The clustering results reflect inter-department research similarity reasonably well, at least as far as that similarity is expressed online. This clustering approach may be useful for policy makers in identifying potential future collaborators with similar research interests, or for monitoring research fields.
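As a rough illustration of grouping sites by shared vocabulary, the sketch below builds word-count vectors from short texts, compares them with cosine similarity, and joins any texts whose similarity reaches a threshold. It is not the authors' method: the keywords point to self-organising maps, whereas this substitutes a much simpler threshold-based single-link grouping. The page texts, the names `PAGES`, `term_vector`, `cosine` and `cluster`, and the threshold of 0.3 are all assumptions made up for the example.

```python
import math
from collections import Counter

# Hypothetical toy "homepage" texts, invented for illustration only.
PAGES = {
    "dept_a": "machine learning neural networks data mining",
    "dept_b": "data mining machine learning pattern recognition",
    "dept_c": "software engineering formal verification testing",
    "dept_d": "software testing formal methods verification",
}


def term_vector(text):
    """Return a bag-of-words count vector for a whitespace-tokenised text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster(pages, threshold=0.3):
    """Single-link grouping: a page joins (and merges) every existing
    cluster that contains at least one page with similarity >= threshold."""
    vectors = {name: term_vector(text) for name, text in pages.items()}
    clusters = []
    for name in sorted(vectors):
        linked = [
            c for c in clusters
            if any(cosine(vectors[name], vectors[m]) >= threshold for m in c)
        ]
        merged = {name}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print(cluster(PAGES))
```

On this toy input the two data-mining texts end up in one group and the two software-verification texts in another. Real homepages would need stop-word removal and term weighting before such a comparison became meaningful.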





The authors are grateful to the two referees for their insightful comments. This paper extends a paper previously presented at the IADIS European Conference on Data Mining (DM).

Author information

Correspondence to Patrick Kenekayoro.

Cite this article

Kenekayoro, P., Buckley, K. & Thelwall, M. Clustering research group website homepages. Scientometrics 102, 2023–2039 (2015). https://doi.org/10.1007/s11192-014-1497-y


Keywords

  • Webometrics
  • Unsupervised learning
  • Cluster analysis
  • Co-word analysis
  • Research group
  • Self-organising maps

Mathematics Subject Classification

  • 68U15
  • 62H30
  • 91C20

JEL Classification

  • C63
  • C80