Clustering research group website homepages

Abstract

Most early exploratory webometrics studies used simple network methods or multidimensional scaling to identify hyperlink-based or text-based relationships within collections of related academic websites. This paper instead uses unsupervised machine learning techniques to identify groups of computer science departments with similar interests, based on co-word occurrences in the homepages of the departmental research groups. The clustering results reflect inter-department research similarity reasonably well, at least as that similarity is represented online. This clustering approach may help policy makers identify future collaborators with similar research interests or monitor research fields.
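The general idea of grouping departments by the words their homepages share can be illustrated with a minimal sketch. This is not the method or data used in the paper: the page texts, the department names, the cosine similarity measure, the single-link grouping rule, and the 0.3 threshold are all assumptions chosen only to keep the example small and self-contained.

```python
import math
from collections import Counter

# Hypothetical toy "homepage" texts; not data from the paper.
pages = {
    "dept_a": "machine learning neural networks data mining",
    "dept_b": "data mining machine learning pattern recognition",
    "dept_c": "computer graphics rendering visualization",
    "dept_d": "visualization graphics rendering animation",
}


def term_vector(text):
    """Count word occurrences in a lower-cased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold):
    """Single-link grouping: merge items whose similarity reaches the threshold."""
    names = sorted(vectors)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links up to the representative of n's group.
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


vectors = {name: term_vector(text) for name, text in pages.items()}
clusters = cluster(vectors, threshold=0.3)  # threshold is an arbitrary assumption
print(clusters)
```

On this toy input the two departments that share machine learning vocabulary end up in one group and the two that share graphics vocabulary in another. Real homepage text would need at least stop-word removal and term weighting before similarities like these become meaningful.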

Acknowledgments

The authors are grateful to the two referees for their insightful comments. This paper is an extension of a paper previously presented at the IADIS European Conference on Data Mining (DM).

Author information

Corresponding author

Correspondence to Patrick Kenekayoro.

About this article

Cite this article

Kenekayoro, P., Buckley, K. & Thelwall, M. Clustering research group website homepages. Scientometrics 102, 2023–2039 (2015). https://doi.org/10.1007/s11192-014-1497-y

  • DOI: https://doi.org/10.1007/s11192-014-1497-y
