Experimental evaluation of parameter settings in calculation of hybrid similarities: effects of first- and second-order similarity, edge cutting, and weighting factors


The ongoing discussion in the bibliometric community about the best similarity measures has led to diverse insights. Although these insights are sometimes contradictory, one conclusion is very consistent: hybrid measures outperform their individual components applied on their own. While this initially answers the question of which similarity measure is best, it also raises issues that have been resolved only in part for conventional similarity measures. In this study we therefore investigate, in the context of hybrid similarities, the impact of the weighting factor, the appropriate level of edge cutting, the performance of first-order versus second-order similarities, and the interaction of these three parameters. Building upon a dataset of over 8000 articles from the field of manufacturing engineering, we calculated over 100 similarity matrices using different parameter settings. For each matrix we determined several cluster solutions at different resolution levels, ranging from 100 to 1000 clusters, and evaluated them quantitatively with a textual coherence value based on the Jensen-Shannon divergence. We found that second-order hybrid similarity measures, calculated with a weighting factor of 0.6 for the citation-based similarity and with edge cutting that retains only the strongest values, yield the best clustering results. Furthermore, we found the assessed parameters to be highly interdependent; for example, first-order hybrid similarity outperforms second-order hybrid similarity when no edge cutting is applied. Our results can therefore serve the bibliometric community as a guideline for the appropriate application of hybrid measures.
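To make the three parameters concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes (1) the hybrid similarity is a linear combination w * s_cit + (1 - w) * s_text, with w = 0.6 as the citation weight; (2) edge cutting keeps the k strongest off-diagonal values per document (the per-document top-k rule and the parameter k are hypothetical); (3) second-order similarity is the cosine between rows of the first-order matrix; and (4) cluster coherence is scored as the mean Jensen-Shannon divergence between each member's term distribution and the cluster's pooled distribution (a simplification). All function names are hypothetical.

```python
import math

Matrix = list[list[float]]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid(cit: Matrix, txt: Matrix, w: float = 0.6) -> Matrix:
    """Assumed linear combination: w * citation-based + (1 - w) * text-based similarity."""
    n = len(cit)
    return [[w * cit[i][j] + (1 - w) * txt[i][j] for j in range(n)] for i in range(n)]


def cut_edges(sim: Matrix, k: int) -> Matrix:
    """Hypothetical edge cutting: keep each document's k strongest off-diagonal
    similarities, zero everything else (including the diagonal), and keep the
    result symmetric (an edge survives if either endpoint selected it)."""
    n = len(sim)
    kept = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        for j in others[:k]:
            kept.add((min(i, j), max(i, j)))
    out = [[0.0] * n for _ in range(n)]
    for i, j in kept:
        out[i][j] = out[j][i] = sim[i][j]
    return out


def second_order(sim: Matrix) -> Matrix:
    """Second-order similarity: cosine between the similarity profiles (rows)."""
    n = len(sim)
    return [[cosine(sim[i], sim[j]) for j in range(n)] for i in range(n)]


def kl(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def normalize(counts: list[float]) -> list[float]:
    total = sum(counts)
    return [c / total for c in counts]


def cluster_divergence(term_counts: Matrix) -> float:
    """Mean JSD of each member's term distribution to the cluster's pooled
    distribution; lower values indicate a more textually coherent cluster."""
    dists = [normalize(c) for c in term_counts]
    pooled = normalize([sum(col) for col in zip(*term_counts)])
    return sum(jsd(d, pooled) for d in dists) / len(dists)
```

With three documents, `cut_edges(hybrid(cit, txt), k=1)` can remove the direct link between documents 0 and 2 while keeping both of their links to document 1; `second_order` then rates documents 0 and 2 as highly similar because their similarity profiles match. This illustrates why edge cutting and the choice of first- or second-order similarity interact rather than acting independently.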



  • Hybrid clustering
  • Bibliographic coupling
  • Textual coherence
  • Similarity measures
  • First- and second-order similarity