Using mutual information as a cocitation similarity measure
The debate over which similarity measure should be used in co-citation analysis has lasted for many years. The most debated measure is Pearson's correlation coefficient r, which has been used as a similarity measure in the literature since the technique emerged in the 1980s. However, some researchers have criticized the use of Pearson's r because it does not fully satisfy the mathematical conditions of a good similarity metric and/or because it does not meet some natural requirements that a similarity measure should satisfy. Alternative similarity measures, such as the cosine measure and the chi-square measure, have also been proposed and studied, which has led to further controversy over which similarity measure to use in co-citation analysis. In this article, we put forth the hypothesis that researchers with high mutual information are closely related to each other and that mutual information can be used as a similarity measure in author co-citation analysis. Given two researchers, the mutual information between them can be calculated from their publications and their co-citation frequencies. A mutual information proximity matrix is then constructed. This proximity matrix meets the two requirements formulated by Ahlgren et al. (J Am Soc Inf Sci Technol 54(6):550–560, 2003). We conduct several experimental studies to validate our hypothesis, and the results obtained with mutual information are compared with those obtained with other similarity measures.
Keywords: Author co-citation analysis; Similarity measures; Mutual information
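To illustrate how a mutual information value might be obtained for a pair of authors, the following is a minimal sketch. It is not necessarily the estimator used in the article. It assumes that each citing document is treated as one observation of two binary events ("cites author A", "cites author B") and computes the mutual information of the resulting 2×2 contingency table. The function names `mutual_information` and `proximity_matrix`, the count parameters, and the example numbers are all hypothetical.

```python
import math


def mutual_information(n_a, n_b, n_ab, n_total):
    """Mutual information (in bits) between the binary events
    'document cites A' and 'document cites B'.

    Assumed inputs (hypothetical, not taken from the article):
      n_a     -- number of citing documents that cite author A
      n_b     -- number of citing documents that cite author B
      n_ab    -- number of citing documents that cite both (co-citations)
      n_total -- total number of citing documents considered
    """
    # 2x2 contingency table: key is (cites A?, cites B?)
    counts = {
        (1, 1): n_ab,
        (1, 0): n_a - n_ab,
        (0, 1): n_b - n_ab,
        (0, 0): n_total - n_a - n_b + n_ab,
    }
    if n_total <= 0 or min(counts.values()) < 0:
        raise ValueError("inconsistent citation counts")

    # Marginal probabilities of each event
    p_a = {1: n_a / n_total, 0: 1 - n_a / n_total}
    p_b = {1: n_b / n_total, 0: 1 - n_b / n_total}

    mi = 0.0
    for (a, b), c in counts.items():
        if c == 0:
            continue  # empty cells contribute nothing (0 * log 0 := 0)
        p_joint = c / n_total
        mi += p_joint * math.log2(p_joint / (p_a[a] * p_b[b]))
    return mi


def proximity_matrix(cited, cocited, n_total):
    """Build a symmetric matrix of pairwise mutual information values.

    cited   -- dict: author -> number of documents citing that author
    cocited -- dict: (author_i, author_j) -> co-citation count
    """
    authors = sorted(cited)
    matrix = []
    for i in authors:
        row = []
        for j in authors:
            if i == j:
                n_ij = cited[i]  # an author always co-occurs with itself
            else:
                n_ij = cocited.get((i, j), cocited.get((j, i), 0))
            row.append(mutual_information(cited[i], cited[j], n_ij, n_total))
        matrix.append(row)
    return authors, matrix


# Hypothetical toy data: 100 citing documents, three authors
cited = {"A": 50, "B": 50, "C": 20}
cocited = {("A", "B"): 40, ("A", "C"): 10, ("B", "C"): 5}
authors, matrix = proximity_matrix(cited, cocited, n_total=100)
```

Under this assumed formulation, the value is zero when two authors are cited independently of each other and grows as their citation patterns become more dependent. Each diagonal entry equals the entropy of the corresponding author's citation indicator.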