Scientometrics, Volume 76, Issue 2, pp 273–290

Bibliographic coupling, common abstract stems and clustering: A comparison of two document-document similarity approaches in the context of science mapping

  • Per Ahlgren
  • Bo Jarneving

Abstract

This paper deals with two document-document similarity approaches in the context of science mapping: bibliographic coupling and a text approach based on the number of common abstract stems. We used 43 articles, published in the journal Information Retrieval, as test articles. An information retrieval expert performed a classification of these articles. We used the cosine measure for normalization and the complete linkage method for clustering the articles. A number of article pairs were ranked (1) according to descending normalized coupling strength, and (2) according to descending normalized frequency of common abstract stems. The degree of agreement between the two obtained rankings was low, as measured by Kendall’s tau. The agreement between the two cluster solutions, one for each approach, was fairly low, according to the adjusted Rand index. However, there were examples of perfect agreement between the coupling solution and the stems solution. The classification generated by the expert contained larger groups compared to the coupling and stems solutions, and the agreement between the two solutions and the classification was not high. According to the adjusted Rand index, though, the stems solution was a better approximation of the classification than the coupling solution. With respect to cluster quality, the overall Silhouette value was slightly higher for the stems solution. Examples of homogeneous cluster structures, as well as negative Silhouette values, were found with regard to both solutions. The expert classification indicates that the field of information retrieval, as represented by one volume of articles published in Information Retrieval, is fairly heterogeneous regarding research themes, since the classification is associated with 15 themes. The complete linkage method, in combination with the upper tail rule, gave rise to a fairly good approximation of the classification with respect to the number of identified groups, especially in the case of the stems approach.
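
Two of the quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes the cosine normalization takes the usual binary-vector form (shared references divided by the geometric mean of the two reference-list lengths) and computes the adjusted Rand index in the pair-counting form of Hubert and Arabie. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb, sqrt


def cosine_coupling(refs_a, refs_b):
    """Cosine-normalized coupling strength of two reference lists.

    Assumed form: |A & B| / sqrt(|A| * |B|), with each list treated as a set.
    """
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def adjusted_rand_index(labels_x, labels_y):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_x) != len(labels_y):
        raise ValueError("both partitions must label the same items")
    total_pairs = comb(len(labels_x), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially
    # Item pairs placed together in both partitions, in x only, in y only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_x, labels_y)).values())
    together_x = sum(comb(c, 2) for c in Counter(labels_x).values())
    together_y = sum(comb(c, 2) for c in Counter(labels_y).values())
    expected = together_x * together_y / total_pairs
    maximum = (together_x + together_y) / 2
    if maximum == expected:
        return 1.0  # both partitions are all-singletons or one single cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy data: two articles sharing two of their four references.
article_1 = ["r1", "r2", "r3", "r4"]
article_2 = ["r3", "r4", "r5", "r6"]
coupling = cosine_coupling(article_1, article_2)

# Hypothetical cluster labels for four articles under the two approaches.
coupling_solution = [0, 0, 1, 1]
stems_solution = [1, 1, 0, 0]  # same grouping, different label names
agreement = adjusted_rand_index(coupling_solution, stems_solution)
```

The adjusted Rand index depends only on which items are grouped together, not on the label names, which is why it suits comparing a cluster solution with an independently produced expert classification.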

Keywords

Cluster Solution, Adjusted Rand Index, Test Article, Bibliographic Coupling, Coupling Solution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. Swedish School of Library and Information Science, Borås, Sweden
