Volume 112, Issue 2, pp 851–875

Topic scientific community in science: a combined perspective of scientific collaboration and topics

  • Jin Mao
  • Yujie Cao
  • Kun Lu
  • Gang Li


Scientific communities are clusters of researchers and play important roles in modern science. Studying the different forms of scientific communities, whether they exist physically or virtually, is a feasible way to uncover the underlying mechanisms of science. From the perspective of complex networks, topology-based communities reflect scientific collaboration, while topic-based communities reflect the topical features of science. However, the two features are not isolated but intertwined in scientific practice. This study proposes an approach to detect Topical Scientific Communities (TSCs) that combine topology and topic features by applying machine learning techniques and network theory. As an example, the TSCs of the informetrics field are detected, and their characteristics are analyzed. The results show that collaboration patterns at the topic level can be revealed by analyzing the static network structure and the dynamics of TSCs. Furthermore, cross-topic collaborations at multiple levels can be investigated through TSCs. In addition, TSCs can effectively organize researchers in terms of productivity. Future work will further explore and generalize the characteristics of TSCs and their application to other tasks in the study of science.


Scientific community, Scientific collaboration, Research topic, Network, Author topic model

Mathematics Subject Classification

05C82 68U15 



We thank the anonymous reviewers for their comments. We also thank Dr. Hong Cui and Dr. Guo Chen for their suggestions on an earlier version of this paper. This study was supported by the National Natural Science Foundation of China under Grant Nos. 71603189, 71420107026, and 71403190.



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2017

Authors and Affiliations

  1. School of Information Management, Wuhan University, Wuhan, China
  2. Center for the Studies of Information Resources, Wuhan University, Wuhan, China
  3. School of Library and Information Studies, University of Oklahoma, Norman, USA
