Abstract
Scientific communities are clusters of researchers that play important roles in modern science. Studying the different forms of scientific communities, whether they exist physically or virtually, is a feasible way to uncover the underlying mechanisms of science. From the perspective of complex networks, topology-based communities reflect scientific collaboration, while topic-based communities reflect the topical features of science. In scientific practice, however, these two features are intertwined rather than isolated. This study proposes an approach for detecting Topical Scientific Communities (TSCs), which combine topological and topical features, by applying machine learning techniques and network theory. As an example, the TSCs of the informetrics field are detected and their characteristics analyzed. The results show that topic-level collaboration patterns can be revealed by analyzing the static network structure and the dynamics of TSCs. Furthermore, cross-topic collaborations can be investigated at multiple levels through TSCs. In addition, TSCs can effectively organize researchers in terms of productivity. Future work will further explore and generalize the characteristics of TSCs and apply them to other tasks in the study of science.
Notes
The value 4 for the parameter t can be understood as the minimum number of topics an author can have, which occurs when a paper has a single author and that author has published only that one paper.
Acknowledgements
We thank the anonymous reviewers for their comments. We also thank Dr. Hong Cui and Dr. Guo Chen for providing suggestions on an earlier version of this paper. This study is supported by the National Natural Science Foundation of China (CN) funded projects under Grant Nos. 71603189, 71420107026, and 71403190.
Appendix
See Table 6.
Cite this article
Mao, J., Cao, Y., Lu, K. et al. Topic scientific community in science: a combined perspective of scientific collaboration and topics. Scientometrics 112, 851–875 (2017). https://doi.org/10.1007/s11192-017-2418-7
DOI: https://doi.org/10.1007/s11192-017-2418-7