Abstract
Scientific literature records the progress of research in science and technology, and the research topics it covers evolve over time. The temporal distribution of a research topic's keywords in the literature reflects the stage the topic has reached in its evolution, and topics at different stages show different distributions. Previous work mainly focused on visualizing the temporal distribution of keyword weights to illustrate the development history and trend of a research topic in a literature collection. Quantitatively measuring the evolving stage of a research topic keyword against a baseline distribution can help detect topic evolving stages automatically in a large scientific literature corpus. Two challenges are how to build a quantitative baseline and how to quantitatively compare the temporal distribution of a topic with the baseline distribution. In this paper, an explicit function of the research heat curve is obtained by constructing a system of differential equations that models the evolving population groups of researchers within a research community working on a research topic represented by a topic keyword. Six segments of the heat curve are obtained from the zero points of its derivatives; together with the full heat curve, they serve as the quantitative baselines for measuring the temporal distribution of a research topic in different evolving stages. The temporal distribution of a research topic keyword in a scientific literature collection is obtained from the TF-IDF features of the collection. A curve shape matching algorithm is designed to match the temporal distribution curve against each baseline segment of the heat curve function, yielding a distance that measures the shape similarity between the baseline segment and the temporal distribution curve. The segment with the smallest distance is used as a quantitative indicator of the evolving stage of the research topic. Experiments on generated distributions and real distributions confirm the effectiveness of the heat curve matching method for measuring evolving stages from the temporal distribution of topics.
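As a rough illustration of the segment matching idea described in the abstract, the following Python sketch cuts a baseline curve at the sign changes of its first and second differences and then picks the segment whose shape is closest to an observed keyword series. Several parts are assumptions made only for illustration: `stand_in_heat_curve` is a hypothetical hype-type shape, not the heat curve function derived in the paper; the distance (mean absolute difference after resampling and min-max scaling) is a simple stand-in for the paper's curve shape matching algorithm; and `yearly_weights` is made-up data standing in for yearly TF-IDF weights of a keyword.

```python
import math

def stand_in_heat_curve(t):
    """Hypothetical hype-type shape (an assumption, not the paper's function)."""
    peak = math.exp(-((t - 2.5) ** 2))                   # early burst of interest
    plateau = 0.6 / (1.0 + math.exp(-1.5 * (t - 6.0)))   # later steady level
    return peak + plateau

def diff(values):
    """Forward differences, a discrete stand-in for the derivative."""
    return [b - a for a, b in zip(values, values[1:])]

def sign_changes(values, offset):
    """Indices (shifted by offset) where consecutive values have opposite signs."""
    return {i + offset for i in range(1, len(values))
            if values[i - 1] * values[i] < 0}

def split_at_derivative_zeros(y):
    """Cut a sampled curve where its first or second difference changes sign."""
    d1 = diff(y)
    d2 = diff(d1)
    cuts = sorted(sign_changes(d1, 0) | sign_changes(d2, 1))
    bounds = [0] + cuts + [len(y)]
    return [y[a:b] for a, b in zip(bounds, bounds[1:]) if b - a >= 2]

def normalize_shape(series, n_points=50):
    """Resample to a fixed length and rescale to [0, 1] so only shape remains."""
    last = len(series) - 1
    resampled = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        resampled.append(series[lo] * (1 - frac) + series[hi] * frac)
    low, high = min(resampled), max(resampled)
    if high == low:                      # flat series carries no shape information
        return [0.0] * n_points
    return [(v - low) / (high - low) for v in resampled]

def match_stage(observed, baselines):
    """Return the index of the closest baseline segment and all distances."""
    obs = normalize_shape(observed)
    distances = []
    for segment in baselines:
        ref = normalize_shape(segment)
        distances.append(sum(abs(a - b) for a, b in zip(obs, ref)) / len(obs))
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances

# Sample the stand-in curve on [0, 10] and split it into baseline segments.
curve = [stand_in_heat_curve(i * 0.05) for i in range(201)]
segments = split_at_derivative_zeros(curve)

# Made-up yearly keyword weights (illustrative only, not data from the paper).
yearly_weights = [0.02, 0.03, 0.05, 0.09, 0.16, 0.28]
stage, distances = match_stage(yearly_weights, segments)
print("segments:", len(segments), "closest segment index:", stage)
```

Min-max scaling makes the comparison insensitive to the absolute level of the keyword weights, so only the rise or fall pattern decides the match. A time-warping distance could be substituted for the mean absolute difference when the observed series is unevenly paced.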
Acknowledgements
This work is supported by the National R&D Project "Disruptive technology detection, theory, methods and expert system" (project no. 2019YFA0707201), the Innovation Research Fund granted by the Institute of Scientific and Technical Information of China (project no. MS2022-05), and the Joint Project of CAS and Austria on ADaptive and Autonomous Data Performance Connectivity and Decentralized Transport Decision-Making Network (ADAPT, no. 881703).
About this article
Cite this article
Zhang, J., Sun, X. & Liu, Z. Measuring the evolving stage of temporal distribution of research topic keyword in scientific literature by research heat curve. Scientometrics (2024). https://doi.org/10.1007/s11192-024-04937-0
DOI: https://doi.org/10.1007/s11192-024-04937-0