Abstract
The past decades have witnessed significant progress in scientific research, with new technologies emerging and traditional technologies constantly evolving. As a critical task in the Science of Science (SciSci), automatically mining technology trends from massive collections of scientific publications has attracted broad research interest across various communities. While existing approaches achieve remarkable performance, several critical challenges remain, such as data sparsity, cross-document influence, and temporal dependency. To this end, we propose a technical-term-based graph-propagated neural topic model for mining technology trends in scientific publications. Specifically, we first use the documents' citation relations and technical terms to construct a heterogeneous graph. Then, we design a term propagation network that spreads technical terms over the heterogeneous graph to overcome their sparseness. In addition, we develop a dynamic embedded topic modeling method that captures the temporal dependencies of technical terms across documents and thereby reveals how the distribution of technical terms changes over time. Finally, extensive experiments on real-world scientific datasets validate the effectiveness and interpretability of our approach compared with state-of-the-art baselines.
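The first two steps described above (building a heterogeneous graph from citations and technical terms, then propagating terms over it) can be illustrated with a minimal sketch. It is not the paper's term propagation network, which is a learned neural component; the mean-of-neighbors rule, the undirected treatment of citations, the mixing weight `alpha`, and the function names `build_hetero_graph` and `propagate_terms` are all assumptions introduced here for illustration.

```python
from collections import Counter, defaultdict


def build_hetero_graph(doc_terms, citations):
    """Build document-term edges and document-document citation edges.

    doc_terms: dict mapping a document id to its list of technical terms.
    citations: iterable of (citing_id, cited_id) pairs.
    Assumption: citation links are treated as undirected for propagation.
    """
    # Document-term edges, weighted by term frequency in the document.
    term_edges = {
        doc: {t: float(c) for t, c in Counter(terms).items()}
        for doc, terms in doc_terms.items()
    }
    # Document-document edges from citation relations.
    neighbors = defaultdict(set)
    for citing, cited in citations:
        if citing in doc_terms and cited in doc_terms:
            neighbors[citing].add(cited)
            neighbors[cited].add(citing)
    return term_edges, neighbors


def propagate_terms(term_edges, neighbors, alpha=0.5, steps=1):
    """Spread term weights along citation edges (hypothetical rule).

    Each step adds alpha times the mean term weight of a document's
    citation neighbors to the document's own term weights, so a
    document with few terms inherits terms from the papers it is
    linked to.
    """
    current = {doc: dict(w) for doc, w in term_edges.items()}
    for _ in range(steps):
        updated = {}
        for doc, weights in current.items():
            mixed = defaultdict(float, weights)
            nbrs = neighbors.get(doc, ())
            for n in nbrs:
                for term, w in current[n].items():
                    mixed[term] += alpha * w / len(nbrs)
            updated[doc] = dict(mixed)
        current = updated
    return current


# Toy corpus: p2 cites p1, and p3 cites p2.
doc_terms = {
    "p1": ["graph neural network"],
    "p2": ["topic model", "graph neural network"],
    "p3": ["topic model"],
}
citations = [("p2", "p1"), ("p3", "p2")]

term_edges, neighbors = build_hetero_graph(doc_terms, citations)
propagated = propagate_terms(term_edges, neighbors, alpha=0.5, steps=1)
print(propagated["p1"])
```

In this toy corpus, `p1` originally contains only "graph neural network"; after one propagation step it also carries a nonzero weight for "topic model", inherited from `p2`. This is the sense in which propagation counteracts term sparseness.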
Data Availability
The raw data are available at https://zenodo.org/record/4617285.
Notes
We utilize the final layer’s [CLS] output of BERT as the representation of the document.
Microsoft Academic was retired in 2021. Because the data for 2021 are incomplete, we select papers published before 2021.
Funding
This research is supported by the National Key Research and Development Program of China under Grant No. 2019YFA0707204, the National Natural Science Foundation of China under Grant Nos. 62176014 and 62276015, and the Fundamental Research Funds for the Central Universities.
Author information
Contributions
Chenguang Du contributed to conceptualization; CD and KY contributed to methodology; HZ and FZ were involved in formal analysis and investigation; CD and KY were involved in writing (original draft preparation); HZ, DW, FZ, and HX were involved in writing (review and editing); CD and DW were involved in funding acquisition; CD and DW contributed to resources; HZ, DW, FZ, and HX were involved in supervision.
Ethics declarations
Conflict of interest
Not applicable.
Ethical approval
Not applicable.
About this article
Cite this article
Du, C., Yao, K., Zhu, H. et al. Mining technology trends in scientific publications: a graph propagated neural topic modeling approach. Knowl Inf Syst 66, 3085–3114 (2024). https://doi.org/10.1007/s10115-023-02005-2
DOI: https://doi.org/10.1007/s10115-023-02005-2