Mining technology trends in scientific publications: a graph propagated neural topic modeling approach

Du, Chenguang; Yao, Kaichun; Zhu, Hengshu; Wang, Deqing; Zhuang, Fuzhen; Xiong, Hui

doi:10.1007/s10115-023-02005-2

Regular Paper
Published: 30 January 2024

Volume 66, pages 3085–3114 (2024)

Knowledge and Information Systems

Chenguang Du¹,
Kaichun Yao²,
Hengshu Zhu³,
Deqing Wang¹,
Fuzhen Zhuang⁴,⁵ &
…
Hui Xiong⁵,⁶

Abstract

The past decades have witnessed significant progress in scientific research, with new technologies emerging and traditional technologies constantly evolving. As a critical task in the Science of Science (SciSci), automatically mining technology trends from massive collections of scientific publications has attracted broad research interest in various communities. While existing approaches can achieve remarkable performance, many critical challenges remain, such as data sparsity, cross-document influence, and temporal dependency. To this end, we propose a technical-term-based, graph propagated neural topic model for mining technology trends in scientific publications. Specifically, we first use the documents’ citation relations and technical terms to construct a heterogeneous graph. Then, we design a term propagation network that spreads technical terms over the heterogeneous graph to overcome their sparseness. In addition, we develop a dynamic embedded topic modeling method that captures the temporal dependencies of technical terms across documents and can thereby discover the distribution of technical terms over time. Finally, extensive experiments on real-world scientific datasets validate the effectiveness and interpretability of our approach compared with state-of-the-art baselines.
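As a rough illustration of the first two steps in the abstract (building a heterogeneous graph from citations and technical terms, then spreading terms over it), the following minimal sketch may help. It is not the authors' model: the paper's term propagation network is a learned neural component whose details are not given in the abstract, whereas this sketch uses a fixed one-hop weighted sum. The function names, the input format, and the `decay` parameter are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_hetero_graph(doc_terms, citations):
    """Build an undirected graph with ("doc", id) and ("term", name) nodes.

    doc_terms: {doc_id: [term, ...]}          (hypothetical input format)
    citations: [(citing_doc, cited_doc), ...] (hypothetical input format)
    """
    graph = defaultdict(set)
    for doc, terms in doc_terms.items():
        graph[("doc", doc)]  # ensure documents without edges still appear
        for term in terms:
            graph[("doc", doc)].add(("term", term))
            graph[("term", term)].add(("doc", doc))
    for citing, cited in citations:
        graph[("doc", citing)].add(("doc", cited))
        graph[("doc", cited)].add(("doc", citing))
    return graph


def propagate_terms(doc_terms, graph, decay=0.5):
    """One-hop propagation (an assumed, non-learned stand-in): each document
    keeps its own term counts and adds `decay` times the term counts of every
    document it is linked to by a citation edge."""
    own = {doc: Counter(terms) for doc, terms in doc_terms.items()}
    enriched = {}
    for doc, counts in own.items():
        weights = defaultdict(float)
        for term, count in counts.items():
            weights[term] += count
        for kind, neighbor in graph[("doc", doc)]:
            if kind != "doc":
                continue  # only citation edges carry terms between documents
            for term, count in own.get(neighbor, {}).items():
                weights[term] += decay * count
        enriched[doc] = dict(weights)
    return enriched


# Toy corpus: p2 cites p1, and p3 cites p2.
doc_terms = {
    "p1": ["graph neural network", "topic model"],
    "p2": ["topic model"],
    "p3": ["BERT"],
}
citations = [("p2", "p1"), ("p3", "p2")]

graph = build_hetero_graph(doc_terms, citations)
enriched = propagate_terms(doc_terms, graph)
```

In this toy corpus, `p2` mentions only one term itself but ends up with nonzero weight on the terms of both documents it is linked to, which is the sense in which propagation can reduce term sparsity.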

Data Availability

The raw data are available at https://zenodo.org/record/4617285.

Notes

https://trends.google.com/trends.
https://github.com/allenai/SciREX.
We utilize the final layer’s [CLS] output of BERT as the representation of the document.
Microsoft Academic was retired in 2021. Because complete data for 2021 are unavailable, we select papers published before 2021.
https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/.
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html.
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html.
https://github.com/JoeZJH/Labeled-LDA-Python.
https://github.com/zll17/Neural_Topic_Models#NVDM-GSM.
https://github.com/adjidieng/ETM.
https://github.com/MilaNLProc/contextualized-topic-models.
https://github.com/cezhang01/Adjacent-Encoder.
https://github.com/SmilesDZgk/GNTM.
https://github.com/adjidieng/DETM.
https://github.com/MaartenGr/BERTopic.
https://pytorch.org/.
https://www.gartner.com/en/research/methodologies/gartner-hype-cycle.
https://www.gartner.com/en/documents/4003843.

Funding

This research work is supported by the National Key Research and Development Program of China under Grant No. 2019YFA0707204, the National Natural Science Foundation of China under Grant Nos. 62176014 and 62276015, and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

School of Computer, Beihang University, Xueyuan Road, Haidian District, Beijing, 100191, China
Chenguang Du & Deqing Wang
Institute of Software, Chinese Academy of Sciences, South Fourth Street, Zhong Guan Cun, Haidian District, Beijing, 100190, China
Kaichun Yao
Career Science Lab, BOSS Zhipin, Taiyanggong Middle Road, Chaoyang District, Beijing, 100028, China
Hengshu Zhu
Institute of Artificial Intelligence, Beihang University, Beijing, 100191, China
Fuzhen Zhuang
SKLSDE, School of Computer Science, Beihang University, Beijing, 100191, China
Fuzhen Zhuang & Hui Xiong
Artificial Intelligence Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511458, Guangdong, China
Hui Xiong

Contributions

CD contributed to conceptualization; CD and KY contributed to methodology; HZ and FZ were involved in formal analysis and investigation; CD and KY were involved in writing (original draft preparation); HZ, DW, FZ, and HX were involved in writing (review and editing); CD and DW were involved in funding acquisition; CD and DW contributed to resources; HZ, DW, FZ, and HX were involved in supervision.

Corresponding authors

Correspondence to Kaichun Yao or Deqing Wang.

Ethics declarations

Conflict of interest

Not applicable.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Du, C., Yao, K., Zhu, H. et al. Mining technology trends in scientific publications: a graph propagated neural topic modeling approach. Knowl Inf Syst 66, 3085–3114 (2024). https://doi.org/10.1007/s10115-023-02005-2

Received: 06 April 2023
Revised: 13 September 2023
Accepted: 09 October 2023
Published: 30 January 2024
Issue Date: May 2024
DOI: https://doi.org/10.1007/s10115-023-02005-2
