Mining technology trends in scientific publications: a graph propagated neural topic modeling approach

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

The past decades have witnessed significant progress in scientific research, with new technologies emerging and traditional technologies constantly evolving. As a critical task in the Science of Science (SciSci), automatically mining technology trends from massive collections of scientific publications has attracted broad research interest in various communities. While existing approaches achieve remarkable performance, critical challenges remain, such as data sparsity, cross-document influence, and temporal dependency. To this end, we propose a technical-term-based graph propagated neural topic model for mining technology trends in scientific publications. Specifically, we first use the documents’ citation relations and technical terms to construct a heterogeneous graph. Then, we design a term propagation network that spreads technical terms over the heterogeneous graph to overcome their sparseness. In addition, we develop a dynamic embedded topic modeling method to capture the temporal dependencies of technical terms across documents, which can discover the distribution of technical terms over time. Finally, extensive experiments on real-world scientific datasets validate the effectiveness and interpretability of our approach compared with state-of-the-art baselines.
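To make the idea of spreading technical terms over a citation graph concrete, the following is a minimal sketch and not the model proposed in the paper. It replaces the learned term propagation network with a single round of fixed-weight averaging over citation neighbours. The function names (`build_graph`, `propagate_terms`), the weight `alpha`, and the toy documents are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(doc_terms, citations):
    """Build an undirected document-document adjacency from citation pairs.

    doc_terms maps a document id to {technical term: count}; together with
    the citation edges it forms a small document/term heterogeneous graph.
    """
    neighbors = defaultdict(set)
    for citing, cited in citations:
        # Ignore citations that point outside the known document set.
        if citing in doc_terms and cited in doc_terms:
            neighbors[citing].add(cited)
            neighbors[cited].add(citing)
    return neighbors


def propagate_terms(doc_terms, neighbors, alpha=0.5):
    """One propagation round (an assumed, non-learned update rule).

    Each document keeps its own term counts and adds alpha times the mean
    term counts of its citation neighbours, which densifies sparse documents.
    """
    propagated = {}
    for doc, terms in doc_terms.items():
        scores = defaultdict(float)
        for term, count in terms.items():
            scores[term] += count
        nbrs = neighbors.get(doc, ())
        for nbr in nbrs:
            for term, count in doc_terms[nbr].items():
                scores[term] += alpha * count / len(nbrs)
        propagated[doc] = dict(scores)
    return propagated


# Toy corpus: three papers, two of which cite the first (hypothetical data).
doc_terms = {
    "p1": {"topic model": 2, "graph neural network": 1},
    "p2": {"topic model": 1},
    "p3": {"lstm": 3},
}
citations = [("p2", "p1"), ("p3", "p1")]

neighbors = build_graph(doc_terms, citations)
propagated = propagate_terms(doc_terms, neighbors)
print(propagated["p2"])
```

After this single round, document `p2`, which originally mentioned only one term, also carries a nonzero weight for a term inherited from the paper it cites. This is the sparsity-reduction effect the abstract attributes to propagation. A learned propagation network would replace the fixed `alpha` with trainable parameters.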

Data Availability

The raw data can be found in https://zenodo.org/record/4617285.

Notes

  1. https://trends.google.com/trends.

  2. https://github.com/allenai/SciREX.

  3. We utilize the final layer’s [CLS] output of BERT as the representation of the document.

  4. Microsoft Academic was retired in 2021. Because the data for 2021 are incomplete, we select papers published before 2021.

  5. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/.

  6. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html.

  7. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html.

  8. https://github.com/JoeZJH/Labeled-LDA-Python.

  9. https://github.com/zll17/Neural_Topic_Models#NVDM-GSM.

  10. https://github.com/adjidieng/ETM.

  11. https://github.com/MilaNLProc/contextualized-topic-models.

  12. https://github.com/cezhang01/Adjacent-Encoder.

  13. https://github.com/SmilesDZgk/GNTM.

  14. https://github.com/adjidieng/DETM.

  15. https://github.com/MaartenGr/BERTopic.

  16. https://pytorch.org/.

  17. https://www.gartner.com/en/research/methodologies/gartner-hype-cycle.

  18. https://www.gartner.com/en/documents/4003843.

Funding

This research work is supported by the National Key Research and Development Program of China under Grant No. 2019YFA0707204, the National Natural Science Foundation of China under Grant Nos. 62176014 and 62276015, and the Fundamental Research Funds for the Central Universities.

Author information

Contributions

Chenguang Du (CD) contributed to conceptualization; CD and KY contributed to methodology; HZ and FZ were involved in formal analysis and investigation; CD and KY were involved in writing (original draft preparation); HZ, DW, FZ, and HX were involved in writing (review and editing); CD and DW were involved in funding acquisition; CD and DW contributed to resources; HZ, DW, FZ, and HX were involved in supervision.

Corresponding authors

Correspondence to Kaichun Yao or Deqing Wang.

Ethics declarations

Conflict of interest

Not applicable.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Du, C., Yao, K., Zhu, H. et al. Mining technology trends in scientific publications: a graph propagated neural topic modeling approach. Knowl Inf Syst 66, 3085–3114 (2024). https://doi.org/10.1007/s10115-023-02005-2

  • DOI: https://doi.org/10.1007/s10115-023-02005-2
