Doc2vec-based link prediction approach using SAO structures: application to patent network

Scientometrics

Abstract

As the number of documents has exploded in the Internet era, many researchers have tried to understand the relationships between documents and to predict links between similar but unconnected documents. However, existing link prediction techniques that rely on the predefined links between documents may produce incorrect results because of the inherent limitations of citation analysis. Moreover, they may fail to reflect the important content of the documents in the link prediction process. We therefore propose a new link prediction approach that employs the Doc2vec algorithm, a document-embedding method, to predict potential links between documents by reflecting the functional context of technological words. First, we collected both the citation information and the documents of the patents of interest, and generated a patent network from the citation relationships between patents. Second, we identified the unconnected links between nodes and transformed the patent documents into document vectors using the Doc2vec algorithm. In particular, because patent documents describe useful functions for solving technological problems, the proposed approach extracts subject-action-object (SAO) structures and uses them to generate the document vectors. We then calculated the similarity between the patents in each unconnected link of the patent network and predicted potential links on the basis of that similarity. Third, we validated the results of the proposed approach by comparing them with those of the Adamic–Adar technique, one of the traditional link prediction techniques, and of word vector-based link prediction. We applied the Doc2vec-based link prediction approach to a real case, the unmanned aerial vehicle (UAV) technology field, and found that it achieves better prediction performance than both the Adamic–Adar technique and the word vector approach.
Our results can help analysts accurately forecast future relationships between nodes in a network, and can give R&D managers insightful information on the future direction of technological development through patent networks.
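The network side of the pipeline summarized above (build a citation network, list the unconnected node pairs, score them, and rank them by document similarity) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: treating citations as undirected edges, the toy patents P1 to P5, their two-dimensional vectors, and the top-k cut-off in the hypothetical `predict_links` helper are all assumptions. The Adamic–Adar baseline follows its standard definition (Adamic & Adar, 2003), summing 1/log(degree) over the common neighbours of a node pair.

```python
import math
from itertools import combinations

# Hypothetical toy data: (citing, cited) patent pairs and made-up document vectors.
TOY_CITATIONS = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
TOY_VECTORS = {
    "P1": [1.0, 0.0],
    "P2": [0.0, 1.0],
    "P3": [1.0, 1.0],
    "P4": [1.0, 0.1],
    "P5": [0.0, 1.0],
}


def build_network(citations):
    """Adjacency sets; citation direction is ignored (an assumption of this sketch)."""
    adj = {}
    for citing, cited in citations:
        adj.setdefault(citing, set()).add(cited)
        adj.setdefault(cited, set()).add(citing)
    return adj


def unconnected_pairs(adj):
    """All node pairs without a direct citation link, i.e. the candidate links."""
    return [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the common neighbours of u and v.

    A common neighbour of two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def predict_links(adj, vectors, top_k=3):
    """Hypothetical helper: rank candidate links by document-vector similarity."""
    pairs = unconnected_pairs(adj)
    pairs.sort(key=lambda p: cosine(vectors[p[0]], vectors[p[1]]), reverse=True)
    return pairs[:top_k]


if __name__ == "__main__":
    net = build_network(TOY_CITATIONS)
    for u, v in unconnected_pairs(net):
        print(u, v, "AA=%.3f" % adamic_adar(net, u, v),
              "cos=%.3f" % cosine(TOY_VECTORS[u], TOY_VECTORS[v]))
    print("predicted:", predict_links(net, TOY_VECTORS))
```

In this toy network, P2 and P5 share no neighbour, so their Adamic–Adar score is 0, yet their made-up vectors are identical and the pair ranks first by cosine similarity. On invented data only, this shows how a content-based score can surface candidate links that a purely structural score cannot.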

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

References

  • Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211–230.

  • Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M. (2006). Link prediction using supervised learning. In SDM06: workshop on link analysis, counter-terrorism and security.

  • Behrouzi, S., Sarmoor, Z. S., Hajsadeghi, K., & Kavousi, K. (2020). Predicting scientific research trends based on link prediction in keyword networks. Journal of Informetrics, 14(4), 101079.

  • Chen, D., & Manning, C. D. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 740–750).

  • Chen, H., Li, X., & Huang, Z. (2005). Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'05) (pp. 141–142).

  • Dai, A. M., Olah, C., & Le, Q. V. (2015). Document embedding with paragraph vectors. arXiv preprint arXiv:1507.07998.

  • Getoor, L. (2003). Link mining: A new data mining challenge. ACM SIGKDD Explorations Newsletter, 5(1), 84–89.

  • Getoor, L., & Diehl, C. P. (2005). Link mining: A survey. ACM SIGKDD Explorations Newsletter, 7(2), 3–12.

  • Goldberg, Y., & Levy, O. (2014). word2vec Explained: Deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.

  • Guo, J., Wang, X., Li, Q., & Zhu, D. (2016). Subject–action–object-based morphology analysis for determining the direction of technological change. Technological Forecasting and Social Change, 105, 27–40.

  • Hopcroft, J., Lou, T., & Tang, J. (2011). Who will follow you back?: Reciprocal relationship prediction. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (pp. 1137–1146). ACM.

  • Huang, Z., Chen, H., & Zeng, D. (2004). Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems (TOIS), 22(1), 116–142.

  • Huang, E. H., Socher, R., Manning, C. D., & Ng, A. Y. (2012). Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers (Vol. 1, pp. 873–882).

  • Jeong, B., Ko, N., Son, C., & Yoon, J. (2021). Trademark-based framework to uncover business diversification opportunities: Application of deep link prediction and competitive intelligence analysis. Computers in Industry, 124, 103356.

  • Kroeger, P. R. (2005). Analyzing grammar: An introduction. Cambridge University Press.

  • Lau, J. H., & Baldwin, T. (2016). An empirical evaluation of doc2vec with practical insights into document embedding generation. arXiv preprint arXiv:1607.05368.

  • Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In International conference on machine learning (pp. 1188–1196).

  • Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems (pp. 2177–2185).

  • Li, S., Chua, T. S., Zhu, J., & Miao, C. (2016). Generative topic embedding: A continuous representation of documents. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 666–675).

  • Liben-Nowell, D., & Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019–1031.

  • Liu, Y., Liu, Z., Chua, T. S., & Sun, M. (2015). Topical word embeddings. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

  • Liu, W., & Lü, L. (2010). Link prediction based on local random walk. EPL (Europhysics Letters), 89(5), 58007.

  • Lü, L., & Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and Its Applications, 390(6), 1150–1170.

  • Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55–60).

  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).

  • Moehrle, M. G., Walter, L., Geritz, A., & Muller, S. (2005). Patent-based inventor profiles as a basis for human resource decisions in research and development. R&D Management, 35(5), 513–524.

  • Pavlov, M., & Ichise, R. (2007). Finding experts by link prediction in co-authorship networks. FEWS, 290, 42–55.

  • Popescul, A., & Ungar, L. H. (2003, August). Statistical relational learning for link prediction. In IJCAI workshop on learning statistical models from relational data (Vol. 2003).

  • Rajbabu, K., Srinivas, H., & Sudha, S. (2018). Industrial information extraction through multi-phase classification using ontology for unstructured documents. Computers in Industry, 100, 137–147.

  • Rong, X. (2014). word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.

  • Sun, H. L., Ch’ng, E., Yong, X., Garibaldi, J. M., See, S., & Chen, D.-B. (2017). An improved game-theoretic approach to uncover overlapping communities. International Journal of Modern Physics C, 28(9), 1750112.

  • Tang, J., Wu, S., Sun, J., & Su, H. (2012). Cross-domain collaboration recommendation. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1285–129).

  • Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., & Qin, B. (2014). Learning sentiment-specific word embedding for twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 1555–1565).

  • Tang, J., Qu, M., & Mei, Q. (2015, August). PTE: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1165–1174). ACM.

  • Taskar, B., Wong, M. F., Abbeel, P., & Koller, D. (2004). Link prediction in relational data. In Advances in neural information processing systems (pp. 659–666).

  • Toutanova, K., & Manning, C. (2000). Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Proceedings of the 2000 Joint SIGDAT Conference EMNLP/VLC (pp. 63–71).

  • Turian, J., Ratinov, L., & Bengio, Y. (2010, July). Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 384–394). Association for Computational Linguistics.

  • Wu, J., Zhang, G., & Ren, Y. (2017). A balanced modularity maximization link prediction model in social networks. Information Processing & Management, 53(1), 295–307.

  • Xie, Q., Zhang, X., Ding, Y., & Song, M. (2020). Monolingual and multilingual topic analysis using LDA and BERT embeddings. Journal of Informetrics, 14(3), 101055.

  • Zhang, Y., Lu, J., Liu, F., Liu, Q., Porter, A., Chen, H., & Zhang, G. (2018). Does deep learning help topic extraction? A kernel k-means clustering method with word embedding. Journal of Informetrics, 12(4), 1099–1117.

Acknowledgements

This work was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT under Grant NRF-2017R1D1A1B03036213.

Author information

Corresponding author

Correspondence to Hyeonju Seol.

Appendices

Appendix 1: Code for Doc2vec

figure a
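The Doc2vec code in this appendix is given only as a figure (figure a). As a stand-in, the following is a minimal pure-Python sketch of the PV-DBOW variant of Doc2vec (Le & Mikolov, 2014) with negative sampling, trained on SAO tokens. It is not the authors' code: encoding each SAO triple as a single `subject_action_object` token, the toy triples, and all hyperparameters (`dim`, `epochs`, `lr`, `negatives`) are assumptions, and the authors may have used another variant (such as PV-DM) or a library implementation.

```python
import math
import random

# Hypothetical SAO triples per patent; in the paper they are extracted from patent text.
TOY_SAO = {
    "P1": [("motor", "drive", "propeller"), ("sensor", "detect", "obstacle")],
    "P2": [("motor", "drive", "propeller"), ("battery", "supply", "power")],
    "P3": [("camera", "capture", "image"), ("processor", "stitch", "image")],
}


def sao_tokens(triples):
    """Assumed encoding: each SAO triple becomes one token 'subject_action_object'."""
    return ["_".join(triple) for triple in triples]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def train_pv_dbow(docs, dim=16, epochs=200, lr=0.05, negatives=3, seed=0):
    """PV-DBOW with negative sampling: each document vector learns to predict its tokens.

    Token ("output") vectors are shared across documents. Returns one vector per document.
    """
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    doc_vecs = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in docs]
    out_vecs = [[0.0] * dim for _ in vocab]
    for _ in range(epochs):
        for d, doc in enumerate(docs):
            for tok in doc:
                pos = index[tok]
                # One positive target plus random negative tokens (the positive is skipped).
                draws = [rng.randrange(len(vocab)) for _ in range(negatives)]
                targets = [(pos, 1.0)] + [(j, 0.0) for j in draws if j != pos]
                grad_doc = [0.0] * dim
                for j, label in targets:
                    score = sum(a * b for a, b in zip(doc_vecs[d], out_vecs[j]))
                    g = lr * (label - sigmoid(score))
                    for k in range(dim):
                        grad_doc[k] += g * out_vecs[j][k]  # uses the pre-update token vector
                        out_vecs[j][k] += g * doc_vecs[d][k]
                for k in range(dim):
                    doc_vecs[d][k] += grad_doc[k]
    return doc_vecs


if __name__ == "__main__":
    ids = sorted(TOY_SAO)
    vecs = train_pv_dbow([sao_tokens(TOY_SAO[p]) for p in ids])
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            print(ids[i], ids[j], round(cosine(vecs[i], vecs[j]), 3))
```

The update order mirrors the usual word2vec negative-sampling step: the document gradient is accumulated from the token vectors before those vectors are updated. On a three-patent toy corpus the similarities are not meaningful; the sketch only shows how SAO-based document vectors could feed the cosine scoring of unconnected pairs.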

Appendix 2: Searching query for UAV technology

figure c

About this article

Cite this article

Yoon, B., Kim, S., Kim, S. et al. Doc2vec-based link prediction approach using SAO structures: application to patent network. Scientometrics 127, 5385–5414 (2022). https://doi.org/10.1007/s11192-021-04187-4

  • DOI: https://doi.org/10.1007/s11192-021-04187-4
