A Workflow-Based Large-Scale Patent Mining and Analytics Framework

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 920)


The analysis of large volumes of complex scientific information such as patents requires new methods and a flexible, highly interactive, and easy-to-use platform that enables a variety of applications for information professionals in industry and research, ranging from information search and semantic analysis to specific text- and data-mining tasks. In this paper, we present a scalable patent analytics framework built on top of a big-data architecture and a scientific workflow system. The framework seamlessly integrates essential services for patent analysis, employing natural language processing and machine learning algorithms to deeply structure and semantically annotate patent texts so that complex scientific workflows can be realized. In two case studies, we show how the framework can be used for querying, annotating, and analyzing large amounts of patent data.


Patent analysis; Text- and data mining; Big data analytics; Visual workflow systems



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. FIZ Karlsruhe, Eggenstein-Leopoldshafen, Germany
