Technology Analysis from Patent Data Using Latent Dirichlet Allocation

  • Gabjo Kim
  • Sangsung Park
  • Dongsik Jang
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 271)


This paper discusses how to apply latent Dirichlet allocation, a topic model, in a trend analysis methodology that exploits patent information. To accomplish this, text mining is used to convert unstructured patent documents into structured data. Next, the term frequency-inverse document frequency (tf-idf) value is used in the feature selection process. After the text preprocessing, the number of topics is decided using the perplexity value. In this study, we employed U.S. patent data on technology that reduces greenhouse gases. We extracted words from 50 relevant topics and showed that these topics are highly meaningful in explaining trends per period.
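The pipeline described above (tf-idf feature selection, then choosing the number of topics by perplexity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the function names `tfidf_vocabulary`, `fit_lda`, and `perplexity`, the hyperparameters `alpha` and `beta`, the summed tf-idf ranking, the collapsed Gibbs sampler, and the use of in-sample rather than held-out perplexity are all choices made here for brevity.

```python
import math
import random
from collections import Counter

def tfidf_vocabulary(docs, keep):
    """Return the `keep` terms with the highest summed tf-idf (assumed criterion)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    score = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            score[term] += (count / len(doc)) * math.log(n_docs / doc_freq[term])
    return sorted(score, key=lambda t: (-score[t], t))[:keep]

def fit_lda(docs, vocab, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns (corpus, theta, phi)."""
    rng = random.Random(seed)
    index = {term: i for i, term in enumerate(vocab)}
    corpus = [[index[t] for t in doc if t in index] for doc in docs]
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in corpus]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Random initial topic assignment for every token.
    assign = [[rng.randrange(n_topics) for _ in doc] for doc in corpus]
    for d, doc in enumerate(corpus):
        for w, z in zip(doc, assign[d]):
            update(d, w, z, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                update(d, w, assign[d][i], -1)  # remove the token's current topic
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                assign[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, w, assign[d][i], +1)

    # Smoothed point estimates of document-topic and topic-word distributions.
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
             for row, doc in zip(doc_topic, corpus)]
    phi = [[(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
           for k in range(n_topics)]
    return corpus, theta, phi

def perplexity(corpus, theta, phi):
    """exp(-log-likelihood per token) under the fitted point estimates."""
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(corpus):
        for w in doc:
            log_lik += math.log(sum(t * p[w] for t, p in zip(theta[d], phi)))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

# Toy, already-tokenized "patent abstracts" (invented for illustration only).
docs = [
    ["carbon", "capture", "storage", "emission", "gas"],
    ["carbon", "capture", "solvent", "emission", "gas"],
    ["solar", "cell", "panel", "emission", "gas"],
    ["solar", "cell", "module", "emission", "gas"],
]

# Terms that occur in every document get idf = 0 and fall out of the vocabulary.
vocab = tfidf_vocabulary(docs, keep=8)

# Fit one model per candidate topic count and keep the lowest perplexity.
scores = {}
for k in (2, 3, 4):
    corpus, theta, phi = fit_lda(docs, vocab, n_topics=k)
    scores[k] = perplexity(corpus, theta, phi)
best_k = min(scores, key=scores.get)
print(vocab, scores, best_k)
```

In-sample perplexity tends to keep falling as topics are added, so in practice the comparison would normally be made on held-out documents or with cross-validation; the sketch only shows the mechanics of the selection loop.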


Keywords: latent Dirichlet allocation, topic model, text mining, tf-idf





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Gabjo Kim (1, 2)
  • Sangsung Park (2)
  • Dongsik Jang (1)
  1. Division of Industrial Management Engineering, Korea University, Seoul, Korea
  2. Graduate School of Management of Technology, Korea University, Seoul, Korea
