Supervised Approaches to Assign Cooperative Patent Classification (CPC) Codes to Patents

  • Tung Tran
  • Ramakanth Kavuluru
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10682)


This paper re-introduces the problem of patent classification with respect to the new Cooperative Patent Classification (CPC) system. CPC replaced the U.S. Patent Classification (USPC) coding system as the official patent classification system in 2013. We frame patent classification as a multi-label text classification problem in which the prediction for a test document is a set of labels and success is measured by the micro-F1 score. We propose a supervised classification system that exploits the hierarchical taxonomy of CPC as well as the citation records of a test patent; we also propose various label ranking and cut-off (calibration) methods as part of the system pipeline. To evaluate the system, we conducted experiments on U.S. patents released in 2010 and 2011 for over 600 labels that correspond to the “subclasses” at the third level in the CPC hierarchy. The best variant of our model achieves a micro-F1 score of \(\approx\)70%, and the results are statistically significant. To the best of our knowledge, this is the first effort to reinitiate the automated patent classification task under the new CPC coding scheme.
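
To make the evaluation measure concrete, the following is a minimal sketch of how micro-F1 can be computed when each document's gold standard and prediction are both sets of labels. It is not the authors' evaluation code; the function name `micro_f1` and the small example label sets are assumptions chosen only for illustration. Micro-averaging pools true positives, false positives, and false negatives over all documents and labels before computing a single F1 value.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-document label sets (illustrative helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels that were missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 here when nothing is counted
    return 2 * tp / denom if denom else 0.0


# Hypothetical example: three documents with subclass-style label codes.
gold = [{"G06F", "H04L"}, {"A61K"}, {"B60R", "G06F"}]
pred = [{"G06F"}, {"A61K", "C07D"}, {"B60R", "G06F"}]
score = micro_f1(gold, pred)  # TP = 4, FP = 1, FN = 1
```

Because counts are pooled across labels, frequent labels influence micro-F1 more than rare ones, which is worth keeping in mind when the label set has several hundred subclasses of uneven frequency.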



We thank the anonymous reviewers for their honest and constructive comments, which helped improve our paper’s presentation. Our work is primarily supported by the National Library of Medicine through grant R21LM012274. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, University of Kentucky, Lexington, USA
  2. Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, USA
