
A Multi-task Approach to Neural Multi-label Hierarchical Patent Classification Using Transformers

  • Conference paper

Advances in Information Retrieval (ECIR 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12656)

Abstract

With the aim of facilitating internal processes as well as search applications, patent offices categorize documents into taxonomies such as the Cooperative Patent Classification. This task corresponds to a multi-label hierarchical text classification problem. Recent approaches based on pre-trained neural language models have shown promising performance by focusing on leaf-level label prediction. Prior works using intrinsically hierarchical algorithms, which learn a separate classifier for each node in the hierarchy, have also demonstrated their effectiveness despite being based on symbolic feature inventories. However, training one transformer-based classifier per node is computationally infeasible due to memory constraints. In this work, we propose a Transformer-based Multi-task Model (TMM) overcoming this limitation. Using a multi-task setup and sharing a single underlying language model, we train one classifier per node. To the best of our knowledge, our work constitutes the first approach to patent classification combining transformers and hierarchical algorithms. We outperform several non-neural and neural baselines on the WIPO-alpha dataset as well as on a new dataset of 70k patents, which we publish along with this work. Our analysis reveals that our approach achieves much higher recall while keeping precision high. Strong increases on macro-average scores demonstrate that our model also performs much better for infrequent labels. An extended version of the model with additional connections reflecting the label taxonomy results in a further increase of recall especially at the lower levels of the hierarchy.
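To make the multi-task setup described in the abstract concrete, the sketch below shows how a single shared transformer encoder can feed one sigmoid classification head per taxonomy node in TensorFlow/Keras. This is a minimal illustration, not the authors' released implementation (see footnote 1 for that): the checkpoint name, the toy two-level hierarchy, the sequence length, and the learning rate are assumptions made for the example only.

```python
# Minimal sketch of a shared-encoder multi-task classifier with one binary head
# per hierarchy node. Assumptions: a generic BERT checkpoint and a toy hierarchy;
# the real CPC/IPC taxonomies contain far more nodes.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

MODEL_NAME = "bert-base-uncased"  # assumption: any BERT-style checkpoint

# Hypothetical two-level hierarchy: parent node -> child nodes.
HIERARCHY = {"A": ["A01", "A61"], "B": ["B01", "B60"]}

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = TFAutoModel.from_pretrained(MODEL_NAME)  # single shared language model

input_ids = tf.keras.Input(shape=(None,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(None,), dtype=tf.int32, name="attention_mask")

# [CLS] representation shared by all per-node classification heads.
cls = encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0, :]

outputs = {}
for parent, children in HIERARCHY.items():
    # One sigmoid head for the parent node and one for each child node.
    outputs[parent] = tf.keras.layers.Dense(1, activation="sigmoid", name=parent)(cls)
    for child in children:
        outputs[child] = tf.keras.layers.Dense(1, activation="sigmoid", name=child)(cls)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=outputs)
# Multi-task training: one binary cross-entropy loss per node, summed by Keras.
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
              loss={name: "binary_crossentropy" for name in outputs})

# Example forward pass for one hypothetical patent text.
enc = tokenizer(["A method for coating a substrate ..."], padding=True,
                truncation=True, max_length=256, return_tensors="tf")
scores = model({"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})
```

In this sketch all heads simply share the [CLS] representation; the extended model mentioned in the abstract additionally adds connections reflecting the label taxonomy between parent and child nodes.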


Notes

  1. https://github.com/boschresearch/hierarchical_patent_classification_ecir2021.

  2. https://www.patentsview.org/download.

  3. https://www.wipo.int/classifications/ipc/en/ITsupport/Categorization/dataset/.

  4. See WIPO-alpha readme and personal correspondence with authors.

  5. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html.

  6. https://www.tensorflow.org.

  7. https://dublin.zhaw.ch/~benf/HPC.

  8. https://github.com/globality-corp/sklearn-hierarchical-classification.

  9. We double-checked the surprisingly low macro-scores of HARNN-orig and decided to present results of HARNN tuned for macro-performance as well.


Acknowledgements

We thank Mark Giereth and Jona Ruthardt for the fruitful discussions. We are grateful to Patrick Fievet for his support with the WIPO-alpha dataset. We also thank Alexander Müller for sharing his ideas on patent classification, and Lukas Lange, Trung-Kien Tran and the anonymous reviewers for their comments on this paper.

Author information

Corresponding author

Correspondence to Subhash Chandra Pujari.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Pujari, S.C., Friedrich, A., Strötgen, J. (2021). A Multi-task Approach to Neural Multi-label Hierarchical Patent Classification Using Transformers. In: Hiemstra, D., Moens, M.F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12656. Springer, Cham. https://doi.org/10.1007/978-3-030-72113-8_34

  • DOI: https://doi.org/10.1007/978-3-030-72113-8_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72112-1

  • Online ISBN: 978-3-030-72113-8

  • eBook Packages: Computer Science, Computer Science (R0)
