The Hybrid Method for Accurate Patent Classification

Yadrintsev, V. V.; Sochenkov, I. V.

doi:10.1134/S1995080219110325

Published: 27 November 2019

Volume 40, pages 1873–1880, (2019)

Lobachevskii Journal of Mathematics

Abstract

This article is dedicated to stacking two approaches to patent classification. The first is based on a linguistically supported k-nearest neighbors algorithm that uses a method of searching for topically similar documents by comparing vectors of lexical descriptors. The second is based on fastText word embeddings: the sentence (or document) vector is obtained by averaging the n-gram embeddings, and a multinomial logistic regression then uses these vectors as features. We show on Russian and English datasets that the stacking classifier outperforms the individual classifiers.
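
The abstract describes the two base classifiers and their combination only at a high level. Below is a minimal, self-contained Python sketch of the idea, built on assumptions that do not come from the paper. Raw term counts stand in for the weighted lexical descriptors. A tiny hashed word/bigram model trained with plain SGD stands in for fastText. A single blend weight tuned on held-out documents stands in for the stacking meta-classifier, since the paper's actual meta-learner and features are not given here. The names `knn_scores`, `TinyFastText`, `blend` and `fit_blend_weight`, all hyperparameters, and the IPC-style toy data are hypothetical.

```python
import math
import random
import zlib
from collections import Counter

# Base classifier 1: similarity-weighted kNN. Raw term counts are a hypothetical
# stand-in for the paper's weighted lexical descriptors.
def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na, nb = (math.sqrt(sum(v * v for v in x.values())) for x in (a, b))
    return dot / (na * nb) if na and nb else 0.0

def knn_scores(doc, train_docs, train_labels, classes, k=3):
    q = Counter(doc.lower().split())
    sims = sorted(((cosine(q, Counter(d.lower().split())), y)
                   for d, y in zip(train_docs, train_labels)), key=lambda p: p[0], reverse=True)
    scores = dict.fromkeys(classes, 0.0)
    for s, y in sims[:k]:
        scores[y] += s  # each neighbour votes with its similarity
    z = sum(scores.values()) or 1.0
    return [scores[c] / z for c in classes]

# Base classifier 2, fastText-style: the document vector is the average of hashed
# word and word-bigram embeddings, and a softmax layer (multinomial logistic
# regression) is trained on top of it jointly with the embeddings.
class TinyFastText:
    def __init__(self, classes, dim=8, buckets=1024, lr=0.5, epochs=50, seed=0):
        rng = random.Random(seed)
        self.classes, self.buckets, self.dim = list(classes), buckets, dim
        self.lr, self.epochs = lr, epochs
        self.E = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(buckets)]
        self.W = [[0.0] * dim for _ in self.classes]

    def _ids(self, text):
        toks = text.lower().split()
        grams = toks + [a + " " + b for a, b in zip(toks, toks[1:])]
        return [zlib.crc32(g.encode("utf-8")) % self.buckets for g in grams]

    def _doc_vec(self, ids):  # mean of the n-gram embeddings (zeros for an empty text)
        return [sum(self.E[i][j] for i in ids) / max(len(ids), 1) for j in range(self.dim)]

    def _softmax(self, h):
        logits = [sum(w * x for w, x in zip(row, h)) for row in self.W]
        exps = [math.exp(v - max(logits)) for v in logits]
        return [e / sum(exps) for e in exps]

    def predict_proba(self, text):
        return self._softmax(self._doc_vec(self._ids(text)))

    def fit(self, docs, labels):
        for _ in range(self.epochs):
            for doc, y in zip(docs, labels):
                ids = self._ids(doc)
                h = self._doc_vec(ids)
                # Cross-entropy gradient w.r.t. the logits, then w.r.t. h (before W changes).
                g = [p - (c == y) for p, c in zip(self._softmax(h), self.classes)]
                gh = [sum(g[c] * self.W[c][j] for c in range(len(g))) for j in range(self.dim)]
                for c, gc in enumerate(g):
                    self.W[c] = [w - self.lr * gc * x for w, x in zip(self.W[c], h)]
                for i in ids:  # each averaged embedding receives 1/len(ids) of the gradient
                    self.E[i] = [e - self.lr * d / len(ids) for e, d in zip(self.E[i], gh)]
        return self

# Meta level: a linear blend of both score vectors whose weight is tuned on
# held-out data, i.e. a simplified form of stacking.
def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

def blend(knn, ft, alpha):
    return [alpha * a + (1 - alpha) * b for a, b in zip(knn, ft)]

def fit_blend_weight(knn_rows, ft_rows, labels, classes):
    grid = sorted((i / 10 for i in range(11)), key=lambda a: abs(a - 0.5))  # ties favour 0.5
    best_alpha, best_correct = grid[0], -1
    for alpha in grid:
        correct = sum(classes[argmax(blend(a, b, alpha))] == y
                      for a, b, y in zip(knn_rows, ft_rows, labels))
        if correct > best_correct:
            best_alpha, best_correct = alpha, correct
    return best_alpha

# Toy data with IPC-style labels (illustrative only, not from the paper).
classes = ["A61K", "H04L"]
train = [("pharmaceutical composition for oral drug delivery", "A61K"),
         ("tablet containing active drug compound", "A61K"),
         ("vaccine composition with adjuvant", "A61K"),
         ("network protocol for secure data transmission", "H04L"),
         ("encryption of packets in a wireless network", "H04L"),
         ("routing data packets between network nodes", "H04L")]
heldout = [("oral tablet with drug compound", "A61K"),
           ("secure transmission of network packets", "H04L")]
docs, labels = [d for d, _ in train], [y for _, y in train]
ft = TinyFastText(classes).fit(docs, labels)
knn_rows = [knn_scores(d, docs, labels, classes) for d, _ in heldout]
ft_rows = [ft.predict_proba(d) for d, _ in heldout]
alpha = fit_blend_weight(knn_rows, ft_rows, [y for _, y in heldout], classes)
query = "drug composition in tablet form"
scores = blend(knn_scores(query, docs, labels, classes), ft.predict_proba(query), alpha)
print(alpha, classes[argmax(scores)])  # prints: 0.5 A61K
```

A full stacking setup would usually train the meta-classifier on out-of-fold predictions of the base models rather than on a single held-out split. The single blend weight only keeps the sketch short.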

Similar content being viewed by others

Siamese Neural Networks: An Overview

Impact of word embedding models on text analytics in deep learning environment: a review (Article, 22 February 2023)

A detailed review on word embedding techniques with emphasis on word2vec (Article, 03 October 2023)

Acknowledgments

We are grateful to the reviewers for careful reading of the manuscript and helpful remarks.

Funding

This article presents the research results of the project “Text mining tools for big data” as part of the program supporting Technical Leadership Centers of the National Technological Initiative “Center for Big Data Storage and Processing” at Lomonosov Moscow State University (Agreement with the Fund supporting NTI projects no. 13/1251/2018 of 11.12.2018). The reported study was partially funded by the Russian Foundation for Basic Research (project no. 16-29-12929) and supported by the “RUDN University Program 5–100.”

Author information

Authors and Affiliations

Federal Research Center Computer Science and Control of the Russian Academy of Sciences, Moscow, 119333, Russia
V. V. Yadrintsev & I. V. Sochenkov
Peoples’ Friendship University of Russia (RUDN University), Moscow, 117198, Russia
V. V. Yadrintsev
Lomonosov Moscow State University, Moscow, 119991, Russia
I. V. Sochenkov

Corresponding authors

Correspondence to V. V. Yadrintsev or I. V. Sochenkov.

Additional information

Submitted by Vl. V. Voevodin

About this article

Cite this article

Yadrintsev, V.V., Sochenkov, I.V. The Hybrid Method for Accurate Patent Classification. Lobachevskii J Math 40, 1873–1880 (2019). https://doi.org/10.1134/S1995080219110325

Received: 13 June 2019
Revised: 28 June 2019
Accepted: 14 July 2019
Published: 27 November 2019
Issue Date: November 2019
DOI: https://doi.org/10.1134/S1995080219110325
