Automatic Patents Classification Using Supervised Machine Learning

  • Conference paper
Recent Advances on Soft Computing and Data Mining (SCDM 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 978)


Abstract

Every year, approximately one million patent documents are issued, each with a unique patent number or symbol. To find relevant patent documents, users query the International Patent Classification (IPC) documents using IPC symbols. There is therefore a need for automatic classification and ranking of patent documents with respect to a user query. Automatic classification is only possible through supervised machine learning techniques. In this paper, we classify patent documents using common classifiers. We collected 1625 patent documents belonging to eight different classes from the IPC website, using a web crawler, in the form of unstructured text. We used 90% of the patents for training and the remaining 10% for testing. We built a feature matrix using the tf-idf, SMART-notation and BM25 weighting schemes. This feature matrix is given as input to each classifier, whose output consists of correctly and incorrectly classified instances. Finally, we evaluated the performance of each classifier using precision, recall and F-measure. We performed a comparative analysis of the classifiers and observed that adding more features to each classifier can improve its accuracy.
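The abstract outlines a pipeline (term weighting, a 90/10 train/test split, classification, and evaluation with precision, recall and F-measure) without giving code. The Python sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. This abstract does not name the classifiers used, so a simple nearest-centroid (Rocchio-style) classifier with cosine similarity stands in for them. The tokenizer is plain whitespace splitting, the tf-idf weighting is raw tf times log(N/df) (roughly SMART "ntn"), the BM25 variant uses k1 = 1.2, b = 0.75 and a non-negative idf as in Lucene, and all function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Illustrative sketch only: every function name here is hypothetical and not
# taken from the paper. Documents are lists of lowercase tokens.

def doc_freq(docs):
    """Number of training documents containing each term."""
    return Counter(term for doc in docs for term in set(doc))

def fit_idf(docs):
    """Classic idf = log(N / df), fitted on the training split only."""
    n = len(docs)
    return {t: math.log(n / c) for t, c in doc_freq(docs).items()}

def tfidf(doc, idf):
    """Raw tf x idf (roughly SMART 'ntn'); terms unseen in training are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(doc).items() if t in idf}

def bm25(doc, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 term weights for one document, using a non-negative idf."""
    dl = len(doc)
    weights = {}
    for t, tf in Counter(doc).items():
        if t not in df:
            continue
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
        # k1 saturates repeated terms; b controls document-length normalisation.
        weights[t] = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return weights

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train_centroids(vectors, labels):
    """Nearest-centroid (Rocchio-style) classifier: mean vector per class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            sums[y][t] += w
    return {y: {t: w / counts[y] for t, w in s.items()} for y, s in sums.items()}

def predict(centroids, vec):
    """Assign the class whose centroid has the highest cosine similarity."""
    return max(centroids, key=lambda y: cosine(vec, centroids[y]))

def precision_recall_f1(y_true, y_pred):
    """Per-class (precision, recall, F-measure) from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(yt == c and yp == c for yt, yp in pairs)
        fp = sum(yt != c and yp == c for yt, yp in pairs)
        fn = sum(yt == c and yp != c for yt, yp in pairs)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (p, r, 2 * p * r / (p + r) if p + r else 0.0)
    return scores

def split_90_10(items, seed=0):
    """Shuffle, then keep 90% for training and hold out 10% for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * 9 // 10
    return items[:cut], items[cut:]

if __name__ == "__main__":
    # Invented toy corpus; real input would be crawled IPC patent text.
    train = [("rotor blade turbine", "F"), ("turbine shaft rotor", "F"),
             ("vaccine protein dose", "A"), ("protein dose antibody", "A")]
    docs = [text.split() for text, _ in train]
    idf = fit_idf(docs)
    model = train_centroids([tfidf(d, idf) for d in docs], [y for _, y in train])
    print(predict(model, tfidf("blade turbine".split(), idf)))  # prints F
```

Fitting idf and document frequencies on the training split only, then reusing them for the held-out 10%, keeps test documents from influencing the weights. Passing `bm25` vectors instead of `tfidf` vectors to `train_centroids` is one way to compare weighting schemes, in the spirit of the abstract's use of several schemes; with 1625 documents, `split_90_10` yields 1462 training and 163 test items.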

Author information

Corresponding author

Correspondence to Muhammad Faheem Mushtaq.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Shahid, M., Ahmed, A., Mushtaq, M.F., Ullah, S., Matiullah, Akram, U. (2020). Automatic Patents Classification Using Supervised Machine Learning. In: Ghazali, R., Nawi, N., Deris, M., Abawajy, J. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2020. Advances in Intelligent Systems and Computing, vol 978. Springer, Cham. https://doi.org/10.1007/978-3-030-36056-6_29
