Abstract
Every year, approximately one million patent documents are issued, each with a unique patent number or symbol. To find relevant patent documents, users query the collection using International Patent Classification (IPC) symbols, so there is a need for automatic classification and ranking of patent documents with respect to a user query. Automatic classification is only possible through supervised machine learning techniques. In this paper, we classify patent documents using common classifiers. We collected 1625 patent documents belonging to eight different classes from the IPC website, using a web crawler that retrieves them as unstructured text. We used 90% of the patents as training samples and 10% as test samples. We built a feature matrix using the tf-idf, SMART notation and BM25 weighting schemes. This feature matrix is given to each classifier as input, and the output of each classifier consists of correctly and incorrectly classified instances. Finally, we evaluated each classifier using precision, recall and F-measure. We performed a comparative analysis of the classifiers and observed that adding more features to each classifier can improve its accuracy.
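The weighting and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which tf-idf or BM25 variants were used, so the formulas below (raw term frequency times log(N/df), and a common BM25 form with k1 = 1.2 and b = 0.75) are assumptions, as are the function names and the toy documents.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times log(N / df).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return BM25 term weights per document (k1 and b are assumed defaults)."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalisation: longer-than-average documents are penalised.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        weights.append({
            t: math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            * tf[t] * (k1 + 1) / (tf[t] + norm)
            for t in tf
        })
    return weights


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy corpus: three tokenized "patent documents".
toy_docs = [
    ["patent", "claim", "engine"],
    ["patent", "drug"],
    ["engine", "engine", "fuel"],
]
toy_tfidf = tfidf_weights(toy_docs)
toy_bm25 = bm25_weights(toy_docs)
```

Each per-document dictionary corresponds to one row of the feature matrix; stacking the rows over a shared vocabulary gives the input a classifier would consume. For an eight-class problem such as the one described, `precision_recall_f1` would be called once per class and the results averaged.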
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Shahid, M., Ahmed, A., Mushtaq, M.F., Ullah, S., Matiullah, Akram, U. (2020). Automatic Patents Classification Using Supervised Machine Learning. In: Ghazali, R., Nawi, N., Deris, M., Abawajy, J. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2020. Advances in Intelligent Systems and Computing, vol 978. Springer, Cham. https://doi.org/10.1007/978-3-030-36056-6_29
DOI: https://doi.org/10.1007/978-3-030-36056-6_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36055-9
Online ISBN: 978-3-030-36056-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)