Chapter

Current Challenges in Patent Information Retrieval

Volume 37 of the series The Information Retrieval Series pp 299-324

Patent Classification on Subgroup Level Using Balanced Winnow

  • Eva D’hondt, affiliated with Radboud University Nijmegen and Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur
  • Suzan Verberne, affiliated with Radboud University Nijmegen
  • Nelleke Oostdijk, affiliated with Radboud University Nijmegen
  • Lou Boves, affiliated with Radboud University Nijmegen

Abstract

In the past decade, research into automated patent classification has mainly focused on the higher levels of the International Patent Classification (IPC) hierarchy. The patent community has expressed a need for more precise classification to better aid current pre-classification and retrieval efforts (Benzineb and Guyot, Current challenges in patent information retrieval. Springer, New York, pp 239–261, 2011). In this chapter we investigate the three main difficulties associated with automated classification on the lowest level of the IPC, i.e., the subgroup level. In an effort to improve classification accuracy on this level, we (1) compare flat classification with a two-step hierarchical system that models the IPC hierarchy and (2) examine the impact of combining unigrams with PoS-filtered skipgrams on both the subclass and subgroup levels. We present experiments on English patent abstracts from the well-known WIPO-alpha benchmark data set, as well as from the more realistic CLEF-IP 2010 data set. We find that the flat and hierarchical classification approaches achieve similar performance on a small data set, but that the hierarchical approach is much more feasible under real-life conditions. Additionally, we find that combining unigram and skipgram features leads to similar and highly significant improvements in classification performance (over unigram-only features) on both the subclass and subgroup levels, but only if sufficient training data is available.
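
As an illustration of the feature combination mentioned above, the following is a minimal sketch of how unigrams might be combined with PoS-filtered skipgrams. It is not the authors' implementation: the function name `extract_features`, the tag set `CONTENT_TAGS`, the `max_skip` window, and the choice to leave unigrams unfiltered are all assumptions made for this example, and the input is assumed to be already tokenized and PoS-tagged.

```python
from typing import List, Tuple

# Assumption: coarse PoS tags that pass the filter (hypothetical choice,
# not taken from the chapter).
CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}


def extract_features(tagged: List[Tuple[str, str]], max_skip: int = 2) -> List[str]:
    """Return unigrams plus PoS-filtered skipgrams.

    A skipgram here is an ordered word pair with at most `max_skip`
    intervening tokens; a pair is kept only if both words carry a tag
    in CONTENT_TAGS. Unigrams are left unfiltered (an assumption).
    """
    unigrams = [tok.lower() for tok, _ in tagged]
    skipgrams = []
    for i, (left, left_tag) in enumerate(tagged):
        if left_tag not in CONTENT_TAGS:
            continue
        # Look at the adjacent token and up to max_skip tokens beyond it.
        for j in range(i + 1, min(i + 2 + max_skip, len(tagged))):
            right, right_tag = tagged[j]
            if right_tag in CONTENT_TAGS:
                skipgrams.append(f"{left.lower()}_{right.lower()}")
    return unigrams + skipgrams


# Hypothetical pre-tagged fragment of a patent abstract.
example = [("rotary", "ADJ"), ("valve", "NOUN"), ("for", "ADP"), ("engines", "NOUN")]
features = extract_features(example, max_skip=2)
```

The resulting feature list could then be passed to any bag-of-features text classifier; the skip window trades extra phrase coverage against a larger and sparser feature space.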