IPC Selection Using Collection Selection Algorithms

  • Anastasia Giachanou
  • Michail Salampasis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8849)


In this paper we treat the automated selection of patent classification codes as a collection selection problem, which can be addressed with existing methods that we extend and adapt for the patent domain. Our work exploits the manually assigned International Patent Classification (IPC) codes of patent documents to cluster, distribute and index patents across hundreds or thousands of sub-collections. We examine four collection selection methods (CORI, Bordafuse, ReciRank and multilayer) and compare their effectiveness in selecting relevant IPCs. In addition to the topical relevance of IPCs at a specific level (e.g. sub-class), the multilayer method exploits the topical relevance of their ancestors in the IPC hierarchy and aggregates these multiple relevance estimates into a single estimate. The results show that multilayer outperforms CORI and the fusion-based methods in the task of IPC suggestion.
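To make the idea of aggregating relevance across hierarchy levels concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a reciprocal-rank style fusion score (each IPC is credited with 1/rank for every one of its documents in a ranked list) and a simple weighted blend of sub-class and class scores. The function names, the `weight` parameter, the toy document ids and the linear combination are all illustrative assumptions; the abstract does not specify the actual formulas.

```python
from collections import defaultdict


def recirank_scores(ranked_doc_ids, doc_to_ipc):
    """Score each IPC by the sum of reciprocal ranks of its documents (assumed form)."""
    scores = defaultdict(float)
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        scores[doc_to_ipc[doc_id]] += 1.0 / rank
    return dict(scores)


def normalize(scores):
    """Scale scores so that the best one is 1.0; empty input stays empty."""
    top = max(scores.values(), default=0.0)
    return {k: v / top for k, v in scores.items()} if top > 0 else dict(scores)


def ipc_class(subclass_code):
    """Return the parent class of a sub-class symbol, e.g. 'G06F' -> 'G06'."""
    return subclass_code[:3]


def multilayer_scores(subclass_scores, class_scores, weight=0.5):
    """Blend each sub-class score with its parent class score.

    The linear blend and the default weight are assumptions for illustration.
    """
    sub = normalize(subclass_scores)
    cls = normalize(class_scores)
    return {
        code: weight * s + (1.0 - weight) * cls.get(ipc_class(code), 0.0)
        for code, s in sub.items()
    }


# Hypothetical ranked list of sampled patents for one query patent.
ranked = ["d1", "d2", "d3", "d4"]
doc_to_subclass = {"d1": "G06F", "d2": "H04L", "d3": "G06F", "d4": "G06N"}
doc_to_class = {d: ipc_class(c) for d, c in doc_to_subclass.items()}

subclass_scores = recirank_scores(ranked, doc_to_subclass)
class_scores = recirank_scores(ranked, doc_to_class)
combined = multilayer_scores(subclass_scores, class_scores)

# IPC sub-classes ordered from most to least relevant under the blended score.
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

In this toy example the sub-class G06N has a weak score of its own but moves ahead of H04L because its parent class G06 is strongly supported, which is the kind of effect that exploiting ancestors in the hierarchy is meant to capture.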


Keywords: IPC suggestion, collection selection methods, IPC





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Anastasia Giachanou (1)
  • Michail Salampasis (2)
  1. Faculty of Informatics, University of Lugano, Lugano, Switzerland
  2. Department of Informatics, Alexander Technological Educational Institute of Thessaloniki, Thessaloniki, Greece
