Exploiting Co-Occurrence of Low Frequent Terms in Patents

  • Akmal Saeed Khattak
  • Gerhard Heyer
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 242)


This paper investigates the role of co-occurrence of low frequent terms in patent classification. It compares indexing and weighting of single-term features and of multi-term features based on low frequent terms. Experiments are run on three datasets. Classification accuracy increases by almost 21 percent when multi-term features based on low frequent terms in patents are used, compared with using all word types.
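
As a rough illustration of what a multi-term feature built from low frequent terms might look like, the Python sketch below pairs up terms that fall under a document-frequency cut-off and occur in the same document. The function name, the whitespace tokenisation, the `max_doc_freq` threshold and the toy documents are all assumptions made for illustration; the abstract does not say how the paper defines low frequency or how it constructs its features.

```python
from collections import Counter
from itertools import combinations


def low_frequency_pair_features(docs, max_doc_freq=2):
    """Return, per document, the pairs of co-occurring low-frequency terms.

    Hypothetical helper: the cut-off and the pairing scheme are assumptions,
    not the method described in the paper.
    """
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc.split()))
    features = []
    for doc in docs:
        # Keep only terms whose document frequency is at or below the cut-off.
        rare = sorted({t for t in doc.split() if doc_freq[t] <= max_doc_freq})
        # Each unordered pair of rare terms in the same document is one feature.
        features.append(list(combinations(rare, 2)))
    return features


# Toy documents, invented for this example.
docs = [
    "rotor blade turbine cooling",
    "rotor blade composite",
    "turbine rotor cooling",
]
features = low_frequency_pair_features(docs, max_doc_freq=2)
for doc, pairs in zip(docs, features):
    print(doc, "->", pairs)
```

In this toy input "rotor" appears in every document, so it is excluded from the pairs, while terms such as "blade" and "composite" are combined into a single two-term feature. The resulting pairs could then be indexed and weighted like ordinary single-term features.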


patent classification, co-occurrence, multi-term features





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Department of Computer Science, Faculty of Mathematics and Computer Science, Natural Language Processing Research Group, Leipzig, Germany
