Exploiting Co-Occurrence of Low Frequent Terms in Patents

  • Conference paper
Man-Machine Interactions 3

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 242)

Abstract

This paper investigates the role of co-occurrence of low frequent terms in patent classification. It compares the indexing and weighting of single-term features with multi-term features based on low frequent terms. Experiments are run on three datasets. Classification accuracy increases by almost 21 percent when multi-term features based on low frequent terms in patents are used, compared with when all word types are considered.
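As a rough illustration of what a multi-term feature built from low frequent terms might look like, the sketch below keeps the terms whose document frequency falls at or below a threshold and pairs up those that co-occur in the same document. It is not the authors' method: the abstract does not say how low frequent terms are selected or how they are combined, so the function names, the `max_doc_freq` threshold, the use of unordered pairs, and the toy documents are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def low_frequency_terms(docs, max_doc_freq=2):
    """Return terms that occur in at most max_doc_freq documents.

    The threshold is an assumed parameter, not a value from the paper.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokens))
    return {term for term, df in doc_freq.items() if df <= max_doc_freq}


def multi_term_features(tokens, low_terms):
    """Build unordered pairs of low-frequency terms co-occurring in one document."""
    present = sorted(set(tokens) & low_terms)
    return [" ".join(pair) for pair in combinations(present, 2)]


# Toy documents, invented for illustration only.
docs = [
    "a rotor blade with a composite spar".split(),
    "a rotor assembly with a damping spar".split(),
    "a method for coating a blade".split(),
]

low = low_frequency_terms(docs, max_doc_freq=2)
features = [multi_term_features(tokens, low) for tokens in docs]
```

Each resulting pair, such as `rotor spar`, could then be indexed and weighted like any single-term feature before being passed to a classifier.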


Author information

Corresponding author

Correspondence to Akmal Saeed Khattak.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Khattak, A.S., Heyer, G. (2014). Exploiting Co-Occurrence of Low Frequent Terms in Patents. In: Gruca, D., Czachórski, T., Kozielski, S. (eds) Man-Machine Interactions 3. Advances in Intelligent Systems and Computing, vol 242. Springer, Cham. https://doi.org/10.1007/978-3-319-02309-0_50

  • DOI: https://doi.org/10.1007/978-3-319-02309-0_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02308-3

  • Online ISBN: 978-3-319-02309-0

  • eBook Packages: Engineering, Engineering (R0)
