Exploiting Co-Occurrence of Low Frequent Terms in Patents

Khattak, Akmal Saeed; Heyer, Gerhard

doi:10.1007/978-3-319-02309-0_50

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 242)

Abstract

This paper investigates the role of co-occurrence of low frequent terms in patent classification. Indexing and weighting with single-term features is compared against multi-term features based on low frequent terms. Three datasets are used for experimentation. Classification accuracy increases by almost 21 percent when multi-term features based on low frequent terms in patents are used, compared with using all word types.
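
The abstract does not say how the multi-term features are constructed, so the following is only a minimal sketch of one plausible reading: keep terms with a low document frequency, then pair up those that co-occur in the same document. The threshold `max_df`, the pairing scheme, the function names, and the toy documents are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def low_frequency_terms(docs, max_df=2):
    """Return terms whose document frequency is at most max_df (assumed threshold)."""
    # Count each term once per document, not once per occurrence.
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, count in df.items() if count <= max_df}


def cooccurrence_features(doc, low_terms):
    """Build pair features from the low-frequency terms present in one document."""
    # Sorting makes each unordered pair map to a single feature name.
    present = sorted(set(doc) & low_terms)
    return [f"{a}_{b}" for a, b in combinations(present, 2)]


# Toy tokenized documents (hypothetical, not taken from the paper's datasets).
docs = [
    ["rotor", "blade", "turbine", "method", "device"],
    ["rotor", "blade", "pump", "method", "device"],
    ["valve", "pump", "method", "device", "system"],
]

low_terms = low_frequency_terms(docs, max_df=2)
features = [cooccurrence_features(doc, low_terms) for doc in docs]

for doc_features in features:
    print(doc_features)
```

In this sketch the terms "method" and "device" appear in every document and are therefore excluded, while pairs such as "blade_rotor" survive as multi-term features that could be weighted and passed to a classifier.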

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Mathematics and Computer Science, Natural Language Processing Research Group, Leipzig, Germany
Akmal Saeed Khattak & Gerhard Heyer

Corresponding author

Correspondence to Akmal Saeed Khattak.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Dr. Aleksandra Gruca
Polish Academy of Sciences and Silesian University of Technology, Gliwice, Poland
Tadeusz Czachórski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski

Cite this paper

Khattak, A.S., Heyer, G. (2014). Exploiting Co-Occurrence of Low Frequent Terms in Patents. In: Gruca, A., Czachórski, T., Kozielski, S. (eds) Man-Machine Interactions 3. Advances in Intelligent Systems and Computing, vol 242. Springer, Cham. https://doi.org/10.1007/978-3-319-02309-0_50

DOI: https://doi.org/10.1007/978-3-319-02309-0_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02308-3
Online ISBN: 978-3-319-02309-0
eBook Packages: Engineering, Engineering (R0)
