Abstract
This paper investigates the role of co-occurrence of low frequent terms in patent classification. A comparison is made between indexing, weighting single term features and multi-term features based on low frequent terms. Three datasets are used for experimentation. An increase of almost 21 percent in classification accuracy is observed through experimentation when multi-term features based on low frequent terms in patents are considered as compared to when all word types are considered.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2), 207–216 (1993)
Bashir, S., Baig, A.R.: Ramp: high performance frequent itemset mining with efficient bit-vector projection technique. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 504–508. Springer, Heidelberg (2006)
Ceci, M., Malerba, D.: Classifying web documents in a hierarchy of categories: a comprehensive study. Journal of Intelligent Information Systems 28(1), 37–78 (2007)
Chakrabarti, S., Dom, B., Indyk, P.: Enhanced hypertext categorization using hyperlinks. ACM SIGMOD Record 27(2), 307–318 (1998)
Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2(3), 27:1–27:27 (2011)
Dumais, S., Chen, H.: Hierarchical classification of Web content. In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2000), pp. 256–263. ACM, New York (2000)
Fall, C.J., Törcsvári, A., Benzineb, K., Karetka, G.: Automated categorization in the international patent classification. ACM SIGIR Forum 37(1), 10–25 (2003)
Han, E.H., Karypis, G.: Centroid-based document classification: Analysis and experimental results. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 424–431. Springer, Heidelberg (2000)
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. ACM SIGMOD Record 29(2), 1–12 (2000)
Joachims, T.: Text categorization with support vector machines: Learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)
Khattak, A.S., Heyer, G.: Significance of low frequent terms in patent classification using ipc hierarchy. In: Eichler, G., Küpper, A., Schau, V., Fouchal, H., Unger, H. (eds.) Proceedings of the 11th International Conference on Innovative Internet Community Services (IICS 2011). Lecture Notes in Informatics, vol. P-186, pp. 239–250. Gesellschaft für Informatik (2011)
Larkey, L.S.: A patent search and classification system. In: Proceedings of the 4th ACM Conference on Digital Libraries (DL 1999), pp. 179–187. ACM, New York (1999)
Lewis, D.D.: Naive (bayes) at forty: The independence assumption in information retrieval. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 4–15. Springer, Heidelberg (1998)
Lewis, D.D., Ringuette, M.: A comparison of two learning algorithms for text categorization. In: Proceedings of the 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 81–93 (1994)
Lupu, M., Huang, J., Zhu, J., Tait, J.: TREC-CHEM: large scale chemical information retrieval evaluation at TREC. ACM SIGIR Forum 43(2), 63–70 (2009)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24(5), 513–523 (1988)
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18(11), 613–620 (1975)
Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34(1), 1–47 (2002)
Sebastiani, F.: Text categorization. In: Zanasi, A. (ed.) Text Mining and its Applications to Intelligence, CRM and Knowledge Management, pp. 109–129. WIT Press, Southampton (2005)
Tikk, D., Biró, G., Yang, J.D.: Experiments with a hierarchical text categorization method on wipo patent collections. In: Applied Research in Uncertainty Modelling and Analysis. International Series in Intelligent Technologies, vol. 20, pp. 283–302. Springer (2005)
Vapnik, V.N.: The nature of statistical learning theory, 1st edn. Springer, New York (1995)
Yang, Y.: An evaluation of statistical approaches to text categorization. Journal of Information Retrieval 1(1-2), 69–90 (1999)
Yang, Y., Liu, X.: A re-examination of text categorization methods. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1999), pp. 42–49. ACM Press (1999)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Khattak, A.S., Heyer, G. (2014). Exploiting Co-Occurrence of Low Frequent Terms in Patents. In: Gruca, D., Czachórski, T., Kozielski, S. (eds) Man-Machine Interactions 3. Advances in Intelligent Systems and Computing, vol 242. Springer, Cham. https://doi.org/10.1007/978-3-319-02309-0_50
Download citation
DOI: https://doi.org/10.1007/978-3-319-02309-0_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02308-3
Online ISBN: 978-3-319-02309-0
eBook Packages: EngineeringEngineering (R0)