A Semantic Kernel for Text Classification Based on Iterative Higher–Order Relations between Words and Documents

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8467)

Abstract

We propose a semantic kernel for Support Vector Machines (SVM) that takes advantage of higher-order relations between words and between documents. The conventional approach in text categorization systems is to represent documents as a “Bag of Words” (BOW), in which the relations between the words and their positions are lost. Additionally, traditional machine learning algorithms assume that instances, in our case documents, are independent and identically distributed. This assumption simplifies the underlying models, but it ignores the semantic connections between words as well as the semantic relations between documents that stem from the words. In this study, we improve the semantic knowledge capture capability of a previous work [1], the χ-Sim algorithm, and use this method in the SVM as a semantic kernel. The proposed approach is evaluated on different benchmark textual datasets. Experimental results show that classification performance improves over the well-known traditional kernels used in the SVM, such as the linear kernel (one of the state-of-the-art algorithms for text classification systems), the polynomial kernel, and the Radial Basis Function (RBF) kernel.
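The idea of a kernel built from higher-order relations can be illustrated with a small sketch. The code below is not the authors' method and not the exact χ-Sim formulation of [1]; it is a simplified illustration that assumes document and word similarities are refined alternately from a term-count matrix, normalized by row sums, and that the resulting word similarity matrix S is plugged into a kernel of the form K = D S Dᵀ. All function names, the normalization, and the toy data are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def normalized_similarity(data, inner):
    """Compute data * inner * data^T, scale entry (i, j) by the product of
    the row sums of rows i and j, and force the diagonal to 1.
    This normalization is an assumption of the sketch."""
    raw = matmul(matmul(data, inner), transpose(data))
    sums = [sum(row) for row in data]
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                sim[i][j] = 1.0
            elif sums[i] > 0 and sums[j] > 0:
                sim[i][j] = raw[i][j] / (sums[i] * sums[j])
    return sim


def iterative_similarity(docs, iterations=2):
    """Alternately refine document and word similarities.
    Each pass lets similarity flow along longer word-document paths."""
    word_sim = identity(len(docs[0]))
    doc_sim = identity(len(docs))
    for _ in range(iterations):
        doc_sim = normalized_similarity(docs, word_sim)
        word_sim = normalized_similarity(transpose(docs), doc_sim)
    return doc_sim, word_sim


def semantic_kernel(docs, word_sim):
    """Gram matrix K = D * S * D^T for a word similarity matrix S."""
    return matmul(matmul(docs, word_sim), transpose(docs))


# Toy term-count matrix (rows: documents, columns: words).
# Documents 0 and 2 share no word, but both overlap with document 1.
docs = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]

linear = matmul(docs, transpose(docs))   # plain BOW linear kernel
_, word_sim = iterative_similarity(docs)
semantic = semantic_kernel(docs, word_sim)

print("linear   K[0][2] =", linear[0][2])
print("semantic K[0][2] =", semantic[0][2])
```

In this toy corpus the linear kernel assigns zero similarity to documents 0 and 2 because they have no word in common, whereas the sketched semantic kernel assigns them a positive value through their shared neighbour, document 1. A Gram matrix of this form is positive semi-definite only when S is; the sketch does not check this, so it should be verified before passing such a matrix to an SVM implementation that accepts precomputed kernels.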

Keywords

  • machine learning
  • support vector machine
  • text classification
  • higher-order paths
  • semantic kernel

References

  1. Bisson, G., Hussain, F.: Chi-Sim: A New Similarity Measure for the Co-clustering Task. In: Proceedings of the 2008 Seventh International Conference on Machine Learning and Applications, pp. 211–217 (2008)

  2. Wang, P., Domeniconi, C.: Building Semantic Kernels for Text Classification using Wikipedia. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 713–721. ACM Press, New York (2008)

  3. Ganiz, M.C., Lytkin, N.I., Pottenger, W.M.: Leveraging Higher Order Dependencies between Features for Text Classification. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009, Part I. LNCS, vol. 5781, pp. 375–390. Springer, Heidelberg (2009)

  4. Ganiz, M.C., George, C., Pottenger, W.M.: Higher Order Naive Bayes: A Novel Non-IID Approach to Text Classification. IEEE Transactions on Knowledge and Data Engineering 23(7), 1022–1034 (2011)

  5. Poyraz, M., Kilimci, Z.H., Ganiz, M.C.: Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification. Journal of Computer Science and Technology (accepted, 2014)

  6. Poyraz, M., Kilimci, Z.H., Ganiz, M.C.: A Novel Semantic Smoothing Method Based on Higher Order Paths for Text Classification. In: IEEE International Conference on Data Mining (ICDM), Brussels, Belgium (2012)

  7. Altinel, B., Ganiz, M.C., Diri, B.: A Novel Higher-order Semantic Kernel. In: ICECCO 2013 (The 10th International Conference on Electronics Computer and Computation), Ankara, Turkey, November 7-9 (2013)

  8. Joachims, T.: Text Categorization with Many Relevant Features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)

  9. Dumais, S., Platt, J., Heckerman, D., Sahami, M.: Inductive learning algorithms and representations for text categorization. In: Proceedings of the Seventh International Conference on Information Retrieval and Knowledge Management (ACM-CIKM 1998), pp. 148–155 (1998)

  10. Siolas, G., D’Alché-Buc, F.: Support Vector Machines Based on a Semantic Kernel for Text Categorization. In: Proceedings of the International Joint Conference on Neural Networks. IEEE Press, Como (2000)

  11. Leopold, E., Kindermann, J.: Text Categorization with Support Vector Machines. How to Represent Texts in Input Space? Machine Learning 46, 423–444 (2002)

  12. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A Training Algorithm for Optimal Margin Classifiers. In: Proc. 5th ACM Workshop, Comput. Learning Theory, Pittsburgh, pp. 144–152 (1992)

  13. Hsu, C.W., Lin, C.J.: A Comparison of Methods for Multi-Class Support Vector Machines. IEEE Transactions on Neural Networks 13(2), 415–425 (2002)

  14. Bloehdorn, S., Basili, R., Cammisa, M., Moschitti, A.: Semantic kernels for text classification based on topological measures of feature similarity. In: ICDM 2006: Proceedings of the Sixth International Conference on Data Mining, pp. 808–812 (2006)

  15. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: Five Papers on WordNet. Technical report, Stanford University (1993)

  16. Miller, Q., Chen, E., Xiong, H.: A Semantic Term Weighting Scheme for Text Categorization. Journal of Expert Systems with Applications (2011)

  17. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing & Management 24(5) (1988)

  18. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann (2012)

  19. Dumais, S.: LSI meets TREC: A status report. In: Hartman, D. (ed.) The First Text Retrieval Conference: NIST Special Publication 500-215, pp. 105–116 (1993)

  20. Kontostathis, A., Pottenger, W.M.: A Framework for Understanding LSI Performance. Information Processing & Management, 56–73 (2006)

  21. Witten, H.I., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann (1999)

  22. Platt, J.C.: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT Press (1998)

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Altinel, B., Ganiz, M.C., Diri, B. (2014). A Semantic Kernel for Text Classification Based on Iterative Higher–Order Relations between Words and Documents. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8467. Springer, Cham. https://doi.org/10.1007/978-3-319-07173-2_43

  • DOI: https://doi.org/10.1007/978-3-319-07173-2_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07172-5

  • Online ISBN: 978-3-319-07173-2

  • eBook Packages: Computer Science, Computer Science (R0)