Text Classification Using Novel “Anti-Bayesian” Techniques

  • Conference paper

Computational Collective Intelligence

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9329)

Abstract

This paper presents a non-traditional “Anti-Bayesian” solution to the traditional Text Classification (TC) problem. Historically, all recorded TC schemes have followed the same fundamental paradigm: once the statistical features are inferred from the syntactic/semantic indicators, the classifiers themselves are the well-established statistical ones. In this paper, we demonstrate that, by virtue of the skewed distributions of the features, one can advantageously work with information latent in certain “non-central” quantiles (i.e., those distant from the mean) of the distributions. We demonstrate that such classifiers exist and are attainable, and show that they can be designed and implemented within the recently introduced paradigm of Quantile Statistics (QS)-based classifiers. These classifiers, referred to as Classification by Moments of Quantile Statistics (CMQS), are essentially “Anti-Bayesian” in their modus operandi. To achieve our goal, we demonstrate the power and potential of CMQS to describe the very high-dimensional TC-related vector spaces in terms of a limited number of “outlier-based” statistics. Thereafter, the Pattern Recognition (PR) task in classification invokes the CMQS classifier for the underlying multi-class problem by using a linear number of pair-wise CMQS-based classifiers. Through rigorous testing on the standard 20-Newsgroups corpus, we show that CMQS-based TC attains accuracy comparable to that of the best-reported classifiers. We also discuss the potential of fusing the results of a CMQS-based method with those obtained from a traditional scheme.
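
To make the idea of classifying with “non-central” quantiles concrete, the following is a minimal sketch for a single feature and two classes. It is not the authors' implementation: the choice of the 1/3 and 2/3 quantiles, the function names `fit_pair` and `classify`, and the toy data are all assumptions made for illustration. Each class is represented by a quantile that lies away from its centre and towards the other class, and a test value is assigned to the class whose reference quantile is nearest.

```python
from statistics import median, quantiles


def fit_pair(class_a, class_b):
    """Pick one reference quantile per class for a single feature.

    Assumption for this sketch: the class with the smaller median is
    represented by its 2/3 quantile, and the other class by its 1/3
    quantile, so both reference points sit away from the class centres.
    Returns ((label, reference), (label, reference)), lower class first.
    """
    low, high = ("a", class_a), ("b", class_b)
    if median(class_a) > median(class_b):
        low, high = high, low
    # quantiles(..., n=3) returns the two tertile cut points [1/3, 2/3].
    ref_low = quantiles(low[1], n=3, method="inclusive")[1]
    ref_high = quantiles(high[1], n=3, method="inclusive")[0]
    return (low[0], ref_low), (high[0], ref_high)


def classify(x, model):
    """Assign x to the class whose reference quantile is nearest."""
    (label_low, ref_low), (label_high, ref_high) = model
    return label_low if abs(x - ref_low) <= abs(x - ref_high) else label_high


# Hypothetical toy data: two overlapping one-dimensional samples.
model = fit_pair(list(range(0, 10)), list(range(6, 16)))
print(model)
print(classify(5, model), classify(10, model))
```

A multi-class, high-dimensional text classifier of the kind described in the abstract would combine many such pair-wise comparisons; how the paper does so is not specified on this page, so the sketch stops at the two-class, one-feature case.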

The authors are grateful for the partial support provided by NSERC, the Natural Sciences and Engineering Research Council of Canada.

B. John Oommen: Chancellor’s Professor; Fellow of the IEEE and Fellow of the IAPR. This author is also an Adjunct Professor with the University of Agder in Grimstad, Norway.

Corresponding author

Correspondence to B. John Oommen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Oommen, B.J., Khoury, R., Schmidt, A. (2015). Text Classification Using Novel “Anti-Bayesian” Techniques. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9329. Springer, Cham. https://doi.org/10.1007/978-3-319-24069-5_1

  • DOI: https://doi.org/10.1007/978-3-319-24069-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24068-8

  • Online ISBN: 978-3-319-24069-5

  • eBook Packages: Computer Science (R0)
