Normalized table-matching algorithm as approach to text categorization

  • Focus
  • Published in Soft Computing

Abstract

This research presents an improved version of the table-based matching algorithm as an approach to text categorization. It is intended to address two issues: the three problems that arise when texts are encoded into numerical vectors, and the unstable performance of the previous version, which fluctuated with text length. We encode texts into tables rather than numerical vectors, define a similarity measure between two tables whose value is always normalized between zero and one, and apply it to text categorization tasks. We expect better performance from solving the three problems of encoding texts into numerical vectors, and more stable performance from improving on the previous version. We empirically validate the proposed approach through four sets of experiments, with respect to both performance and stability.
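The abstract does not state how a table is built or how the similarity is computed, so the following is only a minimal sketch of the general idea. It assumes that a table maps each word to its frequency in the text, and it uses a weighted Jaccard ratio as a stand-in for the normalized measure; both choices, along with the names `encode_table`, `table_similarity`, and `categorize`, are illustrative assumptions rather than the paper's definitions. The ratio of shared weight to total weight stays between zero and one regardless of text length, which is the property the abstract emphasizes.

```python
import re
from collections import Counter


def encode_table(text):
    """Encode a text as a table mapping each word to its frequency.

    Assumption: plain term frequency is used as the weight.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def table_similarity(a, b):
    """Return a similarity in [0, 1] between two word-frequency tables.

    Assumption: weighted Jaccard (shared weight divided by total weight),
    used here only as one example of a normalized measure.
    """
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    shared = sum(min(a[k], b[k]) for k in keys)  # Counter yields 0 for missing words
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


def categorize(text, labeled_tables):
    """Return the label of the most similar labeled table (1-nearest neighbor)."""
    query = encode_table(text)
    best = max(labeled_tables, key=lambda item: table_similarity(query, item[1]))
    return best[0]


# Hypothetical toy data, for illustration only.
training = [
    ("sports", encode_table("the team won the match")),
    ("finance", encode_table("the bank raised interest rates")),
]
print(categorize("the match was won by the home team", training))
```

Because the score is a ratio, a long text and a short text with the same word proportions overlap less than two identical texts but can never exceed a similarity of one, so scores remain comparable across texts of different lengths.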

Author information

Correspondence to Taeho Jo.

Additional information

Communicated by J.-W. Jung.

About this article

Cite this article

Jo, T. Normalized table-matching algorithm as approach to text categorization. Soft Comput 19, 839–849 (2015). https://doi.org/10.1007/s00500-014-1411-9

  • DOI: https://doi.org/10.1007/s00500-014-1411-9