An Adaptive Fuzzy kNN Text Classifier

  • Wenqian Shang
  • Houkuan Huang
  • Haibin Zhu
  • Yongmin Lin
  • Youli Qu
  • Hongbin Dong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3993)

Abstract

In recent years, the kNN algorithm has received considerable attention from researchers and has proven to be one of the best-performing text categorization algorithms. Text categorization uses a training set of documents with assigned class labels to decide which class an unlabeled new document belongs to. The kNN algorithm still has several issues that need further study, such as improving the decision rule, selecting the value of k, selecting dimensions (i.e., feature set selection), handling multi-class text categorization, and improving the algorithm's execution efficiency in time and space. In this paper, we focus mainly on improving the decision rule and on dimension selection. We design an adaptive fuzzy kNN text classifier, where "adaptive" refers to the adaptive selection of dimensions. The experimental results show that our algorithm is effective and feasible.
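
The abstract does not spell out the fuzzy decision rule itself, so the following is only a minimal illustrative sketch of a generic fuzzy kNN decision rule, not the authors' exact method: each of the k nearest training documents votes for every class in proportion to its fuzzy class membership and the inverse of its distance. All function and parameter names, the use of Euclidean distance, and the distance-weighting exponent are assumptions made for illustration.

```python
import numpy as np

def fuzzy_knn_classify(x, train_vectors, train_memberships, k=15, m=2.0):
    """Generic fuzzy kNN decision rule (illustrative sketch only).

    x                 : 1-D feature vector of the unlabeled document
    train_vectors     : (n_docs, n_features) matrix of training documents
    train_memberships : (n_docs, n_classes) fuzzy class memberships in [0, 1]
    k                 : number of nearest neighbors consulted
    m                 : fuzzifier controlling how strongly distance is weighted
    """
    # Distance from the new document to every training document.
    dists = np.linalg.norm(train_vectors - x, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest neighbors

    # Inverse-distance weights 1 / d^(2/(m-1)), with a small epsilon to
    # avoid division by zero when a neighbor matches the query exactly.
    w = 1.0 / (dists[nn] ** (2.0 / (m - 1.0)) + 1e-12)

    # Each class score is the distance-weighted average of the neighbors'
    # fuzzy memberships in that class; the largest score wins.
    scores = (w[:, None] * train_memberships[nn]).sum(axis=0) / w.sum()
    return int(np.argmax(scores)), scores
```

In a text setting, train_vectors would typically be tf-idf vectors restricted to the selected feature dimensions (with cosine similarity as a common alternative to Euclidean distance), and train_memberships could be crisp 0/1 labels softened by a fuzzy membership function; the paper's adaptive dimension selection would determine which features those vectors keep.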

Keywords

Decision Rule · Categorization Performance · Text Categorization · Dimension Selection · Text Categorization Method

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wenqian Shang (1)
  • Houkuan Huang (1)
  • Haibin Zhu (2)
  • Yongmin Lin (1)
  • Youli Qu (1)
  • Hongbin Dong (1)

  1. School of Computer and Information Technology, Beijing Jiaotong University, China
  2. Senior Member, IEEE, Dept. of Computer Science, Nipissing University, North Bay, Canada