Difference-Similitude Matrix in Text Classification

  • Xiaochun Huang
  • Ming Wu
  • Delin Xia
  • Puliu Yan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3614)

Abstract

Text classification can greatly improve the performance of information retrieval and information filtering, but the high dimensionality of documents hampers most classification approaches. This paper proposes a Difference-Similitude Matrix (DSM) based method to address the problem. The method represents a pre-classified collection as an item-document matrix, in which documents in the same category are described by their similarities and documents in different categories by their differences. Using the DSM reduction algorithm, which is simpler and more efficient than rough set reduction, we reduce the dimensionality of the document space and generate rules for text classification.
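The pairwise idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy terms, documents, and the names `build_dsm` and `greedy_term_subset` are hypothetical, binary term occurrence is an assumed simplification, and the greedy cover is a generic stand-in for the DSM reduction algorithm, which the abstract does not spell out. For each pair of documents, the entry holds the terms they share a value on when the categories match, and the terms they differ on otherwise.

```python
from itertools import combinations

# Hypothetical toy collection: binary term occurrence per document plus a category.
TERMS = ["goal", "match", "vote", "party"]
DOCS = [
    ({"goal": 1, "match": 1, "vote": 0, "party": 0}, "sport"),
    ({"goal": 1, "match": 0, "vote": 0, "party": 0}, "sport"),
    ({"goal": 0, "match": 0, "vote": 1, "party": 1}, "politics"),
    ({"goal": 0, "match": 1, "vote": 1, "party": 0}, "politics"),
]


def build_dsm(docs, terms):
    """Return {(i, j): (kind, term_set)} for every document pair with i < j."""
    dsm = {}
    for i, j in combinations(range(len(docs)), 2):
        (feat_i, cat_i), (feat_j, cat_j) = docs[i], docs[j]
        if cat_i == cat_j:
            # Same category: record the terms on which the two documents agree.
            shared = {t for t in terms if feat_i[t] == feat_j[t]}
            dsm[(i, j)] = ("similar", shared)
        else:
            # Different categories: record the terms on which they disagree.
            differing = {t for t in terms if feat_i[t] != feat_j[t]}
            dsm[(i, j)] = ("different", differing)
    return dsm


def greedy_term_subset(dsm):
    """Generic greedy cover (an assumption, not the paper's reduction):
    pick terms until every cross-category pair is distinguished."""
    pending = [s for kind, s in dsm.values() if kind == "different" and s]
    chosen = []
    while pending:
        candidates = sorted(set().union(*pending))
        best = max(candidates, key=lambda t: sum(t in s for s in pending))
        chosen.append(best)
        pending = [s for s in pending if best not in s]
    return chosen


if __name__ == "__main__":
    matrix = build_dsm(DOCS, TERMS)
    for pair, (kind, term_set) in sorted(matrix.items()):
        print(pair, kind, sorted(term_set))
    print("selected terms:", greedy_term_subset(matrix))
```

On this toy data a single term is enough to separate every cross-category pair, which is the kind of dimensionality reduction the abstract refers to; real collections would need weighted term values and a consistency check against the similarity entries.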

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Xiaochun Huang (1)
  • Ming Wu (1)
  • Delin Xia (1)
  • Puliu Yan (1)
  1. School of Electronic Information, Wuhan University, Wuhan, China
