Multiple Sets of Rules for Text Categorization

  • Yaxin Bi
  • Terry Anderson
  • Sally McClean
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3261)


This paper concerns how multiple sets of rules can be generated using a rough sets-based inductive learning method and how they can be combined for text categorization using Dempster’s rule of combination. We first propose a boosting-like technique for generating multiple sets of rules based on rough set theory, and then model the outcomes inferred from these rules as pieces of evidence. Experiments have been carried out on 10 of the 20 newsgroups in the benchmark 20-newsgroups collection, both individually and in combination. Our experimental results support the claim that “k experts may be better than any one if their individual judgements are appropriately combined”.
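To make the combination step concrete, here is a minimal sketch of Dempster’s rule of combination, which the paper uses to fuse evidence from multiple rule sets. The representation below (mass functions as dictionaries mapping sets of category labels to belief masses, and the example categories) is an illustrative assumption, not the authors’ implementation.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of hypotheses
    (here, category labels) to a mass in [0, 1], summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Intersecting focal elements reinforce each other.
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            # Mass that would fall on the empty set is conflict.
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("Total conflict: evidence cannot be combined")
    norm = 1.0 - conflict  # renormalize by discarding conflicting mass
    return {h: m / norm for h, m in combined.items()}

# Hypothetical evidence from two rule sets over categories {sci, rec}:
m1 = {frozenset({"sci"}): 0.6, frozenset({"sci", "rec"}): 0.4}
m2 = {frozenset({"sci"}): 0.7, frozenset({"rec"}): 0.2,
      frozenset({"sci", "rec"}): 0.1}
m12 = dempster_combine(m1, m2)
```

After combination the mass concentrated on {sci} grows, since both pieces of evidence point toward it; the final category can then be chosen as the singleton hypothesis with the highest combined mass.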





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Yaxin Bi (1, 2)
  • Terry Anderson (3)
  • Sally McClean (3)
  1. School of Computer Science, Queen’s University of Belfast, Belfast, UK
  2. School of Biomedical Science, University of Ulster, Coleraine, Londonderry, UK
  3. Faculty of Engineering, University of Ulster, Newtownabbey, Co. Antrim, UK
