Artificial Intelligence Review, Volume 26, Issue 3, pp 191–209

Combining rough decisions for intelligent text mining using Dempster’s rule

Abstract

An important issue in text mining is how to exploit multiple pieces of discovered knowledge to improve future decisions. In this paper, we propose a new approach to combining multiple sets of rules for text categorization using Dempster’s rule of combination. We develop a boosting-like technique for generating multiple sets of rules based on rough set theory, and we model the classification decisions made by these rule sets as pieces of evidence that can be combined by Dempster’s rule of combination. We apply these methods to 10 of the 20 newsgroups, a benchmark data collection (Baker and McCallum 1998), individually and in combination. Our experimental results show that the best combination of the multiple rule sets performs statistically significantly better on the 10 groups of the benchmark data than the best single rule set. A comparative analysis between the Dempster–Shafer and majority voting (MV) methods, together with an overfitting study, confirms the advantage and robustness of our approach.
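The combination step described above rests on Dempster’s rule: each classifier’s decision is expressed as a mass function over subsets of the set of categories (the frame of discernment), and two mass functions are merged by multiplying masses of intersecting focal elements and renormalizing by the non-conflicting mass. The following is a minimal generic sketch of that rule, not the authors’ implementation; the category names and mass values are purely illustrative.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    m1, m2: dicts mapping focal elements (frozensets of categories)
    to masses; each dict's masses should sum to 1.
    """
    combined = {}
    conflict = 0.0  # K: total mass of conflicting (empty-intersection) pairs
    for (b, wb), (c, wc) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + wb * wc
        else:
            conflict += wb * wc
    if conflict >= 1.0:
        raise ValueError("total conflict: combination undefined")
    # Normalize by 1 - K so the combined masses again sum to 1
    return {a: w / (1.0 - conflict) for a, w in combined.items()}

# Two hypothetical classifiers giving evidence over categories
# {sport, politics}; theta is the whole frame (ignorance).
theta = frozenset({"sport", "politics"})
m1 = {frozenset({"sport"}): 0.8, theta: 0.2}
m2 = {frozenset({"sport"}): 0.6, frozenset({"politics"}): 0.3, theta: 0.1}
m = combine(m1, m2)
# Agreement on "sport" is reinforced: its combined mass (0.68/0.76)
# exceeds the mass either classifier assigned to it alone.
```

Extending to more than two rule sets amounts to folding `combine` over the list of mass functions, since the rule is commutative and associative.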

Keywords

Rule induction · Text mining · Rough set · Dempster’s rule of combination

References

  1. Aphinyanaphongs Y, Aliferis CF (2003) Text categorization models for retrieval of high quality articles in internal medicine. In: Proceedings of the American Medical Informatics Association (AMIA) annual symposium, Washington, DC, USA, pp 31–35
  2. Apte C, Damerau F, Weiss S (1994) Automated learning of decision rules for text categorization. ACM Trans Inf Syst 12(3): 233–251
  3. Baker D, McCallum A (1998) Distributional clustering of words for text classification. In: Proceedings of the 21st ACM international conference on research and development in information retrieval, pp 96–103
  4. Bi Y (2004) Combining multiple classifiers for text categorization using Dempster’s rule of combination. PhD dissertation, University of Ulster
  5. Bi Y, Anderson T, McClean S (2004a) Combining rules for text categorization using Dempster’s rule of combination. In: Proceedings of the 5th international conference on intelligent data engineering and automated learning. LNCS 3177, Springer-Verlag, pp 457–463
  6. Bi Y, Bell D, Guan JW (2004b) Combining evidence from classifiers in text categorization. In: Proceedings of the 8th international conference on knowledge-based intelligent information & engineering systems. LNCS 3215, Springer, pp 521–528
  7. Chouchoulas A, Shen Q (2001) Rough set-aided keyword reduction for text categorization. Appl Artif Intell 15(9): 843–873
  8. Cohen WW, Singer Y (1999) A simple, fast, and effective rule learner. In: Proceedings of the annual conference of the American Association for Artificial Intelligence, pp 335–342
  9. Denoeux T (2000) A neural network classifier based on Dempster–Shafer theory. IEEE Trans Syst Man Cybern A 30(2): 131–150
  10. Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Machine learning: proceedings of the thirteenth international conference, pp 148–156
  11. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1): 119–139
  12. Friedman J, Hastie T, Tibshirani R (1998) Additive logistic regression: a statistical view of boosting. Technical report, Stanford University Statistics Department. http://www.stat-stanford.edu/~tibs
  13. Grzymala-Busse J (1992) LERS—a system for learning from examples based on rough sets. In: Slowinski R (ed) Intelligent decision support. Kluwer Academic, pp 3–17
  14. Guan JW, Bell D (1998) Rough computational methods for information systems. Artif Intell 105: 77–103
  15. Kittler J, Hatef M, Duin RPW, Matas J (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20(3): 226–239
  16. Kuncheva L (2001) Combining classifiers: soft computing solutions. In: Pal SK, Pal A (eds) Pattern recognition: from classical to modern approaches. World Scientific, pp 427–451
  17. Lam L (2000) Classifier combinations: implementation and theoretical issues. In: Kittler J, Roli F (eds) Multiple classifier systems. LNCS 1857, Springer, pp 78–86
  18. Mitchell T (1999) Machine learning and data mining. Commun ACM 42(11): 31–36
  19. Nardiello P, Sebastiani F, Sperduti A (2003) Discretizing continuous attributes in AdaBoost for text categorization. In: Proceedings of the 25th European conference on information retrieval. LNCS 2633, Springer-Verlag, Berlin, pp 320–334
  20. Opitz D, Maclin R (1999) Popular ensemble methods: an empirical study. J Artif Intell Res 11: 169–198
  21. Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer Academic
  22. Quinlan JR (1996) Bagging, boosting, and C4.5. In: Proceedings of the thirteenth national conference on artificial intelligence, pp 725–730
  23. Schapire RE, Singer Y (2000) BoosTexter: a boosting-based system for text categorization. Mach Learn 39(2/3): 135–168
  24. Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton
  25. Skowron A, Grzymala-Busse J (1994) From rough set theory to evidence theory. In: Yager R, Fedrizzi M, Kacprzyk J (eds) Advances in the Dempster–Shafer theory of evidence. Wiley, New York, pp 193–236
  26. Tumer K, Ghosh J (2002) Robust combining of disparate classifiers through order statistics. Pattern Anal Appl 6(1): 41–46
  27. Xu L, Krzyzak A, Suen CY (1992) Several methods for combining multiple classifiers and their applications in handwritten character recognition. IEEE Trans Syst Man Cybern 22(3): 418–435
  28. Yao YY, Lingras PJ (1998) Interpretations of belief functions in the theory of rough sets. Inf Sci 104(1–2): 81–106
  29. Yang Y (1999) An evaluation of statistical approaches to text categorization. J Inf Retr 1(1/2): 67–88
  30. van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworths
  31. Weiss S, Kulikowski C (1991) Computer systems that learn: classification and prediction methods from statistics, neural nets, machine learning, and expert systems. Morgan Kaufmann
  32. Weiss SM, Indurkhya N (2000) Lightweight rule induction. In: Proceedings of the seventeenth international conference on machine learning, pp 1135–1142
  33. Whitaker CJ, Kuncheva L (2003) Examining the relationship between majority vote accuracy and diversity in bagging and boosting. Technical report, University of Wales, Bangor

Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  1. School of Computing and Mathematics, University of Ulster, Newtownabbey, Antrim, Northern Ireland, UK
  2. School of Computing and Information Engineering, University of Ulster, Londonderry, Northern Ireland, UK
