Quality Measures and Semi-automatic Mining of Diagnostic Rule Bases
Semi-automatic data mining approaches often yield better results than purely automatic methods because the user's goals are integrated early. For example, in the medical domain, experts are likely to favor simpler models over more complex ones. The accuracy of the discovered patterns is then often not the only criterion to consider. In addition, the simplicity of the discovered knowledge is of prime importance, since it directly relates to the understandability and interpretability of the learned knowledge.
In this paper, we present quality measures that consider the understandability and the accuracy of (learned) rule bases. We describe a unifying quality measure that can trade off small losses in accuracy against increased simplicity. Furthermore, we introduce a semi-automatic data mining method for learning understandable and accurate rule bases. The presented work is evaluated using cases from a real-world application in the medical domain.