Abstract

This paper presents recent results on the scalability of rough set based classification methods. The proposed solution exploits the close relationship between the reduct calculation problem in rough set theory and the association rule generation problem. It continues our previous work (see, e.g., [10], [11]). In this paper, the set of decision rules that match the test object is generated directly from the training data set. To make the method scalable, we adapt the idea of the FP-growth algorithm for frequent itemsets [7], [6]. Experimental results on several benchmark data sets show that the proposed solution can process growing data sets.
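
The lazy rule generation idea can be illustrated with a minimal sketch. It is not the authors' FP-growth based method: it enumerates descriptor subsets of the test object by brute force, which is exponential in the number of attributes and only suitable for tiny examples. The function name `lazy_rules`, the thresholds `min_support` and `min_conf`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def lazy_rules(train, test_obj, min_support=2, min_conf=1.0):
    """Return decision rules that match test_obj, built directly from train.

    train: list of (attribute dict, decision) pairs.
    Each rule is (descriptors, decision, support).
    Brute-force illustration only; hypothetical helper, not FP-growth.
    """
    rules = []
    items = sorted(test_obj.items())
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            # Count decisions of training objects matching every descriptor.
            decisions = Counter(
                dec for attrs, dec in train
                if all(attrs.get(a) == v for a, v in subset)
            )
            support = sum(decisions.values())
            if support < min_support:
                continue
            decision, hits = decisions.most_common(1)[0]
            if hits / support >= min_conf:
                rules.append((dict(subset), decision, support))
    return rules


# Toy training table (assumed data, not from the paper).
train = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
test_obj = {"outlook": "sunny", "windy": "no"}
rules = lazy_rules(train, test_obj)
```

Only rules whose left-hand side is built from the test object's own descriptors are considered, so no global rule set is ever materialized. That is the property a lazy classifier relies on. A frequent-pattern structure such as an FP-tree would replace the repeated scans of `train` in a scalable version.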

Keywords

Data mining, Scalability, Rough set, Lazy learning

References

  1. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules, Menlo Park, CA, USA. American Association for Artificial Intelligence, pp. 307–328 (1996)
  2. Bazan, J.G.: A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In: Polkowski, L., Skowron, A. (eds.) Rough Sets in Knowledge Discovery 1. Methodology and Applications. Studies in Fuzziness and Soft Computing, pp. 321–365. Physica-Verlag, Heidelberg (1998)
  3. Bondi, A.B.: Characteristics of scalability and their impact on performance. In: WOSP 2000: Proceedings of the 2nd international workshop on Software and performance, pp. 195–203. ACM, New York (2000)
  4. Fayyad, U.M., Haussler, D., Stolorz, P.E.: Mining scientific data. Commun. ACM 39(11), 51–57 (1996)
  5. Grahne, G., Zhu, J.: High performance mining of maximal frequent itemsets. In: Proceedings of 6th International Workshop on High Performance Data Mining, HPDM 2003 (2003)
  6. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, San Francisco (2000)
  7. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Chen, W., Naughton, J., Bernstein, P.A. (eds.) 2000 ACM SIGMOD Intl. Conference on Management of Data, May 2000, pp. 1–12. ACM Press, New York (2000)
  8. Komorowski, H.J., Pawlak, Z., Polkowski, L.T., Skowron, A.: Rough Sets: A Tutorial, pp. 3–98. Springer, Singapore (1999)
  9. Kwiatkowski, P.: Scalable classification method based on FP-growth algorithm (in Polish). Master’s thesis, Warsaw University (2008)
  10. Nguyen, H.S.: Scalable classification method based on rough sets. In: Alpigini, J.J., Peters, J.F., Skowron, A., Zhong, N. (eds.) RSCTC 2002. LNCS (LNAI), vol. 2475, pp. 433–440. Springer, Heidelberg (2002)
  11. Nguyen, H.S.: Approximate boolean reasoning: Foundations and applications in data mining 4100, 334–506 (2006)
  12. Pawlak, Z.: Rough Sets. Theoretical Aspects of Reasoning about Data. Theory and decision library. D: System theory, knowledge engineering and problem solving, vol. 9. Kluwer Academic Publishers, Dordrecht (1991)
  13. Shafer, J.C., Agrawal, R., Mehta, M.: Sprint: A scalable parallel classifier for data mining. In: Vijayaraman, T.M., et al. (eds.) VLDB 1996, Proceedings of 22nd International Conference on Very Large Data Bases, Mumbai, India, September 3-6, pp. 544–555. Morgan Kaufmann, San Francisco (1996)
  14. Skowron, A., Rauszer, C.M.: The discernibility matrices and functions in information systems, ch. 3, pp. 331–362. Kluwer Academic Publishers, Dordrecht (1992)
  15. Stefanowski, J.: On rough set based approaches to induction of decision rules. In: Polkowski, L., Skowron, A. (eds.) Rough Sets in Knowledge Discovery 1. Methodology and Applications. Studies in Fuzziness and Soft Computing, pp. 500–529. Physica-Verlag, Heidelberg (1998)
  16. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
  17. Wroblewski, J.: Covering with reducts - a fast algorithm for rule generation. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS (LNAI), vol. 1424, pp. 402–407. Springer, Heidelberg (1998)
  18. Ziarko, W.: Rough sets as a methodology for data mining. In: Polkowski, L., Skowron, A. (eds.) Rough Sets in Knowledge Discovery 1. Methodology and Applications. Studies in Fuzziness and Soft Computing, pp. 554–571. Physica-Verlag, Heidelberg (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Piotr Kwiatkowski (1)
  • Sinh Hoa Nguyen (2)
  • Hung Son Nguyen (1)
  1. Institute of Mathematics, Warsaw University, Warsaw, Poland
  2. Polish-Japanese Institute of Inf. Technology, Warszawa, Poland
