Journal of Intelligent Information Systems

, Volume 35, Issue 3, pp 447–463

Speed up gradual rule mining from stream data! A B-Tree and OWA-based approach

Article
  • 97 Downloads

Abstract

Gradual rules allow users to be provided with rules describing the ordering correlations among attributes. Such a rule is for instance given by the higher the salary and the lower the number of cars, the higher the number of tourist travels. Previously intensively used in fuzzy command systems, these rules were manually provided to the system. More recently, they have received attention from the data mining community and methods have been defined to automatically extract and maintain gradual rules from numerical databases. However, no method has been shown to be able to handle data streams, as no method is scalable enough to manage the high rate which stream data arrive at. In this paper, we thus propose an original approach to mine data streams for gradual rules. Our method is based on B-Trees and OWA (Ordered Weighted Aggregation) operator in order to speed up the process. B-Trees are used to store already-known gradual rules in order to maintain the knowledge over time, while OWA operators provide a fast way to discard non relevant data.

Keywords

Data streams Gradual rules OWA operators 

References

  1. Aggarwal, C. (2007). Data streams: Models and algorithms. New York: Springer.MATHGoogle Scholar
  2. Alon, N., Matias, Y., & Szegedy, M. (1996). The space complexity of approximating the frequency moments. In Proc. of the 28th annual ACM symposium on theory of computing (STOC’96) (pp. 20–29).Google Scholar
  3. Berzal, F., Cubero, J., Sanchez, D., Vila, M., & Serrano, J. (2007). An alternative approach to discover gradual dependencies. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), 15(5), 559–570.MATHCrossRefMathSciNetGoogle Scholar
  4. Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422–426.MATHCrossRefGoogle Scholar
  5. Calders, T., Dexters, N., & Goethals, B. (2007). Mining frequent itemsets in a stream. In Proc. of the IEEE int. conference on data mining (ICDM’07) (pp. 83–92).Google Scholar
  6. Calders, T., Dexters, N., & Goethals, B. (2008). Mining frequent items in a stream using flexible windows. Journal of Intelligent Data Analysis, 12(3), 293–304.Google Scholar
  7. Charikar, M., Chen, K., & Farach-Colton, M. (2004). Finding frequent items in data streams. Theoretical Computer Sciences, 312(1), 3–15.MATHCrossRefMathSciNetGoogle Scholar
  8. Chi, Y., Wang, H., Yu, P., & Muntz, R. (2004). Moment: Maintaining closed frequent itemsets over a stream sliding window. In Proc. of the IEEE int. conference on data mining (ICDM’04).Google Scholar
  9. Coenen, F., Goulbourne, G., & Leng, P. (2004). Tree structures for mining association rules. Data Mining and Knowledge Discovery, 8(1), 25–51.CrossRefMathSciNetGoogle Scholar
  10. Giannella, G., Han, J., Pei, J., Yan, X., & Yu, P. (2003). Mining frequent patterns in data streams at multiple time granularities. In Next generation data mining. Cambridge: MIT.Google Scholar
  11. Greco, S., Matarazzo, B., Pappalardo, N., & Slowinski, R. (2006). Measuring expected effects of interventions based on decision rules. Journal of Experimental and Theoretical Artificial Intelligence, 17(1–2).Google Scholar
  12. Greenwa, M., & Khanna, S. (2001). Space-efficient online computation of quantile summaries. In Proc. of the ACM int. conference on management of data (SIGMOD’01) (pp. 56–66).Google Scholar
  13. Han, J., Chen, Y., Dong, G., Pei, J., Wah, B., Wang, J., et al. (2005). Stream cube: An architecture for multi-dimensional analysis of data streams. Distributed and Parallel Databases, 18(2), 173–197.CrossRefGoogle Scholar
  14. Han, J., Pei, J., Mortazavi-asl, B., Chen, Q., Dayal, U., & Hsu, M. (2000). Freespan: Frequent pattern-projected sequential pattern mining. In Proc. of ACM int. conference on knowledge discovery and data mining (KDD’00).Google Scholar
  15. Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation. Data Mining and Knowledge Discovery, 8, 53–87.CrossRefMathSciNetGoogle Scholar
  16. Hüllermeier, E. (2002). Association rules for expressing gradual dependences. In Proc. of the 6th Eu. conf. on principles of data mining and knowledge discovery (PKDD’02) (pp. 200–211).Google Scholar
  17. Jorio, L. D., Laurent, A., & Teisseire, M. (2008). Fast extraction of gradual association rules: A heuristic based method. In IEEE/ACM int. conference on soft computing as transdisciplinary science and technology (CSTST).Google Scholar
  18. Jorio, L. D., Laurent, A., & Teisseire, M. (2009). Mining for gradualness over time using sequential patterns. In Intelligent decision technologies (IDT’09).Google Scholar
  19. Li, H.-F., Lee, S., & Shan, M.-K. (2004). An efficient algorithm for mining frequent itemsets over the entire history of data streams. In Proc. of 1st int. workshop on knowledge discovery in data streams.Google Scholar
  20. Manku, G., & Motwani, R. (2002). Approximate frequency counts over data streams. In Proc.of 28th int. conf. very large databases.Google Scholar
  21. Masseglia, F., Poncelet, P., Teisseire, M., & Marascu, A. (2008). Web usage mining: extracting unexpected periods from web logs. Data Mining and Knowledge Discover, 16(1), 39–65.CrossRefMathSciNetGoogle Scholar
  22. Miller, R. J., & Yang, Y. (1997). Association rules over interval data. In SIGMOD ’97: Proceedings of the 1997 ACM SIGMOD international conference on management of data (pp. 452–461).Google Scholar
  23. Nag, B., Deshpande, P. M., & DeWitt, D. J. (1999). Using a knowledge cache for interactive discovery of association rules. In KDD ’99: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 244–253).Google Scholar
  24. Raissi, C., Poncelet, P., & Teisseire, M. (2006). Speed: Mining maximal sequential patterns over data streams. In Proc. of the 3rd IEEE int. conference on intelligent systems (IS2006).Google Scholar
  25. Raissi, C., Poncelet, P., & Teisseire, M. (2007). Towards a new approach for mining maximal frequent itemsets over data stream. Journal of Intelligent Information Systems (JIIS), 28(1), 23–36.CrossRefGoogle Scholar
  26. Ras, Z., & Wieczorkowska, A. (2000). Action rules: how to increase profit of a company. In Proceedings of the principles of data mining and knowledge discovery (PKDD 00) (pp. 587–592). Lyon, France.Google Scholar
  27. Skowron, A., & Synak, P. (2006). Planning based on reasoning about information changes. In Rough sets and current trends in computing (pp. 165–173).Google Scholar
  28. Stollnitz, E. J., Derose, T. D., & Salesin, D. H. (1996). Wavelets for computer graphics: Theory and applications. San Mateo: Morgan Kaufmann.Google Scholar
  29. Teng, W.-G., Chen, M.-S., & Yu, P. (2003). A regression-based temporal patterns mining schema for data streams. In Proc. of 29th int. conf. very large databases (VLDB’03) (pp. 93–104).Google Scholar
  30. Torra, V. (1997). The weighted owa operator. International Journal of Intelligent Systems, 12, 153–166.MATHCrossRefGoogle Scholar
  31. Torra, V. (2004). Owa operators in data modeling and re-identification. IEEE Transaction on Fuzzy Systems, 12(5), 652–660.CrossRefGoogle Scholar
  32. Torra, V., & Narukawa, Y. (2007). Modeling decisions: Information fusion and aggregation operators. New York: Springer.Google Scholar
  33. Torra, V., & Nin, J. (2008). Record linkage for database integration using fuzzy integrals. International Journal of Intelligent Systems, 23(6), 715–734.MATHCrossRefGoogle Scholar
  34. Tsay, L.-S., & Ras, Z. (2005). Action rules discovery system dear, method and experiments. Journal of Experimental and Theoretical Artificial Intelligence, 17(1–2), 119–128.MATHCrossRefGoogle Scholar
  35. Tzacheva, A., & Ras, Z. (2005). Action rules mining. International Journal of Intelligent Systems, 20(7), 719–736.MATHCrossRefGoogle Scholar
  36. Verma, K., Vyas, O., & Vyas, R. (2005). Temporal approach to association rule mining using t-tree and p-tree. In Machine learning and data mining in pattern recognition. Lecture notes in computer sciences (Vol. 3587, pp. 651–659). Springer.Google Scholar
  37. Vitter, J. S. (1985). Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1), 37–57.MATHCrossRefMathSciNetGoogle Scholar
  38. Yager, R. R. (1988). On ordered weighted averaging aggregation operators in multi-criteria decision making. IEEE Transaction on Systems, Man and Cybernetics, 18, 183–190.MATHCrossRefMathSciNetGoogle Scholar
  39. Yager, R. R. (1993). Families of owa operators. Fuzzy Sets and Systems, 59, 125–148.MATHCrossRefMathSciNetGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. 1.LAAS, Lab. d’Analyse et d’Architecture des SystèmesCNRS, Centre National de la Recherche ScientifiqueToulouseFrance
  2. 2.LIRMM - CNRS UMR 5506Univ. Montpellier 2Montpellier Cedex 5France

Personalised recommendations