Data Streams, pp 309-331

Algorithms for Distributed Data Stream Mining

  • Kanishka Bhaduri
  • Kamalika Das
  • Krishnamoorthy Sivakumar
  • Hillol Kargupta
  • Ran Wolff
  • Rong Chen
Part of the Advances in Database Systems book series (ADBS, volume 31)


The field of Distributed Data Mining (DDM) deals with the problem of analyzing data while paying careful attention to distributed computing, storage, communication, and human-factor related resources. Unlike traditional centralized systems, DDM offers a fundamentally distributed solution that analyzes data without necessarily requiring that the data be collected at a single central site. This chapter presents an introduction to distributed data mining for continuous streams. It focuses on situations where the data observed at different locations change with time. The chapter surveys the literature and illustrates the behavior of this class of algorithms by exploring two very different types of techniques: one for the peer-to-peer environment and another for the hierarchical distributed environment. The chapter also briefly discusses several applications of these algorithms.


Sensor Network, Bayesian Network, Data Stream, Query Processing, Central Site
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Kanishka Bhaduri (1)
  • Kamalika Das (1)
  • Krishnamoorthy Sivakumar (2)
  • Hillol Kargupta (1)
  • Ran Wolff (1)
  • Rong Chen (2)

  1. Dept of CSEE, University of Maryland, Baltimore County
  2. School of EECS, Washington State University, Washington
