Distributed Computing, Volume 24, Issue 5, pp 207–222

Distributed data clustering in sensor networks

Abstract

Low-overhead analysis of large distributed data sets is necessary for current data centers and for future sensor networks. In such systems, each node holds some data value, e.g., a local sensor reading, and a concise picture of the global system state needs to be obtained. In resource-constrained environments like sensor networks, this needs to be done without collecting all the data at any location, i.e., in a distributed manner. To this end, we address the distributed clustering problem, in which numerous interconnected nodes compute a clustering of their data, i.e., partition these values into multiple clusters, and describe each cluster concisely. We present a generic algorithm that solves the distributed clustering problem and may be implemented in various topologies, using different clustering types. For example, the generic algorithm can be instantiated to cluster values according to distance, targeting the same problem as the famous k-means clustering algorithm. However, the distance criterion is often not sufficient to provide good clustering results. We present an instantiation of the generic algorithm that describes the values as a Gaussian mixture (a set of weighted normal distributions), and uses machine learning tools for clustering decisions. Simulations show the robustness, speed and scalability of this algorithm. We prove that any implementation of the generic algorithm converges over any connected topology, clustering criterion and cluster representation, in fully asynchronous settings.
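
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of what a distance-based, gossip-style instantiation might look like on scalar values. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names (`merge_closest`, `gossip_step`, `simulate`), the ring topology in the usage example, the rule of sending half of each cluster's weight to a random neighbor, and the greedy merging of the two closest centroids when a node holds more than `k` clusters.

```python
import random


def merge_closest(clusters, k):
    """Greedily merge the two closest (mean, weight) centroids until at most k remain."""
    clusters = sorted(clusters)  # sort by mean so the closest pair is adjacent
    while len(clusters) > k:
        # Index of the adjacent pair with the smallest gap between means.
        i = min(range(len(clusters) - 1),
                key=lambda j: clusters[j + 1][0] - clusters[j][0])
        (m1, w1), (m2, w2) = clusters[i], clusters[i + 1]
        w = w1 + w2
        # Replace the pair by its weighted mean; the list stays sorted.
        clusters[i:i + 2] = [((m1 * w1 + m2 * w2) / w, w)]
    return clusters


def gossip_step(state, neighbors, k, rng):
    """One asynchronous step: a random node sends half of its weight to a neighbor."""
    sender = rng.choice(list(state))
    receiver = rng.choice(neighbors[sender])
    halved = [(m, w / 2) for m, w in state[sender]]
    state[sender] = halved  # sender keeps one half of every cluster
    # Receiver adds the other half and re-clusters down to at most k clusters.
    state[receiver] = merge_closest(state[receiver] + halved, k)


def simulate(values, neighbors, k=2, steps=500, seed=0):
    """Run the toy protocol; each node starts with its own value at weight 1."""
    rng = random.Random(seed)
    state = {i: [(float(v), 1.0)] for i, v in enumerate(values)}
    for _ in range(steps):
        gossip_step(state, neighbors, k, rng)
    return state


if __name__ == "__main__":
    # Hypothetical sensor readings forming two groups, on an assumed ring topology.
    readings = [0.0, 0.2, 0.1, 10.0, 10.3, 9.9]
    n = len(readings)
    ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
    for node, clusters in simulate(readings, ring).items():
        print(node, [(round(m, 2), round(w, 3)) for m, w in clusters])
```

In this sketch no node ever collects all the raw values: each node stores at most `k` weighted centroids, and the total weight in the network is preserved by every step. A Gaussian mixture instantiation would carry a variance (or covariance) alongside each mean and use a different merging rule, which is not shown here.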

Keywords

Sensor networks, Distributed clustering, Robust aggregation

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Department of Electrical Engineering, The Technion – Israel Institute of Technology, Technion City, Haifa, Israel
