Making Data Analysis Ubiquitous: My Journey Through Academia and Industry

  • Hillol Kargupta


This chapter presents an overview of my work in distributed data mining over the last 15 years and shares some of my experiences from this journey. It first describes the context and my early work in this area. Next, it gives an overview of the milestones of my algorithmic work and the commercialization of the technology. It then explains the applications that made a difference in real life and some of the challenges I faced along the way. Finally, it shares some of the lessons I learned and offers suggestions for the next generation of data mining researchers.


Data Mining, Data Mining Algorithm, Data Mining Problem, Data Stream Mining, Data Analysis Algorithm



The author would like to thank all the organizations that funded his research over the last 15 years, including the National Science Foundation, NASA, the Department of Defense, the Department of Energy, Caterpillar, IBM, and many others. The current work is funded by a NASA grant and an AF MURI grant.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Computer Science and Electrical Engineering Department, University of Maryland, Baltimore County, USA
  2. Agnik, LLC, Columbia, USA
