Abstract
This chapter presents an overview of my work in the field of distributed data mining over the last 15 years and shares some of my experiences from this journey. The chapter first describes the context and my early work in this area. Next, it presents an overview of the milestones of my algorithmic work and the commercialization of the technology. It then explains the applications that made a difference in real life and some of the challenges I faced along the way. Finally, it shares some of the lessons I learned and offers suggestions for the next generation of data mining researchers.
Acknowledgments
The author would like to thank all the organizations that funded his research over the last 15 years, including the National Science Foundation, NASA, the Department of Defense, the Department of Energy, Caterpillar, IBM, and many others. The current work is funded by a NASA grant and an AF MURI grant.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kargupta, H. (2012). Making Data Analysis Ubiquitous: My Journey Through Academia and Industry. In: Gaber, M. (eds) Journeys to Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28047-4_10
DOI: https://doi.org/10.1007/978-3-642-28047-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28046-7
Online ISBN: 978-3-642-28047-4
eBook Packages: Computer Science, Computer Science (R0)