Making Data Analysis Ubiquitous: My Journey Through Academia and Industry

Abstract

This chapter presents an overview of my work in distributed data mining over the last 15 years and shares some of my experiences from this journey. It first describes the context and my early work in this area. Next, it outlines the milestones of my algorithmic work and the commercialization of the technology. It then discusses the applications that made a difference in real life and the challenges I faced in doing so. Finally, it shares some of the lessons I learned and offers suggestions for the next generation of data mining researchers.

Notes

  1. http://www.kd2u.org.

  2. http://www.agnik.com.

Acknowledgments

The author would like to thank all the organizations that funded his research over the last 15 years, including the National Science Foundation, NASA, the Department of Defense, the Department of Energy, Caterpillar, IBM, and many others. The current work is funded by a NASA grant and an AF MURI grant.

Author information

Correspondence to Hillol Kargupta.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kargupta, H. (2012). Making Data Analysis Ubiquitous: My Journey Through Academia and Industry. In: Gaber, M. (ed.) Journeys to Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28047-4_10

  • DOI: https://doi.org/10.1007/978-3-642-28047-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28046-7

  • Online ISBN: 978-3-642-28047-4

  • eBook Packages: Computer Science, Computer Science (R0)
