Scaling the Data Mining Step in Knowledge Discovery Using Oceanographic Data
Knowledge discovery from large acoustic images is a computationally intensive task. The data-mining step of the knowledge discovery process, which involves unsupervised learning (clustering), consumes the bulk of the computation. We have developed a technique that allows us to partition the data, distribute the partitions to different processors for training, and train a single system to combine the results of the independent categorizers. We report preliminary results using this approach for knowledge discovery with large acoustic images having more than 10,000 training instances.
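The partition, train, and combine scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the clustering algorithm nor the combining method, so k-means stands in for the base categorizers, the combiner is assumed to be a second clusterer trained on one-hot encodings of the base categorizers' labels, and the partitions are processed sequentially rather than on separate processors. All function names are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to point (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: dist2(point, centers[i]))

def init_centers(points, k):
    """Deterministic farthest-first initialization (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=20):
    """Plain k-means; stands in for whatever clusterer trains on one partition."""
    centers = init_centers(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def partition(points, n_parts):
    """Interleaved split; each part could be sent to a different processor."""
    return [points[i::n_parts] for i in range(n_parts)]

def meta_features(point, base_models):
    """One-hot encoding of the cluster each base categorizer assigns to point."""
    feats = []
    for centers in base_models:
        label = nearest(point, centers)
        feats.extend(1.0 if i == label else 0.0 for i in range(len(centers)))
    return feats

def train_combined(points, n_parts, k):
    """Train one base categorizer per partition, then a single combiner."""
    base_models = [kmeans(part, k) for part in partition(points, n_parts)]
    meta = [meta_features(p, base_models) for p in points]
    combiner = kmeans(meta, k)
    return base_models, combiner

def categorize(point, base_models, combiner):
    """Category assigned to point by the combined system."""
    return nearest(meta_features(point, base_models), combiner)
```

In a parallel setting, the list comprehension that builds `base_models` is the step that would be distributed, since each partition is clustered independently. Only the comparatively cheap combiner sees every instance.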
Keywords: Feature Vector, Knowledge Discovery, Base Classifier, Message Passing Interface, Unsupervised Learning