Scaling the Data Mining Step in Knowledge Discovery Using Oceanographic Data

  • Bruce Wooley
  • Susan Bridges
  • Julia Hodges
  • Anthony Skjellum
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1821)

Abstract

Knowledge discovery from large acoustic images is a computationally intensive task. The data-mining step in the knowledge discovery process that involves unsupervised learning (clustering) consumes the bulk of the computation. We have developed a technique that allows us to partition the data, distribute it to different processors for training, and train a single system to join the results of the independent categorizers. We report preliminary results using this approach for knowledge discovery with large acoustic images having more than 10,000 training instances.
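
As an illustration only, the partition, train, and join workflow described above can be sketched in a few lines of Python. This is not the authors' implementation: a tiny k-means stands in for the unsupervised categorizer, a thread pool stands in for the separate processors, and the join step (re-clustering one-hot label vectors produced by the base clusterers) is an assumption, since the abstract does not say how the results are joined. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iters=10):
    """Tiny k-means (stand-in clusterer) with deterministic farthest-point init."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group ends up empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def label_vector(point, base_models, k):
    """Concatenate one-hot labels from every base clusterer (assumed join features)."""
    vec = []
    for centers in base_models:
        label = nearest(point, centers)
        vec.extend(1.0 if j == label else 0.0 for j in range(k))
    return tuple(vec)


def cluster_partitioned(data, n_parts, k):
    # 1. Partition the data (strided, so every part samples the whole set).
    parts = [data[i::n_parts] for i in range(n_parts)]
    # 2. Train one base clusterer per partition, independently and in parallel.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        base_models = list(pool.map(lambda part: kmeans(part, k), parts))
    # 3. Join: train a single clusterer on the base clusterers' outputs.
    features = [label_vector(p, base_models, k) for p in data]
    combiner = kmeans(features, k)
    return [nearest(f, combiner) for f in features]


# Two well-separated synthetic groups of 20 points each (made-up data).
group_a = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(20)]
group_b = [(10 + x, 10 + y) for x, y in group_a]
labels = cluster_partitioned(group_a + group_b, n_parts=4, k=2)
```

Each base clusterer sees only its own partition, so the expensive training step runs on a fraction of the data per worker. Only the final join touches every instance, and it works on short label vectors rather than on the raw features.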

Keywords

Feature Vector, Knowledge Discovery, Base Classifier, Message Passing Interface, Unsupervised Learning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Bruce Wooley (1)
  • Susan Bridges (1)
  • Julia Hodges (1)
  • Anthony Skjellum (1)
  1. Department of Computer Science, Mississippi State University, USA
