Abstract
Data mining techniques are gaining popularity in various scientific domains as viable approaches to the analysis of massive data sets. In this chapter, we describe our experiences in applying data mining to a problem in astronomy, namely, the identification of radio-emitting galaxies with a bent-double morphology. Until recently, astronomers associated with the FIRST (Faint Images of the Radio Sky at Twenty-cm) survey identified these galaxies through a visual inspection of images. While this manual approach has always been subjective and tedious, it is also becoming increasingly infeasible as the survey grows in size. Upon completion, FIRST will include almost a million galaxies, making the use of semi-automated analysis methods necessary. We describe the FIRST data set and the problem of identifying bent-double galaxies. We discuss our solution approach, focusing on the challenges we face in applying data mining to a scientific data set. We explain why, in contrast with most commercial data mining applications, data preprocessing requires considerable effort in scientific applications. We then describe our ongoing work on detecting bent-double galaxies using decision tree classifiers. Our results indicate that data mining techniques, steered by proper domain knowledge, can greatly enhance the manual exploration of massive data sets.
Copyright information
© 2001 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Kamath, C., Cantú-Paz, E., Fodor, I.K., Tang, N.A. (2001). Searching for Bent-Double Galaxies in the First Survey. In: Grossman, R.L., Kamath, C., Kegelmeyer, P., Kumar, V., Namburu, R.R. (eds) Data Mining for Scientific and Engineering Applications. Massive Computing, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1733-7_6
DOI: https://doi.org/10.1007/978-1-4615-1733-7_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-0114-7
Online ISBN: 978-1-4615-1733-7
eBook Packages: Springer Book Archive