Abstract
The exponential growth of astronomical data collected by both ground-based and spaceborne instruments has fostered the rise of astroinformatics: a new discipline lying at the intersection of astronomy, applied computer science, and information and computation technologies. At the very heart of astroinformatics is a complex set of methodologies usually called data mining (DM) or knowledge discovery in databases (KDD). In the astronomical domain, DM/KDD is still at a very early stage of use, even though new methods and tools are continuously being deployed to cope with massive data sets (MDSs), which can only grow in the future. In this paper, we briefly outline some general problems encountered when applying DM/KDD methods to astrophysical problems and describe the DAME (Data Mining and Exploration) Web application. While specifically tailored to work on MDSs, DAME can also be applied effectively to smaller data sets. As an illustration, we describe applications of DAME to two different problems: the identification of candidate globular clusters (GCs) in external galaxies and the classification of active galactic nuclei (AGN). We believe that tools and services of this nature will become increasingly necessary for data-intensive astronomy (and indeed all sciences) in the twenty-first century.
Acknowledgements
The DAME Web application was funded in part by the Italian Ministry of Foreign Affairs through bilateral projects between Italy and the USA and by the Italian Ministry of Education, Universities, and Research through the PON 1575 S.Co.P.E. project. SGD and CD acknowledge partial support through NASA Grant 08-AISR08-0085, NSF Grants AST-0834235 and AST-0909182, and the Fishbein Family Foundation. We thank numerous collaborators for many interesting discussions on these and related issues over the years.
Copyright information
© 2012 Springer Science+Business Media New York
About this chapter
Cite this chapter
Brescia, M., Cavuoti, S., Djorgovski, G.S., Donalek, C., Longo, G., Paolillo, M. (2012). Extracting Knowledge from Massive Astronomical Data Sets. In: Sarro, L., Eyer, L., O'Mullane, W., De Ridder, J. (eds) Astrostatistics and Data Mining. Springer Series in Astrostatistics, vol 2. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3323-1_3
DOI: https://doi.org/10.1007/978-1-4614-3323-1_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-3322-4
Online ISBN: 978-1-4614-3323-1
eBook Packages: Mathematics and Statistics (R0)