Extracting Knowledge from Massive Astronomical Data Sets

Chapter in Astrostatistics and Data Mining

Abstract

The exponential growth of astronomical data collected by both ground-based and spaceborne instruments has fostered the rise of astroinformatics: a new discipline at the intersection of astronomy, applied computer science, and information and computation technologies. At the very heart of astroinformatics is a complex set of methodologies usually called data mining (DM) or knowledge discovery in databases (KDD). In the astronomical domain, DM/KDD are still at a very early stage of adoption, even though new methods and tools are continuously being deployed to cope with massive data sets (MDSs) whose size can only grow in the future. In this chapter, we briefly outline some general problems encountered when applying DM/KDD methods to astrophysical problems and describe the DAME (Data Mining and Exploration) Web application. While specifically tailored to work on MDSs, DAME can also be applied effectively to smaller data sets. As an illustration, we describe two applications of DAME to different problems: the identification of candidate globular clusters (GCs) in external galaxies and the classification of active galactic nuclei (AGN). We believe that tools and services of this nature will become increasingly necessary for data-intensive astronomy (and indeed all sciences) in the twenty-first century.

Acknowledgements

The DAME Web application was funded in part by the Italian Ministry of Foreign Affairs through bilateral projects between Italy and the USA and by the Italian Ministry of Education, Universities, and Research through the PON 1575 S.Co.P.E. SGD and CD acknowledge partial support through NASA Grant 08-AISR08-0085, NSF Grants AST-0834235 and AST-0909182, and the Fishbein Family Foundation. We thank numerous collaborators for many interesting discussions on these and related issues over the years.

Author information

Correspondence to Massimo Brescia.

Copyright information

© 2012 Springer Science+Business Media New York

About this chapter

Cite this chapter

Brescia, M., Cavuoti, S., Djorgovski, G.S., Donalek, C., Longo, G., Paolillo, M. (2012). Extracting Knowledge from Massive Astronomical Data Sets. In: Sarro, L., Eyer, L., O'Mullane, W., De Ridder, J. (eds) Astrostatistics and Data Mining. Springer Series in Astrostatistics, vol 2. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3323-1_3