
The Art of Data Science

Chapter in Astrostatistics and Data Mining

Part of the book series: Springer Series in Astrostatistics (SSIA, volume 2)

Abstract

To flourish in the new data-intensive environment of twenty-first-century science, we need to evolve new skills. These can be expressed in terms of the systematized framework that formed the basis of mediaeval education: the trivium (logic, grammar and rhetoric) and the quadrivium (arithmetic, geometry, music and astronomy). However, where that framework was built on number, data are now the keystone. We need to understand what rules they obey, how they are symbolized and communicated, and what their relationship is to physical space and time. In this paper, we review this understanding in terms of the technologies and processes that data require. We contend that at least an appreciation of all these aspects is crucial if we are to extract scientific information and knowledge from the data sets that threaten to engulf and overwhelm us.



Acknowledgements

We would like to thank Norman Gray and Helen Angove for useful feedback and discussion about this paper. This work was supported in part by the NSF Grants AST-0834235, AST-0909182, and HCC-0917814 and NASA Grant 08-AISR08-0085.

Author information

Correspondence to Matthew J. Graham.



Copyright information

© 2012 Springer Science+Business Media New York

About this chapter

Cite this chapter

Graham, M.J. (2012). The Art of Data Science. In: Sarro, L., Eyer, L., O'Mullane, W., De Ridder, J. (eds) Astrostatistics and Data Mining. Springer Series in Astrostatistics, vol 2. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3323-1_4

