Abstract
To flourish in the new data-intensive environment of twenty-first-century science, we need to develop new skills. These can be expressed in terms of the systematized framework that formed the basis of mediaeval education: the trivium (logic, grammar and rhetoric) and the quadrivium (arithmetic, geometry, music and astronomy). Now, however, the keystone is no longer number but data. We need to understand what rules data obey, how they are symbolized and communicated, and how they relate to physical space and time. In this paper, we review this understanding in terms of the technologies and processes that data require. We contend that an appreciation of all these aspects is, at the very least, crucial to extracting scientific information and knowledge from the data sets that threaten to engulf and overwhelm us.
Acknowledgements
We would like to thank Norman Gray and Helen Angove for useful feedback and discussion about this paper. This work was supported in part by the NSF Grants AST-0834235, AST-0909182, and HCC-0917814 and NASA Grant 08-AISR08-0085.
Copyright information
© 2012 Springer Science+Business Media New York
About this chapter
Cite this chapter
Graham, M.J. (2012). The Art of Data Science. In: Sarro, L., Eyer, L., O'Mullane, W., De Ridder, J. (eds) Astrostatistics and Data Mining. Springer Series in Astrostatistics, vol 2. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3323-1_4
DOI: https://doi.org/10.1007/978-1-4614-3323-1_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-3322-4
Online ISBN: 978-1-4614-3323-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)