Earth Science Informatics, Volume 3, Issue 1–2, pp 59–65

Tracking provenance of earth science data

Research Article


Tremendous volumes of data have been captured, archived and analyzed. The sensors, algorithms and processing systems that transform and analyze the data evolve over time. Web portals and services can create transient data sets on demand. Data are transferred from organization to organization, with additional transformations at every stage. Provenance in this context refers to the source of data and a record of the process that led to its current state. It encompasses the documentation of a variety of artifacts related to particular data. Provenance is important for understanding and using scientific datasets, and critical for independent confirmation of scientific results. Managing provenance throughout scientific data processing has recently gained interest, and a variety of approaches exist. Large-scale scientific datasets consisting of thousands to millions of individual data files and processes pose particular challenges. This paper uses the analogy of art history provenance to explore some of the concerns of applying provenance tracking to earth science data. It also illustrates some of the provenance issues with examples drawn from the Ozone Monitoring Instrument (OMI) Data Processing System (OMIDAPS) (Tilmes et al. 2004), run at NASA’s Goddard Space Flight Center by the first author.
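The definition of provenance given above (the source of the data plus a record of the process that led to its current state) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ProvenanceRecord` fields, the `lineage` helper, and the file and process names are hypothetical and are not taken from OMIDAPS or from any approach described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record: where one data file came from and what produced it."""
    file_id: str
    process: str = "original capture"  # algorithm or system that wrote the file
    process_version: str = ""          # illustrative version label
    inputs: tuple = ()                 # file_ids of the files that were read


def lineage(file_id, records):
    """Return every upstream file_id reachable from file_id (depth-first)."""
    seen = []
    stack = list(records[file_id].inputs)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Files without a record of their own are treated as original sources.
        if current in records:
            stack.extend(records[current].inputs)
    return seen


# Illustrative file names only; these are not actual OMIDAPS products.
records = {
    r.file_id: r
    for r in [
        ProvenanceRecord("raw_orbit_001"),
        ProvenanceRecord("calibrated_001", "calibrate", "1.0", ("raw_orbit_001",)),
        ProvenanceRecord("ozone_001", "retrieve_ozone", "2.3", ("calibrated_001",)),
    ]
}

print(lineage("ozone_001", records))
```

Storing only the direct inputs of each file keeps individual records small; the full history is recovered by walking the chain, which is one reason collections of thousands to millions of files make lineage queries expensive.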


Keywords: Data processing, Provenance


  1. Bose R, Frew J (2005) Lineage retrieval for scientific data processing: a survey. ACM Comput Surv 37(1):1–28. doi:10.1145/1057977.1057978
  2. da Silva PP, McGuinness DL, Fikes R (2006) A proof markup language for Semantic Web services. Inf Syst 31(4–5):381–395. doi:10.1016/, http://www.sciencedirect.c7cb2466e94e825 , the Semantic Web and Web Services
  3. Freire J, Missier P, Moreau L, Schreiber A, Mattoso M, Silva CT (2008) Provenance and annotation of data and processes, vol 5272/2008. Springer, Berlin. doi:10.1007/978-3-540-89965-5
  4. Heinis T, Alonso G (2008) Efficient lineage tracking for scientific workflows. In: SIGMOD ’08: proceedings of the 2008 ACM SIGMOD international conference on management of data. ACM, New York, pp 1007–1018. doi:10.1145/1376616.1376716
  5. Moreau L, Ludäscher B, Altintas I, Barga RS, Bowers S, Callahan S, Chin GJ, Clifford B, Cohen S, Cohen-Boulakia S, Davidson S, Deelman E, Digiampietri L, Foster I, Freire J, Frew J, Futrelle J, Gibson T, Gil Y, Goble C, Golbeck J, Groth P, Holland DA, Jiang S, Kim J, Koop D, Krenek A, McPhillips T, Mehta G, Miles S, Metzger D, Munroe S, Myers J, Plale B, Podhorszki N, Ratnakar V, Santos E, Scheidegger C, Schuchardt K, Seltzer M, Simmhan YL, Silva C, Slaughter P, Stephan E, Stevens R, Turi D, Vo H, Wilde M, Zhao J, Zhao Y (2007) Special issue: the first provenance challenge. Concurr Comput: Practice and Experience 20(5):409–418. doi:10.1002/cpe.1233
  6. Moreau L, Freire J, Futrelle J, McGrath R, Myers J, Paulson P (2008a) The open provenance model: an overview. In: Provenance and annotation of data and processes, pp 323–326. doi:10.1007/978-3-540-89965-5_31
  7. Moreau L, Groth P, Miles S, Vazquez-Salceda J, Ibbotson J, Jiang S, Munroe S, Rana O, Schreiber A, Tan V, Varga L (2008b) The provenance of electronic data. Commun ACM 51(4):52–58. doi:10.1145/1330311.1330323
  8. Nurmi D, Wolski R, Grzegorczyk C, Obertelli G, Soman S, Youseff L, Zagorodnov D (2009) The eucalyptus open-source cloud-computing system. In: CCGRID ’09: proceedings of the 2009 9th IEEE/ACM international symposium on cluster computing and the grid. IEEE Computer Society, Washington, DC, pp 124–131. doi:10.1109/CCGRID.2009.93
  9. Simmhan YL, Plale B, Gannon D (2005) A survey of data provenance in e-science. SIGMOD Rec 34(3):31–36. doi:10.1145/1084805.1084812
  10. Suarez-Sola I, Davey A, Hourcle JA (2008) What are we tracking ... and why? AGU Fall Meeting Abstracts, pp C1047+
  11. Tilmes C, Linda M, Fleig A (2004) Development of two Science Investigator-led Processing Systems (SIPS) for NASA’s Earth Observation System (EOS). In: Geoscience and remote sensing symposium, 2004. IGARSS ’04. Proceedings. 2004 IEEE International, vol 3, pp 2190–2195. doi:10.1109/IGARSS.2004.1370795

Copyright information

© US Government 2010

Authors and Affiliations

  1. NASA Goddard Space Flight Center, Greenbelt, USA
  2. University of Maryland, Baltimore, USA
