
Reasoning About Discovery Clouds


Part of the Lecture Notes in Computer Science book series (LNTCS, volume 9698)


A discovery cloud is a set of automated, cloud-hosted services to which individuals may outsource their routine and not-so-routine research tasks: finding relevant data, inferring links between data, running computational experiments, inferring new knowledge claims, evaluating the credibility of knowledge claims produced by others, designing experiments, and so on. If developed successfully, a discovery cloud can accelerate and democratize access to data and knowledge tools and the collaborative construction of new knowledge. Such systems are also fascinating to consider from a reasoning perspective because they integrate great complexity at multiple levels: the underlying cloud-based hardware and software, for which issues of reliability and responsiveness may be paramount; the knowledge bases and inference engines that sit on that cloud substrate, for which issues of correctness may be less well defined; and the human communities that form around the discovery clouds, and that arguably form as much a part of the cloud as the hardware, software, and data. I raise questions here about what it might mean to reason about such systems. I do not provide any answers.


  • Cloud Discovery
  • Finding Relevant Data
  • Substantial Cloud
  • Discovery Engine
  • Seed Systems

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





I am grateful to the organizers of Petri Nets 2016 for the opportunity to contribute this article to the proceedings. This work is supported in part by the US Department of Energy contract DE-AC02-06CH11357.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Ian Foster.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Foster, I. (2016). Reasoning About Discovery Clouds. In: Kordon, F., Moldt, D. (eds) Application and Theory of Petri Nets and Concurrency. PETRI NETS 2016. Lecture Notes in Computer Science, vol 9698. Springer, Cham.


  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39085-7

  • Online ISBN: 978-3-319-39086-4

  • eBook Packages: Computer Science, Computer Science (R0)