Reasoning About Discovery Clouds

  • Ian Foster
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9698)


A discovery cloud is a set of automated, cloud-hosted services to which individuals may outsource their routine and not-so-routine research tasks: finding relevant data, inferring links between data, running computational experiments, inferring new knowledge claims, evaluating the credibility of knowledge claims produced by others, designing experiments, and so on. If developed successfully, a discovery cloud can accelerate and democratize access to data and knowledge tools and the collaborative construction of new knowledge. Such systems are also fascinating to consider from a reasoning perspective because they integrate great complexity at multiple levels: the underlying cloud-based hardware and software, for which issues of reliability and responsiveness may be paramount; the knowledge bases and inference engines that sit on that cloud substrate, for which issues of correctness may be less well defined; and the human communities that form around the discovery clouds, and that arguably form as much a part of the cloud as the hardware, software, and data. I raise questions here about what it might mean to reason about such systems. I do not provide any answers.



I am grateful to the organizers of Petri Nets 2016 for the opportunity to contribute this article to the proceedings. This work is supported in part by the US Department of Energy contract DE-AC02-06CH11357.



Copyright information

© Springer International Publishing Switzerland 2016

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. Argonne National Laboratory, Argonne, USA
  2. The University of Chicago, Chicago, USA
