Reasoning About Discovery Clouds
A discovery cloud is a set of automated, cloud-hosted services to which individuals may outsource their routine and not-so-routine research tasks: finding relevant data, inferring links between data, running computational experiments, inferring new knowledge claims, evaluating the credibility of knowledge claims produced by others, designing experiments, and so on. If developed successfully, a discovery cloud can accelerate and democratize access to data and knowledge tools and the collaborative construction of new knowledge. Such systems are also fascinating to consider from a reasoning perspective because they integrate great complexity at multiple levels: the underlying cloud-based hardware and software, for which issues of reliability and responsiveness may be paramount; the knowledge bases and inference engines that sit on that cloud substrate, for which issues of correctness may be less well defined; and the human communities that form around the discovery clouds, and that arguably form as much a part of the cloud as the hardware, software, and data. I raise questions here about what it might mean to reason about such systems. I do not provide any answers.
I am grateful to the organizers of Petri Nets 2016 for the opportunity to contribute this article to the proceedings. This work is supported in part by the US Department of Energy under contract DE-AC02-06CH11357.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.