Abstract
This paper presents a data minimization method that aims to reduce the overhead of data reuse in grid environments. Data reuse here is designed to promote efficient use of grid resources by avoiding repeated execution of the same computation within a collaborative community. To support reuse at the program block level, our method minimizes the size of the attribute values used to identify computation products stored in a database (DB) server. Because attribute values are specified in the queries that store, search for, or retrieve computation products, reducing them lowers the communication between computing nodes and the DB server, minimizing the runtime overhead of data reuse. We also present experimental results obtained with a time-consuming medical application. The method reduces the data size of a query from 683 MB to 52 B. This reduction allows our data reuse framework to reduce execution time from approximately 9 minutes to 27 seconds.
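The idea of identifying a computation product by a small attribute value rather than by its full input can be illustrated with a minimal sketch. The abstract does not state how the attribute values are reduced, so the sketch below swaps in a fixed-size SHA-256 digest purely for illustration; it is not the authors' method. The names `ReuseStore`, `compact_key`, and `reuse` are hypothetical, and the in-memory dictionary only stands in for the DB server.

```python
import hashlib


class ReuseStore:
    """Hypothetical in-memory stand-in for the DB server (an assumption)."""

    def __init__(self):
        self._products = {}
        self.bytes_sent = 0  # total size of the keys carried by all queries

    def lookup(self, key: bytes):
        self.bytes_sent += len(key)
        return self._products.get(key)

    def store(self, key: bytes, product):
        self.bytes_sent += len(key)
        self._products[key] = product


def compact_key(block_name: str, *attributes: bytes) -> bytes:
    """Map arbitrarily large attribute values to a 32-byte digest.

    Assumption: a hash digest replaces the raw attribute values. Each
    attribute is length-prefixed so that different splits of the same
    bytes do not produce the same key.
    """
    h = hashlib.sha256()
    h.update(block_name.encode("utf-8"))
    for attr in attributes:
        h.update(len(attr).to_bytes(8, "big"))
        h.update(attr)
    return h.digest()


def reuse(store: ReuseStore, block_name: str, compute, *attributes: bytes):
    """Return a stored product if one exists; otherwise compute and store it."""
    key = compact_key(block_name, *attributes)
    cached = store.lookup(key)
    if cached is not None:
        return cached
    product = compute(*attributes)
    store.store(key, product)
    return product


# Small demonstration with a toy "expensive" program block.
calls = []


def expensive(volume: bytes) -> int:
    calls.append(1)  # record that the block actually executed
    return sum(volume) % 251


store = ReuseStore()
volume = bytes(range(256)) * 4096  # 1 MiB stand-in for a large input
first = reuse(store, "register", expensive, volume)
second = reuse(store, "register", expensive, volume)
```

In this sketch the second call finds the product under the same key and skips the computation, and every query carries only the digest rather than the 1 MiB input. The 32-byte digest size is a property of SHA-256 and is unrelated to the 52 B figure reported above.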
This work was partly supported by JSPS Grant-in-Aid on Priority Areas (170320007), for Scientific Research (B)(2)(18300009), and for Young Researchers (17700060).
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ino, F., Matsuo, K., Mizutani, Y., Hagihara, K. (2006). Minimizing Data Size for Efficient Data Reuse in Grid-Enabled Medical Applications. In: Maglaveras, N., Chouvarda, I., Koutkias, V., Brause, R. (eds) Biological and Medical Data Analysis. ISBMDA 2006. Lecture Notes in Computer Science, vol 4345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946465_18
DOI: https://doi.org/10.1007/11946465_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68063-5
Online ISBN: 978-3-540-68065-9
eBook Packages: Computer Science, Computer Science (R0)