Minimizing Data Size for Efficient Data Reuse in Grid-Enabled Medical Applications

  • Fumihiko Ino
  • Katsunori Matsuo
  • Yasuharu Mizutani
  • Kenichi Hagihara
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4345)


This paper presents a data minimization method that aims to reduce the overhead of data reuse in grid environments. Data reuse here is intended to promote efficient use of grid resources by avoiding repeated execution of the same computation within a collaborative community. To support reuse at the program block level, our method minimizes the data size of attribute values, which identify the computation products stored in a database (DB) server. Because attribute values are specified in the queries used to store, search for, or retrieve computation products, reducing them means less communication between computing nodes and the DB server, which minimizes the runtime overhead of data reuse. We also show experimental results obtained using a time-consuming medical application. The method reduces the data size of a query from 683 MB to 52 B. This reduction allows our data reuse framework to reduce execution time from approximately 9 minutes to 27 seconds.
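The effect of smaller attribute values on query traffic can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not describe how attribute values are minimized, so the example simply assumes that a large input can be identified by a short description of where it came from. The names ProductStore, run_block, minimized_attributes, and the file name scan_001.raw are all hypothetical, and an in-memory dictionary stands in for the DB server.

```python
import json


class ProductStore:
    """In-memory stand-in for the DB server that holds computation products."""

    def __init__(self):
        self._products = {}
        self.query_bytes = 0  # total size of attribute values sent in queries

    def search(self, attributes: bytes):
        self.query_bytes += len(attributes)
        return self._products.get(attributes)

    def store(self, attributes: bytes, product):
        self.query_bytes += len(attributes)
        self._products[attributes] = product


def run_block(store, attributes, compute):
    """Reuse a stored product if one matches; otherwise compute and store it."""
    product = store.search(attributes)
    if product is None:
        product = compute()
        store.store(attributes, product)
    return product


def minimized_attributes(source_name: str, params: dict) -> bytes:
    """Identify a block's input by its origin rather than by its contents.

    Assumption (ours, not the paper's): source_name and params fully
    determine the input data, so they are a valid identifier for it.
    """
    record = {"source": source_name, "params": params}
    return json.dumps(record, sort_keys=True).encode("utf-8")


def demo():
    volume = bytes(1_000_000)  # stand-in for a large intermediate array
    calls = []

    def expensive():
        calls.append(1)        # count how often the block really runs
        return len(volume)

    full, small = ProductStore(), ProductStore()
    small_key = minimized_attributes("scan_001.raw", {"level": 2})
    for _ in range(2):
        run_block(full, volume, expensive)      # query carries the whole input
        run_block(small, small_key, expensive)  # query carries a short key
    return full.query_bytes, small.query_bytes, len(calls)


if __name__ == "__main__":
    print(demo())
```

In both stores the block runs once and is reused on the second request, but the first store sends the whole input with every query while the second sends only the short key. The sketch is valid only as long as the stated assumption holds, that is, the short description really does determine the input.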


Directed Acyclic Graph, Data Size, Data Minimization, Grid Resource, Grid Environment





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Fumihiko Ino (1)
  • Katsunori Matsuo (1)
  • Yasuharu Mizutani (2)
  • Kenichi Hagihara (1)
  1. Graduate School of Information Science and Technology, Osaka University, Toyonaka, Osaka, Japan
  2. Faculty of Information Science and Technology, Osaka Institute of Technology
