Storlet Engine for Executing Biomedical Processes Within the Storage System

  • Simona Rabinovici-Cohen
  • Ealan Henis
  • John Marberg
  • Kenneth Nagin
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 202)

Abstract

The growing volume of large biomedical data objects held in long-term archives, which must be continuously processed and analyzed, calls for new storage paradigms. We propose extending the storage system from merely storing biomedical data to directly producing value from it by executing computational modules, called storlets, close to where the data is stored. This paper describes the Storlet Engine, which supports computations in secure sandboxes within the storage system. We describe its architecture and security model as well as the programming model for storlets. We experimented with several data sets and storlets, including a de-identification storlet that de-identifies sensitive medical records, an image transformation storlet that converts images to sustainable formats, and various medical imaging analytics storlets for studying pathology images. We also provide a performance study of the Storlet Engine prototype for OpenStack Swift object storage.

Acknowledgments

The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under grant agreement 270000 and under grant agreement 600826.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Simona Rabinovici-Cohen (1)
  • Ealan Henis (1)
  • John Marberg (1)
  • Kenneth Nagin (1)

  1. IBM Research – Haifa, Haifa, Israel
