Storlet Engine for Executing Biomedical Processes Within the Storage System
The growing number of large biomedical data objects kept in long-term archives, which must be continuously processed and analyzed, requires new storage paradigms. We propose expanding the storage system from only storing biomedical data to directly producing value from the data by executing computational modules, called storlets, close to where the data is stored. This paper describes the Storlet Engine, an engine that supports computations in secure sandboxes within the storage system. We describe its architecture and security model as well as the programming model for storlets. We experimented with several data sets and storlets, including a de-identification storlet that de-identifies sensitive medical records, an image transformation storlet that converts images to sustainable formats, and various medical imaging analytics storlets for studying pathology images. We also provide a performance study of the Storlet Engine prototype for OpenStack Swift object storage.
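To make the idea of a storlet concrete, the following is a minimal sketch of what a de-identification computation might look like when written as a stream-to-stream transform. It is not the Storlet Engine's programming model or API: the function name `deidentify_storlet`, the field names, and the "Field: value" record layout are all assumptions chosen for illustration.

```python
import io
import re

# Hypothetical field names treated as identifying; not taken from the paper.
SENSITIVE_FIELDS = ("PatientName", "PatientID", "BirthDate")

# Matches lines of the form "Field: value" for any sensitive field.
_PATTERN = re.compile(r"^(%s):\s*.*$" % "|".join(SENSITIVE_FIELDS))


def deidentify_storlet(in_stream, out_stream):
    """Copy a text record from in_stream to out_stream, masking sensitive fields.

    The record is processed one line at a time, so the whole object never
    has to be held in memory. Returns the number of masked lines.
    """
    masked = 0
    for line in in_stream:
        stripped = line.rstrip("\n")
        match = _PATTERN.match(stripped)
        if match:
            # Keep the field name, replace the value.
            out_stream.write(match.group(1) + ": [REDACTED]\n")
            masked += 1
        else:
            out_stream.write(stripped + "\n")
    return masked


# Usage with in-memory streams standing in for a stored object and its output.
record = io.StringIO("PatientName: Jane Doe\nPatientID: 12345\nModality: MR\n")
result = io.StringIO()
masked_count = deidentify_storlet(record, result)
print(result.getvalue())
```

Writing the computation against input and output streams rather than whole objects is one plausible design for code that runs next to the data, since only the transformed result needs to leave the storage node.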
The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under grant agreement 270000 and under grant agreement 600826.