Improving the Execution Performance of FreeSurfer
A scheme to significantly speed up the processing of MRI with FreeSurfer (FS) is presented. The scheme aims to maximize productivity (the number of subjects processed per unit time) for the use case of research projects with datasets involving many acquisitions. It combines the existing GPU-accelerated version of the FS workflow with a task-level parallel scheme supervised by a resource scheduler, allowing optimal utilization of the computational power of a given hardware platform while avoiding resource shortages. The scheme can be executed on a wide variety of platforms, as its implementation only involves the script that orchestrates the execution of the workflow components; the FS code itself requires no modifications. The scheme has been implemented and tested on a commodity platform within the reach of most research groups (a personal computer with four cores and an NVIDIA GeForce 480 GTX graphics card). Using the scheduled task-level parallel scheme, a productivity above 0.6 subjects per hour is achieved on the test platform, corresponding to a speedup of over six times compared to the default CPU-only serial FS workflow.
Keywords: FreeSurfer, MRI, Medical imaging, GPU, CUDA, Resource scheduler
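The core of the scheme, as described in the abstract, is to run several subjects concurrently (one per CPU core) while serializing the GPU-accelerated stages on the single graphics card. The following is a minimal sketch of that idea, not the authors' actual orchestration script: the subject names and the `process_subject` body are placeholders, and the real workflow would invoke FreeSurfer's `recon-all` (with its CUDA-enabled binaries) instead of the simulated stages shown here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_CORES = 4                      # test platform: quad-core PC
gpu_lock = threading.Semaphore(1)  # one GPU (GeForce GTX 480)

# Bookkeeping to demonstrate that GPU stages never overlap.
state_lock = threading.Lock()
gpu_in_use = 0
max_gpu_concurrency = 0
results = []

def run_gpu_stage(subject):
    """GPU-accelerated stage: must hold the GPU semaphore."""
    global gpu_in_use, max_gpu_concurrency
    with gpu_lock:
        with state_lock:
            gpu_in_use += 1
            max_gpu_concurrency = max(max_gpu_concurrency, gpu_in_use)
        # ... real code would run the CUDA-enabled FS binaries here ...
        with state_lock:
            gpu_in_use -= 1

def process_subject(subject):
    # CPU-only stages run concurrently on different cores; in the real
    # workflow this would be a subprocess call to recon-all.
    run_gpu_stage(subject)  # GPU stages are serialized by the semaphore
    results.append(subject)

subjects = [f"subj{i:02d}" for i in range(8)]
with ThreadPoolExecutor(max_workers=NUM_CORES) as pool:
    list(pool.map(process_subject, subjects))
```

In a production setting this role would typically be filled by a batch scheduler such as PBS or SLURM rather than a hand-rolled script; the sketch only illustrates the scheduling constraint (many CPU workers, one GPU permit) that the paper's scheme enforces.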
This research has been supported by the INNDACYT Association, partly funded by the Talent Empresa program of the AGAUR Agency, Knowledge and Economy Department of the Generalitat de Catalunya, and by MICINN-Spain under contract TIN2011-28689-C02-01.
Our work was possible thanks to the cooperation of INNDACYT Association with Port d’Informació Científica (PIC), and Hospital de la Santa Creu i Sant Pau (Barcelona). The Port d’Informació Científica (PIC) is maintained through a collaboration agreement between the Generalitat de Catalunya, CIEMAT, IFAE and the Universitat Autònoma de Barcelona.
We especially thank Richard Edgar, Bruce Fischl, and colleagues from the Athinoula A. Martinos Center for Biomedical Imaging (Harvard-MIT) for their support.