Fast Parallel Model Estimation on the Cell Broadband Engine
In this paper, we present fast parallel implementations of the RANSAC algorithm on the Cell processor, a multicore SIMD architecture. We describe the strategies we developed to implement RANSAC efficiently in parallel by exploiting the specific features of the Cell processor. We also discuss a new method for model generation that increases the efficiency of computing the homography transformation with RANSAC. With this new method and the accompanying change of algorithm, we have increased the overall performance by a factor of almost 3. We also discuss in detail our approaches for further increasing efficiency through careful vectorization of the computation and through reducing communication overhead by overlapping computation and communication. The results of our practical implementations clearly demonstrate that a very high sustained computational performance (in terms of sustained GFLOPS) can be achieved with minimal communication overhead, enabling real-time generation and evaluation of a very large number of models. With a data set of size 2048 and 256 models, we have achieved a performance of over 80 sustained GFLOPS. Since the peak computing power of our target architecture is 179 GFLOPS, this represents a sustained performance of about 44% of peak, indicating the efficiency of our algorithms and implementations. Our results clearly demonstrate the advantages of a parallel implementation of RANSAC on MIMD-SIMD architectures such as the Cell processor. They also show that, compared with a sequential implementation, such a parallel implementation solves a problem with a fixed number of iterations (hypothetical models) much faster, potentially leading to better accuracy of the model.
Keywords: Parallel Implementation, Global Memory, Hypothetical Model, Loop Unroll, Cell Processor
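As a rough illustration of the structure the abstract describes (a fixed number of hypothetical models, each generated from a minimal sample and then evaluated against the whole data set), the following is a minimal sequential sketch in Python. It is not the authors' implementation: it fits a 2D line rather than a homography, it does not model the Cell processor, and all function names, the threshold, and the synthetic data are assumptions chosen for illustration.

```python
import random


def generate_models(points, num_models, rng):
    """Phase 1: build hypothetical line models from random point pairs.

    Each model is (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1.
    """
    models = []
    while len(models) < num_models:
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2
        norm = (a * a + b * b) ** 0.5
        if norm == 0.0:
            continue  # degenerate sample (identical points), draw again
        c = -(a * x1 + b * y1)
        models.append((a / norm, b / norm, c / norm))
    return models


def count_inliers(model, points, threshold):
    """Phase 2: score one model by counting points within `threshold` of the line."""
    a, b, c = model
    return sum(1 for x, y in points if abs(a * x + b * y + c) <= threshold)


def ransac_line(points, num_models=256, threshold=0.05, seed=0):
    """Evaluate a fixed number of hypothetical models and return the best one."""
    rng = random.Random(seed)
    models = generate_models(points, num_models, rng)
    # Each model is scored independently of the others, which is what makes
    # the evaluation phase a natural candidate for parallel execution.
    scores = [count_inliers(m, points, threshold) for m in models]
    best = max(range(num_models), key=scores.__getitem__)
    return models[best], scores[best]


def make_data(n_inliers=150, n_outliers=50, seed=1):
    """Synthetic data (assumption): noisy points on y = 2x + 0.5 plus outliers."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n_inliers):
        x = rng.uniform(-1.0, 1.0)
        pts.append((x, 2.0 * x + 0.5 + rng.gauss(0.0, 0.01)))
    for _ in range(n_outliers):
        pts.append((rng.uniform(-1.0, 1.0), rng.uniform(-3.0, 3.0)))
    return pts


if __name__ == "__main__":
    model, score = ransac_line(make_data())
    a, b, c = model
    print("slope:", -a / b, "intercept:", -c / b, "inliers:", score)
```

Because the models are scored independently, the list comprehension in `ransac_line` is the part that a parallel implementation would distribute across cores, with the inner loop over data points being the part suited to vectorization.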