Journal of Real-Time Image Processing, Volume 14, Issue 2, pp 341–361

Highly efficient image registration for embedded systems using a distributed multicore DSP architecture

  • Roelof Berg
  • Lars König
  • Jan Rühaak
  • Ralph Lausen
  • Bernd Fischer
Original Research paper

Abstract

We present a complete approach to highly efficient image registration for embedded systems, covering all steps from theory to practice. An optimization-based image registration algorithm using a least-squares data term is implemented on an embedded distributed multicore digital signal processor (DSP) architecture. All relevant parts are optimized, ranging from mathematics, algorithmics, and data transfer to hardware architecture and electronic components. The optimization for the rigid alignment of two-dimensional images is performed in a multilevel Gauss–Newton minimization framework. We propose a reformulation of the necessary derivative computations that eliminates all sparse matrix operations and allows for parallel, memory-efficient computation. The pixelwise parallelism forms an ideal starting point for our implementation on a multicore, multichip DSP architecture. Reducing the data transfer to the individual DSP chips is key to an efficient calculation. By determining worst cases for the subimages needed on each DSP, we can substantially reduce data transfer and memory requirements. This is accompanied by a sophisticated padding mechanism that eliminates pipeline hazards and speeds up the generation of the multilevel pyramid. Finally, we present a reference hardware architecture consisting of four TI C6678 DSPs with eight cores each. We show that it is possible to register high-resolution images within milliseconds on an embedded device. In our example, we register two images with 4096 × 4096 pixels within 93 ms, while off-loading the CPU by a factor of 20 and requiring 3.12 times less electrical energy.
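
To illustrate the idea of pixelwise, matrix-free derivative computation mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a rigid parameterization w = (theta, tx, ty) with y = R(theta) x + t, and it assumes that the residual r and the template gradient (gx, gy) at each transformed pixel are already available. The function names `accumulate_normal_equations` and `merge_partial_sums` are hypothetical. Each pixel contributes one Jacobian row, and the 3 × 3 matrix J^T J and the vector J^T r are built by summation alone, so no sparse matrix is ever stored and partial sums from different cores can simply be added.

```python
import math


def accumulate_normal_equations(pixels, theta):
    """Sum per-pixel contributions to J^T J (3x3) and J^T r (length 3).

    pixels: iterable of (x, y, gx, gy, r), where (x, y) is the pixel
    coordinate, (gx, gy) the template gradient at the transformed point
    and r the intensity residual. These inputs are assumptions of this
    sketch; a real implementation would compute them by interpolation.
    """
    s, c = math.sin(theta), math.cos(theta)
    jtj = [[0.0] * 3 for _ in range(3)]
    jtr = [0.0] * 3
    for x, y, gx, gy, r in pixels:
        # Chain rule: one Jacobian row for the parameters (theta, tx, ty).
        row = (gx * (-s * x - c * y) + gy * (c * x - s * y), gx, gy)
        for a in range(3):
            jtr[a] += row[a] * r
            for b in range(3):
                jtj[a][b] += row[a] * row[b]
    return jtj, jtr


def merge_partial_sums(part_a, part_b):
    """Add the results of two independent pixel chunks (e.g. two cores)."""
    (jtj_a, jtr_a), (jtj_b, jtr_b) = part_a, part_b
    jtj = [[jtj_a[i][k] + jtj_b[i][k] for k in range(3)] for i in range(3)]
    jtr = [jtr_a[i] + jtr_b[i] for i in range(3)]
    return jtj, jtr
```

In a standard Gauss–Newton iteration, the update dw would then be obtained by solving the small system (J^T J) dw = -(J^T r). Because the accumulation is a plain sum over pixels, the image can be split into chunks that are processed independently and merged afterwards.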

Keywords

Image registration · Embedded systems · Parallelization · Distributed computing · DSP

Notes

Acknowledgments

The software created during this work is open source and can be accessed at http://www.github.com/RoelofBerg/fimreg.

In deep sorrow, we commemorate Prof. Dr. rer. nat. Bernd Fischer who passed away during the creation of this paper. Our thoughts are with his family.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Roelof Berg (1)
  • Lars König (2)
  • Jan Rühaak (2)
  • Ralph Lausen (3)
  • Bernd Fischer (2)

  1. Berg Solutions, Lübeck, Germany
  2. Fraunhofer MEVIS, Lübeck, Germany
  3. DHBW Karlsruhe, Karlsruhe, Germany
