Hardware Implementation of Reconfigurable 1D Convolution

Journal of Signal Processing Systems

Abstract

Convolution is used extensively in image processing and computer vision, including image enhancement, smoothing, and structure extraction. However, the convolution operation typically requires a significant amount of computing resources. This study implements a novel one-dimensional (1D) convolution processor with a reconfigurable architecture. The processor combines a line buffer, controller units, and a reconfigurable and separable convolution module. The reconfigurable architecture and the separable convolution approach improve the flexibility and performance of the processor. The reconfigurable and separable convolution array, which is the main component of the processor, can execute convolution operations with different kernels simultaneously, with a maximum kernel size of 24 × 24. Experimental results show that the maximum frame rate of the processor is approximately 194 frames per second (fps), which exceeds the real-time requirement. Synthesis results in SMIC 0.18 μm CMOS technology show that the processor occupies 13.39 mm² at a 204 MHz system clock and consumes 419 mW at the maximum kernel size at a 120 MHz system clock. Verification experiments on field programmable gate arrays (FPGAs) demonstrate that the processor is suitable for real-time image processing applications, even for high-resolution images.
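The separable convolution approach named in the abstract can be illustrated in software. The following is a minimal Python sketch, not the authors' hardware design. It assumes a 2D kernel that factors into the outer product of a column vector and a row vector, zero padding at the image borders, and integer pixel values. The function names (`conv1d_same`, `conv2d_separable`, `conv2d_direct`, `multiplies_per_pixel`) are hypothetical and chosen only for this illustration.

```python
def conv1d_same(signal, kernel):
    """1D convolution with zero padding; output has the same length as the input."""
    n, k = len(signal), len(kernel)
    half = k // 2
    out = []
    for i in range(n):
        acc = 0
        for j in range(k):
            idx = i + half - j  # true convolution: the kernel is flipped
            if 0 <= idx < n:    # samples outside the signal count as zero
                acc += kernel[j] * signal[idx]
        out.append(acc)
    return out


def conv2d_separable(image, col_kernel, row_kernel):
    """2D convolution as two 1D passes: along the rows, then along the columns."""
    row_pass = [conv1d_same(row, row_kernel) for row in image]
    col_pass = [conv1d_same(list(col), col_kernel) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]  # transpose back to row-major


def conv2d_direct(image, kernel2d):
    """Reference 2D convolution with zero padding, used only for comparison."""
    rows, cols = len(image), len(image[0])
    kr, kc = len(kernel2d), len(kernel2d[0])
    hr, hc = kr // 2, kc // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for u in range(kr):
                for v in range(kc):
                    rr, cc = r + hr - u, c + hc - v
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel2d[u][v] * image[rr][cc]
            out[r][c] = acc
    return out


def multiplies_per_pixel(k):
    """Multiplications per output pixel for a k x k kernel: (direct, separable)."""
    return k * k, 2 * k


# Illustrative data (assumed, not taken from the paper).
image = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
]
col_kernel = [1, 0, -1]  # vertical difference
row_kernel = [1, 2, 1]   # horizontal smoothing
kernel2d = [[c * r for r in row_kernel] for c in col_kernel]  # outer product

# Both routes give the same result when the kernel is separable.
assert conv2d_separable(image, col_kernel, row_kernel) == conv2d_direct(image, kernel2d)
```

Under these assumptions, a direct 24 × 24 convolution needs 24 × 24 = 576 multiplications per output pixel, whereas two 1D passes need 24 + 24 = 48. The saving applies only to kernels that are separable; a general 2D kernel cannot be split this way without approximation.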

Acknowledgements

This work is supported by the China Postdoctoral Science Foundation (2014M550492), the National Natural Science Foundation of China (61231018), and the Natural Science Basic Research Plan in Shaanxi Province of China (2013JQ8025).

Author information

Corresponding author

Correspondence to Bin Zhang.

About this article

Cite this article

Rao, L., Zhang, B. & Zhao, J. Hardware Implementation of Reconfigurable 1D Convolution. J Sign Process Syst 82, 1–16 (2016). https://doi.org/10.1007/s11265-015-0969-5

  • DOI: https://doi.org/10.1007/s11265-015-0969-5
