A single-cycle parallel multi-slice connected components analysis hardware architecture

  • Michael J. Klaiber
  • Donald G. Bailey
  • Sven Simon
Original Research Paper


This paper proposes a memory-efficient architecture for single-pass connected components analysis, suited to high-throughput embedded image processing systems, which achieves a speedup by partitioning the image into slices. Although image segments that span several slices introduce global data dependencies, a temporally and spatially local algorithm is proposed, together with a suitable FPGA hardware architecture that processes pixel data at low latency. The low latency of the proposed architecture allows labels associated with image objects to be reused. This reduces the amount of memory by a factor of more than 5 in the considered implementations, a significant contribution since memory is a critical resource in embedded image processing on FPGAs. Consequently, the architecture can process a significantly higher bandwidth of pixel data than state-of-the-art architectures using the same amount of hardware resources.
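The idea of partitioning the image into slices and then resolving components that span several slices can be illustrated in software. The following is a minimal sketch under stated assumptions, not the hardware architecture proposed in the paper: it uses vertical slices, 4-connectivity, a plain union-find structure, and component area as the only extracted feature. It also keeps a full label image and performs no label reuse, so it shows none of the memory savings described above. The names `find`, `union` and `sliced_component_areas` are hypothetical.

```python
def find(parent, x):
    """Return the root label of x, halving the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, area, a, b):
    """Merge the sets containing labels a and b and accumulate their area."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra
        area[ra] += area[rb]
        area[rb] = 0


def sliced_component_areas(image, num_slices):
    """Return the sorted areas of all 4-connected foreground components.

    Assumption: the image is a list of equal-length rows of 0/1 values and
    is split into `num_slices` vertical slices of near-equal width.
    """
    height, width = len(image), len(image[0])
    if not 1 <= num_slices <= width:
        raise ValueError("num_slices must be between 1 and the image width")
    bounds = [width * s // num_slices for s in range(num_slices + 1)]
    parent, area = [], []
    labels = [[-1] * width for _ in range(height)]

    # Phase 1: raster-scan every slice on its own. No slice reads pixels of
    # another slice, so the slices could be processed in parallel.
    for s in range(num_slices):
        lo, hi = bounds[s], bounds[s + 1]
        for y in range(height):
            for x in range(lo, hi):
                if not image[y][x]:
                    continue
                label = len(parent)  # one fresh label per pixel, for simplicity
                parent.append(label)
                area.append(1)
                labels[y][x] = label
                if x > lo and image[y][x - 1]:  # left neighbour, same slice
                    union(parent, area, labels[y][x - 1], label)
                if y > 0 and image[y - 1][x]:  # upper neighbour
                    union(parent, area, labels[y - 1][x], label)

    # Phase 2: resolve the dependencies between slices by merging labels
    # that touch across each slice boundary.
    for b in bounds[1:-1]:
        for y in range(height):
            if image[y][b - 1] and image[y][b]:
                union(parent, area, labels[y][b - 1], labels[y][b])

    return sorted(area[r] for r in range(len(parent)) if parent[r] == r)


image = [
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1],
]
print(sliced_component_areas(image, 2))  # [1, 2, 8]
```

The result does not depend on the number of slices: the large component crosses the slice boundary and is only joined in the merge phase, which is the kind of cross-slice dependency the abstract refers to.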


Keywords: Connected component analysis, Connected component labelling, FPGA hardware architecture, Feature extraction, High-throughput, Low latency



The authors would like to thank the German Research Foundation (DFG) for the financial support. This work was carried out within the research Project Si 586 7/1 which belongs to the priority program DFG-SPP 1423 Prozess-Spray.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Michael J. Klaiber (1)
  • Donald G. Bailey (2)
  • Sven Simon (1)

  1. University of Stuttgart, Stuttgart, Germany
  2. Massey University, Palmerston North, New Zealand
