Vectorization of Cellular Automaton-Based Labeling of 3-D Binary Lattices

  • Conference paper
Sustained Simulation Performance 2017

Abstract

Labeling connected components in binary lattices is a basic function in image processing, with applications in a range of fields such as robotic vision, machine learning, and even computational fluid dynamics (CFD, percolation theory). While standard algorithms often employ recursive designs that seem ill-suited for parallel execution and are prone to excessive memory consumption and even stack overflows, the new algorithm described here is based on a cellular automaton (CA) that is immune to these drawbacks. Furthermore, being an inherently parallel system itself, the CA also promises speedup and scalability on vector supercomputers as well as on current accelerators, such as GPGPUs and the Xeon Phi.
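
A minimal sketch of what CA-based labeling can look like is given below; it is not the author's kernel. The transition rule (each foreground cell adopts the largest label among itself and its six face neighbours), the flat list layout, and the function name `ca_label` are assumptions made for illustration. Labels are seeded with the positional coloring of note 3, and the automaton is iterated synchronously until no cell changes, so no recursion or explicit stack is needed.

```python
def ca_label(occ, n):
    """Label 6-connected foreground cells of an n*n*n binary lattice.

    occ is a flat list of 0/1 values indexed by (z*n + y)*n + x.
    Returns (labels, generations); background cells keep label -1.
    Assumed rule: each foreground cell adopts the largest label found in
    itself and its 6 face neighbours, updated synchronously.
    """
    n2 = n * n
    # Initial coloring: each foreground cell starts with its unique position.
    label = [i if v else -1 for i, v in enumerate(occ)]
    generations = 0
    while True:
        nxt = label[:]  # next generation; all reads below use `label` only
        changed = False
        for z in range(n):
            for y in range(n):
                for x in range(n):
                    i = (z * n + y) * n + x
                    if label[i] < 0:
                        continue  # background cells never change
                    best = label[i]
                    if x > 0:
                        best = max(best, label[i - 1])
                    if x < n - 1:
                        best = max(best, label[i + 1])
                    if y > 0:
                        best = max(best, label[i - n])
                    if y < n - 1:
                        best = max(best, label[i + n])
                    if z > 0:
                        best = max(best, label[i - n2])
                    if z < n - 1:
                        best = max(best, label[i + n2])
                    if best != label[i]:
                        nxt[i] = best
                        changed = True
        label = nxt
        generations += 1
        if not changed:  # fixed point reached: labels are final
            return label, generations


# Usage: two disjoint bars along x in a 4x4x4 lattice.
n = 4
occ = [0] * n ** 3
for x in range(n):
    occ[(0 * n + 0) * n + x] = 1  # bar A at y=0, z=0
    occ[(2 * n + 2) * n + x] = 1  # bar B at y=2, z=2
labels, gens = ca_label(occ, n)
print(sorted(set(labels)))  # [-1, 3, 43]
print(gens)                 # 4 (3 propagation steps + 1 step that confirms no change)
```

Every cell update in a generation reads only the previous generation, which is what makes the sweep amenable to vectorization. The price is that the number of generations grows with the longest path a label has to travel through a component.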

Notes

  1. A comprehensive introduction to CA theory can be found in [2].

  2. The large-scale research project ‘Exastencils’ (http://www.exastencils.org/) should also be mentioned in this context.

  3. Consider a cubic lattice with N elements along each edge. For any lattice element at position (x, y, z), this unique positional information can be used to derive an initial coloring $\mathit{Color} = (z \cdot N + y) \cdot N + x$.

  4. While all modern CPUs support high-throughput instructions that operate on short vectors of data elements (e.g. SSE, AVX, AltiVec), we want to distinguish these from pipelined vector processing, which can process vectors of arbitrary length and employs a richer instruction set than standard x86-based processors.

  5. As proposed by Holger Berger of NEC Germany.

  6. The NEC implementation of OpenMP offers pragma-based compiler hints that signify the independence of nested loops, such as loops ‘row’ and ‘col’ in Fig. 3. Adding ‘#pragma cdir nodep’ to the inner loops allows the compiler to optimize more aggressively.

  7. While this may seem to limit the kernel's applicability to large CAs, it should be noted that the corresponding amount of GPU memory quickly exhausts the memory resources available on the accelerator, which may well be the main limitation on employing larger datasets.

  8. Employing 3584 cores on a CA kernel of dimension $1024^3$ would yield hardware utilization rates below 29%.

  9. While code complexity is not regarded as an issue on standard CPU-based systems, it can certainly inflate the size of the binary executable, which in extreme cases can result in non-executable kernels.

  10. Even more so when we put this in relation to the age of the hardware concept behind this generation of the NEC processor, which apparently dates back at least five years from the time of this report.

  11. The results reported here should be taken with some caution, as the SX-ACE and the GTX 1080 Ti belong to rather different eras of their respective development.
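
The coloring in note 3 is a row-major linearization of (x, y, z). The hypothetical helpers below, assuming 0-based coordinates, check that it assigns every lattice element a distinct color in 0 .. N^3 - 1 and that the position can be recovered from the color alone.

```python
def color(x, y, z, n):
    """Initial coloring of note 3 for 0-based coordinates in an n*n*n lattice."""
    return (z * n + y) * n + x


def position(c, n):
    """Inverse mapping (hypothetical helper): recover (x, y, z) from a color."""
    return c % n, (c // n) % n, c // (n * n)


n = 5
cells = [(x, y, z) for z in range(n) for y in range(n) for x in range(n)]
# Every element receives a distinct color, covering exactly 0 .. n**3 - 1 ...
assert sorted(color(x, y, z, n) for x, y, z in cells) == list(range(n ** 3))
# ... and the position can be recovered from the color alone.
assert all(position(color(x, y, z, n), n) == (x, y, z) for x, y, z in cells)
print(color(1, 2, 3, n))  # 86 = (3*5 + 2)*5 + 1
```

For N = 1024 the largest color is 1024^3 - 1 = 2^30 - 1, so it fits into a 32-bit integer.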
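
To illustrate the distinction drawn in note 4, the sketch below counts how many vector instructions one elementwise operation needs when a loop is strip-mined into chunks of a fixed register length. The 4-element case corresponds to a 256-bit AVX register holding doubles; the 256-element length for a long-vector machine is an assumption chosen for illustration, and `strips` and `strip_mined_add` are hypothetical helpers.

```python
import math


def strips(length, vlen):
    """Vector instructions needed for one elementwise operation over
    `length` elements when each instruction covers at most `vlen` elements."""
    return math.ceil(length / vlen)


def strip_mined_add(a, b, vlen):
    """Elementwise a + b, processed strip by strip (strip-mining)."""
    out = []
    for start in range(0, len(a), vlen):
        stop = min(start + vlen, len(a))
        # On vector hardware, one instruction would cover this whole strip.
        out.extend(p + q for p, q in zip(a[start:stop], b[start:stop]))
    return out


n = 1000
print(strips(n, 4))    # 250 (256-bit AVX register: 4 doubles per instruction)
print(strips(n, 256))  # 4 (assumed 256-element vector register)
a = list(range(n))
b = [1] * n
assert strip_mined_add(a, b, 256) == [v + 1 for v in a]
```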
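
The ‘#pragma cdir nodep’ hint of note 6 asserts that iterations of the inner loops do not depend on each other. That directive cannot be expressed in Python, so the sketch below only illustrates the property it relies on, using a hypothetical 2-D `ca_step` with an assumed max-neighbour rule (not the code of Fig. 3): with a double-buffered update the result is the same for any visiting order of (row, col), whereas an in-place update introduces a loop-carried dependence that makes the result order-dependent.

```python
def ca_step(grid, order, in_place=False):
    """One CA step on a 2-D grid (max over self and 4 neighbours; -1 = background).

    Cells are visited in the given (row, col) order. With in_place=False the
    step is double-buffered: every read comes from the old grid, so the
    result cannot depend on `order`. The input grid is never modified.
    """
    rows, cols = len(grid), len(grid[0])
    new = [r[:] for r in grid]
    src = new if in_place else grid  # in-place reads may see updated values
    for row, col in order:
        if src[row][col] < 0:
            continue
        best = src[row][col]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols:
                best = max(best, src[r][c])
        new[row][col] = best
    return new


grid = [[0, 1, 2, 3]]
forward = [(0, c) for c in range(4)]
backward = forward[::-1]
print(ca_step(grid, forward), ca_step(grid, backward))
# [[1, 2, 3, 3]] [[1, 2, 3, 3]]  -> order-independent
print(ca_step(grid, forward, True), ca_step(grid, backward, True))
# [[1, 2, 3, 3]] [[3, 3, 3, 3]]  -> order-dependent (loop-carried dependence)
```

Only the double-buffered variant satisfies the independence that such a hint promises to the compiler.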
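
Note 7 names GPU memory as the likely limit on lattice size. A back-of-the-envelope estimate is sketched below under assumed choices: one unsigned integer label per cell, just wide enough for the colors of note 3, two label buffers for the synchronous update, and the binary input lattice ignored. `ca_footprint_bytes` and `label_bytes_needed` are hypothetical names.

```python
GIB = 2 ** 30


def label_bytes_needed(n):
    """Smallest of 4 or 8 bytes that can hold every color 0 .. n**3 - 1."""
    return 4 if n ** 3 - 1 < 2 ** 32 else 8


def ca_footprint_bytes(n, label_bytes, buffers=2):
    """Bytes for `buffers` label arrays of an n*n*n lattice (assumed layout)."""
    return n ** 3 * label_bytes * buffers


for n in (512, 1024, 2048):
    lb = label_bytes_needed(n)
    print(n, lb, ca_footprint_bytes(n, lb) / GIB)
# 512 4 1.0
# 1024 4 8.0
# 2048 8 128.0
```

Under these assumptions a 1024^3 lattice needs 8 GiB of labels, which fits into the 11 GB of a GTX 1080 Ti, while 2048^3 does not, partly because its colors no longer fit into 32 bits.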
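
Note 8 does not show how the 29% figure arises. One reading that reproduces it, offered purely as an assumption, is that only 1024 threads (one per element along a single lattice dimension of length 1024) are active at a time on the 3584 CUDA cores of the GTX 1080 Ti:

```latex
U = \frac{1024}{3584} = \frac{2}{7} \approx 0.286 < 0.29
```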

References

  1. Datta, K., et al.: Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (2008)

  2. Hoekstra, A.G., Kroc, J., Sloot, P.M.A.: Simulating Complex Systems by Cellular Automata. Springer, Berlin (2010)

  3. Holewinski, J., Pouchet, L.-N., Sadayappan, P.: High-performance Code Generation for Stencil Computations on GPU Architectures. ACM, New York (2012). doi:10.1145/2304576.2304619

  4. Stamatovic, B., Trobec, R.: Cellular automata labeling of connected components in n-dimensional binary lattices. J. Supercomput. 72(11), 4221–4232 (2016). doi:10.1007/s11227-016-1761-4

  5. Tang, Y., Chowdhury, R.A., Kuszmaul, B.C., Luk, C.-K., Leiserson, C.E.: The Pochoir Stencil Compiler. ACM, New York (2011). doi:10.1145/1989493.1989508

  6. Trobec, R., Stamatovic, B.: Analysis and classification of flow-carrying backbones in two-dimensional lattices. Adv. Eng. Softw. 103, 38–45 (2015)

Acknowledgements

We would like to sincerely thank Professor Michael M. Resch and the whole team of HLRS for their valuable support, continued guidance and discussions, and provision of systems.

Author information

Correspondence to Peter Zinterhof.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zinterhof, P. (2017). Vectorization of Cellular Automaton-Based Labeling of 3-D Binary Lattices. In: Resch, M., Bez, W., Focht, E., Gienger, M., Kobayashi, H. (eds) Sustained Simulation Performance 2017. Springer, Cham. https://doi.org/10.1007/978-3-319-66896-3_6
