Abstract
Labeling connected components in binary lattices is a basic operation in image processing, with applications in a range of fields such as robotic vision, machine learning, and even computational fluid dynamics (CFD) and percolation theory. While standard algorithms often employ recursive designs that seem ill-suited for parallel execution and are prone to excessive memory consumption and even stack overflows, the new algorithm described here is based on a cellular automaton (CA) that is immune to these drawbacks. Furthermore, being an inherently parallel system, the CA also promises speedup and scalability on vector supercomputers as well as on current accelerators such as GPGPUs and the Xeon Phi.
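To illustrate why a CA formulation needs no recursion, the following is a minimal Python sketch, not the chapter's implementation. It assumes a max-propagation rule over the six face neighbours, a synchronous update, and a +1 offset on the positional color so that 0 can mark empty cells. The function name `label_components` and the nested-list layout `lattice[z][y][x]` are likewise illustrative choices.

```python
def label_components(lattice):
    """Label 6-connected components of a cubic binary lattice.

    `lattice[z][y][x]` is 1 for an occupied cell and 0 for an empty one.
    Returns a lattice of the same shape in which all cells of one
    component share a single positive label and empty cells hold 0.
    """
    n = len(lattice)
    # Assumption: positional color plus 1, so that 0 stays free for background.
    colors = [[[((z * n) + y) * n + x + 1 if lattice[z][y][x] else 0
                for x in range(n)]
               for y in range(n)]
              for z in range(n)]
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    changed = True
    while changed:  # iterate generations until no cell changes
        changed = False
        # Synchronous update: read the old generation, write the new one.
        nxt = [[row[:] for row in plane] for plane in colors]
        for z in range(n):
            for y in range(n):
                for x in range(n):
                    if colors[z][y][x] == 0:
                        continue  # empty cells never take a color
                    best = colors[z][y][x]
                    for dz, dy, dx in neighbours:
                        zz, yy, xx = z + dz, y + dy, x + dx
                        if 0 <= zz < n and 0 <= yy < n and 0 <= xx < n:
                            best = max(best, colors[zz][yy][xx])
                    if best != colors[z][y][x]:
                        nxt[z][y][x] = best
                        changed = True
        colors = nxt
    return colors
```

Within one sweep every cell reads only the previous generation, so the three inner loops carry no dependencies on each other. That independence is the property that vector units and GPUs can exploit.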
Notes
- 1.
A comprehensive introduction to CA theory can be found in [2].
- 2.
The large-scale research project ‘Exastencils’ (http://www.exastencils.org/) also deserves mention in this context.
- 3.
Let’s consider a cubic lattice of dimension N. For any lattice element at some position (x, y, z), the unique positional information can be used to derive an initial coloring: $\mathit{Color} = ((z \cdot N) + y) \cdot N + x$.
- 4.
While all modern CPUs do support high-throughput instructions that operate on short vectors of data elements (e.g. SSE, AVX, Altivec), we distinguish these from pipelined vector processing, which can process vectors of arbitrary length and offers a richer instruction set than standard x86-based processors.
- 5.
As proposed by Holger Berger of NEC Germany.
- 6.
The NEC implementation of OpenMP offers pragma-based hints to the compiler that signify the independence of nested loops, such as the loops ‘row’ and ‘col’ in Fig. 3. Adding ‘#pragma cdir nodep’ to the inner loops instructs the compiler to optimize more aggressively.
- 7.
While this may seem to be a limiting factor in applying the kernel to large CAs, it should be noted that the corresponding GPU memory requirement quickly exhausts the accelerator's available on-chip resources, which might be the main limitation on using larger datasets.
- 8.
Employing 3584 cores on a CA kernel of dimension $1024^3$ would yield hardware utilization rates below 29%.
- 9.
While code complexity is not regarded as an issue on standard CPU-based systems, it can certainly inflate the size of the binary executable, which in extreme cases can result in non-executable kernels.
- 10.
Even more so when we consider the age of the hardware design of this generation of the NEC processor, which apparently dates back at least five years from the time of this report.
- 11.
The results that we report here have to be taken with some caution, as the SX-ACE and the GTX 1080 Ti stem from rather different eras of development.
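The positional coloring described in note 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `initial_color` and `position_of` are hypothetical, and the inverse mapping is not part of the original text; it is added only to show that each cell receives a distinct color.

```python
def initial_color(x, y, z, n):
    """Initial color of the cell at (x, y, z) in a cubic lattice of edge length n."""
    return ((z * n) + y) * n + x


def position_of(color, n):
    """Inverse of initial_color (illustrative helper): recover (x, y, z)."""
    z, rest = divmod(color, n * n)  # each z-plane holds n * n cells
    y, x = divmod(rest, n)          # each row holds n cells
    return x, y, z
```

For N = 4, the cell at (1, 2, 3) receives the color 57, and `position_of(57, 4)` recovers (1, 2, 3). Because the mapping is invertible, no two cells start with the same color.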
References
1. Datta, K., et al.: Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (2008)
2. Hoekstra, A.G., Kroc, J., Sloot, P.M.A.: Simulating Complex Systems by Cellular Automata. Springer, Berlin (2010)
3. Holewinski, J., Pouchet, L.-N., Sadayappan, P.: High-performance Code Generation for Stencil Computations on GPU Architectures. ACM, New York (2012). doi:10.1145/2304576.2304619
4. Stamatovic, B., Trobec, R.: Cellular automata labeling of connected components in n-dimensional binary lattices. J. Supercomput. 72(11), 4221–4232 (2016). doi:10.1007/s11227-016-1761-4
5. Tang, Y., Chowdhury, R.A., Kuszmaul, B.C., Luk, C.-K., Leiserson, C.E.: The Pochoir Stencil Compiler. ACM, New York (2011). doi:10.1145/1989493.1989508
6. Trobec, R., Stamatovic, B.: Analysis and classification of flow-carrying backbones in two-dimensional lattices. Adv. Eng. Softw. 103, 38–45 (2015)
Acknowledgements
We would like to sincerely thank Professor Michael M. Resch and the whole team of HLRS for their valuable support, continued guidance and discussions, and provision of systems.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zinterhof, P. (2017). Vectorization of Cellular Automaton-Based Labeling of 3-D Binary Lattices. In: Resch, M., Bez, W., Focht, E., Gienger, M., Kobayashi, H. (eds) Sustained Simulation Performance 2017. Springer, Cham. https://doi.org/10.1007/978-3-319-66896-3_6
DOI: https://doi.org/10.1007/978-3-319-66896-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66895-6
Online ISBN: 978-3-319-66896-3
eBook Packages: Mathematics and Statistics (R0)