Large-scale tomographic PIV in forced and mixed convection using a parallel SMART version
DOI: 10.1007/s00348-012-1301-9
- Cite this article as:
- Kühn, M., Ehrenfried, K., Bosbach, J. et al. Exp Fluids (2012) 53: 91. doi:10.1007/s00348-012-1301-9
Abstract
Large-scale tomographic particle image velocimetry (tomographic PIV) was used to study large-scale flow structures of turbulent convective air flow in an elongated rectangular convection cell. Three flow cases have been investigated, that is, pure forced convection and mixed convection at two different Archimedes numbers. The Reynolds number was constant at Re = 1.04 × 10^{4} for all cases, while the Archimedes numbers were Ar = 2.1 and 3.6 for the mixed convection cases, corresponding to Rayleigh numbers of Ra = 1.6 × 10^{8} and 2.8 × 10^{8}, respectively. In these investigations, the size of the measurement volume was as large as 840 mm × 500 mm × 240 mm. To allow for statistical analysis of the measured instantaneous flow fields, a large number of samples needed to be evaluated. Therefore, an efficient parallel implementation of the tomographic PIV algorithm was developed, which is based on a version of the simultaneous multiplicative reconstruction technique (SMART). Our algorithm distinguishes itself amongst other features by the fact that it does not store any weighting coefficients. The measurement of forced convection reveals an almost two-dimensional roll structure, which is orientated in the longitudinal cell direction. Its mean velocity field exhibits a core line with a wavy shape and a wavelength, which corresponds to the height and depth of the cell. In the instantaneous fields, the core line oscillates around its mean position. Under the influence of thermal buoyancy forces, the global structure of the flow field changes significantly. At lower Archimedes numbers, the resulting roll-like structure is shifted and deformed as compared to pure forced convection. Additionally, the core line oscillates much more strongly around its mean position due to the interaction of the roll structure with the rising hot air. 
If the Archimedes number is further increased, the roll-like structure breaks up into four counter-rotating convection rolls as a result of the increased influence of buoyancy forces. Moreover, large-scale tomographic PIV reveals that the orientation of these rolls reflects a ‘W’-like shape in the horizontal X–Z-plane of the convection cell.
1 Introduction
A tomographic particle image velocimetry (tomographic PIV, Elsinga et al. 2006) system for measuring large-scale flow structures in air by using helium-filled soap bubbles (HFSBs) as tracer particles has recently been developed (Kühn et al. 2008, 2011). The technique is referred to as large-scale tomographic PIV and is capable of measuring instantaneous and three-dimensional velocity fields in measurement volumes of the order of one cubic metre with a spatial resolution of centimetres. However, for the application of tomographic PIV to such large measurement volumes using HFSBs as tracer particles, some specific challenges have to be faced. First to note are the spot lights, which appear on the surface of the HFSBs, the position of which slightly changes with the viewing angle of the observer relative to the light source. Second, the size of the particles becomes comparable to the size of a single voxel. Finally, the use of imaging lenses of small focal length results in a noticeable change in the magnification along the line of sight. A detailed discussion of the influence of these issues is given in Kühn et al. (2011).
The large-scale tomographic PIV technique is a very promising approach for studies of time-dependent three-dimensional large-scale flow structures in turbulent thermal and mixed convection. The fundamentals of turbulent thermal convection have been intensively studied in Rayleigh-Bénard cells (Ahlers et al. 2009). Further, turbulent thermal convection plays an important role in many technical applications like the air conditioning of residential buildings or vehicle passenger compartments (Bosbach et al. 2006; Kühn et al. 2009), where the natural convection induced by the occupants is superimposed by the externally driven flow through the air in- and outlets, a flow case, which is referred to as mixed convection.
To study forced and mixed convection in a well-defined environment, a convection cell with a length of 2.5 m and a square cross-section of 0.5 m × 0.5 m was developed (see Kühn et al. 2011; Schmeling et al. 2010, 2011). In a previous investigation of forced convection employing large-scale tomographic PIV in this configuration, the mean velocity field revealed a nearly two-dimensional roll-like structure with a homogeneous wavy shape of the core line (Kühn et al. 2011). However, the observed large-scale flow structures were time dependent and three-dimensional. For mixed convection, a break-up of the two-dimensional roll structure into four large-scale convection rolls was observed in experimental studies using planar PIV (Schmeling et al. 2010, 2011). Direct numerical simulations of Rayleigh-Bénard convection in the same configuration revealed a similar arrangement of up to four convection rolls (Kaczorowski and Wagner 2009). While detailed three-dimensional data are available from direct numerical simulations of pure thermal convection (Kaczorowski and Wagner 2009), detailed quantitative information on the spatial extension and temporal behaviour of large-scale structures is still lacking for mixed convection. To fill this gap, the present statistical study of the three-dimensional large-scale flow structures has been conducted by applying the large-scale tomographic PIV system to the mixed convective flow in the cell.
For the statistical analysis of the flow, a large number of samples have to be evaluated, making an efficient tomographic PIV implementation mandatory. Consequently, a parallel version of the tomographic PIV algorithms has been developed. It employs a version of the simultaneous multiplicative algebraic reconstruction technique (SMART, Mishra et al. 1999)^{1}, a technique that was recently applied successfully in a tomographic PIV experiment by Atkinson and Soria (2009). In contrast to the multiplicative algebraic reconstruction technique (MART, Herman and Lent 1976), which has so far been the standard reconstruction algorithm in tomographic PIV, SMART is in principle suitable for fully parallel processing of the tomographic reconstruction step on multiple computers.
The paper is organized as follows: In Sect. 2, the modified version of SMART is described. A validation of its implementation is then provided by comparison with MART data from a real large-scale experiment. Furthermore, a parallel version of the tomographic PIV routines is presented, and subsequently, the application of the large-scale tomographic PIV system to the flow in the convection cell is described. The experimental set-up and procedure are summarized in Sect. 3, and the large-scale flow structures obtained in forced and mixed convection are discussed in detail in Sect. 4. Finally, the results are summarized in Sect. 5.
2 Modification of the SMART
Up until now, MART has been the standard reconstruction algorithm used in tomographic PIV experiments (Discetti and Astarita 2012; Elsinga et al. 2006; Kühn et al. 2011; Scarano and Poelma 2009; Worth and Nickels 2008). However, the use of SMART for tomographic PIV, as recently proposed by Atkinson and Soria (2009), in principle provides an opportunity for fully parallel processing of the reconstruction step on multiple computers connected via a network. Recently, this simultaneous reconstruction technique was also used by Atkinson et al. (2011), Buchmann et al. (2011), Kühn et al. (2010) and Schanz et al. (2010) in their tomographic PIV experiments. In this section, a version of SMART for parallel processing is presented.
2.1 Implementation
Here, N_{j} denotes the number of pixels that contribute to Eq. (2), and μ is a relaxation coefficient. The forward and backward projection process is then repeated for all pixels of the next camera recording, and the whole cycle is repeated for a chosen number of iterations.
This approach is similar to that proposed for the simultaneous algebraic reconstruction technique (SART, Andersen and Kak 1984). However, the workflow differs from the original implementation of Mishra et al. (1999), in which the voxel's intensity is updated on the basis of the forward projection of all cameras. In other words, Mishra et al. (1999) use the product over all pixels of all cameras observing a specific voxel to calculate the product in Eq. (2) (see also Fig. 1). In the implementation discussed here, the product is first calculated over all pixels of one camera only. This step is then repeated for each of the other cameras in turn (see Fig. 1). The main advantage of this approach is the lower memory consumption in the parallel version of the algorithm, since each process only needs to store the current forward projection. In the original implementation of Mishra et al. (1999), all forward projections have to be stored by all processes. This advantage becomes more pronounced if SMART is applied to camera recordings with a very large number of pixels, that is, more than 10 megapixels, and if a large number of cores is used (e.g. 48 cores).
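
The camera-by-camera workflow described above can be illustrated with a minimal sketch. Since Eq. (2) is not reproduced here, the exponent μ·w_{i,j}/N_{j} used below is an assumed form of the multiplicative update, and the function name `smart_per_camera`, the dense weight matrices and the toy 4 × 4 volume are purely hypothetical. The actual implementation computes the weighting coefficients on the fly rather than storing them as matrices.

```python
import numpy as np


def smart_per_camera(weights, images, n_iter=5, mu=1.0, eps=1e-12):
    """Camera-by-camera multiplicative update (illustrative sketch only).

    weights: list with one (n_pixels, n_voxels) array w_ij per camera
    images:  list with one (n_pixels,) recorded intensity array per camera
    """
    intensity = np.ones(weights[0].shape[1])  # uniform first guess
    for _ in range(n_iter):
        for w, img in zip(weights, images):
            # Forward projection of the current volume onto this camera only.
            proj = w @ intensity
            # Ratio of recorded to projected intensity; pixels that see an
            # (almost) empty volume are left neutral (ratio 1).
            ratio = np.where(proj > eps, img / np.maximum(proj, eps), 1.0)
            log_ratio = np.log(np.maximum(ratio, eps))
            # N_j: number of pixels of this camera contributing to voxel j.
            n_j = np.maximum((w > 0).sum(axis=0), 1)
            # Backward projection: weighted geometric mean of the ratios,
            # with exponent mu * w_ij / N_j (assumed form of the update).
            intensity = intensity * np.exp(mu * (w.T @ log_ratio) / n_j)
    return intensity


# Toy 4 x 4 "volume": camera 1 integrates along rows, camera 2 along columns.
cam1 = np.kron(np.eye(4), np.ones((1, 4)))
cam2 = np.kron(np.ones((1, 4)), np.eye(4))
truth = np.zeros(16)
truth[6] = 1.0  # single particle in row 1, column 2
recon = smart_per_camera([cam1, cam2], [cam1 @ truth, cam2 @ truth])
```

Only one forward projection (`proj`) exists at any time, which is the memory argument made in the text. A variant following Mishra et al. (1999) would have to hold the projections of all cameras before updating any voxel.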
A bilinear interpolation filter is used for calculating the weighting coefficients. The derivation of the weighting coefficients from d_{min} was already described in Kühn et al. (2011). In order to save computational resources, the radius of the projected filter kernel was set to one pixel for all voxels. Because the magnification factor decreases with growing distance from the camera, the size of the interpolation kernel increases in physical space. Thus, objects in the volume far away from the camera are resolved by fewer pixels than objects closer to the camera. This holds for large-scale tomographic PIV experiments employing camera lenses with small focal length. For the data set presented in this paper, the radii of the interpolation kernels differ by 30 % between those voxels farthest away from and closest to the camera. We would like to note that this corresponds to an undersampling of the measurement volume and would lead to reconstruction artefacts if MART were employed (see Mueller 1998). However, no reconstruction artefacts occur if simultaneous reconstruction techniques like SART are used (see Mueller 1998). Of course, the change in resolution is also a function of the thickness of the measurement volume. In tomographic PIV experiments with small measurement volumes of the order of 10 cm³, such as that of Elsinga et al. (2006), the mentioned change of resolution is less than 3 %.
To further decrease the computational time, only voxels with non-zero intensity are considered in the reconstruction process as proposed by Atkinson and Soria (2009) and Worth and Nickels (2008). We would like to note that in our SMART implementation the matrix or parts of the matrix containing the necessary weighting coefficients are not stored in the RAM of the computer. The coefficients are rather calculated during the run-time of the algorithms each time they are needed. Hence, no first guess method is applied and non-contributing voxels are filtered out during the run-time using an ‘if-condition’ in the implementation.
2.2 Validation
To validate the modified SMART version, the large-scale tomographic PIV measurement series presented in Kühn et al. (2011) was re-evaluated using the implemented SMART version. In this experiment, forced convection was measured in the convection cell discussed in Sect. 3 (four cameras, seeding density of ~0.03 particles per pixel, measurement volume size of 750 mm × 450 mm × 165 mm or 999 × 600 × 220 voxels, 248 samples). For the re-evaluation of the data with the newly implemented version of SMART, the same parameters were used as for the evaluation with MART (see Kühn et al. 2011).
It turned out that the difference in the mean displacement of the reconstructed particles between both techniques is less than 0.1 voxels after five iterations of the reconstruction algorithms. This indicates that the modified SMART produces no noticeable systematic bias. Furthermore, variations in the measuring accuracy in the volume depth direction are insignificant, indicating that the influence of the above-mentioned change in the size of the interpolation kernel in the volume depth direction (see Sect. 2.1) on the displacement field obtained is negligible for the measurement set-up considered.
The root mean square (RMS) value of the difference between the instantaneous displacements after reconstruction with MART and SMART in the X-, Y- and Z-directions corresponds to 0.11, 0.16 and 0.15 voxels, respectively, if all data points in the measurement volume of the 248 samples are considered. These values are quite similar to those reported in an earlier measurement of a turbulent boundary layer by Atkinson and Soria (2009). They determined RMS values of the displacement differences of 0.21, 0.16 and 0.28 voxels in the X-, Y- and Z-directions after five MART iterations and ten iterations of their SMART version.
However, a closer look at the spatial distribution of the RMS values of the differences in the measurement volume reveals a large bulk region along the volume X-direction, with values as low as 0.05 voxels in the X- and Y-directions and 0.1 voxels in the Z-direction. These values are at least one-third lower than the RMS values obtained using all data points in the volume. Furthermore, at the borders of the measurement volume in the Y- and Z-directions, the values increase by up to 0.35 voxels. We believe that these local increases are mainly caused by less brightly illuminated particles at the borders of the measurement volume. There, the intensity of the light source used decreases (see Kühn et al. 2011), resulting in a lower signal-to-noise ratio in the particle recordings.
2.3 Calculation time
Summary of the performance of different reconstruction algorithms
Algorithm | t (min) | t per number of particles (s) | t per number of particles and iterations (s) | Notes |
---|---|---|---|---|
MART on desktop computer (Kühn et al. 2011) | 17.0 | 0.057 | 0.0114 | |
SMART on desktop computer (Sect. 2.1) | 19.0 | 0.063 | 0.0126 | |
SMART on HPCC (Sect. 2.1) | 9.3 | 0.031 | 0.0062 | |
SMART (Atkinson and Soria 2009) | 22.4 | 0.134 | 0.0034 | Matrix containing necessary w_{i,j} and respective indices are stored in RAM |
Atkinson and Soria (2009) in turn needed 22.4 min of calculation time to reconstruct the intensity distribution in a measurement volume of 1,000 × 1,000 × 160 voxels with their version of SMART. The reported particle density of the four camera recordings corresponds to 0.01 particles per pixel, and the quoted time includes the time for initialization of the intensity distribution by MLOS (multiplicative line of sight, see Atkinson and Soria 2009). Furthermore, for their test case, SMART was iterated 40 times, whereas, in our study, SMART was only iterated five times, which appears to be sufficient (see Kühn et al. 2010).
Particularly from comparison with the number of iterations used by Atkinson and Soria (2009), one might conclude that our implementation requires more computation time. The reason for this might be that Atkinson and Soria computed the weighting coefficients of the non-zero voxels during the first iteration and stored them in the RAM of the computer for further reference. The disadvantage of this approach is, of course, that a much larger amount of RAM is required. The actual demand on RAM, however, mainly depends on the particle image density of the camera recordings and the number of voxels in the measurement volume.
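
The normalization behind the last column of the performance table can be checked with a short sketch. The values are copied from the table; the iteration counts (five for the present MART and SMART runs, 40 for Atkinson and Soria 2009) are taken from the text, and the helper name `per_particle_and_iteration` is hypothetical.

```python
# (time per particle in s, number of iterations), as listed in the table/text
runs = {
    "MART on desktop computer": (0.057, 5),
    "SMART on desktop computer": (0.063, 5),
    "SMART on HPCC": (0.031, 5),
    "SMART (Atkinson and Soria 2009)": (0.134, 40),
}


def per_particle_and_iteration(t_per_particle_s, n_iterations):
    """Time per particle divided by the number of iterations."""
    return t_per_particle_s / n_iterations


normalized = {
    name: per_particle_and_iteration(t, n) for name, (t, n) in runs.items()
}

# Ratio of the present desktop SMART to Atkinson and Soria's SMART, per
# particle and iteration.
ratio = normalized["SMART on desktop computer"] / normalized[
    "SMART (Atkinson and Soria 2009)"
]
```

Under this normalization, the present desktop SMART run is slower per particle and iteration than the implementation of Atkinson and Soria by a factor of nearly four, which is the comparison the paragraph above refers to.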
Furthermore, we would like to stress that a direct comparison of the necessary calculation time is very difficult since it also depends strongly on the computer hardware used as well as on the compiler and the compiler settings. Reconstruction of the above-mentioned test case can be conducted in only 9.3 min if the SMART code is compiled using the Intel^{®} C compiler instead of the GNU C compiler (the GNU C compiler is used for the investigations presented at the beginning of this subsection) and executed on one core of a high-performance computer cluster (HPCC; each node consists of two Intel^{®} Xeon^{®} Processors X5650 of 2.67 GHz and has 12 cores, 24 MB cache and 48 GB of RAM in total). Hence, the necessary calculation time is halved merely by employing different computer hardware and a different compiler. Note that in cases where the weighting coefficients are calculated during run-time every time they are needed, the calculation time for the tomographic reconstruction also depends strongly on the way in which these weighting coefficients are determined. In that case, a very efficient approach is the splatting algorithm used here (see Sect. 2.1 and Mueller 1998).
2.4 Parallel processing of tomographic PIV routines
In order to exploit the complete computational resources of multi-core CPUs with a lower amount of RAM per CPU, an efficient parallelization of the tomographic PIV algorithms is required. Furthermore, by the parallel processing of the reconstruction and correlation routines, a faster processing of single tomographic PIV samples becomes possible. This is indispensable for a fast determination of the optimal processing parameters needed to evaluate a complete tomographic PIV series.
Parallelization of an algorithm is in general possible by two concepts, which differ in the way the necessary memory is utilized. In the first concept, the memory is shared by all employed cores, and it can be easily implemented by using, for example, OpenMP (open multi-processing, see, for example, Chapman et al. 2007). However, the distribution of the workload is limited to the cores of a single CPU. If the workload needs to be distributed to the cores of different CPUs, which are referred to as processes in the following, the algorithm needs to be parallelized using MPI (message passing interface, see, for example, Gropp et al. 1999). In such an approach, each process allocates its own memory. However, the implementation of this concept is in general more complicated.
For the sake of much greater flexibility and a larger number of processes that can be employed, the SMART as well as the correlation routines were parallelized using MPI. The parallelization approach realized in the respective code is quite simple: in the reconstruction routine, each process calculates the forward projection (see Fig. 1 and Sect. 2.1) for a specified number of voxels. When finished for one camera, the forward-projection data need to be exchanged between all processes. Subsequently, in the backward projection, each process updates the voxels already considered in the forward-projection step. It should be noted that due to the modification of the SMART workflow (see Sect. 2.1), less RAM is required in the parallelization since only the current forward projection needs to be stored by each process compared with all forward projections in the original implementation by Mishra et al. (1999).^{3}
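
The decomposition described above can be mimicked without MPI by a small single-process simulation. The sketch below is an illustration under assumptions, not the authors' code: the voxel chunks stand in for MPI processes, the summation of partial projections stands in for the exchange between processes (in a real MPI code a collective reduction such as `MPI_Allreduce` would be one option; the source does not say which call is used), and the update exponent μ·w_{i,j}/N_{j} is an assumed form of Eq. (2). All function names are hypothetical.

```python
import numpy as np


def distributed_forward_projection(w, intensity, n_proc):
    """Each simulated process projects only its own chunk of voxels."""
    chunks = np.array_split(np.arange(intensity.size), n_proc)
    partial = [w[:, idx] @ intensity[idx] for idx in chunks]
    # Summing the partial projections stands in for the data exchange
    # between all processes after one camera has been processed.
    return np.sum(partial, axis=0), chunks


def distributed_update(w, img, intensity, n_proc, mu=1.0, eps=1e-12):
    """One forward/backward projection step for a single camera."""
    proj, chunks = distributed_forward_projection(w, intensity, n_proc)
    ratio = np.where(proj > eps, img / np.maximum(proj, eps), 1.0)
    log_ratio = np.log(np.maximum(ratio, eps))
    updated = intensity.copy()
    for idx in chunks:
        # Backward projection: every process updates exactly the voxels
        # it handled in the forward-projection step.
        w_chunk = w[:, idx]
        n_j = np.maximum((w_chunk > 0).sum(axis=0), 1)
        updated[idx] = intensity[idx] * np.exp(
            mu * (w_chunk.T @ log_ratio) / n_j
        )
    return updated


# Synthetic single-camera problem: 6 pixels, 20 voxels (arbitrary sizes).
rng = np.random.default_rng(0)
w = rng.random((6, 20))
start = rng.random(20) + 0.1
img = w @ rng.random(20)
serial = distributed_update(w, img, start, n_proc=1)
parallel = distributed_update(w, img, start, n_proc=4)
```

The serial and the four-way decomposed results agree to floating-point precision, which reflects why the simultaneous update lends itself to this kind of voxel-wise distribution: within one camera step, no voxel update depends on another voxel's new value.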
The cross-correlation routine^{4} determining the displacement of the reconstructed particles between the two light pulses is parallelized as follows: each process calculates the displacement in a certain fraction of the measurement volume only and the displacement fields are subsequently recombined. If iterative correlation algorithms are used (see Raffel et al. 2007; Scarano 2002), the data are sent to one master process between the iterations. The master process thus performs the possible filter operations of the intermediate vector fields and subsequently sends the filtered velocity data back to the other processes.
Summary of speed-up and efficiency of parallelization
Algorithm | Number of processes | Speed-up (low workload) | Efficiency (low workload) | Speed-up (high workload) | Efficiency (high workload) |
---|---|---|---|---|---|
SMART | 12 | 7.0 | 0.59 | 10.4 | 0.87 |
SMART | 48 | 18.5 | 0.39 | 42.1 | 0.88 |
Correlation | 12 | 10.2 | 0.85 | 11.1 | 0.93 |
Correlation | 128 | 63.5 | 0.50 | 91.3 | 0.71 |
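
The efficiency values in the table are consistent with the usual definition of parallel efficiency as speed-up divided by the number of processes. The following sketch recomputes them from the tabulated speed-ups; the function name `efficiency` is hypothetical, and small deviations in the last digit are expected because the tabulated speed-ups are themselves rounded.

```python
def efficiency(speed_up, n_processes):
    """Parallel efficiency: speed-up divided by the number of processes."""
    return speed_up / n_processes


# (algorithm, processes, speed-up low, eff. low, speed-up high, eff. high)
table = [
    ("SMART", 12, 7.0, 0.59, 10.4, 0.87),
    ("SMART", 48, 18.5, 0.39, 42.1, 0.88),
    ("Correlation", 12, 10.2, 0.85, 11.1, 0.93),
    ("Correlation", 128, 63.5, 0.50, 91.3, 0.71),
]

recomputed = [
    (name, n, efficiency(s_low, n), efficiency(s_high, n))
    for name, n, s_low, _, s_high, _ in table
]
```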
One very important advantage of parallelization is the ability to optimize the settings of the algorithms by processing single samples of a measurement series very quickly using a high number of processes. Once the optimal settings are determined, the speed-up of the parallel processing can be maximized by evaluating different samples of a complete measurement series at the same time. In this mode, a single sample is evaluated in parallel by all cores of one node of the HPCC in order to avoid time-consuming data exchange between the nodes. Hence, the necessary wall-clock time for the evaluation of a complete measurement series can be minimized.
3 Experimental set-up and procedure
3.1 Convection cell and tomographic PIV set-up
For the quantitative measurement of the large-scale flow structures, a specially built tomographic PIV system (see Kühn et al. 2011) was applied to the cell, which covered a measurement volume as large as 840 mm × 500 mm × 240 mm. The position of the volume in the cell is shown in Fig. 3. It was observed by a simultaneously operating camera system, which consisted of five cameras with spatial resolutions of 1,392 × 1,024 pixels (pixelfly, PCO). Each camera was equipped with an f = 21 mm lens (Distagon T* 2.8/21, Carl Zeiss), whose aperture was set to f/4. The arrangement of the camera system is depicted in Fig. 3. Neutrally buoyant helium-filled soap bubbles (HFSBs) with a mean diameter between 0.2 and 0.3 mm were used as tracer particles. A description of the bubble generator is given in Bosbach et al. (2009). The HFSBs were injected through the above-mentioned particle injection chamber and illuminated by a specially built LED light source. A detailed description and a characterization of the light source used are given in Kühn et al. (2011). The light source is located at X = 2,920 mm in front of the cell. At the opposite side of the cell, a mirror reflects the light back.
3.2 Studied flow configurations
Summary of flow cases studied
Flow case | \(\dot{V}\) (l/s) | ΔT (K) | Re | Ra | Pr | Ar | N_{S} | f_{S} (Hz) | Particle image density (ppp) |
---|---|---|---|---|---|---|---|---|---|
Forced convection (FC) | 20.0 | 0.0 | 1.04 × 10^{4} | 0.0 | 0.7 | 0.0 | 256 | 2/3 | 0.03 |
Mixed convection (MCI) | 20.0 | 13.0 | 1.04 × 10^{4} | 1.6 × 10^{8} | 0.7 | 2.1 | 512 | 2/3 | 0.03 |
Mixed convection (MCII) | 20.0 | 22.0 | 1.04 × 10^{4} | 2.8 × 10^{8} | 0.7 | 3.6 | 512 | 2/3 | 0.03 |
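
The tabulated dimensionless numbers can be cross-checked with a short sketch. It assumes the common definition Ar = Ra / (Re² · Pr), with Re and Ra based on the same length scale; this definition is not stated in the text reproduced here, so it is an assumption, and the function name `archimedes_number` is hypothetical.

```python
def archimedes_number(rayleigh, reynolds, prandtl):
    """Ar = Ra / (Re^2 * Pr); assumed definition, not given in the text."""
    return rayleigh / (reynolds**2 * prandtl)


# Values taken from the table of flow cases.
ar_fc = archimedes_number(0.0, 1.04e4, 0.7)      # forced convection
ar_mc1 = archimedes_number(1.6e8, 1.04e4, 0.7)   # mixed convection MCI
ar_mc2 = archimedes_number(2.8e8, 1.04e4, 0.7)   # mixed convection MCII
```

With the rounded table values this gives Ar ≈ 2.1 for MCI and Ar ≈ 3.7 for MCII. The small difference from the tabulated 3.6 is within what rounding of Ra and Re to two or three significant digits can explain.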
3.3 Processing of tomographic PIV data
For the calibration of the camera system, images of a calibration plate were recorded at five different Z-positions inside the measurement volume (see Kühn et al. 2011 for more details). The volume self-calibration method (Wieneke 2008) was applied in order to further decrease the calibration error of the camera system to less than 0.1 pixels. The measurement volume was discretized by 1,050 × 625 × 300 voxels, so that at the farthest point from the camera the size ratio between a voxel and a magnified pixel equals one. Five SMART iterations were used to reconstruct the three-dimensional intensity distribution. In the correlation routine, a multi-grid approach with four iterations and a final interrogation volume size of 48 × 48 × 32 voxels (38 mm × 38 mm × 26 mm) was used with a 75 % overlap. The interrogation volumes are shifted according to the local velocity field and are additionally deformed in the last iteration step. Thereby, fifth-order B-spline functions were employed for intensity interpolation in the volume (see Raffel et al. 2007).
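
The physical sizes quoted above follow directly from the volume discretization, as the following sketch shows. The vector spacing is not stated in the text; it is derived here from the 75 % overlap under the usual assumption that the step equals the window size times (1 − overlap). All variable names are hypothetical.

```python
volume_mm = (840.0, 500.0, 240.0)   # measurement volume, X x Y x Z
volume_vox = (1050, 625, 300)       # discretization in voxels
window_vox = (48, 48, 32)           # final interrogation volume
overlap = 0.75

# Edge length of one voxel along each axis.
voxel_mm = tuple(l / n for l, n in zip(volume_mm, volume_vox))

# Physical size of the final interrogation volume.
window_mm = tuple(w * v for w, v in zip(window_vox, voxel_mm))

# Step between neighbouring vectors (assumed: window size * (1 - overlap)).
step_vox = tuple(w * (1.0 - overlap) for w in window_vox)
step_mm = tuple(s * v for s, v in zip(step_vox, voxel_mm))
```

This yields a voxel size of 0.8 mm, an interrogation volume of 38.4 mm × 38.4 mm × 25.6 mm (quoted as 38 mm × 38 mm × 26 mm) and, under the stated assumption, a vector spacing of 9.6 mm × 9.6 mm × 6.4 mm.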
4 Results
4.1 Forced convection
Standard deviation of core line position in the Y- and Z-direction
Re | Ar | σ_{total,Y} (mm) | σ_{total,Z} (mm) |
---|---|---|---|
1.04 × 10^{4} | 0 | 13.8 | 14.3 |
1.04 × 10^{4} | 2.1 | 27.3 | 24.3 |
4.2 Mixed convection
In the mixed convection flow cases (MCI and MCII), a temperature gradient between the bottom and top plate was applied by switching on the heating plate. This causes buoyancy forces, that is, hot air to rise from the bottom plate, which significantly changes the global flow structure in comparison with pure forced convection.
Hot air rises at approximately X = 2,000 mm from the heating plate to the cooling plate, and thus forms a stagnation point at the top plate of the convection cell (B in Fig. 10 right). The incoming air interacts with this buoyant flow: it mixes with the hot fluid and is deflected laterally away from the stagnation point to the front of the convection cell (positive Z-direction). This leads to the strong deformation of the iso-surface of the velocity magnitude at the top (A in Fig. 10, left). After passing the cooling plate, the air turns back to the bottom plate at the front wall. It then flows obliquely to the back side of the convection cell at Z = 0 mm due to the decreased pressure induced by the rising hot fluid. By passing the heating plate, the fluid heats up again. As a result, counter-rotating convection rolls are formed, which are tilted around the Y-axis (C in Fig. 10 left). A minor fraction of the back-flowing air leaves the convection cell through the air outlet at the bottom (Z = 0 mm).
Moreover, in Fig. 14 (right), α (see Eq. 3) is also plotted along the vertical line. It is revealed that only in the lower region α corresponds to approximately 60° (see Fig. 12 bottom). In the upper part, α is lower and amounts to approximately 45°. It can finally be concluded that the deformation of the average convection rolls is much more complex than initially anticipated.
5 Summary
Large-scale tomographic PIV of forced and mixed convection in an elongated rectangular convection cell has been performed in order to study the large-scale flow structures generated. To be able to evaluate the large number of samples needed for a statistical analysis, an efficient parallel implementation of the tomographic PIV algorithms was developed. It is based on a parallel version of the simultaneous multiplicative algebraic reconstruction technique (SMART).
The measurement of forced convection in the cell reveals an almost two-dimensional roll structure, which is orientated in longitudinal cell direction. Its mean velocity field exhibits a core line with a wavy shape and a wavelength, which corresponds to the height and depth of the cell. In the instantaneous fields, the core line oscillates around its mean position. Under the influence of thermal buoyancy forces, the global structure of the flow field changes significantly. At lower Archimedes numbers, the resulting roll-like structure is shifted and deformed as compared to pure forced convection. Additionally, the core line oscillates much more strongly around its mean position due to the interaction of the roll structure with the rising hot air. If the Archimedes number is further increased, the roll-like structure breaks up into four counter-rotating convection rolls as a result of the increased influence of buoyancy forces. Moreover, large-scale tomographic PIV reveals that the orientation of these rolls reflects a ‘W’-like shape in the horizontal X–Z-plane of the convection cell.
It should be noted that the calculation time for reconstruction and subsequent cross-correlation is of the order of 1 h when employing current workstation computers.
The total reduction of required RAM can be, for example, as large as 18 GB or 60 % for a measurement volume of 50 voxels in thickness if the SMART is executed by 48 cores and the four camera recordings are 16 megapixels in size each.
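
The order of magnitude of the quoted absolute saving can be reproduced with a rough estimate. The sketch assumes that each forward projection is stored as one 8-byte floating-point value per pixel and that 16 megapixels means 16 × 10⁶ pixels; neither assumption is stated in the text. The saving is then the memory of the three forward projections that each process no longer has to hold.

```python
n_cameras = 4
pixels_per_camera = 16e6    # assumed: 16 megapixels = 16 x 10^6 pixels
bytes_per_value = 8         # assumed: double precision
n_processes = 48

# Original workflow: every process holds the projections of all cameras.
ram_all_projections = (
    n_cameras * pixels_per_camera * bytes_per_value * n_processes
)
# Modified workflow: every process holds only the current projection.
ram_one_projection = pixels_per_camera * bytes_per_value * n_processes

saved_gb = (ram_all_projections - ram_one_projection) / 1e9
```

Under these assumptions the saving is about 18.4 GB, in line with the quoted 18 GB. The relative figure of 60 % additionally depends on the memory needed for the voxel volume itself and is not reproduced by this estimate.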
Acknowledgments
The authors are grateful to Katharina Rabe for her support during the measurement campaigns.