Efficient ray casting of volumetric images using distance maps for empty space skipping

Volume and isosurface rendering are methods of projecting volumetric images to two dimensions for visualisation. These methods are common in medical imaging and scientific visualisation. Head-mounted optical see-through displays have recently become an affordable technology and are a promising platform for volumetric image visualisation. Images displayed on a head-mounted display must be presented at a high frame rate and with low latency to compensate for head motion. High latency can be jarring and may cause cybersickness which has similar symptoms to motion sickness. Volumetric images can be very computationally expensive to render as they often have hundreds of millions of scalar values. Fortunately, certain materials in images such as air surrounding an object boundary are often made transparent and need not be sampled, which improves rendering efficiency. In our previous work we introduced a novel ray traversal technique for rendering large sparse volumetric images at high frame rates. The method relied on the computation of an occupancy and distance map to speed up ray traversal through empty regions. In this work we achieve higher frame rates than our previous work with an improved method of resuming empty space skipping and the use of anisotropic Chebyshev distance maps. An optimised algorithm for computing Chebyshev distance maps on a graphical processing unit is introduced supporting real-time transfer function editing.


Introduction
Volumetric images (volumes) are three-dimensional scalar fields of voxels (volume + pixels). Volumes can be acquired with imaging technologies such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) which make it possible to see inside opaque physical objects or people.
Computer graphics hardware and pipelines have generally been optimised for rendering surface representations of scenes. Volumes are not surfaces and must be rendered with specialised volume rendering techniques. Volume ray casting is a common volume rendering method well suited to acceleration on a Graphical Processing Unit (GPU) [1][2][3]. Rays are projected from each pixel of a display which pass through volumes and composite the colours and opacities of the voxels they pass through.
Voxels can be assigned a colour and opacity based on their scalar value (and often their gradient magnitude) according to a transfer function. Transfer functions can be manipulated by a user to control the appearance and visibility of different materials. Lighting effects such as shadowing, specularity, and scattering can be applied but are not widely used in medical image analysis.
Isosurfaces are surfaces representing constant scalar values within a volume [4]. An isosurface can be extracted as a mesh and rendered using conventional surface rendering techniques. Isosurfaces can also be rendered using ray casting directly on an image without a mesh extraction [5].
Modern augmented reality Head-Mounted Displays (HMDs) which are optically see-through enable new volumetric image visualisation applications. These devices are able to resolve their position as a user moves his/her head and eyes. This makes it possible for a virtual object to be displayed in a user's field of view which may appear fixed in space or potentially aligned to a physical object. For example, a surgeon could visualise a CT or MRI image overlaid on a patient without referring to an external display. This application would provide depth perception of internal structures within the body and a surgeon would not need to look away from their hands.
Images must be displayed on a HMD at high frame rates and with low latency to sufficiently compensate for head and eye motion. Otherwise, the wearer of a HMD may experience cybersickness. Virtual reality HMDs fully replace a user's view of the world and current generation devices require frames to be rendered at 90 Hz. Recent augmented reality HMDs with see-through displays require lower frame rates of around 60 Hz.
Volume rendering can be very computationally expensive, particularly for high resolution images. Rays passing through a volume may require a large number of samples of the volumetric image texture before an output colour is determined or an isosurface is located. Ray casting must be accelerated to higher frame rates for visualisation on HMDs.
Most volumetric images have some portion of empty space which does not contribute to the output image. For volume rendering, empty space refers to voxels which have zero opacity as defined by the transfer function. For example, the air surrounding an object and some object materials may be made entirely transparent. Voxels which are not adjacent to surface interfaces are considered empty for isosurface rendering. Empty space can be skipped to reduce the number of texture samples and improve frame rates.
We recently introduced an efficient empty space skipping approach for ray casting [6]. A reduced resolution occupancy map is generated describing which regions of a volume are occupied. A distance map is then generated which encodes the Chebyshev distance to the nearest occupied region. The distance map is used to efficiently leap rays past empty regions during voxel traversal and reduce the number of unnecessary texture samples. This paper is an extended version of our previous short format paper [6] which provides more details of the method and introduces: • an optimisation of the ray traversal approach to more effectively skip empty space within and behind objects, • an algorithm for efficiently generating isotropic and anisotropic Chebyshev distance maps on a GPU for accelerated ray casting, and • an extended performance analysis of the approach with 2D transfer functions and various occupancy map region sizes.

Related work
Empty space can be skipped when ray casting by only sampling between geometry which tightly bounds the occupied space within a volume [5]. The bounding geometry may be defined by coarse bricks or a meshed isosurface. These approaches typically only skip external empty space and may have a high overhead from rasterisation of the bounding geometry. Generating the bounding geometry can also be computationally expensive. Kruger and Westermann [1] created an occupancy map indicating which 8 3 regions of a volume are empty. A ray caster can use the occupancy map to skip sampling points in empty regions and reduce the total number of texture samples. Their occupancy map has a low memory overhead as it has a low resolution compared to the volume.
Distance maps encoding the Chebyshev distance metric to the nearest occupied voxel have proven to be effective for ray casting acceleration [7,8]. A ray is able to skip many voxels with a single sample of the distance map if far away from an occupied voxel. These approaches can substantially improve frame rates for sparse images but the time needed to generate distance maps can inhibit interactive transfer function editing.
Massive volumetric images which may not fit entirely within GPU memory are becoming increasingly common from simulations and high-resolution imaging technology. They can be rendered by only allocating occupied bricks (regions) on the GPU [3,9,10]. Bricks can be referenced in hierarchical structures (such as an octree) and empty bricks are simply skipped as the structure is traversed.
SparseLeap [10] rasterises occupied hierarchically referenced bricks into per-pixel linked lists. Only the occupied segments of rays go through the ray casting pipeline. Their method outperformed octree-based empty space skipping and could be used to render terabyte sized sparse volumes. Bricks are not loaded onto the GPU if they are outside of the field of view or occluded.
Our previous paper described an efficient method of empty space skipping utilising distance maps [6]. Interactive transfer function editing was supported by updating reduced resolution distance maps on the GPU and massive volumes could be rendered by only allocating occupied bricks. Our method outperformed SparseLeap, octree-based, and bounding geometry (isosurface) acceleration on a set of images.

Ray traversal
A volumetric image is stored on a GPU as a 3D texture with dimensions d = [width, height, depth]. Voxels stored in a texture (texels) can be referenced by texel coordinates which can be unnormalised, u ∈ [0, d), or normalised, t ∈ [0, 1], and are related according to Volumes are typically sampled at equidistant points along rays that pass through them. A ray intersecting with a volume from t entry to t exit can be sampled at n equidistant points as defined by where − − → max is the maximum component operator and f is a sampling factor used for quality adjustment. This equation is viewpoint independent and ensures that there is at least one sample per voxel passed through (on average) provided f is greater than or equal to 1.
The change in normalised texture coordinates between each sample point is and the i th sampling point along the ray is given by A volume renderer will sample the volume (with trilinear interpolation) at each t i from i = 0 to n − 1 and map the sampled values to colours and opacities. This process can become very computationally expensive and memory bandwidth intensive for large images where n is large.

Gradient map
2D transfer functions which use both voxel scalar values and gradient magnitudes can be used to highlight material boundaries and hide thick homogeneous regions when volume rendering. The gradient magnitude of a voxel can be computed efficiently using the tetrahedron method [11] shown in Algorithm 1 which is written in the OpenGL Shading Language (GLSL). This method requires only 4 voxel neighbour samples unlike the conventional central differences approach which requires 6. x + k. yyx * imageLoad ( image , pos + k. yyx ).x + k. yxy * imageLoad ( image , pos + k. yxy ).x + k. xxx * imageLoad ( image , pos + k. xxx ).x); return length ( gradient_dir ); } Additional 4 texture samples at each sampling point to compute gradients can significantly reduce frame rates. Gradients can instead be precomputed and stored for faster sampling during ray casting.

Occupancy map
We create an occupancy map indicating which B ∈ N 3 sized regions, or blocks, of a volume are empty given a transfer function or isosurface thresholds. The occupancy map is a simple acceleration structure used to avoid sampling voxel values and gradients in regions where all voxels have zero opacity. It is a foundation for more advanced acceleration structures.
The occupancy map has dimensions d occ = d vol /B . The memory requirement of an occupancy map is trivial for larger block sizes. For example, a block size of B = 4 3 requires 1/64 th of the memory of an associated volume. The occupancy map voxel at u occ is associated with a volume block with extents: Texture sampling coordinates on the volume are mapped to the occupancy map according to This mapping is necessary as B may not evenly divide d vol so t vol and t occ may not be equivalent. The exact method of computing an occupancy map depends on whether it will be used for volume rendering or isosurface rendering. If volume rendering, a region is considered occupied if it contains any voxels with non-zero alpha as set by the transfer function. The occupancy map must be updated when the transfer function changes.
An isosurface exists in a region (and it is considered occupied) when the isosurface threshold lies between the minimum and maximum voxel values within the region. An isosurface may exist at the boundary between two regions so a single voxel halo around each region must also be considered when generating an occupancy map for isosurface rendering.
An occupancy map can be generated from a volume with a simple compute shader. Each shader invocation evaluates a unique volume block for occupancy and writes the result to the associated occupancy map voxel.
The performance and limitations of such a simple compute shader is evaluated in Section 4.3.

Efficient ray traversal
If an occupancy map voxel at u occ indicates an empty volume block then the ray segment within that block can be skipped. Figure 1 shows a 2D schematic of a ray passing through an occupancy map. The sampling points along the ray are at Δu intervals. The number of steps, Δi, to reach a sampling point in the next region from u needs to be computed.
The first sampling point outside of the region on dimension j ∈ {x, y, z} is  can be vectorised to give the steps on each dimension: where step is the Heaviside step function. The first sampling point outside of the region surrounding u will be the minimum Δi j , thus which is clamped to a minimum of 1 as Δu will be infinite on one dimension for an axis-aligned ray. The Δu for a ray passing through the occupancy map is computed as which follows from Eq. (6). The occupancy map is repeatedly sampled and Eq. (9) applied to skip the ray forward until an occupied region is found. The volume is then sampled at each successive sampling point. A ray will skip more empty voxels for every occupancy map sample if B is larger. However, it will not be able to skip as close to occupied voxels as more empty voxels may be included in occupied regions. This suggests there may be an optimal B which will be explored in Section 4.2.
A sampling point near the edge of an empty block may map to a non-zero opacity if a neighbouring block is occupied due to trilinear interpolation of the volume. Similarly, an isosurface may exist on the boundary between an occupied and an empty region. To account for this, the ray step number is decremented after an occupied region is found according to where f is the sampling factor from Eq. (2). This equation ensures the ray goes back at least one voxel length. The ray must not go back as far as the previous furthest i sampled on the volume. The occupancy map is only sampled again if u occ changes when i is incremented and the previously sampled voxel had zero opacity. Empty space skipping resumes as early as possible and the occupancy map is not sampled more than necessary. This method of resuming empty space skipping is faster than our original approach which resumed occupancy map sampling after a set length of empty voxels [6].

Chebyshev distance map
The Chebyshev distance metric is the greatest difference between two points along any dimension and is defined as where p 1 and p 2 are two points of interest. A distance map can be derived from an occupancy map representing the Chebyshev distance, D, to the nearest occupied block for every block. A ray can skip at least D blocks on any dimension before it may reach an occupied block. The number of steps to the next sampling point D blocks away on dimension j ∈ {x, y, z} is determined from Fig. 1 as Distance maps are not commonly used for ray casting acceleration because of the computational complexity of generating them which must be done every time the transfer function changes. Previous papers using Chebyshev distance maps for ray casting acceleration generated them at the full resolution of the volume with a serial algorithm [7,8].
Our method of skipping rays forward with a Chebyshev distance map is simpler than previous approaches [7,8] and supports lower resolution distance maps which reduces memory overhead and their computation time. In Section 4 we examine the impact on ray casting performance when using a reduced resolution distance map.
We have adapted the algorithm of Saito and Toriwaki [12] to efficiently transform an occupancy map into a Chebyshev distance map using the parallel capability of a GPU. The transformation is decomposed into three one-dimensional transformations which are all trivially parallelised as they operate on rows, columns, and then slices independently.
The first transformation is shown as a GLSL compute shader in Algorithm 2. This approach adapts the first transformation described in Ref. [12] for Chebyshev distance rather than squared Euclidean distance. The first transformation can safely read The second transformation is shown in Algoritihm 3 and has several differences from the original approach in Ref. [12]. The columns from transformation 1 are copied to a 1D buffer in the original approach. We instead read directly from the output of transformation 1 and write the output to a swap image with matching dimensions to the distance map. Another change is the transformation runs in a single pass and zigzags out from each voxel until the minimum distance is found. This reduces the number of IO operations and is faster than the original algorithm.
The third transformation is similar to the second but is applied on slices rather than columns. Algorithm 3 describes the minor changes to the second transformation for it to function as the third. Distances output from the second transformation are read from the swap image and the final Chebyshev distance is written to the distance map.
The distance map does not need to be a separate texture to the occupancy map as it can simply overwrite it. However, the swap image still doubles the overall memory requirements of this acceleration structure compared to just using an occupancy map. Distances are stored as unsigned 8-bit integers to reduce memory and bandwidth requirements which limits the maximum distance to 255. This seems sufficient for most images given the distance map is at a lower resolution than the volumetric image being rendered.

Anisotropic Chebyshev distance map
A ray exiting an occupied region will initially only skip small distances as D is small. Ideally, the ray would skip a large distance when exiting an occupied region provided the next occupied region in the direction of the ray is far away. This can be accomplished using anisotropic Chebyshev distances.
Ray directions can be grouped into 8 octants in 3D defined by the sign of each component (±x, ±y, ±z). Distance maps representing the distance to the nearest occupied region for each directional octant are needed. These can be generated by modifying the isotropic Chebyshev distance map transformations (Algorithms 2 and 3) to scan in single directions only.
The transformations could be applied in sequence to generate the distance maps for each directional octant which would require a total of 24 transformations. However, the first and second transformations only need to run 2 and 4 times respectively (rather than 8) as the intermediate distance maps they generate can be reused.
The ray caster simply needs to index the correct anisotropic distance map texture and can then apply the same ray traversal approach using Eq. (14) as described in Section 3.5. The index only needs to be computed once for each ray as the direction is fixed for the entire traversal.
Es andİşler [8] take the acceleration structure even further and compute the extended anisotropic Chebyshev distance. This represents the distance that can be skipped on the x, y, and z axis separately for each octant. This approach is effective for skipping empty spaces which are thin in some dimensions. However, it requires 24 times the memory of the original isotropic Chebyshev distance map approach described in Section 3.5. We did not implement this approach as the performance gains would likely be quite marginal or possibly worse given the higher memory bandwidth requirements.

Testing methodology
Maximising rendering frame rates is the primary motivation for developing our ray casting method. This section will discuss the improvements in frame rate with the occupancy map and distance map acceleration structures described in the previous section. The update time for acceleration structures and their memory requirements are also important considerations in assessing the viability of our method.
Three publicly available [13] volumetric images with a range of dimensions were evaluated: • a small image of a "present" with thin occupied layers, • a "stag beetle" with large exterior empty regions and some internal empty space, and • a large "king snake" with an internal skeleton composed of fine complex geometry. The dimensions of each image are shown in Table 1.
The volumes were rescaled to an unsigned 8-bit representation and normalised before upload to the where n 0 and n 1 are image specific normalisation constants specified in Table 1. Volume renderings of the evaluated images are shown in Fig. 2 which are rendered using our open source volume renderer [14]. The images have a wide range of occupied voxel proportions and are rendered with both 1D and 2D (gradient-based) transfer functions.
Voxels are mapped to an opacity (alpha value) with a simple equation: where v is the sampled voxel value, g is the gradient, and v 0 , v 1 , g 0 , g 1 are parameters which define the transfer function. Voxel values and gradients are normalised between 0 and 1 from their underlying 8-bit representation when sampled. Transfer functions are generally not defined by such simple functions. They are usually created through some complex user interaction and stored in a 2D texture for efficient lookup. For consistency with other volume renderers, Eq. (16) is applied to precompute the opacity for each unique intensity and gradient pair which is stored in a 256×256 texture. Colour is also stored and is set as greyscale with intensity proportional to the alpha value.
The transfer functions evaluated for each image are defined in Table 2. If g 0 and g 1 are not specified then the transfer function is 1D, a gradient = 1, and the gradient is not sampled during ray casting.
Frame rates are measured by rotating volumes at a rate of 90 • every 60 frames about their vertical axis. The average frame rate is taken over 1000 frames. Volumetric images were scaled to occupy 1 cubic meter and positioned at a distance of √ 3 meters from the camera in the virtual coordinate space. The camera has a perspective projection with a 1 radian horizontal and vertical FOV. In this configuration the entire volume is within the viewport and no clipping occurs.
Early ray termination acceleration is disabled for all tests which stops rays when an opacity limit is reached [1]. This ensures rays traverse all the way to back of the volume regardless of the transfer function.
The sampling factor f from Eq. (2) is 1.
Baseline frame rates with no ray casting acceleration enabled are shown in Table 2. Images were rendered to a 1200×1200 viewport which is the approximate display resolution for a single eye on a modern virtual reality HMD. If the frame rates are halved to account for the second eye display then only the low resolution "present" image with a 1D transfer function renders fast enough without acceleration.

Frame rate improvements
Frame rates were measured for the images with each empty space skipping acceleration structure discussed in Section 3 and are shown in Table 3. The maximum average frame rate with each acceleration structure is shown which is determined over a range of block sizes from 2 3 to 6 3 . Using 2D transfer functions significantly reduces baseline frame rates even though precomputed gradients are used. Table 3 reveals the occupancy map structure gives a substantial relative performance boost from the baseline frame rate in all cases. It is already fast enough to render most of the images on a HMD (except the snake with a 2D transfer function).
The frame rate is always improved when using an isotropic Chebyshev distance map compared to an occupancy map for acceleration. The improvement ranges from 4.5% for the "king snake" image with a 2D transfer function to 56% for the "beetle" image with a 1D transfer function.
Frame rates are improved by up to 20% with an anisotropic Chebyshev distance map compared to an isotropic distance map. However, for some images the performance improvement is negligible. The largest relative frame rate increase occurs on the snake image which shows that anisotropic Chebyshev distance maps are most effective for skipping empty space near fine complex geometry. Figure 3 shows the relative frame rate compared to the baseline with each acceleration structure as a function of the block size. Frame rates with occupancy map ray traversal on the snake image worsen at larger block sizes. This can be explained by the high geometric complexity of the snake which results in a lot of empty space being included in occupied blocks at larger block sizes. Conversely, the beetle image benefits from larger block sizes when using an occupancy map as the large open regions exterior to the beetle can be skipped with fewer samples of the occupancy map. Frame rates are consistently improved with an isotropic distance map but degrade as the block size increases. This makes intuitive sense as rays are not able to skip as close to occupied voxels. A very small block size may not be practical as more memory and computation time is required (particularly for anisotropic distance maps). The additional performance with an anisotropic distance map compared to an isotropic distance map is quite marginal in most cases. Given this, it is unlikely that using extended anisotropic Chebyshev distance maps (see Section 3.6) would substantially improve frame rates. Figure 4 qualitatively shows the number of texture samples with each ray casting acceleration structure. The difference gets progressively smaller between each acceleration structure. This indicates the approach may be reaching its performance limit and further suggests extended anisotropic distance maps may not provide meaningful performance gains. Fig. 4 Relative number of total texture samples (white is more) with each acceleration structure for the "king snake" image.

Distance map update time
Occupancy and distance maps must be updated whenever the transfer function is changed. The delay between adjusting a transfer function and seeing the resulting rendered image should be low to support interactive transfer function editing.
The time to generate occupancy maps for the evaluation images is shown for multiple block sizes in the left of Fig. 5 (note that the y scale is logarithmic). Occupancy maps can take much longer to generate with 2D transfer functions as more texture samples are required to determine the opacity of each voxel. The occupancy map update time worsens for block sizes over 4 3 as GPU resources are not as well utilised by the compute shader described in Section 3.3. The work needed for each individual occupancy map region could be divided up with a more complicated compute shader, although we did not explore this.
Isotropic or anisotropic Chebyshev distance maps are computed from the occupancy map for the faster ray casting approaches. The time to generate these distance maps (inclusive of the time taken to update the occupancy map) is also shown in Fig. 5 over a range of block sizes.
For a cubic block size of B = (b, b, b), the occupancy and distance map have only 1/b 3 of the number of volume voxels. Thus it is expected for the computation time to increase exponentially as b approaches 1 which is observed in Fig. 5. Distance maps are generated faster from occupancy maps with higher occupancies as the average distance from each voxel that must be searched in Algorithm 3 is reduced.
The total time to generate distance maps is not necessarily minimised by choosing a large block size due to the overhead of generating the occupancy map. The isotropic Chebyshev distance is computed most efficiently when using a 2D transfer function with a block size of 4 3 or 5 3 as seen in Fig. 5. The isotropic distance map update time with a 4 3 block size on the large "king snake" image is only 37 milliseconds which enables highly interactive transfer function editing.
Anisotropic Chebyshev distance maps take much longer to compute than isotropic distance maps. Isotropic Chebyshev distance maps at 4 3 or 5 3 could be utilised for fast interactive transfer function prototyping. After a suitable transfer function has been identified, an isotropic or anisotropic distance map with a smaller block size could be generated to improve ray casting performance.

Limitations
In our previous paper a large sparse volume which did not fit directly in GPU memory was rendered by only allocating occupied regions on the GPU [6]. A distance map does not require much memory if the block size is large; however, this can limit potential performance gains as seen in Fig. 3.
Performance could be improved by using multiresolution distance maps. Sparse higher resolution distance maps could be allocated only near occupied regions to enable rays to skip closer to occupied space without drastically increasing memory requirements. However, we have not explored methods of efficiently generating or effectively utilising sparse multiresolution distance maps.

Conclusions
This work accelerates ray casting for volume or isosurface rendering by utilising a low resolution Chebyshev distance map for empty space skipping of sparse volumes. This work improves on the performance of our earlier approach [6] with more efficient methods of resuming empty space skipping and updating distance maps. Large sparse volumes can be rendered at high frame rates even with computationally expensive gradient-based transfer functions.
Distance maps can be generated at reduced resolutions with a low memory requirement and still provide excellent frame rate improvements. They can be updated in real-time enabling interactive transfer function editing for volume rendering. Anisotropic Chebyshev distance maps were evaluated but the frame rate improvements were marginal in most cases. This suggests acceleration with distance maps may be reaching a performance limit. Faster sampling within occupied regions without compromising rendering quality is the next challenge for volume rendering acceleration.
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www. editorialmanager.com/cvmj.