1 Introduction

X-ray micro-computed tomography (\(\upmu \)-CT) multiphase flow experiments aim at imaging the in situ fluid phase distributions in porous media (Wildenschild et al. 2005; Silin et al. 2010; Iglauer et al. 2011; Setiavan et al. 2012; Andrew et al. 2014). Recently, fast synchrotron-based \(\upmu \)-CT has been used to image fluid distributions under dynamic conditions (Berg et al. 2013b); the term “dynamic” means that fluid flow is maintained while imaging, with integration times in the ms range, thus enabling the study of dynamic processes on the basis of a series of images. By imaging the pore-scale displacement process during fluid injection and analyzing the resulting non-wetting phase cluster size redistribution, it has been proposed (e.g., Georgiadis et al. 2013) that fundamental mechanisms like invasion percolation for drainage can be validated by experimental data. Additionally, the mechanisms underlying multiphase flow through porous media can be studied and validated by combining pore-scale numerical modeling with \(\upmu \)-CT multiphase flow experiments. However, these approaches require \(\upmu \)-CT images that are segmented into binary images representing the “true” structure of the imaged rock and immiscible phases (Koroteev et al. 2013). In image segmentation, rock grains (e.g., quartz grains in typical sandstone rock) are binarized by applying algorithms that account for either local or global differences of gray-scale values between rock and void space. However, the image segmentation process is non-trivial, and features are easily mislabelled due to non-ideal signal-to-noise ratios and/or image artefacts. 
Therefore, the influence of image segmentation on subsequent pore space characterization and pore-scale modeling of multiphase flow (i.e., involving parameters important for flow, like permeability, and for capillary-dominated displacement, like capillary pressure) must be evaluated in detail, since the accuracy of the segmentation process determines how well the resulting binary image represents the “true” rock structure. Thus, any pore space characterization, analysis of experimental results or flow modeling will ultimately be influenced by subtle differences in image segmentation (Iassonov et al. 2009).

Reconstruction of \(\upmu \)-CT images yields a 3-dimensional (3D) map of X-ray attenuation coefficients that represent the sample microstructure. This 3D map is constructed of elementary units called voxels that are assigned an attenuation coefficient over a given bit range (e.g., 0–255 for an 8-bit image or 0–65,535 for a 16-bit image). Weak absorbers (low-density materials) exhibit low attenuation and, when visualized on a typical gray-scale map, these voxels appear as dark regions, whereas strong absorbers (high-density materials) exhibit high attenuation and appear as brighter regions (Ketcham and Carlson 2001). In an ideal situation where the contrast between attenuation values for the different phases (e.g., void and matrix) is large enough, the gray-scale intensity distribution (histogram) of the \(\upmu \)-CT image is bimodal, with two distinct peaks representing the two phases. In practice such a situation is almost never encountered. Typical image histograms do exhibit maxima, e.g., for grains and pores, but the range of gray levels in between often includes voxels that belong to either grains or pores, and hence a histogram-based separation is not easily possible. There is a range of reasons why this happens. Individual voxels are often assigned attenuation values different from those expected for a given material, e.g., because of phase contrast, beam hardening and, in particular, so-called partial volume effects (Wildenschild and Sheppard 2013), as well as noise. Phase contrast is commonly observed in \(\upmu \)-CT data as low intensity areas around phases with higher intensity edges, resulting from refraction that is not accounted for during image reconstruction. During image reconstruction, only the linear attenuation of X-rays is considered. 
However, X-ray trajectories are bent as the propagation speed increases when traveling through denser material (refraction), which causes a slight change in direction that is manifested in the resulting image as bright and dark fringes near phase boundaries (Wildenschild and Sheppard 2013). At synchrotron-based and hence monochromatic (narrow bandwidth of energy) \(\upmu \)-CT facilities, this fringing effect can be reduced by decreasing the sample to detector distance (Wildenschild and Sheppard 2013). Whereas synchrotron facilities can operate with monochromatic beams, beam hardening is a typical problem when using lab-based polychromatic X-ray sources. Lower energies are attenuated more strongly along the outer perimeter of a sample, which results in higher apparent attenuation coefficients there. As a consequence, the energy spectrum of the beam “hardens”, causing the edges of the sample to appear more attenuating (Wildenschild and Sheppard 2013; Jovanović et al. 2013). Partial volume effects occur when, at the finite resolution of the image, single voxels contain multiple phases and are thus assigned an average attenuation value (Ketcham and Carlson 2001). This common artefact occurs, e.g., at grain contacts, at interfaces between phases, in clay-rich regions or in any other region with features below image resolution. Voxels in interfacial and/or grain contact regions are affected in a complex manner by a combination of partial volume effects and phase contrast. This means that potentially a multitude of individual voxels represent features of different phases, albeit with the same attenuation value. Lastly, noise is also a commonly encountered artefact that alters the gray levels of individual voxels. 
Noise can originate from the X-ray source, the detector, and the electronics, but also from scattering of X-rays by the sample and by other solid objects in the beam cone outside of the detector’s direct field of view, such as sample holders, mounting elements, etc.
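
As a simple illustration of the partial volume effect, and assuming a linear mixing of attenuation within a voxel (an idealization that neglects phase contrast and noise), the value assigned to a voxel containing a pore volume fraction \(f\) can be written as

\[ \mu_{\mathrm{voxel}} = f\,\mu_{\mathrm{pore}} + (1-f)\,\mu_{\mathrm{grain}}, \]

where the symbols \(f\), \(\mu_{\mathrm{pore}}\), and \(\mu_{\mathrm{grain}}\) are introduced here for illustration only. Under this assumption, a voxel that is half pore and half grain (\(f = 0.5\)) falls midway between the two histogram peaks, i.e., in the gray-level range that is difficult to classify from the histogram alone.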

Some of the aforementioned effects, like noise, appear to be reduced by filtering of a dataset, which often results in an image with higher contrast between phases and improves the separation of the phases in the image histogram. However, the filtering may introduce a bias on the results, even though the bimodality in the histogram may visually appear to be improved (e.g., Armstrong et al. 2012). For an overview of different image filters applied in porous media research see Kaestner et al. (2008). According to Wildenschild and Sheppard (2013), the most popular and widely applied image filters are anisotropic diffusion (Perona and Malik 1990), the median filter, and statistical approaches (Besag 1986). In applying image filters for enhancing features of the image, there is always a loss of information (Iassonov et al. 2009), which must be considered in the subsequent steps of image segmentation and analysis of results.

For the segmentation process, i.e., the binarization of a digital image based on either local or global differences of voxel intensities, a variety of algorithms are available, as reviewed by Sezgin and Sankur (2004) and Wirjadi (2007). The most common image segmentation is based on the global threshold approach. Even though it is tempting to manually segment an image by choosing a threshold value which is visually permissible, this choice is operator biased (Wildenschild and Sheppard 2013). To minimize the effect of subjectivity, many automated histogram-based algorithms are applied that classify the different phases on the basis of the shape of the image histogram. Among such automated global threshold segmentation algorithms, Otsu’s algorithm (Otsu 1979) and k-means clustering have proven to produce the most accurate results, as shown in a comparative study by Iassonov et al. (2009). When a histogram-based segmentation algorithm is applied, voxels with gray-scale intensity values between the peak values in the image histogram will be partly mislabelled, and the extent to which this influences the results is unclear. Schlüter et al. (2014) give an excellent overview of the current state-of-the-art methods in the evaluation of image segmentation methods, and also report on novel histogram bias removal methods, e.g., histogram equalization or ROI dilations, that may improve the threshold selection process by decreasing the influence of partial volume effects.
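
To make the idea of an automated, histogram-based global threshold concrete, the following minimal sketch selects the threshold that maximizes the between-class variance, which is the criterion underlying Otsu’s algorithm. It is an illustration in plain Python under stated assumptions, not the implementation used in any of the cited studies; the function name `otsu_threshold` and the toy histogram are hypothetical.

```python
def otsu_threshold(histogram):
    """Return the bin index t that maximizes the between-class variance.

    Voxels with gray value <= t form class 0 (e.g., pores),
    all other voxels form class 1 (e.g., grains).
    """
    total = sum(histogram)
    weighted_total = sum(i * count for i, count in enumerate(histogram))
    best_t, best_score = 0, -1.0
    count0 = 0    # number of voxels in class 0
    sum0 = 0.0    # sum of gray values in class 0
    for t, count in enumerate(histogram):
        count0 += count
        sum0 += t * count
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / count0
        mean1 = (weighted_total - sum0) / count1
        # Between-class variance up to the constant factor 1 / total**2
        score = count0 * count1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical 8-level histogram with a dark (pore) and a bright (grain) mode
toy_histogram = [5, 40, 30, 4, 3, 25, 60, 10]
threshold = otsu_threshold(toy_histogram)
print("Otsu threshold (bin index):", threshold)
```

Note that the selected bin lies in the sparsely populated valley between the two modes, which is also the gray-level range where mislabelling due to partial volume effects is expected.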

Generally speaking, algorithms that segment data by considering image gradients or local and non-local statistics are also available nowadays and are often preferred over global thresholding. Among the most favored algorithms is indicator kriging (Oh and Lindquist 1999), which has been applied to many different sample types and materials (Prodanovic et al. 2007; Porter and Wildenschild 2010). Another approach used in this study is watershed-based segmentation (Vincent and Soille 1991), which builds on the fact that in a histogram the gray levels around the maxima can be clearly assigned to, e.g., pores and grains acting as seeds. The gray levels in between, which are difficult to assign from the histogram alone, are computed from a region-growing method where phase boundaries are placed along the inflection region of image intensity gradients that typically occur at phase boundaries. In general, the watershed-based segmentation approach is more robust towards long-range gradients and noise compared to global thresholding. A drawback of watershed-based segmentation is the requirement of seed placement prior to the segmentation process, which is often based on a global value and concomitant user bias (Iassonov et al. 2009). Ideally, the seeds are placed in regions of low image intensity gradients (referred to as basins to initiate the growth algorithm); then the seed regions are grown using a watershed algorithm where adjacent seed regions meet and form interfaces at regions with high image intensity gradients (Vincent and Soille 1991; Wang 1997). Another local segmentation algorithm, also evaluated in this study, is hysteresis (or bilevel) segmentation (Vogel and Kretzschmar 1996). For this method, two thresholds \(t_{\mathrm{min}}\) and \(t_{\mathrm{max}}\) are selected for voxels that belong to a low gray value and a high gray value intensity class, respectively. 
The voxels assigned to the lower class are seed regions for a region-growing algorithm that stepwise assigns any voxel in between both thresholds to the lower intensity class if it is connected to the lower class. The remaining voxels will be assigned to the high class, and therefore the algorithm performs well in connecting isolated voxels of the transition zone (Schlüter et al. 2010).
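
The region-growing step of hysteresis segmentation described above can be sketched as follows. This is a minimal 2D illustration in plain Python, assuming 4-connectivity and a breadth-first growth order; the function name `hysteresis_segment` and the toy image are hypothetical, and the sketch is not the implementation evaluated in this study.

```python
from collections import deque


def hysteresis_segment(image, t_min, t_max):
    """Return a mask that is True for the low class (e.g., pore).

    Pixels <= t_min are seeds. Pixels strictly between t_min and t_max join
    the low class only if they are connected to a seed; all remaining pixels
    end up in the high class (False).
    """
    rows, cols = len(image), len(image[0])
    low = [[image[r][c] <= t_min for c in range(cols)] for r in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols) if low[r][c])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connectivity
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not low[nr][nc] and image[nr][nc] < t_max):
                low[nr][nc] = True
                queue.append((nr, nc))
    return low


# Hypothetical gray values: dark pore on the left, bright grain in the middle,
# and two intermediate pixels on the right that are NOT connected to any seed.
toy_image = [
    [10, 40, 90, 45],
    [12, 50, 95, 48],
    [11, 85, 92, 90],
]
mask = hysteresis_segment(toy_image, t_min=20, t_max=60)
for row in mask:
    print(["pore" if is_low else "grain" for is_low in row])
```

In this toy example, a single global threshold at 60 would label the two isolated intermediate pixels in the right-hand column as pore, whereas the hysteresis approach assigns them to the high class because they are not connected to a seed region.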

In general, the current understanding is that no universal segmentation algorithm can produce consistent results for every type of data (Wildenschild and Sheppard 2013), especially when image quality is poor (noise, low spatial resolution, image artefacts). There are strategies that are more robust than others, but the optimum choice of methods and workflows is still largely dependent on the actual image. Therefore, it is not our intention to give preference to one specific method in this work (also because such a comparison would be practically limited to specific implementations of methods in different software platforms and highly influenced by the different levels of maturity of the respective implementations). Our intention is rather to demonstrate how the choice of a segmentation method can influence the outcome, and to focus on the identification of suitable quality control parameters for a meaningful validation of the efficacy of a given image segmentation approach.

However, estimating the bias caused by a given image segmentation algorithm is difficult, as there is no absolute reference available for the segmented structure. The pore structure of a sandstone rock, e.g., is not known a priori and cannot be determined (at least in 3D) by independent techniques. Moreover, the bias likely depends on the characteristics and quality of the image data and therefore probably has to be evaluated for each data series. As best practice, macroscopic properties obtained from computer modeling are often used as validation criteria, whereby calculated pore structure-related values like porosity (Iassonov et al. 2009; Vogel et al. 2005), permeability (Mostaghimi et al. 2013), and capillary pressure curves (Silin et al. 2010) are compared to experimental data. Permeability can be derived from Lattice–Boltzmann method (LBM) single-phase flow simulations on the segmented images (Chen et al. 1991; Coles et al. 1998; Ferreol and Rothmann 1995; Lehmann et al. 2008; Vogel et al. 2005; Zhang and Kwok 2006). Capillary pressure curves can be obtained from quasi-static two-phase flow simulation using a pore morphological approach. Hilpert and Miller (2001) found good agreement between the horizontal part of the experimental capillary pressure curves from mercury intrusion porosimetry (MIP) and values calculated using a pore morphology approach. This approach, first used by Hazlett (1995), operates with several morphological erosion and dilation operations, thus simulating static fluid distributions and saturation of wetting and non-wetting phase in the binary image of the pore space (Hilpert and Miller 2001; Vogel et al. 2005). The pore morphology approach was successfully applied by Silin et al. (2010) to compute capillary pressure curves and model fluid distributions of supercritical \(\hbox {CO}_{2}\) in brine-saturated porous media under reservoir conditions (\(\upmu \)-CT two-phase flow experiment). 
One of the most systematic studies assessing different segmentation algorithms, on overall performance and global applicability to extracting the pore space of the porous medium, was made by Iassonov et al. (2009), who compared calculated porosity values to experimental values.

However, a single porosity value may be represented by a multitude of different pore space geometries. Even though comparing segmentation results to experimental values is generally accepted as important, the comparison with experimentally derived porosity values is often unsatisfactory, even if the mismatch is attributed to sample heterogeneity. Unfortunately, the error assessment in most studies, if performed at all, does not go beyond this simple comparison. Our aim was a detailed and systematic study to evaluate the bias associated with a given image segmentation process and to show a path to a validation workflow for flow and displacement-related purposes that can be used more generally on different data sets. We do not aim to make an extensive comparison of all different processing and segmentation methods (for that we refer to Schlüter et al. 2014) but rather select a few common methods with largely different characteristics to highlight their sensitivity. After a brief analysis of common image filters, a comparison is made between global thresholding and watershed-based segmentation. For global thresholding, the intensity threshold value used for image segmentation was systematically varied over a range of visually permissible values, whereas for the watershed-based segmentation, the intensity threshold value for setting the seed points was systematically varied. Then porosity, permeability, and capillary pressure curves were derived from the segmented images and compared to experimental data from MICP and a buoyancy method for porosity (also see Berg et al. 2013a). Furthermore, we compare data from a multiphase flow experiment by Berg et al. (2013b) to fluid distributions modeled using the same segmented data. We will show that the watershed-based segmentation approach was more robust than global thresholding. 
Moreover, by systematically varying the global threshold value in five narrow steps, we gain insight into the region of the image histogram that is most critical for image segmentation and the subsequent modeling of fluid flow properties and/or pore space characterization. Using this extended evaluation approach, we identify a category of voxels that are critical for the accurate prediction of fluid flow properties and assess the sensitivity of not only porosity but also other parameters that can be used for evaluating the efficacy of a given segmentation approach. Overall, we (i) explore two readily available image segmentation approaches, (ii) highlight how subtle differences in user bias can influence subsequent modeling and characterization results, (iii) identify parameters for judging the accuracy of the image segmentation process, and (iv) highlight major challenges associated with modeling pore-scale multiphase flow from digital rock images.

2 Materials and Methods

2.1 Materials

Berea sandstone is easily accessible and well studied (Oren and Bakke 2003). Therefore, upper Berea sandstone is a standard material for flow experiments (core flooding) using micro-CT or for flow modeling with a pore network derived directly from the rock (Hazlett 1995). A rock core measuring 10 mm in height and 4 mm in diameter was drilled from a block of fine-grained upper Berea sandstone (Churcher et al. 1991) and surrounded by a polycarbonate tube that was heat-shrunk to the core to prevent fluid from bypassing the core. An average permeability of 700 mD and a porosity of 19.9 % were determined for 6 different core plugs of the same rock by Shell standard laboratory measurements (Berg et al. 2013a), along with capillary pressure curves from MICP. The mineralogy of the rock was determined by X-ray diffraction; the rock contains 89–90 % quartz, 3.9–8.7 % feldspars, 1.2 % dolomite, 0.2 % siderite, 0.2 % calcite, 1 % illite, 4.3 % kaolinite, 0.2 % clay minerals, and 0.1 % hematite. Churcher et al. (1991) reported similar mineralogies and porosities ranging from 19.04 % to 26.10 % and permeability values between 114 and 1,168 mD for different rock samples of the upper Berea sandstone.

2.2 Experimental Methods: Imaging and Flow Setup

The synchrotron \(\upmu \)-CT images were collected in an in situ experiment performed under dynamic flow conditions at the TOMCAT beamline at the Swiss Light Source, Paul Scherrer Institut, Villigen, Switzerland. For more technical details of the experimental setup, the reader is referred to Berg et al. (2013b). In brief, multiple scanning cycles were conducted for the cylindrical core of Berea sandstone, first of the dry rock (pore space filled with air), followed by multiple drainage and imbibition cycles with two fluid phases (decane and brine). The experiment settings were optimized for fast tomography with the monochromatic X-ray beam energy set at 21.25 keV (Mokso et al. 2011). To enhance the X-ray contrast between brine and injected decane, the brine was doped with 10 wt % of CsCl salt; this enhances the contrast between the two phases in the 16-bit output images, which is necessary for image segmentation. A full tomogram (1,440 \(\times \) 1,440 \(\times \) 896 voxels) at a spatial resolution of 2.99 \(\upmu \)m voxel length was obtained in as little as 16.8 s (1,401 projections, 12 ms integration time per projection). A 2D slice of the dry scan showing the image quality of the scan along with the gray value intensity ranges of the different phases can be found in the Supplementary Material.

2.3 Data Processing

As part of the segmentation workflow, the dry scans were filtered and segmented with image filters and segmentation algorithms available within the commercial software package Avizo (FEI Visualization Sciences Group, Mérignac Cedex, France). In this study, grain and pore boundaries were of specific interest, since all simulations were carried out in the segmented pore space. Therefore, special focus was set on the conservation and representation of the boundary regions. More specific results of the filter analysis are shown in the Supplementary Material. The sensitivity of global segmentation to threshold variation was studied on a cubic region-of-interest (ROI volume of \(750^{3}\) voxel) of the anisotropic diffusion filtered images of the dry scan. The initial choice of threshold value, referred to as the mid value, was set on a visual basis, representing an intuitive manual segmentation threshold selection made by comparing the extent of the masked area to the image. The maximum gray value intensity threshold for segmentation of the pore space chosen as mid value in the previous step was then varied in steps of \(\pm \)1 % of the mid value. This resulted in five different files, each with a slightly different threshold for the pore space. Each file is named after its relative position towards the mid value (intuitive choice). Starting from the lowest visually permissible threshold value and progressing to the highest visually permissible threshold value, the individual segmented files are referred to as lowest, low, mid, high, and highest, respectively. The same procedure was repeated for the assessment of the watershed-based segmentation (two-phase segmentation), varying the threshold value for the seeds for the pores. In the same manner, the \(t_{\mathrm{min}}\) value representing the seed region for the growing algorithm was varied for hysteresis thresholding. 
The initial values (mid) for \(t_{\mathrm{min}}\) and \(t_{\mathrm{max}}\) were determined closely following a routine by Schlüter et al. (2010). For the \(t_{\mathrm{max}}\) value, the gradient image (Sobel filter) was calculated from the gray-scale image and segmented with a global threshold. The resulting mask was multiplied with the gray value image, and \(t_{\mathrm{max}}\) was determined according to the mode of the histogram. To be comparable with the other methods, the \(t_{\mathrm{min}}\) value was determined manually by visual comparison. For all segmented structures of the three studies, porosity and permeability were calculated and plotted versus the threshold variation in percent. All different files of the segmented pore space were used for modeling of permeability, flow fields, capillary pressure curves, and fluid distributions. A detailed overview of the workflow along with a description of the REV study is presented in the Supplementary Material.
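
The threshold variation protocol can be sketched as follows. The sketch assumes that each step shifts the threshold by 1 % of the mid value and that every voxel with a gray value at or below the threshold is counted as pore; the function names, the mid value of 14,000, and the toy gray values are hypothetical and serve only to illustrate the procedure.

```python
def threshold_series(mid_threshold, step_fraction=0.01):
    """Five thresholds spaced by step_fraction of the mid value."""
    labels = ("lowest", "low", "mid", "high", "highest")
    return {
        label: round(mid_threshold * (1 + k * step_fraction))
        for label, k in zip(labels, range(-2, 3))
    }


def porosity(gray_values, threshold):
    """Fraction of voxels with gray value <= threshold (global thresholding)."""
    pore_voxels = sum(1 for g in gray_values if g <= threshold)
    return pore_voxels / len(gray_values)


# Hypothetical 16-bit gray values of ten voxels; several lie close to the
# mid threshold, mimicking voxels affected by partial volume effects.
toy_gray_values = [13000, 13700, 13800, 13900, 14100,
                   14200, 15000, 16000, 17000, 18000]

series = threshold_series(14000)  # assumed mid value, for illustration only
porosities = {label: porosity(toy_gray_values, t) for label, t in series.items()}
for label, t in series.items():
    print(f"{label:>7}: threshold = {t}, porosity = {porosities[label]:.2f}")
```

In this constructed example the porosity changes with every 1 % step because the toy voxels were deliberately placed in the transition range; real data will respond according to the actual population of the histogram near the threshold.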

2.4 Data Modeling

In the first modeling step, the Lattice–Boltzmann method (LBM) was applied on the five watershed-based, five hysteresis thresholding, and five global segmented files to obtain permeability and flow fields for the full variability in the thus segmented micro-structures. The Par-pac module, implemented in the commercial software package GeoDict (Fraunhofer ITWM, Math2Market GmbH, Kaiserslautern, Germany), operates on a D3Q15 lattice with three dimensions and 15 velocity directions. Input parameters chosen for single-phase flow simulation characterizing the fluid (i.e., water) were a temperature of 20 \(^{\circ }\)C, a density of 998.234 kg/m\(^{3}\), and a dynamic viscosity of 1 g/(m s), i.e., 1 mPa s. Permeability was calculated for all three Cartesian directions (\(x,y,z\)), each with periodic boundary conditions. In the model, a pressure drop of 10 Pa was applied, and average velocities in the given direction were calculated. Values (input or calculated) for the length of the sample, fluid viscosity, pressure drop, and flow velocity were then used to calculate permeability values from Darcy’s law. The calculated values were then plotted versus threshold variation. Porosity values for the different volumes were obtained by counting the connected pore voxels; disconnected pores represented a small fraction of less than 1.5 % of the connected pore volume of the structures. Flow fields and flow paths computed with LBM, showing the fluid velocity for each direction at each voxel in m/s, were processed and rendered in 3D with Avizo to elucidate the effect of threshold variation.
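
The final conversion from simulated velocity to permeability via Darcy’s law, \(k = \mu \, v \, L / \Delta P\), can be sketched as follows. The viscosity and pressure drop are the inputs stated above; the sample length (taken here as a 750-voxel ROI edge at 2.99 \(\upmu \)m per voxel) and the mean velocity are assumptions made for illustration, and the velocity is assumed to be the superficial (Darcy) velocity averaged over the whole volume including the solid phase. The function name is hypothetical.

```python
MILLIDARCY_IN_M2 = 9.869233e-16  # 1 mD expressed in m^2


def darcy_permeability_md(mean_velocity, viscosity, length, pressure_drop):
    """Permeability in mD from Darcy's law, k = mu * v * L / dP (SI inputs)."""
    k_m2 = viscosity * mean_velocity * length / pressure_drop
    return k_m2 / MILLIDARCY_IN_M2


# Inputs stated in the text
viscosity = 1.0e-3       # Pa s, identical to 1 g/(m s)
pressure_drop = 10.0     # Pa

# Assumptions for illustration only
length = 750 * 2.99e-6   # m, edge length of a 750-voxel ROI
mean_velocity = 3.0e-6   # m/s, hypothetical superficial velocity

k_md = darcy_permeability_md(mean_velocity, viscosity, length, pressure_drop)
print(f"permeability = {k_md:.0f} mD")
```

With these assumed numbers the result is on the order of the experimentally determined permeability, which shows that mean velocities of a few \(\upmu \)m/s are to be expected for the applied pressure drop.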

In the second modeling step, drainage (quasi-static two-phase flow decane/brine) was simulated on selected ROIs with SatuDict, a module implemented in the GeoDict software. The segmented pore structures chosen for this step were the watershed-based mid, the hysteresis mid and highest, and the global lowest and highest files. The pore morphology approach used in this modeling step calculates capillary pressure versus saturation curves from different saturation levels. As input parameters for the simulation, an interfacial tension of 0.0288 N/m for the brine/decane interface (determined experimentally), a contact angle of brine on quartz of 0\(^{\circ }\) (Xie and Morrow 1998), and a step size for increasing the radii of 0.5 voxel for the largest volume and one voxel for the ROIs were used. Which of the pores are invaded during simulated drainage or imbibition is determined by several series of morphological erosion and dilation operations on the grains, performed by a spherical structuring element. For the invasion of an immiscible fluid into the pore space fully saturated with another phase, the model is connected to an infinite reservoir of the invading phase, while the sides are closed. The pores may be invaded only if the phase is connected to the reservoir (Becker et al. 2008; Hilpert and Miller 2001; Vogel et al. 2005). The following steps were performed in sequence for the simulation of drainage:

  • Initially, the pore space is saturated with wetting fluid at a capillary pressure \(P_\mathrm{c} = 0\), and one side of the sample is connected to a reservoir of the intruding phase, while lateral sides are closed.

  • Starting with a minimum diameter of a sphere as structuring element, the pore space is eroded to obtain connectivity of the pores. Eroded pores connected to the reservoir can be invaded, whereas pores that are disconnected from the infinite non-wetting phase reservoir are removed.

  • The connected pore space is dilated by the same sphere radius leading to morphological opening of the pore space. In this way, a pore size distribution of the pores is obtained.

  • Using the Young–Laplace equation, a capillary pressure of the dilated connected pores is obtained. The saturation is calculated by subtracting the volume of unconnected pores from the total pore space.

  • The entire erosion and dilation procedure is then repeated with the next larger sphere radius.
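
The capillary pressure assigned to each sphere radius in the steps above follows from the Young–Laplace equation, \(P_\mathrm{c} = 2 \gamma \cos \theta / r\). The sketch below evaluates this relation with the interfacial tension, contact angle, and voxel size given in the text; it illustrates only the pressure assignment, not the erosion and dilation operations themselves, and the function name and the chosen list of radii are hypothetical.

```python
import math


def capillary_pressure(radius_voxels, voxel_size_m=2.99e-6,
                       interfacial_tension=0.0288, contact_angle_deg=0.0):
    """Young-Laplace capillary pressure in Pa for a spherical interface."""
    radius_m = radius_voxels * voxel_size_m
    return (2.0 * interfacial_tension
            * math.cos(math.radians(contact_angle_deg)) / radius_m)


# Hypothetical sequence of structuring-element radii in voxels
radii = [0.5, 1.0, 2.0, 5.0, 10.0]
pressures = [capillary_pressure(r) for r in radii]
for r, p in zip(radii, pressures):
    print(f"r = {r:4.1f} voxel  ->  P_c = {p:8.0f} Pa")
```

Because the capillary pressure is inversely proportional to the radius, a change of the throat width by one or two voxels, as can be caused by a different threshold, changes the entry pressure of narrow throats considerably.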

The calculated capillary pressure curves for the present study were compared to experimental MICP data from core plugs. MICP capillary pressure curves require a closure correction, do not show irreducible wetting phase saturation, and generally exhibit higher capillary pressure values compared to the simulated curves due to the interfacial tension and contact angle of mercury and quartz. As a consequence, the experimental capillary pressure values were scaled to the water–oil system, and the curve was closure corrected. The simulated fluid distributions in the structures were then rendered and compared to each other to visualize the effect of threshold variation. In addition, simulated fluid distributions from the mid watershed-based segmented files were compared to segmented experimental fluid distributions in several selected sub-volumes.
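
A commonly used form of such a rescaling, given here as a sketch and not necessarily the exact procedure applied to the present data, assumes that both fluid pairs probe the same pore geometry, so that capillary pressures scale with the product of interfacial tension and the cosine of the contact angle:

\[ P_\mathrm{c}^{\mathrm{ow}} = P_\mathrm{c}^{\mathrm{Hg}} \, \frac{\gamma _{\mathrm{ow}} \, |\cos \theta _{\mathrm{ow}}|}{\gamma _{\mathrm{Hg}} \, |\cos \theta _{\mathrm{Hg}}|} \]

Here the superscripts and subscripts ow and Hg, introduced for illustration, denote the oil–water (decane/brine) and mercury systems, respectively. For the oil–water system, \(\gamma _{\mathrm{ow}} = 0.0288\) N/m and \(\theta _{\mathrm{ow}} = 0^{\circ }\) are the simulation inputs given above; the corresponding mercury values are not stated in this section and would have to be taken from the MICP measurement protocol.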

3 Results

The results for threshold variation applying global threshold segmentation on the gray-scale image are shown in Fig. 1, where a blue mask marking the selected threshold in each step is superimposed on a sub-volume of a 2D slice of the filtered gray value image. The maximum threshold value is increased for each step from lowest to highest. The histogram shows the gray value intensity range (16-bit) and the maximum gray value threshold of each of the five threshold selections, marked with a red line and labeled lowest to highest, respectively. In other words, the images of the masked area within the visually permissible range (left) correspond to the threshold values selected for the pore space in the histogram (right). The results demonstrate that the whole range of threshold values appears reasonable to the human eye. Therefore, the chosen threshold value is highly subjective, and any value within this range could be a likely solution.

Fig. 1

Selection of gray value threshold range for segmentation with stepwise increasing maximum threshold from lowest to highest choice, as shown by the blue mask superimposed on gray level image (left) representing the selected area, and the corresponding maximum thresholds in the histogram (right, 16-bit gray scale intensity range, but only the interval 10,605–19,000 is shown)

As shown in Fig. 2, even subtle variation of the threshold value for the pore space results in a significant variation of both the calculated porosity and permeability. Each value is displayed as a relative change with respect to the mid value in percent. Note that the mid value represents an arbitrarily selected absolute maximum gray value intensity representing the intuitive segmentation by eye. The terms lowest, low, high, and highest correspond to the relative position towards the mid value (intuitive choice). For example, the low selection has a 1 % lower maximum threshold value than the mid selection, and the highest selection has a 2 % higher maximum threshold value than the mid selection. Values derived from watershed-based segmentation varied by at most 8 % for permeability and by less than 1 % for porosity. In contrast, values derived from global threshold segmentation varied by 28 and 32 % for permeability and by about 8 % for porosity. The variation for hysteresis thresholding was from \(-\)16 to 19 % for permeability and about 6 % for porosity. The absolute permeability values calculated from global segmentation (green line) ranged from 600 to 1,000 mD, and porosity values between 16 and 19 % were obtained. For the watershed-based segmented files, the permeability and porosity variation was lower, ranging from 680 to 780 mD and 16 to 17 %, respectively. For hysteresis thresholding, the permeability was between 520 and 730 mD, and the porosity varied from 16.7 to 18.7 %. In all cases, however, porosity values estimated from the digital data were lower than the 19.9 % porosity measured on a larger core sample in the laboratory. Nevertheless, the permeability value of the mid threshold selection (722 mD) was close to the experimental permeability value of 700 mD. This threshold selection was thus reasonable, while the mid selection of the hysteresis thresholding segmentation underestimated permeability by 100 mD. 
Only the highest selection for the latter approach showed an equally good match with the experimental values as the mid selection for the watershed-based segmentation. However, a better match in porosity compared to watershed-based segmentation was observed (hysteresis: 18.73 %, watershed-based: 16.57 %, experimental: 19.9 %). Clearly, porosity is less sensitive to threshold variation, showing a linear trend in response to threshold variation, while permeability appears to be much more sensitive, showing a nonlinear trend (Fig. 2). Values derived from the watershed-based segmentation were more robust towards threshold variation, while global segmentation derived parameters showed a broader distribution. Comparing the two extreme cases in terms of absolute numbers, from the lowest to the highest threshold selection 3,804,494 more voxels were included in the watershed-based segmentation, but more than three times as many (12,716,205 voxels) were included in the global threshold segmentation. The number of voxels added from lowest to highest for the global threshold selection is 3 % of the total number of voxels of the volume, demonstrating that the voxels causing significant permeability changes are situated in a very narrow range of the histogram.
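
The voxel counts quoted above can be put into proportion with a short calculation. The sketch assumes that the total volume is the cubic ROI of \(750^{3}\) voxels described in the data processing section; the variable names are illustrative.

```python
# Assumption: the analyzed volume is the cubic ROI of 750^3 voxels
roi_voxels = 750 ** 3

# Additional pore voxels from the lowest to the highest threshold selection
extra_global = 12_716_205     # global threshold segmentation
extra_watershed = 3_804_494   # watershed-based segmentation

fraction_global = extra_global / roi_voxels
fraction_watershed = extra_watershed / roi_voxels
ratio = extra_global / extra_watershed

print(f"global:    {100 * fraction_global:.2f} % of all voxels")
print(f"watershed: {100 * fraction_watershed:.2f} % of all voxels")
print(f"global / watershed = {ratio:.2f}")
```

Under this assumption the global threshold variation reassigns about 3 % of all voxels, roughly 3.3 times as many as the watershed-based variation, which reassigns slightly less than 1 %.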

Fig. 2

Porosity and permeability values derived from the increasing gray value threshold sensitivity tests for pore space segmented by the watershed (red), global threshold (green), and hysteresis thresholding (blue) approach, respectively. Values are expressed as relative difference to the mid value and show that variation magnitude is significantly higher for the global threshold segmentation approach for permeability in particular compared to the other two segmentation approaches

To visualize the effect of threshold variation, flow fields from LBM modeling in the lowest and high threshold structures (global segmentation) were compared. A closer look at the selected ROI revealed that individual flow pathways were opened at higher threshold values, i.e., in narrow channels or in micro-porous areas. The first image of Fig. 3 shows the gray value image of a subsection with several quartz grains and narrow pores, with a tube-shaped channel in its center. The other images of Fig. 3 show the corresponding globally segmented binary images (with lowest and high threshold values for the pore space). A comparison reveals that the depicted narrow pore throat was open in both images but with different throat widths. The throat was wider by at least two voxels at its narrowest part in the high threshold selection file. The voxels affected by threshold variation were thus situated in transition regions representing phase boundaries with partial volume effects.

Fig. 3 Three ROI images showing quartz grains and a narrow canal-shaped pore in the center of the 2D slice with varying widths, depending on the selected threshold value

Changing the pore throat width has a direct consequence for the flow field, as illustrated in Fig. 4 (both in 2D and in 3D). In the 2D slices, the connection is accessed only halfway for the lowest threshold value, whereas for the high threshold it appears to be open to the adjacent pore body. In the 3D view, no flow connection exists for the lower threshold, while for the high threshold, the flow path is fully connected. The fluid velocities also show that the average flow velocity increased as the pore diameter generally increased with the higher threshold value. This general trend can also be observed in visualizations of the entire flow field in 3D (supplementary material). Apparently, smaller pathways formed, and the overall flux in the large pore bodies increased from the lowest to the high threshold selection. However, small flow pathways such as that shown in Fig. 4 also influence the extent of stagnant areas and play a more important role for diffusion and dispersion processes (Bijeljic et al. 2013).

Fig. 4 2D slices showing that threshold increase (left, top to bottom) caused invasion of narrow channels. The same view in 3D (right) reveals that pore bodies were connected, and overall flux was increased (top to bottom)

Fig. 5 Comparison of calculated capillary pressure to experimental curve shows that the plateau of the watershed-based and the highest global threshold segmented structure is in good agreement with experimental data, indicating that the main pore throats are well resolved. Note that the experimental curve was closure corrected shifting it to the right. In addition, the capillary pressure values were scaled to the water–oil system. The simulated curves show residual wetting phase saturation and therefore do not reach a saturation below 0.2. Close to the resolution limit curves start to diverge, showing that the micro-structure is not well segmented

The permeability calculated for the watershed-based segmentation approach showed good agreement with experimental data. However, the mid watershed-based segmentation matches the permeability at a porosity of less than 17 %, which is lower than the experimental value of 19.9 %. To explain this difference, capillary pressure was used as a third matching criterion. Capillary pressure curves for the lowest and highest global threshold, for the mid and highest hysteresis thresholding, and for the mid watershed-based segmented pore space were therefore compared to the experimental data in Fig. 5. The horizontal part (plateau) of the capillary pressure curves from the mid watershed-based and the highest global threshold structures showed good agreement with the experimental curve, while the lowest global threshold structure overestimated the saturation and capillary pressure of the curve, and the hysteresis threshold segmented structures overestimated them significantly. The strong resemblance of the middle part of the simulated and experimental curves suggests that the pore throat geometry of the core plug and of the imaged sample is similar, and that the main pore throats are captured well with the mid watershed-based and the highest global threshold segmentation. At capillary pressure values of 6500 to 7000 Pa, the curves started to diverge. This divergence appears to reflect the image resolution, since this range represents the resolution limit, and the micro-porous areas appear not to be well resolved. Note that scaling the capillary pressure values to a dimensionless J-value using the Leverett J-function (Leverett 1941) showed that the curves are very sensitive to the porosity and permeability values chosen as input parameters. Scaling to the parameters calculated from the image resulted in skewed curves, especially for the globally segmented structures. Most likely this was due to the porosity values, which were underestimated because of partial volume effects.
Therefore, the (manually) closure-corrected MICP curve was scaled to the water–oil system for our comparison.
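The sensitivity of the J-scaling to porosity can be made concrete with a short sketch. It uses the standard form of the Leverett J-function, J = Pc / (σ |cos θ|) · sqrt(k/φ), and the usual interfacial-tension scaling between fluid systems. The interfacial tensions and contact angles below are illustrative assumptions, not values reported in this study; the permeability and the two porosities are the ones quoted in the text.

```python
import math

MILLIDARCY_IN_M2 = 9.869233e-16  # 1 mD expressed in m^2

def leverett_j(pc_pa, k_md, phi, sigma_n_per_m, theta_deg):
    """Dimensionless Leverett J-value: J = Pc / (sigma*|cos(theta)|) * sqrt(k/phi)."""
    k_m2 = k_md * MILLIDARCY_IN_M2
    wetting_term = sigma_n_per_m * abs(math.cos(math.radians(theta_deg)))
    return pc_pa / wetting_term * math.sqrt(k_m2 / phi)

def rescale_pc(pc_pa, sigma_from, theta_from_deg, sigma_to, theta_to_deg):
    """Scale capillary pressure between two fluid systems for the same pore geometry."""
    factor = (sigma_to * abs(math.cos(math.radians(theta_to_deg)))) / (
        sigma_from * abs(math.cos(math.radians(theta_from_deg))))
    return pc_pa * factor

# Assumed fluid properties (hypothetical, for illustration only).
SIGMA_HG, THETA_HG = 0.485, 140.0  # mercury-air, N/m and degrees
SIGMA_OW, THETA_OW = 0.030, 0.0    # water-oil, N/m and degrees

pc = 6500.0    # Pa, lower end of the divergence range quoted in the text
k_md = 722.0   # mD, mid watershed-based selection
j_image = leverett_j(pc, k_md, 0.1657, SIGMA_OW, THETA_OW)  # image-derived porosity
j_core = leverett_j(pc, k_md, 0.199, SIGMA_OW, THETA_OW)    # laboratory porosity
print(f"J(image porosity) / J(core porosity) = {j_image / j_core:.3f}")

# Scaling a mercury-air capillary pressure to the water-oil system.
pc_ow = rescale_pc(1.0e5, SIGMA_HG, THETA_HG, SIGMA_OW, THETA_OW)
print(f"1.0e5 Pa (Hg-air) corresponds to {pc_ow:.0f} Pa (water-oil)")
```

Because J scales with sqrt(k/φ), using the underestimated image porosity instead of the laboratory value shifts every point of the curve by the same factor, independent of the assumed fluid properties.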

The difference images of the mid watershed-based and highest hysteresis threshold segmented files are shown in Fig. 6, indicating that hysteresis thresholding assigned more voxels to the void space in the clay-rich regions. In contrast, at the pore throats and at the contacts between pores and quartz grains, the results are more ambiguous. Because the capillary pressure values are higher than for the other curves, pore throats were expected to be generally narrower in the hysteresis segmented file, but this could not be verified from the 2D images.

Fig. 6 Comparison of the hysteresis thresholding highest (a) and watershed-based mid (b) segmented file. c The voxels of the difference images where orange represents the voxels that were segmented as solid by the hysteresis thresholding method and white representing the voxels segmented as solid by the watershed-based (mid) segmentation

To visualize the effect of threshold value variation on modeling, the corresponding fluid distributions from the simulation in the globally segmented files were rendered in Fig. 7. Each fluid distribution for the highest threshold structure (top) is shown at the same capillary pressure as the one directly below it for the lowest threshold structure. As expected, the simulation resulted in different saturation levels and fluid distributions for the different structures. The higher the segmentation threshold value, the more pore volume is filled with decane per step.

Fig. 7 3D rendering of the simulated drainage in the two different global threshold segmented structures (lowest and highest global threshold value increasing from bottom to top). Comparing each step corresponding to a fixed capillary pressure in top and bottom reveals that for initial steps (a–c), fluid distributions look significantly different

Finally, comparison of segmented pore drainage events from the experiment by Berg et al. (2013b) with modeled fluid distributions in a ROI of the mid watershed-based segmented structure revealed that fluid distributions could be modeled reasonably well, as shown in Fig. 8. In the first image, the modeled fluid distribution at the bottom (light green) differs from the experimental data (rendered in dark green), while in the following steps, the distributions are in reasonably good agreement. The pore morphology approach worked well on larger volumes [about 30–40 % of the total field of view (FoV)] and resulted in a reasonably good match with experimental data. The segmentation process of the experimental data and additional experimental and modeled fluid distributions are shown in the Supplementary Material.

Fig. 8 Comparison of experiment (top) to modeled (bottom, input file pore space: watershed-based segmentation) fluid distributions, showing that a reasonably good agreement is established for the central and last images of each sequence (b, c and e, f). The initial fluid distribution (a, d) is very different because of different boundary conditions

4 Discussion

A result of the sensitivity assessment of the different segmentation algorithms (Fig. 2) was that the operator bias can be significant when a threshold value is selected manually. This holds true especially for global threshold segmentation. Marginal variation of the selected thresholds within the visually permissible range caused large variations of the derived porosity and permeability. For all segmentation routines, the variation was stronger for permeability than for porosity, suggesting that permeability is more sensitive to threshold variation and consequently more relevant as a validation criterion. Global threshold segmentation produced extremely biased results overall. This was most likely caused by single voxels in the phase transition zone that were affected by partial volume effects in combination with phase contrast. The voxels affected by threshold variation were situated in a narrow part of the histogram, within a range of only about \(\pm \)3 % of the total voxels between both peaks. Close inspection of segmented images and of flow fields from LBM simulation revealed that the threshold variation caused pore throat diameters to vary, thus altering the derived permeability. This shows that only a narrow part of the histogram is relevant for an accurate segmentation of the flow-controlling voxels in the transition zone. The overall good agreement of calculated and experimental permeability of 720 and 700 mD, respectively, indicated that the main structure was well represented by the mid watershed-based segmentation. The same impression was given by the good match of the plateaus of the experimental capillary pressure curve and the modeled curve derived from the image. From this we conclude that the main pore structure was well segmented using the watershed-based approach. Only for capillary pressure values representing pores close to and below the resolution limit did the curves diverge, indicating that these structures were not resolved well.
The porosity offset was probably related to partial volume effects from pores with diameters below the imaging resolution, e.g., in clay-rich regions. The micro-porous structure in clay-rich regions causes partial volume effects in the corresponding voxels (see the supplementary material); thus, gradients between pores and clay minerals were low. As a consequence, the watershed-based algorithm could not successfully place boundaries based on the gradients. In addition, seed placement in clay-rich regions was hampered because average voxel gray-scale intensities were higher than for macro pores. In direct comparison, however, the hysteresis thresholding results overall produced a closer match to experimental porosity values without showing as dramatic a shift in permeability as the global segmentation, suggesting that the segmentation is reasonable. Compared with the other curves, however, the capillary pressure curves from hysteresis thresholding were significantly overestimated, indicating that the threshold selection for \(t_{\mathrm{max}}\) failed, resulting in false labeling of the pore throats. The comparison of 2D images of the segmented structures could not explain this offset. Possibly, the connectivity of the voxels in the transition zone was increased in the hysteresis segmented files, as expected from the implementation of the method, adding to the overall porosity and permeability and explaining part of the offset. The two individual capillary pressure curves of the hysteresis method, however, did not differ significantly from each other, indicating that the pore throat segmentation was similar. Schlüter et al. (2010) propose to base the \(t_{\mathrm{max}}\) threshold selection on more than one filter, and there are more robust routes to determine \(t_{\mathrm{max}}\) values. Hysteresis threshold results were robust relative to each other but skewed with respect to absolute capillary pressure values.
A more careful selection might have resulted in better segmentation results. Still, the watershed-based segmentation produced the most robust results overall. This agrees well with observations of other groups that segmentation algorithms based on local criteria are more accurate (Iassonov et al. 2009; Porter and Wildenschild 2010; Wildenschild and Sheppard 2013). Overall, the results show that porosity values can differ from experimental data, while the main structure of the pore space may still be represented well. This agrees well with a study by Andrew et al. (2013), where, for a Ketton Oolite limestone core, an offset between calculated and experimental porosity values was observed and related to pores below \(\upmu \)-CT resolution. Furthermore, Sok et al. (2009) revealed that the micro-porosity of carbonate rocks can contribute largely to the total porosity of the sample. According to Hurst and Nadeau (1989), diagenetic kaolinite clay-rich areas can have a porosity of up to 50 %, which may account for a few percent of the total effective porosity in this study. At the same time, the simulated permeability values can show a reasonable agreement with experimental data while the capillary pressure curves differ, as seen for the hysteresis thresholded files. Conversely, permeability may be overpredicted while capillary pressure is in reasonable agreement, as demonstrated for the globally segmented file with the highest threshold. The conclusion is that permeability and capillary pressure curves may still serve as robust validation criteria, but it is important to look at multiple parameters at the same time. Porosity, however, seems to be the poorest validation criterion.
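Why hysteresis thresholding tends to add connected transition-zone voxels can be seen from a minimal 2D sketch of the general technique. This is not the implementation used in this study: the lower threshold name `t_min`, the 4-connectivity, and the small test image are assumptions made for illustration; only \(t_{\mathrm{max}}\) is named in the text.

```python
from collections import deque

def hysteresis_threshold(image, t_min, t_max):
    """Label pore voxels in a 2D gray-value image (list of lists).

    Voxels with gray value <= t_min are certain pore (seeds). Voxels with
    t_min < gray <= t_max become pore only if they are 4-connected to a seed
    through other voxels at or below t_max.
    """
    ny, nx = len(image), len(image[0])
    pore = [[False] * nx for _ in range(ny)]
    queue = deque()
    for y in range(ny):
        for x in range(nx):
            if image[y][x] <= t_min:
                pore[y][x] = True
                queue.append((y, x))
    # Region growing from the seeds into the transition zone.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v, u = y + dy, x + dx
            if 0 <= v < ny and 0 <= u < nx and not pore[v][u] and image[v][u] <= t_max:
                pore[v][u] = True
                queue.append((v, u))
    return pore

# Hypothetical gray values: 20 = pore, 70 = transition zone, 150 = grain.
image = [
    [20, 20, 70, 150, 70],
    [20, 70, 150, 150, 150],
    [150, 150, 150, 150, 150],
]
pore = hysteresis_threshold(image, t_min=50, t_max=90)
n_hysteresis = sum(map(sum, pore))
n_global_tmax = sum(1 for row in image for g in row if g <= 90)
print(n_hysteresis, n_global_tmax)
```

In this example, the transition-zone voxels touching the dark pore are added, while the isolated one in the top-right corner is not, so the result lies between global thresholds at `t_min` and at `t_max`. The outcome is therefore controlled by how far `t_max` lets the region grow into pore throats and clay-rich areas.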

The pore morphology approach has been applied by many different groups for modeling quasi-static fluid distributions from multiphase flow experiments (Silin et al. 2010). In the present study, the visualization of the fluid distributions for the varied segmentation thresholds shows that they differ significantly, emphasizing that a correct representation of the pore structure is vital for the modeling results. The agreement between segmented fluid distributions from a dynamic flow experiment and modeled static fluid distributions is reasonably good, demonstrating that the segmentation was successful, even though shortcomings were observed that result from the different boundary conditions and from the implementation of the method.
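A single step of a morphological drainage simulation can be sketched as follows. This is a generic 2D illustration of the maximum-inscribed-sphere idea (erosion, connectivity check, dilation), not the implementation used in this study. The inlet at row 0, the treatment of voxels outside the image, the disk radius in voxel units, and the two-body test geometry are all assumptions.

```python
from collections import deque

def disk_offsets(radius):
    """Integer (dy, dx) offsets covered by a disk of the given radius (in voxels)."""
    n = int(radius)
    return [(dy, dx) for dy in range(-n, n + 1) for dx in range(-n, n + 1)
            if dy * dy + dx * dx <= radius * radius]

def morphological_drainage(pore, radius):
    """Return the set of voxels invaded by the non-wetting phase.

    pore   : 2D list of bools, True = pore space; the inlet is row 0.
    radius : disk radius in voxels, standing in for the meniscus radius
             at one capillary pressure step (Pc is proportional to 1/radius).
    """
    ny, nx = len(pore), len(pore[0])
    offsets = disk_offsets(radius)

    def inside(y, x):
        return 0 <= y < ny and 0 <= x < nx

    def fits(y, x):
        # Erosion: every in-image voxel under the disk must be pore space.
        return all(pore[y + dy][x + dx] for dy, dx in offsets if inside(y + dy, x + dx))

    centres = {(y, x) for y in range(ny) for x in range(nx) if fits(y, x)}

    # Connectivity: keep only disk centres reachable from the inlet row.
    queue = deque(c for c in centres if c[0] == 0)
    connected = set(queue)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (y + dy, x + dx)
            if nb in centres and nb not in connected:
                connected.add(nb)
                queue.append(nb)

    # Dilation: the invaded region is the union of all connected disks.
    return {(y + dy, x + dx) for y, x in connected for dy, dx in offsets
            if inside(y + dy, x + dx)}

def make_sample(throat_width):
    """Two pore bodies joined by a central throat (hypothetical geometry)."""
    half = throat_width // 2
    body = [1 <= x <= 7 for x in range(9)]
    throat = [abs(x - 4) <= half for x in range(9)]
    return [list(throat) if 4 <= y <= 5 else list(body) for y in range(10)]

closed = morphological_drainage(make_sample(1), 1)  # throat one voxel wide
opened = morphological_drainage(make_sample(3), 1)  # throat two voxels wider
print(len(closed), len(opened))
```

At the same radius, and hence the same capillary pressure step, the lower pore body is reached only in the sample with the wider throat. This mirrors the observation that a throat widened by two voxels through a higher threshold changes which pores are invaded at a given step.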

The morphological approach includes only capillary forces but accounts for connectivity. Neglecting viscous forces in modeling pore-scale displacement could lead to a misrepresentation of the pore-scale physics. In a recent publication, Armstrong and Berg (2013) demonstrate that capillary action is a consequence of differences between leading and trailing menisci during a pore drainage event, resulting in large fluid velocities near the displacement front. The main issue when comparing experiment with modeling results is that the fluid distribution, and hence the long-range connectivity, outside the imaged FoV is virtually unknown, because the capillary differences extend over distances of multiple pores (Armstrong and Berg 2013). Since these differences between menisci over multiple pores can cause the redistribution of fluids (Armstrong and Berg 2013), the displacement processes cannot be represented correctly if the ROI is too small to properly represent the phase topology. The first challenge, which might be accommodated by most modeling tools, is to provide meaningful boundary conditions for a modeling domain, i.e., to identify which pores are connected to the non-wetting phase outside the modeled ROI and will therefore lead to inflow of, e.g., the non-wetting phase in drainage. Only after properly defining the pore connectivity can the impact of higher order boundary effects be studied, i.e., the difference between the constant flow or constant pressure assumed by the model and the reality, which is neither constant flux nor constant pressure (Berg et al. 2013b).

Even though the pore morphological approach was tested on small volumes only, we believe that it is suited for a pre-selection of samples before the actual flow experiments at a synchrotron facility, where beamtime is limited. For instance, samples already scanned in a benchtop \(\upmu \)-CT can be used for the simulation; in this way, the expected flow pattern can be studied and suitable samples selected. This may also apply to samples with more complex pore geometry, e.g., carbonate rocks, as long as the pore shape is more or less spherical. For markedly different geometries, e.g., sheet- or crack-like pores, the interfaces will be misrepresented. The pore morphology approach was shown to be easily applicable for this task, does not require as much computational power as the LBM approach, and is therefore potentially well suited for integration into a multiphase flow image processing workflow. From our simulations and experimental fluid distributions, we see that the main pore bodies are more relevant for two-phase flow than small pores, in agreement with literature reports (Wildenschild and Sheppard 2013; Okabe and Oseto 2006; Silin et al. 2010; Blunt et al. 2013). Beyond that, our extensive and unique sensitivity study elucidated how threshold variation can change the flow path geometry for narrow pore throats and in micro-porous areas.

5 Summary and Conclusions

Our study of synchrotron \(\upmu \)-CT images of Berea sandstone shows in detail how sensitive three arbitrarily chosen segmentation methods (global thresholding, hysteresis thresholding, and watershed-based segmentation) are to variation of the input parameters that must be set manually in the respective methods. All investigated methods require a gray value intensity threshold, which was systematically varied by \(\pm \)1 % per step around the intuitive intensity value selected on a visual basis. To validate the result of the segmentation, the properties porosity, permeability, capillary pressure curves, and quasi-static fluid distributions were derived directly from the image data, using Lattice–Boltzmann single-phase flow simulations for permeability and a pore morphological approach for capillary pressure. All derived parameters were compared to experimental data from MICP and porosity measurements as well as to experimental fluid distributions from a synchrotron-based two-phase flow experiment in the same samples, to ensure consistency between the computed capillary pressure and the associated fluid distributions for the morphological approach. The modeled fluid distributions were in reasonably good agreement with the experimental data, which validates the morphological approach for quasi-static applications like the computation of Hg-air intrusion. Differences between experiment and simulation relate to unknown distributions outside the imaged FoV, which set the connectivity boundary conditions.

The three different segmentation methods lead to different results for the computed properties porosity, permeability, and capillary pressure, which in turn showed very different sensitivity to the selection of the input parameters. Generally, permeability is more sensitive to the selection of input parameters than porosity. A narrow area of the histogram contained the critical voxels of pore throats that were opened or closed by threshold variation. Hence, variations of threshold values can lead to opening and closing of flow paths and impact larger-scale connectivity, which may also affect parameters sensitive to stagnant flow regions, like dispersion. Porosity also showed a large offset from the measured porosity, which is explained by sub-resolution micro-porosity and mixed-pixel porosity. Therefore, permeability is a more reliable and also more sensitive validation parameter and hence clearly preferred over porosity. Capillary pressure was also identified as a suitable parameter for validation. However, neither permeability nor the capillary pressure curve alone could sufficiently indicate whether the structure was represented well. For the watershed segmentation, consistency for permeability and capillary pressure was achieved, but there was an offset for porosity due to micro-porosity. We consequently suggest comparing both the permeability values and the capillary pressure curves at the same time for validation. Using porosity as the only validation criterion may result in errors regardless of the type of segmentation algorithm used.

Other segmentation methods were less successful. Using global thresholding on a visual basis will lead to excessive variation of calculated parameters and large operator bias, while watershed-based segmentation was far more effective in segmenting the main structural features accurately. Micro-porous regions below imaging resolution did not contribute to the flow behavior, most likely causing an offset with respect to experimental values of porosity. Hysteresis thresholding was more robust towards threshold variation, and the segmentation favored labelling the voxels in micro-porous areas as pore space, potentially leading to elevated porosity and permeability values. However, an overprediction of the capillary pressure curves was observed, indicating that the segmentation of the pore throats was less than ideal because of the upper threshold selection.