1 Introduction

Fractures and their associated geometry typically govern the behaviour of rock masses. The geometry of the two surfaces defining the fracture is often referred to as the roughness of the fracture. The roughness, together with the strength and deformability of the surrounding rock, controls the mechanical properties of the fracture through the behaviour of the contacts, whilst groundwater flow and transport of solutes are controlled by the voids between the contacts. Hence, when characterising rock mass behaviour a useful starting point is a description of the fracture roughness (Brown 1981).

Barton (1973) initially suggested that fracture roughness could be quantified using the joint roughness coefficient expressed as:

$${\text{JRC}}=\frac{\tan^{-1}\left(\tau/\sigma_{\text{n}}\right)-\Phi_{\text{b}}}{\log_{10}\left(\sigma_{\text{c}}/\sigma_{\text{n}}\right)}$$
(1)

where σn is the normal stress, τ is the peak shear strength for the normal stress σn, Φb is the basic friction angle, and σc is the rock compressive strength.
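As an illustration of Eq. (1), the following minimal Python sketch back-calculates JRC from a single direct shear test. It assumes that angles are expressed in degrees, as is customary for the Barton criterion, and that all stresses share one unit; the function name and the numerical input are hypothetical and are not taken from the study.

```python
import math

def jrc_from_shear_test(tau, sigma_n, sigma_c, phi_b_deg):
    """Back-calculate JRC with Eq. (1); angles in degrees, stresses in one unit."""
    peak_angle_deg = math.degrees(math.atan(tau / sigma_n))
    return (peak_angle_deg - phi_b_deg) / math.log10(sigma_c / sigma_n)

# Hypothetical test values chosen only to make the arithmetic easy to follow
example_jrc = jrc_from_shear_test(tau=1.0, sigma_n=1.0, sigma_c=100.0, phi_b_deg=30.0)
print(round(example_jrc, 2))
```

With τ/σn = 1 the peak friction angle is 45°, so the numerator is 15° and the denominator is log10(100) = 2.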

The values for the parameters in Eq. (1) were determined from laboratory direct shear tests that were carried out with a constant normal stress. In 1977, Barton and Choubey introduced the ten type-profiles as a practical means for visually estimating JRC. However, Beer et al. (2002) found that it is challenging to visually estimate JRC to obtain consistent results. The study showed that about 50 individual judgements were needed to get a stable mean and variance of the interpreted JRC, which is not practically feasible. Since the 1980s there have been several attempts to quantify JRC and address the deficiencies described by Beer et al. (2002). Those attempts were reviewed by Grasselli (2006) who introduced a quantitative three-dimensional surface parameter to replace JRC. However, the method by Grasselli (2006) presumes the fracture surface to be fully exposed, which is rarely the case in engineering applications.

Bandis et al. (1981) suggested that the JRC methodology, which was initially based on millimetre-scale laboratory tests, could be scaled by measuring the inclination of asperities sampled with step-sizes approximately 2% of the length of each specimen. This implies that it is possible to determine a JRC value from a 100-mm sample and scale it to the size of the engineering problem, say tens of metres. Bandis et al. (1981) concluded, based on their model results, that JRC reduces significantly with increasing fracture length. Whether the methodology proposed by Bandis et al. (1981) is appropriate for engineering-scale problems is challenging to validate.

Mandelbrot (1985) indicated that fractures should conform to self-affine surfaces and Russ (1994) proclaimed that the circumference of the intersection between a fractal surface and a plane is self-similar if, and only if, the cutting plane is parallel to the mean fractal surface. By showing that the divider method does not work properly on fracture traces, Den Outer et al. (1995) confirmed that fracture traces cannot be self-similar. More recent researchers (Renard et al. 2006; Candela et al. 2009, 2012; Brodsky et al. 2011) have shown that fracture geometry can be described as mono-fractal self-affine surfaces over several orders of magnitude. This implies that the appearance of fractures is scale dependent; see e.g. Fig. 1.

Fig. 1

Two traces with the same fractal parameters, H = 0.800 and σδh(0.1 mm) = 0.025 mm. Due to using linear scaling the 1000 mm long, 10 times down-scaled, lower trace appears to be smoother than the 10 times up-scaled 10 mm upper trace. This despite the upper trace being a small portion of the lower trace

Self-affine fractals need two parameters to be fully constrained; the fractal dimension, which steers the persistence of correlation between vertices of digitalised data, and the magnitude parameter to scale the values at the vertices. The fractal dimension can be expressed as the Hurst exponent, H. Its relationship to the dimension of a trace, D1D, and dimension of a surface, D2D, can be described as, e.g. Russ (1994):

$$\begin{gathered} H=2 - {D_{1{\text{D}}}} \hfill \\ H=3 - {D_{2{\text{D}}}} \hfill \\ \end{gathered}$$
(2)

By substituting H in Eq. (2) it is shown that the difference in fractal dimension between a fracture surface and any of its traces is exactly 1. This does, however, not mean nor demand that fractures need to have the same fractal dimension in different directions, only that the fractal dimension of the surface in the direction of the trace is one unit larger. Hence, fracture traces can be used to determine the dimension of a fracture surface in the direction of the trace.

The magnitude parameter can be described in different ways. For example Brown (1987), Malinverno (1990) and Johansson and Stille (2014) use the constant κ^0.5 in Eq. (3); Odling (1994) and Hong-fa et al. (2002) use σδh(Δx)^2; whilst Renard et al. (2006), Candela et al. (2009) and Stigsson (2015) use σδh(Δx), i.e. the standard deviation of height differences of points Δx apart.

$$\sigma \delta h\left( {\Delta x} \right)=\sqrt \kappa \cdot \Delta {x^H}$$
(3)
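The magnitude parameter σδh(Δx) and the scaling in Eq. (3) can be sketched in a few lines of Python. The sketch assumes equidistant vertices and uses the population standard deviation; both function names are hypothetical.

```python
import math
import statistics

def sigma_delta_h(heights, lag):
    """Standard deviation of height differences between vertices `lag` steps apart."""
    diffs = [heights[i + lag] - heights[i] for i in range(len(heights) - lag)]
    return statistics.pstdev(diffs)

def sigma_delta_h_model(kappa, delta_x, hurst):
    """Eq. (3): sqrt(kappa) * delta_x**H."""
    return math.sqrt(kappa) * delta_x ** hurst

# Hypothetical zig-zag trace: every adjacent height difference is +1 or -1
zigzag = [0.0, 1.0, 0.0, 1.0, 0.0]
print(sigma_delta_h(zigzag, lag=1))
```

A perfectly straight, inclined trace gives σδh = 0 for every lag, since all height differences are identical; this is why the measure is insensitive to a constant slope but sensitive to undulation.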

Despite the knowledge that fractures are self-affine and hence need two parameters to be constrained, there are numerous examples (from Turk et al. 1987 to Li and Huang 2015, via e.g. Lee et al. 1990; Wakabayashi and Fukushige 1992; Xie and Wang 1999; Jiang et al. 2006; Bae et al. 2011) where only the fractal dimension of fracture traces has been evaluated, using methods only applicable to strictly self-similar lines, such as the divider method, compass walking, or h-l method. These erroneously evaluated dimensions have then been used to develop relations to infer JRC. Results from such studies are therefore highly questionable. Other researchers have instead concentrated only on different asperity measures, e.g. standard deviation of the first derivative of the profile, Z2, or roughness profile index, RP (e.g. Tse and Cruden 1979; Yang et al. 2001; Tatone and Grasselli 2010). Due to the nature of self-affine fractals, these methods become sensitive to the sampling resolution and new relationships have to be derived for each new resolution used. Another issue is that there is an indication that the conversion from Z2 to JRC may give unrealistic results for traces with JRC values below 3–6, shown in Sect. 8 in the Online Resource 1. An early exception is Odling (1994) who examined the relationship between JRC and H and the relationship between JRC and the structure function, i.e. the variance of the correlation function. However, the two evaluations were never combined to infer JRC. Another exception is Hong-fa et al. (2002) who also used the variance of the correlation function to estimate both the fractal dimension and asperity distribution simultaneously.
Unfortunately, they only used the results to create random replicas of the ten type traces and compared how well these replicas reflected different relationships to other established measures such as the resolution-sensitive relationship between JRC and Z2 and the relationship between JRC and the erroneously evaluated fractal dimension, D, using the divider method (please refer to Sects. 6 and 8 in the Online Resource 1, attached to the study, for issues regarding the usage of these two methods).

Despite the advances in the research field, the subjective methodology originally proposed by Barton and Choubey (1977) is still widely used in rock engineering practice. In this study fractures are treated as mono-fractal self-affine surfaces. A combination of four different evaluation methods, applicable to fracture traces, are used to infer the fractal dimension and asperity measure of the ten type traces in Barton and Choubey (1977) together with the seven traces in Bakhtar and Barton (1984). The results from this inference are used to develop a novel conceptual model that objectively estimates JRC as accurately as a large ensemble of geologists, from mapped fracture traces only.

The paper is organized as follows: the four methods to infer the fractal parameters, applicable to mono-fractal self-affine traces, are briefly presented together with one method of generating synthetic fracture traces in the methods section. Thereafter a section dealing with uncertainties and biases when evaluating fracture traces follows before the traces in Barton and Choubey (1977) and Bakhtar and Barton (1984) are analysed. In the conceptual model section three multilinear models are developed and one model is chosen as the most appropriate to infer JRC from fracture traces. Thereafter the performance of the model is shown before the discussion about the work and how the model can be further developed. The study ends with some conclusions of the work carried out.

2 Methods

There are various methods to determine the Hurst exponent, H, and the asperity measure, σδh(ΔL), of mono-fractal self-affine fracture traces. The methods have different biases and uncertainties and hence may result in different interpretations of H and σδh(ΔL) depending on resolution, trace length and the value of H itself (Malinverno 1990; Gallant et al. 1994; Russ 1994; Candela et al. 2009; and Sect. 5 in Online Resource 1).

Using several evaluation methods, and evaluating each method’s bias and uncertainty, it is possible to make more accurate and robust inferences of the fractal parameters. Hence, not one single method, but four, are used for the determination of the fractal parameters.

Fast Fourier transform, FFT, is a quick way to translate time series between the time domain and the frequency domain (Cooley and Tukey 1965). Using the same algorithm, there is an equivalent transform between the spatial domain, the trace, and the “length frequency” domain, i.e. the power spectrum. The slope and the intercept of the power spectrum are used to infer H and σδh(1p).
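The spectral inference of H can be sketched as follows. The sketch is not the code used in the study (that is provided in Online Resource 1); it assumes the commonly quoted result that the power spectrum of a self-affine trace decays as k^−(2H+1), uses a naive discrete Fourier transform instead of an FFT to stay dependency-free, and omits the inference of σδh(1p) from the intercept. All names are hypothetical.

```python
import cmath
import math

def power_spectrum(heights):
    """Naive DFT power |X_k|^2 for k = 1 .. N/2 - 1 (adequate for short traces)."""
    n = len(heights)
    powers = []
    for k in range(1, n // 2):
        x_k = sum(h * cmath.exp(-2j * math.pi * k * i / n) for i, h in enumerate(heights))
        powers.append(abs(x_k) ** 2)
    return powers

def hurst_from_spectrum(heights):
    """Fit log10(power) against log10(k); assumes power ~ k**-(2H + 1)."""
    powers = power_spectrum(heights)
    lx = [math.log10(k) for k in range(1, len(powers) + 1)]
    ly = [math.log10(p) for p in powers]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return (-sxy / sxx - 1.0) / 2.0

# Hypothetical test signal: cosines whose amplitudes follow the assumed law for H = 0.8,
# with arbitrary but fixed phases (k radians)
n_vertices, h_true = 64, 0.8
signal = [
    sum(k ** -(h_true + 0.5) * math.cos(2 * math.pi * k * i / n_vertices + k)
        for k in range(1, n_vertices // 2))
    for i in range(n_vertices)
]
h_fft = hurst_from_spectrum(signal)
print(round(h_fft, 3))
```

Because the test signal is built directly from the assumed spectrum, the evaluation recovers the input H essentially exactly; this mirrors the remark in Sect. 3.2 that the FFT evaluation is the inverse of the generation method.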

To make a self-affine trace look similar at different magnifications, the ordinate needs to be scaled by λ^H if the abscissa is scaled by λ (Candela et al. 2009). This relationship is employed by the standard deviation of the correlation function method, RMS-COR, to infer H and σδh(1p) from the slope and intercept of height differences at different distances.
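A minimal sketch of the RMS-COR idea is given below, assuming equidistant vertices, a user-chosen set of lags and an ordinary least-squares fit in log-log space. The function names, the lag selection and the random-walk check are illustrative assumptions, not the implementation used in the study.

```python
import math
import random
import statistics

def fit_log_log(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns (slope, intercept)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

def rms_cor(heights, lags):
    """Infer H (slope) and sigma_delta_h at one vertex spacing (10**intercept)."""
    sigmas = []
    for lag in lags:
        diffs = [heights[i + lag] - heights[i] for i in range(len(heights) - lag)]
        sigmas.append(statistics.pstdev(diffs))
    slope, intercept = fit_log_log(lags, sigmas)
    return slope, 10.0 ** intercept

# Hypothetical check on an ordinary random walk, for which H should be near 0.5
rng = random.Random(1)
walk = [0.0]
for _ in range(4095):
    walk.append(walk[-1] + rng.gauss(0.0, 0.2))
h_walk, sigma_1p_walk = rms_cor(walk, lags=[1, 2, 4, 8, 16, 32])
print(round(h_walk, 2), round(sigma_1p_walk, 2))
```

The intercept at a lag of one vertex spacing gives σδh(1p) directly, which is why the method yields both fractal parameters from a single regression.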

The Korcak plot of zero sets, Zero set/Korcak, makes use of the intersections between a self-affine trace and the abscissa conforming to a Cantor dust. The complementary cumulative distribution function of the lengths between these intersections is used to infer H.

The box counting method uses the relationship between the number of boxes visited by the trace and the number of divisions of the parent box. This relation is used to infer H.

The above-mentioned four methods have different advantages and disadvantages, listed in Table 1. They are described in, e.g. Russ (1994) for FFT and Zero set/Korcak, Renard et al. (2006) and Candela et al. (2009) for RMS-COR and Malinverno (1990) for box counting. As a service to the interested reader, in depth information is given in Sect. 5 in Online Resource 1 attached to this study, together with the computer codes used.

Table 1 Summary of the four evaluation models used

There are several methods to generate random fractal lines and surfaces, for example random midpoint displacement method, conditionalised random midpoint displacement method, Mandelbrot–Weierstrass functions, or IFT of power spectrum, described in, e.g. Penttinen and Virtamo (2000), Russ (1994) and Saupe (1988). Based on the findings in Saupe (1988), Gallant et al. (1994) and Russ (1994), the inverse fast Fourier transform, IFT, of power spectrum seems to be the most appropriate method to generate self-affine traces. Hence this method is employed to generate synthetic fracture traces throughout this study. The method is elaborated in Sect. 4 in Online Resource 1.
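The generation of a synthetic trace from a power spectrum can be sketched as below. This is a simplified stand-in for the method elaborated in Online Resource 1: it assumes spectral amplitudes proportional to k^−(H+0.5), uniformly random phases, and a final rescaling to a target σδh(1p); a direct cosine sum replaces the inverse FFT so that no external library is needed. All names are hypothetical.

```python
import math
import random
import statistics

def synthetic_trace(n_vertices, hurst, seed=None):
    """Sum of cosines with power-law amplitudes and random phases (inverse DFT)."""
    rng = random.Random(seed)
    amps = [k ** -(hurst + 0.5) for k in range(1, n_vertices // 2)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in amps]
    return [
        sum(a * math.cos(2.0 * math.pi * k * i / n_vertices + p)
            for k, (a, p) in enumerate(zip(amps, phases), start=1))
        for i in range(n_vertices)
    ]

def rescale_to_sigma(trace, target_sigma_1p):
    """Scale heights so adjacent-vertex differences have the target standard deviation."""
    diffs = [b - a for a, b in zip(trace, trace[1:])]
    factor = target_sigma_1p / statistics.pstdev(diffs)
    return [factor * h for h in trace]

# Hypothetical example: 128 vertices, H = 0.8, sigma_delta_h(1p) = 0.2
trace = rescale_to_sigma(synthetic_trace(128, 0.8, seed=7), 0.2)
```

Rescaling changes only the magnitude parameter; H is untouched, which is consistent with the statement that the Hurst exponent does not depend on the magnification of the asperities.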

3 Uncertainty Studies of Synthetic Traces

The mean and variance of the inferred fractal parameters may depend on the generated H itself and on the number of vertices used during the evaluation. The Hurst exponent will, however, not depend on the magnification of the asperities, whilst the asperity measure, σδh(1p), will only be linearly affected by the scaling. To evaluate the bias of each method due to H and number of vertices, a set of synthetic traces is generated and analysed.

3.1 Number of Realisations Needed to Get Stable Measures

The number of traces needed to get stable mean and variance of the fractal parameters is studied using two different Hurst exponents, H = 0.975 and H = 0.600, both with σδh(1p) = 0.20. The Hurst exponents are chosen to reflect one high value and one low value, whilst σδh(1p) is arbitrary since it is not supposed to impact the number of realisations. The full study is provided in Sect. 9 in the Online Resource 1 where it is shown that the two setups show similar results. Both the arithmetic mean and variance, represented by the standard deviation, have stabilised after evaluating 128–256 traces for all methods except the Zero set/Korcak method that needs 512 traces to be stable; see Fig. 2. To have some margin to the minimum number of required traces, 1024 realisations and evaluations are carried out in the analyses.

Fig. 2

The number of traces needed to get stable arithmetic mean and standard deviation using Zero set/Korcak evaluation

3.2 Evaluation of Hurst Exponent, H

The effect that the generated H will have on the inferred H is analysed by comparing equally long traces with different generated values of H. The size of the effect will differ due to the length of the evaluated trace and hence different lengths of traces are extracted as well (from traces with 65,536 vertices down to 64 vertices). The results from the analysis are presented in Sect. 10 in the Online Resource 1 and only a summary of the findings follows below.

As expected, evaluating H using FFT gives a mean that is exactly on the 1:1 line since it is the inverse of the generation method. For the other three methods the slope of the evaluated H vs generated H is less than 1:1, resulting in a bias in the mean, dependent on the Hurst exponent. The different evaluation methods have different variances, but the variance of each method is almost not dependent on the generated Hurst exponent; see Fig. 3a.

Fig. 3

The evaluated arithmetic mean of the Hurst exponent, as markers, and standard deviation, as whiskers. a Evaluated H as a function of generated H from traces with 1024 vertices. b Evaluated H as a function of the number of vertices used by shortening the trace length for generated H = 0.975. c Evaluated H as a function of the number of vertices used by skipping in-between vertices for generated H = 0.975

The number of vertices analysed can be altered in two ways. Either all adjacent vertices are used of a shorter sub-trace, or the full length of the trace is used but skipping fractions of in-between vertices. As the number of vertices decreases, the bias in both the mean and variance will increase. Whether shortening the trace or skipping in-between vertices, the difference in the bias of the mean is negligible, cf., Fig. 3b, c. However, there is a difference in variance. The variance, expressed as standard deviation, will be about half if the full trace, skipping in-between vertices, is used compared to using all adjacent vertices on a shorter trace, cf., Fig. 3b, c.

3.3 Evaluation of Asperity Measure, σδh(ΔL)

When measuring a fracture trace there will always be a trade-off between high resolution of a small section or low resolution of a long trace. Measuring a full fracture trace with high resolution will always give the correct σδh(ΔL) for ΔL equal to the resolution or larger. However, in the case that only a fraction of the fracture trace is measured, the value of σδh(1p) will be underestimated compared to the full trace value due to the de-trending of the short evaluated trace.

Evaluating the trace using all adjacent vertices of a shorter sub-trace, both the bias in the mean and the variance around the mean will increase as the trace gets shorter; see Fig. 4a. The bias and variance increase with the H of the trace. Decreasing the number of vertices by skipping in-between vertices will scale σδh(ΔL) as Eq. (3), shown as the theoretical line in Fig. 4b. As the number of vertices decreases, the bias of the mean increases together with an increase in the variance. The variance, expressed as standard deviation, is however small, less than 2%, and hence not seen in Fig. 4b. The slope of the inferred values in Fig. 4b is only 0.91, compared to the theoretical value 0.975, indicating that the IFT of the power spectrum method is not capable of generating correct traces as H approaches 1. Estimating σδh(ΔL) for ΔL smaller than the distance between the measured points is delicate. Due to the need for extrapolation of a power function with an uncertain value of the exponent, H, the estimated σδh(ΔL) will be highly uncertain as the trace gets shorter; see Fig. 4c.

Fig. 4

The evaluated standard deviation of the asperity differences, σδh(ΔL), using generated H = 0.975. a The effect of using full resolution, 1p, but changing the length of the evaluated trace. b The effect of changing the resolution, i.e. the distance between the evaluated vertices is extended, while the full length of the trace is kept. c The effect of changing the resolution using full length trace, but extrapolating the result below the distance between the evaluated vertices

4 Evaluation of Type Curves

The ten type curves in Barton and Choubey (1977) together with the seven traces in Bakhtar and Barton (1984) are chosen as the basis for developing a model to infer JRC from the fractal parameters of a fracture trace.

4.1 Data

The ten type curves in Barton and Choubey (1977) have been widely used for visual interpretation of fracture roughness. Each curve is the result of drawing the most representative profile out of three measurements using a profile gauge. The rods of the gauge were 1 mm wide and hence 1 mm is the maximum possible resolution. Jang et al. (2014) developed a method to digitise the ten profiles with 0.1 mm resolution using a computer algorithm. Using this algorithm, the length of the traces varies between 96 and 101 mm. As a complement to the high-resolution algorithm-based digitalisation in Jang et al. (2014), the traces were manually digitised by us for this study. The digitalisation procedure is described in Sect. 11 in Online Resource 1 and the manually digitised vertices are provided in Online Resource 2. To get some idea of the uncertainty during manual digitalisation, the traces were digitised twice, leaving a few days in between, first from left to right and then from right to left. Figure 5 shows a comparison between the three digitalisations of an excerpt of curve number 7, i.e. JRC = 12.8.

Fig. 5

Close-up of difference between the algorithm-based digitalisation, presented in Jang et al. (2014), and the manual digitalisation, by us, of type curve 7, JRC = 12.8, in Barton and Choubey (1977). Observe that the length axis is exaggerated 10 times and the height axis 50 times

Digitising the ten curves using higher resolution than the profile gauge used by Barton and Choubey (1977), i.e. ΔL < 1 mm, does not provide any new fractal information but interpolation and noise. Using all 0.1 mm vertices in Jang et al. (2014) will result in a too steep slope of the linear regression and hence a too high estimation of H. This is due to the relatively straight lines between the 1.0 mm equidistant vertices, shown in Fig. 5, and hence too low σδh(ΔL) for ΔL < 1 mm. The lack of presence of small-scale undulation is clearly seen as a drop in the power spectrum for the high frequencies between logarithmic frequency 2 and 2.5 in Fig. 6. The evaluations of the ten type curves in Barton and Choubey (1977) are hence made using 1 mm equidistance both for the algorithm-based digitalisation and the manual digitalisation.

Fig. 6

The power spectrum of the algorithm-based digitalisation by Jang et al. (2014) of JRC type curve 7 in Barton and Choubey (1977) using 0.1 mm resolution. The drop in power for the high frequencies, circa 2–2.5, is an indication of lack of presence of those frequencies, i.e. there is only interpolation and noise measured

Before analysing the traces they are rectified using Deming regression or de-trend together with vertical adjustment to avoid artificial high power biases of low frequencies. Another issue is to find the correct start vertex, i.e. the one that best reflects the position of each rod of the profile gauge. The start vertex is found by maximising σδh(1 mm) starting at each of the ten first vertices.
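The two preparation steps can be sketched as follows. Note the substitutions: ordinary least squares stands in for Deming regression, and the start-vertex search assumes a trace digitised at 0.1 mm that is sub-sampled at every tenth vertex to obtain 1 mm spacing. Function names and the toy data are hypothetical.

```python
import statistics

def detrend(heights):
    """Subtract the ordinary least-squares line (a stand-in for Deming regression)."""
    n = len(heights)
    mean_x = (n - 1) / 2.0
    mean_y = statistics.fmean(heights)
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (h - mean_y) for i, h in enumerate(heights))
    slope = sxy / sxx
    return [h - (mean_y + slope * (i - mean_x)) for i, h in enumerate(heights)]

def best_start_vertex(heights, step=10):
    """Offset in 0 .. step-1 whose every-`step`-th sub-sample maximises sigma_delta_h."""
    def sigma(offset):
        sub = heights[offset::step]
        return statistics.pstdev([b - a for a, b in zip(sub, sub[1:])])
    return max(range(step), key=sigma)

# Hypothetical toy trace: only the odd vertices carry any undulation
toy = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 5.0]
print(best_start_vertex(toy, step=2))
```

The rationale for maximising σδh(1 mm) is that vertices lying on the interpolated segments between gauge rods smooth the trace, so the offset that coincides with the rod positions should retain the most height variation.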

As a complement to the ten short, ~ 100 mm, standard type traces in Barton and Choubey (1977), the seven long, ~ 1000 mm, traces in Bakhtar and Barton (1984) were manually digitised, using the same procedures as for the ten ~ 100 mm traces. There is no indication in Bakhtar and Barton (1984) of the resolution, but evaluating the seven profiles it seems that the maximum possible resolution from the digitalisation is around 10 mm. Hence, these seven traces are evaluated using ΔL ≥ 10 mm.

4.2 Evaluation of the Ten Curves in Barton and Choubey

The digitised JRC curves are between ~ 96 and ~ 101 mm long in the horizontal direction, meaning that the largest possible number of vertices to use in a single evaluation is 64, i.e. 2^6, vertices, using 1 mm resolution. Evaluating so few data, the uncertainties in the results are large and the different methods have large deviations from the hypothetical values; see Fig. 3b, c. However, the knowledge about the bias, gained from the synthetic study, can be used to compensate the evaluated values to produce better inferences.

The difference in evaluated fractal parameters, H and σδh(1p), between the algorithm-based digitalisation of traces and the manually digitised ones is very small for all methods except the inference of H using the FFT method where some differences are noted; see Fig. 7. The difference seen may depend on the manual digitalisation resulting in less power for long waves or the algorithm-based method missing power for the short waves. The negligible difference between the two manual digitalisations implies that the variance is low between different manual digitalisations. The conclusion is, hence, that manual digitalisation will usually perform equally well as algorithm-based ones; see Fig. 7.

Fig. 7

Evaluated Hurst exponent and σδh(1 mm) from the ten type traces in Barton and Choubey (1977) using the four different evaluation methods and the three different digitalisations

The evaluated σδh (1 mm) values follow a nearly linear increasing trend from low to high JRC, for both the FFT method and the RMS-COR-method. The RMS-COR method has a slightly steeper slope and hence slightly higher σδh (1 mm) for high JRC values, cf. Fig. 7e, f.

Evaluating the Hurst exponent for the ten type curves in Barton and Choubey (1977) shows a general trend, though somewhat sinusoidal, of larger H, i.e. lower fractal dimension as JRC increases; see Fig. 7a–d. This suggestion might seem counter-intuitive, but the higher the fractal dimension, the slower the increase of amplitude. That is, high fractal dimension will give a lower amplitude difference at large scale than a low fractal dimension if equal amplitude difference at small scale. These findings are also in accordance with earlier findings (Odling 1994; Lee and Bruhn 1996; Candela et al. 2009) where it is concluded that fractures subjected to shear movement have higher fractal dimension and hence lower H than pure tensile fractures.

The four different evaluation methods have similar shapes of the development of H as JRC increases, though different spread and absolute values. This is expected as the different methods have different difficulties to infer H as H gets higher; see Fig. 3a, and the trace gets shorter; see Fig. 3b. Using this knowledge the estimated values can be corrected accordingly.

Further, recalling that the generation of fractal lines using inverse FFT method is not capable of generating the correct lines as H approaches 1, see Fig. 4b, the conclusion is that the FFT method slightly overestimates H as the true H approaches 1. By fitting a curve to the plot of the estimated Hurst exponent, Hest, as a function of the generated Hurst exponent, Hgen, the relationship can be described as

$$\begin{array}{*{20}{l}} {{H_{{\text{est}}}}={H_{{\text{gen}}}}}&{{\text{for}}\;{H_{{\text{gen}}}} \leqslant 0.700} \\ {{H_{{\text{est}}}}=0.616\cdot\ln \left( {{H_{{\text{gen}}}}} \right)+0.920}&{{\text{for}}\;{H_{{\text{gen}}}}>0.700} \end{array}$$
(4)

Using the information of difference between generated and evaluated H from Fig. 3 together with Eq. (4), inferences of H are calculated and shown in Fig. 8. Some of the corrections have to be extrapolated, shown as open markers, and hence the uncertainty is larger. The filled markers are interpolated and hence have higher confidence.
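Eq. (4) and its inverse can be written compactly as below. The sketch assumes that the compensation is applied by inverting the fitted curve, which the text implies but does not spell out; the function names are hypothetical, and the additional method- and length-dependent corrections read from Fig. 3 are not included.

```python
import math

def h_est_from_h_gen(h_gen):
    """Eq. (4): estimated H as a function of generated H."""
    if h_gen <= 0.700:
        return h_gen
    return 0.616 * math.log(h_gen) + 0.920

def h_gen_from_h_est(h_est):
    """Inverse of Eq. (4), assumed here to be how an estimate is compensated."""
    if h_est <= 0.700:
        return h_est
    return math.exp((h_est - 0.920) / 0.616)

# Hypothetical round trip for a generated H of 0.9
round_trip = h_gen_from_h_est(h_est_from_h_gen(0.9))
```

The two branches of Eq. (4) meet at H = 0.700 to within about 0.0003, so the piecewise relation is continuous for practical purposes.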

Fig. 8

Inferred Hurst exponent from the ten type traces in Barton and Choubey (1977) after compensation for method and length bias

Compensating for the evaluation method and length bias for each evaluated Hurst exponent makes the spread shrink between the different methods. An inverse-variance weighting is performed to merge the results from the three digitalisations of the four evaluation methods into a single mean and standard deviation for each of the ten type curves, as shown in Fig. 9. Only for type curve 3, JRC = 5.8, there is a statistically significant (p < 0.05) difference in the mean value between the four methods. For the other nine type curves there is no significant difference.
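Inverse-variance weighting merges several estimates by giving each a weight of 1/σ². A minimal sketch, assuming independent estimates and with hypothetical input values, is:

```python
def inverse_variance_mean(means, std_devs):
    """Return the inverse-variance weighted mean and its standard deviation."""
    weights = [1.0 / s ** 2 for s in std_devs]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, means)) / total
    return mean, (1.0 / total) ** 0.5

# Hypothetical H estimates from two methods; the second is twice as uncertain
merged_h, merged_sd = inverse_variance_mean([0.80, 0.90], [0.05, 0.10])
```

Estimates with small standard deviation dominate the result, so a method with large scatter for short traces contributes little to the merged value.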

Fig. 9

Inverse-variance weighted mean of H

There is a good agreement between the results from this study, using several evaluation methods, and the results from Odling (1994) using only one method; see Table 2. For three traces Odling (1994) did not report any H due to ambiguous interpretations of the evaluation.

Table 2 Inferred fractal parameters using multiple methods (this study) and using a variant of RMS-COR (Odling 1994)

There is no obvious choice of shape of a best fit curve between H and JRC; see Fig. 9. A sinusoidal curve or polynomial curve of order three or higher would all do. However, it is important that H never exceeds 1 since that would mean that the line has a dimension below 1, i.e. it is a line with voids which does not have any physical meaning.

4.3 Evaluation of the Seven Curves in Bakhtar and Barton

The seven ~ 1-m-long type curves in Bakhtar and Barton (1984) are evaluated using the same technique as described in Sect. 4.2 and Online Resource 1 with the exception that no algorithm-based digitalisation is available. The lack of algorithm-based digitalisation should be of minor importance as the manual and algorithm-based digitalisations seem to give reasonably similar results; see Fig. 7.

Unfortunately there is no indication of the resolution of the seven traces in Bakhtar and Barton (1984). Evaluating the traces it seems that the resolution is about 10 mm, i.e. ten times coarser than the traces in Barton and Choubey (1977). The results from the evaluation of the seven traces in Bakhtar and Barton (1984) are shown in Fig. 10 together with the results from Sect. 4.2. There are two H values, JRC = 4.2 and 6.0 of the seven 1 m traces, that stand out (higher values) compared to the ten 100 mm traces. For both of these traces the Zero set/Korcak evaluation gives H values that are much higher than expected compared to the other three evaluation methods which result in an anomalous value, H > 1.200, after compensating for trace length and evaluation method bias. Excluding these anomalous results the evaluation provides values of H in the range 0.650–0.750, which better fits the other traces. To compare the asperity measure σδh (10 mm) from the seven traces with the σδh (1 mm) from the ten type curves, the data need to be extrapolated according to

$$\sigma \delta h\left(1\,\text{mm}\right)=\frac{\sigma \delta h\left(10\,\text{mm}\right)}{\left(10\,\text{mm}/1\,\text{mm}\right)^{H}}$$
(5)

where the inferred H is used in the exponent. The uncertainty in H may have a large impact on the estimated value of σδh (1 mm), which can be seen for JRC = 9.2 and 10.7 in Fig. 10b. The figure also shows that most of the extrapolated σδh (1 mm) values are about a factor of two lower than the σδh (1 mm) values evaluated using traces in Barton and Choubey (1977). Whether this difference is an artefact of the extrapolation, or whether larger fractures behave differently when JRC is estimated, is an open question.

Fig. 10

Inferred (a) H and (b) σδh (1 mm) from the seven traces in Bakhtar and Barton (1984) together with the inferred values from Barton and Choubey (1977)

5 Conceptual Model

As discussed above, there are large uncertainties in the inferred H and σδh(ΔL) values due to the low resolution of the type traces. Hence, there is no reason to develop an elaborate model to calculate JRC from H and σδh (1 mm) that fits all the data points perfectly. Instead a basic multi-linear model is developed and its performance is evaluated against the used data. Three multiple linear regression analyses are run using the data set established in Sect. 4; see Table 3.

Table 3 Inferred data as input to the three models developed and their errors (residuals) to underlying data

Using all 17 data from both Barton and Choubey (1977) and Bakhtar and Barton (1984) gives a model where the slope for σδh (1 mm) is highly significant, p = 2·10^−6, and the slope for H is weakly significant, p = 0.064. The intercept, however, is not significant, p = 0.357. Although two of the seven values of H stand out and the σδh (1 mm) values, extrapolated from σδh (10 mm), are severely affected by uncertainty in H, the result is surprisingly good; the model estimates JRC within ± 3 units with 95% confidence. Analysing the errors, one cannot reject that they are independent and normally distributed, i.e. the p values above are valid.

Due to the large uncertainties in the evaluated values from the seven traces in Bakhtar and Barton (1984), a multiple linear regression using only the ten data from Barton and Choubey (1977) is performed. This data set results in a model where all coefficients are significant to highly significant, 6·10^−6 < p < 0.040; see Table 3. The F-statistic shows that the null hypothesis of an intercept-only model performing as well as the model at hand can be rejected at level p = 3·10^−6. Neither the hypothesis of normality, using the Jarque–Bera test (Jarque and Bera 1980), p = 0.861, nor the hypothesis of homoscedasticity, using the Breusch–Pagan test (Breusch and Pagan 1979), p = 0.604, or the Koenker–Bassett test (Koenker and Bassett 1982), p = 0.561, can be rejected at level p < 0.05; hence the p values are valid. Furthermore, the model has an adjusted R^2 value of 0.965, i.e. it can explain 96.5% of the variance in JRC. All statistical tests, hence, show that this linear model fits the used data very well despite its simplicity. However, this model, developed from the ten type traces in Barton and Choubey (1977), largely under-estimates JRC for the traces with values 5.5, 7.4 and 8.5 in Bakhtar and Barton (1984); see Table 3. For the other four traces it makes a reasonable estimation despite the difference in length and resolution.

For completeness the seven traces in Bakhtar and Barton (1984) are used to run a multiple linear regression. Both the intercept and coefficient for σδh(1 mm) are significant, p < 0.05, but not the coefficient for H, p = 0.130. Furthermore, the model estimates a negative correlation between H and JRC which is not anticipated (Odling 1994; Lee and Bruhn 1996; Candela et al. 2009).

Consequently, using only the ten original traces from Barton and Choubey (1977) is the most appropriate conceptual approach to determine JRC using fractal dimension and asperity distribution of mapped fracture traces. Hence, the model to objectively determine JRC can be described as:

$${\text{JRC}}= -4.3+54.6 \cdot \sigma \delta h\left( {1\,{\text{mm}}} \right)+4.3 \cdot H$$
(6)

where JRC is the joint roughness coefficient, σδh(1 mm) is the standard deviation of the asperity height difference between points 1 mm apart, and H is the Hurst exponent.

Equation (6) requires the standard deviation of height differences to be calculated for the distance 1 mm, i.e. σδh(1 mm). In the case the data resolution is not 1 mm, σδh(1 mm) can be calculated using Eq. (5) for any resolution. However, if the resolution is much coarser than 1 mm it is wise to perform a sensitivity analysis to evaluate the effects seen in Fig. 4c.
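A minimal sketch of how Eq. (6) could be evaluated is given below. It assumes a profile already sampled at exactly 1 mm spacing with heights in millimetres, and it assumes the sample standard deviation (`ddof=1`); the paper's exact estimator is not stated here. The function names `sigma_dh_1mm` and `jrc_eq6` are hypothetical. Rescaling from another resolution with Eq. (5) is not shown.

```python
import numpy as np

def sigma_dh_1mm(heights_mm):
    """Standard deviation of height differences between neighbouring points.

    Assumption: the profile is sampled at exactly 1 mm spacing, so
    neighbouring points are 1 mm apart. ddof=1 is also an assumption.
    """
    dh = np.diff(np.asarray(heights_mm, dtype=float))
    return float(np.std(dh, ddof=1))

def jrc_eq6(sigma_dh, hurst):
    """Eq. (6): JRC = -4.3 + 54.6 * sigma_dh(1 mm) + 4.3 * H."""
    return -4.3 + 54.6 * sigma_dh + 4.3 * hurst

# Illustrative inputs only (not values from the paper).
print(jrc_eq6(0.2, 0.8))
```

For the illustrative inputs σδh(1 mm) = 0.2 mm and H = 0.8, the terms are −4.3 + 10.92 + 3.44, i.e. a JRC of about 10.1.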

6 Performance of the Model

The established model can properly reproduce the data underlying its development, which is an absolute minimum requirement. However, how well the model can predict other data is of great interest. Lacking a collection of fractures to be used for shear or tilt tests, Stigsson (2018) used a simpler approach to test the performance of the model: the community of geologists was simply asked to interpret JRC of nine 100-mm-long synthetic fracture traces and the model was used to infer JRC of the very same traces. It is recognised that the approach does not show how well the model can predict JRC, but how well it can estimate the JRC subjectively interpreted by an ensemble of geologists.

Eleven geologists answered the call and returned their interpretation of the traces. According to Beer et al. (2002), eleven interpreters are too few to get stable statistics and hence the comparison is done using median and quartiles rather than mean and standard deviations; see Fig. 11. The results from the case study show that the difference in median JRC between the model and the geologists varies between − 1.3 and + 1.0 units. The absolute difference between the model and the geologists is usually (seven of the nine traces) less than 0.9 units and the median difference is 0.2 units. The median inter-quartile range, IQR, is 1.2 units for the model and 2.0 units for the ensemble of geologists; i.e. the model has about 40% lower spread than the ensemble of geologists. For six of the nine traces the model has lower IQR than the ensemble of geologists; for two traces the model has 10–20% larger IQR than the ensemble of geologists; and for one trace the IQR is about 47% larger for the model compared to the ensemble of geologists. The conclusion from the case study is, hence, that there is not much difference between the median JRC inferred by the model and the JRC visually interpreted by the ensemble of geologists, but that the uncertainty in the inferred values is usually substantially lower for the model compared to the ensemble of geologists.
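The robust statistics used in this comparison can be sketched as follows. The helper names are hypothetical, NumPy's default linear interpolation of percentiles is an assumption (the quartile convention used in the study is not stated), and only the two median IQR values quoted above are taken from the text.

```python
import numpy as np

def median_and_iqr(values):
    """Median and inter-quartile range (Q3 - Q1) of a set of JRC values."""
    q1, med, q3 = np.percentile(np.asarray(values, dtype=float), [25, 50, 75])
    return float(med), float(q3 - q1)

def relative_spread_reduction(iqr_model, iqr_reference):
    """Fraction by which the model IQR is smaller than the reference IQR."""
    return 1.0 - iqr_model / iqr_reference

# Median IQRs quoted in the text: 1.2 units (model), 2.0 units (geologists).
print(relative_spread_reduction(1.2, 2.0))  # about 0.4, i.e. 40% lower spread
```

Median and IQR are preferred here because, with only eleven interpretations per trace, a single outlying judgement would shift a mean and standard deviation far more than it shifts the quartiles.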

Fig. 11

Inter-quartile ranges of visually interpreted JRC as blue boxes, and model inferred JRC as red whiskers. The visually interpreted JRC comes from an ensemble of eleven geologists while the model inferred JRC comes from all possible 489 sub-traces, of 512 vertices length, for each trace
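The sub-trace count in the caption is consistent with sliding a 512-vertex window one vertex at a time along a 1000-vertex trace, since 1000 − 512 + 1 = 489. The sketch below illustrates that reading; the one-vertex shift, the trace length of 1000 vertices and the function names are assumptions, and `estimator` stands for any per-window evaluation, such as a model-inferred JRC.

```python
import numpy as np

def sub_traces(trace, window=512):
    """Every contiguous window of `window` vertices, shifted one vertex at a time."""
    trace = np.asarray(trace, dtype=float)
    n_windows = len(trace) - window + 1
    if n_windows < 1:
        raise ValueError("trace is shorter than the window")
    return [trace[i:i + window] for i in range(n_windows)]

def quartiles_over_sub_traces(trace, estimator, window=512):
    """Q1, median and Q3 of `estimator` evaluated on every sub-trace."""
    values = [estimator(w) for w in sub_traces(trace, window)]
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return float(q1), float(med), float(q3)

# Hypothetical 1000-vertex trace (random walk), for illustration only.
trace = np.cumsum(np.random.default_rng(1).normal(size=1000))
print(len(sub_traces(trace)))  # 489
```

Because neighbouring windows overlap in all but one vertex, the resulting values are strongly correlated, so the quartiles describe the variability along one trace and not that of independent samples.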

7 Discussion

The data from Barton and Choubey (1977) underlying the development of the model have two major drawbacks. First, each trace is only one single subjectively chosen representative trace of the fracture surface, and hence there is no possibility to estimate the uncertainty in the measured parameters. Second, the resolution is low, resulting in large uncertainty in the evaluated fractal parameters. The latter problem is partly overcome using multiple evaluation methods to decrease the uncertainty and hence get more reliable inferences of the parameters.

Despite the many sources of uncertainty, the multiple linear regression analysis showed that all three coefficients of the model, Eq. (6), were statistically significant using data from the classic ten type traces in Barton and Choubey (1977). It is recognised that the model is simple, using only linear relationships and no cross-interaction, but still it can reproduce the back-calculated JRC values within ± 2 units with 95% confidence and it can explain 96.5% of the variance in JRC.

Benchmarking the inference of JRC by the developed model against the visual interpretation of JRC by an ensemble of geologists showed surprisingly good agreement; the median difference is only 0.2 units. The model also showed 40% lower uncertainty in the inferred JRC compared to the ensemble of eleven geologists. This may, however, be a result of too few geologists interpreting JRC. Beer et al. (2002) suggested that at least 50 geologists should be consulted to get stable statistics.

The coefficient of H is 4.3, which means that H can only affect the inferred JRC by less than this amount due to the restriction 0 < H < 1. In reality the impact on JRC will most certainly be around 2 units, since fracture traces are not supposed to be anti-correlated, i.e. H is restricted to be between 0.5 and 1. This might be one of the reasons why many studies concentrate only on the asperity measure when inferring JRC. However, neglecting the Hurst exponent forfeits the possibility of capturing the small-scale and the large-scale properties of the fracture surfaces simultaneously.
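The bound can be written out as a short worked check. Holding σδh(1 mm) fixed in Eq. (6) and varying only H, the change in inferred JRC is as follows (the symbols ΔJRC_H, H_max and H_min are introduced here for illustration only):

$$\Delta {\text{JRC}}_{H}=4.3 \cdot \left( {H_{\max }} - {H_{\min }} \right)<4.3 \cdot \left( {1 - 0} \right)=4.3$$

and, for traces that are not anti-correlated, 0.5 ≤ H < 1:

$$\Delta {\text{JRC}}_{H}<4.3 \cdot \left( {1 - 0.5} \right)=2.15$$

i.e. approximately 2 units.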

There is an indication that the length of the trace might be important when evaluating JRC. For example, given equal JRC, the seven long traces presented in Bakhtar and Barton (1984) have a σδh(1 mm) that is about half that of the ten type traces in Barton and Choubey (1977); see Fig. 10b. However, when trace lengths (not presented in this study) and results from the seven more uncertain traces in Bakhtar and Barton (1984) are included, the model to objectively infer JRC performs worse than the model presented in Eq. (6).

As the underlying data are uncertain, due to low resolution and each trace being only a representative trace of the surface, the model could be developed further. This could be done either by performing shear tests on numerous real fractures where the surfaces have been measured using high-resolution scanners or by performing numerical shear tests on synthetic fractures generated using Monte Carlo realisations. An advantage of the synthetic approach is that the modeller can decide the fractal parameters and size of the fracture to be tested, whilst real fractures will have the fractal parameters and size as measured. A synthetic study would, though, benefit from a limited number of tilt or shear tests on real fractures to confirm the synthetic results. Another benefit of using synthetic traces is that not only the JRC can be evaluated but also the peak and residual shear strength using different sizes of the synthetic fractures. The JRC and shear strength could be evaluated using numerical models based on, for example, PFC (Itasca 2014a), UDEC/3DEC (Itasca 2013, 2014b) or semi-analytical models (Casagrande et al. 2018).

8 Conclusions

In this work, we have shown that measuring fracture profiles at low resolution renders large uncertainties in the evaluated H and σδh. A more accurate inference of H and σδh can be achieved by combining several different evaluation methods in parallel, compensating for each method’s biases.

Using the ten type traces in Barton and Choubey (1977), a multi-linear model has been developed that estimates JRC objectively. All coefficients of the model are significant at level p < 0.05. Despite being simple, linear and without interaction, the model can reproduce the back-calculated JRC values in Barton and Choubey (1977) within ± 2 units with 95% confidence, and it explains 96.5% of the variance in JRC. The model was benchmarked against an ensemble of geologists and showed that the median difference was only 0.2 units and the median uncertainty in the inferred value about 40% smaller for the developed model. The presented approach can be further refined by numerical modelling of synthetic fractures or using semi-analytical models.