1 Introduction

Under the deepening of Western China’s development strategy, energy and resource projects are being executed in regions characterized by complex geological conditions and frequent hazards. This necessitates extensive rock mass engineering, encompassing mining, underground tunnel construction, and water conservancy projects (Du et al. 2022a; Jiang et al. 2022; Wu et al. 2023; Yu et al. 2023; Tao et al. 2020). The shear strength of rock joints plays a pivotal role in evaluating the stability and deformation of these rock masses (Barton and Bandis 2017; Barton and Choubey 1977; Grasselli et al. 2002; Hoek and Bray 1981; Jaeger 1971; Müller-Salzburg 1964; Wasantha et al. 2015; Wu and Kulatilake 2012). Direct shear tests are commonly conducted to determine the shear strength of rock joints (Barton 1973; Li et al. 2020; Muralha et al. 2014). However, estimating or selecting an appropriate shear strength value for a given rock joint can be challenging due to significant variations along the same joint surface (Ankah et al. 2022; Barton and Choubey 1977; Kulatilake et al. 2021). Consequently, doubts may arise regarding the shear strength obtained from testing only a few rock joint specimens.

Rock masses are characterized by heterogeneity caused by natural geological processes influencing the composition and structure of rocks (Kveldsvik et al. 2007; Sow et al. 2017). The shear strength of rock joints demonstrates notable spatial variability, even when assessed on the same joint surface and along parallel orientations. This phenomenon is commonly referred to as shear strength heterogeneity (Ankah et al. 2022; Du 1998). Because such heterogeneity governs the spatial variability observed in the shear behavior of rock joints, researchers frequently employ statistical analysis to investigate it (Kulatilake et al. 2021; Sow et al. 2017). The mean value of test results from a series of specimens taken from the same rock joint or test area is widely regarded as the most reliable estimate of shear strength, and it facilitates the analysis of crucial geomechanical characteristics and properties (Brady and Brown 1993). However, the statistical characteristics of shear strength are difficult to obtain accurately when only a few specimens are available. In such cases, the mean shear strength estimated from a small sample may be biased (Sow et al. 2017), hindering the accurate evaluation of shear strength for rock joints (Barton 2013). Efforts have been made to improve shear strength estimation by increasing the number of specimens (Renaud et al. 2019; Tanyas and Ulusay 2013). Various standards (ASTM 2016; GB/T50266 2013; JGS 2008; Muralha et al. 2014; USACE 1980) suggest sampling and testing 3–5 specimens from the same joint or test horizon along the same shear direction. However, the reasons behind these suggestions are not specified, and the factors influencing the required number of specimens should be acknowledged.

Determining the required minimum number (RMN) of test specimens for achieving desired accuracy in geotechnical parameter estimation is crucial to ensure satisfactory accuracy of geotechnical performance estimates at low economic cost. Researchers have extensively studied statistical approaches to determine the RMN by characterizing the variability of geotechnical parameters (Gong et al. 2014; Namikawa 2019; Pepe et al. 2016). Ruffolo and Shakoor (2009) found that testing 9 or 10 rock specimens is necessary to determine unconfined compressive strength with a 20% acceptable deviation from the mean under a 95% confidence interval. Cui et al. (2017) later introduced a method known as the confidence interval dynamic process approach, which aims to ascertain the optimal number of rock specimens needed for uniaxial compression tests. Magner et al. (2017) introduced a method to determine the number of specimens based on the coefficient of variation (COV) of geotechnical parameters using Monte Carlo simulations. Recent investigations have focused on studying the RMN of specimens for determining the shear strength of rock joints through statistical analysis. Yong et al. (2018a) conducted class ratio analysis on a 100-cm-long cross-sectional profile of a slate joint and established that 65 specimens were required in each group size for estimating natural rock joint characteristics. Huang et al. (2020) recommended a minimum requirement of 121 rock joint specimens for all sampling sizes using a progressive coverage statistical procedure. These studies, with their high specimen numbers, offer inspiration for acquiring shear strength of rock joints through statistical analysis. However, the relationship between the number of rock joint specimens and shear strength accuracy remains unresolved.

The heterogeneity of shear strength in well-matched rock joints depends on joint roughness, given the same joint wall materials, joint scale, and normal stress conditions (Barton 1973, 2018, 2020; Huang et al. 2023; Tatone and Grasselli 2010; Wang et al. 2023). For instance, Hencher et al. (2010) discovered different roughness degrees in specimens taken from various locations along the same joint, resulting in different shear strength values. Similarly, Bahaaddini et al. (2014) observed significant variability in direct shear tests due to local variability in effective roughness of rock joints. Therefore, this study will investigate the relationship between the heterogeneity of rock joint shear strength and joint roughness, aiming to provide a method for determining the minimum number of specimens required for laboratory testing of rock joint shear strength. Firstly, the heterogeneity of shear strength and roughness of rock joints is demonstrated with natural rock joints. Subsequently, the effectiveness of the number of specimens in estimating the shear strength of rock joints is evaluated based on Monte Carlo Simulation (MCS). Finally, the RMN of rock joint specimens necessary to achieve the desired accuracy in shear strength testing is determined.

2 Heterogeneity of joint roughness and shear strength

In order to examine the variability in roughness and shear strength of rock joints, an investigation was conducted on a large-scale slate rock joint obtained from an open-pit slope located in the Heshangnong quarry near Qingshi Town, southeast of Changshan County, Zhejiang Province, China (refer to Fig. 1a). The open-pit slope under consideration measures 87 m in length, 59 m in width, and reaches a maximum height of 79 m. The rock comprising the slope’s overburden primarily consists of calcareous slates that originated from Ordovician argillaceous limestone subjected to mild metamorphic processes. The stability of the slope is influenced by the dip angle of approximately 55° northwestward exhibited by the slate foliation. The foliated wall of the slate rock appears grayish-green in color, is finely-grained, and has formed as a result of intermediate tuff metamorphism. Within the rock overburden of the slope, the continuous planes of foliation are aligned parallel to the walls of the slope and exhibit a downward dip towards the base of the slope (see Fig. 1b).

Fig. 1 Sites selected for the investigation. a Location map of the study site; b view of the structurally-controlled open-pit slope

To assess the morphological characteristics of rock joints, a sample measuring 1100 mm × 1100 mm was extracted from the slate rock joint and transported to the laboratory. Because the edges of the sample may be damaged during transportation, a study area of 1000 mm × 1000 mm was taken from its central region. The surface of the slate joint was scanned using a three-dimensional (3D) laser scanning system, MetraScan 750, which offers an accuracy of up to 0.030 mm (refer to Fig. 2a). The digitized joint surface was subdivided into 100 individual specimens, each measuring 100 mm × 100 mm (as shown in Fig. 2b). Each specimen was labeled using the notation \(S_{i-j}\), where i and j denote the row and column of the specimen’s location. The 3D joint roughness was characterized by the roughness metric \(\theta_{\max }^{*} /[C + 1]_{3D}\) and the maximum potential contact area ratio A0, both of which derive from Eq. (1) (Grasselli et al. 2002). Detailed information regarding the calculation of these roughness metrics can be found in previous studies (Grasselli et al. 2002; Du et al. 2022b).

$$A_{\theta *} = A_{0} \left( {\frac{{\theta_{\max }^{*} - \theta^{*} }}{{\theta_{\max }^{*} }}} \right)^{C}$$
(1)

where \(A_{{\theta^{*} }}\) is the potential contact area ratio corresponding to the apparent dip angle \(\theta^{*}\) in the shear direction, \(\theta_{\max }^{*}\) is the maximum apparent dip angle, and C is a dimensionless fitting parameter characterizing the distribution of the apparent dip angles over the joint surface.
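Eq. (1) can be evaluated directly once A0, the maximum apparent dip angle, and C have been fitted. The following is a minimal Python sketch, not the authors' implementation; the function name and the values of the maximum apparent dip angle (60°) and C (6.0) are hypothetical and chosen only for illustration, while A0 = 0.468 is the mean value reported for the slate joint.

```python
def potential_contact_area_ratio(theta, a0, theta_max, c):
    """Eq. (1): potential contact area ratio for an apparent dip angle theta.

    theta and theta_max must share the same angular unit (degrees here).
    """
    if not 0.0 <= theta <= theta_max:
        raise ValueError("theta must lie in [0, theta_max]")
    return a0 * ((theta_max - theta) / theta_max) ** c


# Hypothetical fitted parameters, for illustration only.
A0, THETA_MAX, C = 0.468, 60.0, 6.0
for theta in (0.0, 10.0, 30.0, 60.0):
    print(theta, potential_contact_area_ratio(theta, A0, THETA_MAX, C))
```

By construction, the ratio equals A0 at a dip angle of zero and falls to zero at the maximum apparent dip angle.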

Fig. 2 Joint surface scanning and digitization. a Scanning of the joint surface; b Digitized joint surfaces of specimens with sizes of 100 mm × 100 mm

The shear strength of rock joints was predicted based on the quantitatively estimated joint roughness using the shear strength criterion proposed by Xia et al. (2014), as expressed by Eq. (2):

$$\tau = \sigma_{n} \tan \left\{ {\varphi_{b} + 4\frac{{A_{0} \theta_{\max }^{*} }}{1 + C}\left[ {1 + \exp \left( { - \frac{1}{{9A_{0} }} \cdot \frac{{\theta_{\max }^{*} }}{1 + C} \cdot \frac{{\sigma_{n} }}{{\sigma_{t} }}} \right)} \right]} \right\}$$
(2)

where τ represents the shear strength of rock joints, φb represents the basic friction angle of rock joints, σn denotes the normal stress applied on the rock joints, and σt refers to the tensile strength of rock materials. The values for σt and φb were determined through tensile tests and direct shear tests, respectively. The tensile strength of the slate rock (σt) was found to be 7.8 MPa, while the basic friction angle of the slate joint (φb) was measured as 32°.
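To make the use of Eq. (2) concrete, the criterion can be coded as a short function. This is a sketch under stated assumptions: the function and argument names are hypothetical, the roughness metric is passed as a single value in degrees, and the bracketed angle term is assumed to be in degrees (consistent with φb being reported as 32°). The roughness value of 8.25° is a hypothetical input taken from within the modal range of the slate joint, not a measured specimen value.

```python
import math


def joint_shear_strength(sigma_n, roughness, a0, sigma_t, phi_b_deg):
    """Shear strength by Eq. (2), in the same stress unit as sigma_n.

    roughness : theta*_max / (C + 1), assumed to be in degrees
    a0        : maximum potential contact area ratio
    sigma_t   : tensile strength (same unit as sigma_n)
    phi_b_deg : basic friction angle in degrees
    """
    decay = math.exp(-roughness / (9.0 * a0) * sigma_n / sigma_t)
    roughness_angle_deg = 4.0 * a0 * roughness * (1.0 + decay)
    return sigma_n * math.tan(math.radians(phi_b_deg + roughness_angle_deg))


# sigma_t = 7.8 MPa and phi_b = 32 deg are the reported slate properties;
# roughness = 8.25 deg is a hypothetical input, a0 = 0.468 is the reported mean.
for sigma_n in (0.2, 1.0, 2.0, 5.0):
    tau = joint_shear_strength(sigma_n, 8.25, 0.468, 7.8, 32.0)
    print(sigma_n, round(tau, 3))
```

The ratio τ/σn returned by this function decreases as the normal stress increases, because the exponential term reduces the roughness contribution at higher σn.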

The roughness and shear strength values of the 100 extracted rock joint specimens were calculated for a consistent shear direction, namely the negative Y-axis direction depicted in Fig. 2b. Figure 3 illustrates the roughness and shear strength of joint specimens extracted from various positions on the slate joint surface. As depicted in Fig. 3a–b, the surface roughness of the specimens demonstrates notable spatial variability. The histogram of \(\theta_{\max }^{*} /[C + 1]_{3D}\) reveals an approximately normal distribution, with the highest frequency observed within the 8.0° to 8.5° range. Generally, the upper left side of the slate joint surface exhibits higher surface roughness than the lower right side. The histogram of A0 exhibits an approximately normal distribution with a right skew, a mean value of 0.468, and a standard deviation of 0.037. Joint specimens with higher A0 values are randomly distributed across the joint surface. Additionally, Fig. 3c presents a histogram of the predicted shear strength under a normal stress of 1.0 MPa. The results demonstrate clear heterogeneity in shear strength, which follows a right-skewed distribution with the highest frequency observed within the 1.30–1.80 MPa range. Notably, the lower right specimens generally exhibit lower shear strength values, while rows 1–2 or columns 3–4 demonstrate higher shear strength values. Comparing the distributions of joint roughness and shear strength indicates that the heterogeneity of joint roughness is responsible for the heterogeneity of shear strength.

Fig. 3 The heterogeneity of joint roughness and shear strength. a \(\theta_{\max }^{*} /[C + 1]_{3D}\); b A0; c shear strength

Subjectivity frequently influences the selection of test specimens from a natural rock joint, as the determination of specimen locations relies heavily on individual judgment. The ISRM (Muralha et al. 2014) suggests that, for a single test sequence, at least three, and preferably five, specimens be sampled and tested from the same joint along the same shear direction. For accurate and meaningful testing, the specimens should therefore be extracted from the same joint or test horizon and exhibit similar characteristics (ASTM 2016; GB/T50266 2013; JGS 2008; Muralha et al. 2014; USACE 1980). However, it is difficult to select representative specimens by visual comparison. In this case study, the selection of three or five specimens can be treated as the combinatorial problem of selecting k objects from a set of n objects without regard to order (denoted \(C_{n}^{k}\)). The number of such combinations is given by:

$$C_{n}^{k} = \frac{n!}{{(n - k)!k!}}$$
(3)
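Eq. (3) can be checked numerically for the 100 specimens of this case study. The snippet below is a minimal sketch that relies only on Python's standard `math.comb`; the variable name is illustrative.

```python
import math

n_specimens = 100  # 100 mm x 100 mm specimens on the digitized joint surface

print(math.comb(n_specimens, 3))  # 161700 three-specimen subsets
print(math.comb(n_specimens, 5))  # 75287520 five-specimen subsets
```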

Herein, a total of 75,287,520 combinations were generated, each containing 5 of the 100 joint specimens on the slate joint surface (i.e., \(C_{100}^{5}\)). To quantitatively assess the heterogeneity of rock joint shear strength, an index called the maximum difference ratio (MDR) was calculated for each combination. As an illustrative example, consider a combination of 5 specimens with shear strengths τ1, τ2, τ3, τ4, and τ5 arranged in ascending order, where τ3 is the median shear strength and τ1 and τ5 are the minimum and maximum shear strengths, respectively. The MDR for this combination can be calculated using Eq. (4).

$$MDR = \max \left( {\frac{{\tau_{5} - \tau_{3} }}{{\tau_{3} }},\frac{{\tau_{3} - \tau_{1} }}{{\tau_{3} }}} \right)$$
(4)
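The MDR of Eq. (4) is straightforward to compute for any combination. The sketch below is illustrative only: the function name and the shear strength values are hypothetical, and the enumeration runs over a small set of eight values rather than the full 100-specimen surface.

```python
import itertools
import statistics


def max_difference_ratio(strengths):
    """Eq. (4): larger relative deviation of the two extremes from the median."""
    median = statistics.median(strengths)
    return max((max(strengths) - median) / median,
               (median - min(strengths)) / median)


# Hypothetical shear strengths (MPa) for one five-specimen combination.
combination = [1.0, 1.2, 1.5, 1.6, 2.1]
print(max_difference_ratio(combination))

# Enumerating all five-specimen combinations of a small hypothetical set.
strength_pool = [1.0, 1.2, 1.3, 1.5, 1.6, 1.8, 2.1, 2.4]
mdr_values = [max_difference_ratio(combo)
              for combo in itertools.combinations(strength_pool, 5)]
print(len(mdr_values), min(mdr_values), max(mdr_values))
```

For the full case study the same loop would run over roughly 75 million combinations, so a vectorized or sampled evaluation may be preferable in practice.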

Figure 4 shows histograms of the MDR values under normal stresses of 0.2 MPa, 2.0 MPa, and 5.0 MPa. The MDR values range over 0.35–145.34%, 0.20–89.83%, and 0.22–60.32% under normal stresses of 0.2 MPa, 2.0 MPa, and 5.0 MPa, respectively. Notably, the MDR range narrows as the normal stress increases, indicating that the shear strength of rock joints is less heterogeneous under higher normal stresses than under lower ones. Prior studies involving direct shear tests (Barton 1973; Barton et al. 2023; Grasselli et al. 2002; Patton 1966; Zhang et al. 2019) have reported that rock joints predominantly experience slip failure under lower normal stresses, with minimal shearing of asperities. However, as the normal stress increases, a greater number of asperities on the joint surface are sheared. Consequently, the heterogeneity in rock joint shear strength, which is governed by the variability of roughness, becomes less pronounced as the asperities are sheared off. Nevertheless, even when the normal stress reaches 5.0 MPa, the proportion of combinations with MDR values lower than 5% remains less than 1%. Similarly, for a normal stress of 0.2 MPa, the proportion of combinations with MDR values lower than 5% is merely 0.057%. This demonstrates that selecting multiple specimens with similar characteristics is more demanding than might be anticipated.

Fig. 4 Histograms of the maximum difference ratio (MDR) of the shear strength relative to the median value under normal stresses of a 0.2 MPa, b 2.0 MPa, and c 5.0 MPa

3 Method for evaluating the effectiveness of sampling numbers

In order to overcome the difficulties related to obtaining specimens with similar characteristics, a commonly adopted approach in engineering practice is to use the mean shear strength as a representative value (Magner et al. 2017). However, there remains a question among researchers and engineers regarding the adequacy of sample size within a subgroup for accurately estimating the population mean. To quantitatively evaluate the effectiveness of sampling numbers, it is necessary to obtain the statistical distribution of the sample mean of shear strength. However, determining the statistical distribution of the sample mean shear strength can be a complex and costly endeavor, mainly due to the large number of possible combinations of joint specimens. To overcome this challenge, we employed Monte Carlo Simulation (MCS) to simulate the sampling procedure. Based on the generation of extensive subgroups using MCS, the statistical distribution of the sample mean of shear strength was investigated, and the effectiveness of sampling numbers in subgroups was evaluated. The procedure to evaluate the effectiveness of sampling numbers is shown in Fig. 5.

Fig. 5 Flowchart illustrating the method for evaluating the effectiveness of sampling numbers

Step 1: Determine the shear strengths of rock joints under specified normal stresses.

The shear strengths of the 100 rock joint specimens in the case study were calculated using Eq. (2) at varying normal stresses. The population means for shear strength among the 100 specimens, subjected to normal stresses of 0.2 MPa, 2.0 MPa, and 5.0 MPa, were determined as 0.379 MPa, 3.032 MPa, and 6.317 MPa, respectively. These population means are denoted as μ and exhibit an increasing trend with respect to normal stress.

Step 2: Generate artificial subgroups containing i specimens based on MCS.

Using the specimen labels of the 100 rock joint specimens in the case study, we employed MCS to randomly select a specific number of labels (i labels) for analysis. The corresponding joint specimens were then taken as a subgroup, and the sample mean shear strength for that subgroup (μi) was calculated.

To demonstrate the influence of the number of specimens on the statistical distribution of the sample mean shear strength, we computed the ratio of the sample mean (μi) to the population mean (μ), referred to as RSP. When the RSP equals one, the sample mean exactly matches the population mean; in practice, this is rarely achieved. Figure 6 plots the RSP values under normal stresses of 0.2 MPa, 2.0 MPa, and 5.0 MPa against the number of specimens in the subgroup. The RSP values fluctuate around 1.00, and the fluctuations diminish as the number of specimens increases. Significant fluctuations occur when the number of specimens is less than 50, whereas convergence is observed as the number approaches 90. Normal stress also has a significant impact on the RSP values: the fluctuation is greatest at a normal stress of 0.2 MPa and least at 5.0 MPa. It is important to note that Fig. 6 presents only one combination for each subgroup size, and the sample mean (μi) may vary for other combinations. Nevertheless, the degree of fluctuation gradually decreases with an increasing number of specimens. Therefore, collecting an adequate number of test specimens in a subgroup is crucial, especially under low normal stress, to accurately estimate the population mean shear strength.
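The subgroup generation of Step 2 and the RSP can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: the population below is a hypothetical, normally distributed stand-in for the 100 predicted shear strengths, the function name is invented, and specimens are assumed to be drawn without replacement.

```python
import random
import statistics


def rsp(population, subgroup_size, rng):
    """Ratio of the sample mean of one random subgroup to the population mean."""
    subgroup = rng.sample(population, subgroup_size)  # drawn without replacement
    return statistics.fmean(subgroup) / statistics.fmean(population)


rng = random.Random(0)
# Hypothetical stand-in for the 100 predicted shear strengths (MPa).
population = [rng.gauss(0.379, 0.087) for _ in range(100)]

for size in (3, 5, 10, 50, 90):
    print(size, round(rsp(population, size, rng), 3))
```

Each call returns one realization of the RSP, which mirrors the single combination per subgroup size plotted in Fig. 6.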

Fig. 6 The ratio of the sample mean of the shear strength to the population mean (RSP) as the number of specimens in the subgroup increases

Step 3: Determine the number of MCS repetitions needed to achieve a converged characterization of the sample mean statistics for subgroups of size i.

The computed sample mean (μi) for a subgroup containing i joint specimens is a random variable, since the subgroups are randomly generated from the 100 rock joint specimens in the case study. To assess the variation of the sample mean, multiple subgroups must be generated until the sample mean statistics converge. In this study, we repeated Step 2 to generate a series of subgroups containing the same number of specimens. The number of MCS repetitions started at three and was increased in increments of one (4, 5, 6, etc.). For each number of MCS repetitions, we computed the mean and standard deviation of μi and identified the point at which both values stabilized.

As an example, we use the subgroup containing five specimens to demonstrate this procedure. Figure 7 shows the mean and standard deviation of μ5 plotted against the number of MCS repetitions. When the number of MCS repetitions is smaller than 2500, both the mean and standard deviation exhibit wide scatter. However, they remain nearly constant once the number of MCS repetitions reaches 20,000. At 20,000 MCS repetitions, the mean values of μ5 converge to 0.378 MPa (σn = 0.2 MPa), 3.030 MPa (σn = 2.0 MPa), and 6.314 MPa (σn = 5.0 MPa). These mean values are very close to the population means, with a maximum deviation of only 0.264% (see Table 1). The slight residual deviations stem from random sampling of a heterogeneous rock joint and cannot be entirely eliminated. Furthermore, the standard deviation values of μ5 converge to 0.038 MPa (σn = 0.2 MPa), 0.205 MPa (σn = 2.0 MPa), and 0.311 MPa (σn = 5.0 MPa). Although these absolute values increase with normal stress, the standard deviation relative to the corresponding mean decreases, from about 10% at 0.2 MPa to about 5% at 5.0 MPa. Therefore, we set the number of MCS repetitions to 20,000 to analyze the statistical characteristics of the sample mean μ5.
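The convergence check of Step 3 can be illustrated as follows. The sketch assumes a hypothetical population of 100 shear strengths and sampling without replacement; the function name and the chosen repetition counts are illustrative and do not reproduce Fig. 7.

```python
import random
import statistics


def sample_mean_statistics(population, subgroup_size, repetitions, rng):
    """Mean and standard deviation of the subgroup mean over repeated MCS draws."""
    means = [statistics.fmean(rng.sample(population, subgroup_size))
             for _ in range(repetitions)]
    return statistics.fmean(means), statistics.stdev(means)


rng = random.Random(42)
# Hypothetical stand-in for the 100 predicted shear strengths (MPa).
population = [rng.gauss(0.379, 0.087) for _ in range(100)]

for repetitions in (10, 100, 2500, 20000):
    mean, std = sample_mean_statistics(population, 5, repetitions, rng)
    print(repetitions, round(mean, 4), round(std, 4))
```

With few repetitions the two statistics scatter noticeably between runs, whereas at 20,000 repetitions they change little when the random seed is varied.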

Fig. 7 Mean and standard deviation of the sample mean of shear strength for subgroups containing five specimens as the number of MCS repetitions increases

Table 1 Statistical characterization of μ5 based on 20,000 MCS repetitions

Step 4: Repeat Steps 2 and 3 until the mean and standard deviation of μi for subgroups of different sizes are steadily converged, as illustrated in Loop 1 in Fig. 5.

Starting from a subgroup size of three, we increased the number of specimens by one each time, performing MCS on subgroups with sizes ranging from 3 to 99. Like the subgroups containing five specimens, the mean and standard deviation of μi for subgroups of different sizes steadily converged when the number of MCS repetitions reached 20,000. Therefore, we used 20,000 MCS repetitions in our analysis to determine the statistical distribution of the sample mean shear strengths.

Step 5: Establish the statistical space of the sample mean for subgroups containing i specimens.

Based on 20,000 MCS repetitions, we established the statistical spaces of the sample mean (μi) for subgroups containing i specimens, where 3 ≤ i < 100. The results showed that the sample mean μi for all subgroup sizes followed a normal distribution centered on the population mean. Due to limited space, Fig. 8 presents the distributions of the sample mean shear strength only for subgroups of sizes 5, 10, 50, and 90. As the subgroup size increases, the distribution becomes narrower and more concentrated around the population mean, indicating that larger subgroups provide more accurate estimates of the population mean shear strength. Note that the standard deviation of μi remains non-negligible even when the number of specimens in a subgroup increases to 90. Thus, the population mean of shear strength estimated from a subgroup of specimens may deviate from the true value.
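The narrowing of the distributions can be cross-checked against a standard sampling-theory result that is not part of the original analysis. Assuming that the i specimens are drawn without replacement from a finite population of N specimens whose shear strengths have a population standard deviation \(\sigma_{\tau}\) (a symbol introduced here only for illustration), the standard deviation of the sample mean is

$$\sigma_{\mu_{i}} = \frac{\sigma_{\tau}}{\sqrt{i}}\sqrt{\frac{N - i}{N - 1}}$$

For N = 100, the second factor is about 0.98 at i = 5 and about 0.32 at i = 90, so the spread of μi shrinks steadily with subgroup size but vanishes only when i = N.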

Fig. 8 Normal distribution of the sample mean of shear strength under the normal stress of 5.0 MPa for subgroups containing a 5 specimens, b 10 specimens, c 50 specimens, and d 90 specimens

Step 6: Iteratively perform Step 5 to obtain the mean and standard deviation of μi for subgroups of varying sizes, as demonstrated by Loop 2 in Fig. 5.

Figure 9 depicts the mean and standard deviation of μi plotted against the subgroup size. From Fig. 9a, it is evident that when subjected to the same normal stress, the mean values of μi for subgroups with different sizes are equivalent, enabling accurate estimation of the population mean of shear strength. Furthermore, Fig. 9b illustrates that the standard deviation of the sample mean of shear strength decreases as the number of specimens increases. Hence, it can be concluded that employing a greater number of specimens for experimentation leads to a more precise estimation of the population mean of shear strength. While it is possible to accurately estimate the population mean of shear strength with MCS repetitions exceeding 20,000, engineering practice often encounters limitations in terms of available experimental tests. Consequently, there arises a need to evaluate the effectiveness of sampling numbers for determining the required minimum number (RMN).

Fig. 9 Mean and standard deviation of the sample mean of shear strength (μi) as the number of specimens increases

Step 7: Evaluation of sampling numbers’ effectiveness.

As demonstrated in Steps 5 and 6, the sample mean μi follows a normal distribution centered at the population mean μ with a standard deviation σ. Figure 10 presents the normal distribution of the sample mean μi for subgroups containing i specimens. If an acceptable relative error ε is set for the population mean μ, the probability (Pε) that the test result from a subgroup satisfactorily estimates μ corresponds to the area enclosed between the probability density curve and the abscissa axis from μ(1 − ε) to μ(1 + ε). Pε is calculated as follows:

$$P_{\varepsilon } = \int_{{\mu \left( {1 - \varepsilon } \right)}}^{{\mu \left( {1 + \varepsilon } \right)}} {\frac{1}{{\sigma \sqrt {2\pi } }}} e^{{ - \frac{{\left( {\mu_{i} - \mu } \right)^{2} }}{{2\sigma^{2} }}}} {\text{d}}\mu_{i}$$
(5)

Fig. 10 Sketch to determine the probability of the sample mean of shear strength deviating from the population mean of shear strength within a relative error

In general, high Pε values warrant confidence in the test results. When a specific Pε is required, the relative error ε of the test result within a subgroup can be estimated using the following equation:

$$\varepsilon = \frac{\sigma }{\mu }z_{(1 + P_{\varepsilon })/2}$$
(6)

where \(z_{(1 + P_{\varepsilon })/2}\) is the upper quantile of the standard normal distribution corresponding to the probability (1 + Pε)/2.

For instance, considering a subgroup consisting of five specimens, if the acceptable relative error ε is set at 5% under a normal stress of 0.2 MPa, the value of probability Pε is merely 37.9%, as determined by Eq. (5). This low value of Pε raises significant doubts regarding the test result’s reliability. Additionally, when aiming for a Pε of 95%, the estimated relative error of the test result is approximately 19.7%, as calculated by Eq. (6). In such cases, the relative error may fail to meet the required accuracy in engineering practice. Therefore, increasing the number of specimens within a subgroup is advisable to achieve acceptable values of both Pε and ε simultaneously. For example, when employing a subgroup comprising 44 specimens to estimate the population mean μ under a normal stress of 0.2 MPa, the calculated results for Pε and ε are 95% and 5% respectively, thereby enabling accurate estimation of the population mean μ with a high probability.
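Eqs. (5) and (6) can be evaluated with Python's standard `statistics.NormalDist`. The following is a minimal sketch with hypothetical function names; the inputs are the rounded population mean (0.379 MPa) and standard deviation of μ5 (0.038 MPa) reported above for a normal stress of 0.2 MPa.

```python
from statistics import NormalDist


def probability_within_error(mu, sigma, eps):
    """Eq. (5): probability that the sample mean lies within mu * (1 +/- eps)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu * (1 + eps)) - dist.cdf(mu * (1 - eps))


def relative_error(mu, sigma, p_eps):
    """Eq. (6): relative error associated with a specified probability p_eps."""
    return sigma / mu * NormalDist().inv_cdf((1 + p_eps) / 2)


mu, sigma = 0.379, 0.038  # rounded values for five specimens at 0.2 MPa
print(probability_within_error(mu, sigma, 0.05))
print(relative_error(mu, sigma, 0.95))
```

With these rounded inputs the two results come out close to the 37.9% and 19.7% quoted in the text; small differences are attributable to rounding of μ and σ.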

4 Determination of the required minimum number of specimens (RMN) for laboratory testing

Based on the method used to evaluate the effectiveness of sampling numbers, it is possible to obtain the probability Pε and the relative error ε for a subgroup when estimating the population mean. Using these values, it becomes feasible to determine the appropriate RMN that satisfies acceptable criteria for both Pε and ε.

4.1 Probability of subgroups meeting acceptable relative error

In engineering practice, it is common for researchers and engineers to aim for an acceptable relative error when obtaining shear strength data. However, as mentioned in Sect. 3, the sample mean μi is a random variable, and it is important to determine the likelihood of the sample mean μi accurately estimating the population mean μ within an acceptable relative error. This necessitates the calculation of the probability Pε, representing the probability of achieving the acceptable relative error ε. In this study, acceptable relative errors of 5%, 10%, and 15% were considered to investigate the effectiveness of sampling numbers under different normal stresses. The relationship between Pε values and the number of specimens under various normal stresses and acceptable relative errors is illustrated in Fig. 11. Consistent with engineering practice, the Pε value tends to increase as the number of specimens increases. For example, when the normal stress is 0.2 MPa, the Pε value (ε = 5%) is only 29.6% for 3 test specimens, but it increases to 90.8% when the number of specimens is 38.

Fig. 11 Probability of the sample mean of shear strength deviating from the population mean of shear strength within a relative error as the number of specimens increases

Furthermore, as anticipated, the Pε values increase as the acceptable relative error becomes larger. Taking 5 test specimens as an example (the number preferred by ISRM (Muralha et al. 2014) for conducting direct shear tests), the Pε value ranges from 37.9 to 86.2%, from 53.8 to 97.3%, and from 68.5 to 99.7% under normal stresses of 0.2 MPa, 2.0 MPa, and 5.0 MPa, respectively, as the acceptable relative error increases from 5 to 15%. In this scenario, one may have confidence in using the test results to estimate the population mean of shear strength with a relative error of 15%, but not 5%.

It is important to note that normal stress significantly affects the effectiveness of sampling numbers. Under a normal stress of 5.0 MPa, the Pε value (ε = 15%) reaches a high value of 98.1% for 3 test specimens. However, when the normal stress is 0.2 MPa, the Pε value (ε = 15%) for 3 specimens is only 74.5%, and at least 9 test specimens are required to achieve a Pε value (ε = 15%) larger than 95%. The influence of normal stresses on the effectiveness of sampling numbers is clearly demonstrated by the subplot in Fig. 11, where subgroups containing 3 to 10 specimens were analyzed. Under a normal stress of 0.2 MPa, the Pε value with an acceptable relative error of 5% ranges from 29.6 to 53.0%. Even when the relative error is set at 15%, the Pε value remains below 90% for specimen numbers less than 6. In contrast, under a normal stress of 5.0 MPa, the Pε value with a 5% acceptable relative error ranges from 56.5 to 85.8%, and it exceeds 98.1% when the relative error is set at 15%. Hence, based on the same number of specimens, one can have greater confidence in the test results obtained under a normal stress of 5.0 MPa compared to those obtained under a normal stress of 0.2 MPa. This phenomenon also indicates that the shear strength heterogeneity of the analyzed discontinuity decreases as the normal stress increases, which can be attributed to the diminishing influence of roughness on shear strength as the normal stress increases (Barton and Choubey 1977).

4.2 Relative error with a specified probability for subgroups

In accordance with the analysis in Sect. 4.1, it is also important for researchers and engineers to assess the efficacy of sample sizes based on a specified probability Pε, which requires examining the relative error ε associated with that probability. Typically, higher values of Pε instill greater confidence in the ε estimation. In this study, we considered Pε values of 85%, 90%, and 95% to investigate the effectiveness of sample sizes under different normal stresses. Figure 12 illustrates the relationship between ε values, specimen numbers, and specified probabilities. Generally, as the number of specimens in a subgroup increases, the relative error decreases, signifying that increasing the specimen number leads to more accurate test results. For instance, when the specified Pε is 95%, the ε values under normal stresses of 0.2 MPa, 2.0 MPa, and 5.0 MPa decrease from 25.7%, 17.1%, and 12.5% to less than 5% as the specimen numbers increase from 3 to 45, 27, and 17, respectively. Similar patterns are observed for other Pε values. Moreover, the ε values increase with the specified probability. To demonstrate this trend, we used the number of specimens recommended by ISRM (Muralha et al. 2014), which is 5. Under normal stresses of 0.2 MPa, 2.0 MPa, and 5.0 MPa, the ε values increase from 14.3% to 19.7%, 9.6% to 13.2%, and 7.0% to 9.7%, respectively, as the specified probability rises from 85 to 95%. This trend aligns with the findings presented in Sect. 4.1.

Fig. 12 Relative error of the sample mean of shear strength deviating from the population mean of shear strength as the number of specimens increases

Plotting the ε values against the specimen numbers ranging from 3 to 10 allows us to visually assess how normal stress influences the calculated relative error under a specified probability (subplot in Fig. 12). As depicted in the subplot of Fig. 12, ε values below 13% are observed for the normal stress of 5.0 MPa, even when choosing a high specified Pε of 95% for only 3 test specimens. Conversely, ε values exceed 18% for the normal stress of 0.2 MPa across the Pε range of 85% to 95%. Although the relative error diminishes as the number of specimens increases, it is substantially greater under the normal stress of 0.2 MPa compared to that under the normal stress of 5.0 MPa. Consequently, conducting shear strength tests under lower normal stresses necessitates special attention.
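The inverse question treated in this section, namely which relative error ε is met with a specified probability Pε, can be sketched as an empirical quantile of the subgroup errors. This is an illustrative sketch under the same assumptions as before: the population is a hypothetical lognormal sample rather than the measured strengths, and the function names are not taken from the original study.

```python
import math
import random
import statistics


def synthetic_population(cv, size=100, mean=1.0, seed=1):
    """Hypothetical stand-in for measured shear strengths (lognormal)."""
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1.0 + cv ** 2))
    mu_log = math.log(mean) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu_log, sigma) for _ in range(size)]


def relative_error_at_probability(strengths, n, p, trials=5000, seed=0):
    """Smallest relative error that at least a fraction p of random
    n-specimen subgroup means stay within (empirical quantile)."""
    rng = random.Random(seed)
    mu = statistics.fmean(strengths)
    errors = sorted(
        abs(statistics.fmean(rng.sample(strengths, n)) - mu) / mu
        for _ in range(trials)
    )
    # errors[k - 1] has at least k values less than or equal to it.
    k = min(trials, max(1, math.ceil(p * trials)))
    return errors[k - 1]


population = synthetic_population(cv=0.23)  # assumed scatter, low normal stress
for p in (0.85, 0.90, 0.95):
    row = [relative_error_at_probability(population, n, p) for n in (3, 5, 10)]
    print(f"P_eps={p:.2f}:", ", ".join(f"{e:.1%}" for e in row))
```

Because the same seed is reused, the three probability levels are evaluated on the same set of subgroups, so the error grows monotonically with the specified probability.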

4.3 Recommended RMN of specimens for laboratory testing

As highlighted in previous sections, the effectiveness of the number of specimens increases with higher probabilities Pε and lower relative errors ε. Therefore, when estimating the population mean μ, the number of test specimens should meet both the acceptable ε and Pε criteria simultaneously. It is important to consider the significant influence of normal stress on the heterogeneity of shear strength. Consequently, the RMN of specimens was calculated for different normal stresses, given an acceptable ε and Pε. In engineering practice, the acceptable relative errors are commonly taken as 5–10% at the probability of 85–95% (Yong et al. 2018b).

Figure 13 illustrates the relationship between the RMN and increasing normal stress for the slate joint. For acceptable ε values ranging from 5 to 10% and corresponding Pε values ranging from 85 to 95%, the RMN under normal stresses of 1.0 MPa, 2.0 MPa, 3.0 MPa, 4.0 MPa, and 5.0 MPa ranges from 6 to 35, 4 to 26, 3 to 21, 2 to 18, and 2 to 16, respectively. Ignoring the influence of normal stress on shear strength heterogeneity and using the same number of specimens to estimate the population mean could waste resources under higher normal stresses and fail to meet the ε and Pε requirements under lower normal stresses. To assess the effectiveness of the ISRM-recommended numbers of specimens (3 or 5) (Muralha et al. 2014), these numbers were also included in Fig. 13 for verification. The results indicate that the ISRM-recommended numbers of specimens cannot fulfill all requirements. For instance, when a Pε value of 85% and an ε value of 10% are required, 3 or 5 specimens meet the requirements only when the normal stress exceeds 2.5 MPa or 1.2 MPa, respectively. When the required Pε value (with ε = 10%) increases to 90%, the threshold normal stress becomes 3.0 MPa for 3 specimens and 2.0 MPa for 5 specimens. When researchers and engineers require a higher Pε value of 95% (with ε = 10%), 3 or 5 specimens meet the requirements only when the normal stress exceeds 6.5 MPa or 3.5 MPa, respectively. However, when the required ε value is 5%, neither 3 nor 5 specimens meets the requirement, even at a Pε value of 85%.
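The RMN for a given pair of requirements (ε, Pε) can be found by increasing the subgroup size until the estimated Pε first reaches the target. The sketch below illustrates this search under stated assumptions: the population is a hypothetical lognormal sample, the coefficients of variation are illustrative, and the helper names are introduced here rather than taken from the study. Because Pε is estimated by simulation, values close to the threshold carry sampling noise, so the returned number may shift by one or two with a different seed.

```python
import math
import random
import statistics


def synthetic_population(cv, size=100, mean=1.0, seed=1):
    """Hypothetical stand-in for measured shear strengths (lognormal)."""
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1.0 + cv ** 2))
    mu_log = math.log(mean) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu_log, sigma) for _ in range(size)]


def probability_within_error(strengths, n, eps, trials, seed):
    """Estimate of P_eps for subgroups of n specimens."""
    rng = random.Random(seed)
    mu = statistics.fmean(strengths)
    hits = sum(
        abs(statistics.fmean(rng.sample(strengths, n)) - mu) / mu <= eps
        for _ in range(trials)
    )
    return hits / trials


def required_minimum_number(strengths, eps, p, trials=2000, seed=0):
    """Smallest subgroup size whose estimated P_eps reaches the target p."""
    for n in range(1, len(strengths) + 1):
        if probability_within_error(strengths, n, eps, trials, seed) >= p:
            return n
    return len(strengths)


# Assumed scatter for a low and a high normal stress (illustrative only).
populations = {"low stress": synthetic_population(0.23),
               "high stress": synthetic_population(0.11)}
for label, strengths in populations.items():
    loose = required_minimum_number(strengths, eps=0.10, p=0.85)
    strict = required_minimum_number(strengths, eps=0.05, p=0.95)
    print(f"{label}: RMN = {loose} (10%, 85%) to {strict} (5%, 95%)")
```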

Fig. 13 Required minimum number (RMN) of specimens to estimate shear strength of the slate joint as normal stress increases

To further investigate the effects of rock material properties on the RMN, a larger, exposed rock joint surface that could be easily digitized was sought in a relatively new rock slope of a lead–zinc open-pit mine located in Jinding town, Lanping County, Yunnan Province, China. The rock slope, with a maximum height of approximately 12 m, mainly consists of brownish-red fine sandstone rock masses in the Paleocene Yunlong Formation of the Lower Tertiary. The slope face is oriented sub-parallel to a persistent joint set that strikes roughly E–W. As a result, the face contains several large exposures of natural, rough joint surfaces showing slight alteration (Fig. 14a). In the field, a total area roughly 3500 mm long by 2000 mm wide was measured using a portable laser scanning system with a sampling interval of 0.5 mm (Fig. 14a). Since the measured sandstone joint surface is irregular, we extracted the largest possible rectangular area, 2900 mm long by 800 mm wide, from the original point cloud of the measured sandstone joint surface (Fig. 14b). Then, the morphology of the 2900 mm × 800 mm sandstone joint surface was evenly subdivided into 100 individual specimens, each measuring 100 mm × 27.5 mm.

Fig. 14 Required minimum number (RMN) of specimens to estimate shear strength of the sandstone joint as normal stress increases

The sandstone joint had a measured joint wall compressive strength of 18.2 MPa and a basic friction angle of 24.3°, and the sandstone rock had a tensile strength of 1.8 MPa. Compared with the previously examined slate joint, the sandstone joint is mechanically weaker. Using the proposed methodology, we computed the RMN of the sandstone joint under different normal stresses for the acceptable relative error ε and probability Pε. The relationship between the RMN and increasing normal stress is illustrated in Fig. 14c. The RMN of the sandstone joint decreases as the normal stress increases, and it increases as the requirements on the acceptable relative error and probability become stricter. This relationship aligns closely with the findings for the slate joint. However, there are notable differences in RMN values between sandstone and slate joints. In general, sandstone joints exhibit an RMN range of 5 to 25, while slate joints have a wider range of 2 to 46. Specifically, at lower normal stresses, the RMN for sandstone joints is smaller than that of slate joints. Conversely, under higher normal stresses, the RMN values for sandstone and slate joints are similar. For instance, at a normal stress of 0.1 MPa, the RMN ranges from 23 to 25 for sandstone joints, which is smaller than the range of 10 to 47 observed for slate joints. At a normal stress of 10 MPa, the RMN ranges from 3 to 17 for sandstone joints, which is comparable to the range of 1 to 12 seen in slate joints. The observed differences in RMN values between sandstone and slate joints can be attributed to variations in the degree of heterogeneity under different normal stress conditions. At lower normal stresses, the comparatively weaker mechanical properties of sandstone joints result in more pronounced fracturing of the asperities than in slate joints. Consequently, the degree of heterogeneity is lower in sandstone joints. At higher normal stresses, the asperities of both sandstone and slate joints undergo significant breakage. Under these conditions, the degree of heterogeneity decreases considerably for both types of joints, as they are subjected to substantial deformation and damage.

Moreover, it should be noted that adhering to the ISRM recommendation regarding the number of specimens (3 or 5) for testing (Muralha et al. 2014) only meets the requirements for an acceptable relative error of 10% at a probability of 85% when the normal stress exceeds 2 MPa in sandstone joint tests (Fig. 14c). In contrast, the recommended number of specimens from ISRM exhibits broader applicability in slate joint tests. Therefore, it is crucial to pay meticulous attention not only to the normal stress but also to the characteristics of the rock materials when determining the RMN of specimens for laboratory testing of the shear strength of rock joints.

5 Discussions

In Sect. 3, we established the mean value as the representative shear strength of rock joints within the population. Alongside the mean value, the median and mode values are commonly employed as representative indicators. In this study, we examined the mean, median, and mode values of shear strength under various normal stresses using the dataset comprising 100 slate joint specimens. Figure 15 compares these values. The median closely approximates the mean across different normal stresses. As the normal stress increases, the relative error between the median and mean decreases from 4.56 to 0.01%. However, because the distribution of shear strength in the 100 slate joint specimens is right-skewed (as depicted in Fig. 3c), the mode is smaller than the mean value (as indicated in Fig. 15). Although the relative error between the mode and mean diminishes from 37.22 to 17.44% with increasing normal stress, a notable disparity persists between the mode and mean values. Consequently, we suggest employing the mean or median as the representative value of shear strength to mitigate the excessive influence of extreme high and low values within the population on the results.
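The comparison of mean, median, and mode can be illustrated on a right-skewed sample. The sketch below is an assumption-laden illustration: the source does not state how the mode of the continuous shear strength data was obtained, so the mode is taken here as the midpoint of the most populated histogram bin (`histogram_mode`, a hypothetical helper), and the data are a synthetic lognormal sample rather than the measured strengths.

```python
import random
import statistics


def histogram_mode(values, bins=10):
    """Midpoint of the most populated equal-width histogram bin.
    Assumed definition of the mode for continuous data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # all values identical
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / width), bins - 1)  # put the maximum in the last bin
        counts[k] += 1
    k = counts.index(max(counts))  # first bin with the highest count
    return lo + (k + 0.5) * width


# Synthetic right-skewed sample standing in for 100 shear strengths.
rng = random.Random(3)
strengths = [rng.lognormvariate(0.0, 0.4) for _ in range(100)]

mean = statistics.fmean(strengths)
median = statistics.median(strengths)
mode = histogram_mode(strengths)
for name, value in (("median", median), ("mode", mode)):
    rel = abs(value - mean) / mean
    print(f"{name}: {value:.3f}, relative difference from mean = {rel:.1%}")
```

The bin count affects the histogram mode noticeably for a sample of 100 values, which is one practical reason to prefer the mean or median as the representative value.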

Fig. 15 Comparison of mean, median, and mode values of rock joint shear strength under different normal stresses

In Sect. 4.3, we discovered that the normal stress exerts a significant influence on the RMN of rock joint specimens. When sampling for engineering applications, such as UDEC or 3DEC modeling of a cavern, it is advisable to consider a normal stress range that allows for approximately 2 to 3 times the tangential stress. Determining the appropriate RMN under these circumstances can be perplexing. In our previously published paper (Wang et al. 2022), we investigated the influence of normal stress on the selection of representative rock joint specimens. Specifically, specimens whose shear strength deviated from the population mean of the 100 rock joint specimens in the case study by less than 5% were selected as representative specimens. Following this approach, we plotted the locations of representative specimens for the 100 slate joints at normal stresses of 0.2 MPa, 1.0 MPa, 5.0 MPa, and 10.0 MPa in Fig. 16. Remarkably, the figure demonstrates that the locations of representative specimens remain consistent across different normal stresses. Even as the normal stress increases, a specimen identified as representative under lower normal stress can still maintain its representativeness. Therefore, for projects involving a wide stress range, we propose determining the RMN of specimens based on the lower value within the stress range. This conservative approach ensures that the results can be reliably applied to scenarios with higher normal stresses.
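The selection rule described above (specimens whose shear strength deviates from the population mean by less than 5%) and the check that representativeness persists across normal stresses can be sketched as follows. The six-specimen dataset and the function names are hypothetical and serve only to show the logic; the study itself uses 100 specimens per normal stress.

```python
import statistics


def representative_specimens(strengths, tolerance=0.05):
    """Indices of specimens whose strength deviates from the population
    mean by less than the given relative tolerance."""
    mu = statistics.fmean(strengths)
    return [i for i, s in enumerate(strengths) if abs(s - mu) / mu < tolerance]


def persistent_representatives(strengths_by_stress, tolerance=0.05):
    """Indices that are representative at every normal stress."""
    common = None
    for strengths in strengths_by_stress.values():
        selected = set(representative_specimens(strengths, tolerance))
        common = selected if common is None else common & selected
    return sorted(common or [])


# Hypothetical shear strengths (MPa) of the same six specimens at two
# normal stresses; the values are made up for illustration.
strengths_by_stress = {
    0.2: [0.30, 0.41, 0.39, 0.55, 0.40, 0.35],
    5.0: [3.70, 4.05, 3.95, 4.60, 4.00, 3.70],
}
for stress, strengths in strengths_by_stress.items():
    print(f"{stress} MPa:", representative_specimens(strengths))
print("representative at all stresses:", persistent_representatives(strengths_by_stress))
```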

Fig. 16 The locations of representative specimens determined under different normal stresses (Wang et al. 2022)

As mentioned in Sect. 4.3, the recommended RMN decreases as the normal stress increases. However, it remains crucial to conduct numerous laboratory tests, especially in cases involving lower normal stresses. For instance, at a normal stress of 1.0 MPa, when aiming for an acceptable ε value of less than 5% and a corresponding Pε value larger than 95% (as depicted in Fig. 13), a total of 35 laboratory tests are required for the slate joint. Fortunately, tilt tests (and Schmidt hammer tests for JCS) can be performed in large numbers for each joint set. These tests have the potential to provide insights into interpreting the required RMN for more demanding direct shear tests, such as those used in UDEC-BB modeling. In this modeling approach, input parameters for each set, such as JRC, JCS, and residual friction angle, play a crucial role in determining the appropriate RMN (Barton and Choubey 1977). For JRC, JCS, and residual friction angle estimation, the RMNs of joint specimens can be determined individually. Subsequently, the maximum number of specimens among these determinations can be adopted as the recommended RMN for rock joints. In the future, tilt tests will be conducted to further validate the proposed method. Moreover, the accurate determination of the RMN of rock joints may be influenced by the exposed scale of large joints. For instance, when measuring a large rock joint such as the slip surface of a rock slope, only the exposed sub-areas can be assessed. Limitations in the scale of the exposed sub-areas or constraints on specimen extraction from these sub-areas may therefore result in RMN values that are biased relative to the correct value.
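The rule of adopting the largest per-parameter RMN can be sketched with a closed-form approximation. Note that this swaps in a different technique from the Monte Carlo procedure used in the study: it assumes the subgroup mean is normally distributed and applies the standard finite population correction for sampling without replacement. The coefficients of variation for JRC, JCS, and residual friction angle below are hypothetical, and `rmn_normal_approx` is a name introduced here.

```python
import math
from statistics import NormalDist


def rmn_normal_approx(cv, eps, p, population=100):
    """Smallest n for which z * CV * sqrt((N - n) / (n * (N - 1))) <= eps,
    where z is the two-sided normal quantile for probability p.
    Normal approximation with finite population correction (assumed model)."""
    if population < 2:
        return 1
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    for n in range(1, population + 1):
        standard_error = cv * math.sqrt((population - n) / (n * (population - 1)))
        if z * standard_error <= eps:
            return n
    return population


# Hypothetical coefficients of variation for the three input parameters.
assumed_cv = {"JRC": 0.30, "JCS": 0.15, "residual friction angle": 0.05}

per_parameter = {
    name: rmn_normal_approx(cv, eps=0.10, p=0.95) for name, cv in assumed_cv.items()
}
for name, n in per_parameter.items():
    print(f"{name}: RMN = {n}")
print("adopted RMN:", max(per_parameter.values()))
```

Under these assumptions the most variable parameter governs the adopted number, so a large set of inexpensive tilt and Schmidt hammer tests would mainly serve to estimate each parameter's scatter.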

6 Conclusions

This study presents a method for determining the required minimum number (RMN) of test specimens, taking into account the heterogeneity of rock joints in direct shear tests. The main conclusions drawn from the study are as follows:

Investigation of natural rock joints revealed distinct spatial variabilities in surface roughness and shear strength. The heterogeneity of joint roughness was found to be responsible for the variation in shear strength. The acquisition of specimens was treated as a combination problem in mathematics. However, the proportion of combinations with maximum difference ratios (MDR) lower than 5% was less than 1%, indicating the difficulty in obtaining a few specimens with similar characteristics.

By utilizing subgroups of rock joint specimens generated through Monte Carlo Simulation (MCS), the statistical characteristics of the sample mean μi of shear strength were examined, and the effectiveness of the number of specimens within subgroups was evaluated. The sample mean μi of shear strength for subgroups followed a normal distribution centered around the population mean μ. The standard deviation of μi decreased as the number of specimens increased, indicating that a larger number of specimens in a subgroup improved the accuracy of estimating the population mean μ. The relative error ε between the sample mean μi and the population mean μ, along with its corresponding probability Pε, was introduced to evaluate the efficacy of the number of specimens. It was observed that the RMN of specimens depended on the acceptable relative error, specified probability, and normal stress. For an acceptable relative error ε ranging from 5 to 10% and a corresponding probability Pε ranging from 85 to 95%, the RMN of the sandstone joint decreased with increasing normal stress and increased as the requirements on the acceptable relative error and probability became stricter. A smaller RMN was needed under high normal stress for the same rock joint, demonstrating a decrease in shear strength heterogeneity with increasing normal stress. Additionally, the comparison of RMN values between slate and sandstone joints highlights the importance of meticulous attention not only to normal stress, but also to the rock material properties when determining the RMN of specimens for laboratory testing on shear strength of rock joints.

Future work will include conducting experimental investigations to explore the heterogeneity of shear strength in different types of natural rock joints with varying roughness. Additionally, the scale effect of shear strength will be taken into account during the RMN determination process. The data obtained from these investigations will serve as the foundation for further analysis and refinement of the proposed methodology.