A hierarchical cosimulation algorithm integrated with an acceptance–rejection method for the geostatistical modeling of variables with inequality constraints

This work addresses the problem of the cosimulation of cross-correlated variables with inequality constraints. A hierarchical sequential Gaussian cosimulation algorithm is proposed to address this problem, based on establishing a multicollocated cokriging paradigm; the integration of this algorithm with the acceptance–rejection sampling technique entails that the simulated values first reproduce the bivariate inequality constraint between the variables and then reproduce the original statistical parameters, such as the global distribution and variogram. In addition, a robust regression analysis is developed to derive the coefficients of the linear function that introduces the desired inequality constraint. The proposed algorithm is applied to cosimulate Silica and Iron in an Iron deposit, where the two variables exhibit different marginal distributions and a sharp inequality constraint in the bivariate relation. To investigate the benefits of the proposed approach, the Silica and Iron are cosimulated by other cosimulation algorithms, and the results are compared. It is shown that conventional cosimulation approaches are not able to take into account and reproduce the linearity constraint characteristics, which are part of the nature of the dataset. In contrast, the proposed hierarchical cosimulation algorithm perfectly reproduces these complex characteristics and is more suited to the actual dataset.


Introduction
The multivariate stochastic modeling of continuous variables and the quantification of uncertainty at unsampled locations are of paramount importance in various disciplines in earth sciences, such as mineral resource classification (Battalgazy and Madani 2019a, b;Adeli et al. 2018;Hosseini and Asghari 2018), environmental studies (Eze et al. 2019;De Benedetto et al. 2010), geochemistry (Madani and Carranza 2020), geometallurgy (Abildin et al. 2019), geophysics (Castrignanò et al. 2017), and hydrogeology (Linde et al. 2015;Mariethoz et al. 2010). In this respect, the existence of inherent correlations among the variables motivates one to use cosimulation approaches rather than independent simulation (Chilès and Delfiner 2012;Goovaert 1997;Rossi and Deutsch 2014). Among other approaches, tuning bands cosimulation (Emery 2008) and sequential Gaussian cosimulation (Verly 1993;Tran 1994;Gómez-Hernandez and Cassiraga 1994) have been widely used in practice because of their simplicity and straightforwardness. In these techniques, direct and crossvariograms should be taken into account as linear models of coregionalizations (Wackernagel 2003;Chiles and Delfiner 2012). However, one may find this step of variogram inference tedious, particularly if the number of variables is more than two. To overcome this obstacle, factorization based on the decorrelation of the original variables is an alternative. In this technique, the crosscorrelated variables are transformed to decorrelated factors, and then independent simulation can be applied over all factors. The simulated factors should then be back-transformed to the original scale to restore the primary crosscorrelation structure. Regarding this, factorization methods can be employed in multivariate geostatistical analysis, such as Principal Component Analysis (PCA) (Wakernagel, 2003, Goovaerts 1997, the Minimum/Maximum Autocorrelation Factor (MAF) (Desbarats and Dimitrakopoulos 2000;Rondon 2012;Vargas-Guzman and Dimitrakopoulos, 2003), and Enhanced Coregionalization Analysis (ECA) (Emery and Ortiz 2012). However, these approaches, which are based on either a linear models of coregionalization or decorrelation, are restricted to linear bivariate relationships and are not able to handle complexities such as heteroscedasticity and nonlinearity in the variables. Some ways around this issue involve other geostatistical factorization approaches, such as Stepwise Conditioning Transformation (SCT) (Leuangthong and Deutsch 2003), flow anamorphosis ( Van den Boogaart et al. 2017), and projection pursuit multivariate transform (Barnett et al. 2014(Barnett et al. , 2016. Nevertheless, it is a common practice in multivariate geostatistics to address other types of complexity between variables, such as geological constraints, that lead to the introduction of an inequality constraint in their bivariate relations, which is different from heteroscedasticity and nonlinearity. This characteristic implies that one of the variables is always less than, greater than, or equal to another variable, thus making geostatistical cosimulation challenging, given the expectation of reproducing this type of linear restriction after the cosimulation process, and highly impacting the further postprocessing of the simulation results. Several techniques have been developed to reproduce this type of constraint, such as quadratic programming (Mallet 1980), constrained interpolation functions (Dubrule and Kostov 1986), Stepwise Conditional Transformation (Leuangthong and Deutsch 2003;Hosseini and Asghari 2015) and changing to variables that are free of inequality constraints Abildin et al. 2019). Emery (2012), for instance, presented a methodology based on explicitly modeling the conditional distributions of the variables by converting them to Gaussian random fields through Stepwise Conditional Transformation, taking into account the linear models of coregionalization to allow cosimulation to be implemented and to handle the issue of preferential sampling design. Arcari Bassani et al. (2018) integrated projection pursuit multivariate transform (PPMT) with a change-of-variables technique as another alternative to cosimulate the variables with both sum and fraction constraints. Another method to address the inequality constraint is to utilize a log-ratio transformation of the original variables (Pawlowsky- Glahn and Olea 2004;Pawlowsky-Glahn and Egozcue 2006) or utilize their stoichiometric relations (Mery et al. 2017;Adeli et al. 2018). However, there are two potential difficulties in working with these approaches: their applicability can be restricted to the sampling pattern or to the marginal distribution of the variables. In the former case, when the sampling is partially or totally heterotopic, transformation-based techniques may not be easily applicable, since it is required that both variables be present at all the sample locations. The latter difficulty concerns the marginal global distribution of variables that should be identical in terms of their range and type of distribution.
This work presents an algorithm based on an updated version of the hierarchical sequential Gaussian cosimulation proposed by Almeida and Journel (1994), integrated with an acceptance-rejection technique to cosimulate variables with inequality constraints or any type of marginal distribution. The proposed methodology in a real case study also involves the derivation of an equation based on a robust regression analysis that introduces an inequality constraint on the inferred function rather than on the sample points alone.
The paper proceeds as follows: (1) The methodology is described, including the concept of inequality constraints, derivation of the linear regression function, acceptancerejection technique, and description of the proposal for updated hierarchical cosimulation. (2) The application of the proposed approach to a real case study from an Iron deposit is presented. (3) The validation of the results and comparison with other common techniques of cosimulation are carried out, and (4) a conclusion is given.

Methodology
The proposed algorithm in this study is based upon a hierarchical cosimulation algorithm that is tailored to an acceptance-rejection technique for modeling variables with inequality constraints. Since this technique needs a linear inequation to be fitted to a bivariate joint distribution of variables with inequality constraints, first, a brief introduction is given to the concept of this bivariate complex characteristic; then, we explain how the function can be inferred from this type of joint distribution. In addition to reviewing the acceptance-rejection method that is given subsequently, the proposed algorithm based on hierarchical cosimulation is discussed, and the proper workflow is provided for the implementation of this geostatistical modeling technique.

Inequality constraints through an inequation
In the joint simulation of random functions, it is a common practice to deal with particular joint distributions that have inequality or equality constraints. This property can be classified into three main families. The first family corresponds to strict inequality constraints in bivariate cases, in which a target variable Z 2 is always either less than or greater than an auxiliary variable Z 1 (i.e., Z 2 \Z 1 or Z 2 [ Z 1 Þ over the sample points. The second family, nonstrict inequality constraints, addresses situations in which a target variable can be either less than, greater than or equal to an auxiliary variable (i.e., Z 2 Z 1 or Z 2 ! Z 1 Þ on the sample points. An example of this is in the modeling of copper deposits in which the soluble copper grade is a fraction of the total copper grade, meaning that the former variable is always less than or equal to the latter variable in all the sample locations (Emery 2012;Hosseini and Asghari 2015;Abildin et al. 2019). This family is also known as a fraction constraint when both variables have the same distribution (Arcari Bassani et al. 2018). The third family, nonstrict inequality constraints expressed by an inequation, concerns a particular type of multivariate study in which the target variable can be less than, greater than or equal to the auxiliary variable; thus, one can see both situations of Z 2 Z 1 and Z 2 ! Z 1 together within the sampling points ( Fig. 1). In this case, the inequality constraint is not generalized over the sample points; rather, it can be imposed by a linear inequation over a joint distribution, entailing that the target variable is always less than or equal to the auxiliary variable below that strict line. For instance, in Fig. 1a, b, two variables show a positive correlation (a) or negative correlation (b), while each of them demonstrates different distributions and ranges. If one looks at the experimental sample points themselves on this scatter plot, a mixture of inequality constraints may appear in the form Z 2 Z 1 in some sample points and in the form Z 2 ! Z 1 in other sample points. Although confusion about the generalization of inequality constraints over sample points may arise from these observations, the underlying variables may show a particular inequality constraint property that can be characterized by a linear inequation apart from the observations alone. The line representing this inequation is located directly above the scattered points. Based on this conceptual technique, for the sake of an example, a linear function can be inferred that implies Z 2 aZ 1 þ b in the case of a positive correlation (Fig. 1a) and Z 2 À aZ 1 þ b in the case of a negative correlation (Fig. 1b), where a and b represent the slope and intercept of this linear inequation, respectively.

Fitting the inequation
In the case that the inequality constraint can be identified through an inequation (i.e., the third family in the previous section), one needs to simulate the values that follow the underlying joint distribution respecting that linearity constraint. In this context, it is of interest to simulate the values that fall below that linear function. The common cosimulation approaches, however, are unable to comply with such a restriction, resulting in the simulated values being distributed throughout the scatterplot. Instead, the proposed hierarchical simulation algorithm is capable of handling this issue through integrating an acceptance-rejection technique. To implement this technique, and before any further steps are taken, this inequation should be formulated so that by using this function in the simulation process, the proposed algorithm will force the generated random numbers to follow the desired joint distribution. This means that the simulated values for Z 2 in Fig. 1, for instance, will always be less than or equal toẐ 2 , which is the predicted Z 2 value, via an inequation that is a function of Z 1 .

Regression analysis
To obtain the parameters of the specific line (i.e., the underlying inequation), a combination of regression analysis and an optimization algorithm is proposed. In a general sense, a regression line should first be fit to the bivariate dataset, the derived line should be moved upwards, and a line that covers the scattered points can then be fitted parallel to it. This line then identifies a cap and limitation for the simulation values of the target variable Z 2 in combination with the acceptance-rejection method and the proposed updated hierarchical cosimulation algorithm. This is explained in the following sections.  This idea is illustrated in Fig. 2, where the red line that was first obtained from regression analysis is moved parallel upwards to infer the underlying blue line that we need to embed in the simulation workflow.
The minimum support line (blue line in Fig. 2) can be obtained in such a way that the set of data in a position is confined as close as possible to the points. Let g n i denote the given set of sample points on a Euclidean distance plane with n sample points. It is of interest to determine the line where W indicates the slope and b represents the intercept of the line. The idea is to minimize the error between the estimated line and the experimental sample points. There are some algorithms to find the underlying parameters, W and b, which are discussed in computational geometry (O'Rourke et al. 1998), such as convex hull algorithms (Preparata and Shamos 2012). However, in this study, we suggest a more straightforward approach based on robust regression analysis and in particular the Huber loss function (''Appendix 1''). When the parameters are obtained and the line is derived (i.e., the red line in Fig. 2) by these approaches to infer the minimum support line (i.e., the blue line in Fig. 2), that line can be shifted upwards by Z 2 i is the predicted Z 2 value. This will be shown through a practical example.

Acceptance-rejection technique
Once the function is derived, an acceptance-rejection method integrated with the cosimulation algorithm must be taken into account so that the simulated values of the auxiliary and target variables follow the desired joint distribution.
This technique is intended to simulate a random variable X with a probability density function f ðxÞ considering another known distribution gðxÞ and a constant C such that 8x; f ðxÞ CgðxÞ (Bishop 2006). The background of this method is briefly discussed in ''Appendix 2''.
There is a simpler alternative to the acceptance-rejection technique, namely, the ''Von Neumann'' approach (''Appendix 3''), which was first proposed in 1951. In this algorithm, Cg x ð Þ ¼ f ðxÞ max is a uniform distribution with an identified range of x in ½0; x max and 0\f x ð Þ\f ðxÞ max , where the upper bound in f x ð Þ is such that the distribution function is within this range for all x within the closed interval. The procedure here is very straightforward: one can randomly draw values in x and f within the region and check whether the points represented by these values fall below or above the target distribution function f x ð Þ. If they are below or above the curve, the points are accepted or rejected, respectively. As shown in Fig. 3, the red and blue points are rejected and accepted sample points, respectively. The gray shaded tail represents a region that we are leaving out.
This technique is taken into consideration for the simulation of variables with a desired distribution, such as those illustrated in Fig. 1. Given that the auxiliary variable at the grid node u i is known, the target variable can be iteratively simulated by this acceptance-rejection technique until the simulated values meet the restriction imposed by the minimum support line, meaning that they are scattered below the underlying fitted line. This concept is demonstrated in Fig. 4. Let Z 1 u i ð Þ ¼ a denote the simulated value of the auxiliary variable at grid node u i and represent the corresponding value of the target variable obtained from the fitted inequation implementing the underlying inequality constraint. Following the acceptance-rejection technique, the simulated value of the target variable at the grid node u i is accepted given that x min Z 2 u i ð Þ x max ; otherwise, it is rejected. In this scenario, the simulation continues as long as it produces an acceptable value for Z 2 u i ð Þ b Z 2 ðu i Þ, falling below the density curve of the target variable Z 2 and dropping into the defined range.

Hierarchical sequential Gaussian cosimulation
The proposed algorithm is based on the conventional hierarchical cosimulation approach that was developed by Almeida and Journel (1994). Therefore, a review of the conventional approach is given first, and then the proposed algorithm is described in detail. The hierarchical sequential Gaussian cosimulation is subject to the availability of the auxiliary variable at the target locations. This requirement is frequently met in oil reservoir characterization, since the auxiliary variable (e.g., seismic data) is typically available over an entire region (Pyrcz and Deutsch 2014). Taking into account these useful auxiliary data in static reservoir modeling, the target continuous variables, such as porosity and permeability, can be modeled in three-dimensional space by considering the collocated cokriging system in the cosimulation workflow. However, in the mining industry, the auxiliary variable may rarely be available exhaustively at the target location. Instead, one may regularly face some particular sampling configurations of, for instance, borehole data, such as heterotopic sampling patterns, for which the target and auxiliary variables partially share some geographical locations, leading to the greater availability of the auxiliary variable (Wackernagel 2003). Madani (2019) discussed how this hierarchical approach in the sense of resource estimation can be adapted to mining engineering problems for the purpose of block modeling and tonnage reporting. In addition, for stochastic modeling of this particular type of sampling pattern (i.e., heterotopic sampling), the conventional hierarchical cosimulation algorithm proposed by Almeida and Journel (1994) can also be applied to simulate each variable in turn, rather than all of them at the same time, given that the value of the variable is conditional on the previously simulated values. This stochastic algorithm, through a predetermined hierarchical structure, is established based on a collocated cokriging approximation (Goovaerts 1997). The steps of this hierarchical cosimulation approach in the case of two variables (i.e., the target and auxiliary) can be described as follows: 1. Define a hierarchy of variables, starting with the most important or best autocorrelated variable Z 1 (hereafter denoted as the auxiliary variable). In a heterotopic sampling pattern, a variable with greater availability usually represents a better description of the spatial continuity or autocorrelation. Another variable can therefore be denoted as the target variable Z 2 . 2. Transform both the auxiliary and target variables (Z 1 and Z 2 ) independently to their normal scores (Y 1 and Y 2 ). This can be implemented by methodologies such as Gaussian anamorphosis (Rivoirard 1994) and quantile-based approaches (Deutsch and Journel 1998): where G À1 ð:Þ is the standard normal cumulative distribution function, Fð:Þ is the cumulative distribution function of the original variable Z and Y is the normal score. 3. Define a random or regular path as an order, in which each grid node is visited only one time. In practice, it is better to consider a random sequence than a regular sequence (Isaaks 1990). Nevertheless, a random path can be considered constant from one realization to another realization if the neighborhood is large enough, which saves significant computational resources in the simulation algorithm without a serious loss of accuracy in the spatial statistics reproduction (Nussbaumer et al. 2018). In the case of a long-range variogram model, a multiple-grid simulation is recommended, in which it is advisable to first simulate the locations corresponding to a coarse subgrid and then progressively refine this grid to denser grids (Deutsch and Journel 1998). In this respect, Nussbaumer and Mariethoz (2017) showed that paths for which the simulation starts at distant nodes, such as multiple grid, are more efficient than types of paths, such as row-by-row and spiral paths, that start at close nodes. Within each grid, a random path is followed. 4. At each grid node u i : (a) Use simple kriging (SK) (''Appendix 4'') to infer the parameters (i.e., mean and variance) of the Gaussian conditional cumulative distribution function (ccdf) Gðu 0 ; yj n ð ÞÞ required for simulating the auxiliary variable at location u i ; i 2 f1; . . .kg, with k grid nodes from Y 1 conditional on the neighborhood data, including the normal score data n, and previously simulated values k À 1 at the visited grid nodes in either a moving or unique neighborhood.
Then, choose a simulated value Y m 1 ðu i Þ with m realizations by the obtained simple kriging estimate Y SK 1 u i ð Þ and its corresponding variance r SK u i ð Þ, and add it to the conditioning dataset: where U m i is a set of m different random numbers obtained from the Nð0; 1Þ residual, i.e., a standard Gaussian random variable independent of U m 1 ; . . .; U m iÀ1 (b) Use simple collocated cokriging (SCCK) to infer the parameters (i.e., mean and variance) of the Gaussian conditional cumulative distribution function (ccdf) Gðu 0 ; yj n ð ÞÞ required for simulating the target variable at location u i ; i 2 f1; . . .kg, with k grid nodes from Y 2 conditional on the neighborhood data Y 2 ðu a 2 Þ, which includes the normal score data and previously simulated values of the target variable of size n and k À 1, respectively, as well as the previously simulated collocated data from the auxiliary variable Y m 1 u i ð Þ at the visited grid nodes in either a moving or unique neighborhood. A simulated random number can then be generated for the target variable Y m 2 ðu i Þ taking into account the obtained simple collocated cokriging estimate Y SCCK 2 u i ð Þ and its variance x SCCK u i ð Þ and then added to the conditioning dataset according to: 5. Loop until all k grid nodes are simulated. 6. Back-transform the realizations Y m 1 and Y m 2 to simulated values for the original scale of the variables Z m 1 and Z m 2 , respectively; this requires applying the inverse of the normal score transform to the simulated values: where F À1 ð:Þ is the inverse cdf or quantile function of the variable Z and Gð:Þ is the standard Gaussian cdf.
In this workflow, to quantify the spatial continuity of the autocorrelation and solve the cokriging and kriging systems, a full matrix of coregionalization is not necessarily required; instead, one needs to derive the direct covariance of the target variable C 22 h ð Þ and the cross-covariance between the auxiliary and target variables C 12 h ð Þ ¼ C 21 h ð Þ and then simplify the variance of the auxiliary variable C 22 0 ð Þ. This aim can, however, be achieved by a Markov model, for instance, type II (Journel 1999), in which the cross-covariance is proportional to the target direct covariance and shares its characteristics: shape, correlation range and scale factor (Journel 1999).
Since one is dealing with normal score data for both the auxiliary and target variables in addition to the cross-covariance between these two variables at lag 0, C 12 0 ð Þ is identical to their corresponding linear correlation coefficient.

Proposed approach
In traditional hierarchical cosimulation, in step 4, simple kriging is first utilized to simulate the auxiliary variable. Although this is fast and straightforward, the cross-dependency between the auxiliary and target variables is neglected. Second, since the auxiliary variable will be available at the grid nodes after step 4(a), it is assumed that the collocated datum at the grid node of interest screens out the influence of the auxiliary data located farther away using simple collocated cokriging (Journel 1999), following step 4(b). Madani and Emery (2019) argued that this might not be true in the case where the direct covariance of the auxiliary variable shows different spatial characteristics (e.g., scale factor) compared to the direct covariance of the target variable, and consequently, a linear model of coregionalization might not be suitable for fitting the (cross)covariance functions. Another difficulty of the conventional algorithm is that it is not suitable for cases where there is a linearity constraint between the variables. Because the random number generator U m i in Eq. (11) that produces the simulated values chooses the normal scores from a joint multiGaussian distribution, it conceptually follows the elliptical shape of a bivariate Gaussian. This feature does not allow for modeling the variables with inequality and linearity constraints, which is a common practice in resource estimation. To overcome these impediments, an updated variant of the hierarchical sequential Gaussian cosimulation algorithm is introduced, as follows: 1. Define a hierarchy of variables, starting with the most important or best auto-correlated variable Z 1 (auxiliary variable). Another variable, therefore, can be denoted as the target variable Z 2 . 2. Transform both the auxiliary and target variables (Z 1 and Z 2 ) independently to their normal scores (Y 1 and Y 2 ). 3. Define a random or regular path as an order in which each grid node is visited only one time. 4. At each grid node u i (a) Use simple cokriging (SCK) to infer the parameters (i.e., mean and variance) of the Gaussian conditional cumulative distribution function (ccdf) Gðu 0 ; yj n ð ÞÞ required for simulating the auxiliary variable at location u i ; i 2 1; . . .k f g, with k grid nodes from Y 1 conditional on the neighborhood data Y 1 u a 1 ð Þ and Y 2 u a 2 ð Þ including the normal score data of size n 1 and n 2 , respectively, and the previously simulated values of the auxiliary and target variables of size k À 1 at the visited grid nodes in either a moving or unique neighborhood.
A simulated random number can be generated for the auxiliary variable Y m 1 u i ð Þ, taking into account the simple cokriging estimate Y SCK 1 u i ð Þ and its variance r SCK u i ð Þ, and then added to the conditioning dataset: with the same notation as above. (b) Use simple multicollocated cokriging (SMCCK) (Rivoirard 2001;Wackernagel 2003;Chilès and Delfiner 2012) to infer the parameters (i.e., mean and variance) of the Gaussian conditional cumulative distribution function (ccdf) Gðu 0 ; yj n ð ÞÞ required for simulating the target variable at location u i ; i 2 1; . . .k f g(where k is the number of grid nodes), from Y 2 conditional on the neighborhood data Y 1 u a 1 ð Þ and Y 2 u a 2 ð Þ including the normal score data of size n 1 ¼ n 2 ¼ n, the previously simulated values of the auxiliary and target variables of size k À 1, and the previously simulated collocated datum from the auxiliary variable Y m 1 u i ð Þ at the visited grid nodes in either a moving or unique neighborhood.
A simulated random number can be generated for the target variable Y m 2 u i ð Þ, taking into account the simple multicollocated cokriging estimate Y SMCCK 2 u i ð Þ and its variance x SMCCK u i ð Þ, and then added to the conditioning dataset: with the same notation as above.
The proposed approach is exactly applicable here. In the case of conventional cosimulation, the above formula allows any random number to be chosen following the multiGaussianity assumption, which entails that Y m 2 u i ð Þ and Y m 1 u i ð Þ show an elliptical shape during the simulation. This, however, is not practical in the case of a given inequality constraint in the dataset. Therefore, an acceptance-rejection method can be introduced and integrated with the above formula so that Y m 2 u i ð Þ and Y m 1 u i ð Þ can be in agreement with the corresponding joint distribution with a restricted inequality constraint. This situation can occur through an acceptance-rejection method as follows: (i) Obtain the simulated value for Y m 2 u i ð Þ from Eq. (6) and, coupled with Y m 1 u i ð Þ, back-transform it to the original scale.
if so, accept the normal score simulated value of Y m 2 u i ð Þ and continue with step (5); otherwise, simulate another value for Y m 2 u i ð Þ and loop until Y m 1 u i ð Þ and Y m 2 u i ð Þ meet the requirements of the inequality constraint in the original bivariate relation; finally, the corresponding point lies under the fitted line. In this step,Ŷ 2 is obtained from the fitted inequation as stated in the previous section so that it is eitherŶ 2 aY 1 þ b orŶ 2 ! aY 1 þ b, depending on the sign of the correlation coefficient between the target and auxiliary variables Y 2 and Y 1 , that is, negative in the first inequation and positive in the second inequation.
5. Loop until all K grid nodes are simulated. 6. Back-transform the realizations Y m 1 and Y m 2 to the simulated values of the original scales of the variables Z m 1 and Z m 1 , respectively. In this proposed workflow of cosimulation, it is required that the direct and cross-covariance matrices be derived first to establish the cokriging systems in steps 4(a) and 4(b) and then compute the corresponding weights. To do so, the linear model of coregionalization and the intrinsic correlation models are possible options to be fitted to the covariances due to their simplicity and tractability (Journel and Huijbregts 1978;Goovaerts 1997;Wackernagel 2003).
In this proposed algorithm, as indicated, simple cokriging is substituted for simple kriging in step 4(a), and multicollocated cokriging is substituted for collocated cokriging in step 4(b) in the conventional workflow of cosimulation. In the first instance, the proposed improvement entails using the contribution of the target variable to simulate the auxiliary variable at grid node locations. This feature strengthens the cosimulation procedure, particularly in a case in which the sampling pattern is partially or totally heterotopic (Wackernagel 2003). The second improvement, i.e., the employment of multicollocated cokriging, allows the maximal impact of the cross-correlation structure between the variables to be considered by taking into account the retained auxiliary variable at grid nodes u i and at the locations where the target variable is accessible.
In addition, this characteristic of multicollocated cokriging provides an opportunity to simulate the variables in such a way that they obey a desired joint distribution following an acceptance-rejection method, as explained. For this purpose, an acceptancerejection algorithm is integrated with the proposed hierarchical cosimulation, which enables one to simulate values with inequality and linearity constraints.

Presentation of the case study
To show the practical aspects of the proposed cosimulation algorithm, a case study concerning an Iron ore deposit is considered. This dataset pertains to an exploratory diamond drill hole project. The coordinates are transformed and the variables are multiplied by a scale factor to preserve the confidentiality of the dataset. In the 2137 sample points, two variables are measured, Iron and Silica, in a homotopic sampling pattern in which the two variables share the same locations (Wackernagel 2003). The location maps of the Iron and Silica and the corresponding rock types, Hematite and Itabirite, can be seen in Fig. 5. The Iron is highly concentrated in the center of the deposit and is negatively correlated with Silica. The Hematite is spatially distributed in the center and is mostly surrounded by Itabirite. Visual inspection of the location map additionally demonstrates that the high and low concentrations of Iron and Silica, respectively, mostly embedded into the Hematite rather than the Itabirite, explicitly shows that the rock types regulate the spatial distribution of Iron and Silica. Due to the irregularity of the sampling pattern in the region (Fig. 5) (i.e., boreholes scattered more in the center than the corners), the statistical parameters are less representative. One way around this issue is to perform cell declustering (David 1977;Deutsch 1989), which can ensure that the statistical parameters are not biased and furthermore that they are not affected by the scarcity of data in some part of the region. To do so, a cell size of 20 m 9 20 m 9 5 m is chosen based on the primary pattern of the borehole as seen in the plane. The statistical parameters of both variables after declustering are provided for each rock type separately and for the entire region, as shown in Table 1. Hematite bears a higher grade of Iron and a lower grade of Silica than Itabirite. The coefficient of variation (COV) parameter, which is also known as the relative standard deviation, is a good indicator of the global dispersion of a frequency distribution and is applicable in comparing variables with different statistical parameters. As seen in the table, the COV for Silica is slightly higher in Hematite than Itabirite, and Iron is slightly higher in Itabirite than Hematite, even though both display relatively low COVs throughout the region.
The global bivariate statistic is considered as shown in Fig. 6, where it is noted that there is a strong negative correlation coefficient between Iron and Silica, q ¼ À0:95, with completely different global marginal distributions, such that Iron and Silica exhibit negative and positive skewness, respectively, and distinct ranges. In addition, it is evident that Iron and Silica follow extreme-value and lognormal distributions, respectively, which are seen more  (Fig. 6) reveals that, irrespective of the rock type, the dataset demonstrates a significant inequality constraint. This indicates that a mixture of nonstrict inequality constraints manifests itself in the sample points, meaning that in some points (#148), one observes Iron Silica and in other points (#1989), the opposite holds. However, intuitively speaking, a conceptual line can be drawn for the corresponding function, in which Silica f Iron ð Þ, signifying that the inequality constraint can be determined through an equation rather than the sample points alone. This characteristic motivates one to use the proposed algorithm for stochastic modeling of Iron and Silica based on the hierarchical sequential Gaussian cosimulation, integrating it with an acceptance-rejection technique. Prior to this modeling, the inequation should be fitted.

Selection of the target and auxiliary variables
In this study, we considered Silica as the target variable Z 2 and Iron as the auxiliary variable Z 1 . The reason is related to the importance of Silica in Iron deposits, which is frequently neglected in the modeling procedure, although it has a large impact on the spatial distribution of Iron. In fact, the amount of Silica in Iron deposits is crucial for mineral processing plants and the final products of Iron, so Silica is considered a problematic compound that significantly influences the quality of the pellets produced and the selling price. For instance, high Silica leads to an increase in the slag volume of blast-furnace Iron making, an increase in energy consumption, and a decrease in productivity (Pal 2019). In this context, it is always of interest to produce a high-grade Iron concentrate for pellet feed with low amounts of Silica (Dey et al. 2012). Moreover, the high concentration of Silica makes the flotation of Febearing ores such as Hematite and magnetite difficult (Flint et al. 1992;Quast 2017;Veloso et al. 2018;Massola et al. 2009). The beneficiation and separation of Silica from Iron becomes more complex when the ore types are of a lower Iron grade (Dukino et al. 2000). Pérez- Barnuevo et al. (2018), through a texture analysis of core samples, showed that there is a strong correlation between Silica and crushing energy, such that more energy (kJ/kg) is needed to crush rocks with high concentrations of Silica in the processing of Iron ores. For instance, in India, approximately 10-12 million tons annually, 15-25% of the mined ore, is unused and discarded as slime due to its low Iron content and high amount of Silica, since this is not compatible with the chemistry of blast furnaces. Therefore, this slime is an environmentally polluting waste material (Ghose and Sen 2001;Jena et al. 2014;Mohanty et al. 2017), and proper care must be taken to deal with it before launching a project.
A high amount of Silica is not always harmful in industry. For instance, high-purity Silica with low amounts of Iron is one of the most important materials employed in the glass manufacturing, microelectronics, aerospace, biotechnology, new energy, and new material fields (Yang et al. 2017;Jang et al. 2008;Guo and Li 2011).
Therefore, a feasibility study and investigation of Silica variation along with Iron in the early stages of a mining project and its predicted level in Iron deposits have a significant impact on the later years of the life of the mine and are of paramount importance for mine planning activities, and they reflect the inherent geological variability of metallurgical performance and mine economics. (Vallejo and Dimitrakopoulos 2019).
In this regard, a proper and reliable mapping of the 3D spatial characterization of Silica contributes to detecting target mining blocks that may represent areas likely to have a high amount of Iron and a low concentration of Silica or vice versa. For instance, Hematite containing Iron ores is the main source of Iron for the steel industry (Veloso et al. 2018).

Fitting the minimum support line
Once the target and auxiliary variables are identified, to derive a suitable function, following the proposed approach explained in the methodology section, a robust linear regression should be performed using the Huber loss (Peter 1964). The coefficients of this line are obtained using the SCIKIT-LEARN library (O'Rourke 1998): Slope ¼ À1:465 and Intercept ¼ 67:535: This library is open-source software for machine learning in the Python programming language with several features for regression analysis (www.scikit-learn.org).
This linear function is illustrated in Fig. 7 as a solid red line. Then, this line is shifted upwards to derive the minimum support line (blue line in Fig. 7)   The information of this line is then taken into consideration for hierarchical simulation based on the proposed algorithm.

Simulation of the grade and rock type
In this section, we address two rock types, Hematite and Itabirite, in the region; the grade distributions of Silica and Iron might be different in each rock type zone. To apply the proposed methodology, the modeling of rock types is also taken into account, which is a common practice in 3D geostatistical modeling of ore deposits. Generally, in mineral resource modeling, one usually addresses continuous variables (e.g., the grade) and categorical variables (e.g., the rock type). Because of the interdependency of variables, the evaluation of mineral resources relies on modeling the grade conditional on the rock type. In such cases, the latter frequently regulates the variability of the former in the region. Therefore, geostatistical modeling can be implemented by either of two common approaches: 1. Joint simulation In this technique, the cross-correlation structure between the continuous and categorical variables entails using a joint simulation algorithm to model two or more Gaussian random fields throughout the region. Then, through the back-transformation and a predefined truncation rule, one can obtain the simulated grades and rock types (Dowd 1994(Dowd , 1997 The decision to employ either approach for resource modeling depends on the behavior of the continuous variables across the boundaries between two adjacent rock types. A soft boundary, compatible with joint simulation, is a gradual transition in the grade distribution when crossing a rock type boundary. In contrast, a hard boundary, which introduces an abrupt change or discontinuity in the grade distribution between two rock types, is adequate for cascade simulation (Kim et al. 2005).

Contact analysis
To better understand and predict the behavior of continuous variables when crossing the boundary between two adjacent rock types (i.e., whether the boundary is hard or soft), different types of contact analysis, such as mean grade contact analysis and cross-correlation contact analysis, can be implemented (Ortiz and Emery 2006;Larrondo and Deutsch 2005). In the former type, pairs of data are found in the two adjacent rock types at predefined distances, such that the tail is in one rock type and the head is in another rock type very close to the boundary. Then, the mean grade of each group at the corresponding predefined distance can be plotted versus the distance to the boundary (Rossi and Deutsch 2014). This tool identifies whether the behavior of the grade is ''hard'' or ''soft'' when moving closer to or farther from a boundary (Maleki and Emery 2015). Another tool of contact analysis that is related to the cross-correlation structure is the cross-variogram between the grades of different rock types. However, since one is dealing with disjoint pairs of data, which entails a totally heterotopic sampling pattern (Wackernagel 2003), the cross-variogram cannot be derived from the data (Chilès and Delfiner 2012).
To address this difficulty, the cross-correlogram (Maleki and Emery 2015) or pseudo-cross-variogram (Ortiz and Emery 2006;Emery and Maleki 2019) can be computed from the data even when there are no matching samples. These functions show how strongly correlated the grade values are between the two sides of the boundary (Maleki and Emery 2017). As shown in Fig. 8a, c, the mean grade contact analysis shows that the transition of Iron and Silica between Hematite and Itabirite can be described by a hard boundary. This manifests when approaching the border between these two rock types from large lag separations (i.e., leaving Hematite and entering Itabirite), where the mean grade changes suddenly from * 44 to * 38% for Iron and from * 3 to * 9% for Silica.
This can also be corroborated by the cross-correlation contact analysis, which shows that the correlation coefficient calculated from pairs that belong to two different rock types (i.e., Hematite and Itabirite) does not depend on the lag separation distances and abruptly disappears at small lag separations (Fig. 8b, d). This can be interpreted as a pure nugget effect that does not represent a significant spatial structure; thus, the Iron and Silica grades in Hematite are not spatially correlated with the Iron and Silica grades in Itabirite, meaning that the separation of the zones into two rock types is statistically significant and these two domains should not be merged.

Lagged scatter plot
It might also be of interest to consider the statistics of variables in two different rock types at short distances to confirm that the transition boundary is hard. The lagged scatter plots (Goovaerts 1997) computed omnidirectionally for a separation distance of less than 10 m verified that there exists no significant dependency between Iron and Silica in Hematite and Itabirite, as indicated by their low correlation coefficient. The tendency of the center of gravity in Iron compared to that of Hematite shows that the mean grade of Iron in Itabirite is lower than that of Hematite close to the border. In contrast, Silica has a higher mean grade in Itabirite than in Hematite (the center of gravity tends more toward Itabirite) (the related lagged scatterplots are shown in ''Appendix 4'').

Geostatistical modeling of the grade and rock type
The results of contact analyses and lagged scatter plots indicate that the behavior of Iron and Silica grades across the boundary is hard, supporting the use of the cascade simulation method. Following this, Iron and Silica are cosimulated in the Hematite zone completely independently from the Itabirite zone, in accordance with the hierarchical cosimulation algorithm proposed in this study. To do this, the layout of Hematite and Itabirite is first simulated by Sequential Indicator Simulation (Deutsch and Journel 1998), and then Silica and Iron are simulated in each rock type by using only the data that belong to the rock type of interest. The general geostatistical simulation process is as follows: 1. Simulate each rock type by Sequential Indicator Simulation (SIS). 2. For each SIS realization and each rock type: 3. Obtain the Silica and Iron grade realizations by juxtaposing the simulated grades in Hematite and Itabirite (each individual realization of grades is associated with a realization of rock types) At this level, the proposed algorithm based on the updated hierarchical sequential Gaussian cosimulation algorithm is employed by imposing the inequality constraint obtained from the minimum support line (Fig. 7) and integrating it with the proposed acceptance-rejection technique. It is worth mentioning that the criterion for accepting or rejecting the simulated value for Silica is determined by the original scale of the dataset. This means that the simulated value of Iron in each grid node should be back-transformed to its original scale to obtain acceptance of the simulation of Silica in the workflow. After confirming that the simulated value of Silica is compatible with the inequality constraint, the simulation continues through its corresponding normal score data and subsequently visits the rest of the grid nodes on the preidentified random path. Otherwise, the acceptance-rejection technique produces another value for Silica until it meets the bivariate joint distribution of Iron.

Modeling the rock types
To model the rock types in this study, a Sequential Indicator Simulation (SIS) was used to simulate the layouts of Hematite and Itabirite within the region of interest.
A Sequential Indicator Simulation (SIS) (Deutsch and Journel 1998;Deutsch 2006) is a variogram-based geostatistical algorithm, which is similar to sequential Gaussian simulation except that it simulates indicator variables. It is a particularly suitable technique for the simulation of categorical variables that can easily be converted to indicators. For K categories of sample points, the correspondence indicators in a particular location u can be defined as: In this algorithm, the spatial correlation can be obtained by variogram analysis of each indicator, and then simple kriging (SK), which is suitable in cases where the area is mostly stationary and no secondary variable is available, can be applied in the simulation workflow (Deutsch 2006). However, for cases with nonstationary characteristics, ordinary kriging (OK) can be applied.
To implement the SIS algorithm and produce the stochastic layout of the rock types, variogram analysis should be performed on the indicator data obtained from each category. Since we are just dealing with two categories, the variogram analysis of both indicators is identical, as each indicator is a complement of the other indicator. First, the anisotropy is checked by calculating the variogram in different directions. The results indicate that the spatial continuity is isotropic in the horizontal direction and anisotropic in the vertical, implying geometric anisotropy (''Appendix 4''). A two-nested variogram model is then fitted to the experimental variogram: Once the variogram analysis is ready, the realizations can be produced based on a defined search neighborhood. For this purpose, simple or ordinary kriging can be applied to establish the kriging system in the simulation workflow. However, the selection of the optimal SIS algorithm with the proper kriging paradigm is significant. It is common in SIS that the realizations produce some patchy results that have spotty variations that are not geologically significant (Deutsch 2006;Emery 2004). To circumvent this problem, one may need to use an image cleaning procedure (Deutsch 1998) to reduce the patchiness in the realizations. This technique is based on substituting a category with the most likely category in the target grid node, depending on the categories in the surrounding blocks (Deutsch 1998(Deutsch , 2006. In this case, we obtained different results from different levels of cleaning procedures, which could be light, heavy, or super heavy (Deutsch, 2006). The comparison of the results with the original local distribution of Hematite and Itabirite in the region as well as validation through reproduction of the declustered statistical proportions of Hematite and Itabirite revealed that SIS with ordinary kriging outperforms SIS with simple kriging. In addition, the light cleaning option is the best option in terms of the reproduction of statistical parameters. The histogram of the proportions of all 100 realizations for both SIS algorithms showed that the reproduction proportion for Hematite in SIS with ordinary kriging on average (0.20) is closer to the original declustered proportion of this category, 0.19, that is obtained from the borehole data. This is also true of Itabirite when comparing the proportions reproduced through the realizations with the original declustered proportions (the histogram of the proportion reproductions can be found in ''Appendix 4'').
The probability maps obtained from SIS with ordinary kriging are shown in Fig. 9. It can be seen that the probability of finding Hematite in the center is high, and these results are compatible with what can be observed from the local distributions of Hematite and Itabirite in the region, as illustrated in Fig. 5c.

Spatial structural analysis of Iron and Silica grades in Hematite
The declustered Iron and Silica Z ð Þ belonging to the Hematite zone are transformed to normal scores Y ð Þ by means of Gaussian anamorphosis (Rivoirard 1994). To check whether Y ð Þ is a realization of a Gaussian random field, the lagged scatter plots are calculated so that it can be confirmed whether the density of points follows a bivariate-Gaussian elliptical shape (Rivoirard 1994;Emery 2005). Although performing the univariate transformation and checking the bivariate Gaussianity are of paramount importance in the simulation workflow, for the implementation of cosimulation, where more than one variable is involved, it is required to check the multivariate Gaussianity between the normal scores (Leuangthong and Deutsch 2003). One significant aspect of this process is to examine the homoscedasticity and linearity between the cross-correlated normal scores (Johnson and Wichern 1998). It is confirmed that the bivariate density shape between the normal scores of Iron and Silica is consistent with the two latter properties of multivariate Gaussianity. This signifies that cosimulation can be employed in this case (see ''Appendix 4'').
The next step in the proposed sequential Gaussian cosimulation algorithm is to derive the direct and crossvariograms. Prior to this, anisotropy in the region is examined through the calculation of the variogram in different directions, and it is revealed that there is no significant elongation in continuity over the horizon; there is only one anisotropy in the vertical direction, meaning that the variogram is isotropic in the horizontal direction and anisotropic in the vertical direction, which, in general, implies geometric anisotropy (Chilès and Delfiner 2012;Rossi and Deutsch 2014). Therefore, four experimental variograms (direct and cross) are computed along these isotropic and anisotropic directions, and by means of a semiautomatic fitting approach (Emery 2010), two nested exponential structures are inferred based upon the linear model of coregionalization (Wackernagel 2003;Chilès and Delfiner 2012) (Fig. 10). The corresponding variogram function of Iron Y 1 ð Þ and Silica Y 2 ð Þ in this Hematite zone is:

Spatial structural analysis of Iron and Silica grades in Itabirite
The Iron and Silica in Itabirite Z ð Þ are transformed to their corresponding normal scores Y ð Þ by Gaussian anamorphosis. The resulting transformed variables can be considered as realizations of Gaussian random fields, since the isodensity of their lagged scatter plots at small distances (up to 15 m) for each variable and the scatter plots between the two normal scores Y 1 and Y 2 at lag 0 shows that they are somewhat compatible with ellipses, from which it can be verified that the bivariate and multivariate Gaussianity assumptions are confirmed for Silica and Iron in this zone (''Appendix 4'').
Likewise, in the Hematite zone, the anisotropy is determined in Itabirite by computing the experimental variogram in different directions and small tolerances at the original scale of the dataset. The results indicate that the spatial continuity parameters of the variables display the same characteristics as the continuity parameters obtained in the Hematite zone, i.e., they are isotropic in the horizontal direction and anisotropic in the vertical direction. Following this, the experimental direct and cross-variograms are calculated along these directions, and then the linear model of coregionalization is obtained through a semiautomatic fitting approach (Emery 2008) (Fig. 11). The fitting produces two-nested exponential structure variogram models for Iron Y 1 ð Þ and Silica Y 2 ð Þ in this Itabirite zone: Comparing the fitted direct and cross-variograms in the two different rock types of interest shows that the ranges of spatial continuity for Iron and Silica are different in the horizontal direction and identical in the vertical direction. This dissimilarity can also be seen in the obtained variogram sills, although the number of structures for both is the same. These types of characteristics support modeling the Silica and Iron in each rock type separately. It is worth mentioning that the inequality constraint corresponding to the inequation can only be reviewed for the original scale of the dataset (Fig. 6) and not for the normal score bivariate distribution.

Conditional simulation results
With the variogram fitted models and the realizations of rock types, one can establish the realizations of Silica and Iron grades. Once the simulated grades are prepared as mentioned earlier, the simulated values can be juxtaposed with the Hematite and Itabirite. The cosimulation of Silica and Iron in each rock type follows the approach proposed in this study, in which an inequality constraint is superimposed through an inequation to produce results that not only restore the bivariate relation between Silica and Iron but also respect the declustered statistical parameters of the original data. To compare the results of the proposed cosimulation algorithm, three alternative cosimulation algorithms are taken into account: conventional hierarchical cosimulation with multicollocated cokriging, conventional sequential Gaussian cosimulation with full cokriging (this is not hierarchical) and the projection pursuit  (Barnett et al. 2014(Barnett et al. , 2016Barnett 2017). In these four algorithms, the simulation techniques for cascade simulation are all equivalent for deriving the realizations of grades while deploying them to the realizations of rock types.
The simulation following the projection pursuit multivariate transform is slightly different in terms of the Gaussian transformation. In this algorithm, the cross-correlated variables are first decorrelated to implicitly remove the multivariate complexity (Barnett et al. 2016), based on the projection density estimation algorithm (PPDE) (Friedman 1987), and then, independent simulations are applied to model each decorrelated factor separately, taking into account their direct variogram inference. However, a slight correlation may remain between the factors after this transformation. To address this difficulty, either further decorrelation based on the Minimum/Maximum Autocorrelation Factor (MAF) (Barnett et al. 2014;Battalgazy and Madani 2019a) or directly fitting the coregionalization model to the factors after the first transformation (Battalgazy and Madani 2019b) may be applicable. In the former case, since the factors are independent, an individual simulation can be implemented. In the latter, a conventional cosimulation approach can be considered by means of a fitted linear model of coregionalization. In both cases, after simulation, the realizations should be back-transformed to the original scale. This technique, as argued in Barnett et al. 2014, is able to reinstitute all complex multivariate relations. Therefore, the Silica and Iron in each rock type zone are decorrelated by projection pursuit multivariate transform, and then, by sequential Gaussian simulation, each factor is simulated and back-transformed to the original scale.
As a result, and as a typical step in all Gaussian simulation paradigms, particularly sequential Gaussian cosimulation, a regular grid with a spacing of 20m Â 20m Â 10m (with point support and no discretization) in a moving neighborhood compatible with the ranges of the variogram, including 200 conditioning data consisting of the sample points and previously simulated grid nodes, is used to produce 100 realizations for each variable. Including more data in the neighborhood may increase the computation time, but the statistical results will be better, meaning that increasing the amount of conditioning data improves the quality of the simulation results in this sequential simulation paradigm (Emery 2004;Emery and Peláez 2011;Paravarzar et al. 2015). In this respect, Safikhani et al. (2017) showed that at least 200 conditioning data points are required to carefully regenerate the model statistics, such as the coregionalization model, while having fewer than 50 conditioning data points usually leads to a poor result. The E-type maps obtained from averaging the 100 realizations are illustrated in Fig. 12. Realization #1 of this technique is also provided in ''Appendix 4''. To save space, only the output of the proposed hierarchical cosimulation algorithm is presented in this figure, and the rest of the maps obtained from the other techniques are not provided, since there was no significant difference in their visual appearance. Instead, the results obtained from different techniques are compared in detail from a statistical point of view below.
4 Comparison of the algorithms by statistical analysis

Reproduction of the mean, variance and correlation
To validate the results and compare the four cosimulation algorithms, some statistical analyses are performed. This contributes to assessing the quality of the realizations in terms of the reproduction of the original declustered statistical parameters and choosing the proper cosimulation algorithm that best suits the bivariate complexity of the dataset. To do this, the original declustered mean and variance of Iron and Silica are compared to the average  mean of the cosimulated variables obtained with 100 realizations by all four methods: conventional hierarchical cosimulation with multicollocated cokriging, conventional cosimulation with full cokriging, proposed hierarchical cosimulation with multicollocated cokriging and projection pursuit multivariate transform. The results are all provided in Table 2. Although the reproduction of the mean for Iron is almost the same in all four approaches, in Silica, it shows a significant difference, as it is overestimated compared to the original declustered mean in the conventional cosimulation approaches and in PPMT, while in the proposed cosimulation algorithm, this difference is small and somewhat acceptable.
Another comparison related to the reproduction of variance shows that this statistical parameter on average over the realizations for Iron is very close to the original declustered variance in all the methods except PPMT, in which it is remarkably decreased. However, in Silica, the situation is slightly different. The average variance is significantly decreased in PPMT and to a lesser extent in the proposed hierarchical algorithm. To the best of the authors' knowledge, the reason for this is still unclear.
Another criterion for assessing the quality of each cosimulation algorithm is to check the reproduction of the dependence relationship between Silica and Iron. For this purpose, the correlation coefficients are calculated between the back-transformed values of the cosimulated grades in each realization. It is expected that the coefficients on average will fluctuate around the original declustered correlation, q ¼ À0:95. This occurs for the proposed cosimulation algorithm but not for the other three approaches, as illustrated via a boxplot ( Fig. 13 and Table 2). It is also of interest to assess the reproduction of inequality constraints in the bivariate relation between Silica and Iron. As seen from Fig. 14, the proposed algorithm is able to regenerate the bivariate shape of the correlation very well while respecting the inequality constraint imposed by the inequality function. In contrast, the reproduction of the bivariate shape in the other methods is extremely poor. This can be explained because, unlike the proposed algorithm, the other methods cosimulate the Gaussian random fields irrespective of any restriction, although each one takes into account the cross-correlation structures obtained from the linear model of coregionalization.

Reproduction of the contact analysis across the boundaries between rock types
Another way to confirm the validity of the proposed algorithm is to check whether the behavior of Iron and Silica close to the boundaries is reproduced. As explained above, this can be done by computing the correlation between Iron and Silica across the layout of the two adjacent rock types, Hematite and Itabirite. The resulting graphs show that the proposed algorithm is able to reproduce the abrupt transition in the mean grade near the rock type boundaries for both Iron and Silica grades compared to what was obtained in the original dataset (''Appendix 4''). This hard boundary is also assessed in the other three algorithms, and as expected, all of them displayed practically the same results as those obtained with the proposed hierarchical cosimulation algorithm. These types of outputs of boundary reproduction are expected when one applies a cascade simulation algorithm, which is more compatible with portraying the spatial distribution of grades in this Iron deposit than other techniques such as joint simulation Emery 2015, 2017;Maleki et al. 2016;Emery and Maleki 2019). The results of the other approaches are not provided, since all of them produce almost the same results in terms of the boundary behavior of grades.

Reproduction of the spatial correlation structure
In addition to the previous evaluation of statistical parameter reproduction, the spatial direct and cross-correlation structure can be assessed through the realizations in each rock type and compared with the experimental spatial continuity observed in the dataset. As seen in Fig. 15, the direct and cross-variograms for Iron and Silica are both reproduced well by the proposed cosimulation approach, with a slightly better performance for Hematite, particularly with respect to Silica. This is explained by the different influences of the neighborhoods in the two zones. The fundamental idea of the proposed approach, however, is based on sequential Gaussian simulation; therefore, attention must be given to the specification of the neighborhood in establishing the cokriging system in the cosimulation workflow, as explained earlier. Since there are more data for Itabirite, the screening data in the moving neighborhood tend to be more tangible in this rock type zone, which impacts the quality of the variogram reproduction for the simulated Silica grade. To improve the reproduction of the direct variogram for this variable, a more powerful resource computation method can be used, which leads to defining a unique neighborhood. To save space, the results of the other three approaches are not presented. The reason is that they provide almost the same outputs as the proposed approach in terms of variogram reproduction.

Postprocessing of the realizations
To postprocess the realizations and further discuss the capability of the proposed approach, the recovery functions are calculated. To do this, first, two series of cut-offs are calculated: ten cut-offs that introduce the deciles of the global Silica distribution and the declustered median of Iron. Second, a realization is selected, and the percentage of tonnage, mean grade and metal quantity for the blocks with simulated Iron greater than or equal to the median and with Silica less than each of the ten cut-offs are taken into account. Third, a loop is performed over 100 realizations, and then the average of the recovery functions is computed. Finally, this process is repeated for all the cosimulation algorithms (Fig. 16). In reporting the tonnage, the results show that the proposed approach produces almost the same results as two other conventional cosimulation algorithms and a slightly higher tonnage than projection pursuit multivariate transform, which produces the lowest tonnage. The same characteristics can be seen in the mean grade calculated for Iron, for which projection pursuit multivariate transform again yields the smallest values. This, however, is not the case for Silica in terms of the mean grade. The proposed algorithm yields the minimum grade values in each cut-off, while the projection pursuit  impacts the recovery functions, which subsequently affect the Net Present Value (NPV) of a mining project.

Discussion and conclusion
The cosimulation of cross-correlated variables is of paramount importance in geostatistical modeling in different disciplines in geoscience. The existence of complex bivariate relations among these variables and the ways of addressing them in the cosimulation workflow motivates one to use sophisticated data analysis approaches. In this study, an updated hierarchical sequential cosimulation algorithm based on multicollocated cokriging is proposed and combined with an acceptance-rejection sampling technique to cosimulate variables with inequality constraints. The tools and algorithms are provided to derive a linear function that imposes the inequality constraints on the cosimulation algorithm based on an optimization technique integrated with a robust regression function. In practice, the scatterplot of the two variables of interest largely implies the slope and intercept of this linear function. The benefits of this proposed hierarchical cosimulation workflow in comparison with other cosimulation algorithms, such as PPMT, conventional cosimulation with full cokriging and hierarchical cosimulation with multicollocated cokriging, are fourfold. First, the inequality constraint as a complex characteristic of the bivariate shape of the relation is perfectly reproduced. In this context, the algorithm requires the simulated values of the target variable to lie on the fitted line; therefore, no points should be scattered above that fitted line. Second, the global and local statistical parameters, such as the mean, variance, correlation coefficient, spatial continuity, and marginal distributions, can be reproduced. The last is illustrated in ''Appendix 4''.
Third, any sampling patterns, such as homotopic and partially and totally heterotopic patterns (Wackernagel 2003), can be fed as inputs to the algorithm without loss of generality, while the cross-correlation structure models are identified through either intrinsic or linear models of coregionalization, or even via Markov I & II. Fourth, in addition to reproducing any given inequality constraint in the bivariate characteristics, any continuous variables with any type of marginal distributions that show different ranges and skewness can be cosimulated.
The proposed tools and algorithms are applied to an Iron deposit in which Iron and Silica are cross-correlated, showing a linear complex relationship. This case study demonstrates the ability of the proposed model and algorithms to cosimulate the Silica and Iron grades and to reproduce the behavior of grades across rock type boundaries through a cascade simulation technique.
Despite these benefits, there are still some opportunities for further enhancement of the algorithm. The proposed algorithm is based on a hierarchical approach. Therefore, when dealing with three or more cross-correlated variables with inequality constraints, the algorithm can be updated to simulate the variables in a stepwise manner, whereby each variable can be simulated in turn and then used as secondary data for simulating the next variable. In this respect, a full linear model of the coregionalization can facilitate the inference of the cross-correlation structure. The sequential simulation paradigm and its integration with the acceptance-rejection technique increases the computation time compared to conventional cosimulation approaches. To address this, a GPU-based parallel coherence cosimulation algorithm under the CUDA framework (NVIDIA 2017;Yu et al. 2019) or GPU-shared memory (Fang et al. 2020) can be developed to accelerate the computational process. If the transition of the cross-correlated grades across the layout of the rock types is soft, the joint simulation algorithm (Dowd 1994(Dowd , 1997Emery and Silva 2009;Emery 2015, 2017;Talebi et al. 2017) can be tailored to the proposed hierarchical cosimulation algorithm to simulate the rock types first by pluri-Gaussian simulation and then using the simulated value at the corresponding grid node as the secondary data for the simulation of grades with inequality constraints. This technique entails the reproduction of the cross-correlation structures that inherently exist among rock types and grades. One of the legitimate criticisms of sequential Gaussian cosimulation is related to the quality of realizations that strongly depend on the definition of a predefined neighborhood (Emery 2004;Paravarzar et al., 2015). This, however, can be resolved by substituting a sequential simulation for the turning band simulation or cosimulation (Emery 2008).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.

Appendix 1: Huber loss function as a robust regression analysis method
In statistics, robust regression belongs to the family of regression analysis that seeks an optimal relationship between one or more independent variables and a dependent variable. The main purpose of robust regression is to provide resistant results in the presence of outliers by decreasing their influence in the derivation of regression parameters (Andersen 2008). Among other techniques, we propose using the Huber loss to find the slope and intercept of the regression line (Huber 1964): The error should be small and depends on a hyperparameter, d, which can be tuned. Therefore, L d a ð Þ is quadratic for small values of a and linear for large values, with equal values and slopes of different sections at the two points where a j j ¼ d. The a refers to the residuals that are needed to find the difference between the observed and predicted values a ¼ y À f x ð Þ so that the following expansion can be obtained (Trevor et al. 2009 Appendix 2: Acceptance-rejection method In the original version of this technique, the goal is to simulate a random variable X with probability density function f x ð Þ (Bishop 2006). Let us assume that another density function g x ð Þ is known and that there is a constant value C such that: 8x; f ðxÞ CgðxÞ The acceptance-rejection technique consists of generating points X; Y ð Þ that are uniformly distributed under the curve C Â g x ð Þ by setting: • X as a random variable with probability density g x ð Þ • Y ¼ U Â C Â g x ð Þ with U a uniform variable in 0; 1 ½ .
Then, one removes the points X; Y ð Þ that are not under the curve f x ð Þ. It can be shown that the abscissa X of the nonrejected points follows the target distribution with density f x ð Þ. To restrict the number of attempts, it is advisable to choose a constant C that is as small as possible.
Appendix 3: ''Von Neumann'' acceptancerejection approach The algorithm for Von Neumann acceptance-rejection sampling is as follows: • Find the cumulative distribution function F x ð Þ. • Find the value x max such that F x max ð Þ ¼ 1 À 1 N , where N is the size of the desired realization.
• Find f max such that f x ð Þ\f max for all 0\x\x max , or if it goes beyond the maximum, the neglected probability is less than 1=N.
• Generate a random point (x i ; f i Þ such that 0\x i \x max and 0\f i \f max . • Evaluate the distribution function f x i ð Þ. If f i \f x i ð Þ, accept f i ; otherwise, reject it.
• Continue iterating the previous two steps until N values x i have been found.

Appendix 4
See Figs. 17,18,19,20,21,22,23,and 24. Fig. 17 Lagged scatter plot for a separation distance less than 10 m (omnidirectional calculations). The low correlation coefficient shows that there is no dependency between Iron and Silica in Hematite and Itabirite, respectively, at short distances  For brevity, only the contact analysis of the proposed cosimulation approach is illustrated Fig. 24 Histogram reproduction of the simulation results obtained from the proposed hierarchical cosimulation algorithm. Left: original declustered distribution; right: realization #1