Introduction

A mineral prospectivity model defines and delineates areas favorable for a mineral deposit type of interest. Geographical information systems (GIS) provide a robust platform for integrating digital map data into a single map that defines the mineral potential of an exploration terrain. According to Bonham-Carter (1994), prospectivity mapping techniques can be divided into two main approaches: (1) empirical or data-driven and (2) conceptual or knowledge-driven. Empirical models require known examples of mineral deposits within the study area. Data-driven techniques include, e.g., weights of evidence, logistic regression and artificial neural networks (Bonham-Carter 1994). In turn, a conceptual prospectivity model can be constructed without prior knowledge of the existence of mineral deposits. Knowledge-driven techniques include, e.g., fuzzy logic, evidential belief functions, the Dempster–Shafer models and decision tree approaches (Carranza 2008). These two approaches have been combined in several earlier studies by incorporating fuzzy logic with other techniques in order to avoid loss of information (e.g., Brown et al. 2003; Porwal et al. 2003, 2004, 2006; Nykänen et al. 2008). It has also been observed that hybrid models are more commonly used than purely empirical or purely conceptual models (Hronsky and Groves 2008).

This paper combines empirical and conceptual approaches by using a statistical technique, the receiver operating characteristics (ROC) method, to optimize both the rescaling of input datasets and the data integration when using the fuzzy logic technique for prospectivity mapping. The proposed dynamic, hybrid approach aims at updating the models as new data are added during an exploration-related modeling project. Moreover, through the developed approach a classical conceptual or knowledge-driven fuzzy logic method can be transformed into a data-driven method. In comparison with the data-driven weights-of-evidence (WofE) method, the advantage of the fuzzy logic method is that there is no need to classify the data into a limited number of classes, thereby avoiding the loss of information due to the generalization required by, e.g., WofE (e.g., Nykänen et al. 2008; Yousefi and Nykänen 2016).

Study area

The Peräpohja Belt (PB) is located in the Northern Fennoscandian Shield, adjacent to the Arctic Circle (Fig. 1). The PB comprises a greenschist to amphibolite facies, complexly deformed supracrustal sequence of variable quartzites, mafic volcanic rocks and volcaniclastic rocks, carbonate rocks, black shales, mica schists and graywackes ca. 2.44 to ~1.91 Ga (Fig. 2). These formations were deposited on the Archean basement and 2.44 Ga layered intrusions, during the multiple rifting of the Archean basement (2.44–1.92 Ga). Later intrusive units comprise 2.20–2.13 Ga gabbroic sills or dikes and 1.98 Ga A-type granites. Metamorphism and complex deformation of the PB took place during the Svecofennian orogeny (1.9–1.8 Ga) and was followed by intrusions of post-orogenic granitoids (1.81–1.77 Ga) (Perttunen and Vaasjoki 2001; Ranta et al. 2015). The belt is characterized by fold-and-thrust belt style deformation leading to numerous laterally continuous, doubly plunging fault-bound synforms and antiforms (Piippo et al. 2015). The map-scale structures may be linked to N–S compression, with later orthogonal overprint of less-significant magnitudes (Lahtinen et al. 2015). The recent mineral exploration activities have recognized several gold-bearing mineral occurrences within the PB. For example, the Rompas-Rajapalot Au mineralization occurs in an area of approximately 80–100 km2 in the northern part of the PB (Fig. 2). In this area, the high-grade gold zones are hosted by calc-silicate-bearing albitites, carbonate-rich rocks, quartzites and amphibolites of volcanic origin in the vicinity of 1.78 Ga tourmaline-rich, undeformed, post-orogenic granite stocks and pegmatites, as well as migmatites with similar leucosome ages along the southern boundary of the Central Lapland Granitoid Complex (Ranta et al. 2015, 2016; Vanhanen et al. 2015). Widespread occurrences of carbonaceous material-rich sulfidic schist also characterize this area. 
Native gold is confined to sulfide-rich disseminations and thin (1–30 cm) quartz-tourmaline veins in Ca-depleted and Mg–Fe-rich zones of anthophyllite- and cordierite-bearing rocks with intense chlorite, biotite, tourmaline and talc alteration in the eastern part of the area (e.g., the Palokas prospect; Ranta et al. 2016). In the Rompas prospect, in the western part of the area, coarse-grained visible gold locally occurs in very high-grade (up to >10,000 g/t Au), sulfide-poor pockets at sites where late brittle fractures with carbonate infillings cut coarse-grained uraninite and pyrobitumen in metamorphosed and strongly folded calc-silicate veins hosted by amphibolite (Molnár et al. 2016a). Boron isotope data from tourmaline-rich alteration zones in various parts of the Rompas-Rajapalot area and from tourmaline granites (Ranta et al. 2016) and from the Rompas area (Molnár, unpublished data) suggest a uniform, most probably magmatic origin of hydrothermal fluids (Ranta et al. 2016). In concordance with this hypothesis, Re–Os dating of molybdenite from Palokas and Pb–Pb model ages from the Rompas area indicate 1.78 Ga as the age of mineralization (Molnár et al. 2016a, b).

Figure 1

Location of the study area. Generalized bedrock map is modified from Koistinen et al. (2001)

Figure 2

Bedrock map of Peräpohja Belt based on Bedrock of Finland—DigiKP (2016)

Exploration model and datasets

The first step in a prospectivity modeling project is the definition of the exploration model that will be used to select relevant data for the modeling. It may be based on descriptive or conceptual mineral deposit models, or on practical exploration expertise (Nykänen et al. 2015). A classical mineral systems model (Wyborn et al. 1994; Knox-Robinson and Wyborn 1997; Hronsky and Groves 2008; McCuaig et al. 2010) would include data that can be used as proxies for five key elements of a mineral systems model (e.g., McCuaig et al. 2010; Kreuzer et al. 2015): (1) source of metals, fluids and heat; (2) transport channels or conduits; (3) physical, chemical, mechanical or other traps; (4) deposition; and (5) preservation. Here, we chose to use a combination of the mineral deposit model and the practical exploration model. This method can be called a Venn-diagram approach (cf. Hronsky and Groves 2008), in which we identify areas where a number of critical parameters in the targeting model intersect. We selected appropriate data on the basis of their expected exploration importance (Table 1), but we also emphasize their linkage to various geological processes that can be connected to a mineral systems model (e.g., Groves et al. 1998; McCuaig and Kerrich 1998; Goldfarb et al. 2001; Nykänen et al. 2008) relevant to the observations at Rompas and Rajapalot. This model emphasizes the significance of late- to post-orogenic granite intrusions (sources of heat) and carbonaceous material and sulfide-rich metasedimentary rocks (possible sources of metals), as well as post-peak metamorphic brittle structures (transportation pathways of gold-bearing fluids) and occurrences of reactive rock units (Ca–Mg–Fe-silicate rocks) and U-anomalies (traps). The geochemical data reflect sources of metals by indicating either the presence of the mineralization system itself or the alteration halo around mineral deposits. 
The structural interpretations and geophysical data mostly reflect the presence of transport conduits of mineralizing fluids (faults and deformation zones) and also possible structural traps for deposition of gold (antiforms). By expressing and combining these parameters in GIS as digital maps, we can highlight areas where most of these criteria meet.

Table 1 Exploration criteria and input data

The datasets used for this study include derivatives from a 1:200,000 scale geological map, high-resolution airborne geophysics, a ground gravity survey and a regional till geochemical survey. An interpretation of the main geological structures, based on airborne geophysics and a preliminary structural synthesis of field mapping (Piippo et al. 2015), was used to derive evidence layers for prospectivity modeling. The included features are: (1) major deformation zones, which demarcate PB-scale structural discontinuities associated with major abrupt transitions in the structural geometry and/or lithology of the supracrustal sequence; (2) inferred major deformation zones interpreted as blind continuations of the recognized zones; (3) large faults typically associated with asymmetric thrust-style folds; and (4) traces of map-scale antiforms, which were recognized from the dip data and from the patterns of the lithological maps (Fig. 2). We calculated a proximity grid with 50 m × 50 m cell size for these structural interpretation maps.
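As an illustration of what a proximity grid contains, the following minimal sketch computes, for every raster cell, the Euclidean distance to the nearest cell flagged as a structure. The paper does not state which GIS tool or distance algorithm was used, so the brute-force approach, the function name `proximity_grid` and the small test mask are assumptions made only for this example; only the 50 m cell size comes from the text.

```python
import numpy as np

def proximity_grid(structure_mask, cell_size=50.0):
    """Distance (in metres) from each cell centre to the nearest structure cell.

    structure_mask: 2-D boolean array, True where a rasterized structure
    (fault, deformation zone, antiform trace) is present. Hypothetical input.
    Brute force: fine for small grids, too memory-hungry for large rasters.
    """
    mask = np.asarray(structure_mask, dtype=bool)
    rows, cols = np.indices(mask.shape)
    cells = np.column_stack([rows.ravel(), cols.ravel()])
    targets = np.column_stack(np.nonzero(mask))
    # Pairwise offsets between every cell and every structure cell.
    diff = cells[:, None, :] - targets[None, :, :]
    dist_cells = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return dist_cells.reshape(mask.shape) * cell_size

# Hypothetical 5 x 5 raster with a N-S structure along the middle column.
mask = np.zeros((5, 5), dtype=bool)
mask[:, 2] = True
proximity = proximity_grid(mask)
```

For operational rasters, a dedicated distance transform from a GIS or image-processing package would normally replace this quadratic-memory version.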

Finland has been covered by high-resolution airborne geophysical surveys flown at an average 40 m terrain clearance and 200 m line spacing (Airo 2005). The airborne system comprised magnetic, electromagnetic and gamma radiation sensors, which simultaneously measured data during flight. In the current study area, the airborne surveys were carried out between 1977 and 2004 with the flight altitude from 31 to 38 m. The flight direction was E–W or N–S.

The airborne magnetic data acquisition system used either a proton (1977–1991) or a cesium (1992–2004) magnetometer installed on the rear boom of the aircraft. Secular and other magnetic field variations were corrected according to data from a base-station registration. The line data were levelled using an in-house program, and final levelling was performed using tie lines. The frequency domain electromagnetic (EM) system measured in-phase and quadrature components using transmitter and receiver coils located either in the wingtips of the aircraft (Twin Otter and Cessna) or, in the case of the DC-3 during the 1970s, in the nose and tail. In-phase and quadrature components can be further transformed to apparent resistivity values (Pirttijärvi 1995; Hautaniemi et al. 2005). In this study, apparent resistivity values calculated from 3.1 kHz EM data were used. The radiometric measurement unit consisted of a set of NaI detectors and a gamma-ray spectrometer (Airo 2005). The uranium window of the gamma-ray spectra was used in this study. Low resistivity zones and low magnetic zones were considered regional-scale alteration patterns following the reasoning of Nykänen and Salmirinne (2007) and Nykänen et al. (2008).

The gravity dataset was collected during 1945–1996 by the Finnish Geodetic Institute with a point density of one observation per 2.5 km2 (Kiviniemi 1980; Kääriäinen and Mäkinen 1997). This regional gravity dataset mainly reflects large intrusions, crustal-scale structures, lithological units, and fault and shear zones (Nykänen et al. 2015). For this study, we used the locations of maximum values of spatial gravity gradients obtained using the multiscale edge detection technique (Hornby et al. 1999), whereby so-called gravity worms were created (Lahti et al. 2014). The edge detection is automatically repeated at different upward continuation levels that model theoretical surveys performed at different heights above ground level.

The regional till geochemical survey was conducted in the 1980s (Salminen 1995) with a sampling density of one sample per 4 km2. The sampling was performed with a portable percussion drill equipped with a throughflow bit. The samples of chemically unaltered parent till were collected as a composite of 3–5 sub-samples, from an average depth of 1.5 m. The samples were dried and the <0.06 mm fraction sieved for analysis. A hot aqua-regia digest was used and Al, Ba, Ca, Co, Cr, Cu, Fe, K, La, Li, Mg, Mn, Mo, Ni, P, Pb, Sc, Sr, Th, Ti, V, Y, Zn and Zr were determined with ICP-AES. In addition, Au, Te and Pd were analyzed with graphite furnace AAS (Kontas 1981; Kontas et al. 1990). Distance of glacial transportation is less than 1 km within the current study area according to Sarala (2015) based on glacial terrain type on regional scale.

Fuzzy logic data integration and ROC validation

The fuzzy logic technique, based on the fuzzy set theory introduced by Zadeh (1965), is a flexible method for emulating the decision-making process during mineral exploration at various scales, from global to target scale. The membership value of a fuzzy set is defined on a continuous scale from full membership to full non-membership (e.g., from prospective to non-prospective or favorable to non-favorable), and can be expressed as follows: a fuzzy set A in X is a set of ordered pairs:

$$ A = \left\{ \left[ x, \mu_A\left( x \right) \right] \mid x \in X \right\} $$
(1)

where X is a collection of objects and µA(x) is the membership function of x in A, which can be linear or non-linear. This means that µA(x) defines the degree of membership of x in A.

As mentioned earlier, a prospectivity mapping analysis starts by defining an exploration model. This can be based on a mineral systems model (e.g., McCuaig et al. 2010), which forms the basis for the selection of evidential (supporting) datasets, or, alternatively, on practical exploration expertise (e.g., Nykänen et al. 2015). In the second phase of a prospectivity mapping analysis, the selected data are preprocessed into meaningful map patterns, which can be called anomaly or evidential maps. In this study, a set of map data was first selected based on previous studies on orogenic gold prospectivity in Central Lapland (Nykänen and Salmirinne 2007; Nykänen et al. 2008) and modified to fit the intrusion-related model proposed here. Then, each map was rescaled from zero (non-prospective) to one (prospective) based on subjective expert opinion. The geochemical and geophysical datasets described in Table 1 were rescaled using the following fuzzy membership function (Eq. 2), adopted from Tsoukalas and Uhrig (1997):

$$ \mu (x) = 1/(1 + (x/f_2)^{( - f_1)} ) $$
(2)

where f1 = spread (magnitude ranging from 1 to 10) and f2 = midpoint (ranging from the minimum to the maximum of the input data). With Eq. 2 as written, a negative spread gives a descending function (membership decreases as the input value increases) and a positive spread gives an ascending one. We used a negative (descending) spread value for the apparent resistivity, magnetics and the proximity to structures, where low values are considered favorable, and a positive (ascending) spread value for the geochemical data. The spread parameter defines the shape of the function, and the midpoint defines the input value at which the fuzzy membership equals 0.5. The magnitude of the spread was set at 3–8 so that the functions had moderate steepness. By changing the spread value, it is possible to create several input maps that can be tested in the data integration process. As the midpoint value, we used the mean value of each dataset.
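The behavior of Eq. 2 can be checked with a minimal sketch such as the following. The function name `fuzzy_membership` and the sample values are hypothetical and chosen only for illustration; the sketch also assumes strictly positive input values and a positive midpoint, as required by the power term in Eq. 2.

```python
import numpy as np

def fuzzy_membership(x, midpoint, spread):
    """Eq. 2: mu(x) = 1 / (1 + (x / midpoint) ** (-spread)).

    Assumes x > 0 and midpoint > 0. A positive spread yields membership
    that rises with x; a negative spread yields membership that falls.
    """
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + (x / midpoint) ** (-spread))

# Hypothetical input values and midpoint, for illustration only.
values = np.array([0.5, 1.0, 2.0, 4.0])
midpoint = 2.0

rising = fuzzy_membership(values, midpoint, spread=5)    # high values favorable
falling = fuzzy_membership(values, midpoint, spread=-5)  # low values favorable
```

In this sketch the membership is exactly 0.5 where the input equals the midpoint, and the rising and falling curves for the same spread magnitude are complements of each other.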

When using a logistic-type function such as Eq. 2, there is no need to reclassify the input data into discrete classes stored as integer grids; instead, continuous values can be used in floating-point grid format (e.g., Nykänen et al. 2008; Yousefi et al. 2012, 2014; Nykänen et al. 2015; Yousefi and Nykänen 2016).

The rescaled evidential maps listed in Table 1 were integrated using a variety of fuzzy operators (Bonham-Carter 1994) to produce a single prospectivity map. This map defines the most favorable areas for gold exploration, taking into account the input maps selected and processed by an expert. The model is documented in a flowchart describing the datasets and the operators (Fig. 3). We used two different operators. (1) The “fuzzy AND” operator is equivalent to logical intersection in GIS terminology. It returns the minimum value of the inputs at each location and is also called the “minimum operator” (Bonham-Carter 1994). The “fuzzy AND” operator restricts the most favorable areas to narrow zones where all the selected element concentrations are considered favorable. This is useful with the sparse geochemical data but can also lead to false negative results due to spatially incomplete data. (2) The “fuzzy gamma” operator is a combination of the “fuzzy algebraic sum” and the “fuzzy algebraic product” (Bonham-Carter 1994). The gamma value used in this paper varied from 0.6 to 0.9.
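The two operators can be sketched as follows, using the standard definitions in which the fuzzy algebraic product is the product of the memberships, the fuzzy algebraic sum is one minus the product of their complements, and the gamma operator is (algebraic sum)^γ × (algebraic product)^(1 − γ). The function names and the two small input arrays are hypothetical and serve only as an illustration; this is not the workflow implementation used in the study.

```python
import numpy as np

def fuzzy_and(layers):
    """Cell-wise minimum of the input membership layers."""
    return np.min(np.stack(layers), axis=0)

def fuzzy_gamma(layers, gamma):
    """(algebraic sum) ** gamma * (algebraic product) ** (1 - gamma)."""
    stack = np.stack(layers)
    alg_product = np.prod(stack, axis=0)
    alg_sum = 1.0 - np.prod(1.0 - stack, axis=0)
    return alg_sum ** gamma * alg_product ** (1.0 - gamma)

# Two hypothetical membership layers with two cells each.
layer_a = np.array([0.2, 0.8])
layer_b = np.array([0.6, 0.9])

and_map = fuzzy_and([layer_a, layer_b])
gamma_map = fuzzy_gamma([layer_a, layer_b], gamma=0.75)
```

With gamma = 0 the gamma operator reduces to the algebraic product and with gamma = 1 to the algebraic sum, so intermediate values trade off the "decreasive" and "increasive" behavior of the two.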

Figure 3

Fuzzy logic model flowchart

Then, we used the ROC technique (Obuchowski 2003; Fawcett 2006) for statistical validation of the prospectivity model. Previous studies (e.g., Robinson and Larkins 2007; Nykänen et al. 2015) have demonstrated that the ROC technique can be used for spatial predictive model validation. This requires that an adequate number of known examples of the deposit type in question exist. The locations of known deposits represent “true positive” sites. In addition, a set of “true negative” sites representing areas where no mineral occurrences are found is required to generate the ROC curves. The locations of the active exploration sites within the Peräpohja Belt were used to validate the model results. We used the locations of 29 gold exploration drilling sites and 23 exploration license areas, totaling 52 sites, as true positive sites for validation. Furthermore, to avoid bias due to unequal numbers of true positive and true negative sites, we generated random points within the study area equal in number to the true positive sites. Nykänen et al. (2015) suggested using the locations of other deposit types or randomly selected locations within the study area as true negative sites. In this paper, we propose that instead of using a single set of random points, it is preferable to test the effect of randomization by generating several sets of random points and calculating the area under the ROC curve (AUC) for each set. Therefore, we generated 10 sets of random points. The resulting mean, median and standard deviation (std) values of the AUC are reported in Table 2.
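The repeated-randomization idea can be sketched as follows. This is not the in-house code used in the study: the synthetic prospectivity grid, the positions of the 52 positive cells, the function names and the pairwise AUC estimate (the fraction of positive/negative pairs ranked correctly, counting ties as one half) are all assumptions made for illustration.

```python
import numpy as np

def pairwise_auc(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def auc_over_random_negatives(grid, pos_cells, n_sets=10, seed=0):
    """AUC for n_sets random negative sets, each as large as the positive set.

    Random cells are drawn over the whole grid, so a random point may
    occasionally coincide with a positive site (not excluded here).
    """
    rng = np.random.default_rng(seed)
    rows, cols = zip(*pos_cells)
    pos_scores = grid[list(rows), list(cols)]
    aucs = []
    for _ in range(n_sets):
        r = rng.integers(0, grid.shape[0], size=len(pos_cells))
        c = rng.integers(0, grid.shape[1], size=len(pos_cells))
        aucs.append(pairwise_auc(pos_scores, grid[r, c]))
    return np.array(aucs)

# Synthetic 50 x 50 prospectivity grid and 52 hypothetical positive cells
# placed where the synthetic values are highest.
grid = np.linspace(0.0, 1.0, 2500).reshape(50, 50)
pos_cells = [(49, c) for c in range(50)] + [(48, 0), (48, 1)]

aucs = auc_over_random_negatives(grid, pos_cells, n_sets=10)
summary = (aucs.mean(), np.median(aucs), aucs.std(ddof=1))
```

The spread of the 10 AUC values gives a direct estimate of how much the validation result depends on the particular random points drawn.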

Table 2 Area under a ROC curve (AUC) values for evidence data and output models (N = 10)

A ROC curve (Fig. 4) is a plot of the sensitivity (true positive rate: TP/(TP + FN)) on the y-axis against 1–specificity (false positive rate: FP/(FP + TN)) on the x-axis, where TP is true positive, FP false positive, TN true negative and FN false negative. The AUC can be used as a measure of the accuracy of a diagnostic test and can also be used to measure the performance of a spatial predictive model, as in this paper. The AUC values may vary from 0 to 1. An AUC value of 1 indicates a perfectly accurate result, with a sensitivity value of 1 and a 1–specificity value of 0. A totally random model would result in an AUC value of 0.5, and the curve would follow the chance diagonal. The ROC curve and AUC calculations were made using in-house Python code. The integrated data for geochemistry (Fig. 5a), geophysics (Fig. 5b) and structures (Fig. 5c) result in higher AUC values than most of the input datasets individually. Geophysical evidence maps generally do not perform as well as the geochemical and geological evidence maps (Table 2).
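The construction of a ROC curve and its AUC from these definitions can be sketched as below. The in-house code itself is not published in the paper, so this is only a minimal illustration under assumed names (`roc_points`, `auc`) and with hypothetical scores: each distinct model value is used as a threshold, the true and false positive rates are computed at that threshold, and the area is obtained with the trapezoidal rule.

```python
import numpy as np

def roc_points(pos_scores, neg_scores):
    """Return (fpr, tpr) arrays obtained by sweeping thresholds high to low."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    thresholds = np.unique(np.concatenate([pos, neg]))[::-1]
    tpr, fpr = [0.0], [0.0]
    for t in thresholds:
        tpr.append(np.mean(pos >= t))  # TP / (TP + FN)
        fpr.append(np.mean(neg >= t))  # FP / (FP + TN)
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical model values at three positive and three negative sites.
fpr, tpr = roc_points([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
area = auc(fpr, tpr)
```

In this small example eight of the nine positive/negative pairs are ranked correctly, which is why the area equals 8/9.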

Figure 4

Example of ROC curves for fuzzy models

Figure 5

(a) Combined geochemistry (AUCmed = 0.80 ± 0.05), (b) combined geophysics (AUCmed = 0.74 ± 0.04), and (c) combined geology (AUCmed = 0.83 ± 0.04)

Results

As mentioned earlier, for the fuzzy combinations we used two different operators: fuzzy AND and fuzzy gamma. The minimum operator fuzzy AND, which is equivalent to the logical intersection of the input sets, always returns the minimum value of the input sets and can be considered a “conservative” choice. The fuzzy gamma operator can be considered a flexible compromise between the “increasive” fuzzy algebraic sum and the “decreasive” fuzzy algebraic product operators (Bonham-Carter 1994). The optimal gamma value for each fuzzy gamma combination was chosen by selecting the one that returns the highest AUC value in the ROC analysis.
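Selecting gamma by AUC amounts to a small search over candidate values, which might look like the sketch below. The candidate list, the random input layers, the site indices and all function names are hypothetical; the paper does not report the step size or the code used, and the AUC here is estimated as the share of correctly ranked positive/negative pairs.

```python
import numpy as np

def fuzzy_gamma(layers, gamma):
    """(algebraic sum) ** gamma * (algebraic product) ** (1 - gamma)."""
    stack = np.stack(layers)
    alg_sum = 1.0 - np.prod(1.0 - stack, axis=0)
    alg_product = np.prod(stack, axis=0)
    return alg_sum ** gamma * alg_product ** (1.0 - gamma)

def pairwise_auc(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def best_gamma(layers, pos_idx, neg_idx, candidates):
    """Return the candidate gamma with the highest AUC and all AUC values."""
    auc_by_gamma = {}
    for g in candidates:
        combined = fuzzy_gamma(layers, g)
        auc_by_gamma[g] = pairwise_auc(combined[pos_idx], combined[neg_idx])
    return max(auc_by_gamma, key=auc_by_gamma.get), auc_by_gamma

# Hypothetical flattened membership layers and site indices.
rng = np.random.default_rng(1)
layers = [rng.random(200), rng.random(200)]
pos_idx = np.arange(0, 20)
neg_idx = np.arange(20, 40)
candidates = [0.6, 0.7, 0.8, 0.9]

gamma_opt, auc_by_gamma = best_gamma(layers, pos_idx, neg_idx, candidates)
```

Because the same sites are used both to choose gamma and to report the AUC, a search of this kind makes the model partly data-driven, which is the hybrid character discussed in the text.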

For simplification and allowing comparability, we have classified the resulting fuzzy combination maps into 10 classes using equal intervals from 0 to 1. This enables visual comparison between the results in addition to the numerical validation AUC values given by the ROC analysis.
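A minimal sketch of such an equal-interval classification is given below. The function name and sample values are hypothetical, and the convention that each class includes its lower bound (with the value 1.0 assigned to the top class) is an assumption, since the paper does not specify how class boundaries were handled.

```python
import numpy as np

def equal_interval_classes(values, n_classes=10):
    """Map values in [0, 1] to classes 1..n_classes of equal width.

    Each class includes its lower bound; the value 1.0 goes to the top class.
    """
    values = np.asarray(values, dtype=float)
    classes = np.floor(values * n_classes).astype(int) + 1
    return np.clip(classes, 1, n_classes)

# Hypothetical fuzzy combination values.
classes = equal_interval_classes([0.0, 0.05, 0.55, 0.99, 1.0])
```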

The geochemical evidence dataset was used to describe the presence of mineralizing processes, either as a direct observation of gold, as the pathfinder element Te, or as metals reflecting the presence of sulfide minerals (Fe, Cu, Co and Ni). In addition, K was used as a proxy for potassic alteration zones (Table 1). The fuzzy combination of geochemical evidence (Fig. 5a) was achieved using the fuzzy gamma operator with a gamma value of 0.75 for the combinations of Au–Te and Fe–Cu–Co–Ni (Fig. 3). The final combination used a gamma value of 0.6. The AUC values for the geochemical evidence range from 0.63 (Fe) to 0.84 (Te) (Table 2).

The geophysical evidence datasets were used as proxies for alteration zones and structures that can be considered as physical traps for mineralization. We selected electromagnetics, gamma radiation, magnetic field total intensity and gravity. The fuzzy combination of geophysics (Fig. 5b) was achieved by combining these maps using fuzzy gamma operator with gamma value of 0.85. All the geophysical evidence has relatively low AUC values below 0.70 (Table 2). However, the AUC value for fuzzy gamma combination of these is 0.76, which can be considered as a moderate result.

The geological evidence used in this study included only the structural interpretations, which are based on field observations and high-resolution airborne geophysics. These structures were classified into two main groups: major structural discontinuities (faults and deformation zones) and antiforms. These can be considered to act as either pathways or traps for the mineralizing fluids during the ore-forming processes. We used the fuzzy gamma operator with a gamma value of 0.75 to combine these datasets (Fig. 5c). The AUC values ranged from 0.69 to 0.76 for the inputs, and their combination returns an AUC value of 0.83.

The final prospectivity maps were created using the fuzzy gamma and fuzzy AND operators. For the fuzzy gamma model, we used a gamma value of 0.7. Both models return a high AUC value of 0.86 with a standard deviation of only 0.02. The current exploration targets coincide with the high favorability areas and are fairly localized in a small area (Fig. 6).

Figure 6

(a) Fuzzy AND (AUCmed = 0.86 ± 0.02) and (b) fuzzy gamma (AUCmed = 0.86 ± 0.02)

Discussion

The validation of a spatial predictive model is crucial, and it is important to understand the performance and the level of confidence of the model. When dealing with a data-driven technique, we have to leave out a portion of the training sites to be used as validation sites. For validation of knowledge-driven models, we can use all the known occurrences if they exist. The demonstrated validation techniques include cross-validation (Agterberg and Bonham-Carter 2005; Chung and Fabbri 2008; Fabbri and Chung 2008), jack-knifing (Bonham-Carter 1994; Nykänen and Salmirinne 2007) and ROC validation (Robinson and Larkins 2007; Nykänen et al. 2015). The main difference between these validation techniques is that the ROC technique considers not only the true positive sites (known deposit sites) but also the true negative sites (known non-deposit sites). It is a great challenge to define true negative sites for a spatial model. Nykänen et al. (2015) suggested using random points to represent the true negative sites, and we have modified this approach by running the ROC analysis for 10 successive sets of true negative sites with an identical set of true positive sites. This gives us an estimate of the variability caused by the use of random points. This variation ranges from 2 to 8% in AUC values. The lower variation occurs within the combined models and the higher variation within the input datasets. It seems that, even though the input data might have low AUC values, their fuzzy combinations result in higher AUC values. This suggests that, even though an evidence dataset does not have a clear spatial association with the known deposits, it can be used successfully in a fuzzy combination together with other evidence. A purely data-driven approach like WofE might therefore have neglected such evidence data. In addition, the knowledge-driven approach allows testing of different mineral systems models in a flexible manner without being biased by the use of training sites.

The use of the mineral system concept as a basis for a mineral prospectivity model is also challenging. According to McCuaig et al. (2010), the main challenge is that the geological processes themselves cannot be mapped; only the results of these processes can be mapped, and these need to be translated into mappable features in the geoscience datasets. The geoscience map data are thus used as proxies for the mineral systems parameters and, ultimately, for the geological and ore-forming processes. A geological map is essentially a model, i.e., a generalization of the real world. This also introduces uncertainty into prospectivity models when data derived from a geological map are used as an input to prospectivity modeling. Lisitsin et al. (2013) postulate that when critical processes related to a mineral systems model can only be recognized indirectly by proxy, there is an additional uncertainty in the representativeness of the proxies expressed by evidential maps. Furthermore, to be able to assess the uncertainty, Lisitsin et al. (2013) propose the method of probabilistic fuzzy logic prospectivity modeling, and they used Monte Carlo simulations to assess the uncertainty of model outputs. The current paper proposes another approach, which is to use ROC or another validation technique to ensure the spatial association between the evidential data and the deposit type in question. This is, however, impossible if no known examples exist within the study area, as in a greenfields exploration terrain.

According to the prospectivity models presented in this paper, there are vast areas within the current study area that can be considered favorable for Rompas-Rajapalot-type gold deposits but nevertheless lack active mineral exploration. These areas might include new exploration targets or, alternatively, if they remain barren, we have missed some key parameters in the model that restrict the location of gold deposits to the vicinity of the current exploration camp of Rompas-Rajapalot. One possible missing key parameter is the post-orogenic granitoids suggested by Molnár et al. (2016a). These granitoids, however, are not properly or accurately depicted in the present geological map and remain enigmatic until new relevant data have been acquired either by drilling or geophysical surveys and appropriate data processing and modeling have been conducted. However, in this paper, the potassic alteration zones defined by K anomalies in till geochemistry together with U radiation anomalies can be considered as proxies for these granitoids.

Conclusions

The Au-U mineralization at Rompas is hosted within deformed and metamorphosed calc-silicate veins enclosed within mafic volcanic rocks and contains both U-bearing zones without gold and very high-grade (>10,000 g/t Au) gold pockets with uraninite and uraninite-pyrobitumen nodules. The disseminated Au mineralization at Rajapalot (8 km east of Rompas) is confined to Fe-rich Ca–Mg–silicate (skarn) rocks. The exploration criteria based on a mineral system concept were translated into a fuzzy logic prospectivity model including the following datasets:

  • data derived from regional till geochemistry (Fe, Cu, Co, Ni, Au, Te, K);

  • high-resolution airborne geophysics (magnetic field total intensity, apparent resistivity, gamma radiation);

  • ground gravity; and

  • regional bedrock map (structures).

The current exploration targets for gold were used as the examples of known mineral occurrences to validate the knowledge-driven mineral prospectivity model using ROC validation technique. Final prospectivity maps presented in this paper define well the known Rompas-Rajapalot-type gold occurrences with the AUC scores above 0.8. The ROC technique using random points as true negative sites is a suitable validation technique for spatial models and can be used in model optimization.