1 Introduction and related work

Surveying tasks require substantial technical and economic resources for the effective and accurate development of research activity and successful implementation of management plans. The use of geographic information systems (GISs) has greatly contributed to simplifying the mapping process in a wide variety of fields (archaeology, [1]; environmental sciences, [2]; engineering, [3]; geology, [4]; risk assessment, [5]). However, the quality of the final product often depends on the expertise and technical skills of the surveyor. In addition, digital image surveying involves a certain degree of subjectivity depending upon different intrinsic factors, such as image quality, panchromatic hue and resolution. Recent advances in augmented reality (AR) have considerably reduced the human factor, as the interpretation of geospatial data can incorporate a 3D graphical representation of datasets [6,7,8]. Nevertheless, these techniques only simplify the viewpoint of the surveyed feature without providing any information about the morphological expression of the object.

Fig. 1

Summary of the methodology followed in this experiment

Growing interest in combining GIS and AR methodological approaches has led to the implementation of surveying methods, providing outstanding results in a wide variety of contexts and disciplines. Hence, various research fields, prominently including spatial analysis, archaeology, mining, geology, environmental sciences, civil engineering, and education, have benefited from reliable maps of landforms, features and processes [9,10,11,12,13,14]. Nonetheless, the potential of these methods still has a reduced impact on the surveying results since the implicit subjective factor of identification and recognition of interesting features remains unaddressed.

For instance, image-based object detection is an important application of deep learning techniques. The precise multistage detection of small objects in target regions [15] remains a challenge in the field of computer vision since the orientations are arbitrary, the scales vary, and the shapes are irregular. Advances in data acquisition technologies, such as the appearance of unmanned aerial vehicles (UAVs), have raised interest in various fields in automating tasks that were previously carried out manually [16, 17]. Now, it is possible to capture a large number of high-resolution images that can be rapidly preprocessed and analysed using deep learning techniques [18]. In particular, the identification and description of archaeological and mining patterns makes it possible to simplify the study, identification and mapping of elements of interest in these fields [19,20,21,22,23].

The implementation of methods based on a bidirectional multibranch pyramidal network (BPMFPN) for the detection of objects in real time by UAVs has been demonstrated as robust to scale changes [24,25,26].

The common use of different models such as Inception-v3, ResNet-50, VGG-16, VGG-19 and Xception to solve problems related to crops and plants [27] reaches a high accuracy of up to 93.82%, suggesting a promising future for this type of work [19, 28,29,30]. Similar examples in this field have involved the use of a new decision tree (DT) combining texture features (mean and variance) and spectral features (TS-DT) for rice plant detection from UAV images [31]. Many other examples in agriculture have recently been developed [32,33,34] using UAVs, spectral–spatial techniques and artificial intelligence [35], and these techniques have also been studied in residential environments by Lin et al. [36].

New advances have been made in the analysis of anthropic features for the identification of linear structures such as roads [37] and other elements by means of automatic detection from blurred images [38] or RGB-D maps [39].

In the last few years, the implementation of neural networks and UAV images in the mining field has grown [24]. Studies on this topic have also aimed at the identification of defects in rocks [40] and present-day mining areas based on unsupervised classification algorithms [41]. To date, however, no general studies on ancient mining, in particular Roman gold mining, have been developed for automatic detection using UAV images. In this paper, we explore the potential of deep learning to reduce the uncertainties produced during the exploration and mapping of ancient mining landscapes in NW Iberia. A method capable of automating the process of classifying different ancient and recent features has been applied to the identification of Roman gold mining remains. This methodological approach can help archaeologists, geomorphologists, and geologists identify mining patterns such as gold mines and hydraulic infrastructure and is able to differentiate them from other present-day anthropogenic elements such as roads and tracks based upon images captured with UAVs by means of artificial intelligence techniques. In our case, a dataset with old mining elements is created, and the main task is to develop an automatic detection method for these elements (crossroads, gold mines, canals).

The proposed model is capable of identifying and recognizing a pattern by alerting the user to the existence of a related pattern. The potential of this methodology relies on the rapid, cost-effective, and accurate surveying of archaeological and mining elements, which will contribute to facilitating the analysis often performed manually by scientists. A graphical abstract with the main steps of this project can be seen in Fig. 1.

Therefore, this paper makes several significant contributions to the field of mining research and cultural heritage preservation. First, we introduce a novel approach for identifying mining remains from UAV-derived images using convolutional neural networks (CNNs). This approach offers several advantages over traditional methods, including the ability to process large regions quickly and accurately, even in areas that are difficult to access. Convolutional neural networks are a type of deep learning neural network algorithm that is particularly well suited for image recognition tasks. They are capable of extracting complex features from images, such as edges, corners, and textures, and can learn to recognize patterns and objects in images with high accuracy, outperforming classical feature descriptors.

Second, we demonstrate the effectiveness of our approach through experiments that show recognition results close to 95% accuracy. Third, we create a new image dataset by obtaining UAV images, which can be used for future research in this field. Finally, we highlight the potential of artificial intelligence in this field and the implications of our work for the protection and study of cultural heritage sites. We believe that our contributions have significant implications for the field of mining research and cultural heritage preservation, and we hope that our paper will inspire further research in this area.

This paper is organized as follows. The research aim of this paper is highlighted in Section 2. An introduction to Roman gold mining infrastructure is presented in Section 3. The methods and materials are detailed in Section 4. In Section 5, the experiments carried out are explained, and the potential of artificial intelligence is discussed in Section 6. Finally, Section 7 gives the conclusions.

2 Research aim

The main objective of this work is to automatically identify the remains of ancient Roman mines and ancient water channels, differentiating them from roads and paths using UAV-derived images and deep learning techniques. The advantages provided by this automatic process are mainly the speed of computing large regions, including areas that are difficult to access, and the greater precision when classifying the elements due to the aerial perspective offered by UAVs. The techniques used correctly label 94.66% of the images evaluated, so they will allow archaeologists, geographers and geologists to improve the identification of new areas of cultural heritage for future study. The main goals of this project are as follows:

  • Development of an automatic system for the detection of ancient Roman remains.

  • Rapid mass detection capability using UAV imaging.

  • Differentiation between Roman canals and dirt roads.

  • Analysis of land with difficult road access.

Fig. 2

Patterns with crossroads (top) and without crossroads (bottom)

3 Roman gold mining infrastructure in NW Iberia

NW Iberia has one of the largest Roman gold mining complexes in the world, and most of its activity occurred between the 1st and 3rd centuries A.D. Its importance seems to be supported by some relevant Latin authors, such as Pliny the Elder, who wrote about this region (mostly Asturias, Galicia and León), stating that during this period, over 6.5 tons of gold were obtained. The study area is located in the Province of León and comprises an outstanding hydraulic system of channels and water tanks used for exploitation of auriferous materials [42]. The channel infrastructure in this sector exceeds 1,200 km [43]; the longest feature known is over 140 km, and it supplied water to the popular mine of Las Médulas, declared a World Heritage Site in 1997 [44, 45].

Channels were excavated using iron chisels and picks where the rock was brittle (e.g., slate), while hard rock (e.g., quartzite) was excavated using fire. In general, the channels were built in a standard pattern across their length, assuming average slopes below 1% and widths of 0.3-0.5 m in hard rocks or areas with scarce water resources across the upper exploitation stretches; widths over 1.20-1.50 m in brittle rocks and/or middle exploitation stretches; and widths over 3 m in sedimentary rocks or soils located in the lower sectors of exploitation, also known as leats [43]. Recent works carried out by Andrés Bercianos et al. [46] in the nearby area of Zamora estimated an average flow rate of 0.2-0.4 \(m^3/s\) considering water depths < 0.35 m. Some of these elements of the hydraulic infrastructure are still recognizable in the landscape, especially in remote mountainous areas that are difficult to access, where no recent anthropic impact due to farming or other human activities is recorded [47].

The channel selected in this study comprises a representative sector of the main hydraulic infrastructure that supplied water to the Las Médulas mine. It is known as the Peña Aguda channel and comprises a nearly 43 km structure with varying widths between 0.9 and 1.20 m across its length. This channel was excavated in slate rock, which preserves the notches of the chisels on its walls. Water was collected from the Truchillas River, close to the Natural Monument of Truchillas Lake in the La Cabrera Mountains and the nearby Teleno Mountains.

Different mining techniques were implemented to exploit the gold-bearing deposits. Thus, depending upon the geological characteristics (i.e., primary or secondary gold-placer deposits), the extraction technique and exploitation system were modified. This led to a complex infrastructure of mining features that are still recognizable in the landscape. Among the most remarkable structures, according to their geometrical and picturesque morphology, the convergent furrows stand out. This is a system that has been widely developed in areas with reduced slope and material thickness but large size. This system was established in secondary gold-placer deposits broadly represented throughout the northwestern Iberian Peninsula (León, [48]; Zamora, [49]; Salamanca, [47, 50]).

The case study of convergent furrows in Castrocontrigo features one of the most extensive systems of mines, occupying a total extent that exceeds 5 km in length and a surface over 195 ha. It was first studied by Sáenz and Vélez (1974) [51] and was recently explored using LiDAR technology by Fernández-Lozano et al. [48]. Recently, the implementation of UAVs and visual enhancement techniques allowed Fernández-Lozano and Gutiérrez-Alonso (2016) [52] to improve the identification and description of different mining elements. The Vallico-La Escoba mine, located in the easternmost sector of the Castrocontrigo exploitation site, constitutes an outstanding representative feature of this type of mining infrastructure. Therefore, it can be considered a mining analogue for analysis due to its reduced dimensions (18 ha) and good preservation.

Fig. 3

Pattern of Roman water channels

Fig. 4

Roman gold mining pattern

4 Methods and materials

4.1 Dataset

The drone used for this project is a rotary-wing vehicle with 4 engines and vertical take-off. The aircraft was remotely controlled or totally autonomous depending on the different types of flights that were carried out. To obtain the dataset, three flights were carried out, one for each pattern that we aimed to detect in this work. To control these flights, the Pix4DCapture mobile application was used, which allows us to select the parameters that interest us for each flight. As the elements to be recognized are crossroads, Roman water channels and Roman gold exploitations, three flights were carried out with different characteristics that are described below.

4.1.1 Capturing images with a crossroad pattern

This flight was made on November 11, 2020, in the area of Villimer, a village located east of León city. In this case, 2209 images were obtained with and without crossroads. Some images were captured autonomously at 55 and 100 metres of altitude, and others were captured manually at different heights. An example of these image classes can be seen in Fig. 2.

4.1.2 Capturing images with patterns of Roman water channels

The channels are 1.20 metres wide, and on curves, they increase in width to reduce the speed of the water. They have a slope of less than 1%. With these characteristics, the Romans ensured that the water did not have much force when it reached the washing areas, where it could otherwise wash away all the mineral sediment.

The flight was carried out on March 8, 2021, in the Corporales area, belonging to the Truchas municipality west of León. A total of 1646 images were obtained of the water transport channels leading to the Roman mining operations. It was necessary to follow the channels and capture the images from different perspectives, so the flight was carried out manually. Samples of the captured images are shown in Fig. 3.

4.1.3 Capturing images with a Roman gold mining pattern

The Roman gold mining pattern is similar to a broom (see Fig. 4) leading to the location where the Romans refined the material. In the more remote areas, thicker material was used, and where these two areas converged was a washing area where the finest gold could be obtained.

The flight to obtain images of the mine was carried out on March 8, 2021, over the Castrocontrigo mines in León, and 2,082 images were obtained. In this case, manual flights forming circles at a suitable height to capture all the gold mines were carried out.

At the end of this process, a total of 5,892 valid images of size \(4000\times 3000\) of the different patterns were obtained to train our model.

Fig. 5

Vanilla CNN representation

4.2 Convolutional neural networks

A convolutional neural network (CNN) is a deep learning neural network algorithm that takes a two-dimensional input, such as an image or a matrix, and is capable of extracting complex characteristics from the input data.

In general, a vanilla CNN has several convolutional layers, and it is also common to add MaxPooling layers to reduce the processing time and one or more fully connected layers to perform classification.

Convolutional layers work with different types of filters, which can change the image by blurring it, using the difference in contrasts to find edges, etc. However, these filters are not configured manually; the convolutional neural network itself is adjusted until it reaches the best performance in regard to recognizing the patterns.

In the CNN training process, the network adjusts the filter weights to produce an accurate feature map for each class.

The scheme of a vanilla CNN is shown in Fig. 5.
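The effect of a single convolutional filter followed by max pooling can be illustrated with a minimal sketch in plain Python. The toy image, the hand-written vertical-edge kernel and the function names are assumptions made for illustration only; in a trained CNN the kernel weights are learned rather than fixed by hand, and deep learning libraries compute this operation far more efficiently.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of rows) with stride 1 and no padding.

    As in most deep learning libraries, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Hypothetical 6x6 greyscale image: dark left half, bright right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
# Hand-written filter that responds to vertical edges (illustrative only).
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = conv2d_valid(image, vertical_edge)  # 4x4 map, strong response at the edge
pooled = max_pool_2x2(feature_map)                # 2x2 map after pooling
print(feature_map[0], pooled)
```

The feature map is non-zero only where the window straddles the dark-to-bright boundary, which is the sense in which a filter "finds edges".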

4.3 Classical methods

4.3.1 Nearest centroid

This classification model assigns to each new input, in this case a descriptor of the photo, the class of the training data whose centroid is closest to it. That is, given a set of labelled samples \(\{(x_1,y_1),...,(x_n,y_n)\}\), where \(x_i\) is the data and \(y_i\) is the label, this method computes the per-class centroids as represented in (1).

$$\begin{aligned} \mu _l = \frac{1}{\vert C_l\vert }\sum _{i\in C_l}x_i \end{aligned}$$
(1)

where \(C_l\) is the set of indices of the samples belonging to class l and \(\vert C_l\vert \) is its size. This model assigns to a new input x the class \(y=\arg \min _{l\in Y}\Vert \mu _l - x\Vert \).
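The nearest centroid rule can be sketched in a few lines of plain Python. The two-dimensional descriptors and the class names below are hypothetical values chosen for illustration; in the experiments the inputs would be LBP histograms.

```python
import math


def fit_centroids(samples, labels):
    """Compute the mean descriptor (centroid) of every class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the class whose centroid has the smallest Euclidean distance to x."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Hypothetical training descriptors (illustrative values only).
samples = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["channel", "channel", "mine", "mine"]

centroids = fit_centroids(samples, labels)
print(centroids)
print(predict(centroids, [1.0, 1.0]), predict(centroids, [4.0, 5.0]))
```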

4.3.2 Decision trees

A decision tree evaluates candidate splits of each node on all available variables and then selects the split that results in the most homogeneous subnodes. The observations are represented by the branches, and the classes are represented by the leaves. A tree is built by splitting the training set, constituting the root node of the tree, into subsets that constitute the successor children. The splitting uses a set of rules based on classification features [53].
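The selection of the most homogeneous split can be illustrated with the following sketch. It assumes Gini impurity as the homogeneity measure, which is one common choice and is not stated in the text; the feature values and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(samples, labels):
    """Try every feature and threshold; return (feature, threshold, impurity)
    of the split whose two subnodes have the lowest weighted impurity."""
    best = None
    n = len(samples)
    for f in range(len(samples[0])):
        for threshold in sorted({x[f] for x in samples}):
            left = [y for x, y in zip(samples, labels) if x[f] <= threshold]
            right = [y for x, y in zip(samples, labels) if x[f] > threshold]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Hypothetical two-feature descriptors (illustrative values only).
samples = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
labels = ["road", "road", "canal", "canal"]

split = best_split(samples, labels)
print(split)
```

A full tree would apply `best_split` recursively to each subnode until the nodes are pure or a depth limit is reached.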

4.3.3 Random forest

This method is a combination of predictor trees such that each tree depends on the values of a random vector that is sampled independently and has the same distribution for all the trees. It is a substantial modification of bagging that builds a large collection of de-correlated trees and then averages them [54]. The main idea is that Random Forest builds multiple decision trees and merges them together to obtain a more accurate and stable prediction.
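The two ingredients of the method, bootstrap sampling with random feature selection and the merging of the individual predictions, can be sketched as follows. This is a deliberately simplified illustration: each "tree" is a one-split stump on a single randomly chosen feature, the merge is a majority vote (the usual counterpart of averaging for classification), and the data, the number of trees and the seed are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples, labels, feature):
    """Fit a one-split tree on a single feature; returns a predict function."""
    majority = Counter(labels).most_common(1)[0][0]
    pairs = sorted(zip((x[feature] for x in samples), labels))
    best = None  # (misclassified, threshold, left_label, right_label)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical values
        l_lab, l_cnt = Counter(y for _, y in pairs[:k]).most_common(1)[0]
        r_lab, r_cnt = Counter(y for _, y in pairs[k:]).most_common(1)[0]
        errors = len(pairs) - l_cnt - r_cnt
        if best is None or errors < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2  # midpoint
            best = (errors, threshold, l_lab, r_lab)
    if best is None:
        return lambda x: majority
    _, threshold, l_lab, r_lab = best
    return lambda x: l_lab if x[feature] <= threshold else r_lab


def train_forest(samples, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(samples), len(samples[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)
        trees.append(train_stump([samples[i] for i in idx],
                                 [labels[i] for i in idx], feature))
    return trees


def forest_predict(trees, x):
    """Merge the individual predictions by majority vote."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]


# Hypothetical, well-separated descriptors (illustrative values only).
samples = ([[i / 10, 1 - i / 10] for i in range(10)]
           + [[5 + i / 10, 6 - i / 10] for i in range(10)])
labels = ["road"] * 10 + ["canal"] * 10

forest = train_forest(samples, labels)
print(forest_predict(forest, [0.5, 0.5]), forest_predict(forest, [5.5, 5.5]))
```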

4.3.4 Gaussian naive Bayes

Gaussian naive Bayes is a variant of naive Bayes that assumes that the features follow a Gaussian (normal) distribution and therefore supports continuous data. The main algorithm is divided into five steps: calculate the prior probabilities for given class labels, create a frequency table with the historical data, calculate the probability for each data point of each class, use the Bayes formula (2) to calculate the posterior probability and finally, assign the class with the highest probability to the input data.

$$\begin{aligned} P(A_i\vert B)=\frac{P(B\vert A_i)P(A_i)}{P(B)} \end{aligned}$$
(2)

where \(P(A_i)\) is the a priori probability, \(P(B \vert A_i)\) is the probability of B conditional on hypothesis \(A_i\), \(P(A_i\vert B)\) is the posterior probability and P(B) is the probability of observing the evidence.
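A minimal sketch of this classifier is given below. It assumes that the likelihood \(P(B\vert A_i)\) is modelled with one Gaussian per class and feature, estimated from the per-class mean and variance, and it works with logarithms for numerical stability. The evidence P(B) is omitted because it is the same for every class and does not change which class is the most probable. The descriptors, class names and the small variance floor `eps` are hypothetical.

```python
import math
from collections import defaultdict


def fit_gnb(samples, labels, eps=1e-9):
    """Estimate the prior, per-feature mean and variance of every class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(columns, means)]
        model[y] = (n / len(samples), means, variances)
    return model


def log_posterior(model, x):
    """Unnormalized log posterior: log prior plus Gaussian log-likelihoods."""
    scores = {}
    for y, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[y] = score
    return scores


def predict(model, x):
    """Assign the class with the highest posterior probability."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)


# Hypothetical descriptors (illustrative values only).
samples = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
           [3.0, 0.5], [3.2, 0.7], [2.8, 0.3]]
labels = ["channel"] * 3 + ["mine"] * 3

model = fit_gnb(samples, labels)
print(predict(model, [1.1, 2.0]), predict(model, [3.0, 0.5]))
```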

4.3.5 Local binary pattern

Convolutional networks do not need the image to be described in advance, but in order to use classical machine learning methods and compare our results, the dataset images had to be described, and LBP was used for this purpose.

LBP describes the texture of greyscale images by extracting their local spatial structure with a very low computational cost [55]. For each image pixel, a code is obtained by comparing its value with the value of its neighbours (see (3)):

$$\begin{aligned} LBP_{P,R} = \sum _{p=0}^{P-1} s(g_p - g_c) 2^p \end{aligned}$$
(3)

where \(s(x) = 1\) if \(x\ge 0\), and \(s(x)=0\) if \(x<0\). \(g_c\) is the grey value of the central pixel, \(g_p\) is the value of its neighbour p, P is the number of neighbours, and R is the radius of the neighbourhood. The input image is described by creating the histogram of the LBP values at each pixel of the image.
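The computation of the LBP code and of the resulting histogram can be sketched as follows for P = 8 and R = 1. The sketch assumes the usual convention that a neighbour contributes a 1 when its grey value is greater than or equal to that of the central pixel, uses the square 3x3 neighbourhood as an approximation of the circle of radius 1 (no interpolation), and skips border pixels; the ordering of the neighbours and the toy image are arbitrary choices made for illustration.

```python
def lbp_8_1(image):
    """LBP codes of the interior pixels using the 8 neighbours at radius 1."""
    # Neighbour offsets (row, col) enumerated p = 0..7, clockwise from top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(image) - 1):
        row = []
        for c in range(1, len(image[0]) - 1):
            g_c = image[r][c]
            code = 0
            for p, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] - g_c >= 0:  # s(g_p - g_c) = 1
                    code += 2 ** p
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes, bins=256):
    """Describe the image by counting how often each LBP code occurs."""
    hist = [0] * bins
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist


# Hypothetical 3x3 greyscale patch; only the central pixel has 8 neighbours.
image = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]

codes = lbp_8_1(image)
hist = lbp_histogram(codes)
print(codes)  # neighbours 9, 7 and 8 exceed the centre value 6: 2 + 8 + 16
```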

5 Experiments and results

5.1 Experimental setup

First, the dataset captured in the different flights was labelled according to the classes to be recognized. The manual labelling process involved utilizing the entire image for reference. By using the entire image, we can determine the presence of Roman architecture even if it appears in a small or off-centre part of the image, so the experiments are more challenging. After labelling, the data were divided into training and testing sets, as shown in Table 1:

Table 1 Dataset split for training and testing with the numbers of images in each class
Fig. 6

CNN scheme

When loading the dataset, the Keras class ImageDataGenerator was used to feed the images in batches of 64, which avoids exhausting the computer memory. It was also used to augment the dataset by rotating the images and flipping them horizontally and vertically, to rescale the pixel values from the 0-255 range by a factor of 1/255, and to resize the images to 150 x 150 pixels for faster processing.
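The operations involved in this step can be illustrated with a plain-Python stand-in. This is not the Keras implementation and does not use its API: it only sketches, on a toy image stored as a list of rows, what flipping, rotating, rescaling and batching amount to. The function names are hypothetical, the rescaling is assumed to divide the pixel values by 255, and resizing is omitted.

```python
def rescale(image, divisor=255):
    """Map pixel values from the 0-255 range to the 0-1 range."""
    return [[pixel / divisor for pixel in row] for row in image]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return image[::-1]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def batches(items, batch_size=64):
    """Yield successive batches so that only one batch is processed at a time."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical 2x2 greyscale image (illustrative values only).
image = [[0, 255],
         [51, 102]]

augmented = [flip_horizontal(image), flip_vertical(image), rotate_90(image)]
batch_sizes = [len(batch) for batch in batches(list(range(150)))]
print(rescale(image), augmented, batch_sizes)
```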

For the implementation of the neural network, the Keras Python library, which runs on top of the lower-level TensorFlow library, was used. The CNN is made up of three blocks, each consisting of a convolutional layer with a 3x3 kernel followed by a pooling layer with a 2x2 window that reduces the dimensionality of the feature maps. After these blocks, a Flatten() layer converts the feature maps of each image into a single vector, which is passed to the final layers that perform classification. Dropout layers deactivate half of the neurons in each training iteration to avoid overfitting the network, thus making it better able to generalize. The scheme of the CNN is shown in Fig. 6.
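The reduction of the feature-map size through the three blocks can be traced with simple arithmetic. The sketch below assumes a stride of 1 and no padding in the convolutions and non-overlapping 2x2 pooling, which are common defaults but are not stated in the text; the number of filters in the last block (64) is likewise a hypothetical value used only to show how the length of the flattened vector would be obtained.

```python
def conv_output(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution (assumed: stride 1, no padding)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output(size, window=2):
    """Spatial size after non-overlapping pooling."""
    return size // window


def trace_shapes(input_size=150, blocks=3):
    """Side length of the feature maps after each convolution + pooling block."""
    sizes = [input_size]
    size = input_size
    for _ in range(blocks):
        size = pool_output(conv_output(size))
        sizes.append(size)
    return sizes


sizes = trace_shapes()               # 150 -> 74 -> 36 -> 17
last_filters = 64                    # hypothetical number of filters
flattened = sizes[-1] ** 2 * last_filters
print(sizes, flattened)
```

Under these assumptions, a 150 x 150 input reaches the Flatten() layer as 17 x 17 feature maps.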

5.2 Classification results

The described network was trained for 1000 epochs, and the results were satisfactory, yielding an accuracy of 95.03% over the test dataset. In Fig. 7, it can be seen how the loss function decreases and how the accuracy increases throughout these thousand epochs, demonstrating the good performance of the training process.

Fig. 7

Evolution of the loss function and accuracy over 1000 epochs

To verify the effectiveness of the proposed method, the results were compared with those of other methods, such as Nearest Centroid, Random Forest, decision trees and Gaussian naive Bayes. The advantage of the CNN is that the description and classification processes are carried out at the same time. To evaluate the method with classical classification techniques, it is necessary to make a description of the image first. It was decided to use local binary patterns (LBP), which is one of the best-known image description techniques. Different LBP configurations were tested to verify the classification method in a more robust way. The images were described with LBP 16 (taking into account the 16 nearest neighbours) and LBP 8 (taking into account the 8 nearest neighbours). The results are shown in Table 2.

Table 2 Classification results of our proposed CNN compared with classification methods using LBP 8 and LBP 16 as descriptors over the test dataset

It can be seen that because convolutional networks can intrinsically describe an image through filters, very promising classification results are obtained in the field, outperforming other classical methods. In terms of classical methods, LBP8 obtained slightly better results than LBP16, with the highest accuracy of 83.48% obtained using Random Forest.

For a better understanding of the performance and limitations of the model, we calculated the confusion matrix for the proposed neural network. These results are shown in Fig. 8. The confusion matrix demonstrates the excellent performance of this model in identifying channels, crossings and mines with 99.18%, 93.82% and 99.17% accuracy, respectively. The greatest imbalance is seen when classifying images without elements, where 34% are classified as crossing. Although this can be improved, the main interest of the experiment is in identifying the elements through UAV images, so these false positives do not affect the objective greatly.

Fig. 8

Confusion matrix using our proposed CNN

Furthermore, precision, recall and F-score were calculated for each of the classes. The results are shown in Table 3.

These results underscore the good performance of the network in discriminating between mines and channels. The weakest results are found in the “None” class; however, this is of minor consequence within the experimental design, which deliberately prioritizes the detection of mining elements, that is, avoiding false negatives in which a mining feature is present but remains undetected, over avoiding false positives.
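The per-class metrics can be derived directly from a confusion matrix, as the following sketch shows. The counts are hypothetical and are not the values of Fig. 8 or Table 3; the F-score is assumed to be the harmonic mean of precision and recall (F1).

```python
def per_class_metrics(matrix, class_names):
    """matrix[i][j] = number of samples of true class i predicted as class j.

    Returns {class: (precision, recall, f_score)}.
    """
    metrics = {}
    for k, name in enumerate(class_names):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in matrix) - tp   # predicted k, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        metrics[name] = (precision, recall, f_score)
    return metrics


# Hypothetical confusion matrix (illustrative counts only).
class_names = ["channel", "crossroad", "mine", "none"]
matrix = [[8, 0, 0, 2],
          [0, 9, 0, 1],
          [0, 0, 10, 0],
          [0, 3, 0, 7]]

metrics = per_class_metrics(matrix, class_names)
for name, (precision, recall, f_score) in metrics.items():
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

In this toy matrix, images without elements that are labelled as crossroads lower the precision of the crossroad class and the recall of the "none" class, while the recall of the mining classes is unaffected.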

6 Discussion

The combination of drones and neural networks has proven to be an extremely versatile tool for identifying elements related to Roman gold mining and differentiating them from other present-day anthropic remains. One of the greatest pitfalls encountered by scientists when identifying mining remains is the degree of subjectivity of their mapping due to various factors [56, 57], both extrinsic (e.g., quality of the images and vegetation) and intrinsic (e.g., expertise of the mapping technician and digital means available). Neural networks can help reduce this subjectivity by discriminating features in digital imagery. The mapping of Roman mining remains in NW Iberia presents a high degree of difficulty due to the dense vegetation that covers the mining landscapes and the anthropic modification of the landscape over the last fifty years due to farming practices. A method of reliably discriminating this type of infrastructure can be a versatile tool for archaeologists. Furthermore, the identification of characteristic patterns can improve archaeological protection plans by reducing the time and economic resources needed to analyse and identify these remains, especially in a region such as the one studied in this work, which has more than a thousand kilometres of Roman canals and a large number of mining operations. It is important to highlight that being able to identify ancient mines has enormous potential from a mining point of view, as it facilitates the location of nearby deposits of economic interest, which have so far been identified through the historical analysis of the landscape with the use of LiDAR data, drones and orthoimages [48, 58].

The automatic identification of mining remains has another associated benefit. Mines are often dangerous places where accidents can occur, so administrations must make an enormous effort to locate and catalogue old mines that have been inactive for years, possibly thousands of years. The latter are the most dangerous, as in many cases, there is no documentation of their presence in the territory. The possibility of identifying pits and ditches can reduce the risk of accidents and can be used for the control and monitoring of current exploitations. The study and control of landslides and subsidence in mining areas has thus far been carried out by remote sensing analysis [59]. Neural networks could also help in determining the presence of illegal mining in many mining areas, especially in underdeveloped countries where this type of mining is more prevalent [60]. In addition, remote and automatic detection of features brings potential benefits for environmental monitoring and can help reduce mining impacts and assist in the administrative control of mining sites, a task that has been carried out to date through image-based surveys [61, 62]. It is also of interest to highlight how the use of drones in recent years has contributed to improving several fields of Roman gold mining archaeology research [58], especially the use of remote sensors, such as multispectral cameras that facilitate the identification of mining remains and hydraulic infrastructure in a fast and effective way [43]. The implementation of these airborne systems for capturing images that are subsequently processed with neural networks can contribute to improving the identification of elements over large and remote areas or difficult-to-access sectors, providing fast and cost-effective results.

Table 3 Precision, recall and F Score using our proposed CNN for all the different classes evaluated

In summary, convolutional neural networks (CNNs) have demonstrated exceptional capabilities in image analysis and feature extraction, making them an ideal choice for processing UAV-derived aerial images. These networks leverage convolutional layers to automatically detect hierarchical features such as edges, textures, and patterns within the images. In the context of identifying ancient mining remnants, CNNs excel at recognizing subtle visual cues and variations in terrain, which may not be discernible to the human eye. The mathematical foundation of CNNs, specifically the convolution operation, helps in capturing spatial hierarchies and patterns within the imagery, enabling accurate feature identification.

CNNs not only are powerful in feature extraction but also excel in pattern recognition and classification tasks. The mathematical underpinnings of CNNs, including activation functions and pooling layers, enable the network to learn complex spatial relationships and classify objects with high precision. When applied to UAV-derived images, CNNs can achieve good results by exploiting both the rich spatial information provided by the imagery and the network’s inherent ability to learn and generalize from datasets, making them a valuable tool.

This work shows the possibilities of combining new geomatic technologies, such as drones, with neural networks, opening the door to future research aimed at improving the mapping of Roman gold mining remains in NW Iberia. Our methodological approach enhances mapping and management for archaeologists and mining and environmental companies, as well as administration, which is ultimately responsible for the control, management and preservation of past, present and future mining remains.

7 Conclusions

In this paper, a new method has been proposed to automatically classify Roman gold mining infrastructure from UAV-derived images. We have developed a convolutional neural network architecture that is capable of classifying different common anthropogenic features, such as channels, roads, and mines, with 95.03% accuracy. The importance of using deep learning is demonstrated by comparing the proposed method with classical image description techniques and conventional classification algorithms. The results show an improvement of 13.39% with respect to the best evaluated classical method, which was Random Forest. The ability of convolutional architectures to identify the best descriptions allows us to obtain much better results than have been obtained to date with other types of techniques. In addition, the use of UAVs for taking images greatly facilitates the capture process and brings this type of solution much closer to the end user. These results indicate that the identification of Roman infrastructures can be automated, with all that this entails for the related industries.

One promising avenue for future work could involve expanding the scope of the study by incorporating additional mining site locations and significantly augmenting the dataset used for training and validation. By doing so, the method’s robustness and predictive power could be enhanced.

Broadening the dataset to encompass a wider range of mining remnants and diverse environmental conditions can lead to a more comprehensive understanding of the model’s capabilities. It would enable the algorithm to generalize better across different terrains and mining practices, ultimately making it more powerful in identifying ancient mining remnants from aerial imagery. Future work will involve enabling the user to geolocate targets automatically from coordinates, even in a large area. Furthermore, the inclusion of more locations and data points would help address potential biases and ensure that the model’s performance remains consistent and reliable across various scenarios.