1 Introduction

Architectural heritage possesses outstanding value and is, at the same time, a fundamental manifestation of sociocultural identity. The historic built environment is a vital aspect of a place's culture, history, and landscape, which necessitates measures to ensure its preservation through time. However, environmental pressures and anthropogenic factors cause constant alterations and impose significant risks. Planning appropriate and compatible conservation and restoration interventions to tackle the deterioration of historic structures requires comprehensive knowledge of their state of preservation. Thus, the need emerges to record historic structures in a way that provides the detailed information required to support preservation interventions.

Recording the preservation state of a historic structure is a crucial prerequisite for pathology diagnosis. Documenting in detail the condition of the structure's elements is the first step towards qualitatively interpreting its condition and identifying mechanisms of deterioration. Therefore, defining the data recording techniques that will provide rich and suitable information about the extent and forms of deterioration is essential for condition documentation. To the greatest extent possible, recording should be non-destructive, meaning that it should rely on non-intrusive inspection and sensing techniques that neither cause further damage to the structure and its historic materials nor impair their future usefulness.

Mapping is widely recognized as an effective non-destructive method for condition documentation and can be applied to all materials at different scales. It registers information about the surface patterns of historic structures that can later be analyzed through computational systems. Mapping is frequently performed as a manual process in a computer-aided design (CAD) or geographic information system (GIS) environment, using color photographs as the background.

Progress in automated mapping for historic structures has primarily concentrated on identifying and classifying building elements, materials, and deterioration treated as a binary concept, that is, considering only its presence or absence. The segmentation algorithms that have been considered are mainly based on dimensionality reduction, unsupervised clustering, and deep learning approaches, occasionally considering spectral bands in the infrared range.

1.1 Aims and scope

This work draws on the fields of imaging science and pattern recognition to identify a novel and accurate methodology for classifying different deterioration forms on historic structures. Considering the potential of multispectral imaging and learning-based image analysis for defect detection, it thoroughly evaluates the classification of multispectral composites synthesized from reflectance images captured in the visible (RGB), near-ultraviolet (NUV), near-infrared (NIR), and thermal infrared (TIR) spectra. The classification is performed with supervised segmentation methods based on random decision trees, ensemble learning, and regression algorithmic implementations.

1.2 Article structure

This article is structured into six sections. Section 2 delivers an overview of the background for the presented work and discusses the related research. Section 3 describes the methodology followed, including the instrumentation, data collection and preparation, algorithmic implementations, and approaches followed to evaluate the segmentation results. Section 4 presents the application and results for the case study of a historic fortification, while Sect. 5 discusses the accuracy and interpretation of the results. The concluding remarks are presented in Sect. 6.

2 Background and related work

Architectural surfaces of historic structures are subjected to continuous alterations due to exposure to environmental conditions, microorganisms, pollution, and anthropogenic damage; their susceptibility to decay also depends on (incompatible) conservation interventions of the past and the inherent characteristics of historic materials. Particularly when several different materials are present (such as in masonry structures), the architectural surfaces consist of an intricate mosaic of deterioration forms. Consequently, documentation methods for describing these complicated conditions in a non-destructive way become pertinent and often necessary.

2.1 Mapping the preservation state

Mapping is a valuable non-destructive method as it facilitates the description, registration, and quantification of the, often overlapping, multitude of surface patterns on historic structures. When performed in a digital, computerized manner, it produces spatial information, entities with geometric attributes that can be correlated, compared, used to produce statistical information, and allow for the attribution of semantic data about the characteristics of materials and their decay. Traditionally, mapping is a technique manually performed inside CAD and GIS platforms by describing the shape of surface patterns and organizing them into thematic layers [1,2,3,4,5,6]. An alternative way of mapping deterioration is the visualization of damage levels/indexes, which can be either accomplished directly or indirectly by analyzing the mapped deterioration patterns [7,8,9,10,11,12,13,14,15]. The additional spatial annotation of lithotypes facilitates the association between materials and alteration [1, 11,12,13, 15,16,17,18,19,20,21,22,23]. Mapping supports the interpretation of weathering phenomena when combined with data from non-destructive testing (NDT) [8, 9, 17,18,19,20, 22, 24, 25], laboratory mineralogical, chemical and physical characterization [15, 18, 19, 23, 26], and environmental measurements [8, 12, 18, 19, 25, 27].

2.2 Generating base-maps for deterioration mapping

Mapping is typically a photo-based approach where a color image, an orthorectified image, or an orthoimage-mosaic is used as a base-map for designing the geometrical shape of surface patterns [27]. The metric (accuracy, scale-dependent spatial resolution) and chromatic quality of this background are essential for identifying deterioration [25, 28, 29]. Thus, acquiring suitable images is crucial for successful deterioration mapping. However, not only true color images have been considered as base maps, but also images captured at portions of the electromagnetic spectrum beyond the visible.

2.3 Multispectral imaging and data complementarity

The reciprocity of mapping and infrared reflectance imaging—especially thermography—has often been considered essential for detecting weathering on historic structures [5, 20, 24, 30, 31]. Besides, thermography is being extensively used in built heritage structural diagnostics [32,33,34] and has also been explored to detect different historic materials on building façades [35, 36]. The additional inclusion of NIR reflectance images enhances the identifiability of deterioration, mainly when there is a presence of vegetation and biogenic crusts, which present vastly different near-infrared reflectance characteristics compared with construction materials [37, 38]. However, the decision to include recorded data from multiple bands comes with the realization that suitable sensing techniques have to be selected.

Spectral collection in the infrared is connected with various sensing techniques that depend on the wavelength choice. Detection in the wavelength range between 400 and 1100 nm has been performed with multispectral configurations that involve multiple single-band cameras recording at 4–12 different narrow spectral bands. The resolution of these instruments is usually low, and the collected imagery has to be meticulously checked to correct sensors' errors [39, 40]. The introduction, or rather repurposing, of commercial digital single-lens reflex (DSLR) cameras with charge-coupled device (CCD) and complementary metal–oxide–semiconductor (CMOS)-based detectors, for spectral imaging at the same range, however, provides more affordable and agile solutions that retain the user-friendly features and the interfaces to a wide variety of photographic software and accessories, and have high spatial resolution [41,42,43]. Commercial off-the-shelf (COTS) DSLR camera detectors are generally sensitive in a portion of the NIR range up to 1100 nm, which is cut off by an internal blocking filter. Removing this filter implies that the camera can be used for imaging at a wider than visible range, and external wavelength-specific filters can be utilized. Detection in the long-wavelength infrared (LWIR) range has usually been performed with uncooled microbolometer detectors for building inspections. The spatial resolution of thermography cameras is considerably lower than that of DSLR, and their relative cost is higher. Recently, more affordable thermography camera models have come into the market, including smartphone-adjustable low-resolution instruments. However, these inexpensive cameras provide lower accuracy, which makes them unusable for some applications.

2.4 Digital image processing

The need for more efficient inspection [44] and intelligent identification of conservation needs [45] has led to the adoption of image processing approaches to generate the thematic data needed for deterioration mapping. Digital image processing (DIP) refers to the manipulation of the digital images to extract features and recognize patterns, which, after having acquired the suitable base-maps, can be performed with techniques as simple as thresholding, edge detection, or information reduction to obtain the required results [33, 46,47,48]. However, these approaches still largely depend on the human factor since many parameters have to be tuned differently for each application, and often deterioration patterns have to be identified and extracted one at a time. The current rise of deep learning-based pattern recognition has delivered powerful tools for fully automated detection of deterioration (often through convolutional neural networks), even when a plethora of surface patterns can be observed [49,50,51,52]. Nevertheless, deep-learning implementations require large image datasets to be efficiently trained, which is often impractical for built heritage applications. They may also underperform considering the uniqueness of each heritage asset, many of which present a distinctive mixture of historic materials. Therefore, other more easily executable supervised learning-based approaches are sometimes considered for deterioration detection through classification and regression.

Multiband and multispectral image segmentation for built heritage inspection purposes has been applied via a range of clustering algorithms, some of the most common being maximum-likelihood, minimum-distance, and k-means [36, 37, 39, 48, 53,54,55,56]. However, most of the relevant works aim at segmenting the materials and elements of historic façades, and when deterioration is considered, it is determined as present or not present. To be specific, many works consider the altered and unaltered areas of a historic material as two categories rather than identifying the different deterioration typologies, which is also partly a result of the state of preservation of the heritage assets involved. Alternative multi-sensor approaches, involving terrestrial LiDAR for NIR recording, have been reported to produce high-accuracy thematic mapping results for damaged historic structures [57, 58]. However, they introduce significant instrumentation costs, and require rigorous radiometric calibrations and optimal data gathering conditions.

3 Methods and materials

The rationale behind this work is based on the identified lack of image-based methods for automatic mapping of weathered historic structures. The methods tested aim to tackle the problems of mapping the preservation state when various surface deterioration forms are present. Instead of following unsupervised segmentation techniques and then interpreting each classified category of weathering-caused alteration, supervised algorithmic approaches are implemented using the already identified deterioration categories as input. Combinations of different spectral band composite images and supervised segmentation algorithms are evaluated against reference data to identify the most accurate solution.

Figure 1 depicts the implemented research design in this work. As already highlighted, the quality of available imagery upon which the pattern recognition will be performed is essential for ensuring the accuracy and interpretability of results. Therefore, the workflow starts from acquiring appropriate images and then continues with their radiometric correction. The multispectral composites are digitally synthesized from the band-specific reflectance images and subsequently segmented into deterioration categories following a visual identification of training regions. The results are evaluated with metrics deriving from the field of remote sensing. The output of the deterioration classification can be optionally transferred to an environment appropriate for spatial information management. The principle of using low-cost equipment and software was followed throughout this work as it is an essential factor for the inspection of historic buildings.

Fig. 1 Overall research methodology

3.1 Sensors and data acquisition

The selection of the instruments employed in this work considers the complementarity of data captured at different spectral bands and the flexibility requirements of sensing techniques used for built heritage condition monitoring. Affordable, portable sensors are utilized to obtain the necessary multispectral data that will constitute the background for the deterioration pattern analysis, contributing to a simple-to-implement methodology. The characteristics of the instrumentation are presented in Table 1. The images are taken with two sensors, an EOS Rebel SL1 (Canon Inc., Tokyo, Japan) digital single-lens reflex camera with an EF-S 18-55 mm f/3.5–5.6 IS II lens, and a FLIR ONE Pro (Teledyne FLIR LLC, Wilsonville, OR, USA) thermographic camera attached to a smartphone. The internal hot mirror filter of the SL1 camera has been removed to allow imaging beyond the visible range. Three low-cost external filters are employed to allow RGB, NUV, and NIR photo shooting. The images are acquired as parallel as possible to the architectural surfaces to avoid occlusions, and with small focal lengths to avoid large distortions that can affect image quality during the resampling phase of distortion correction. Furthermore, the images are acquired under homogeneous illumination conditions and without shadows, which improves their radiometric potential, and with a steady tripod, which prevents image blur. Since low-cost sensors are more likely to be affected by noise sources, the camera sensor is checked to estimate the vignetting and background noise levels, and the images are corrected to ensure their quality. The thermographic data are acquired as sets of burst images so that their spatial resolution can later be increased digitally.

Table 1 Specifications of digital cameras used for acquiring multispectral reflectance data

3.2 Multispectral data preparation

Pre-processing the imagery data involves the preparation of multispectral image composites for the subsequent segmentation. First, the images acquired with the SL1 camera are loaded into the RawDigger (LibRaw LLC, Maryland, USA) software, where the color filter array conversion is reversed to obtain raw radiance images, and RGB images are color balanced. Non-visible spectrum images should also be converted to reflectance images based on pixel values of a reference surface. The uncompressed images are then corrected for distortion [59] in ImageJ2 [60]. Thermal infrared burst mode images acquired with the FLIR ONE Pro camera are used to create high-resolution thermal images [61].
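The conversion of a non-visible band to reflectance based on a reference surface can be illustrated with a minimal sketch. It assumes a linear sensor response and a single reference patch of known reflectance, which may differ from the exact procedure followed in this work; the function name `to_reflectance`, the patch coordinates, and the reference reflectance value are hypothetical and serve only as an illustration.

```python
from statistics import mean


def to_reflectance(image, patch, ref_reflectance):
    """Scale raw pixel values to reflectance using one reference patch.

    image: 2D list of raw pixel values for a single band.
    patch: (row_start, row_stop, col_start, col_stop) of the reference
           surface in the image (hypothetical coordinates).
    ref_reflectance: known reflectance of the reference surface (0..1).
    Assumes a linear sensor response (an assumption of this sketch).
    """
    r0, r1, c0, c1 = patch
    ref_values = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    ref_dn = mean(ref_values)
    if ref_dn <= 0:
        raise ValueError("reference patch must have positive pixel values")
    # Scale every pixel and clip the result to the physical range [0, 1].
    return [
        [min(1.0, max(0.0, value * ref_reflectance / ref_dn)) for value in row]
        for row in image
    ]


# Toy 2x2 band; the pixel at row 1, column 0 is the reference surface.
raw_band = [[100.0, 200.0], [400.0, 800.0]]
refl = to_reflectance(raw_band, patch=(1, 2, 0, 1), ref_reflectance=0.5)
```

In practice, several reference targets and a dark-level offset would probably be needed for a more robust calibration; the single-patch scaling above only conveys the principle.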

The manual matching of band-specific images is done using the HyperCube software [62] (projective transformation, nearest-neighbor interpolation). Subsequently, the image composites are constructed using different multispectral combinations, as described in Table 2. The images are resampled to match the resolution of all bands, and the sky and ground are trimmed from all multispectral composites to reduce potential misclassifications. The synthesis of the multiband composites also considers the same principle of using low-cost equipment, and thus all composites consist of three bands so that segmentation can be performed in ImageJ2—avoiding the use of commercial specialized spatial analysis software.
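As an illustration of the composite synthesis, the following minimal sketch stacks three co-registered, equally sized single-band images into one three-band composite. It is not the HyperCube or ImageJ2 procedure used in this work; the function name `stack_composite` and the toy band values are hypothetical.

```python
def stack_composite(bands, order):
    """Stack three single-band images into a three-band composite.

    bands: dict mapping a band name to a 2D list of reflectance values.
    order: the three band names, in channel order (e.g. NIR, R, G).
    The bands are assumed to be already registered and resampled.
    """
    if len(order) != 3:
        raise ValueError("a composite consists of exactly three bands")
    shapes = {(len(bands[name]), len(bands[name][0])) for name in order}
    if len(shapes) != 1:
        raise ValueError("bands must be resampled to a common size first")
    rows, cols = shapes.pop()
    # Each composite pixel is a tuple with one value per selected band.
    return [
        [tuple(bands[name][r][c] for name in order) for c in range(cols)]
        for r in range(rows)
    ]


# Toy 1x2 bands with hypothetical reflectance values.
toy_bands = {
    "NIR": [[0.9, 0.8]],
    "R": [[0.2, 0.3]],
    "G": [[0.4, 0.5]],
}
composite = stack_composite(toy_bands, ("NIR", "R", "G"))
```

Restricting each composite to three bands keeps it compatible with ordinary three-channel image viewers and plugins, which is the design constraint stated above.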

Table 2 Multispectral image composition

3.3 Machine learning-based segmentation of deterioration patterns

The classification of deterioration patterns is performed via a supervised segmentation procedure using the Trainable WeKa Segmentation 3D plugin [63] of ImageJ2. The machine learning-based image segmentation techniques follow decision tree [64], ensemble learning [65], and regression approaches. Specifically, the Random Tree, Random Forest, Fast Random Forest, and LogitBoost classifiers are employed. The supervised approach presupposes the annotation of image regions of interest (ROIs), corresponding to each semantic deterioration category to be segmented, that will train the algorithmic model into providing a semantic classification of the entire image.
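How annotated ROIs become training data can be sketched as follows. The sketch assumes rectangular ROIs and uses only the raw band values of each pixel as features, whereas a segmentation plugin typically derives additional filter-based features; the function name `sample_rois`, the ROI format, and the toy values are hypothetical.

```python
def sample_rois(composite, rois):
    """Collect labeled pixel samples from annotated rectangular ROIs.

    composite: 2D list of per-pixel band tuples.
    rois: dict mapping a deterioration category to a list of
          (row_start, row_stop, col_start, col_stop) rectangles
          (a hypothetical ROI format used only for this sketch).
    Returns the feature vectors X and the matching labels y.
    """
    X, y = [], []
    for label, rectangles in rois.items():
        for r0, r1, c0, c1 in rectangles:
            for r in range(r0, r1):
                for c in range(c0, c1):
                    X.append(composite[r][c])
                    y.append(label)
    return X, y


# Toy 2x2 three-band composite with two annotated categories.
toy_composite = [
    [(0.8, 0.1, 0.2), (0.7, 0.1, 0.2)],
    [(0.3, 0.4, 0.4), (0.3, 0.3, 0.3)],
]
toy_rois = {"vegetation": [(0, 1, 0, 2)], "dampness": [(1, 2, 0, 1)]}
X_train, y_train = sample_rois(toy_composite, toy_rois)
```

Pixels outside every ROI remain unlabeled and are the ones the trained classifier subsequently assigns to a category.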

The decision tree model is a machine learning algorithm that can be used for both supervised classification and regression problems. A decision tree simply consists of a series of sequential decisions made to reach a specific result of distinct data classes. The classes are mutually exclusive and represented by specific attributes. The learning input, which consists of sets of pixels belonging to known classes, assists the accurate classification of both annotated pixels and not annotated pixels. Each node of the decision tree decides an outcome based on the attribute values and leads either to another node, using an appropriate subtree, or to a leaf, which gives the predicted class of the pixel [66]. The Random Tree classifier is based on a decision tree learning method. Single decision trees are easy to conceptualize but usually suffer from high variance, making them not competitive in terms of accuracy.

A random forest classifier combines ensemble learning with decision trees. Each tree classifier is independently generated from a random sample of the input training data, as in bagging. When growing a tree, the best split at each node is computed over a random subset of the attributes instead of over all of them. In this way, tree diversity is generated through two sources of randomization. The class prediction of the ensemble is obtained by aggregating the predictions of the individual trees. Random forest generally overcomes the accuracy limitations of single decision trees [65, 67].
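The two sources of randomization and the aggregation of votes can be illustrated with a minimal sketch. For brevity it uses depth-one trees (decision stumps) instead of fully grown trees, so it is a simplified stand-in and not the Random Forest or Fast Random Forest implementation employed in this work; all function names and the toy reflectance values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Find the best single-threshold split among the given features."""
    majority = Counter(y).most_common(1)[0][0]
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: always predict the majority class
        return (None, None, majority, majority)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if feature is None:
        return left_label
    return left_label if row[feature] <= threshold else right_label


def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Train stumps on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]           # bagging
        feats = rng.sample(range(len(X[0])), n_features)  # random subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy (NIR, R, G) reflectance samples with hypothetical values.
X = [(0.80, 0.10, 0.20), (0.75, 0.12, 0.25), (0.85, 0.08, 0.18),
     (0.30, 0.35, 0.33), (0.28, 0.40, 0.38), (0.35, 0.38, 0.36)]
y = ["vegetation"] * 3 + ["stone"] * 3
forest = fit_forest(X, y)
```

Calling `predict_forest(forest, pixel)` on every pixel of a composite would yield a thematic map; with fully grown trees, the random feature subset would be redrawn at each node rather than once per tree.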

LogitBoost is a boosting algorithm that performs classification using a regression scheme as the base learner and can handle multi-class problems. It can be seen as a convex optimization; it applies the cost function of logistic regression on a generalized additive model. This classifier determines the appropriate number of iterations by performing efficient internal cross-validation [68].

3.4 Accuracy metrics

The performance of the machine learning classification implementations and of the different multispectral combinations is quantitatively evaluated using manually produced degradation maps as the ground truth. Different parameters are used to assess the classification efficiency of the intelligent feature extraction techniques based on accuracy metrics common for thematic mapping. More specifically, the evaluation relies on the precision (fraction of correct classifications among the instances assigned to a class) and F1-score (harmonic mean of precision and sensitivity) calculated for each class (Eqs. 1 and 2), and on the overall accuracy (Eq. 3), which is useful for estimating the overall performance of the classifiers.

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \tag{1}$$
$$F_1\text{-score} = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}} \tag{2}$$
$$\text{Overall accuracy} = \frac{\text{Sum of correctly classified units}}{\text{Total number of units}} \tag{3}$$

where, for each class the TP (true positive), FP (false positive), and FN (false negative) come from the error matrix, a square array of numbers, which express the number of pixels assigned to a particular class in one classification relative to the number of pixels assigned to a particular class in the reference data [69, 70].
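The computation of these metrics from an error matrix can be sketched as follows. The sketch assumes that rows correspond to the reference classes and columns to the classified ones; the function names, class labels, and pixel counts are hypothetical and are not results of this work.

```python
def per_class_metrics(matrix, labels):
    """Precision and F1-score per class from a square error matrix.

    matrix[i][j] is the number of pixels of reference class i that
    were assigned to class j (row/column convention assumed here).
    """
    n = len(labels)
    metrics = {}
    for k, name in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # column minus diagonal
        fn = sum(matrix[k]) - tp                       # row minus diagonal
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        metrics[name] = {"precision": precision, "f1": f1}
    return metrics


def overall_accuracy(matrix):
    """Share of correctly classified pixels (diagonal over total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical three-class error matrix (pixel counts).
labels = ["sound", "dampness", "moss"]
error_matrix = [
    [50, 5, 5],
    [10, 20, 0],
    [0, 5, 5],
]
class_metrics = per_class_metrics(error_matrix, labels)
accuracy = overall_accuracy(error_matrix)  # 75 of 100 pixels, i.e. 0.75
```

For this toy matrix, the class "moss" has TP = 5, FP = 5, and FN = 5, so both its precision and its F1-score equal 0.5, which shows how a poorly separated class can remain hidden behind a comparatively high overall accuracy.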

3.5 Case study

The historic structure selected as a case study for the application and assessment of the proposed methodology is a fortification in Euboea, Greece (Fig. 2). The Fort of Karababa is an Ottoman fortification constructed in 1684 on the homonymous hill which dominates the Boeotian coast across the city of Chalcis. The construction of the fortress was part of the effort to protect the city of Chalcis from impending Venetian attacks. The architectural style of the fort is more European than Turkish. It is oblong in plan, with a rampart on the north side, three bastions, and a large tower. Several parts of the fortification walls have ancient spolia built-in, while the south part is preserved in poor condition. The weathered masonry surfaces selected for evaluating the methodology are presented in Fig. 3. They are on the north side, and for abbreviation purposes, they have been named A, B, and C, starting with the westernmost (on the north bastion).

Fig. 2 Fort of Karababa, bird’s-eye view

Fig. 3 Fort of Karababa north side, façades selected for evaluating the mapping methodology; from top to bottom: A (westernmost), B, and C (easternmost)

4 Results

Following the described methodologies, after the composition of multispectral images was completed (Fig. 4), 60 classifications were performed. The generation of reference maps considered the Illustrated Glossary on Stone Deterioration Patterns [71] as a guide during visual inspection. The observed categories of deterioration were vegetation, moss, black crusts, lichens, missing material (including loss of components, large cracks, and windows), and dampness. These constituted all the categories of surface pathology that altered the surface reflectance characteristics of the masonry façades. Patterns that caused only slight geometrical surface alterations could not be considered, because they were insignificant at the considered scale and in terms of their reflectance contrast with healthy historic materials; these include minor cracks, superficial cracking due to biogenic deterioration, disintegration, and other shape features induced by material loss. The thematic comparisons were performed using the full reference maps and not sampled patch areas. Overall accuracy statistics calculated from the confusion matrices are presented in Table 3. The precision and F1-score results are presented in detail in “Appendix A”.

Fig. 4 Multispectral data preparation for façade C. Note: NUV near-ultraviolet; R red; B blue; G green; NIR near infrared; TIR thermal infrared; M monochromatic color image

Table 3 Overall accuracy statistics by image and classifier

The deterioration maps produced for all the studied architectural surfaces were of generally high thematic accuracy, especially for classifications performed with ensemble learning-based algorithmic implementations. Furthermore, the inclusion of different spectral bands improved the classification potential, subject to the categories of deterioration present.

5 Discussion

The inclusion of the NIR spectral band moderately improved the classification results for all deterioration forms. The segmentation of a NIR-R-G multispectral image with the Fast Random Forest classifier proved to be the most consistent solution overall (overall accuracy ≥ 79%). Figure 5 presents a comparison between the reference maps and the NIR-R-G composites segmented with the Fast Random Forest. Using NUV reflectance data generally did not improve the quality of the classifications. Including the TIR band also did not improve the classification of the deterioration patterns. Furthermore, the fusion of visible with thermal data significantly decreased the accuracy of detecting deterioration when dampness was present. This contradicts the expectation that thermal images are helpful in detecting moisture on historic masonry, as is evident in Fig. 6.

Fig. 5 Reference deterioration maps (left), and corresponding deterioration maps produced with a NIR-R-G multispectral image using the Fast Random Forest Classifier (right); façades A, B, and C (from top to bottom)

Fig. 6 Thermograms of façades A (left), and B (right)

According to the overall accuracy results, the Fast Random Forest classifier was the most accurate learning-based method for deterioration classification for all multispectral images that did not include the TIR band (70% < overall accuracy < 87%). Implementing the Random Tree classifier resulted in more inconsistent and less accurate classifications (60% < overall accuracy < 77%). LogitBoost outperformed the Random Tree classifier.

According to the precision and F1-score values, moss and lichens were the most misclassified surface patterns, even though both random forest approaches improved their classification. The results indicate that non-deteriorated material, dampness, black crusts/discoloration, and plants are much more easily distinguished (and therefore classified) than biogenic colonization of any form. Therefore, surface alterations of the historic materials that change their reflectance characteristics can be mapped more accurately using multispectral images than the deterioration forms that completely cover the materials as an additional layer.

6 Conclusions

In this work, a novel methodology for the automatic classification of damage on built cultural heritage was proposed that uses low-cost photographic equipment for multispectral data acquisition and supervised machine learning-based image segmentation to map deterioration patterns. It was confirmed that including near-infrared reflectance intensities in the employed methods improved the classification of alterations on the historic masonry façades.

The segmentation of multispectral composites (synthesized from visible and near-infrared reflectance images) with classifiers combining random trees and ensemble learning performed particularly well even where a high number of surface patterns was present. However, the coexistence of different overlapping categories of biogenic colonization complicated the mapping procedure significantly. It should be highlighted that the accuracy evaluation is subject to some level of bias since the manually produced reference thematic maps cannot account for overlapping surface patterns.

The proposed methodology has the limitation that it can map only the pathologies that have been previously recognized through visual inspection (or analytical techniques) because regions of interest have to be annotated to train the intelligent algorithms. However, a crucial advantage is that it produces easily interpretable mapping results, in contrast to unsupervised methods where each mapped pattern class has to be assigned a posteriori to a deterioration category. Furthermore, for rapid monitoring of monumental heritage structures, there is a clear advantage over deep learning-based methods, which require large image datasets. A direct outlook of the proposed framework is the combination with 3D recording technologies to enhance the capability of detecting and mapping the geometric alteration (material loss) of historic monuments.