Introduction

In the past few years we have been particularly interested in developing methodologies for the complete characterization of the organic colourants used in the past, as well as of their degradation products [1,2,3,4,5,6,7,8,9,10,11,12]. Changes in pigments, whether used pure or admixed, can alter the appearance of a painting significantly; consequently, the identification and state of degradation of colourants is of fundamental interest, since it provides critical information about the artists’ aesthetic perspective, conceptions and choices, and about how the work has changed over time. Therefore, it is desirable to develop methods that can characterize these materials directly on the artwork, in situ, or from the small samples that may be available from works of art. Microspectrofluorimetry offers high sensitivity and selectivity combined with good spatial resolution and the possibility of in-depth profiling. It can also be used in situ, without any contact with the sample or work of art to be analyzed, for movable objects that can be transported to the laboratory [13, 14]. The importance of sensitivity is clear when the following facts are considered: some of the dyes used in the past to create bright colours may have faded, or may have been applied as very thin coats over, or mixed with, an inorganic pigment or extender, and therefore they may be present in very low concentrations. The possibility of in situ analysis of ancient colourants is a considerable advantage, particularly when considering that the techniques currently employed for dye analysis (HPLC–DAD-MS, microFTIR and SERS) require micro-sampling [15,16,17]. Microspectrofluorimetry also presents some drawbacks, namely the absence of a molecular fingerprint such as that provided by infrared spectra. This limitation may be overcome by combining surface-enhanced Raman spectroscopy (SERS) and fiber-optics reflectance spectroscopy in the visible (FORS), and by using a consistent database built up from historically accurate reproductions of references for colourants, binders and colour paints, which are the result of research into written sources of medieval techniques [13, 14, 17]. This research involves reproducing the processes described in the source material, identifying the products at the molecular level and comparing them with the original colours. This leads to a virtuous feedback loop, where reference compounds are validated against originals and are used to improve the analytical methods applied when identifying materials [11, 18,19,20,21,22]. This is a hypothesis that we will test in this work using a chemometric approach.

We will focus on four natural red dyes, and their lake pigments, used during the Middle Ages (found in medieval manuscripts and described in technical treatises): lac dye, kermes, cochineal and brazilwood (Table 1). Brazilwood is a flavonoid, but the other three are anthraquinone reds extracted from animal sources, which makes their identification by an analytical technique such as microspectrofluorimetry very challenging.

Table 1 The four red colourants studied in this work, with the respective chromophores, provenience and chronology of occurrences in the Mediterranean world (in artworks)

Brazilwood has been extensively found in books of hours from the 15th–16th c., and in the Galician-Portuguese Ajuda songbook, possibly dated from the 13th c. [5, 11, 12]. It is extracted from a tree, Caesalpinia sappan, or from other brazilwood species brought to Europe from Brazil from the 16th c. onwards (Caesalpinia echinata, Caesalpinia brasiliensis, Caesalpinia violacea, Caesalpinia crista, and Haematoxylum brasiletto) [18]. Kermes was obtained from a small insect, Kermes vermilio, found on the kermes oak, Quercus coccifera L. Another important historical source of red is the resin secreted by the female lac insect, Kerria lacca, from which both lac dye and shellac resin are obtained. Lac dye was applied as a dark red or pink colour in Portuguese manuscripts and is characteristic of the Romanesque monastic production (12th–13th c.) [1, 10]. In the 16th c. most of these sources were replaced by the red and scarlet colours of the American cochineal, Dactylopius coccus, commercialized by the Spanish empire [23]. Similar species were already found in Eastern Europe, Porphyrophora polonica and Porphyrophora hamelli, known as Polish cochineal and Armenian cochineal, respectively [23, 24].

In previous publications, we showed that confocal microfluorescence is a powerful tool for the in situ analysis of colourants based on natural dyes [13, 14, 25]. Natural dyes may be described as weak to medium emitters. Following light absorption, an excited molecule is formed, and this fluorophore may lose its excess energy by emitting light. In a spectrofluorimeter, exciting at a single wavelength and recording the fluorescence over the fluorophore’s emission wavelength range yields an emission spectrum. It is also possible to excite at different wavelengths, following the colourant absorption spectrum, while collecting at a single wavelength, thus obtaining an excitation spectrum that may reproduce the absorption spectrum [26].

The simultaneous acquisition of emission and excitation spectra facilitates a more accurate identification of dyes and lake pigments [14]. To maximize the extraction of the information present in these signals, this work proposes a chemometric approach for the study of the database built up from historically accurate reproductions of brazilwood, cochineal, lac dye and, more recently, kermes. These lake pigments were used to produce a similar range of colours, and the three anthraquinone-based chromophores display similar excited-state properties (Fig. 1). Chemometric models, which simplify the interpretation of each system, i.e. each colourant, make it possible to explore similarities between colourants and to classify the spectral data into different classes. For this reason, hierarchical cluster analysis (HCA), principal component analysis (PCA) and soft independent modelling of class analogy (SIMCA) were applied to the acquired spectral data, to test the possibility of discriminating between these four main colourants.

Fig. 1 Excitation and emission spectra of selected reconstructions of the red lake pigments. From left to right: brazilwood, from the Livro de como se fazem as cores: recipe 8 and recipe 44; kermes, both from the Roosen-Runge adaptation of the Jean le Begue manuscript; cochineal, Winsor and Newton’s Finest Orient Carmine and Crimson with gypsum; lac dye, Ms. Bolognese, recipe 129 and recipe B.140

PCA is the chemometrics workhorse, and is often applied to help interpret multivariate datasets. PCA projects multivariate data onto a lower-dimensional orthogonal space. The projection directions are the loadings, and projecting the samples onto them yields the scores, an alternative representation of the samples that retains most of the original data variance [27]. PCA is an unsupervised method, in the sense that no information about the classes of the samples is used to build the model. HCA is the general designation for methods that group samples characterized by data vectors or matrices, eventually forming clusters. The distance between samples (e.g. Euclidean or Mahalanobis distance) is evaluated recursively, with the aim of defining a clustering tree. Because the grouping is performed hierarchically, and depending on the selected algorithm, multiple clustering options are possible. Results are typically represented graphically as a dendrogram, where samples are arranged according to their similarity [27]. SIMCA is a supervised classification method. It is based on the development of multiple PCA models, each built from samples of a known class or group [28, 29]. The goal is to classify unknown samples by presenting them to the different PCA models composing the SIMCA model. When samples are projected onto this model, they are classified according to their similarity to the different PCA class models (typically, Hotelling’s T² and squared residuals statistics are used to evaluate the distance to each model). When one sample is projected, different outcomes are possible: (1) the sample is assigned to a single class; (2) the sample is assigned to two or more classes; (3) the sample is not assigned to any of the model’s classes. Because the principal components are calculated individually for each class, high within-class variability can be covered, making SIMCA one of the most commonly used techniques for the classification of spectral data [28, 29].
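As a rough illustration of the projection described above, PCA can be sketched as a singular value decomposition of the mean-centred data matrix. The function name `pca_scores_loadings` and the synthetic data are hypothetical and serve only this sketch; they do not reproduce the software or the spectra used in this work.

```python
import numpy as np

def pca_scores_loadings(X, n_components):
    """PCA by SVD of the mean-centred data (rows = samples, columns = variables)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # mean centring
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T                # orthonormal projection directions
    scores = Xc @ loadings                        # sample coordinates along those directions
    explained = s[:n_components] ** 2 / np.sum(s ** 2)  # fraction of variance per component
    return scores, loadings, explained

# Synthetic stand-in for a spectral matrix: 20 samples x 50 variables.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 50))
scores, loadings, explained = pca_scores_loadings(X_demo, 3)
print(scores.shape, loadings.shape, explained.round(3))
```

The loadings are orthonormal and the score columns are mutually orthogonal, which is what makes the low-dimensional score plots straightforward to read.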

Experimental

Historically accurate reconstructions

Kermes lake reconstructions were prepared, with as much historical accuracy as possible, according to the Roosen-Runge adaptation (1967) of a recipe from Jean le Begue’s manuscript (Experimenta de coloribus) [30, 31]. Kermes vermilio female insects were ground in a mortar with additions of lye until a concentrated dark red solution was obtained. The mixture was heated for 30 min at 50 °C and then centrifuged for 10 min (pH circa 8). Afterwards, the dark red supernatant was heated at 50 °C and alum (Al3+) was added (pH = 6.8). The procedure was repeated to verify the reproducibility of the data.

For lac dye, twelve recipes were selected from six treatises/recipe books: Mappae clavicula (9th–12th c.), Livro de como se fazem as cores (The Book on How to Make Colours, 15th c.), the Bolognese manuscript (15th c.), the Strasbourg manuscript (15th c.), the Montpellier manuscript (15th c.) and the Paduan manuscript (late 16th to 17th c.); these reproductions have been described elsewhere [20, 31,32,33,34,35,36].

The production of cochineal lake pigments is poorly documented in the written sources from the medieval period. Reconstructions of these lake pigments were therefore adapted from the Winsor & Newton 19th c. archive, in different varieties: carmine (Finest Orient Carmine, Half Orient Carmine and Ruby Carmine) and crimson (with an aluminate composed of alum and an alkaline compound, designated as Crimson 1 and 2, and with gypsum, designated as Crimson with gypsum). The preparation of the cochineal reconstructions is described in [22], and the brazilwood reconstructions have been reported elsewhere [18].

Paint references were prepared using gum arabic and glair. Glair was prepared as described in the 11th-century De clarea treatise [37], and gum arabic, from Kremer Pigmente, was prepared according to De arte illuminandi as a 10% solution [38]. For the glair, the egg white was beaten and the liquid that formed at the bottom was used; for the gum arabic, the pieces were ground and then added to pure water. The lake pigments were first ground in an agate mortar with pure water and then ground with the binder. The paints were applied on filter paper and parchment with a paintbrush and allowed to dry. Spectroscopic or equivalent grade solvents and Millipore filtered water were used for all the spectroscopic studies, as well as for the extraction of the dyes and the preparation of the lake pigments.

Microspectrofluorimetry measurements

Fluorescence excitation and emission spectra were recorded with a Jobin–Yvon/Horiba SPEX Fluorolog 3-2.2 spectrofluorometer coupled to an Olympus BX51M confocal microscope, with spatial resolution controlled by a multiple-pinhole turret, corresponding to a minimum 2 μm and maximum 60 μm spot with a 50× objective. Beam-splitting is obtained with standard dichroic filters mounted at 45°, located in a two-place filter holder. For a dichroic filter of 570 nm, excitation may be carried out up to about 560 nm and emission collected from about 580 nm (“excite below, collect above”). The signal was optimized daily for all pinhole apertures through mirror alignment, following the manufacturer’s instructions, using a rhodamine standard (or other adequate reference). For the study of red dyes, two filter holders with two sets of dichroic filters are employed: 500 and 570 nm in one set, and 525 and 600 nm in the other. This enables both the emission and excitation spectra to be collected with the same filter holder. A continuous 450 W xenon lamp, providing an intense broad spectrum from the UV to the near-IR, is directed into a double-grating monochromator, and spectra are collected after focusing on the sample (eye view) followed by signal intensity optimization (detector reading). The pinhole aperture that controls the area of analysis is selected based on the signal-to-noise ratio. For weak to medium emitters it is set to 8 μm (pinhole 5); in this work, a 30 μm spot (pinhole 8) was also used for very weak signals. The slits were set as follows: emission slits = 3/3/3 mm and excitation slits = 5/3/0.8 mm. Emission and excitation spectra were acquired on the same spot whenever possible. For more details on the experimental set-up, see [13, 14].
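The “excite below, collect above” rule can be written as a small helper. This is only a sketch: the function name `dichroic_windows` is hypothetical, and the 10 nm margin on each side is an assumption generalized from the single 570 nm example given above (about 560 nm and about 580 nm); the actual margins depend on each filter.

```python
def dichroic_windows(cutoff_nm, margin_nm=10.0):
    """Return (maximum excitation wavelength, minimum emission wavelength) in nm.

    margin_nm is an assumed guard band around the dichroic cutoff, taken from
    the 570 nm example in the text; it is not a manufacturer specification.
    """
    if margin_nm < 0:
        raise ValueError("margin_nm must be non-negative")
    return cutoff_nm - margin_nm, cutoff_nm + margin_nm

# The four dichroic filters mentioned for the study of red dyes.
for cutoff in (500, 525, 570, 600):
    excite_max, collect_min = dichroic_windows(cutoff)
    print(f"{cutoff} nm filter: excite up to ~{excite_max:.0f} nm, "
          f"collect from ~{collect_min:.0f} nm")
```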

The paint reconstructions were analysed in situ. For each of the prepared paintings, 6–9 emission and excitation spectra were acquired on different days and at different points, and the data were shown to be reproducible. Forty excitation spectra, with the corresponding emission spectra, were obtained for brazilwood; 34 for cochineal; 22 for kermes; and 22 for lac dye. In total, therefore, 118 excitation spectra were acquired, together with the corresponding emission spectra.

Theory and calculation

Spectral pre-treatment

Both the excitation and the emission spectra were used. For each sample, the excitation and emission intensities were independently normalized to unit area, and the two data blocks were then merged (horizontal concatenation of the matrices). For the excitation spectral dataset used in this work, some filtering combined with the removal of baseline drifts was considered necessary, and the Haar transform and the 1st derivative (2nd order) were selected for this task. Normalization by area (to 1), which is typically used in the analysis of fluorescence data, was also considered and was applied after the first two methods [39]. See Additional file 1 for the spectral pre-treatment data.
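The pre-treatment steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PLS Toolbox implementation: the Haar filtering is reduced to a single-level approximation (pairwise averaging, which equals the Haar approximation coefficients up to a constant factor), the derivative is a plain finite difference via `np.gradient`, and area normalization divides by the sum of absolute intensities. All function names and the Gaussian test bands are hypothetical.

```python
import numpy as np

def normalize_area(spectrum):
    """Scale a spectrum so that the sum of its absolute intensities is 1."""
    spectrum = np.asarray(spectrum, dtype=float)
    total = np.abs(spectrum).sum()
    if total == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return spectrum / total

def haar_approximation(spectrum):
    """One-level Haar smoothing: average each pair of neighbouring points.

    Assumes an even number of points; the output has half the input length.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    if spectrum.size % 2:
        raise ValueError("expected an even number of points")
    return spectrum.reshape(-1, 2).mean(axis=1)

def first_derivative(spectrum, step_nm=1.0):
    """Finite-difference first derivative (removes constant baseline offsets)."""
    return np.gradient(np.asarray(spectrum, dtype=float), step_nm)

def merge_blocks(excitation, emission):
    """Normalize each block independently, then concatenate horizontally."""
    return np.hstack([normalize_area(excitation), normalize_area(emission)])

# Synthetic Gaussian bands standing in for one sample's spectra.
exc_wl = np.arange(400.0, 560.0, 1.0)    # 160 points
emi_wl = np.arange(580.0, 800.0, 1.0)    # 220 points
excitation = np.exp(-0.5 * ((exc_wl - 520.0) / 25.0) ** 2)
emission = np.exp(-0.5 * ((emi_wl - 620.0) / 30.0) ** 2)

merged = merge_blocks(excitation, emission)
# Excitation-only route: filter, differentiate, then normalize by area.
filtered = normalize_area(first_derivative(haar_approximation(excitation), step_nm=2.0))
print(merged.shape, filtered.shape)
```

Normalizing each block before concatenation keeps the more intense block from dominating the merged vector.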

Chemometric methods

The HCA algorithm was applied to the scores of PCA models. The number of principal components used for the HCA was the one that gave the best separation of the colourant classes, considering only the calibration dataset; dendrograms with 1 to 10 components were tested. Colourant classification with the SIMCA method relied on models developed from a calibration dataset. Samples were assigned to classes according to a distance-to-model metric, as described in [40]. The distance to model (d), defined in Eq. 1, was used with a threshold of 1.5 as the criterion for assigning samples to colourant classes.

$$ d = \sqrt{\left( \frac{T^{2}}{T_{\mathrm{Lim},95\%}^{2}} \right)^{2} + \left( \frac{Q}{Q_{\mathrm{Lim},95\%}} \right)^{2}} < 1.5 $$
(1)

In Eq. 1, T² and Q are the Hotelling’s T² and squared residuals statistics, respectively, and \( T_{\mathrm{Lim},95\%}^{2} \) and \( Q_{\mathrm{Lim},95\%} \) are the corresponding confidence limits for a significance level of 0.05. A sample is considered to belong to a class when d < 1.5. Prior to the application of all chemometric methods, the data were mean centred. All chemometric analyses and data manipulations were performed in Matlab Version 8.6 (R2015b) (The Mathworks, Natick, MA) and the PLS Toolbox Version 8.2.1 (Eigenvector Research, Manson, WA).
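Eq. 1 and the assignment rule can be illustrated with a few lines of code. The function names and the numerical values of the statistics and limits below are hypothetical and chosen only to show the arithmetic; they are not results from this study.

```python
import math

def distance_to_model(t2, q, t2_lim, q_lim):
    """Eq. 1: combine the reduced T² and Q statistics into one distance."""
    return math.hypot(t2 / t2_lim, q / q_lim)

def assigned_classes(distances, threshold=1.5):
    """Return the names of the class models whose distance is below the threshold."""
    return [name for name, d in distances.items() if d < threshold]

# Hypothetical statistics of one sample against two class models.
distances = {
    "brazilwood": distance_to_model(t2=4.0, q=0.02, t2_lim=8.0, q_lim=0.05),
    "cochineal": distance_to_model(t2=30.0, q=0.40, t2_lim=8.0, q_lim=0.05),
}
print(assigned_classes(distances))
```

A sample lying exactly on both 95% limits has d = √2 ≈ 1.41, so the 1.5 threshold still accepts it; the returned list may hold one class, several classes or none.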

Results and discussion

The data: historically accurate reconstructions

Brazilwood lake and lac dye reconstructions encompass a chronological arc from the 15th c. until the 19th c. and from the 12th c. until the 16th c., respectively, and also show that the main steps in the manufacture of these lake pigments were kept through time [18, 32, 33]. Due to the lack of cochineal recipes in medieval records, 19th c. W&N carmine and crimson cochineal pigments were used [22]. The main W&N process for carmine manufacture (Finest Orient Carmine) involved an acid extraction of the dye and the addition of aluminum from alum and calcium from milk. Two other processes to produce carmines involved omitting the milk from the previous process (Half Orient Carmine) and extracting the dye with potassium carbonate, followed by precipitation with alum and cream of tartar (Ruby Carmine). The crimson colours were produced by adding a lake pigment dispersion to an aluminate composed of alum and an alkaline compound (ammonium or sodium carbonate), or to an extender (gypsum) [22]. The kermes database consists of several reconstructions of a Jean le Begue recipe [30, 31]. Further studies on other recipes are currently in progress.

Pigment lakes have been fully characterized and rationalized by multi-analytical techniques [18, 20, 22].

Analyzing the relative fluorescence intensities of the colourants, we may conclude that brazilwood and cochineal present relatively similar excitation and emission intensities, while the intensity of lac dye is, in some cases, tenfold higher (see Additional file 1: Fig. S1–S4). Kermes, on the other hand, presents the lowest intensities of the four colourants. The chromophores of both laccaic acid and carminic acid have been characterized in solution [41]. The fluorescence quantum yields registered for these chromophores as 1:100 aluminum complexes (lakes) were 1.5 × 10⁻² and 4 × 10⁻², respectively, which enables their characterization as moderate and weak emitters. No values for brazilein-Al³⁺ complexes are available. However, knowing that brazilein at pH = 1.5 shows a fluorescence quantum yield of 6.8 × 10⁻³ [42], we may predict a tenfold increase of the quantum yield for an aluminum complex, i.e., to about 7 × 10⁻² [41]. A photophysical characterization of kermesic acid is yet to be done, and could shed some light on why its intensities are much lower than those of the other chromophores.

Unsupervised modelling

The pre-processed spectral data were analysed by PCA. The first principal components were examined for their ability to separate samples with different colourants. The first and second components mainly separate brazilwood and cochineal from the other two colourants, lac dye and kermes (Fig. 2). The third component differentiates the cochineal-based crimson recipes, where an aluminate or extender was added to a lake pigment dispersion, from the Finest Orient Carmine colours, which stand out for the addition of milk as a source of calcium (Fig. 3). Even more interesting, it distinguishes between Crimson 1, which had the addition of an aluminate composed of ammonium carbonate and alum, and Crimson 2, which had the addition of sodium carbonate and alum. This shows the potential of this methodology for identifying not only the colourant, but also the specific recipe. The fourth component separates the classes of kermes and lac dye, considering mostly the 400–440 nm region (Fig. 4).

Fig. 2 Principal component analysis scores, for normalized and filtered (by Haar transform and 1st derivative (2nd order)) excitation spectra for red lake pigments, showing the separation of cochineal (green) and brazilwood (red) from the other two colorants

Fig. 3 Principal component analysis scores illustrating the separation of cochineal manufacturing processes, Crimson lakes and Finest Orient Carmine (green)

Fig. 4 Principal component analysis scores, for normalized and filtered (by Haar transform and 1st derivative (2nd order)) excitation spectra for red lake pigments, illustrating the separation of the red lake pigments: kermes (light-blue), lac dye (dark-blue), cochineal (green) and brazilwood (red)

PCA is not a classification method and, for a better visualization of the similarity between the different samples, HCA with Ward’s algorithm and the Mahalanobis distance was used. The HCA method was fed with the principal components generated by the PCA models. The number of principal components (PCs) to use was chosen by running HCA with different numbers of PCs on approximately 2/3 of the total samples only (the dataset division was based on the Kennard-Stone algorithm). This was performed for the emission/excitation spectra models. Five components yielded the best discrimination between colourants. After this selection, the HCA method was applied to all samples, always with five components; these results are presented below. Considering the excitation spectra dataset alone, the HCA method revealed a successful separation of the four dyes (Fig. 5). Four distinct clusters are visible in the dendrogram, each encompassing the samples of a different colourant. When both excitation and emission spectra sets were used, the distinction between lac dye and kermes was not possible due to the similarities in the emission spectra of these colourants, which is also seen in Fig. 2.
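The combination of PCA scores, Mahalanobis distance and Ward’s algorithm can be sketched with SciPy. This is an illustration under assumptions, not the PLS Toolbox procedure: because PCA scores are uncorrelated, scaling each component to unit variance makes Euclidean distances equal to Mahalanobis distances in score space, and Ward linkage is then applied to the scaled scores. The function name and the synthetic four-group data are hypothetical; the demo uses three components, since four noise-free synthetic groups span only three between-group directions, whereas five components were selected for the real spectra.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ward_on_pca_scores(X, n_components, n_clusters):
    """Ward clustering of variance-scaled PCA scores; returns (tree, labels)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                              # mean centring
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]     # PCA scores
    # Unit-variance scaling of uncorrelated scores: Euclidean distance on the
    # result equals the Mahalanobis distance in the original score space.
    whitened = scores / scores.std(axis=0, ddof=1)
    tree = linkage(whitened, method="ward")              # dendrogram linkage matrix
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels

# Four synthetic groups of 10 samples each (stand-ins for four colourants).
centres = 10.0 * np.array([[0, 0, 0, 0, 0],
                           [1, 0, 1, 0, 1],
                           [0, 1, 1, 1, 0],
                           [1, 1, 0, 1, 1]])
rng = np.random.default_rng(1)
X_demo = np.vstack([c + rng.normal(scale=0.3, size=(10, 5)) for c in centres])
tree, labels = ward_on_pca_scores(X_demo, n_components=3, n_clusters=4)
print(labels.reshape(4, 10))
```

The linkage matrix `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the corresponding dendrogram.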

Fig. 5 Dendrogram generated by HCA applied to excitation spectra, showing a clear discrimination between the four red lake pigments: kermes (light-blue), lac dye (dark-blue), cochineal (green) and brazilwood (red)

SIMCA model

As previously mentioned, SIMCA is a supervised classification method based on multiple PCA models, each built from samples of a known class or group. To build the SIMCA model, samples were divided into a training set and a validation set according to the Kennard-Stone algorithm, with 2/3 of the 118 samples used for calibration and the remainder for validating the model. Four PCA models were calibrated considering the excitation and emission spectra of the four colourants. The number of components was selected based on the percentage of captured variance (at least 95%). Brazilwood, cochineal and lac dye samples were modelled with four components, while kermes demanded five components. Optimized PCA models were then built using the entire calibration dataset. Each model was then tested by projecting the validation samples. In SIMCA, normalization by area proved to be sufficient as a pre-treatment.
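The two selection steps described above, the Kennard-Stone split and the captured-variance criterion, can be sketched as follows. The function names are hypothetical, the data are synthetic, and rounding 2/3 of 118 to 79 calibration samples is an assumption of this sketch rather than a figure reported in the text.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Kennard-Stone selection: returns (selected indices, remaining indices)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))          # pairwise Euclidean distances
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]                      # start with the two farthest samples
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # For each candidate, distance to its nearest already-selected sample;
        # pick the candidate for which that distance is largest.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

def n_components_for_variance(X, target=0.95):
    """Smallest number of principal components capturing at least `target` variance."""
    X = np.asarray(X, dtype=float)
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    return min(int(np.searchsorted(cumulative, target)) + 1, s.size)

# Synthetic stand-in for the 118 merged spectra (6 variables only).
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(118, 6))
n_cal = round(118 * 2 / 3)                           # assumed rounding of the 2/3 split
cal_idx, val_idx = kennard_stone(X_demo, n_cal)
print(len(cal_idx), len(val_idx), n_components_for_variance(X_demo[cal_idx]))
```

Kennard-Stone is deterministic: it spreads the calibration samples over the data space, so the validation samples tend to fall inside the region covered by the calibration set.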

Both training and validation samples were predicted as the correct class, as shown in Fig. 6. The distance to model of each sample (calibration and test samples) for the four PCA models forming the SIMCA approach is presented in Fig. 6, together with the threshold level for class assignment (1.5). For better visualization of the data, each colourant is represented with a specific colour, and distance-to-model values were truncated to five. Brazilwood samples (red markers in Fig. 6, top-left) clearly lie below the 1.5 threshold, meaning that these samples are close to the brazilwood model, as expected, while the other samples lie significantly above it. The same result is observed for all colourant models. No sample could belong to more than one class. The strict application of the colourant assignment criterion results in 100% correct classifications for all validation samples. The SIMCA modelling approach was thus able to correctly assign the colourant type to all validation samples using the excitation and emission spectra.
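A compact sketch of SIMCA-style classification is given below: one PCA model per class, T² and Q statistics for a projected sample, and assignment with the 1.5 distance threshold. It rests on loud assumptions: the 95% limits are taken as empirical 95th percentiles of the calibration statistics, which stand in for the confidence limits used in this work, and the class names, data and two-component models are synthetic and hypothetical.

```python
import numpy as np

class ClassModel:
    """PCA model of a single class with empirical 95% limits for T² and Q."""

    def __init__(self, X, n_components):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        _, s, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.loadings = Vt[:n_components].T
        self.score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
        t2, q = self.statistics(X)
        # Assumption: percentile limits replace analytical confidence limits.
        self.t2_lim = np.percentile(t2, 95)
        self.q_lim = np.percentile(q, 95)

    def statistics(self, X):
        """Hotelling's T² and squared residuals Q for each row of X."""
        Xc = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean
        scores = Xc @ self.loadings
        residuals = Xc - scores @ self.loadings.T
        t2 = np.sum(scores ** 2 / self.score_var, axis=1)
        q = np.sum(residuals ** 2, axis=1)
        return t2, q

    def distance(self, X):
        """Distance to model: sqrt of squared reduced T² plus squared reduced Q."""
        t2, q = self.statistics(X)
        return np.hypot(t2 / self.t2_lim, q / self.q_lim)

def simca_assign(models, x, threshold=1.5):
    """Names of all class models that accept sample x (may be none or several)."""
    return [name for name, m in models.items() if m.distance(x)[0] < threshold]

rng = np.random.default_rng(2)

def synthetic_class(mean, n_samples=45):
    """Samples spread along two latent directions around `mean`, plus noise."""
    basis = rng.normal(size=(2, mean.size))
    latent = rng.normal(size=(n_samples, 2))
    noise = rng.normal(scale=0.05, size=(n_samples, mean.size))
    return mean + latent @ basis + noise

data = {"class_a": synthetic_class(np.zeros(30)),
        "class_b": synthetic_class(np.full(30, 5.0))}
# First 30 samples of each class calibrate its model; the last 15 are held out.
models = {name: ClassModel(X[:30], n_components=2) for name, X in data.items()}
for name, X in data.items():
    hits = sum(simca_assign(models, x) == [name] for x in X[30:])
    print(f"{name}: {hits}/15 held-out samples assigned only to their own class")
```

Because each class has its own model, a sample is tested against every model independently, which is what allows the three outcomes (one class, several classes, no class).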

Fig. 6 Distance to model metric used to assign samples (calibration and test sets) to colorant classes resulting from the SIMCA modelling approach: kermes (light-blue), lac dye (dark-blue), cochineal (green) and brazilwood (red). The SIMCA modelling approach results in 100% correct predictions for both calibration and validation sets

Conclusions

Microspectrofluorimetry is a powerful technique for the analysis of dyes and lake pigments, with the advantage that it can be used in situ without any contact with the sample or work of art to be analyzed. In this work, this technique was explored in a robust methodology for the identification of red lake pigments, using a chemometric approach that made it possible to explore similarities between colourants and to classify the spectral data into the four colourant classes (Fig. 1). Unsupervised (HCA and PCA) and supervised (SIMCA) modelling were tested for the discrimination between the four dye families. The former was applied to the excitation spectral dataset alone, with several pre-processing treatments, and allowed the colourants to be separated into different clusters. It was also possible to pinpoint the main W&N manufacturing processes of cochineal lake pigments: among the crimson lakes, the different additives, aluminates (Crimson 1 and 2) and gypsum, could be distinguished, and among the carmine colours, the Finest Orient Carmine, which had the addition of milk as a source of calcium, could be distinguished from the Half Orient Carmine and Ruby Carmine, both without calcium.

The SIMCA modelling allowed for the discrimination between chromophores with both spectral sets, i.e. excitation and emission, while using fewer pre-processing treatments.

Based on these results, this methodology will next be applied to data acquired from artworks, from medieval manuscripts to textiles, to select which modelling (unsupervised or supervised) best suits the data. Finally, a search algorithm will be developed, making this new advanced analytical tool accessible to the conservation community, and not only to photophysics experts.