1 Introduction

The world’s estimated annual wood trade was more than 2.6 billion m3 in 2014 (ITTO: International Tropical Timber Organization 2014), and several regulations are being established to assure the legal origin of the wood and the sustainability of logging activities. Proper control is thus essential given the pressure on certain commercial wood species and to protect those that are registered as ‘endangered’ by the International Union for Conservation of Nature (IUCN) or listed by the Convention on International Trade in Endangered Species (CITES) (Johnson and Laestadius 2011). Given the diversity of species, a major concern in tackling illegal logging and trade in the tropics is the classification of wood samples. While the number of European tree species is limited to about 120 (Latham and Ricklefs 1993), the total number of tropical tree species is between 40,000 and 50,000, of which about 5000 are found in the tropics of Africa (Slik et al. 2015).

Wood can be considered as a natural compound material that consists of different types of cells and tissues. Variation in wood structure mainly depends on the wood species, but also on cambial age, growth rate and environmental conditions. The anatomical composition of wood is discussed in the context of tree performance analysis (Beeckman 2016), wood technology, plant phylogeny and botanical identification. There are three types of tissue: conducting tissue for sap transport, supporting tissue for strength, and the so-called parenchyma, which provides for a series of metabolic processes, such as heartwood formation and storage of starch. Wood is heterogeneous and anisotropic: because cells are aligned both vertically and horizontally, its appearance changes with the viewing angle. Standard observation of the wood’s three-dimensional structure therefore entails viewing from three different angles at different microscopic magnifications. The first viewpoint is the transverse section or cross-section: a cut perpendicular to the longitudinal axis, showing the layout of vessels, growth rings, the pattern of axial parenchyma and, at greater magnification, the thickness of cell walls. The second viewpoint is that of the tangential section, which is obtained by cutting parallel to the main axis, but tangential to the growth rings. This section is principally used to study wood rays. The third viewpoint is that of the radial section, obtained by cutting lengthwise along a radius. It provides insight into ray composition. To determine the species of a wood sample, its anatomical characteristics are analysed by viewing from the three principal angles at different magnifications. Observed features are compared with scientific reference material kept in a xylarium or with data from specialist literature or databases such as InsideWood (Wheeler 2011).

Alternative methods to facilitate wood species recognition are gradually being established. DNA methods based on molecular markers provide such an alternative, but the extraction methods are still expensive and the data must be interpreted by a skilled genetic expert. Moreover, the lack of reference databases hampers fast progress in species identification (Hanssen et al. 2011; Nithaniyal et al. 2014). Recently, species identification through chemometric processing of direct analysis in real time (DART) mass spectrometry-derived fingerprints has been developed (Musah et al. 2015). To date, this method can distinguish a limited number of mostly CITES-listed species and shows promising results. Databases of fingerprints are currently being established, and expert knowledge is required for interpretation (Musah et al. 2015). Machine vision based on pattern recognition also appears to be a feasible and attractive solution for automated wood species identification, because it is less dependent on expert knowledge, while existing databases containing high-quality microscopy images can be exploited (Hermanson and Wiedenhoeft 2011).

Recent research illustrates the potential of automated classification based on macroscopic (Bremananth et al. 2009; Yu et al. 2009; Khairuddin et al. 2011; Khalid et al. 2011; Yusof et al. 2013a, 2013b; Zhao 2013; Zhao et al. 2014a, 2014b; Filho et al. 2014) and microscopic (Mallik et al. 2011; Gurau et al. 2013; Guang-Sheng and Peng 2013; Cui et al. 2013) images. Although some studies show satisfactory results, they are either based on experiments using few species (Bremananth et al. 2009; Zhao 2013; Zhao et al. 2014a, 2014b; Mallik et al. 2011) or involve morphological wood features (Khairuddin et al. 2011; Khalid et al. 2011; Yusof et al. 2013a, 2013b; Gurau et al. 2013; Guang-Sheng and Peng 2013). The latter, however, depend on segmentation (Guang-Sheng and Peng 2013), thus potentially resulting in variable results. Texture analysis has been introduced as a promising technique for wood species identification that avoids this segmentation bias. Texture attributes are able to describe the spatial organization of gray levels, as well as the variation of patterns in an area on the surface of the studied object (Ebert 1994). Filho et al. (2014) and Wang et al. (2013a, 2013b) have used texture attributes derived from macroscopic images to identify wood species. Few studies have reported on the use of texture analysis with transverse cross-section micro-imagery for wood species identification. For instance, texture features have been used to identify wood species from the Brazilian flora based on microscopic transverse cross-sections using local phase quantization, local binary patterns and the gray-level co-occurrence matrix (Martins et al. 2012, 2013; Cavalin et al. 2013).

Remarkably, the African continent, and especially the Congo Basin, is hardly represented in the above-mentioned studies regarding texture-based classification. The Congo Basin is, however, the second largest tropical forest region after the Amazon. Although the Democratic Republic of Congo (DRC) is the only African country within the world’s top ten of most forested countries, long-term data are lacking, as well as reliable data on volumes of wood species for export (Verbeeck et al. 2011). Therefore, this research focuses on commercial timber species of the DRC, using the wood collection of the Belgian Royal Museum for Central Africa (RMCA), housing the world’s largest wood collection on Congolese woody species (Beeckman 2007). The discriminative potential of texture features, extracted from microscopic transverse cross-sections, is tested. Lower magnifications of such sections show patterns of vessels, fibres and parenchyma that are often characteristic of a species, or more generally, of a taxon (Ruffinatto et al. 2015).

Two texture feature descriptors are applied in this study: local phase quantization (Ojansivu et al. 2008; Ojansivu and Heikkilä 2008) and local binary patterns (Ojala et al. 1996, 2001, 2002). Texture characteristics are extracted from the microscopic transverse cross-section images of 77 commercial or potentially commercial Congolese timber species. Linear discriminant analysis and principal component analysis are employed to reduce the dimensionality of the feature vectors. Further, an analysis considering the genus and family phylogenetic levels is carried out to arrive at a better understanding of the misclassifications. The following research questions motivate this paper. Can pattern recognition be used to identify the wood species in this study? Which texture method has the highest success rate of classification? Can wood anatomical features explain misclassifications?

2 Materials and methods

Figure 1 schematizes the wood classification procedure with three main phases: data acquisition, feature extraction and classification, all described in detail below.

Fig. 1 General scheme of the wood species classification procedure based on transverse cross-section micro-imagery

2.1 Data acquisition

A total of 77 Congolese timber species were selected for this study, belonging to 58 genera and 25 families (see Table 2). Based on this list, the transverse sections of all available anatomical slides of the Tervuren Xylarium (RMCA) were inspected under the light microscope. Only sections free of cracks and artifacts (e.g. enclosed air bubbles), uniform in texture, sufficiently thin and free of staining agents were selected. Additional transverse sections were prepared from xylarium specimens when anatomical sections were of poor quality or the number of sections was insufficient. Cross-sections were cut with a sliding microtome, dehydrated in a graded ethanol series (50, 75, 96 and 100%) and fixed with Euparal. The entire database is available at doi:10.5281/zenodo.235668.

In order to account for inter- (Fig. 2) and intra-slide (Fig. 3) variability in texture, each section was imaged at different locations without overlap and each species was represented by a minimum of three different transverse sections. This way, at least five images per species were available for classification. A light microscope (Olympus BX60) in connection with a digital camera (Olympus UC30) and the image analysis software package CellB (version 3.2, Olympus) were used for image acquisition. A standard magnification of 25× was chosen for all pictures. A total of 1221 images were collected for this study.

Fig. 2 Transverse cross-sections of a selection of 24 wood species used in the experiments. The name in each panel is the name of the species

Fig. 3 Examples of transverse cross-sections of wood species. Rows show samples of the same species to highlight the intra-species variability, from top to bottom: Leplaea cedrata, Millettia laurentii, Morus mesozygia, Pterocarpus soyauxii, Terminalia superba and Zanthoxylum gilletii

2.2 Feature extraction

Texture provides an important source of information that is useful for identifying spatial patterns in an image. The texture feature extraction process consists of modelling the textural patterns in such a way that they can be represented by a set of features, further referred to as a feature vector. Before feature extraction, all images were converted to 8-bit gray scale and their histogram was stretched for contrast improvement. Two feature descriptors were used for extraction of textural features, namely local phase quantization and local binary pattern.
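The preprocessing step can be illustrated with a minimal sketch. It assumes BT.601 luma weights for the gray-scale conversion and a linear min-max stretch; neither choice is specified in the text, and the function names and the tiny 2 × 2 image are purely illustrative.

```python
def to_gray8(rgb_image):
    """Convert rows of (r, g, b) tuples (0-255) to 8-bit gray levels."""
    # Assumption: ITU-R BT.601 luma weights; the text does not state the weights used.
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]


def stretch_histogram(gray):
    """Linearly rescale gray levels so that they span the full 0-255 range."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in gray]


# Hypothetical toy image, not taken from the data set.
rgb = [[(50, 50, 50), (100, 100, 100)],
       [(150, 150, 150), (200, 200, 200)]]
gray = to_gray8(rgb)
stretched = stretch_histogram(gray)
```

After stretching, the darkest pixel maps to 0 and the brightest to 255, so that images acquired under slightly different illumination become more comparable before texture features are computed.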

Local phase quantization (LPQ) (Ojansivu et al. 2008; Ojansivu and Heikkilä 2008) is a texture descriptor that is rotation invariant and insensitive to centrally symmetric blur. It computes the quantized phase of the Fourier transform over local windows at every image position. The local phase information is extracted by a finite two-dimensional discrete short-term Fourier transform (STFT).
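To make the construction of the 256 LPQ features concrete, the following sketch implements a basic form of the descriptor: four low-frequency STFT coefficients are computed in a local window, and the signs of their real and imaginary parts form an 8-bit code per pixel. It is a simplified illustration, not the implementation used in this study: the coefficient decorrelation step and the rotation-invariant extension are omitted, and the window size, the sign convention of the window offsets and all names are assumptions.

```python
import cmath


def lpq_histogram(gray, win=3):
    """Basic LPQ: 256-bin histogram of 8-bit codes built from four STFT coefficients."""
    r = win // 2
    a = 1.0 / win  # lowest non-zero frequency for this window size
    freqs = [(a, 0.0), (0.0, a), (a, a), (a, -a)]
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    # Precompute the complex basis value for every frequency and window offset.
    basis = [[cmath.exp(-2j * cmath.pi * (u * dx + v * dy)) for (dy, dx) in offsets]
             for (u, v) in freqs]
    hist = [0] * 256
    rows, cols = len(gray), len(gray[0])
    for y in range(r, rows - r):
        for x in range(r, cols - r):
            patch = [gray[y + dy][x + dx] for (dy, dx) in offsets]
            code, bit = 0, 0
            for b in basis:
                coeff = sum(p * w for p, w in zip(patch, b))
                # Quantize the phase: one bit each for the real and imaginary part.
                for part in (coeff.real, coeff.imag):
                    if part >= 0:
                        code |= 1 << bit
                    bit += 1
            hist[code] += 1
    return hist


# Hypothetical synthetic image, used only to exercise the function.
image = [[(x * 7 + y * 13) % 256 for x in range(8)] for y in range(6)]
hist = lpq_histogram(image)
```

Four complex coefficients yield eight sign bits, which explains why the LPQ feature vector has 2^8 = 256 entries regardless of image size.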

Local binary pattern (LBP), proposed by Ojala et al. (1996) and later improved in Ojala et al. (2001, 2002), is a gray-scale and rotation-invariant texture descriptor based on the analysis of properties of local image texture and their occurrence. The first version of LBP involved the aggregation of the pixel values in a pixel’s neighborhood into a binary number, which is then converted to decimal and subsequently used as the new value of the focal pixel. This procedure results in an LBP map, from which one may obtain a histogram of local binary patterns to represent the image as a feature vector (Fig. 4). Later, the method was made rotation invariant and made applicable to gray-scale images. The former variant will be referred to in the remainder as $\mathrm{LBP}^{1}_{P,R}$, while the latter will be referred to as $\mathrm{LBP}^{2}_{P,R}$, where P is the number of points in a circularly symmetric neighbor set equally spaced on a circle of radius R.
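The thresholding and histogram steps described above can be sketched as follows. The example uses the square 3 × 3 neighborhood, which approximates P = 8 and R = 1 without the interpolation needed for a truly circular neighbor set; the bit ordering, the function names and the toy image are assumptions made for illustration only.

```python
# Offsets of the 8 neighbors, walked clockwise starting at the top-left pixel.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_map(gray):
    """Return the LBP code (0-255) of every interior pixel."""
    rows, cols = len(gray), len(gray[0])
    codes = []
    for y in range(1, rows - 1):
        row_codes = []
        for x in range(1, cols - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBORS):
                # A neighbor at least as bright as the focal pixel contributes a 1 bit.
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def rotation_invariant(code, bits=8):
    """Map a code to the minimum over all of its circular bit rotations."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask for k in range(bits))


def lbp_histogram(codes, bins=256):
    """Histogram of LBP codes, usable as a feature vector."""
    hist = [0] * bins
    for row in codes:
        for c in row:
            hist[c] += 1
    return hist


# Hypothetical toy image with a dark upper half and a bright lower half.
image = [[10, 10, 10],
         [10, 50, 90],
         [90, 90, 90]]
codes = lbp_map(image)
```

Taking the minimum over bit rotations assigns the same code to a pattern and all of its rotated versions, which is one common way to obtain rotation invariance.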

Fig. 4 (a) An original image of Afzelia bella in gray scale, (b) the LBP map generated from (a) and (c) the histogram of LBP patterns in (b)

2.3 Classification

Finally, a supervised classification algorithm was used to predict membership of a new sample to one of 77 species, 58 genera or 25 families. For that purpose, each transverse cross-section image was characterized by a feature vector, obtained by either the LPQ or LBP descriptor. The length of the feature vectors depended on the feature descriptor used. More precisely, LPQ generated 256 features, while the number of features obtained by LBP depended on the parameters of the method and was sometimes as high as 4116. Consequently, the number of features had to be reduced while maintaining their discriminative potential. To that end, linear discriminant analysis (LDA) (Fisher 1936) and principal component analysis (PCA) (Jolliffe 2002) were applied. LDA maximizes the ratio of between-class variance to within-class variance, thereby preserving as much class-discriminative information as possible. The approach proposed by Lu et al. (2005) was implemented; it uses the regularized Fisher separability criterion to handle the problem of a small number of training samples compared to the dimensionality of the sample space. PCA calculates linear combinations of the original set of features in such a way that all the principal components are orthogonal to each other, thereby avoiding redundant information (Jolliffe 2002). The components are ordered according to their variance, starting with the component with the highest variance.
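For reference, the classical Fisher criterion underlying LDA can be written as below. The notation is an assumption introduced here for illustration: C classes, n_c samples in class c, class means \mu_c, overall mean \mu and projection matrix W. The regularized variant of Lu et al. (2005) modifies the denominator and is not reproduced here.

```latex
\[
S_b = \sum_{c=1}^{C} n_c (\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
S_w = \sum_{c=1}^{C} \sum_{x_i \in c} (x_i - \mu_c)(x_i - \mu_c)^{\top}
\]
\[
W^{*} = \arg\max_{W} \frac{\lvert W^{\top} S_b W \rvert}{\lvert W^{\top} S_w W \rvert}
\]
```

Because S_b has rank at most C - 1, LDA yields at most 76 discriminant directions for 77 species. When the number of features exceeds the number of training samples, S_w becomes singular, which is why a regularized criterion is needed in this setting.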

It is important to highlight that both LDA and PCA were used to reduce the dimensionality of the feature space of the training set only. The coefficients obtained from the training set were used to calculate the components of the test set. The success rate is based on the classification of all samples. The training set was constructed following a stratified fivefold cross-validation scheme (Hastie et al. 2001). More specifically, the set of samples was partitioned into five equally large subsets, with at least one sample of each species in each subset. Four subsets of samples were used for training, while the remaining subset of samples was used for testing. The latter was alternated five times, thus enabling the classification of all test samples. The final success rates are based on all classifications. This entire procedure was repeated ten times to obtain an average success rate and corresponding standard deviation (Fig. 1).
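The partitioning scheme described above can be sketched as follows. This is an illustrative stand-in rather than the code used in the study: the function names, the random seed and the toy labels are assumptions. In each split, the LDA or PCA coefficients would be estimated on the training indices only and then applied to the test indices.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign every sample index to one of k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    start = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal members round-robin; a class with at least k samples reaches every fold.
        for pos, idx in enumerate(members):
            folds[(start + pos) % k].append(idx)
        # Continue where the previous class stopped to keep fold sizes balanced.
        start = (start + len(members)) % k
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), each fold serving once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical labels: three classes with at least five samples each.
labels = ["A"] * 5 + ["B"] * 7 + ["C"] * 6
folds = stratified_folds(labels)
splits = list(train_test_splits(folds))
```

Repeating the procedure with ten different seeds and averaging the resulting success rates would mirror the ten repetitions mentioned in the text.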

Ultimately, the classification was performed by a simple 1-nearest neighbor classifier (Aha et al. 1991) using the Euclidean distance. The classification was performed at three phylogenetic levels simultaneously, in the sense that a triplet of labels (species, genus, family) was associated with each instance. When a sample was labeled, the label triplet of its nearest neighbor was taken into account and the classification was considered correct at the species, genus or family level if the species, genus or family label was correct, respectively. Evidently, if the classification is correct at the species level, it is also correct at the genus level; similarly, if a classification is correct at the genus level, it is also correct at the family level.
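A minimal sketch of this evaluation is given below. The feature vectors and the label triplets are hypothetical placeholders chosen so that each hierarchical case occurs once; the function names are likewise assumptions.

```python
import math


def nearest_neighbor(query, train_vectors):
    """Index of the training vector closest to query in Euclidean distance."""
    return min(range(len(train_vectors)),
               key=lambda i: math.dist(query, train_vectors[i]))


def classify_and_score(train_vectors, train_labels, test_vectors, test_labels):
    """Success rates at species, genus and family level for 1-NN classification."""
    correct = [0, 0, 0]
    for vec, truth in zip(test_vectors, test_labels):
        predicted = train_labels[nearest_neighbor(vec, train_vectors)]
        # Each level is scored separately on the (species, genus, family) triplet.
        for level in range(3):
            if predicted[level] == truth[level]:
                correct[level] += 1
    return [c / len(test_vectors) for c in correct]


# Hypothetical toy data: (species, genus, family) triplets.
train_vectors = [(0, 0), (1, 0), (5, 5), (9, 9)]
train_labels = [("sp1", "gA", "fX"), ("sp2", "gA", "fX"),
                ("sp3", "gB", "fX"), ("sp4", "gC", "fY")]
test_vectors = [(0.1, 0.0), (0.9, 0.1), (5.2, 5.1), (8, 8)]
test_labels = [("sp1", "gA", "fX"), ("sp1", "gA", "fX"),
               ("sp2", "gA", "fX"), ("sp4", "gC", "fY")]
rates = classify_and_score(train_vectors, train_labels, test_vectors, test_labels)
```

In this toy case, the second test sample is assigned to the wrong species within the correct genus, and the third to the wrong genus within the correct family, so the success rate increases from species to genus to family level.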

3 Results

Table 1 lists the average success rates and corresponding standard deviations over ten runs for LPQ and LBP. The best LBP configuration was obtained with P = 12 and R = 1; the parameters P and R were analyzed in a range of 4 to 16 and 1 to 2, respectively. Samples were classified according to the triplet of labels (species, genus, family) and the corresponding success rates were computed. LPQ in combination with LDA resulted in an average success rate of approximately 88% at species level, 89% at genus level and 90% at family level. LBP in combination with LDA achieved an average success rate of 85%, LBP1 resulted in 80% and LBP2 in 72% at species level, followed by an increase of 2% at genus level and at least 7% at family level.

Table 1 Classification of transverse cross-section wood images at species, genus and family level. The best results obtained are reported in italics

Table 2 shows the average success rates for the triplet of labels (species, genus, family) and also reveals the close relationship between species. Tables 3 and 4 show the analysis of the misclassifications. If, for instance, a species was classified as a species of the same genus, it was not considered an error at the genus level, with a similar assumption at the family level.

Table 2 Average success rates of identifying 1221 samples based on transverse cross-section micro-imagery according to a triplet of labels (species, genus, family), being 77 species, 58 genera and 25 families
Table 3 Increase of the average success rate (SR) when identifying the species and the misclassification is still in the correct genus or family
Table 4 Increase of the average success rate (SR) when identifying the genus and the misclassification is still in the correct family

Six species were never misclassified: Nesogordonia kabingaensis, Pycnanthus angolensis, Triplochiton scleroxylon, Tieghemella heckelii, Guibourtia demeusei and Drypetes gossweileri. Based on the confusion matrix (see Fig. 5), it can be concluded that fewer than half of the species under study were misclassified twice or more, and that 95% of those species were at least once classified in a different family. The families Fabaceae-Caesalpinioideae, Fabaceae-Papilionoideae and Fabaceae-Mimosoideae have a large share in the latter percentage. When considering them as one single family, it drops to 83%. Ten species had at least five misclassifications: Albizia adianthifolia, Albizia ferruginea, Cynometra alexandri, Entandrophragma angolense, Hallea stipulosa, Khaya anthotheca, Leplaea laurentii, Milicia excelsa, Prioria oxyphylla and Pterocarpus tinctorius. These misclassifications can be divided into three types depending on the level of classification: genus or family level (C. alexandri, P. oxyphylla, P. tinctorius), genus/family level mixed with other families (A. adianthifolia, A. ferruginea, E. angolense, K. anthotheca, L. laurentii) and misclassified in another family (H. stipulosa, M. excelsa). Confusion matrices for genera and families can be seen in Figs. 6 and 7, respectively.
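Counts such as the number of species misclassified twice or more can be derived from a confusion matrix as sketched below. The labels are hypothetical placeholders and the function names are assumptions; the sketch only illustrates the bookkeeping, not the reported figures.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def misclassifications_per_class(counts):
    """Sum the off-diagonal entries (errors) for every true label."""
    errors = Counter()
    for (truth, predicted), n in counts.items():
        if truth != predicted:
            errors[truth] += n
    return errors


# Hypothetical toy predictions for three species.
true_labels = ["sp1", "sp1", "sp1", "sp2", "sp2", "sp3"]
predicted = ["sp1", "sp2", "sp3", "sp2", "sp2", "sp1"]
counts = confusion_counts(true_labels, predicted)
errors = misclassifications_per_class(counts)
twice_or_more = sorted(s for s, n in errors.items() if n >= 2)
```

Applying the same functions to genus or family labels instead of species labels gives the corresponding matrices at the higher phylogenetic levels.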

Fig. 5 Confusion matrix considering the species label

Fig. 6 Confusion matrix considering the genus label

Fig. 7 Confusion matrix considering the family label

4 Discussion

4.1 Analysis at species level

Although no common criteria exist to compare wood anatomical identifications with the results obtained by relying on pattern recognition, the high average success rates reported in this study clearly illustrate the potential of such an automated method. It is of interest to elaborate on those wood species with the highest and lowest success rates and compare them with classical wood anatomical descriptions of the InsideWood database (Wheeler 2011).

Five species have a success rate of 75% or less: Nauclea diderrichii (68%), L. laurentii (73%), Ongokea gore (74%), K. anthotheca and Ficus mucuso (both 75%). N. diderrichii, L. laurentii and O. gore are characterized by distinctive anatomical features that manifest themselves at a higher magnification, especially on the radial and tangential sections, which were not used here. K. anthotheca and F. mucuso have the same success rate. This can be explained by the fact that both species have a typical signature characterized by rather large vessels, a relatively low vessel density and large rays.

On the other hand, six species had a maximum success rate of 100%: N. kabingaensis, P. angolensis, T. scleroxylon, T. heckelii, G. demeusei and D. gossweileri. Half of them have very particular wood anatomical features that can be observed on the transverse section. N. kabingaensis has a specific vessel arrangement, T. scleroxylon combines large vessels with low vessel densities and T. heckelii also has a particular vessel arrangement and reticulate axial parenchyma. Those species are therefore easy to distinguish using the transverse sections only, irrespective of the approach. The anatomy of D. gossweileri is less distinctive, but it has smaller vessels in combination with very thick-walled fibres, which might explain the high average success rates for this species. P. angolensis has some very particular wood anatomical features, but those are not present in the transverse section. Most probably, the absence of axial parenchyma results in a 100% success rate. G. demeusei cannot be identified by anatomical identification on wood transverse sections. However, in our set of images, its vessel frequency is lower than that of the closely allied taxon Guibourtia arnoldiana, which probably explains the high success rate using automated classification. Differences with other species of the genus are subtle and various axial parenchyma types exist, making it hard for wood anatomists to identify this species, which contrasts with the relatively high success rates obtained for this species using our approach.

4.2 Misclassifications compared to wood anatomical features on transverse sections

More than half of the 77 commercial species were never misclassified or misclassified once, which clearly demonstrates the potential of pattern recognition for the classification of the species covered in this study. Comparison with the wood descriptions of InsideWood (Wheeler 2011) gives an indication why some species were misclassified more frequently than others.

Species that were misclassified at least five times within the same genus or family were C. alexandri, P. oxyphylla and P. tinctorius. C. alexandri was mostly misclassified as Cynometra hankei, a species with a very similar wood anatomy on the transverse plane. Within the Prioria genus, traumatic canals are encountered. In P. oxyphylla, they are similar in size to the vessels, whereas in Prioria balsamifera the traumatic canals are smaller. One classification resulted in Scorodophloeus zenkeri, a species with a similar vessel structure, similar rather thick-walled fibres and an equally large variety of parenchyma types. However, rays, parenchyma and/or vessel elements are not storied, as opposed to Cynometra species. P. oxyphylla was three times falsely identified as P. balsamifera, which has a very similar transverse wood anatomy. Wood anatomically, the two species can be distinguished on the basis of the size of intercellular canals. Apparently, this difference is not recognized automatically at the chosen magnification. The other three misclassifications of P. oxyphylla resulted in P. tinctorius. This is difficult to explain from a wood anatomical point of view due to the presence of clearly distinctive features on the transverse section. P. tinctorius has more vessels, fibres with thicker cell walls, banded parenchyma and storied structures of rays and axial parenchyma/vessel elements as compared to P. oxyphylla. Finally, P. tinctorius was misclassified once as A. ferruginea (no banded parenchyma or storied structures), Brachystegia laurentii (no 3-cells-wide banded parenchyma), Copaifera mildbraedii (smaller vessels, no storied structure), P. oxyphylla (see above) and S. zenkeri (smaller vessels, larger rays and no storied structure). Thus, within the genus, the misclassifications can be understood by the similar transverse wood anatomy, but misclassifications at family level are less straightforward to explain from an anatomical point of view. Species belonging to different botanical families may superficially appear similar because there has been considerable convergent and parallel evolution in wood structure (Wheeler and Baas 1998).

H. stipulosa was misclassified into four different families as Leplaea cedrata, L. laurentii (both Meliaceae), O. gore (Olacaceae), Petersianthus macrocarpus (Lecythidaceae) and Zanthoxylum gilletii (Rutaceae). While P. macrocarpus and Z. gilletii still have similar vessel features, thick-walled fibres and partially equal parenchyma types, the other species/families are normally easy to distinguish based on their wood anatomy on the transverse sections, but this does not seem to hold when using pattern recognition.

4.3 Afzelia and Entandrophragma: two difficult genera for wood anatomy

The results obtained for the genus Afzelia, a very popular genus on the international wood markets (Donkpegan et al. 2014), are promising. This might not be so surprising if we compare the transverse wood anatomical features: Afzelia is a fairly uniform genus with some intra-species variability in vessel size and density but no distinctive features that stand out for a single species. From a wood anatomical point of view, it is not clear why A. bipindensis has significantly higher success rates than the other species of the genus. But as shown by chemometric fingerprinting (Espinoza, pers. comm.) and pattern recognition, A. bipindensis and A. africana are easy to separate while A. pachyloba and A. cuanzensis are hard to distinguish. Genetic analyses, however, showed two groups with another clustering: the savanna species (A. cuanzensis, A. africana) and the forest species (A. bella, A. bipindensis) (Donkpegan et al. 2017). Donkpegan et al. (2014) also stress the importance of careful sampling due to pronounced morphological similarities within the genus; thus, validated herbarium vouchers are essential. In our study, images of Afzelia originated from 14 wood samples.

Comparing the transverse wood anatomical features within the Entandrophragma genus, E. candollei is quite different from the other species in this genus (smaller, fewer vessels and larger parenchyma bands), which explains the corresponding high average success rates. E. utile and E. angolense are very similar at the magnification used and thus harder to distinguish, while E. cylindricum is in between the others. The wood anatomy of this genus is more diverse than that of Afzelia, which is supported by the fact that misclassifications are mainly situated at the family level and not at the genus level. For both genera, traded in large quantities and common objects of study for wood anatomists (Giraud 1980), the link between pattern recognition and wood anatomical features on the transverse section is quite strong.

4.4 Classification at family level

It is clear that our results are influenced by the dataset quality. The average success rates depend on the (dis-)similarities within the group of species tested. Therefore, grouping species into families for success rates could compensate for (dis-)similarities at the species level. The Euphorbiaceae family, however, is ranked best, although it is, anatomically speaking, a very heterogeneous family (Wheeler and Baas 1998). D. gossweileri is the only member of this family within our study and has very distinctive wood anatomical features, resulting in a high success rate (up to the family level).

5 Conclusions

This work demonstrates the potential of a semi-automated classification of tropical wood species by resorting to pattern recognition. Not only do wood anatomical collections and wood anatomists face huge challenges for the future, but training wood anatomical experts is also time-consuming and expensive, making semi-automated systems like ours valuable tools that complement classical wood identification. Generally speaking, the average success rates are high. Nonetheless, there are certain requirements on the acquisition of images. All images must have the same resolution, and damaged regions or regions showing growth irregularities that cannot be determined due to abnormal or non-standard features of the wood sample should be avoided. The prerequisites of our method thus require wood scientists to take slightly different images than the ones commonly used in wood identification and analysis. The slide collections of numerous xylaria, however, offer ample opportunities for testing this approach, even though not all the information on the wood sample, such as age and growth rate, is available.