Statistical methods for the analysis of thermal images obtained from corn seeds

During the last decades, digital image processing algorithms have been developed to measure external characteristics of agricultural products due to the great potential that these methods offer. So, in this research, the thermal images obtained from a thermographic camera were analysed considering two genotypes of maize seeds: crystalline and floury in their natural state, previously irradiated with a laser light source of 650 nm for exposure times of 15 s and 35 s. The methods applied in the analysis were: a) histogram to obtain the distribution of gray levels of images, b) mean value that indicates the brightness of images, c) variance which means the contrast of images, d) entropy applying both Shannon and Tsallis definitions, which provide the average self-information of images, e) estimation of the probability density of temperature variations on seeds to quantitatively characterize them from thermal images. Higher mean and variance were obtained from crystalline seeds indicating higher brightness and contrast. Furthermore, thermal images of floury seeds had higher entropy of Shannon indicating that images had greater disorder with respect to images of crystalline seeds. In the case of the entropy of Tsallis, the entropic index q could be used for characterization of seeds. Thermal images obtained from seeds with a floury structure provided a higher redundancy value for a shorter exposure time to laser light. Thus, the viability of the statistical methods of digital image processing applied to thermal imaging for the characterization of seeds is shown.


Introduction
Image processing plays an important role in many fields, including science, industry, medicine and agriculture [1]. Digital image processing (DIP) is a subarea of digital signal processing (DSP) [2] and can be defined as the manipulation of images by computer in order to interpret some of their noticeable characteristics [3]. DIP has several advantages: for example, it allows a greater diversity of algorithms to be applied, thereby reducing noise accumulation or distortion of the image [2]. Mathematical models are often used to describe images or other signals [4], so that images can be classified as continuous, discrete or digital [5]. An example of a continuous image is the one captured by a TV camera, which can be modeled as a continuous function of two variables f(x,y), where (x,y) are coordinates in a plane, as shown in Fig. 1. Another important aspect is that images can be treated either deterministically or statistically. In a deterministic image representation, a mathematical image function is defined and point properties of the image are considered. In a statistical image representation, the image is specified by average properties [1].
Digital images are represented by a discrete data structure, such as a matrix. To generate a digital image from a continuous image f(x,y) captured by a sensor, the image must be sampled. Perfect image sampling would be obtained by multiplying the image, represented as a continuous function, by a sampling function composed of an infinite array of Dirac delta functions arranged in a grid of spacing (Δx, Δy), given by Eq. 1 [1]:

$$ s(x,y) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \delta(x - j\,\Delta x,\; y - k\,\Delta y) \tag{1} $$

Thus, the sampled image f_s(x,y) can be represented by:

$$ f_s(x,y) = f(x,y)\, s(x,y) \tag{2} $$

After the sampling, a quantization process is carried out, which assigns a discrete value (i.e., an integer) to each sample [4]. For an 8-bit grayscale image, the integer values range from 0 (black) to 255 (white). So, a digital image can also be defined as a two-dimensional function that quantifies the intensity of light [3]. The most common model of a digital image is a matrix of M rows and N columns:

$$ I = \begin{bmatrix} I(0,0) & I(0,1) & \cdots & I(0,N-1) \\ I(1,0) & I(1,1) & \cdots & I(1,N-1) \\ \vdots & \vdots & \ddots & \vdots \\ I(M-1,0) & I(M-1,1) & \cdots & I(M-1,N-1) \end{bmatrix} \tag{3} $$

where each of its elements I(m,n) represents a pixel of the digital image, as in Fig. 2.
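The sampling and quantization steps described above can be illustrated with a minimal Python sketch. It is not the acquisition pipeline of the camera: the function names, the test scene and the assumption that the continuous image returns intensities in [0, 1] are all hypothetical and serve only to make the two steps concrete.

```python
import math

def sample_and_quantize(f, rows, cols, dx, dy, levels=256):
    """Sample f(x, y) on a rows x cols grid of spacing (dx, dy), then quantize.

    Assumption: f returns intensities in [0.0, 1.0]; values outside are clipped.
    """
    image = []
    for m in range(rows):
        row = []
        for n in range(cols):
            value = f(n * dx, m * dy)                      # sampling at a grid point
            value = min(max(value, 0.0), 1.0)              # clip to the valid range
            row.append(int(round(value * (levels - 1))))   # quantize to an integer level
        image.append(row)
    return image

def demo_scene(x, y):
    """Hypothetical smooth continuous image used only for illustration."""
    return 0.5 + 0.5 * math.sin(x) * math.cos(y)

# A 3 x 4 digital image stored as a matrix (list of rows) of integers in [0, 255].
I = sample_and_quantize(demo_scene, rows=3, cols=4, dx=0.5, dy=0.5)
```

Each entry I[m][n] plays the role of the pixel I(m,n) in the matrix model of a digital image.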
The aforementioned aspects have allowed the development of digital image processing (DIP) algorithms to objectively measure external characteristics of agricultural products, since their appearance considerably conditions the acceptance of the product. Typically, the physical characteristics are evaluated considering the size, shape, color, freshness and absence of visual defects [6]. These algorithms are used by various materials characterization techniques in many areas of research and industry, such as infrared (IR) and charge-coupled device (CCD) cameras, IR and Raman spectroscopy, Fourier transform near-infrared (FT-NIR) spectroscopy and photoacoustic spectroscopy (PAS), among others [7][8][9], and also with microscopies such as scanning electron microscopy (SEM), photoacoustic microscopy (PAM), image mapping spectrometer (IMS), etc. [10][11][12]. From the data obtained with the above techniques, it is possible to extract useful information regarding the analyzed samples. In [13], an algorithm is proposed to analyze and classify the quality of rice quickly and accurately, using DIP techniques with an automated software system. Computer vision has also been used to classify wheat seeds according to their varieties [14]; the images were examined to assess the quality of the seeds based on texture characteristics, with an average classification precision of 98.15%. A system for segmentation and classification of images based on a particle swarm algorithm is proposed in [15] to detect sunflower leaf disease, obtaining a classification accuracy of 98% compared to other methods.
Additionally, the advances in statistical mechanics based on the concept of nonextensive entropy have intensified the interest to investigate a possible extension of the entropy of Shannon [16], such as the entropy of Tsallis [17]. This interest is mainly due to the similarities between the entropy functions of Shannon and Boltzmann/Gibbs. The entropy of Tsallis is a definition that generalizes the entropy of Boltzmann-Gibbs to nonextensive physical systems. In this theory, a parameter q is introduced as a real number, which is associated with nonextensivity of the system and is system-dependent [18]. Authors in references [18,19] proposed a method for the segmentation of images based on the one-dimensional maximum entropy of Tsallis (1-DMTE) [20]. Since then, the entropy of Tsallis has become a popular tool for image segmentation [21]. A methodology to calculate the q parameter based on the maximization of the redundancy of the image and maximum entropy of the information theory is proposed in [22].
Considering the diversity and potential of digital image processing techniques, in the present research, collections of grayscale thermal images, treated as realizations of random processes [1], are analyzed using the following statistical methods: the histogram, to obtain the gray-level distribution of images; the mean, which is a measure of the image brightness; the variance, which is a measure of the image contrast; the entropy of both Shannon and Tsallis, which indicates the average amount of self-information contained in the images; and the probability density of the temperature variations. Two genotypes of corn seeds are considered: crystalline and floury, in their natural state. The seeds were irradiated with laser light and imaged with a thermographic camera following the experiment described in [23], considering two exposure times, 15 and 35 s, with the aim of characterizing them quantitatively from the thermal images.

Biological materials
In this research, two genotypes of corn seeds are considered: crystalline and floury in their natural color. With large populations of seeds, accurate results can be obtained from samples that represent only a fraction of the whole population. In that sense, simple random sampling was used (i.e., a method of selecting n units out of N such that every one of the distinct samples has an equal chance of being drawn). If the sample size n is less than 50, the confidence probability of the estimates, denoted as 1 − α, may be taken from Student's t_{α/2} table with n − 1 degrees of freedom, considering also the finite population correction (fpc) factor [24]. Thus, for both genotypes of corn seeds, n = 10 seeds of each were randomly selected. A homogenization process was performed on the seeds, measuring the length and width using a vernier caliper gauge. After that, the seeds were numbered from 1 to 10 as shown in Table 1. Figure 3 shows the optical images of the samples for both genotypes of corn seeds, and their respective numbering.
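A small-sample confidence interval of this kind can be sketched as follows. This is only an illustration under stated assumptions: the population size N, the measured values and the function name are hypothetical (the text does not report them), the fpc is taken in the form (N − n)/N applied to the variance of the mean, and the tabulated value t_{0.025} = 2.262 for 9 degrees of freedom is hard-coded rather than computed.

```python
import math
import statistics

def mean_confidence_interval(sample, population_size, t_value):
    """Confidence interval for the mean under simple random sampling.

    Uses Student's t with the finite population correction sqrt((N - n) / N).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)                                # sample std, n - 1 divisor
    fpc = math.sqrt((population_size - n) / population_size)    # finite population correction
    half_width = t_value * s / math.sqrt(n) * fpc
    return mean - half_width, mean + half_width

# Hypothetical seed lengths in mm for n = 10 seeds drawn from N = 1000 (assumed).
lengths_mm = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
T_0025_DF9 = 2.262  # Student's t for 1 - alpha = 0.95 and n - 1 = 9 degrees of freedom
low, high = mean_confidence_interval(lengths_mm, population_size=1000, t_value=T_0025_DF9)
```

When n is a small fraction of N the correction is close to 1 and the interval is essentially the usual t interval.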

Experimental setup
The thermal images were obtained with an IR camera (i5 model; 6.8 mm lens; accuracy of ± 2%; thermal [...]). The excitation source was a diode laser with a wavelength of 650 nm and a power of 27 mW (Tyson Technology Co.), and the instrumentation was controlled by a digital timer. The samples were placed in a silicone net lying in an empty space on a plastic container. The distance between the laser and the corn samples was 0.228 m. The seeds were fixed on the side opposite the embryo and were suspended in the air. The exposure times considered were 15 and 35 s for both genotypes of corn seeds. During the experiment, the temperature and humidity conditions were recorded using a data acquisition system implemented with a DHT11 sensor on the Arduino open hardware platform. The DHT11 sensor can measure a temperature range from 0 to 50 °C with a resolution of 0.1 °C and a relative humidity (RH) range from 20 to 90% with a resolution of 1% and a response time of 1 s. The experimental setup is shown in Fig. 4.

Histogram of images
To obtain the histogram, the grayscale thermal images of the 8-bit unsigned integer type (uint8) were used, so that their data are in the range [0,255] [5]. The histogram is a discrete function that counts the number of occurrences of each gray level in an image. It is represented as a graph, where the abscissa is the gray level and the ordinate is the frequency of each gray level in the image. If the histogram is divided by the number of pixels that make up the image (M × N), the probability density function of the gray levels of the image is obtained [25]:

$$ p(i) = \frac{h(i)}{M N} \tag{4} $$

where h(i) is the number of occurrences of the ith gray level in the image, M is the number of rows of the image, N is the number of columns, and p(i) denotes the probability that the ith gray level of the image occurs.
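As a minimal sketch of this step, the histogram h(i) can be counted directly and divided by the number of pixels M × N to give p(i). The function names and the 3 × 3 patch of gray values are illustrative assumptions, not data from the experiment.

```python
def gray_histogram(image, levels=256):
    """Count the occurrences h(i) of each gray level i in a uint8-like image."""
    h = [0] * levels
    for row in image:
        for pixel in row:
            h[pixel] += 1
    return h

def gray_probabilities(image, levels=256):
    """Normalize the histogram by the number of pixels M * N to obtain p(i)."""
    h = gray_histogram(image, levels)
    total = sum(h)  # equals M * N
    return [count / total for count in h]

# Illustrative 3 x 3 patch of gray levels (hypothetical values).
patch = [[172, 173, 174],
         [171, 172, 171],
         [173, 173, 175]]
h = gray_histogram(patch)
p = gray_probabilities(patch)
```

For this patch, gray level 173 occurs three times out of nine pixels, so p[173] is 1/3.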

Mean of images
The brightness of the image is its average value, which coincides with the mean value of the histogram [25]:

$$ \mu = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) = \sum_{i=0}^{I-1} i\, p(i) \tag{5} $$

where f(x,y) is the gray level of the pixel located at coordinates (x,y), p(i) is the probability of the ith gray level and I is the number of gray levels used in image quantization.

Variance of images
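The variance of the histogram is linked to the contrast of the image [25]:

$$ \sigma^2 = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left( f(x,y) - \mu \right)^2 = \sum_{i=0}^{I-1} (i - \mu)^2\, p(i) \tag{6} $$

where μ is the mean value of the image. The contrast shows the dispersion of gray levels in the image.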
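The brightness and contrast measures of the last two sections can be sketched together. The example below computes the mean and the variance both directly from the pixels and from the normalized histogram, to show that the two routes agree; the function names and the small patch are hypothetical, and the variance uses the M × N divisor so that it matches the histogram form.

```python
def brightness_and_contrast(image):
    """Mean (brightness) and variance (contrast) computed from the pixels."""
    pixels = [p for row in image for p in row]
    count = len(pixels)  # M * N
    mean = sum(pixels) / count
    variance = sum((p - mean) ** 2 for p in pixels) / count
    return mean, variance

def brightness_and_contrast_from_histogram(image, levels=256):
    """Same quantities computed from the gray-level probabilities p(i)."""
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    total = sum(counts)
    probs = [c / total for c in counts]
    mean = sum(i * p for i, p in enumerate(probs))
    variance = sum((i - mean) ** 2 * p for i, p in enumerate(probs))
    return mean, variance

# Illustrative 3 x 3 patch of gray levels (hypothetical values).
patch = [[172, 173, 174],
         [171, 172, 171],
         [173, 173, 175]]
mean_px, var_px = brightness_and_contrast(patch)
mean_h, var_h = brightness_and_contrast_from_histogram(patch)
```

For this patch the mean is 1554/9 and the variance is 14/9 by either route.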

Entropy of Shannon and Tsallis for images
The entropy of Shannon is a concept developed in information theory [16], which represents the average self-information that each pixel of the image carries and is defined as:

$$ H_S = -\sum_{i=0}^{I-1} p(i)\, \log_b p(i) \tag{7} $$

where p(i) is the probability of the ith gray level, I is the number of gray levels and b is the base of the logarithm. If b = 2, the units are bits/pixel; if the natural logarithm is used, the units are nats/pixel. The entropy of Shannon is maximum when all gray levels of the image have the same probability.
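A minimal sketch of the Shannon entropy of a gray-level distribution is given below. The function name is an assumption, and levels with zero probability are skipped, following the usual convention that 0 · log 0 is taken as 0.

```python
import math

def shannon_entropy(probabilities, base=2):
    """Shannon entropy -sum p(i) log_b p(i); zero-probability levels contribute 0."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# Four equally likely gray levels carry 2 bits/pixel, the maximum for 4 levels.
uniform_bits = shannon_entropy([0.25, 0.25, 0.25, 0.25], base=2)

# A constant image (one gray level with probability 1) carries no information.
constant_bits = shannon_entropy([1.0, 0.0, 0.0, 0.0], base=2)

# The same distribution in nats/pixel, using the natural logarithm.
uniform_nats = shannon_entropy([0.25, 0.25, 0.25, 0.25], base=math.e)
```

For an 8-bit image with all 256 levels equally probable, the same function gives the upper bound of 8 bits/pixel.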
The entropy of Tsallis allows describing extensive and nonextensive systems and is given by [20]:

$$ H_T = \frac{1 - \sum_{i=1}^{\Omega} p_i^{\,q}}{q - 1} \tag{8} $$

where Ω represents the total number of accessible microstates of the system (i.e., the total number of gray levels for images), each with probability p_i (where 1 ≤ i ≤ Ω), and all probabilities fulfill the condition:

$$ \sum_{i=1}^{\Omega} p_i = 1 \tag{9} $$

The number q ∈ R is the entropic index, which characterizes the nonextensivity of a particular system [17]. A system is said to be extensive if, for two independent subsystems A and B, it satisfies Eq. (10):

$$ H_{BG}(A + B) = H_{BG}(A) + H_{BG}(B) \tag{10} $$

and it is nonextensive otherwise. In Eq. (10), the term H_BG represents the Boltzmann-Gibbs entropy, given by

$$ H_{BG} = -k \sum_{i=1}^{\Omega} p_i \ln p_i \tag{11} $$

where k = 1.3806504 × 10^−23 J·K^−1 is Boltzmann's constant.
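The Tsallis entropy of a gray-level distribution can be sketched as below, assuming the standard form H_T = (1 − Σ p_i^q)/(q − 1) with the constant factor set to 1. Two choices here are assumptions of this sketch rather than statements from the text: zero-probability levels are left out of the sum (otherwise p^q diverges for q < 0), and the value at q = 1 is replaced by its limit, the Shannon entropy in nats.

```python
import math

def tsallis_entropy(probabilities, q):
    """Tsallis entropy (1 - sum p_i^q) / (q - 1) over non-zero probabilities.

    At q = 1 the expression is singular, so the Shannon limit (in nats) is returned.
    """
    support = [p for p in probabilities if p > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in support)
    return (1.0 - sum(p ** q for p in support)) / (q - 1.0)

uniform = [0.25, 0.25, 0.25, 0.25]

# q = 2: (1 - 4 * 0.25**2) / (2 - 1) = 0.75
h_q2 = tsallis_entropy(uniform, q=2.0)

# Close to q = 1 the Tsallis entropy approaches the Shannon entropy ln(4).
h_near_1 = tsallis_entropy(uniform, q=1.000001)
h_at_1 = tsallis_entropy(uniform, q=1.0)
```

Negative values of q are handled by the same expression, since every probability in the sum is strictly positive.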
Like the entropy of Shannon, the entropy of Tsallis is maximum when all microstates are equally probable, which gives

$$ H_T^{max} = \frac{1 - \Omega^{1-q}}{q - 1} \tag{12} $$

To calculate the entropic index q, the methodology described in [22] is applied, where the grayscale image is considered as a nonextensive system in order to maximize the q-redundancy of the image, defined as

$$ R_T = 1 - \frac{H_T}{H_T^{max}} \tag{13} $$

From the point of view of information theory, the less redundancy, the greater the information of an image and vice versa. Redundancy is calculated as a function of the entropic index q, where q can be evaluated from −∞ to ∞ since it is a real number.

Figure 5 shows an example of the time series of temperature (T, °C) and relative humidity (RH) conditions recorded during the experiment. It can be observed that there were practically no variations of the environmental conditions, thus ensuring that all the seeds were evaluated under homogeneous temperature and humidity conditions. During the experiment, the average temperature and relative humidity of the environment were 22 °C and 38%, respectively. All values of the environmental conditions were taken every second.
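The q-redundancy can be sketched as follows, assuming the forms H_T^max = (1 − Ω^(1−q))/(q − 1) for Ω equally probable gray levels and R_T = 1 − H_T/H_T^max. The function names are hypothetical, Ω is taken as the length of the probability list (at least 2), and zero-probability levels are excluded from the entropy sum as an assumption of this sketch.

```python
import math

def tsallis_entropy(probabilities, q):
    """Tsallis entropy over non-zero probabilities (Shannon limit at q = 1)."""
    support = [p for p in probabilities if p > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in support)
    return (1.0 - sum(p ** q for p in support)) / (q - 1.0)

def tsallis_max(omega, q):
    """Tsallis entropy of omega equally probable levels."""
    if abs(q - 1.0) < 1e-12:
        return math.log(omega)
    return (1.0 - omega ** (1.0 - q)) / (q - 1.0)

def redundancy(probabilities, q):
    """q-redundancy 1 - H_T / H_T_max, with omega = number of gray levels."""
    return 1.0 - tsallis_entropy(probabilities, q) / tsallis_max(len(probabilities), q)

# Equally probable levels reach the maximum entropy, so the redundancy is 0.
r_uniform = redundancy([0.25, 0.25, 0.25, 0.25], q=2.0)

# A single occupied level has zero entropy, so the redundancy is 1.
r_constant = redundancy([1.0, 0.0, 0.0, 0.0], q=2.0)
```

Both limiting cases hold for any admissible q, which gives a quick sanity check before scanning q over a range.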

Results and discussion
All thermal images were edited and exported to CSV format, using the software of the thermographic camera FLIR Tools, version 6.4.18039.1003. After that, all the exported images were processed for data analysis with the MATLAB package. Figure 6 shows a collection of thermal images obtained from both crystalline and floury seeds, considering exposure times of 15 s and 35 s, and their equivalent images in grayscale, with the purpose of making the analysis more efficient from the computational point of view.
To investigate the gray-level distribution of the collections of images, the corresponding histograms were obtained. In all cases, it can be observed that the shape of the histogram tends to be preserved. The changes are observed in the frequency at which the maximum values of the grayscale occur in each seed sample, and they depend on the genotype of the seeds as well as on the exposure time to laser light. These variations in the distribution of frequencies (i.e., the histogram) are due to the fact that seeds have complex and nonhomogeneous structures, so their thermal properties are also nonhomogeneous.
In Table 2, the numerical results of the mean, variance and entropy for all the collection of thermal images in grayscale, obtained for both genotypes of corn seeds and considering two exposure times, are shown.
According to the numerical results, the mean value of the grayscale thermal images decreases when the exposure time of the seed samples to the laser light is increased. This is because the seed samples increase their temperature with a longer exposure, which in turn reduces the brightness and increases the contrast of the image; this can be corroborated with the values of the variance, which also increase with the exposure time. If the mean values of the grayscale thermal images are compared for the same exposure time and different genotypes, seeds with a floury structure give a smaller mean value, indicating lower brightness of the images; their variance is also smaller, indicating lower contrast.

Regarding the confidence intervals of the mean, it can be observed that different samples provide different values of the estimates and therefore produce different confidence intervals. When the floury seeds are exposed to the laser for 15 s, a relatively narrow interval is obtained in comparison with the other cases, indicating that the mean has been rather precisely estimated due to a lower value of the variance.

In order to compare the Shannon entropy with the Tsallis entropy, which is applied in the segmentation of images, the value of the entropic index q needs to be determined. In this research, the method described in [22] is applied, where grayscale images are considered as a nonextensive system to find the q value that maximizes the redundancy (Eq. 13) of images.
For each collection of ten grayscale images of crystalline and floury seeds with exposure times of 15 s and 35 s to the laser light, the value of the q index is taken as the one at which the maximum value of R_T is obtained. As mentioned above, q can in principle be evaluated from −∞ to ∞ since it is a real number, but in this case the q index has been limited to the range from −2 to 10 [22], in steps of 0.01. In Fig. 8 [...]. In Table 3, a comparison of the entropy values of both Tsallis and Shannon is shown. It can be seen that, in general terms, the Tsallis entropy provides a greater amount of information about the images than the Shannon entropy. The q value is determined where the maximum redundancy R_T of the images is obtained, since under these conditions the entropy of Tsallis is minimal, indicating that the images have less disorder and thus allowing a better identification of patterns. In the case of crystalline seeds, as the exposure time increases from 15 to 35 s, the q index decreases from −0.14 to −0.34, which in turn increases the redundancy of images. For floury seeds, as the exposure time increases from 15 to 35 s, the entropic index q increases from −0.280 to −0.252, reducing the redundancy of images. Hence, for seeds with a floury structure, a higher redundancy value is obtained for the shorter exposure time to laser light. It should be noted that the value of the entropic index q is characteristic of each collection of images, so this index could also be used for the characterization of seeds.
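The search for q over the range from −2 to 10 in steps of 0.01 can be sketched as a simple grid scan. This is an illustrative reading of the procedure, not the implementation used in the study: the function names and the demonstration distribution are hypothetical, the redundancy is assumed to be 1 − H_T/H_T^max, zero-probability levels are excluded from the entropy sum, and the singular point q = 1 is replaced by its Shannon limit.

```python
import math

def tsallis_entropy(p, q):
    """Tsallis entropy over non-zero probabilities (Shannon limit at q = 1)."""
    support = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in support)
    return (1.0 - sum(x ** q for x in support)) / (q - 1.0)

def tsallis_max(omega, q):
    """Tsallis entropy of omega equally probable levels."""
    if abs(q - 1.0) < 1e-12:
        return math.log(omega)
    return (1.0 - omega ** (1.0 - q)) / (q - 1.0)

def redundancy(p, q):
    """q-redundancy 1 - H_T / H_T_max, with omega = len(p) gray levels."""
    return 1.0 - tsallis_entropy(p, q) / tsallis_max(len(p), q)

def find_entropic_index(p, q_min=-2.0, q_max=10.0, step=0.01):
    """Return (q, R) for the grid value of q that maximizes the redundancy."""
    n_steps = int(round((q_max - q_min) / step))
    best_q, best_r = q_min, redundancy(p, q_min)
    for k in range(1, n_steps + 1):
        q = round(q_min + k * step, 10)  # rounding avoids accumulated float drift
        r = redundancy(p, q)
        if r > best_r:
            best_q, best_r = q, r
    return best_q, best_r

# Hypothetical gray-level distribution used only to exercise the search.
p_demo = [0.70, 0.20, 0.10, 0.00]
q_star, r_star = find_entropic_index(p_demo)
```

In practice p would be the normalized histogram of a grayscale thermal image (256 levels), and the returned q would be the entropic index assigned to that image.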
Regarding the Shannon entropy of images, the numerical values of entropy increase with the exposure time to the laser light for seeds of the same genotype, which indicates a greater disorder in the images, that is, more gray levels participate in them. For the same exposure time, the numerical value of the Shannon entropy is greater for images of seeds with a floury structure, indicating that they present a greater disorder with respect to the images of crystalline seeds, because the molecular structure of the floury seeds is less organized [26].
In addition to the grayscale analysis of the images of crystalline and floury seeds obtained by the thermographic camera, it is proposed to analyze the collections of images in MATLAB R2014a using the temperature values exported by the FLIR Tools software of the camera, for the same exposure times of 15 s and 35 s. The aim is to observe the irregular variations of the temperature of the seeds, which cannot be quantified directly from grayscale or RGB images, since that analysis is carried out on pixel values and not on temperature values. In Figs. 10 and 11, the temperature variations in °C are shown for each sample of crystalline and floury seeds, considering an exposure time of 15 s. From the graphical results, it can be seen that, due to the inhomogeneous thermal properties of the seeds, the temperature values in each seed tend to vary irregularly.
To perform a quantitative comparison of the variations in the temperature by increasing the exposure time in the samples of each genotype of corn seeds, the probability density functions are estimated using the Gaussian kernel functions [27] with an efficiency of 95.12% in density estimation [28], to obtain a continuous form of the density function for each collection of ten thermal images as shown in Figs. 12 and 13 as well as the statistical averages, such as the mean, variance and standard deviation.
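A Gaussian kernel density estimate of this kind can be sketched in a few lines. The bandwidth rule is an assumption of this sketch, since the text does not state how it was chosen: the normal reference rule of thumb h = 1.06 · s · n^(−1/5) is used by default. The function name and the temperature values are hypothetical.

```python
import math
import statistics

def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the probability density with Gaussian kernels.

    Assumption: if no bandwidth is given, the normal reference rule of thumb
    h = 1.06 * s * n**(-1/5) is used, where s is the sample standard deviation.
    """
    n = len(samples)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1.0 / 5.0)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centered on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical temperature readings in degrees Celsius.
temps_c = [22.1, 22.3, 22.4, 22.2, 22.6, 22.5]
pdf = gaussian_kde(temps_c)
density_at_mean = pdf(statistics.mean(temps_c))
```

The returned function can be evaluated on a fine grid of temperatures to plot a continuous density curve for each collection of images.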
In Table 4, the average values obtained for each case are shown, where it can be seen that for the case of crystalline seeds, when the exposure time is increased from 15 to 35 s, the mean temperature increases from 22.3507 to 22.5849 °C, respectively. For seeds with a floury structure, it can be seen that the mean temperature increases from 21.9306 to 22.3284 °C for the same increase in the exposure times, which indicates that for both cases, the mean temperature of seeds increases. When comparing the values of the mean temperature considering the same exposure time, for example at 15 s, it can be seen that for the case of crystalline seeds, there is a higher average value (22.3507 °C) with respect to floury seeds (21.9306 °C); this is because the pericarp of crystalline seeds has a more organized molecular structure and greater thermal diffusivity D = k/ρc (where k is the thermal conductivity, c is the specific heat and ρ is the density) and thermal conductivity according to the literature [26] compared to the pericarp of the floury seeds. Thermal diffusivity is a measure of how fast heat propagates through the material [29], which allows a higher temperature to be reached. Regarding the values of the variance, it can be observed that as the exposure time increases, the variance increases for both cases, being more noticeable in floury seeds, which indicates a greater variability in the values of the temperature, since the floury seeds have a more amorphous molecular structure with respect to crystalline seeds.

Conclusions
The thermal images of crystalline seeds presented greater mean and variance values, indicating higher brightness and contrast, respectively, for the same exposure times, with respect to thermal images obtained from floury seeds. The confidence intervals were estimated for a reliability of 0.95 in all cases, because reliability and precision are inversely related: a highly reliable interval estimate may be imprecise, since a gain in reliability generates a loss in precision, so 0.95 is a compromise between acceptable confidence and precision levels in the estimation. When the floury seeds were exposed to the laser for 15 s, a relatively narrow interval was obtained in comparison with the other cases, indicating that the mean has been rather precisely estimated. However, thermal images obtained for floury seeds showed a higher entropy of Shannon, indicating that these images have a greater disorder with respect to thermal images of crystalline seeds, because their molecular structure is less organized. Considering thermal images of the same genotype of corn seeds, it is observed that when the exposure time increases, the average value is reduced, whereas the variance and the entropy of Shannon increase, due to the increase in the temperature of the samples. In the case of the entropy of Tsallis, it is observed that it depends on the value of the entropic index q, which is determined where the maximum redundancy is obtained, since under these conditions the entropy of Tsallis is minimal, indicating a smaller disorder in the images, which in turn facilitates the identification of patterns. In the case of seeds with a floury structure, a higher redundancy value is obtained for a shorter exposure time. Regarding the analysis of thermal images considering the temperature values captured by the thermographic instrumentation, by applying Gaussian kernels, an efficiency of 95.12% can be guaranteed in density estimation.
From this analysis, it was found that increasing the exposure time increases the average value of the temperature, which is higher in the seeds with a crystalline structure due to a higher thermal diffusivity in the pericarp, which allows the temperature in the seeds to increase. Thus, it was possible to show the viability of statistical methods of digital image processing applied to thermal images for the characterization of corn seeds.