Flesh of pumpkin from ecological farming as part of fruit suitable for non-destructive cultivar classification using computer vision

The aim of this study was to build discriminative models for distinguishing the flesh of the pumpkin cultivars ‘Bambino’, ‘Butternut’, ‘Uchiki Kuri’ and ‘Orange’ based on selected textures of the outer surface of images of cubes. The novelty of the research was the use of about 2000 different textures for one image. The highest total accuracy of discrimination of pumpkin ‘Bambino’, ‘Butternut’, ‘Uchiki Kuri’ and ‘Orange’ (98%) was obtained for models built on textures selected from the color space Lab with the IBk classifier, and some individual cultivars were classified with a correctness of 100%. The total accuracy reached 96% for color space RGB and 97.5% for color space XYZ. For individual color channels, the total accuracies reached 91% for channel b, 89.5% for channel X and 89% for channel Z.


Introduction
Pumpkin (squash) belongs to the genus Cucurbita and the family Cucurbitaceae. Cucurbita pepo L., Cucurbita maxima Duchesne and Cucurbita moschata Duchesne ex Poiret are especially important economically; they have high production and are cultivated worldwide [1]. America, Mexico and Peru are considered to be the primary center of origin of pumpkin [2]. Pumpkin is harvested selectively: the fruit is recognized first, and then the stem is cut and the fruit is collected [3]. Pumpkin is characterized by high yields, high nutritive value and long storage life. It may be grown under a wide range of agro-climatic conditions. The ripe fruits have sweet yellow or orange flesh [4]. The fruits may differ in color, size, weight and shape. They consist of edible flesh, skin and a seed cavity. Pumpkin crops are seasonal, and fresh fruits, which are sensitive to microbial spoilage, should be dried or frozen. The physicochemical characteristics of pumpkin and its products may differ depending on, among other factors, the variety and the conditions in the region of cultivation [5]. Additionally, the physicochemical properties may change during the growth and ripening of pumpkin [6]. Pumpkin may be considered a functional food [7]. Owing to the presence of phytonutrients and important functional components, e.g., dietary fibers, carotenoids, minerals, zeaxanthin, vitamins, ascorbic acid and linoleic acid, consuming pumpkin is beneficial to human health. Pumpkin can be consumed in various forms, including processed products, e.g., jelly, jam, marmalade, sauce, puree, candy, chutney, cookies, pickles, pies, halwa, weaning mix and beverages, and pumpkin flour may be added to instant noodles, bakery products, pasta and soups [1,4]. Pumpkin seeds also have high nutritional value and may be used, for example, to produce cold-pressed oils [7,8].
Pumpkin flesh (pulp) contains phytochemicals with health-promoting action. The content of many compounds depends on the cultivar and species of pumpkin. Significant differences can be found in the content of carotenoids, vitamins C and B1, tocopherols, folates, flavonols and phenolic acids [9]. Therefore, the selection of cultivars with the desired properties for consumption and processing can be important. Determination of the chemical compounds of pumpkin can be expensive and time-consuming. According to Sim et al. [10], testing the cultivars based on phenotypes may also be time-consuming and laborious. The authors [10] reported that the application of molecular markers for pumpkin cultivar identification may be fast and useful.
In this study, the authors assumed that it is possible to distinguish the flesh of different pumpkin cultivars using an image analysis technique. It would provide an objective way to avoid mixing different cultivars of pumpkin indistinguishable by the naked eye. Additionally, the application of image processing may partially replace more expensive or more laborious methods of pumpkin cultivar identification, e.g., phenotypic or molecular. The aim of this study was to build the classification models based on texture parameters computed from digital images of the pumpkin flesh.

Materials
The research material comprised pumpkins of four cultivars: 'Bambino', 'Butternut', 'Uchiki Kuri' and 'Orange'. The cultivars 'Bambino', 'Uchiki Kuri' and 'Orange' belonged to the species Cucurbita maxima Duchesne, whereas the cultivar 'Butternut' belonged to the species Cucurbita moschata Duchesne ex Poiret. The samples were collected from ecological farming in Świętokrzyskie Voivodeship (Holy Cross Province), Poland. The pumpkins were grown on soil of classes II and III, and potassium sulfate and Bioilsa fertilizers were used. In the case of pumpkin 'Bambino', triticale was the forecrop, the straw was left and mulching was used. For 'Butternut', 'Uchiki Kuri' and 'Orange', cabbage was the forecrop and cattle manure was used. The pumpkins were harvested after reaching consumer maturity. After harvesting, the fruits were stored under room conditions (temperature about 18-20 °C) for two weeks. Then, the pumpkin flesh was cut into cubes (1 cm × 1 cm) using a knife. The pumpkin cubes were put into plastic boxes, frozen at a temperature of −29 °C and subjected to frozen storage. Afterwards, the pumpkin cubes were thawed under room conditions. Excess water was removed from the cubes using a paper towel, and the cubes were subjected to experiments using image analysis.

Image analysis
The flesh of pumpkin fruit was subjected to image processing. Images of the flesh cubes placed on a black background were acquired using a flatbed scanner. One image included ten cubes of pumpkin belonging to one cultivar. For each cultivar, five images were obtained, which gave a total of 50 cube images per cultivar. Therefore, the image analysis was performed for two hundred cube images (50 cube images for each of the four cultivars). The images of pumpkin flesh were processed using the Mazda software (Łódź University of Technology, Institute of Electronics, Poland) [11]. The obtained images were converted to the individual color channels L, a, b, R, G, B, X, Y, Z. For each cube image in each color channel, the region of interest (ROI) was defined as the whole scanned surface of the cube. For each ROI, about 200 textures of the outer surface of the image were calculated based on the histogram, run-length matrix, co-occurrence matrix, gradient map and autoregressive model [11].
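The channel conversion and the simplest (histogram-based) textures described above can be illustrated with a short Python sketch. It is not the implementation used by the texture software: the sRGB/D65 conversion constants, the function names split_channels and histogram_textures, and the random stand-in image are assumptions made for illustration only, and just four first-order features per channel are computed instead of the roughly 200 textures per ROI used in the study.

```python
import numpy as np

# Assumption: the scanner delivers 8-bit sRGB with a D65 white point.
# The text does not state which conversion the software applies.
SRGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                        [0.2126729, 0.7151522, 0.0721750],
                        [0.0193339, 0.1191920, 0.9503041]])
D65_WHITE = np.array([0.95047, 1.0, 1.08883])


def split_channels(rgb8):
    """Return the channels R, G, B, X, Y, Z, L, a, b of an H x W x 3 uint8 image."""
    rgb = rgb8.astype(np.float64) / 255.0  # R, G, B rescaled to 0..1
    # Undo the sRGB gamma curve to obtain linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = linear @ SRGB_TO_XYZ.T
    # CIE Lab computed from XYZ relative to the white point.
    t = xyz / D65_WHITE
    delta = 6.0 / 29.0
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3.0 * delta ** 2) + 4.0 / 29.0)
    lab_l = 116.0 * f[..., 1] - 16.0
    lab_a = 500.0 * (f[..., 0] - f[..., 1])
    lab_b = 200.0 * (f[..., 1] - f[..., 2])
    return {"R": rgb[..., 0], "G": rgb[..., 1], "B": rgb[..., 2],
            "X": xyz[..., 0], "Y": xyz[..., 1], "Z": xyz[..., 2],
            "L": lab_l, "a": lab_a, "b": lab_b}


def histogram_textures(channel):
    """First-order (histogram) features of one region of interest."""
    x = np.asarray(channel, dtype=np.float64).ravel()
    mean = x.mean()
    var = x.var()
    if var == 0.0:
        skew, kurt = 0.0, 0.0  # a flat region has no shape statistics
    else:
        z = (x - mean) / np.sqrt(var)
        skew = (z ** 3).mean()
        kurt = (z ** 4).mean() - 3.0
    return {"mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt}


# Hypothetical stand-in for one scanned cube; real pixels would come from the scanner.
rng = np.random.default_rng(0)
cube = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
features = {f"{name}_{key}": value
            for name, chan in split_channels(cube).items()
            for key, value in histogram_textures(chan).items()}
print(len(features), "features computed")
```

Prefixing each feature name with its channel keeps the features of one cube in a single flat record, so that sets for a whole color space (e.g., L, a and b together) or for a single channel can be assembled by filtering on the prefix.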

Statistical analysis
The discriminant analysis was performed using the WEKA 3.9 application (Machine Learning Group, University of Waikato) [12] to develop models for distinguishing the cultivar of pumpkin based on selected textures of the outer surface of images of flesh cubes. The Best First search with the CFS (Correlation-based Feature Selection) subset evaluator was used to select the textures with the highest discriminative power. The texture selection was performed separately for the sets of textures of images of pumpkin flesh of the four cultivars 'Bambino', 'Butternut', 'Uchiki Kuri' and 'Orange' from color space Lab and color channels L, a, b, color space RGB and color channels R, G, B, and color space XYZ and color channels X, Y, Z. The discriminative models were built separately based on the sets of selected textures from color space Lab and channels L, a, b, color space RGB and channels R, G, B, and color space XYZ and channels X, Y, Z. The analyses were carried out using the Multilayer Perceptron (group of Functions), IBk (Lazy), PART (Rules) and Random Forest (Decision Trees) classifiers in ten-fold cross-validation mode [13]. These procedures ensured the highest correctness of discrimination.
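The evaluation scheme described above (a nearest-neighbour classifier assessed by ten-fold cross-validation) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the WEKA procedure itself: the feature matrix is synthetic, the classifier is a plain 1-nearest-neighbour rule standing in for IBk (the number of neighbours is not given in the text), the folds are stratified by class, the features are standardized with training-fold statistics, and all function names are hypothetical.

```python
import numpy as np


def stratified_folds(y, n_folds, rng):
    """Give every sample a fold index so each class is spread evenly over the folds."""
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds


def nearest_neighbour_predict(x_train, y_train, x_test):
    """Label each test sample with the class of its closest training sample."""
    d2 = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]


def cross_validated_accuracy(x, y, n_folds=10, seed=0):
    """Return total accuracy and per-class accuracy from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = stratified_folds(y, n_folds, rng)
    predicted = np.empty_like(y)
    for k in range(n_folds):
        test = folds == k
        # Scale with statistics of the training part only, to avoid leakage.
        mu = x[~test].mean(axis=0)
        sd = x[~test].std(axis=0)
        sd[sd == 0] = 1.0
        predicted[test] = nearest_neighbour_predict(
            (x[~test] - mu) / sd, y[~test], (x[test] - mu) / sd)
    total = float((predicted == y).mean())
    per_class = {int(c): float((predicted[y == c] == c).mean()) for c in np.unique(y)}
    return total, per_class


# Synthetic stand-in for the selected textures: 4 cultivars x 50 cubes x 5 features.
rng = np.random.default_rng(42)
labels = np.repeat(np.arange(4), 50)
textures = rng.normal(size=(200, 5)) + 3.0 * labels[:, None]

total_accuracy, class_accuracy = cross_validated_accuracy(textures, labels)
print(f"total accuracy: {total_accuracy:.3f}")
print("per-class accuracy:", class_accuracy)
```

Because every sample lands in exactly one test fold, each of the 200 cube images is predicted once by a model that never saw it, and the total and per-cultivar accuracies are computed from those held-out predictions.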

Results and discussion
Differences in the texture parameters of the flesh of different pumpkin cultivars were observed. The mean values of selected textures from different color channels are presented in Fig. 1. For some textures, the complete differentiation of the pumpkin cultivars is noticeable, which resulted in high discrimination accuracies. The accuracies (%) of discrimination of the four pumpkin cultivars 'Bambino', 'Butternut', 'Uchiki Kuri' and 'Orange' based on textures selected from color space Lab and channels L, a, b are presented in Table 1. The accuracies were very high. The discriminative models built on textures selected from the color space Lab provided a total accuracy of up to 98% for the IBk classifier. For the individual cultivars, the accuracies reached 100% for 'Orange', 98% for 'Butternut' and 'Uchiki Kuri', and 96% for 'Bambino'. The correctness of 100% for pumpkin 'Orange' was also observed for the Multilayer Perceptron (total accuracy of 97%) and Random Forest (total accuracy of 97.5%) classifiers. This indicated that 'Orange' was the cultivar most different from the others. Among the models built on textures selected from images in individual color channels, the highest total accuracy was obtained for color channel b and was equal to 91% for the Random Forest classifier. In this model, pumpkin 'Orange' was classified with 100% accuracy, which means that all images of the flesh of pumpkin 'Orange' were correctly classified as 'Orange'. The same result (100% accuracy for 'Orange') was also found for the Multilayer Perceptron (total accuracy of 90.5%) and IBk (total accuracy of 89.5%) classifiers. The flesh of pumpkin 'Orange' differed most from the others in terms of selected textures from the color channel b, which was reflected in the high correctness of discrimination of this cultivar for the models built for color space Lab.
The total accuracies of discrimination of pumpkin 'Bambino', 'Butternut', 'Uchiki Kuri' and 'Orange' for color channel L ranged from 84.5% for the IBk classifier to 88% for the Random Forest classifier, and for color channel a were in the range of 79% for the PART classifier to 85% for the Random Forest. None of the cultivars obtained 100% correctness of discrimination. However, the highest accuracies of up to 98% (PART and Random Forest for color channel L and IBk for color channel a) were determined for 'Butternut'.
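The relation between the per-cultivar accuracies and the total accuracy reported for the color space Lab and the IBk classifier can be checked with a few lines of Python. The sketch uses only figures stated in the text (50 cube images per cultivar and accuracies of 96%, 98%, 98% and 100%) and assumes that every image was classified exactly once; the variable names are illustrative.

```python
# Per-cultivar accuracies reported for color space Lab with the IBk classifier.
images_per_cultivar = 50
accuracy = {"Bambino": 0.96, "Butternut": 0.98, "Uchiki Kuri": 0.98, "Orange": 1.00}

# Number of correctly classified cube images for each cultivar.
correct = {name: round(acc * images_per_cultivar) for name, acc in accuracy.items()}

# Total accuracy is the share of all 200 images that were classified correctly.
total_images = images_per_cultivar * len(accuracy)
total_accuracy = sum(correct.values()) / total_images

print(correct)
print(f"total accuracy: {total_accuracy:.1%}")
```

With equal numbers of images per cultivar, the total accuracy is simply the mean of the per-cultivar accuracies: 196 of the 200 cube images are classified correctly, which matches the reported 98%.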
The results of the discriminative models built on textures selected from color space RGB and color channels R, G, B are shown in Table 2. The discrimination accuracies for color space RGB were very high. The flesh of pumpkin 'Bambino', 'Butternut', 'Uchiki Kuri' and 'Orange' was distinguished with a total accuracy reaching 96% for the IBk and Random Forest classifiers. Among the individual cultivars, a correctness of 100% was obtained for 'Orange' with the Multilayer Perceptron, IBk and Random Forest classifiers, as well as for 'Uchiki Kuri' with the Random Forest. The accuracies for models built on textures selected from the individual color channels R, G, B were lower than for color space RGB. The lowest results, ranging from 63.5% (PART) to 74.5% (Random Forest), were obtained for color channel R, which turned out to be unsuitable for cultivar discrimination of pumpkin flesh. For the other color channels, the accuracies were higher and ranged from 80% (IBk) to 87.5% (Random Forest) for color channel G and from 80.5% (PART) to 85.5% (Random Forest) for color channel B.
The accuracies of discrimination of pumpkin flesh based on textures selected from color space XYZ and channels X, Y, Z are presented in Table 3. The discriminative models built for textures selected from color space XYZ produced the highest total accuracy of 97.5% for the Multilayer Perceptron classifier. Among the individual cultivars, a result of 100% was observed only for pumpkin 'Uchiki Kuri' with the Multilayer Perceptron. The discrimination accuracies for color channel X ranged from 85% (IBk) to 89.5% (Random Forest). In the case of color channel Y, the correctness ranged from 80.5% (PART) to 86% (Random Forest). For color channel Z, the accuracies ranged from 80.5% (PART) to 89% (Random Forest).
The results revealed the usefulness of digital image analysis based on texture parameters for the cultivar classification of pumpkin flesh. The available literature includes data on the application of image processing in various studies of pumpkin. For example, Wittstruck et al. [14] used UAV (unmanned aerial vehicle)-based RGB imagery for the detection of single pumpkin fruits in the field, fruit counting and the prediction of fruit size and weight. Oblitas-Cruz et al. [15] and Oblitas et al. [16] applied image processing combined with neural networks for the discrimination of microstructural elements of the pumpkin tissue. Image analysis was also used in research related to the drying of pumpkin. Zenoozian et al. [17,18] applied a computer vision system for the prediction of the surface color, percentage of shrinkage and Heywood shape factor of pumpkin cubes subjected to osmotic dehydration and hot-air drying. The analysis of X-ray images of pumpkin seeds was used for the evaluation of the internal morphology of seeds [19]. These examples indicate several different applications of image processing to pumpkin. Our research confirmed the usefulness of imaging techniques in a new direction, namely pumpkin cultivar discrimination.

Conclusion
The applied digital image processing proved to be effective for the classification of the pumpkin flesh with high probability. The cultivar discriminative models built on texture parameters of pumpkin flesh images provided very satisfactory accuracy. The pumpkin cultivars were distinguished most correctly by the model built for the textures selected from the color space Lab, for which the total accuracy reached 98% and the accuracy for individual cultivars reached 100%. Among the models built for individual color channels, the results were the highest for color channel b and reached 91% for the discrimination of the flesh of all four cultivars. The developed procedures may be used in the