Introduction

Over the last few decades, agriculture has progressively evolved from an experience-based and subjective process, to the present state, which is characterised by an increasing use of sensors and data-driven decision-making. According to the analysis performed by the Food and Agricultural Organization (FAO) in 2017, the world population will reach 9.6 billion by 2050 (Trilles et al., 2020). To feed this growing population, the global food production must be increased by 50%. On the other hand, the practice of agricultural intensification can have a substantial impact on the environment, resulting in the deterioration of soil quality caused by erosion from wind and water, as well as the pollution of air and water owing to excessive use of nutrients and agrochemicals. It may also result in a loss of biological and ecological diversity. To address the negative impacts of highly productive but intensive farming, it is crucial to transition towards more environmentally friendly and sustainable agricultural practices (Issad et al., 2019). Precision Agriculture (PA) has emerged as one of the most promising solutions, since it predicates the use of cutting-edge technologies such as proximal and remote sensing, automation and control systems, power and data management. This domain has extended its reach to encompass a broad spectrum of applications, including but not limited to crop disease monitoring, pesticide control, irrigation and water management, storage management, and weed and soil management (Jha et al., 2019). The main goal of PA is to optimize the available resources in order to achieve sustainable production with the lowest cost (Alreshidi, 2019; Issad et al., 2019). The International Society for Precision Agriculture (ISPA) initiated a year-long endeavor that involved the contributions of 46 experts in the field of precision agriculture, culminating in the formulation of the subsequent definition:

Precision Agriculture is a management strategy that gathers, processes and analyzes temporal, spatial and individual data and combines it with other information to support management decisions according to estimated variability for improved resource use efficiency, productivity, quality, profitability and sustainability of agricultural production. (Andujar, 2023).

In short, PA is, by its definition, data rich. Yet, much of practical PA today still relies upon human decision making. The role and influence of increasing amounts of available data in that decision making can be greatly enhanced by the implementation of Artificial Intelligence (AI) into the decision process (Alreshidi, 2019). AI algorithms allow a machine to emulate humans in solving complex problems for which analytical solutions are difficult or impossible to derive (Pathan et al., 2020). For instance, in agriculture, AI can be trained utilizing sensor data to estimate and interpret physiological or organoleptic changes in crops/fruits and at the same time, classify them according to their physiological properties. Hence, its predictive capabilities enable the prediction of crop yields, disease outbreaks, and optimal planting times, resulting in more informed decision-making. AI has nowadays achieved a large success in different applications (e.g. healthcare (Yu et al., 2018), business (Akerkar, 2019), transportation (Machin et al., 2018), education (Roll & Wylie, 2016), robotics and automation (Perez et al., 2018; Wisskirchen et al., 2017)) and has been demonstrated in agricultural robotics, optimization management, automation, knowledge-based systems, expert systems and decision support systems (Alreshidi, 2019).

Overview of fruit maturity estimation

The estimation of fruit quality and maturity, and the subsequent fruit classification steps, are considered essential parts of the harvest process. Accurate maturity estimation and classification methodologies mitigate the risk of economic loss due to untimely harvesting. Traditionally, destructive approaches have been used to identify the maturity of the fruit, but this is inefficient since the fruit is destroyed in the process. For example, in Iran, many fruits, such as hawthorn and mulberry, are harvested either manually by shaking the tree branches or mechanically using a shaker. However, this harvesting process can lead to variations in the ripeness levels of the fruits, as some hawthorn fruits may be unripe, some ripe, and some overripe. One possible reason for this variation is the irregular flowering time of hawthorn trees, which causes the fruits to ripen at different times. The uneven ripeness levels can negatively impact the marketability of the harvested hawthorns, leading to a decline in economic value and an increase in product waste. In Iran, around 20 million tons of these agricultural products go to waste annually during the post-harvest stage. This amount of waste is equivalent to the food that 20 million people could consume in a year (Azadnia et al., 2023).

Moreover, the traditional process must be frequently repeated with many samples to get representative values for the entire orchard. In effect, destructive testing provides a very selective view of the broader field as there is high variability in fruit maturity from cultivar to cultivar and from plant to plant. One alternative is organoleptic fruit maturity estimation by an experienced grower with years of accumulated expert knowledge. However, expert knowledge is difficult to acquire and difficult to communicate (or translate) to others; it therefore does not scale and cannot be widely applied. Active and passive remote sensing technologies have advanced considerably to the point where reasonably inexpensive sensors can be easily deployed in fields. There is now growing use of AI to process remote sensing measurements and infer maturity levels of fruit to save time, reduce cost and avoid laborious inspection processes. As a result, non-destructive approaches are emerging as a more promising PA technique in which sensor data and AI are combined. These sensors measure the fruit maturity indices externally and provide this information to AI models, which estimate the fruit maturity and quality and classify the fruit accordingly. The bulk of AI algorithms used for fruit quality and maturity estimation belong to the Machine Learning (ML) category. ML methods are designed to ‘learn’ non-linear mappings between input measurements and output inferences that are not observable directly. Commonly considered ML approaches for estimation include Multiple Linear Regression (MLR), Partial Least Squares (PLS) and Principal Component Regression (PCR), while Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Random Forests, K-nearest neighbours (KNN), K-means clustering and Neural Networks (NN) are widely adopted for classifying fruits. The “Sensor fusion and artificial intelligence for sensor data processing” section extensively describes the ML approaches that have been adopted for both the classification and estimation of fruit maturity.

Overview of machine learning applications in smart agriculture

ML is popular especially because it does not require extensive analytical modelling (Liakos et al., 2018) and it is rather simple to set up a computer to learn from structured data. In agriculture, the data can be supplied from local sensor data storage, from cloud-based data sources that accumulate soil properties, temperature or humidity from multiple sensors independently, or from an agricultural company or organization such as the Food and Agriculture Organization (FAO). To infer fruit quality and maturity, a computer must receive a large volume of data from multiple sources. ML algorithms extract patterns from data based on user-defined labels during the training process. User-defined labels are descriptions that the user assigns to the categories of data in a dataset. For example, while developing an ML model for categorizing maturity, a user would supply a dataset of numeric values of multiple maturity indexes. Each sample would be accompanied by a label indicating its maturity level, such as ‘immature’, ‘mature’, or ‘ripe’. Therefore, understanding the features and metrics that can be extracted from data is crucial since this step is the foundation for the estimation of fruit quality and maturity. In the case of fruit maturity level identification, ML algorithms are expected to identify features from the dataset, such as maturity indexes, and determine the ranges of the maturity indexes according to the user-defined labels, whereas metrics quantify the accuracy with which those ranges are determined. ML approaches in agriculture have been reviewed by several researchers. For instance, Liakos et al. (2018) classify and briefly discuss the ML approaches and show their applications to four major aspects of agriculture, namely livestock management, crop management, soil management and water management. Under crop management, they briefly discuss the application of ML in yield prediction, weed detection, disease detection, and crop quality and species detection. Ip et al. (2018) highlight the applications of ML in the area of crop protection, which covers plant disease, weed and pest management.
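To make the idea of learning from labelled maturity indexes concrete, the following is a minimal sketch of a K-nearest neighbours classifier written with the Python standard library only. The two features (SSC in °Brix and firmness in newtons), the sample values, the label names and the function names are illustrative assumptions and are not taken from any of the cited studies; the accuracy function stands in for the kind of metric mentioned above.

```python
from collections import Counter
from math import dist

# Hypothetical labelled dataset: (SSC in °Brix, firmness in N) -> user-defined label.
# The values are invented for illustration, not measurements from a cited study.
TRAINING_SET = [
    ((8.0, 70.0), "immature"),
    ((8.5, 65.0), "immature"),
    ((11.0, 50.0), "mature"),
    ((11.5, 48.0), "mature"),
    ((14.0, 30.0), "ripe"),
    ((14.5, 28.0), "ripe"),
]


def scale(features, minima, maxima):
    """Min-max scale each index to [0, 1] so that no index dominates the distance.

    Assumes every index takes at least two distinct values in the training set.
    """
    return tuple((x - lo) / (hi - lo) for x, lo, hi in zip(features, minima, maxima))


def knn_predict(sample, training_set, k=3):
    """Return the majority label among the k nearest labelled samples."""
    columns = list(zip(*(features for features, _ in training_set)))
    minima = [min(column) for column in columns]
    maxima = [max(column) for column in columns]
    scaled_sample = scale(sample, minima, maxima)
    neighbours = sorted(
        training_set,
        key=lambda item: dist(scale(item[0], minima, maxima), scaled_sample),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(test_set, training_set, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(
        knn_predict(features, training_set, k) == label for features, label in test_set
    )
    return correct / len(test_set)
```

In practice the labelled samples would come from the destructive or non-destructive measurements discussed in this review, and the accuracy would be evaluated on samples held out from training.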

Aim and structure of the article

In recent years, there has been considerable work on the use of ML for the detection of fruit maturity and quality, both in-situ and in sorting and packing sheds. In order to use ML approaches to estimate fruit maturity, suitable datasets are required that come from destructive and non-destructive sensors. These sensors primarily measure some maturity indexes of the fruits. However, finding suitable maturity indexes for a particular type of fruit is challenging since each type of fruit exhibits its physical and organoleptic changes in its own way. Apart from that, some of these maturity indexes can only be measured by either destructive or non-destructive approaches. Hence, an extensive study is required that outlines the important maturity indexes and their measuring approaches so that researchers in agriculture can make informed decisions in their future work. Another challenge arises in selecting suitable ML approaches to interpret different types of datasets, since some ML approaches require features to be supplied while others do not. It is worth mentioning that the sensor data type varies from sensor to sensor. To address this challenge, the strengths and limitations of these ML approaches should be discussed and analyzed, particularly for the application of fruit maturity estimation. Additionally, the interpretability and data sensitivity of the ML algorithms are addressed, since they play significant roles in understanding how the maturity indexes are interpreted by the algorithms in this application. Thus, ML users can consider the features during model development and utilize the algorithms correctly.

Considering the large number of fruit maturity indexes discussed in the literature, this study outlines the common and effective maturity indexes to offer readers a clear understanding of them. In addition, this study contributes a framework for the applicability of different destructive and non-destructive methods in measuring fruit maturity. To this end, the destructive and non-destructive approaches are classified and comprehensively described to help readers select suitable fruit maturity testing methods. This study also reviews the work on fruit maturity estimation using ML approaches, including a comprehensive critical review of maturity indexes to inform the potential future design of novel ML models. Moreover, this review analyses commonly used ML approaches, particularly in fruit maturity level estimation, and their corresponding strengths and limitations. Finally, the article shows the trend of currently considered non-destructive approaches (i.e., spectral imagery and spectroscopy) in estimating fruit maturity and the ML approaches that can suitably interpret the sensor data.

This article is organized as follows. The “Fruit maturity indexes and testing methods” section reviews commonly considered maturity indexes and their derivation from in-field measurements. It also gives an overview of widely applied destructive and non-destructive approaches to measuring the maturity indexes. The “Sensor fusion and artificial intelligence for sensor data processing” section presents a comprehensive review of AI approaches with sensor fusion in fruit maturity, beginning with AI fundamentals, followed by a comparative analysis of their advantages and limitations in the context of estimating and/or predicting fruit maturity. The “Challenges and future research and development trends” section identifies several challenges pertaining to ML algorithms, maturity indexes and sensors, together with several promising areas of future work. This section also demonstrates the trends in remote sensors and ML algorithms to give a conclusive idea about their directions and progress.

Fruit maturity indexes and testing methods

Maturation is a stage of fruit development that refers to the achievement of either physiological or horticultural maturity. Physiological maturity defines the absolute biological stage of fruit development whereby fruit can carry on the process of ontogeny (i.e. further development) once detached from the host plant (Kader, 1997). Note that non-climacteric fruits (e.g. grapes, cherries, citrus) do not continue to ripen after picking whereas climacteric fruits (e.g. apples, pears, peaches) continue to ripen. Horticultural maturity, on the other hand, is the stage of fruit physiological development associated with meeting specifications as defined by processors or consumers (Brovelli & Cisneros-Zevallos, 2007). This section addresses commonly considered maturity indexes and their measuring approaches, including their classifications.

Fruit maturity indexes

In this review, maturity indexes have been classified into two categories based upon observability; namely internal and external observables. Figure 1 illustrates the classification of fruit maturity indexes, described in detail below and in the subsequent sections of the article. Internal observables include the fruit texture and composition metrics such as moisture, dryness, firmness, sugar, acidity, starch, flesh colour and oil content. Some other internal observables such as specific gravity and heat capacity are also considered in a few works to define the fruit maturity but they have been omitted from this review since they are not broadly considered for a number of fruits.

Fig. 1 Maturity indexes classification based on the observables

On the other hand, external observables such as fruit size, shape, weight, skin colour, chlorophyll content and number of days after full bloom have been discussed extensively since they are commonly considered in horticulture, particularly for horticultural maturity. Conversely, leaf changes, aroma and other colour pigments (e.g., chlorophyll content, anthocyanin and carotenoid) have not been explored in full detail since their application is limited to certain fruits. The maturity indexes can be measured by destructive methods, non-destructive methods or both. Indeed, most of them can be measured by both methods, i.e. ‘double methods’, while some can only be measured by a single method. Table 1 summarises the main observables, maturity indexes and their measuring approaches.

Table 1 Maturity indexes and associated measuring approaches

Sugar concentration

In general, sugar concentration is considered a key indicator of fruit maturity, both in climacteric fruits (e.g., apples, pears, plums, avocados etc.), where starch starts to break down to sugars when the fruit starts to ripen, and in non-climacteric fruits (e.g., strawberries, grapes, cherries, citrus fruits etc.), where sugars are accumulated during the process of maturation (Prasad et al., 2018). However, climacteric summer fruits such as peach, nectarine, plum and apricot continue to ripen after harvest. Hence, on its own, sugar concentration cannot be used to estimate ‘maturity’ in an absolute sense. Instead, other maturity indicators such as firmness, size, colour, aroma etc. must be taken into consideration at the same time in order to confirm the maturity level of these fruits. Hence, data from multiple sensors can be fused using ML approaches. Note that Brix is the technical term used for both total soluble solids (TSS) and soluble solids content (SSC), although strictly Brix refers only to the sugar content of fruits (Magwaza & Opara, 2015). Degree Brix (°Brix) is the unit of measure that defines the grams of sucrose equivalents per 100 g of solution, i.e. juice. The soluble sugars include sucrose, glucose and fructose and are, in general, measured by destructive methods such as gravimetric methods (using a hydrometer) (Nor et al., 2014), refractometry (Harrill, 1998), High-Performance Liquid Chromatography (HPLC) (Richmond et al., 1981) and potentiometric chemical sensors (also known as electronic tongues) (Beullens et al., 2008). Since these methods are time-consuming, manual, laborious and destructive, a few non-destructive methods have emerged as alternatives (Jie et al., 2014). Several spectroscopic methods such as hyperspectral imaging (Sugiyama & Tsuta, 2010), visible to near-infrared (vis/NIR) spectroscopy (Jie et al., 2014), nuclear magnetic resonance (NMR) (Zhang & McCarthy, 2013; Zion et al., 1995), and Fourier transform NIR spectroscopy with attenuated total reflection (ATR-FTIR) (Bureau et al., 2009) are frequently used non-destructive methods to determine the sugar content of fresh produce. Dielectric spectroscopy is another non-destructive method that is also found in some literature (Castro-Giráldez et al., 2010).

Acid contents

Changes in acidity are significant for several fruits during their maturation and ripening. In both climacteric (Pech et al., 2008) and most non-climacteric fruits (Batista-Silva et al., 2018), acidity is an observable indicator of maturation. Unlike sweetness, acidity declines gradually as the fruit reaches maturity, although the rate of change is affected by cultivar and season. The acidity of fruits is primarily measured in terms of titratable acidity (TA) and pH, each of which provides a distinct insight into fruit quality (Tyl & Sadler, 2017). TA can be calculated using either Eq. (1) or Eq. (2) below. Equation (1) calculates the percentage of an acid with respect to weight, and Eq. (2) calculates it with respect to volume:

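$$TA_{wt/wt} \left( \% \right) = \frac{N \cdot V \cdot Eq_{wt}}{W \cdot 1000} \cdot 100$$
(1)

where \(N\), \(V\), \(Eq_{wt}\) and \(W\) are, respectively, the normality of the titrant (usually NaOH, in mEq/mL), the volume of titrant (mL), the equivalent weight of the acid (mg/mEq) and the mass of the fruit sample (g). The factor of 1000 converts milligrams of acid to grams.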

$$TA_{wt/vol} \left( \% \right) = \frac{N \cdot V_{1} \cdot Eq_{wt}}{V_{2} \cdot 1000} \cdot 100$$
(2)

where \(V_{1}\) and \(V_{2}\) are the titrant volume (mL) and the sample volume (mL), respectively, and the factor of 1000 converts milligrams of acid to grams.

Apart from TA, the Brix-Acid ratio is a desired metric, often preferred over Brix or acid alone, and in some instances it is considered an objective measurement that reflects consumer acceptability (Tyl & Sadler, 2017). However, the measurement of Brix includes sugars, acids and other dissolved components, and hence the acid components are also counted in the numerator of the ratio. As a result, the Brix-Acid ratio may be less reliable for defining fruit maturity (Magwaza & Opara, 2015). For instance, Zhang et al. (2020) consider six (6) different peach cultivars, with fifty (50) samples, to predict their maturity from multiple maturity indexes such as IAD, skin colour, firmness, extractable juice, TA, SSC, and Brix-Acid ratio. Among them, IAD and firmness prove reliable in interpreting maturity, while SSC, TA and Brix-Acid ratio are unable to show considerable reliability in estimating peach maturity.

In order to measure the pH of fruits, a pH meter or pH instrument (Chassagne-Berces et al., 2010) is used. While TA is measured, in general, using acid–base titration (Kafkas et al., 2007), other instrumental approaches such as HPLC (Fawole & Opara, 2013; Moing et al., 1998), voltammetry (Kotani et al., 2019) and pH meters (Lebrun et al., 2008) also have drawn the attention of the researchers. Near-infrared (NIR) hyperspectral imaging can be considered as a new promising non-destructive technique to measure the pH efficiently (Li et al., 2018a, 2018b).

Firmness

The firmness of fruit flesh decreases with the ripening process, and hence overripe fruits are relatively soft (Jarimopas & Kitthawee, 2007). Therefore, it is considered as one of the important maturity indicators to classify fruit according to their maturity levels (Macrelli et al., 2013). Several types of penetrometers are widely used to measure fruit firmness including the Magness–Taylor tester (Jantra et al., 2018), electronic pressure tester (Lehman-Salada, 1996) and Effegi firmness tester (Harker et al., 1996). Zahed Fathizadeh et al. (2021) classify the non-destructive approaches to measure the apple firmness into three (3) categories based on their working principles and extensively discuss them as follows:

(1) Acoustic, vibration and mechanical response (i.e., mechanical resonance, vibrational excitation, impact force response/velocity, ultrasonic methods and intelligent firmness detector);
(2) Optical methods (i.e., time resolved and NIR spectrometry, chlorophyll fluorescence and laser excitation); and
(3) Electrical resistance and capacity measurements.

Although this particular study focuses on the non-destructive tests of firmness of apples, these methods are applicable to other fruits as well.

Starch contents

During active growth, many fruits accumulate carbohydrates that are stored in starch form utilizing photo-assimilates. As they approach maturity on the trees, starch starts to be hydrolysed (Fuchs et al., 1980) and turns into soluble carbohydrates (Doerflinger et al., 2015b). Once the fruits are detached from the trees, the starch begins to transform into soluble sugars in order to contribute to the metabolic processes of fruit ripening (Doerflinger et al., 2015b). Thus the fruits lose starch gradually with the ripening process (Blankenship et al., 1993; Doerflinger et al., 2015b). The starch releases sugars like glucose, sucrose and fructose that are responsible for respiration and for the enhancement of sweetness of the fruits (Doerflinger et al., 2015b). Therefore, for starchy fruits such as apple, mango, avocado, banana, kiwi etc., starch content is considered an effective maturity index. Starch pattern index (SPI) (Doerflinger et al., 2015a), SSC (Subedi & Walsh, 2011), and dry matter content (DMC) (Palmer et al., 2010) are some maturity indexes that are significantly influenced by the starch content of the fruits. SPI describes the starch degradation in fruits (Brookfield et al., 1997), which is revealed by staining with an iodine potassium-iodide solution. It is a widely applied destructive approach (Blankenship et al., 1993; Brookfield et al., 1997; Zude et al., 2006). Peirs et al. (2003) introduced NIR spectroscopy to determine the starch index in apples and achieved satisfactory results in determining the maturity stage of the fruit, though it is a time-consuming approach.

Dry matter contents

Dry matter is a measure of carbohydrate content and hence it is related to the starch content. It is also significantly related to moisture content. Dry matter increases with the fruit growth and development and declines with fruit maturity. Conversely, moisture content increases when dry matter starts to decline at the very beginning of fruit ripening (Olarewaju et al., 2016). Accordingly, dry matter and moisture content are increasingly considered as two effective maturity indexes to determine the fruit maturity stage that will define the harvest time (Magwaza & Tesfay, 2015; Rajkumar et al., 2012). In general, a sample of the pulp of a starchy fruit is oven dried in order to measure the weight of dry matter or dry matter concentrations (Doerflinger et al., 2015a; Palmer et al., 2010) which is a destructive approach. McGlone et al. (2003) established a mathematical equation for DM with starch content as follows:

$$DM \, = \, SS \, + \, St \, + \, NS$$
(3)

where SS, St and NS are, respectively, the percentages (w/w) of soluble solids, starch and non-starch insoluble solids. In addition, Magwaza and Tesfay (2015) introduced two equations to calculate DM and Moisture Content as follows:

$$\text{DM} \left( \% \right) = \frac{M_{d}}{M_{f}} \cdot 100$$
(4)

$$\text{Moisture Content} \left( \% \right) = \frac{M_{f} - M_{d}}{M_{f}} \cdot 100$$
(5)

where \(M_{d}\) and \(M_{f}\) are the mass of the dry sample and the mass of the fresh sample, respectively.
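As a worked illustration of the oven-drying calculation, the sketch below computes dry matter and moisture content from the fresh and dry masses of a pulp sample. Moisture content is taken here as the mass lost on drying relative to the fresh mass, so that the two percentages sum to 100. The function names and the example masses are assumptions made for illustration.

```python
def dry_matter_percent(dry_mass_g, fresh_mass_g):
    """Dry matter as a percentage of fresh mass: M_d / M_f * 100."""
    if fresh_mass_g <= 0:
        raise ValueError("fresh mass must be positive")
    if not 0 <= dry_mass_g <= fresh_mass_g:
        raise ValueError("dry mass must lie between 0 and the fresh mass")
    return dry_mass_g / fresh_mass_g * 100.0


def moisture_content_percent(dry_mass_g, fresh_mass_g):
    """Water lost on drying as a percentage of fresh mass: (M_f - M_d) / M_f * 100."""
    if fresh_mass_g <= 0:
        raise ValueError("fresh mass must be positive")
    if not 0 <= dry_mass_g <= fresh_mass_g:
        raise ValueError("dry mass must lie between 0 and the fresh mass")
    return (fresh_mass_g - dry_mass_g) / fresh_mass_g * 100.0


# Hypothetical sample: 10.0 g of fresh pulp weighs 2.5 g after oven drying.
dm = dry_matter_percent(2.5, 10.0)
mc = moisture_content_percent(2.5, 10.0)
print(f"DM = {dm:.1f} %, moisture content = {mc:.1f} %")
```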

In general, non-destructive approaches such as NIR (McGlone et al., 2003; Mishra & Woltering, 2023; Travers et al., 2014), SWNIR (Subedi & Walsh, 2011), NMR (Chen et al., 1993), ultrasonic attenuation (Mizrach et al., 1999) for DM, NIR (Blakey & Van Rooyen, 2011) and dielectric method (Gun & Chen, 2010) for Moisture Content can be taken into consideration in pre-harvest to estimate fruit maturity.

Oil content

Oil content is considered one of the crucial factors, along with dry matter and moisture content, for estimating the maturity of fruits like avocado (Magwaza & Tesfay, 2015), palm and olive (Guzmán et al., 2015). In the case of palm, oil content does not increase after harvesting (Saeed et al., 2012), and hence it is crucial to select suitable palm fruit in situ at harvest. In general, oil content increases as water (i.e. moisture content) decreases while the fruits are on the trees. Since the maturity of palm is significantly correlated with its texture and colour, chlorophyll content is considered one of the factors for estimating its maturity, which can in turn lead to an estimate of the oil content in palm fruits (Balasundram et al., 2006). Magwaza and Tesfay (2015) introduced an equation to measure the Oil Content as follows:

$$\text{Oil Content} \left( \% \right) = \text{dry matter} \left( \% \right) \cdot \frac{\text{oil weight}}{\text{dry pulp weight}}$$
(6)
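Equation (6) can be evaluated directly once the dry matter percentage and the extracted oil weight are known. The following is a minimal sketch; the function name, the gram units and the example values are illustrative assumptions rather than data from the cited work.

```python
def oil_content_percent(dry_matter_pct, oil_weight_g, dry_pulp_weight_g):
    """Oil content on a fresh-weight basis, following Eq. (6).

    dry_matter_pct is the dry matter percentage of the fresh pulp, and
    oil_weight_g is the oil extracted from dry_pulp_weight_g of dried pulp.
    """
    if dry_pulp_weight_g <= 0:
        raise ValueError("dry pulp weight must be positive")
    return dry_matter_pct * oil_weight_g / dry_pulp_weight_g


# Hypothetical sample: 25 % dry matter, 0.6 g of oil extracted from 2.0 g of dry pulp.
print(f"Oil content = {oil_content_percent(25.0, 0.6, 2.0):.1f} %")
```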

The Soxhlet extractor (Magwaza & Tesfay, 2015; Stefanoudaki et al., 1999), refractometer (Lavee & Wodner, 2004) and single-outlet piston press (Tantanawat et al., 2020) are examples of destructive methods used to measure oil content; the Soxhlet extraction method being the most popular approach. Alternatively, NMR (Magwaza & Tesfay, 2015), vis-NIRS (Ncama et al., 2018) and laser-light backscattering (Ali et al., 2020) are examples of non-destructive methods.

Chlorophyll content

Chlorophyll degradation in fruit can be considered an efficient maturity indicator (Zhang et al., 2020) and can even be a better indicator than starch, firmness and SSC (Costamagna et al., 2012). The ripening process brings various physiological and biochemical changes in the fruits (Kasampalis et al., 2020), including the degradation of chlorophyll. For instance, chlorophyll shows a strong correlation with firmness in apple (Song et al., 1997) and peach (Zhang et al., 2020) and with skin colour in mango (Jacobi et al., 1998) during ripening. To measure the chlorophyll content, the index of absorbance difference (IAD) has been introduced in the literature; it decreases as the chlorophyll content degrades (DeLong et al., 2014). Fruit absorbance (\(A\)) can be expressed using the Beer-Lambert law as follows (Ziosi et al., 2008):

$$A = \log_{10} I^{ - 1}$$
(7)

where \(I\) is the interactance spectra.

Hence, the index of absorbance difference can be calculated as follows (DeLong et al., 2014; Ziosi et al., 2008):

$$I_{AD} = A_{670} - A_{720}$$
(8)

where \(A_{670}\) and \(A_{720}\) are the \(A\) values at 670 nm and 720 nm wavelengths.
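Equations (7) and (8) can be combined into a short calculation. The sketch below assumes that the interactance at each wavelength is available as a fraction between 0 and 1 (for example from a Vis-NIR reading); the function names and the example values are illustrative assumptions.

```python
from math import log10


def absorbance(interactance):
    """Absorbance from interactance, A = log10(1 / I), as in Eq. (7)."""
    if interactance <= 0:
        raise ValueError("interactance must be positive")
    return log10(1.0 / interactance)


def index_of_absorbance_difference(interactance_670, interactance_720):
    """I_AD = A_670 - A_720, as in Eq. (8)."""
    return absorbance(interactance_670) - absorbance(interactance_720)


# Hypothetical readings: strong chlorophyll absorption at 670 nm, little at 720 nm.
i_ad = index_of_absorbance_difference(0.1, 0.8)
print(f"I_AD = {i_ad:.3f}")
```

A lower interactance at 670 nm relative to 720 nm gives a higher index, so the value falls towards zero as chlorophyll degrades during ripening.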

Chlorophyll content can also be measured using both destructive and non-destructive methods. Chromatography is the only destructive method that has been used to measure the chlorophyll content (Lichtenthaler, 1987). Delta Absorbance (DA) meter (Cocetta et al., 2017; Zhang et al., 2020) and Vis–NIR spectrometer (Infante et al., 2011; Zhang et al., 2017) particularly maintain a similar working principle considering light absorbance and help to find out the chlorophyll content through IAD. Raman spectroscopy (Trebolazabala et al., 2017) and chlorophyll fluorescence (Kolb et al., 2006) are two other non-destructive approaches to measure the chlorophyll content in the fruits though they are not used for measuring IAD.

Size, shape and weight

Reaching a certain size and weight can be considered a possible maturity index, but it certainly cannot be used alone to declare the maturity of a fruit (Crisosto, 1994). In general, the size and weight of a fruit at a given maturity level vary with its cultivar (Cheng & Breen, 1992), cultivation practice, climate conditions, region of cultivation etc. (Crisosto, 1994). In addition, the fruit shoulder is highlighted as an indicator for assessing the maturity of a fruit (Pereira et al., 2009; Sahu & Potdar, 2017). However, these maturity indexes should be coupled with additional indexes, such as skin colour, before they are relied upon (LaRue, 1989; Prabha & Kumar, 2015).

Skin colour

Skin colour is considered one of the indicators of fruit maturity, since fruits like bananas (Prabha & Kumar, 2015), tomatoes (Wan et al., 2018), dates (Lee et al., 2008), plums (Kaur et al., 2018) etc., in general, change colour visibly as they mature. Note that the colour change also varies from cultivar to cultivar: some cultivars signal their maturity strongly through skin colour while others are less responsive (Prasad et al., 2018). Classifying fruits according to their maturity using cameras has received tremendous interest as computer vision systems have developed significantly and become cheaper and more widely available. In this application, computer vision relies primarily on the skin colour of the fruits. Hence, skin colour has recently become a popular maturity index for separating mature fruit from immature fruit, although relying completely on skin colour to identify a mature fruit is not always practical. In some cases, such as nectarines, peaches or apples, skin colour is highly responsive to sunlight, which can lead to a wrong conclusion about the maturity of a fruit (Iglesias et al., 2008; Prasad et al., 2018). Therefore, changes in ground colour (Mitchell et al., 1977), which is not much influenced by sunlight, or in flesh colour (Josan & Chohan, 1982) can be a comparatively better indicator than exposed surface colour (Prasad et al., 2018).

Days after full bloom

Days after Full Bloom (DAFB) is also considered a maturity index for fruit, though it is not as efficient as other indexes since it is highly variable depending on the weather. It can vary from 5 to 20 days in general (Washington State University, 2021). Therefore, DAFB helps to estimate only the general season of fruit maturity for an orchard and cannot provide accurate information about the maturity of an individual fruit.

Fruit maturity testing methods

The maturity indexes can be measured using either destructive or non-destructive methods or a combination of both. As noted above, non-destructive methods are more advantageous than destructive methods since they assist in estimating in situ maturity, measuring multiple maturity indexes simultaneously, and supporting real-time decisions (Li et al., 2018a, 2018b). During the process of ripening, several physical and chemical changes in the fruits can be noticed, such as changes in fruit shape, size, colour, weight, ethylene production, firmness, texture, chlorophyll concentration, soluble solid contents, sugars, oils, acids, and respiration (Perkins-Veazie et al., 2000; Prasad et al., 2018). However, non-destructive methods are not able to measure and assess all of these changes. Therefore, researchers focus on the maturity indices that efficiently describe some of the physical and chemical changes around harvest time, so that non-destructive methods can assist in estimating harvest time and ripeness. Reid (2002) outlines the maturity indices for specific fruits and the commonly practised methods (i.e. destructive and non-destructive) for determining fruit maturity, as delineated in Table 2. Figure 2 outlines the classification of available testing methods that are widely adopted to determine and estimate fruit maturity and ripening based on some previous works.

Table 2 Maturity indices for specific fruits and methods of maturity determination (Reid, 2002; Vanoli & Buccheri, 2012)
Fig. 2 Classification of fruit ripening and quality measurement testing methods

This review examines the widely applied destructive and non-destructive approaches for identifying fruit quality and narrows its focus to the non-destructive approaches that can be adopted in Smart Agriculture. The working principles of destructive and non-destructive approaches can be studied further in the previous literature. For example, Li et al. (2018a, 2018b) elaborate on colour detection, spectroscopy and spectral imaging, while El-Mesery et al. (2019) emphasise spectroscopy, spectral imaging and some dynamic methods such as acoustic and ultrasonic methods. Lakshmi et al. (2017) briefly discuss the commonly utilised non-destructive methods and classify them according to their working principles. Nielsen (2017) addresses both the destructive and non-destructive approaches that are often practised in the food industry.

Destructive methods

Destructive methods are defined as approaches that measure fruit maturity or ripening by destroying the fruit. These approaches include penetration, peeling of the fruit flesh/mesocarp, application of a chemical reaction or exertion of an external force on a fruit. Figure 3 illustrates some of the commonly adopted instruments for destructive testing of fruit maturity. Based on the actual measurement performed, this study classifies destructive methods as either physical or chemical. Physical approaches do not involve chemical reactions during the process and can therefore also be called non-chemical approaches. Physical approaches can be further categorised as follows:

Fig. 3 Destructive SSC measuring instruments: a refractometer (Rawle, 2017); b electronic tongue (Podrażka et al., 2018); c HPLC (Monash University Malaysia, 2021); d hydrometer (Camuffo, 2019)

Dissection approaches These rely on some form of dissection of the fruit to extract a sample, which can then be used to measure a maturity index. For instance, volumetric measurement, water activity measurement, the displacement method, voltammetry, gas chromatography, the Soxhlet extractor test, oven-drying, HPLC and hydrometry primarily require pulp from the particular fruit to measure its maturity indexes.

Non-dissection approaches As opposed to dissection approaches, these do not require any dissection of the fruit. The pressure tester, penetrometer, texturometer and Effegi tester follow non-dissection approaches, since they take the measurement by penetrating or applying pressure at the surface.

Chemical approaches, on the other hand, usually follow a dissection process but focus on the chemical characterisation of the sample, as opposed to the physical quantities discussed above, to measure the maturity indexes of the fruit. For instance, the starch index has traditionally been measured by applying I2-KI solution to the pulp of a fruit, and the TA of a fruit is measured by titration.

Non-destructive methods

Non-destructive methods can be broadly categorized according to the following three working principles: electro-optical, dynamic and electro-magnetic methods. Figures 4, 5 and 6 show some of the commonly adopted non-destructive instruments, whereas Table 3 overviews established non-destructive methods.

Fig. 4 Non-destructive instruments: a Vis–NIR Spectrometer (Trimble, 2019); b NMR Spectrometer (Bruker, 2021)

Fig. 5 a Hyperspectral camera (Middleton Spectral Vision, 2021) and b configuration for measuring fruit maturity (Cheng et al., 2014)

Fig. 6 Firmness tester (Facchini srl, 2021)

Table 3 Non-destructive methods and their measured maturity indexes

Electro-optical methods are defined as methods that primarily use visual sensors together with an electronic signal acquisition and processing system. They therefore include various active and passive systems, such as laser, LIDAR, colour detection (i.e. visible imaging and colorimetry), spectroscopy (i.e. fluorescence, visible, infrared, microwave and thermal imaging) and spectral imaging (i.e. multispectral and hyperspectral imaging). In general, imaging spectroscopies are categorized as passive sensing systems, as they rely on controlled external energy sources for measuring energy absorbance, reflectance, refraction or scattering. In contrast, laser and LIDAR are considered active sensors, since both emit signals towards a fruit and receive the signals returned from it. Thus, they measure the index of a fruit based on the transmitted and received signals.

LIDAR-based systems are active systems in that they measure the difference between the emitted pulse and the returned signal at the receiver (time delay, energy or power differential). As opposed to passive systems (e.g. hyperspectral) that measure the reflected radiation, active systems are less susceptible to atmospheric conditions, light variability, changes in viewing angle and the canopy structure (for airborne measurements). Relating the tree profile and volume measured by LIDAR to fruit yield and maturity has been successfully applied to peaches (Pascual et al., 2009), but the result is heavily related to other measurements such as irrigation frequency. These types of applications underutilize the data gathered by the LIDAR sensor. LIDAR spectroscopy has also been used to measure oil palm fruit maturity, specifically by measuring the relative reflectance of fruit bunches on trees (Zulkifli et al., 2018). LIDAR-based systems in agriculture have largely been confined to 3D scanning and point cloud techniques, despite the inherent versatility of laser systems. Molecular-line absorption is an interesting area of laser application for atmospheric sounding and for determining atmospheric constituents, but it remains relatively obscure in agricultural applications. Previous work (Gardi et al., 2016, 2017; Pham et al., 2019) has identified LIDAR-based systems for volatile organic compounds (VOC) and CO2 estimation. Knowledge of these atmospheric constituents has clear causal implications for the photosynthetic and fruit maturity processes (Fahey et al., 2021). NASA’s Active Sensing of CO2 Emissions over Nights, Days and Seasons (ASCENDS) mission employs monostatic Differential Absorption LIDAR (DIAL) to measure oceanic CO2 flux, taking advantage of the ability to take day and night measurements across different conditions, with a small laser beam that can pass through gaps in clouds (Abshire et al., 2010).
The push towards LIDAR-based or fusion systems in agriculture highlights the practical difficulties of VIS-based systems in in-field applications.
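As a small illustration of the time-delay measurement mentioned above, the following sketch converts a round-trip pulse delay into a sensor-to-target distance. It assumes propagation at the vacuum speed of light and a single clean return; the function name and the 20 ns example delay are hypothetical and are not taken from any of the cited systems.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, vacuum value (assumption)


def range_from_time_delay(delay_s):
    """Distance to the target from the round-trip delay of a laser pulse.

    The pulse travels to the target and back, so the one-way distance
    is half of the total path length.
    """
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    return SPEED_OF_LIGHT * delay_s / 2.0


# Hypothetical example: a return received 20 ns after emission.
distance_m = range_from_time_delay(20e-9)
print(f"{distance_m:.3f} m")
```

Energy or power differentials between the emitted and returned pulse would need a separate radiometric model, which is outside the scope of this sketch.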

Dynamic methods include X-ray computed tomography, ultrasonic attenuation, acoustic impulse response measurement and laser Doppler vibrometry. In these methods, the atoms or molecules of a fruit are stimulated by a detector and, based on the response registered by the detector, the maturity indexes of the fruit can be measured.

Electro-magnetic methods relate a detector to a fruit through the principles of electricity and magnetism in order to determine the maturity status of the fruit. They incorporate several non-destructive approaches, such as the Electronic Nose Test, Electrical Impedance Test and Nuclear Magnetic Resonance.

It is worth mentioning that a single non-destructive sensor is not efficient enough to provide a fruit maturity index with high accuracy. Thus, destructive methods are adopted in order to confirm the maturity indexes of fruits. Table 3 describes the commonly used non-destructive methods applied to different fruits to measure their maturity indexes, supported by some destructive methods.

Sensor fusion and artificial intelligence for sensor data processing

With the advancement of sensor technology, multi-sensor fusion and AI classification/inference methods have become a fast-growing research area, since they can overcome the limitations of individual sensors, reduce the uncertainty and error in data and cooperatively provide information that may not be obtainable from a single sensor (Durrant-Whyte, 1990; Mitchell, 2007). This section overviews the applications of sensor fusion and AI to the determination of fruit maturity.

Sensor fusion in fruit maturity estimation

Dull (1986) highlighted the need to consider multiple information sources to determine the fruit quality and Benady (1994) applied the concept of sensor fusion on an intelligent robot to determine the ripeness of melon through visual inspection and gas sensing. Steinmetz et al. (1999a, 1999b) demonstrated that multi-sensor fusion can significantly reduce the error in classifying fruit on the basis of maturity compared to using just a single sensor. With increasing levels of accuracy realised through sensor fusion, researchers are considering an ever-expanding range of different sensors to assess fruit maturity. For example, Steinmetz et al. (1999a, 1999b) considered vision coupled with spectrophotometry to predict a fruit maturity index, namely, the sugar content of apples. Mendoza et al. (2012) used a combination of an acoustic sensor, a bio-yield (firmness) tester and a visible and shortwave near-infrared (vis-SWNIR) spectrometer to measure apple firmness and SSC respectively. Zakaria et al. (2012) used the so-called ‘electronic nose’ along with an acoustic sensor to discriminate and classify the maturity and ripeness level of mangoes through measuring the aroma and firmness, achieving 84.4% classification accuracy.

Artificial intelligence in fruit maturity estimation

AI, a technology that mimics human intelligence, is progressively being adopted in almost all areas of engineering, particularly those that are data- and logic-oriented, although it originated in computer science. Its rigorous and multi-faceted applicability makes it pervasive in diverse research areas, including problems that would otherwise be computationally infeasible. Its domain of applications is too vast to be conveyed by enumeration alone; hence, based on the characteristics of different applications, this study classifies AI into ML, Genetic Algorithms, Fuzzy Logic and Expert Systems, as shown in Fig. 7. This study mainly explains the applications of ML in classifying fruit according to maturity level, and thus only the branches of AI relevant to fruit maturity have been considered. Since agricultural output must grow in proportion to the growing population, smart agriculture has become a concern to researchers. Moreover, the traditional methods that have been practised are not sufficient to keep pace with increasing demands, which expedites the development of smart agriculture. The need for smart agriculture in turn introduces AI, which contributes to several agricultural areas such as crop diseases, pesticide control, irrigation and water management, storage management and crop, weed and soil management (Jha et al., 2019). As a result, several areas of application of AI have been found in agriculture, such as expert systems (Murase, 2000; Rani et al., 2011), decision support or making (Attonaty et al., 1999; Thomopoulos et al., 2015), prediction (Chlingaryan et al., 2018), optimization (Talaviya et al., 2020), multi-agent systems (Bahri et al., 2020; Skobelev et al., 2018), machine vision (Patrício & Rieder, 2018), natural language processing (Mostaco et al., 2018) and ML (Liakos et al., 2018).
This study particularly focuses on the employment of AI for the in-situ assessment of fruit maturity and ripeness; hence, ML is the area of interest of this study, since it deals with pattern recognition.

Fig. 7 Branches of Artificial Intelligence

Machine learning

Interest in ML is resurging because high-performance computation on large volumes of data is no longer a challenging task, and its progress over the last two decades has been prominent. The purpose of ML is to train a model that can update and improve itself automatically through experience. ML algorithms train models by extracting characteristic patterns from data that are applicable to a given industrial scenario (Diez-Olivan et al., 2019). ML therefore draws on computer science and statistics concurrently, and its core comes from data science and AI (Jordan & Mitchell, 2015). Within AI, ML defines models, based on algorithms, that are able to identify, classify or predict data efficiently once they have been suitably trained.

Definitions

Machine Learning (ML) methods are a data-driven subset of AI wherein typically large datasets are used to train algorithms to ‘learn’, automatically or semi-automatically, highly non-linear functions that are difficult to model analytically (Salcedo-Sanz et al., 2020). Despite the large number of ML algorithms that have been developed, each with numerous variants, the following fundamental definitions are commonly employed in the ML community and apply to most algorithms:

Model The output of an ML training process is a model which represents the relationship or mapping between the user-provided inputs/observables (e.g. fruit colour, pH value, measured volume) and the output of interest (e.g. maturity level, quality index). Under the ML paradigm, the model is not derived analytically from first principles, as a physics-based model would be, but rather is derived from data through an automated training algorithm.

Model parameters This refers to iteratively adjusted parameters by the training algorithm according to a user-defined cost criterion. When the model training is complete, the parameters are ‘frozen’, and the model can be deployed in the field for its chosen application. For example, within the Artificial Neural Network (ANN) class of ML algorithms, the model is parameterized through weights and biases of fundamental computing elements called neurons.

Training The procedure by which model parameters are adjusted according to a cost criterion. As an example, backpropagation is commonly used in ANN to adjust weights and biases according to a cost function formulated from the difference between the model output and the target output. Training time varies widely from hours to several weeks depending on the number of model parameters, the size of the available dataset and the ‘learning rate’ or step-size to adjust the parameters for each iteration.

Dataset This refers to the user-provided input data that is used to adjust and finalize the model parameters prior to deployment in the field. In the ML context, it is essential to make a distinction between the training dataset and the testing dataset. As the name suggests, the training dataset is used to adjust the model parameters during the training phase. Subsequently, an independent dataset is used to test the model to inform further parameter adjustments. The split of datasets into training and testing datasets is an important factor in ensuring that the ML model can generalize to unseen data during its operational deployment. Depending on the type of algorithm, the input training data can be completely ‘raw’, i.e. with little to no pre-processing or preparation. This is the case for Deep Learning algorithms, where features are automatically extracted from the data during the training process. For standard ML algorithms, input datasets are typically analyzed beforehand to inform the development of hand-crafted features, which are then input to the model.
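The separation into training and testing datasets can be sketched as follows. This is a minimal illustration using only the Python standard library; the fruit records, the 80/20 proportion and the function name are hypothetical choices made for the example and are not prescribed by the reviewed studies.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(samples)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (hue angle in degrees, firmness in N, maturity label).
fruit = [
    (110.0, 62.0, "unripe"), (95.0, 55.0, "unripe"), (105.0, 60.0, "unripe"),
    (70.0, 41.0, "moderate"), (64.0, 38.0, "moderate"), (68.0, 40.0, "moderate"),
    (35.0, 22.0, "ripe"), (30.0, 19.0, "ripe"), (33.0, 20.0, "ripe"),
    (28.0, 18.0, "ripe"),
]

train, test = train_test_split(fruit, test_fraction=0.2, seed=42)
print(len(train), "training samples,", len(test), "testing samples")
```

A random shuffle is the simplest option; when classes are imbalanced, a stratified split that preserves the class proportions in both subsets may be preferable.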

Workflow

The objective of ML is to develop a model that can make an inference about the system or process being observed. The model essentially removes the need to derive an analytical relationship, which may be difficult or impossible depending on the non-linearity of the process. To estimate fruit maturity or identify a mature fruit, an ML algorithm receives the maturity indexes as pre-processed data. The data are then separated into training data and testing data. In general, a large portion of the initial dataset is required to support the ML training, and the rest is kept as testing data; the proportion allotted to each depends on the features available in the data. It is important to note that the quality of the training data deserves careful consideration, and that quality control through documenting performance on further independent datasets is critical for practical adoption.

The training data are then used to develop a model following either a supervised or an unsupervised learning approach. In unsupervised learning, the data are not labelled and, consequently, the algorithm identifies data clusters autonomously, without user guidance. Conversely, supervised learning algorithms are provided with labelled data by the user. Continuous targets are modelled through regression approaches, while discrete, categorized targets are modelled using classification methods, which are designed to organize data into distinct groups. Finally, the model is tested using the testing data, and its accuracy is measured. Depending on the application, Root Mean Square Error (RMSE) or confusion matrices are typically used as measures of model performance. If the model fails to interpret the data with a pre-defined accuracy, it must be re-developed. For more detailed explanations of the workflow of ML algorithms, the reader is referred to previous works (Bonaccorso, 2018; Janiesch et al., 2021; Mohammed et al., 2016). Figure 8 depicts the complete process of ML algorithms through a block diagram.
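The two performance measures named above can be computed as in the following sketch. The SSC values and maturity labels are invented for illustration only, and the helper names are hypothetical; rows of the confusion matrix correspond to true classes and columns to predicted classes.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] counts samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Regression example: hypothetical measured vs. predicted SSC (degrees Brix).
print(rmse([10.0, 12.0, 14.0], [11.0, 12.0, 13.0]))

# Classification example with three hypothetical maturity classes.
labels = ["unripe", "moderate", "ripe"]
truth = ["unripe", "unripe", "moderate", "ripe", "ripe"]
predicted = ["unripe", "moderate", "moderate", "ripe", "moderate"]
cm = confusion_matrix(truth, predicted, labels)
print(cm, accuracy(cm))
```

RMSE suits regression outputs such as a predicted maturity index, whereas the confusion matrix shows which maturity classes a classifier confuses with each other, information that a single accuracy figure hides.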

Fig. 8 Block diagram of ML algorithms’ workflow

Classifications

ML algorithms are primarily classified into supervised and unsupervised learning based on their learning approach, though some works add semi-supervised learning and reinforcement learning (Kumar et al., 2019). Because semi-supervised and reinforcement learning are not widely practised in fruit maturity estimation, which is the particular focus of this work, this sub-section concentrates on supervised and unsupervised learning and their application to fruit maturity classification and estimation. Since this review only explores the applications of different ML algorithms in fruit maturity estimation and maturity-based classification, the working principles of the commonly considered ML algorithms are discussed only briefly. For further explanation, the reader is referred to several works (Bonaccorso, 2018; Dhall et al., 2020; Mohammed et al., 2016; Sarker, 2021). Figure 9 shows the classification of ML algorithms that are commonly applied in Smart Agriculture.

Fig. 9 Taxonomy of ML in Smart Agriculture (especially in Fruit Maturity application)

Supervised Learning approaches can be commonly listed based on the algorithm as Naïve Bayes, SVM, Decision Tree, Random Forests and ANN (Fig. 10).

Fig. 10 a SVM, b Decision Tree and c Random Forest (Bijjahalli et al., 2020)

Supervised learning

Supervised learning necessitates labelling the data before the training process. During the training process, the training examples are input into the model, and the model output predictions are compared against the label (true) values. The difference between the model prediction and the label is used to form a cost function. The model parameters are then iteratively optimized to minimize the cost function using optimization algorithms, e.g. Gradient Descent, AdaDelta, Adaptive Moment Estimation (Adam), RMSProp, Adagrad, Momentum and Nesterov Accelerated Gradient. Gradient Descent (GD) and its variants Batch GD and Stochastic Gradient Descent (SGD) are the most commonly employed optimization routines used in training. During the training process, patterns that characterize the underlying input–output relationship are extracted to inform the model prediction. In the fruit maturity context, ML models most commonly perform one of two tasks, viz. classification or regression. In classification, observables or features extracted from sensor measurements or tests are input to the model, which infers and outputs the state of a particular crop as belonging to one of multiple possible discrete categories, e.g. ripe, moderately ripe or unripe. In regression, temporal features are extracted from the data to predict the future state of the observed crop. Both these applications are covered in greater detail in the following sections.
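The iterative minimisation of a cost function described above can be illustrated with batch Gradient Descent on a one-feature linear model and a mean-squared-error cost. This is a minimal sketch under stated assumptions: the data are synthetic (generated from y = 2x + 1), and the learning rate and number of epochs are arbitrary values that happen to converge for this dataset.

```python
def fit_linear_gd(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch Gradient Descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between model prediction and label for every example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the cost (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic labelled data following y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_gd(xs, ys)
print(round(w, 4), round(b, 4))
```

Stochastic Gradient Descent differs only in that each update uses a single example or a small batch rather than the full training set; the adaptive optimizers listed above additionally rescale the step per parameter.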

Classification based on algorithm

Supervised learning approaches are commonly listed by algorithm as Naïve Bayes, SVM, Decision Tree, Random Forest and Artificial Neural Network. This section explains these supervised learning approaches, since they are widely used in fruit maturity applications either to categorize fruits or to estimate their maturity.

SVM Support Vector Machines (SVM) are powerful supervised machine-learning tools used mainly for classification and, less frequently, for regression. They create a ‘decision boundary’ to classify new data into two categories. The standard SVM aims to find the widest ‘street’ between these categories; the original formulation had ‘hard margins’ and was therefore highly sensitive to outliers. Soft-margin SVMs were later developed to allow for some margin violations. Since real-world datasets are often not linearly separable, SVMs use ‘kernel’ functions to transform data into a high-dimensional feature space. This allows a linear SVM to be applied in the transformed space to find the separation boundary. Common kernels include polynomial and Radial Basis Functions (RBF). In fruit maturity applications, datasets frequently exhibit nonlinearity when multiple features, such as fruit maturity indexes, are plotted against each other. Consequently, a separating hyperplane in the high-dimensional feature space is needed, as exemplified in (De-la-Torre et al., 2019; Zhang & Wu, 2012).
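The role of the kernel function can be made concrete with a short sketch. It shows the polynomial and RBF kernels and checks numerically that, for two-dimensional inputs, the homogeneous degree-2 polynomial kernel equals an ordinary dot product in an explicit three-dimensional feature space. The parameter values (`degree`, `coef0`, `gamma`) and the sample points are illustrative assumptions, and the sketch does not train an SVM.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=0.0):
    """k(x, z) = (x . z + coef0) ** degree."""
    return (dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """k(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def degree2_feature_map(x):
    """Explicit feature map whose dot product reproduces (x . z) ** 2 in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, 4.0)
implicit = polynomial_kernel(x, z)  # computed in the original 2-D space
explicit = dot(degree2_feature_map(x), degree2_feature_map(z))  # 3-D space
print(implicit, explicit, rbf_kernel(x, z))
```

Because the kernel returns the feature-space dot product directly, the transformed coordinates never need to be computed during training; this matters most for the RBF kernel, whose implicit feature space is infinite-dimensional.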

Decision Trees Decision trees are a non-parametric supervised learning approach that solves classification and regression problems (Reyes, 2020). They use input attributes to make decisions, which can be output classes or numerical values. The tree structure involves arranging attributes based on data values, with the root at the top and nodes branching from it. Decisions are generated at the endpoints where no further splitting occurs (Alloghani et al., 2020). Hunt’s algorithm drives the selection of sequences for each node, determining the order of decisions. The Gini impurity is used to select splits that classify accurately on these attributes; the aim is to drive the Gini index towards 0, the value reached when a node contains samples of a single class (Plapinger, 2017). The Decision Tree method holds significant appeal for classifying fruits based on their maturity levels, primarily owing to its simplicity and interpretability (Mim et al., 2018). The literature reports high predictive accuracy for fruit classification, as evidenced in (Fadchar & Cruz, 2020; Mim et al., 2018; Suresha et al., 2012).
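The Gini criterion can be sketched for a single split as follows. The firmness values and labels are hypothetical, and the exhaustive threshold search is a simplified stand-in for what a full tree-building algorithm does at every node.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        # Impurity of the two child nodes, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical firmness readings (N) with maturity labels.
firmness = [60.0, 55.0, 58.0, 25.0, 20.0, 22.0]
maturity = ["unripe", "unripe", "unripe", "ripe", "ripe", "ripe"]

print(gini(maturity))                      # impurity before splitting
print(best_threshold(firmness, maturity))  # → (40.0, 0.0)
```

A complete tree repeats this search recursively over all features in each child node until the nodes are pure or a stopping rule (for example a maximum depth) is met.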

Random Forest Random Forest is a machine learning approach that leverages an ensemble of decision trees to predict outcomes. It trains multiple decision trees in parallel and determines the final decision based on the majority consensus of these trees. Random Forest employs bootstrapping to train each decision tree on different subsets of the training data (Misra & Li, 2019), ensuring diversity and reducing variance. This approach, collectively known as bagging, makes the algorithm more robust than a single decision tree when dealing with training samples and noisy datasets. Random Forest is particularly advantageous for fruit classification and maturity estimation tasks where multiple maturity indexes and large sample sizes pose challenges for algorithm training. In such cases, Random Forest demonstrates promising performance, offering higher accuracy in both classification and regression applications. It is capable of outperforming Support Vector Machines (SVM) in certain scenarios, despite SVM’s efficiency with large datasets (Kaur & Gupta, 2017). Random Forest’s strength lies in its ensemble of decision trees, transforming weak learners into strong ones, ultimately resulting in higher prediction accuracy (Elhariri et al., 2014).
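The bootstrapping and majority-vote steps described above can be sketched with one-split trees (‘stumps’) on a single feature. This is a deliberately reduced illustration of bagging: a full Random Forest additionally grows deeper trees and draws a random subset of features at each split, which is omitted here. The data, the number of trees and all function names are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-split tree on (feature, label) rows by minimising training errors."""
    values = sorted({x for x, _ in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = Counter(label for x, label in rows if x <= threshold)
        right = Counter(label for x, label in rows if x > threshold)
        left_label, left_hits = left.most_common(1)[0]
        right_label, right_hits = right.most_common(1)[0]
        errors = len(rows) - left_hits - right_hits
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # The bootstrap sample held a single feature value: predict its majority label.
        majority = Counter(label for _, label in rows).most_common(1)[0][0]
        return (float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def train_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Final decision by majority vote over all trees."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical (firmness in N, label) training rows.
rows = [(60.0, "unripe"), (58.0, "unripe"), (55.0, "unripe"), (52.0, "unripe"),
        (25.0, "ripe"), (22.0, "ripe"), (20.0, "ripe"), (18.0, "ripe")]
forest = train_forest(rows, n_trees=25, seed=1)
print(predict_forest(forest, 15.0), predict_forest(forest, 65.0))
```

Because every tree sees a slightly different resampling of the data, individual trees place their thresholds differently, and the vote averages out much of that variation.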

Naïve Bayes Naïve Bayes is a probabilistic classifier based on Bayes’ theorem (Mushtaq & Mellouk, 2017). It assumes that each input feature independently and equally contributes to the target class. For each class value, it calculates conditional probabilities for each feature, then applies the product rule to derive a joint conditional probability for the features (Thornton, 2021). Bayes’ rule is used to find the conditional probability for the class variable. The conditional probabilities of each feature for each class are calculated assuming either a multinomial distribution for discrete features or a Gaussian distribution for continuous features. The classifier then approximates the probability of a sample with known feature values belonging to a specific class. It selects the class with the highest combined conditional probability over all features, and that class is the final prediction. It is important to remove correlated features, as they can be overemphasized (Misra & Li, 2019). Naïve Bayes is favoured in fruit classification due to its low training data requirements and faster computation compared to more complex algorithms like SVM, ANN, or Random Forest (Ibba et al., 2021). The Gaussian kernel variant of Naïve Bayes is particularly effective in classifying fruits with nonlinear sample data. However, Naïve Bayes is unsuitable for regression tasks. Its assumption of feature independence can be advantageous when dealing with truly independent maturity indexes (Kusuma & Putra, 2018).
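For continuous features, the procedure above can be sketched as a Gaussian Naïve Bayes classifier. The example works with log-probabilities, so the product rule becomes a sum, and it adds a small constant to each variance to avoid division by zero; both are implementation choices assumed for this sketch, as are the feature values and function names.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature (mean, variance) for every class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # 1e-9 is an assumed smoothing term that keeps the variance positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def log_gaussian(x, mean, var):
    """Log of the Gaussian density N(x; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def predict_gaussian_nb(model, features):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, stats) in model.items():
        likelihood = sum(log_gaussian(x, m, v) for x, (m, v) in zip(features, stats))
        scores[label] = math.log(prior) + likelihood
    return max(scores, key=scores.get)


# Hypothetical features: (hue angle in degrees, firmness in N).
X = [(110.0, 62.0), (105.0, 60.0), (100.0, 58.0),
     (35.0, 22.0), (30.0, 20.0), (28.0, 18.0)]
y = ["unripe", "unripe", "unripe", "ripe", "ripe", "ripe"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (32.0, 21.0)), predict_gaussian_nb(model, (108.0, 61.0)))
```

Summing per-feature log likelihoods is exactly where the independence assumption enters: two strongly correlated features would contribute the same evidence twice.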

KNN K-Nearest Neighbour (K-NN) is a versatile non-parametric machine learning approach suitable for handling multidimensional datasets in both classification and regression tasks. K-NN stores the training dataset examples and classifies new instances based on their similarity to the stored examples. The choice of k, the number of neighbours to consider during classification, impacts the method’s performance (Lindon et al., 2011). The nearest neighbours are identified through distance measures such as Euclidean, Hamming, Manhattan, Minkowski or Chebychev (Isa et al., 2017). K-NN excels in small, noisy datasets (Liu & Zhang, 2012), making it apt for tasks like fruit classification. Weighted K-NN, which assigns higher weight to closer neighbours, enhances accuracy (Fan et al., 2019). However, K-NN demands more memory and longer computation times as the dataset size increases (Amra & Maghari, 2017; Harrison, 2018).
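A minimal sketch of K-NN classification with the Euclidean distance, including the distance-weighted variant, is given below. The inverse-distance weighting scheme, the small constant guarding against division by zero and the sample data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3, weighted=False):
    """Classify `query` from its k nearest training examples (Euclidean distance)."""
    neighbours = sorted((math.dist(x, query), label) for x, label in train)[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        # Weighted K-NN: closer neighbours count more (inverse-distance weights).
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical stored examples: ((hue angle in degrees, firmness in N), label).
train = [((110.0, 62.0), "unripe"), ((105.0, 60.0), "unripe"),
         ((70.0, 41.0), "moderate"), ((66.0, 39.0), "moderate"),
         ((35.0, 22.0), "ripe"), ((30.0, 19.0), "ripe")]

print(knn_predict(train, (68.0, 40.0), k=3))
print(knn_predict(train, (32.0, 20.0), k=3, weighted=True))
```

Every prediction scans all stored examples, which is why memory use and computation time grow with the size of the dataset.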

ANN Artificial Neural Network (ANN) is a biologically inspired computer algorithm designed to mimic human brain information processing. It operates through interconnected neuron layers, including an input layer for data collection, a hidden layer for processing, and an output layer for delivering results (Chang et al., 2008). Neurons are assigned weights and biases and employ nonlinear activation functions such as sigmoid, ReLU, or hyperbolic tangent (Zhang et al., 2018). The inner workings of hidden layer neurons remain largely opaque, making ANN a black-box model (Jo, 2021). It updates weights through techniques like back-propagation, addressing model uncertainties. However, ANN’s performance relies on extensive training data. While ANN is effective with large, nonlinear datasets, its complexity hinders interpretability, and this challenge grows as the number of layers increases (Mijwel, 2018).
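The layered structure described above can be sketched as the forward pass of a network with one hidden layer. The weights and biases below are arbitrary placeholder values rather than trained parameters, the choice of tanh in the hidden layer and sigmoid at the output is an assumption, and training by back-propagation is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights @ inputs + biases)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> tanh hidden layer -> sigmoid output layer."""
    hidden = dense(inputs, hidden_w, hidden_b, math.tanh)
    return dense(hidden, out_w, out_b, sigmoid)


# Placeholder parameters: 2 inputs, 3 hidden neurons, 1 output neuron.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[1.0, -1.0, 0.5]]
out_b = [0.0]

# Two normalised, hypothetical input features (e.g. scaled colour and firmness).
output = forward([0.6, 0.3], hidden_w, hidden_b, out_w, out_b)
print(output)
```

The sigmoid output lies between 0 and 1 and could be read as the score of one class; a multi-class maturity classifier would use one output neuron per class.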

Classification based on data type

Based on the types of data and their application, supervised learning takes the following two forms in agriculture: (1) classification; and (2) regression approaches.

Classification approaches In horticulture, ML is also introduced to interpret data from multiple sensors and fuse them through proper training in order to classify fruits according to their maturity and predict the harvest time. As a result, several supervised ML algorithms have been introduced to classify the maturity levels of different fruits, such as Support Vector Machine (SVM) (Caladcad et al., 2020), Naïve Bayes (Kusuma & Putra, 2018), Decision Tree (Wajid et al., 2018) and Random Forest (Harel et al., 2020). Table 4 lists the ML approaches considered in recent years for classifying fruits based on their maturity. Caladcad et al. (2020) use SVM to classify coconuts into three groups; interestingly, they use acoustic sensor data that reflect coconut meat and water volume, and they classify the fruits according to their maturity level with an accuracy of 80%. Kusuma and Putra (2018) classify tomatoes into raw, ripe and rotten categories using the Naïve Bayes approach. They develop a grayscale formula using RGB colour images and reach 76% accuracy in classifying tomatoes (Kusuma & Putra, 2018). Similarly, Wajid et al. (2018) and Jhawar (2016) adopt decision tree and linear regression to classify oranges, and Harel et al. (2020) use a random forest to classify sweet pepper, achieving 93.13%, 97.98% and 95.82% accuracy respectively. These three studies use RGB colour images to identify fruit maturity levels. On the other hand, Saeed et al. (2012) use vegetation indexes to classify palm fruits and reach an accuracy of 82.2% with K-nearest neighbours as the classifier. It should be noted that the accuracy in each study is measured from the performance on the testing data in comparison with the training data.

Table 4 Algorithms used in classifying fruit maturity

Regression approaches In situ fruit classification alone is not sufficient if the maturity cannot be predicted, as it remains a laborious and time-consuming job. Therefore, researchers introduce another dimension of ML that can predict the fruit maturity indexes and determine the optimal harvest time. A regression model is first developed to test a group of collected data; at this stage, it is known as a pre-process model. The model that best interprets the data is then taken as the predicting model and is evaluated based on accuracy. Several regression models are taken into consideration, such as Multiple Linear Regression (MLR) (Jha et al., 2007), Partial Least Square (PLS) (Li et al., 2018a, 2018b) and Principal Component Regression (PCR) (Mahesh et al., 2015). MLR is a regression method that determines which potential explanatory variables are important for predicting the response variable (Hemmings & Hopkins, 2006). PLS, an extension of MLR, predicts the response variable from a large set of predictor variables by reducing them to a smaller set and then performing least squares regression (Xia, 2020). PCR likewise reduces a large set of explanatory predictor variables to a smaller set of principal components and fits a regression model on them (Artigue & Smith, 2019). In addition, Standard Normal Variate (SNV) (Marques et al., 2016; Torres et al., 2015) and Multiple Scattering Correction (MSC) (He et al., 2005; Saranwong et al., 2004) for scattering correction, the Modified Lorentzian Distribution (MLD) (Peng & Lu, 2008) for describing the scattering profile, and the Savitzky–Golay (SG) filter (Gorry, 1990) for data smoothing are some commonly used approaches that help to achieve a suitable linear regression.
Table 5 describes the commonly considered regression models to offer a comprehensive idea of the earlier works in fruit maturity prediction. The table shows commonly used regression approaches such as ANN, PLS, MLR, and LDA-Competitive Learning Neural Network (CLNN). It also describes the sensors considered for data acquisition and the observables derived from the acquired data in those studies. The pre-processing algorithms that lead to the prediction of the maturity indexes are also highlighted. It is important to underline that these maturity indexes help to predict when the fruits should be harvested, since they measure the gradual changes that occur through the ripening process.
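Of the pre-processing steps mentioned above, the Standard Normal Variate correction is simple enough to sketch directly: each spectrum is centred on its own mean and scaled by its own standard deviation. The sketch assumes the sample standard deviation (n - 1 denominator), and the reflectance values are invented for illustration.

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own statistics."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (assumption)
    if std == 0.0:
        raise ValueError("a constant spectrum cannot be SNV-corrected")
    return [(value - mean) / std for value in spectrum]


# Hypothetical reflectance readings at five wavelengths.
spectrum = [0.42, 0.47, 0.55, 0.61, 0.58]
# The same spectrum with a multiplicative and an additive scatter effect applied.
scattered = [2.0 * value + 0.1 for value in spectrum]

corrected = snv(spectrum)
print([round(v, 3) for v in corrected])
print([round(v, 3) for v in snv(scattered)])
```

Both calls print the same values up to rounding, which illustrates why SNV is used for scatter correction: a constant offset and a positive scale factor applied to a whole spectrum are removed before the regression model is fitted.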

Table 5 Algorithms to predict fruit maturity indexes

Classification based on AI approaches

Supervised learning algorithms can alternatively, or additionally, be divided into two groups according to whether they are based on neural networks: neural network and non-neural network approaches.

Neural Networks

Several neural network approaches appear in horticulture to classify and predict fruit maturity. For instance, ANN (Amiryousefi et al., 2012), CNN (Ayllon et al., 2019), faster Region-based Convolutional Neural Network (Faster R-CNN) (Zhu et al., 2020), Deep Convolutional Neural Network (D-CNN) (Habaragamuwa et al., 2018) and Recurrent Neural Network (RNN) (Yossy et al., 2017) have been adopted to classify the maturity stages of banana, mango and calamansi, strawberry and blueberry respectively. Deep Learning (DL) approaches, including CNN, RNN and their derivatives, are more computationally demanding than conventional ANN, largely due to the higher number of hidden layers. However, DL is more capable than ANN of dealing with complex non-linear problems (Raghavan et al., 2016). One of the most successful application areas of deep learning is pattern recognition (Bai et al., 2021). Despite its notable advantages, DL has several drawbacks; for example, it requires more model parameters during training and is therefore prone to overfitting due to over-parameterization (Thompson et al., 2020). Deep ANNs also have much higher computational costs (Pandian, 2021; Thompson et al., 2020), so one or more GPUs become necessary. Each additional GPU significantly increases the amount of power drawn by the PC. Hence, DL has a higher power consumption than other methods, especially non-neural network methods (Thompson et al., 2020). Table 5 presents the applications of these algorithms in several research works, together with their accuracy, to provide a better understanding of the algorithms.

Non-Neural Networks

These alternative approaches do not adopt the conventional multi-layer ANN structure to interpret data. They include SVM, Decision Tree, Random Forest, Naïve Bayes, k-nearest neighbours (KNN), linear regression and logistic regression.

Notably, no approach is unequivocally perfect for classifying fruit maturity level and predicting harvest time; each has its own merits and demerits. Therefore, a proper trade-off between the advantages and drawbacks of each method can help to reach higher accuracy in classifying fruits according to their maturity level and in predicting the suitable time for harvesting.

Unsupervised learning

Unsupervised learning is a type of ML that learns patterns in data without human intervention. It is therefore applied where data are not labelled and the system has no prior knowledge about the data (Debener et al., 2023; Károly et al., 2018). Clustering is the most popular unsupervised learning approach and is commonly used in a wide range of applications such as feature extraction, pattern recognition, image segmentation, vector quantization and data mining (Du, 2010). It divides the input data into several groups that are not known to the system beforehand. It has become popular in fruit maturity studies because of its well-known applications in feature extraction and pattern recognition. For example, Tu et al. (2018) apply k-means clustering to classify the maturity stages of passion fruits and achieve an accuracy of 91.52%. Moradi et al. (2012) use fuzzy c-means to classify pomegranates by maturity level from magnetic resonance (MR) images and reach an accuracy of 85.93%. Gong et al. (2014) use hierarchical clustering to classify apples into 5 groups based on their maturity level. In addition, a few works on autoencoders are available in the area of fruit maturity classification. Varga et al. (2021) adopt an autoencoder to enhance the performance of a Deep Neural Network in classifying kiwis according to their level of maturity.
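To make the clustering idea concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) applied to unlabelled samples. The two-dimensional features (assumed here to be normalised red and green colour fractions), the sample values and the choice of initial centroids are all hypothetical; this is not the procedure of any cited study.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, centroids, max_iterations=100):
    """Lloyd's k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical (red fraction, green fraction) features of six fruits; no labels given.
samples = [(0.20, 0.80), (0.25, 0.75), (0.30, 0.70),
           (0.80, 0.20), (0.75, 0.30), (0.70, 0.25)]

# Initial centroids are fixed to two of the samples so the run is deterministic.
centroids, labels = kmeans(samples, centroids=[samples[0], samples[3]])
```

The algorithm only returns group indices; deciding which group corresponds to "unripe" or "ripe" still requires domain knowledge or a small labelled reference set.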

Comparison of methods

ML algorithms are widely used in different applications, and it is therefore important to highlight the basis on which they are chosen in different fields. Some algorithms are well suited to classification, while others are highly effective for regression. For example, Naïve Bayes is commonly used for classification but not for regression. SVM is considered an efficient classifier when dealing with high-dimensional data. Moreover, KNN is popular for its efficiency in dealing with outliers. Fruit maturity estimation, and the classification of fruit based on maturity, frequently involves outliers, dependence among predictors and sometimes high dimensionality. Choosing a suitable algorithm is therefore highly challenging for this application. Table 6 outlines the algorithms’ features, advantages, limitations and suitable application areas to offer a clear understanding of them (Ma et al., 2019; Mao & Wang, 2012; Sen et al., 2020; Shrivastava et al., 2020; Tu, 1996).

Table 6 Comparative discussion among the ML algorithms

Challenges and future research and development trends

ML-based methods promise several benefits to smart agriculture. Broad application areas that benefit from this line of research include fruit maturity prediction, fruit quality classification, crop disease detection, plant health monitoring, weed detection, water level in soil, etc. In each of these domains, ML methods complement conventional algorithms to provide greater adaptability to evolving environmental conditions. Alternatively, they can also be used to replace traditional analytic algorithms entirely. These methods support the ability to generalize from training patterns or data. Despite these promised benefits, there are several limitations and pitfalls to be carefully addressed. Hence, this study categorizes the challenges and limitations from three different perspectives, namely algorithms, maturity indexes and instruments.

Challenges

The following subsections discuss in detail some of the relevant challenges faced when applying advanced sensor fusion and AI techniques to agriculture and particularly fruit maturity estimation.

Complexity

The complexity of an ML model depends on the complexity of the task that it is designed to perform. Complexity in this instance refers to the non-linearity of the underlying process/system, and consequently, the number of model parameters that are required to ‘fit’ the data.

For example, if strong maturity indexes such as starch index and firmness are available, a relatively simple model can be trained to classify the maturity of the fruit. If these indexes are not available and classification must be made on the basis of features such as colour or shape (as in the case of imagery), then a more complex model is required. A good example of this is Convolutional Neural Networks, where highly complex models are trained using hundreds of thousands of training samples. There is always a trade-off between model complexity and interpretability.

Data sensitivity

The prediction of an ML model depends significantly on the size and variety of its datasets. A lack of diversity in the training dataset can make a model highly accurate on that particular dataset but cause it to perform poorly when a new dataset is used to estimate fruit maturity and ripening. A model developed with a small dataset can show the same problem when interpreting unseen data. To explain this, the bias-variance trade-off is illustrated in Fig. 11. Bias reflects the difference between the correct value and the average prediction of the designed model. A model with high bias is oversimplified and produces high error on both the training and the testing data. Variance, on the other hand, describes the variability in the predictions of a model. High-variance models perform well on the training data but lead to high error when testing data are applied, and they are associated with more complex models. Therefore, to develop a well-performing model, a proper trade-off between bias and variance should be achieved so that the generalization error is minimized. Both bias and variance provide useful insights into the nature of an ML model. For instance, when a model with many input variables produces high prediction error, it may indicate that some of those variables are misleading with respect to the model’s performance. PCA has therefore been proposed to identify the components that explain most of the variance in the data. Similarly, bias can help to interpret a model in which two variables maintain a relationship. For instance, Root Mean Square Error of Prediction (RMSEP) and R-squared (R²) are commonly used to describe the relationship between two variables, such as measured and predicted values, and both reflect the bias of a model. Hence, both bias and variance can help in designing a suitable model for prediction.
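The trade-off described above is commonly formalised through the standard decomposition of the expected squared prediction error. The notation below is an assumption introduced for illustration and does not appear in the surveyed works: the measured maturity index is written as $y = f(x) + \varepsilon$, with noise of zero mean and variance $\sigma^2$, and $\hat{f}$ denotes a model fitted on a randomly drawn training set.

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^{2}\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^{2}}_{\text{bias}^{2}}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^{2}\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible noise}}
```

Under these assumptions, simplifying a model tends to raise the first term while lowering the second, and increasing its complexity does the opposite; the third term cannot be reduced by any choice of model.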

Fig. 11 Bias-variance trade-off (Papachristoudis, 2019)

Interpretability

Conventional approaches generally predict fruit maturity on the basis of a few key maturity indexes. However, some ML approaches, such as neural network-based algorithms (in particular Deep Learning), cannot natively provide interpretations or explanations of their outputs, i.e. how they arrived at a particular prediction. As a result, the user may not know which maturity indexes the model relied on most to predict the maturity level of a given fruit sample. For example, IAD and firmness show a strong correlation in peaches and can therefore describe fruit ripeness efficiently, whereas SSC and the SSC/TA ratio show no considerable correlation that could describe peach maturity or ripeness (Gasic et al., 2013). A black-box ML algorithm may nevertheless select SSC, the SSC/TA ratio or both for a particular group of samples and still appear to describe the maturity or ripeness of those peaches. Such an interpretation can be local and may not apply to other groups of peaches. Because the ML algorithm is a black-box model, the user cannot identify which features the algorithm weights most heavily when estimating peach maturity or ripeness. While the body of knowledge in explainable AI now includes several promising techniques, and some of them, such as feature attribution methods, have already begun to be adopted in the sector, further research on ML explanation methodologies focused on smart agriculture is required to integrate this capability (Montavon et al., 2018; Paudel et al., 2023).

Incorrect/insignificant maturity index selection

It is well known in horticulture that different fruits express their maturity level in different ways. Therefore, a maturity index that is significant for measuring the maturity level of one fruit is not necessarily applicable to others. For example, the skin colour of a peach or apple can be deceiving in estimating maturity since it is highly responsive to sunlight. In that case, the measurement of multiple maturity indexes can be a promising solution. Moreover, the maturity level commonly varies from cultivar to cultivar of the same type of fruit. Therefore, the standard adopted for a maturity index of one cultivar cannot be used to estimate the maturity level of a different cultivar of the same fruit, even when the fruits come from the same orchard and the plants have grown under the same environment.

In some cases, the measurements from multiple maturity indexes can lead to a wrong conclusion. For instance, considering starch index and oil content together is a wrong choice to measure the maturity level of a peach or nectarine. In contrast, these maturity indexes are highly suitable for measuring the maturity of an avocado. Therefore, choosing appropriate maturity indexes is highly significant in terms of explaining a fruit’s maturity level.

External factors

Solar radiation/irradiance, irrigation, the position of fruits in the canopy and plant nutrition each affect the fruit to some extent, although individually their effects are ideally not significant. However, it is worth noting that several external factors acting together can lead to an incorrect estimate of the maturity of the fruits in an orchard. Apart from that, weather, season and climate are widely known to have a marked influence on the fruit maturity of an orchard.

Sensor limitations

Instrumental error is a well-known challenge in all real-life applications, and repeated calibration can help to mitigate it. Apart from that, comparing a sensor's output with its recorded performance under ideal conditions may help to determine whether the sensor is faulty.

Trends in non-destructive approaches and ML algorithms

To understand the pace and direction of ML applications in smart agriculture, a survey of previous studies covering both sensors and ML algorithms can help to give a complete picture. Out of fifty-three (53) relevant articles published in the last two decades, thirty-eight (38) have considered colour determination approaches, which include colourimetry and visible imaging (Fig. 12).

Fig. 12 Pie chart of non-destructive approaches in classifying fruit maturity/ripening stages

In addition, six (6) articles introduce spectroscopy approaches to classify fruit based on maturity, including fluorescence, vis–NIR, NIR and MIR spectroscopy. Among these six (6) articles, three (3) consider Vis–NIR, and the other three (3) consider fluorescence, NIR and MIR spectroscopy. Spectral imaging approaches, which comprise multispectral and hyperspectral imaging, have been considered in six (6) articles, of which four (4) use hyperspectral and the remaining two (2) use multispectral imaging. Acoustic impulse and electrical impulse are the least considered approaches among the articles that classify fruit based on maturity level. It is essential to highlight that the use of computer vision algorithms, specifically in visible imaging and colourimetry, has increased drastically in the last decade, and this technology has therefore been widely considered for classifying fruits. Figure 13 illustrates the timeline of ML algorithms that have been used in fruit maturity classification based on the surveyed articles. Between 2011 and 2020, ML model usage in fruit maturity classification increased sharply because of two key enablers: advances in GPU efficiency and the increasing availability of datasets. Among ML methods, SVM and CNN have been the two most widely applied algorithms over the past five (5) years.

Fig. 13 Timeline of ML algorithms in fruit maturity

Apart from that, in the realm of fruit classification based on maturity, several applications of ANN can be found in the literature along with SVM and CNN, as summarized in Fig. 14. Both ANN and CNN are neural network approaches, mostly used in image processing applications that require high computation. Combined, they cover 31% of the works, which indicates that a substantial share of current work in this application is being performed with neural network-based methods. As previously illustrated by Fig. 12, most works have focused on colour determination in the last two decades. As a result, deep learning has attracted greater interest in recent years for image processing applications, owing to the advancement and availability of camera technology. In addition, SVM is increasingly popular for statistical analysis of data from non-destructive sensors. Note that Decision Tree is prone to overfitting with high-dimensional data (Kotu & Deshpande, 2019), while QDA suffers from singularity of the covariance matrix and is not suited to high-dimensional variables (Jiang et al., 2018). The pros and cons of the ML algorithms are discussed further in the “Instrument technology” section.

Fig. 14 ML approaches in fruit maturity

External factors

As mentioned previously, solar radiation/irradiance, irrigation, the position of fruits in the canopy, plant nutrition, ambient temperature and soil quality are some of the external factors that affect fruit quality and maturity. These factors have been studied before, but not extensively in relation to fruit maturity. Apart from that, the impact of season and weather on fruit maturity is well known in horticulture. Cirilli et al. highlight that environmental factors such as irrigation, pruning and canopy management, fertilization, temperature, photoperiod, solar radiation, soil patterns, precipitation, climate and crop load significantly affect the physiological and metabolic processes of peaches, and hence the SSC and sugar contents of peaches vary (Cirilli et al., 2016). The impacts of environmental factors therefore offer scope for establishing a relationship with fruit maturity and ripening. This may widen the opportunity to understand fruit phenotypes, which can in turn lead to more efficient estimation of fruit maturity.

Instrument technology

In-situ techniques are increasingly being surpassed in accuracy and application in agriculture by remote electro-optical sensors. Emerging techniques such as refined spectral imaging, LIDAR and thermography in remote sensing will shape the future of crop yield estimation, fruit maturity, product quality and stress detection. Spectroscopy and spectral imaging have remarkable prospects in estimating fruit maturity, as they deal with hundreds of spectral bands and provide a high level of understanding about an object (Kang et al., 2020). As an illustration, Wendel et al. (2018) introduced an unmanned ground vehicle equipped with a hyperspectral camera designed to measure dry matter and estimate the maturity of mangoes within an orchard. For the spectral image analysis, an ML algorithm, specifically a Convolutional Neural Network (CNN), was employed. In addition, sugar contents, acidity, solid contents, oil contents and chlorophyll contents can be measured using spectroscopy, as outlined in Table 1. Since spectral imaging can also capture fruit colour, size and shape to estimate maturity level at the same time, several works combine both spatial and spectral features of an image (Fauvel et al., 2012; Ghassemian, 2016; Mirzapour & Ghassemian, 2015).

Since ML works directly with features, a suitable ML approach is required to achieve higher accuracy. According to the survey, as shown in Fig. 14, SVM is a widely applied ML approach in fruit maturity, and its competitors are ANN and CNN, since most users of vision systems choose them. Note that neural network-based ML algorithms behave as ‘black boxes’ that are challenging for human users to interpret. This is a drawback, since the interpretability of the maturity indexes is important. During model development, the black-box structure places emphasis on certain features that are generally unknown to the user. In this application, the emphasized features must be known, since the significance of the maturity indexes varies from fruit to fruit, as discussed in previous sections. Otherwise, the model may produce wrong predictions about the maturity levels of the fruits. Therefore, neural network-based ML algorithms are in general not suitable for this application until the link between the weights and the functions can be revealed. In addition, although these methods are well suited to image processing applications, they are computationally demanding. As discussed previously, additional GPUs are usually needed to support image processing applications, especially when dealing with a large volume of images. In such cases, an additional power supply and cooling system are frequently required.

Non-neural network-based algorithms, in turn, can be chosen on the basis of features such as prediction time, training time, required tuning, memory usage and high-dimensional data handling capacity. Since training and prediction time, tuning requirements and memory usage are no longer major concerns given the availability of high-performance computers, high-dimensional data handling capacity should now receive the highest priority after interpretability. From this perspective, SVM, Naïve Bayes and KNN can be suitable for this application. In particular, SVM has earned a reputation for classifying hyperspectral images in both spectral and spatial aspects (Chandra & Bedi, 2021; Kaul & Raina, 2022; Li et al., 2011). KNN is easy to implement but shows poor prediction performance. Naïve Bayes treats all features as independent and equally significant, which is not a practical assumption in this application, where the maturity indexes are interdependent and differ in significance from fruit to fruit (Jahromi & Taheri, 2017).
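The independence assumption attributed to Naïve Bayes above can be stated compactly. In the following sketch the notation is an assumption introduced for illustration: $c$ denotes a maturity class and $x_1, \dots, x_n$ the measured features (for example, individual maturity indexes).

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Each feature contributes its own factor, regardless of the values of the other features. When two features are strongly correlated, their shared evidence is effectively counted twice, which is why the assumption is questionable when maturity indexes depend on one another.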

Table 7 compares commonly used ML approaches in terms of their features, which helps in choosing a suitable ML approach for this application.

Table 7 Outline of the available ML algorithms for fruit maturity estimation and their comparison

Conclusions

The evolution and increasing capabilities of sensors and AI algorithms have supported their growing role in precision agriculture, particularly in the analysis of fruit maturity and ripening. Fruit maturity estimation methods include both destructive and non-destructive approaches, with a variety of techniques developed over the years for a wide range of practical applications. This article categorized these methods with the aim of exploring their potential and evolving applicability. The techniques can be very cultivar-specific, but recent developments have introduced progressively higher levels of flexibility and applicability. The classification of fruit maturity is typically performed with reference to suitable indices, including sugar contents, acidity, firmness and chlorophyll content. This article examined the mainstream ML approaches adopted in fruit maturity estimation and compared their performance in different practical applications. Neural network-based ML algorithms have received more attention than other AI techniques due to their inherent simplicity and suitability for image processing applications. Additionally, as opposed to non-neural network methods, these approaches do not require manual feature extraction from the raw data, which helps to streamline the estimation process. However, neural network algorithms usually have a black-box structure that prevents an immediate human-readable interpretation of their workings. These approaches are widely adopted for image-based applications such as colorimetry or digital imaging, but they are less suited to applications in which multiple sensor streams are monitored and sometimes interfere with one another. In such cases, SVM and KNN can be preferable since they can deal with large datasets and are inherently more human-interpretable. SVM has been widely used for spectral and spatial data and offers satisfactory results, whereas KNN sometimes shows poor accuracy.

Ongoing research is focusing on using non-destructive spectroscopy and spectral imaging to analyze sugar, acidity, firmness and chlorophyll contents, particularly from remote sensors. This is a major trend in the smart agriculture field, which has observed a move away from destructive testing, despite it being traditionally more accurate. Remote sensing, conversely, can offer a faster and wider-area overview of the entire yield, which is also practical for other post-harvest activities and sorting. Specifically, there is a considerable body of research addressing spectroscopy and spectral imaging techniques to increase their accuracy and widen their applicability. For instance, measuring vegetation indexes to assess the maturity of individual fruits is a promising area of application of multispectral and hyperspectral imaging. Since different spectral bands have specific capabilities to characterize the reflectance properties of different surface features, and considering the rapid technological advancements in optics, vegetation indexes and multi/hyper-spectral analysis can deliver more information about the maturity of fruits. Therefore, further research is recommended to explore their efficacy in this important application domain. Further advancements in efficient and effective ML algorithms tailored for the PA context are also recommended to best match the characteristics of fruit maturity-related sensor data and of the natural maturation process.