Introduction

The heterogeneous nature of food complicates food bioprocessing operations because different food matrices respond differently to process conditions1. Thus, the development and application of dynamic optimization approaches is an important step towards ensuring robust process control, product quality, and consumer safety2,3. The technical application of these approaches, especially in a biologically complex product such as traditional beer, has been minimal4. Furthermore, variable microbial growth kinetics, process constraints, biochemical reactions, dynamic food matrices, and demanding bioprocessing requirements add to the complexity of bioprocess development and optimization3,5. As a result, combining linear and non-linear techniques is an effective approach to describe, analyze, and predict the bioprocess responses that determine the outcomes of the final product3,6.

The use of a single technique may not be adequate to establish the relationship between process input variables and product quality5. Nonetheless, standalone mathematical and statistical models have previously been successful in describing the linear, interactive, and quadratic effects of selected parameters in beer bioprocessing6,7. Response surface methodology (RSM) and factorial experiments, with their associated designs, are traditional statistical tools that have been applied extensively to screen and optimize factors in the biotechnology and food engineering industries3,6. RSM consists of a group of empirical techniques that evaluate the relationship between a set of controlled experimental parameters and the measured responses in order to achieve an optimal process8,9. In particular, RSM has been used to generate efficient models for optimizing very large and complex bioprocesses in food systems1,10. RSM determines the significance of a model and defines the relationship between process variables through analysis of variance (ANOVA) and the lack-of-fit test1,11, while the optimum conditions are determined by the desirability function10,11. However, traditional techniques show significant limitations in biological processes3,6. For example, RSM disregards parameters deemed insignificant without accounting for their possible interactive effects on the output of the bioprocess.

Artificial intelligence (AI) and machine learning (ML) tools such as fuzzy logic, artificial neural networks (ANN), particle swarm optimization (PSO), and genetic algorithms (GA) are emerging technologies suited to the research and development of efficient bioprocesses6,12. Recently, two-step or three-step optimization approaches involving RSM, ANN, and GA have become standard practice for manufacturing and other biological processes7,12. However, these tools and approaches have rarely been applied to the modeling and optimization of brewing and fermentation processes3. ANN has been successful in accurately approximating linear and non-linear functions from historical data without requiring knowledge of cellular kinetics and metabolic fluxes, especially in multivariate bioprocesses3,6. ANNs are mathematical emulations of the biological learning process in the brain. They model the network structure of interconnected nerve cells and thus “learn”, form associations, and adapt to make accurate predictions from a specific sample set13.

ANNs possess notable processing abilities such as self-organization, data classification, pattern recognition, the processing of fuzzy and inaccurate information, good generalization, fast processing, noise and fault tolerance, and high parallelism12,13. Given these benefits, the use of ANN as a non-linear multivariate tool in bioprocess development can improve both the bioprocess and the final product14,15. For umqombothi bioprocessing, ANN offers a particular advantage in improving the developed RSM models, since its standard framework has an inherent ability to use background information to solve problems4,16. Bioprocessing approaches that combine RSM with adaptive learning techniques such as ANN have been shown to achieve better accuracy, predictive ability, and description of dependence relationships than traditional, standalone RSM17,18. Conversely, bioprocess development and optimization without carefully deliberated process designs can result in irreproducible and unreliable processes4,10. In this study, a modeling method combining RSM and ANN for the development of a bioprocess to optimally produce umqombothi was investigated.

Methodology

Traditional beer (umqombothi) brewing process

Five hundred (500) g of pre-packaged King Korn malted sorghum (Mtombo – Mmela) (Tiger Brands, Bryanston, South Africa) was mixed with 1000 g of White Star maize meal (Pioneer Foods, Bryanston, South Africa) in a sterile 10 L bucket containing 7 L of tap water. The mixture was gently stirred, covered, and incubated (Labcon, Chamdor, South Africa) at 25 °C for 24 h to sour. Thereafter, the soured paste was stirred gently and cooked for 30 min at 45 °C to make a traditional beer porridge (isdudu). The porridge was allowed to cool to 25 °C, after which 500 g of King Korn malted sorghum (Tiger Brands, Bryanston, South Africa) was added and gently stirred in. The mixture was then incubated at 30 °C (Labcon, Chamdor, South Africa) for 24 h to ferment. The physicochemical properties of the finished beer were then determined.

Experimental design using response surface methodology (RSM)

Preliminary experiments (data not presented herein) were conducted to determine appropriate ranges for the processing factors (cooking time, fermentation temperature, and fermentation time) and to assess their effects on alcohol content, total soluble solids (TSS), total titratable acidity (TTA), pH, and viscosity in umqombothi. The data obtained were then used for the design of experiments (DOE) (Fig. 1), and appropriate ranges were set for the factors of interest (Table 1). A central composite design (CCD) in Design-Expert software version 11.0.0 (Stat-Ease Inc., Minneapolis, USA) was used to generate 20 experimental runs. The input factors were cooking time (hr), fermentation temperature (°C), and fermentation time (hr) (Table 1). Subsequent experiments were conducted according to the experimental combinations in Table 2.
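For three factors, 20 runs is consistent with a CCD made of 8 factorial points, 6 axial points, and 6 center points. The sketch below builds such a design in coded units; the rotatable axial distance α = (2³)^(1/4) ≈ 1.682 and the six center points are assumptions (common defaults), not settings reported here, and mapping coded levels to hours and °C would require the Table 1 ranges.

```python
import itertools

import numpy as np


def central_composite_design(k=3, alpha=None, n_center=6):
    """Return a coded CCD: 2**k factorial runs, 2k axial runs, and center runs.

    The rotatable alpha default and n_center=6 are assumptions for illustration.
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    # Factorial (cube) points at the coded levels -1 and +1
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    # Axial (star) points: one factor at -alpha or +alpha, the others at 0
    axial = np.zeros((2 * k, k))
    for i in range(k):
        axial[2 * i, i] = -alpha
        axial[2 * i + 1, i] = alpha
    # Replicated center points, used to estimate pure error
    center = np.zeros((n_center, k))
    return np.vstack([factorial, axial, center])


design = central_composite_design()
print(design.shape)  # (20, 3)
```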

Figure 1

A flow chart of the complete experimental design and optimization techniques.

Table 1 Process parameters selected for optimization: cooking time, fermentation temperature, and fermentation time.
Table 2 Experimental design of umqombothi.

Samples were withdrawn after each experimental run (done in triplicate), and alcohol content (°P), TSS (g/100 g), TTA (% lactic acid), pH, and viscosity (cm/min) were determined. The Design-Expert software was also used to fit a second-order polynomial model to estimate and predict response values over the range of input parameter values, identifying which input factors influenced each response and the direction of their effect, as given in Eq. (1):

$$Y = \beta_{0} + \sum_{i=1}^{n} \beta_{i} x_{i} + \sum_{i=1}^{n} \beta_{ii} x_{i}^{2} + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \beta_{ij} x_{i} x_{j} + \varepsilon$$
(1)

where \(Y\) is the predicted response variable, \(\beta_{0}\) is the intercept, and \(\beta_{i}\), \(\beta_{ii}\), and \(\beta_{ij}\) are the linear, quadratic, and interaction coefficients, respectively, for the factors \(x_{i}\) and \(x_{j}\) (\(i,j\) = 1, 2, …, n). The input variables affecting the response \(Y\) were \(x_{1}\) (cooking time), \(x_{2}\) (fermentation temperature), and \(x_{3}\) (fermentation time), so that n = 3. The random error is represented by \(\varepsilon\).
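As a sketch of how Eq. (1) becomes a regression problem, the code below expands the factors into the ten model terms for n = 3 (intercept, three linear, three quadratic, three interaction) and estimates the coefficients by ordinary least squares. This is a generic illustration, not Design-Expert's internal procedure; the column order and the synthetic data are assumptions chosen only to show that known coefficients are recovered.

```python
import numpy as np


def quadratic_terms(X):
    """Model matrix for Eq. (1).

    Column order: intercept, x_1..x_k, x_1^2..x_k^2, then x_i*x_j for i < j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]                  # linear terms
    cols += [X[:, i] ** 2 for i in range(k)]             # quadratic terms
    cols += [X[:, i] * X[:, j]
             for i in range(k) for j in range(i + 1, k)]  # interaction terms
    return np.column_stack(cols)


def fit_quadratic(X, y):
    """Ordinary least-squares estimates of the Eq. (1) coefficients."""
    A = quadratic_terms(X)
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta


# Synthetic check: responses generated from known coefficients are recovered
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.68, 1.68, size=(20, 3))
beta_true = np.arange(1.0, 11.0)  # 10 coefficients for k = 3
y_demo = quadratic_terms(X_demo) @ beta_true
beta_hat = fit_quadratic(X_demo, y_demo)
```

With 20 runs and 10 coefficients, 10 residual degrees of freedom remain for the ANOVA and lack-of-fit tests.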

Neural network construction and fitting

Experimental data were organized and used to develop the ANN prediction models. MATLAB R2020a (MathWorks, Massachusetts, USA) was used to design a function-fitting neural network. A two-layer feed-forward neural network was used: the first (hidden) layer received the network inputs and the second layer produced the output, and both layers used the sigmoid activation function. Cooking time (hr), fermentation temperature (°C), and fermentation time (hr) were used as network inputs, and alcohol content (°P), TSS (g/100 g), TTA (% lactic acid), pH, and viscosity (cm/min) were each used as the output to develop several networks and determine the optimal network topology. The experimental data were randomly divided into training, validation, and testing sets: 14 instances (70%) were used for training, 3 (15%) for validation, and 3 (15%) for testing. The ANN model was trained, validated, and tested using the Levenberg–Marquardt (LM) training algorithm. To further study the responses of the model, the Bayesian Regularization (BR) and Scaled Conjugate Gradient (SCG) training algorithms were also evaluated. Training was continued until the correlation coefficient (R) approached 1.
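The 70/15/15 random division of the 20 runs can be sketched as below. MATLAB performs this division internally with its own random number generator, so this Python version will not reproduce the study's actual assignment of runs; the function name and seed are hypothetical.

```python
import numpy as np


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly partition sample indices into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(20)
print(len(train_idx), len(val_idx), len(test_idx))  # 14 3 3
```

With only 3 validation and 3 test runs, a different random split can noticeably change the reported performance indices.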

Determination of physicochemical properties

Alcohol content

The alcohol content of the finished beer was determined using a digital refractometer for brewing (Hanna Instruments (Pty) Ltd., Johannesburg, South Africa). A clean pipette was used to place 0.5–1 ml of the finished beer into the sample well, and the readings (°Plato) were then recorded.

pH

The pH of the finished beverage was determined using a portable pH meter (Hanna Instruments (Pty) Ltd., Johannesburg, South Africa) after calibration with standard buffers of pH 4.00 and 7.00.

Total soluble solids

The total soluble solids of the finished beer were determined using a digital refractometer (Hanna Instruments (Pty) Ltd., Johannesburg, South Africa). A clean pipette was used to place 0.5–1 ml of the finished beer into the sample well, and the refractive index readings of the samples were then recorded.

Viscosity

A consistometer (Endecotts, London, United Kingdom) was used to determine the consistency of the finished beer (cm/min). A 100 ml sample was poured into the reservoir behind the consistometer gate, and the release lever was pressed to open the gate instantaneously, allowing the sample to flow along the instrument’s graduated scale for 1 min.

Total titratable acidity

The total titratable acidity was determined using the American Association of Cereal Chemists (AACC) approved method 02-3119, in which 10 g of the sample was dispersed in 100 ml of distilled water. The solution was mixed well, and 0.5 ml of 1% phenolphthalein indicator was added. The solution was then titrated with standardized 0.1 N sodium hydroxide until a faint pink color was observed. Titratable acidity (% lactic acid) was calculated as the volume of 0.1 N NaOH required (ml) divided by 20.
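The conversion above is simple arithmetic; as a worked check, the sketch below applies the stated rule and inverts it for the TTA range later reported in Table 3 (0.50 to 1.54% lactic acid), which corresponds to roughly 10.0 to 30.8 ml of titrant. The function name is hypothetical, and the divisor of 20 holds only for the 10 g sample and 0.1 N NaOH described here.

```python
def titratable_acidity(naoh_ml):
    """TTA (% lactic acid) from the volume (ml) of 0.1 N NaOH, per the stated rule.

    The divisor of 20 applies only to the 10 g sample / 0.1 N NaOH set-up above.
    """
    if naoh_ml < 0:
        raise ValueError("titrant volume cannot be negative")
    return naoh_ml / 20.0


# Titrant volumes implied by the TTA range reported in Table 3
for tta in (0.50, 1.54):
    print(f"{tta:.2f}% lactic acid <- {tta * 20:.1f} ml of 0.1 N NaOH")
```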

Statistical analysis

All experiments and analyses were conducted in triplicate. ANOVA was employed to determine the significance of the generated models. Design-Expert software version 11.0.0 (Stat-Ease Inc., Minneapolis, USA) was used to determine the response (Y) of the second-order polynomial equation, the coefficient of determination (R²), the predicted and adjusted R², the coefficient of variation (C.V. %), and the probability associated with the F-value (Prob > F).
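The fit statistics listed above follow standard least-squares definitions. The sketch below computes them for any model matrix: predicted R² uses the PRESS statistic (leave-one-out residuals e_i / (1 − h_ii)), and C.V. % is the residual standard deviation divided by the mean response. These are the textbook formulas as we understand them; Design-Expert's exact implementation is not reproduced, and the straight-line data are hypothetical.

```python
import numpy as np


def fit_statistics(A, y):
    """R^2, adjusted R^2, predicted R^2 (from PRESS) and C.V. % for an OLS fit.

    A is the model matrix including the intercept column; y holds the responses.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    # Leverages h_ii: diagonal of the hat matrix H = A @ pinv(A)
    h = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    # PRESS uses leave-one-out residuals e_i / (1 - h_ii)
    press = float(((resid / (1.0 - h)) ** 2).sum())
    ms_res = ss_res / (n - p)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "adj_R2": 1.0 - ms_res / (ss_tot / (n - 1)),
        "pred_R2": 1.0 - press / ss_tot,
        "CV_percent": float(100.0 * np.sqrt(ms_res) / y.mean()),
    }


# Hypothetical straight-line data, used only to exercise the function
x = np.arange(10.0)
y = 1.0 + 2.0 * x + np.array([0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.3, -0.1, 0.2])
stats = fit_statistics(np.column_stack([np.ones_like(x), x]), y)
```

Because PRESS is never smaller than the residual sum of squares, predicted R² can fall well below R², and even below zero, as happens for the TTA and viscosity models.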

Statement on experimental research and field studies on plants

We confirm that the use of plant-based cereals in our study complied with the relevant institutional, national, and international guidelines and legislation, in particular the IUCN Policy Statement on Research Involving Species at Risk of Extinction.

Results and discussion

The effects of cooking time, fermentation temperature, and fermentation time on the alcohol content, TSS, TTA, pH, and viscosity were investigated. Optimization of cooking time, fermentation temperature, and fermentation time is essential for maintaining consistent physicochemical properties, curbing undesired changes that may occur during bioprocessing, and understanding the interactions among these process variables under different conditions1. In beer production, these are principal factors that influence the final product and its acceptance by consumers20,21.

The effect of input factors on the physicochemical properties of the beer

Alcohol content

Samples fermented for a longer period (≥ 60 h) at a relatively high temperature (≥ 30 °C) had a lower alcohol content (Table 3, experimental runs 1, 4, 7, 9, 15, and 20). Generally, a higher fermentation temperature increases the rate of sugar metabolism, leading to a rapid increase in alcohol content and in other by-products such as volatile compounds21. In contrast, in this study, a higher temperature combined with a longer fermentation time led to a lower alcohol content (Table 3). Under these conditions, the low alcohol content may be attributed to evaporative ethanol loss. Product inhibition is also not uncommon during simultaneous saccharification and fermentation: ethanol, a fermentation product, inhibits zymase over time, while the products of saccharification inhibit hydrolytic enzymes22. In addition, the synthesis of acetate and acids such as formic, acetic, and levulinic acid at concentrations above 100 mM may inhibit the bioconversion of biomass22,23 and thus influence alcohol content.

Table 3 Responses from the investigated input parameters.

TSS

Cooking the soured porridge for an adequate amount of time is essential for starch gelatinization and for releasing locked-up nutrients for yeast cells24. Cooking time was found to influence the alcohol content, TSS, pH, and viscosity (Table 3). The proliferation of fermentative microbes is driven by the hydrolysis of cooked starch to fermentable sugars by endogenous amylolytic enzymes25. As the endosperm protein enclosing the starch granules softens during gelatinization, soluble material moves from the grain into the retting water, thereby increasing the TSS26. This might explain the increasing trend in TSS with longer cooking time. As observed in Table 3, cooking for more than 1 h significantly increased the TSS. The reverse trend was observed when the fermentation time was increased, which could be attributed to microbial growth patterns, since microorganisms consume soluble solids over time26.

Fermentation time contributed substantially to the final product’s quality. The longer the fermentation was allowed to proceed, the lower the alcohol content, pH, and viscosity (Table 3). Fermentative microorganisms need sufficient time to adjust to environmental changes for optimal utilization of the substrate to build cellular components (RNA, enzymes, and metabolites)27. Once cells have adapted and begin completing their cell cycles, they enter the exponential growth phase, in which they are healthiest and most uniform and rapidly drive alcoholic fermentation forward27. A fermentation time of 24 h resulted in relatively higher alcohol content, TSS, and TTA levels than 60 h and 96 h. It is possible that within this timeframe the fermentative microorganisms were in the exponential growth phase, leading to higher microbial activity.

pH and TTA

The TTA and pH ranged from 0.50 to 1.54% lactic acid and from 2.81 to 4.60, respectively (Table 3). Generally, umqombothi and other African traditional beers have a pH range of 3 to 4.2 and a lactic acid level of 0.26%, depending on how the beer is brewed4,24. Changes in TTA may be a better measure of the success of the fermentation process than changes in pH26. An inverse relationship between pH and both TTA and alcohol content was observed in this study, whereby a lower pH was associated with higher TTA and alcohol content (Table 3). As reported previously25, a decrease in TSS and pH is usually observed as microorganisms carry out alcoholic fermentation. Beers with low pH values, such as umqombothi (Table 3), have a longer shelf-life, better safety and quality, conditions that favor the growth of desirable fermentative microorganisms, and a higher concentration of antimicrobial agents28. The low pH and elevated acidity of these beers help eliminate certain pathogenic microorganisms that could pose safety threats29,30.

Viscosity

Cooking time had a direct influence on the viscosity of the final beer. This is because cooking increases the availability of starch, which imparts viscosity to food and affects the clarity of the finished beer31. In addition, residual starch from incomplete hydrolysis into sugar contributes to a beer’s viscosity25. As TTA increases and pH decreases, the combined action of malt α- and β-amylases is reduced, thereby reducing the beer’s viscosity and giving body to the final beer25. Increasing the amount of the α-amylase Hitempase 2XL decreased the viscosity of beer produced from malted buckwheat32. In western beers, high viscosity may make filtration difficult and lead to starch hazes in the final product32, while in traditional beers such as umqombothi, filtering may remove the important fiber-imparting solids that give the beer its higher viscosity4,33,34.

Multi-response optimization of process parameters

ANOVA and Fisher’s F-values were used to assess the fit of the generated RSM models, and model adequacy was determined by the coefficient of determination (R²) and lack-of-fit tests1,20. For each response, R² described the percentage contribution of the process variables (i.e., the proportion of variation around the mean explained by the model). For high-confidence prediction, a usable model requires a percentage contribution of at least 88% (R² > 0.88)35. Statistical significance was assessed using p-values: a high model p-value indicates that the model is not significant, whereas a low lack-of-fit p-value indicates a significant lack of fit and therefore an inadequate model36. The models for alcohol content, TSS, and pH all had model p-values reported as 0.00 (p < 0.01), indicating highly significant models. Polynomial equations together with 3D response surface plots were used to describe the mathematical solutions of the models. The polynomial equations for alcohol content, TSS, TTA, pH, and viscosity are given in Eqs. (2), (3), (4), (5), and (6), respectively. For better visualization, 3D response surface plots for alcohol content, TSS, TTA, pH, and viscosity, generated from the regression equations of the fitted models, are shown in Fig. 2a,b.

Figure 2

(a) 3D response surface plots demonstrating the effect of cooking time, fermentation temperature, and time on umqombothi samples: (A) Alcohol content, (B) TSS, (C) TTA. (b) A 3D response surface plot demonstrating the effect of cooking time, fermentation temperature, and time on umqombothi samples: (A) pH, (B) Viscosity.

The models for alcohol content (°P), TSS (g/100 g), and pH were significant, as indicated by high model F-values (F ≥ 10) and low p-values (p ≤ 0.05) (Table 4). For the alcohol content and TSS models, X3, X1X2, and X1² were significant model terms (p ≤ 0.05) (Table 4). The significant model terms for pH were X3, X1X3, and X3², with p-values of 0.00, 0.047, and 0.00, respectively (Table 4). The predicted R² (pred-R²) values for alcohol content and TSS were not as close to the adjusted R² (adj-R²) values, indicating a slight limitation of these models (Table 5). Considering outliers, model reduction, or response transformation may improve these empirical models37. In contrast, the pred-R² of 0.89 for the pH model was reasonably close to its adj-R² of 0.97, supporting the model’s ability to predict responses correctly (Table 5). Adequate precision values above 4 indicated an adequate signal-to-noise ratio, meaning that the models for alcohol content, TSS, and pH were suitable for navigating the design space, and all model parameters showed that these models were able to predict the responses correctly. The models for alcohol content and TSS had reproducibility above 90% (R² ≥ 0.90) and low coefficient of variation (C.V. %) values (Table 5), indicating good precision for the process under evaluation.
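Stat-Ease describes adequate precision as the range of predicted values at the design points divided by the average prediction error, with ratios above 4 taken as adequate. The sketch below is our reading of that definition: the average variance of the fitted values is p·σ̂²/n, since the leverages sum to p. Treat the formula as an assumption to be checked against the Design-Expert documentation; the data are hypothetical.

```python
import numpy as np


def adequate_precision(A, y):
    """Range of fitted values divided by the average standard error of prediction."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ beta
    resid = y - y_hat
    sigma2 = float(resid @ resid) / (n - p)  # residual mean square
    # Average of V(y_hat_i) = sigma2 * h_ii; the h_ii sum to p, so the mean is p * sigma2 / n
    mean_var = p * sigma2 / n
    return float((y_hat.max() - y_hat.min()) / np.sqrt(mean_var))


x = np.arange(10.0)
wiggle = np.array([0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.3, -0.1, 0.2])
A = np.column_stack([np.ones_like(x), x])
strong = adequate_precision(A, 1.0 + 2.0 * x + wiggle)  # clear trend
weak = adequate_precision(A, 5.0 + wiggle)               # noise only
```

A response with a clear trend gives a ratio far above 4, while pure noise gives a ratio well below it, mirroring the contrast between the pH model and the TTA model (2.99).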

Table 4 Analysis of variance (ANOVA) for the alcohol content, TSS, TTA, pH, and viscosity quadratic models.
Table 5 Fit statistics of the quadratic model for alcohol content, TSS, TTA, pH, and viscosity optimization.

The models for TTA and viscosity were not significant, as indicated by low model F-values (F ≤ 10) and high p-values (p > 0.05) (Table 4). For these responses, model reduction, consideration of outliers, and response transformation may not improve the models; the overall mean may be a better predictor of the designed responses than the current models, and a higher-order model may predict better in certain cases. None of the TTA model terms were significant, while X1 and X1² were significant model terms (p ≤ 0.05) for viscosity. The limitations of both models were reflected in the large differences between their predicted and adjusted R² values: the TTA model had a pred-R² of −3.34 and an adj-R² of −0.09, and the viscosity model had a pred-R² of −1.05 and an adj-R² of 0.49. A negative pred-R² implies that the overall mean may be a better predictor of the designed response than the current model38. An adequate precision value of 2.99 for the TTA model indicated an inadequate signal-to-noise ratio, meaning the model was not suitable for navigating the design space. The viscosity model had an adequate precision above 4, meaning it was suitable for navigating the design space. The low reproducibility of 42% for the TTA model (Table 5) was reflected in its low coefficient of determination (R² = 0.424). In contrast, the coefficient of determination for viscosity was 0.729, representing 73% reproducibility. Although this reproducibility can be considered adequate, a C.V. % value of 15.22 is relatively high (Table 5). From the experimental data, second-order polynomial equations showing the significance of linear, quadratic, and interactive terms in predicting the responses were generated, as shown in Eqs. (2), (3), (4), (5), and (6):

$${\text{Y}}_{1} = 8.4244 + 0.118123{\text{X}}_{1} + 0.00695478{\text{X}}_{2} { }{-}1.61535{\text{X}}_{3} { }{-}0.56{\text{X}}_{1} {\text{X}}_{2} + 0.0825{\text{X}}_{1} {\text{X}}_{3} + { }0.1325{\text{X}}_{2} {\text{X}}_{3} { }{-}0.518085{\text{X}}_{1}^{2} + 0.270339{\text{X}}_{2}^{2} { }{-}0.269005{\text{X}}_{3}^{2}$$
(2)
$${\text{Y}}_{2} = 7.80809 + 0.0916957{\text{X}}_{1} + 0.11852{\text{X}}_{2} { }{-}1.54563{\text{X}}_{3} { }{-}0.50375{\text{X}}_{1} {\text{X}}_{2} + 0.08875{\text{X}}_{1} {\text{X}}_{3} + 0.14625{\text{X}}_{2} {\text{X}}_{3} { }{-}0.4097{\text{X}}_{1}^{2} + 0.410543{\text{X}}_{2}^{2} { }{-}0.0940759{\text{X}}_{3}^{2}$$
(3)
$${\text{Y}}_{3} = 1.18401{ }{-}0.00795488{\text{X}}_{1} { }{-}0.0178066{\text{X}}_{2} { }{-}0.125284{\text{X}}_{3} + 0.025{\text{X}}_{1} {\text{X}}_{2} + 0.1125{\text{X}}_{1} {\text{X}}_{3} { }{-}0.0825{\text{X}}_{2} {\text{X}}_{3} { }{-}0.0697051{\text{X}}_{1}^{2} { }{-}0.0626341{\text{X}}_{2}^{2} { }{-}0.123537{\text{X}}_{3}^{2}$$
(4)
$${\text{Y}}_{4} = 2.90059{ }{-}0.0358794{\text{X}}_{1} { }{-}0.0343149{\text{X}}_{2} { }{-}0.36624{\text{X}}_{3} { }{-}0.02375{\text{X}}_{1} {\text{X}}_{2} + 0.05875{\text{X}}_{1} {\text{X}}_{3} { }{-}0.02125{\text{X}}_{2} {\text{X}}_{3} { }{-}0.0123214{\text{X}}_{1}^{2} { }{-}0.00348255{\text{X}}_{2}^{2} + 0.355683{\text{X}}_{3}^{2}$$
(5)
$${\text{Y}}_{5} = 15.2439{ }{-}2.61323{\text{X}}_{1} { }{-}0.184928{\text{X}}_{2} { }{-}0.880224{\text{X}}_{3} + 0.73{\text{X}}_{1} {\text{X}}_{2} + 0.27{\text{X}}_{1} {\text{X}}_{3} + 0.105{\text{X}}_{2} {\text{X}}_{3} + 1.8859{\text{X}}_{1}^{2} { }{-}0.000308033{\text{X}}_{2}^{2} { }{-}0.375946{\text{X}}_{3}^{2}$$
(6)

where Y1 = response for alcohol content (°P), Y2 = response for TSS (g/100 g), Y3 = response for TTA (% lactic acid), Y4 = response for pH, Y5 = response for viscosity (cm/min), X1 = Cooking time (hr), X2 = Fermentation temperature (°C), X3 = Fermentation time (hr).
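To make the fitted equations concrete, the sketch below evaluates Eq. (2) for alcohol content. It assumes the X variables are in coded units (−1 low, 0 center, +1 high), which is how Design-Expert commonly reports such equations; the text does not state this, and the magnitudes of the coefficients would give implausible values if actual hours and °C were substituted. The other equations can be evaluated the same way by swapping in their coefficients.

```python
# Coefficients of Eq. (2) for alcohol content (Y1), copied from the text.
EQ2_ALCOHOL = {
    "const": 8.4244,
    "X1": 0.118123, "X2": 0.00695478, "X3": -1.61535,
    "X1X2": -0.56, "X1X3": 0.0825, "X2X3": 0.1325,
    "X1^2": -0.518085, "X2^2": 0.270339, "X3^2": -0.269005,
}


def predict_quadratic(coef, x1, x2, x3):
    """Evaluate a three-factor quadratic model; inputs are ASSUMED to be coded levels."""
    terms = {
        "const": 1.0,
        "X1": x1, "X2": x2, "X3": x3,
        "X1X2": x1 * x2, "X1X3": x1 * x3, "X2X3": x2 * x3,
        "X1^2": x1 ** 2, "X2^2": x2 ** 2, "X3^2": x3 ** 2,
    }
    return sum(coef[name] * value for name, value in terms.items())


# Center point, then fermentation time at its low and high coded levels
for x3 in (0.0, -1.0, 1.0):
    print(x3, predict_quadratic(EQ2_ALCOHOL, 0.0, 0.0, x3))
```

With cooking time and temperature held at their center levels, the predicted value falls as coded fermentation time rises, in line with the lower alcohol content observed after longer fermentation.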

The effect of input factors on the physicochemical properties of the optimal beer brew

The independent variables, cooking time (hr, X1), fermentation temperature (°C, X2), and fermentation time (hr, X3), were optimized. The optimization goal for all independent variables was set to ‘target’, as dictated by the nature of the study. The responses, alcohol content (°P), TSS (g/100 g), TTA (% lactic acid), pH, and viscosity (cm/min), were considered for optimization. The software generated 100 optimization solutions, each with a desirability value of 1. To select a suitable solution, the predicted values of each solution were compared with those of the constructed ANN. Yeast survival and proliferation, under- and over-cooking, spoilage affecting shelf-life, and the real-life applicability of the conditions (study objectives) were also considered, and the solution that best satisfied these considerations was selected. A cooking time of 1.1 h, a fermentation temperature of 29.3 °C, and a fermentation time of 25.9 h were identified as the optimal bioprocessing conditions. The responses (alcohol content, TSS, TTA, pH, and viscosity) of the resulting brew were subsequently measured, and the results are provided in Table 6. The customary brew (CB) was prepared by cooking the mixed ingredients for 30 min and leaving the cooked slurry to ferment at 25 °C for 24 h, and was compared with the optimized brew (OPB).
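A desirability of 1 means every response sits at its target. The sketch below shows the standard two-sided Derringer-Suich desirability for a ‘target’ goal and the geometric-mean overall desirability. The linear shape (exponents s = t = 1) and the pH limits and target in the example are hypothetical; the study does not report its limits, weights, or importance settings.

```python
import numpy as np


def target_desirability(y, low, target, high, s=1.0, t=1.0):
    """Two-sided Derringer-Suich desirability for a 'target' goal (0 outside [low, high])."""
    if y == target:
        return 1.0
    if y < low or y > high:
        return 0.0
    if y < target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t


def overall_desirability(ds):
    """Geometric mean of the individual desirabilities (0 if any is 0)."""
    ds = np.asarray(ds, dtype=float)
    if np.any(ds <= 0.0):
        return 0.0
    return float(np.exp(np.mean(np.log(ds))))


# Hypothetical limits and target for pH, for illustration only
d_ph = target_desirability(3.05, low=2.8, target=3.3, high=4.6)
D = overall_desirability([d_ph, 1.0])
```

Because the geometric mean is zero whenever any single response is unacceptable, many distinct factor settings can share D = 1 when the targets are loose, which is consistent with the software returning 100 equally desirable solutions.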

Table 6 Physicochemical properties of umqombothi.

The OPB had a lower pH (3.27 ± 0.03) than the CB (4.23 ± 0.02) (Table 6). The OPB also had a higher alcohol content (13.63 ± 0.12 °P) and a higher TTA (0.68 ± 0.02% lactic acid). A cooking time of 60 min has been suggested as ideal for preparing high-quality umqombothi39. A cooking time of 1.1 h appeared neither to under- nor to over-gelatinize the starch and provided adequate nutrients for yeast cells24. In addition, the achieved gelatinization improved water absorption into the granules, thereby improving the viscosity40. This was reflected in the viscosity of the OPB, which was more desirable than that of the CB (Table 6). A fermentation temperature of 29.3 °C favored higher alcohol production in the OPB (Table 6). The higher TSS of the OPB (Table 6) reflected the type of sugar conversion and its dependence on temperature for a rich finished beer41. The slightly higher fermentation temperature and relatively short fermentation time of the OPB appeared to improve the overall physicochemical properties of umqombothi, and a fermentation time of 25.9 h was suitable for both the fermentation rate and the final beer.

ANN training, validation, and testing on experimental responses

Appropriate ANN construction involves selecting the network architecture, determining the number of hidden layers and the number of neurons in each layer, learning (training and validation), and verifying the data18. In building a better ANN model, the hidden layers between the inputs and the output must be appropriately sized, trained, and fitted18. To achieve this, the number of neurons in the hidden layer was varied (5, 10, and 20 neurons) (data not reported). To further study the responses of the model, three different training algorithms were evaluated. With 10 neurons in the hidden layer, all the algorithms rapidly generated solutions with high R and R² values (data not reported). However, when the number of neurons was increased to 20, the number of iterations increased for the BR algorithm, which therefore took longer to generate a solution. In contrast, the LM and SCG algorithms were not significantly affected by an increase or decrease in the number of neurons and continued to generate solutions rapidly. SCG uses a second-order approximation, resulting in fewer iterations and faster learning42. This may be because the algorithm uses a step-size scaling mechanism that avoids a time-consuming line search in each learning iteration43,44.

Adequate training, validation, testing, and overall prediction accuracy were observed when the LM algorithm was used (Table 7). The LM algorithm, which may be the fastest of the three training algorithms, works specifically with loss functions expressed as a sum of squared errors (SSE)45,46. However, LM cannot be applied to the cross-entropy error and root mean squared error functions46. For function approximation problems, the LM training algorithm has been reported to obtain a lower mean squared error (MSE) than the other algorithms, including regularization techniques. As a result, LM is the recommended choice, offering better speed and handling of the overfitting problem when there are a few thousand instances and a few hundred parameters for training the ANN46,47. In an unrelated study, the LM training algorithm showed the highest accuracy among the training algorithms compared in an MLP model that forecast the distribution of chemical elements in topsoil45.
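The reason LM requires an SSE loss is visible in its update step, which solves (JᵀJ + μI)δ = −Jᵀr using the Jacobian J of the residual vector r. The sketch below is a generic LM loop on a toy exponential curve, not MATLAB's trainlm and not an ANN; the damping schedule (start at 1e-3, ×0.1 after an accepted step, ×10 after a rejected one) is an illustrative choice.

```python
import numpy as np


def levenberg_marquardt(residual_fn, jacobian_fn, theta0, mu=1e-3, max_iter=200):
    """Minimize the sum of squared residuals with an adaptive damping factor mu."""
    theta = np.asarray(theta0, dtype=float)
    r = residual_fn(theta)
    sse = float(r @ r)
    for _ in range(max_iter):
        J = jacobian_fn(theta)
        # Damped Gauss-Newton system: (J^T J + mu I) delta = -J^T r
        delta = np.linalg.solve(J.T @ J + mu * np.eye(theta.size), -J.T @ r)
        r_trial = residual_fn(theta + delta)
        sse_trial = float(r_trial @ r_trial)
        if sse_trial < sse:
            # Step accepted: behave more like Gauss-Newton
            theta, r, sse = theta + delta, r_trial, sse_trial
            mu *= 0.1
        else:
            # Step rejected: behave more like small gradient-descent steps
            mu *= 10.0
        if sse < 1e-20:
            break
    return theta, sse


# Toy problem (not the umqombothi data): fit y = a * exp(b * x)
x = np.linspace(0.0, 2.0, 10)
y = 2.0 * np.exp(0.5 * x)


def residuals(theta):
    a, b = theta
    return a * np.exp(b * x) - y


def jacobian(theta):
    a, b = theta
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])


theta_hat, sse_hat = levenberg_marquardt(residuals, jacobian, [1.5, 0.3])
```

In a neural network, θ would be the weights and biases, and J would be obtained by backpropagation, but the accept/reject logic is the same.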

Table 7 Training, validation, and testing performance indices.

ANN training with the LM algorithm stopped automatically when generalization stopped improving, as indicated by an increase in the MSE of the validation samples. The MSE is the most commonly used and simplest error function for measuring the performance of an ANN48,49. It measures the ability of the model to predict responses accurately, with a lower MSE indicating better modeling ability18. Together, R² and MSE were used to evaluate the overall accuracy of the model18. The correlation coefficient (R) was used to measure the correlation between network outputs and targets: R = 1 indicates a close relationship and R = 0 a random relationship. The ANN models for alcohol content, TSS, TTA, and viscosity had overall R² values of 0.96, 0.96, 0.81, and 0.92, respectively (Table 7). These values were close to 1, suggesting high reliability in model prediction. The overall R² value for pH was 0.50, representing 50% reproducibility. Overall, a high correlation between outputs and targets was observed for alcohol content (R = 0.98), TSS (0.98), TTA (0.90), and viscosity (0.96) (Fig. 3).
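The reported pairs (R = 0.98 with R² = 0.96, R = 0.90 with R² = 0.81, R = 0.96 with R² = 0.92) are consistent with R² being the square of the output-target correlation. The sketch below computes MSE, R, and R² on that basis for hypothetical targets and outputs; note that R² defined this way is not, in general, the same as 1 − SS_res/SS_tot.

```python
import numpy as np


def performance_indices(targets, outputs):
    """MSE, Pearson R between outputs and targets, and R^2 taken as R squared."""
    t = np.asarray(targets, dtype=float)
    o = np.asarray(outputs, dtype=float)
    mse = float(np.mean((t - o) ** 2))
    r = float(np.corrcoef(t, o)[0, 1])
    return {"MSE": mse, "R": r, "R2": r ** 2}


# Hypothetical targets and network outputs
perf = performance_indices([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```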

Figure 3

Response plots of the ANN: (A) Alcohol content (°P), (B) TSS (g/100 g), (C) TTA (% lactic acid), (D) pH, (E) viscosity (cm/min).

Apart from MSE values, the ANN was further assessed using performance curves, which display the network’s incremental training process and the direction in which it learns. These curves plot the training record error values against the number of training epochs; in other words, a learning curve describes a model’s learning performance over time or experience50. Performance curves are useful for diagnosing learning problems such as unrepresentative training datasets, underfitting models, unrepresentative validation datasets, and overfitting models50. The best validation performance curves of the ANN for each response are shown in Fig. 4. The ANN achieved its best learning and lowest error after a few iterations (epochs). The best validation performance for each network was taken from the epoch with the lowest validation error. Alcohol content and TTA required the fewest epochs to reach their best validation performance, whereas TSS reached its best validation performance at epoch 5. With further training, the error generally continues to decrease on the training dataset but may start to increase on the validation dataset as the network overfits the training data51. All the networks showed a good learning rate during training and a high learning rate during validation and testing52. In addition, the training and validation curves showed a good fit, with training and validation MSE (loss) values decreasing to a point of stability and small gaps between the two final values50. Error scores closer to 0 indicate better overall learning, showing that the training dataset was learned thoroughly with minimal mistakes50.
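Selecting the best validation epoch and stopping once validation error keeps rising can be sketched as below. The patience of six consecutive non-improving epochs matches what we understand to be MATLAB's default maximum number of validation failures, but it is an assumption here, and the error history is hypothetical.

```python
def early_stopping(val_errors, max_fail=6):
    """Return (best_epoch, stop_epoch) given validation errors indexed by epoch."""
    if not val_errors:
        raise ValueError("need at least one validation error")
    best_epoch, best = 0, val_errors[0]
    fails = 0
    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] < best:
            # New best validation error: remember it and reset the failure count
            best_epoch, best, fails = epoch, val_errors[epoch], 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1


history = [1.0, 0.5, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.7]
print(early_stopping(history))  # (2, 8)
```

The weights reported for the network are those from the best epoch, not from the epoch at which training stopped.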

Figure 4

Validation performance plot of the ANN: (A) Alcohol content (°P), (B) TSS (g/100 g), (C) TTA (% lactic acid), (D) pH, (E) viscosity (cm/min).

Comparison between the RSM and ANN responses

The prediction accuracy of the RSM optimization model was assessed by comparing it with that of the ANN, which was also used to validate the entire process. Table 8 shows the error comparison for the RSM and ANN predictions. The comparative error analysis was used to verify the prediction accuracy and generalization capacity of both models in optimizing the bioprocess53,54. Overall, the ANN model showed lower error values than the RSM, indicating smaller computational deviations and better generalization capability11,54. As a result, the ANN displayed higher prediction accuracy and better model fitting18. Nevertheless, the RSM predictions can also be accepted with confidence, since they were close to both the experimental values and the ANN predictions18,55. The results in Table 8 show a close correlation between the experimental values and the values predicted by RSM and ANN. Both the RSM and ANN models produced a relatively high number of inaccurate predictions for viscosity.
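The text does not specify which error measures Table 8 uses. As an assumption, the sketch below computes three common ones (root mean squared error, mean absolute error, and average absolute deviation in percent) for a pair of hypothetical RSM and ANN prediction sets; the numbers are not the study's data, and AAD % is undefined if any experimental value is zero.

```python
import numpy as np


def error_metrics(experimental, predicted):
    """RMSE, MAE, and average absolute deviation (AAD, %) of predictions."""
    e = np.asarray(experimental, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = e - p
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "AAD_percent": float(100.0 * np.mean(np.abs(err / e))),
    }


# Hypothetical values, not Table 8
experimental = [2.0, 4.0]
m_rsm = error_metrics(experimental, [2.2, 3.6])
m_ann = error_metrics(experimental, [2.1, 3.9])
```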

Table 8 RSM and ANN prediction values.

The differences between predicted and experimental values directly determined the extent of deviation in the predictive capacity of each model. While RSM is recommended for modeling new processes, its sensitivity may be limited55. Despite this limitation, RSM clearly shows the effects of individual factors and their interactions on a specific system11; for example, a larger coefficient value in the ANOVA indicates a greater effect on a specific parameter57. On the other hand, ANN requires more input data than RSM to achieve better predictions55, and it cannot provide such insights into the system directly, since it is a ‘black box’56. Nonetheless, ANN can universally describe high-level interactions in non-linear systems without prior specification of a suitable fitting function55,57. Additionally, ANN can calculate multiple responses in a single process53. As shown by the close agreement between the experimental and predicted values, RSM and ANN are adequate for developing a bioprocess that optimally produces umqombothi. Advanced soft computing approaches such as ANN may be preferred for data sets with a limited number of observations, which regression models fail to capture reliably18. This agreement also suggests that the non-linear fitting effects of the model are good, supporting the use of the proposed procedure18,57. A coupled modeling approach can thus be applied to bioprocess method development for umqombothi and related variations.

Conclusion

Both RSM and ANN were effective bioprocess development tools that facilitated the optimization of umqombothi. The effectiveness of RSM was shown by R² values close to 1: the R² values were 0.94, 0.93, 0.99, and 0.73 for alcohol content, TSS, pH, and viscosity, respectively, showing reliability and reproducibility above 70%. Similarly, ANN displayed a high degree of accuracy, with R² values of 0.96, 0.96, 0.81, and 0.92 for the alcohol content, TSS, TTA, and viscosity models, respectively. As a result, the good correlation between the experimental and predicted values suggests that a coupled approach may positively impact the bioprocess and the final product. However, further investigation of other key parameters (e.g., starter culture, the content and ratio of raw materials, souring time and temperature, and cooking temperature) is still required. Furthermore, the use of an additional tool such as a genetic algorithm may resolve computational and modeling limitations.