1 Introduction

Despite the advantages associated with microalgae-based biofuels, their commercialization remains a challenge due to the high energy input required in certain key processing steps, such as microalgae biomass harvesting. Low biomass concentration in growth media and the small size of microalgae, coupled with their negative surface charge, are some of the features that make microalgae harvesting difficult and energy intensive. Centrifugation can also be a costly step [1, 2]. Therefore, the economic feasibility of microalgae biomass production is limited to high-value products such as dietary supplements [3]. The quest for sustainable and effective microalgae harvesting for products such as biofuels is critical in order to make microalgae an extensively used resource [1].

Flocculation is considered the most convenient microalgae biomass recovery technique [4]. Traditionally, flocculation is induced by the addition of chemical flocculants such as metallic hydroxides and synthetic polymers, which aggregate and settle the algal cells, thus concentrating the microalgae suspension for easy separation [5]. Microalgae flocculation methods that have been evaluated include chemical, electric field, and autoflocculation [6]. Autoflocculation, otherwise known as alkaline flocculation, can be induced by an increase in pH either via inorganic carbon consumption by photosynthetic microalgae or by the addition of alkali compounds [7]. Wu et al. [8] successfully flocculated different freshwater and marine microalgae species by increasing the pH of the microalgae suspensions using sodium hydroxide. The authors reported more than 90% flocculation efficiency for all species and recommended possible reuse of the culture medium after neutralizing the pH. The application of commercial metallic hydroxides to induce flocculation is well documented, but there are no reports on the use of inexpensive and sustainable ash-derived alkalis from solid agricultural wastes such as cocoa pod husk. Although the ash derived from these bio-resources has been employed in the transesterification of neem seed oil [9, 10], it has yet to be applied to the recovery of microalgae from suspension. Hence, this work investigates the potential of cocoa pod ash-derived alkalis (CPADA) to induce flocculation.

The modeling of microalgae flocculation and the optimization of its parameters via design of experiment (DoE) have scarcely been reported. Modeling and optimization of the microalgae flocculation process are necessary to save time in microalgae harvesting, cut costs and, above all, improve harvesting efficiency. In microalgae harvesting, there are few reports on the application of statistical modeling and optimization techniques such as response surface methodology (RSM) [11, 12], while no reports were found on the use of non-statistical models based on artificial intelligence, such as artificial neural networks (ANNs) and genetic algorithms (GA). RSM is a statistical and mathematical tool that is suitable for developing, improving and optimizing processes in which the output of interest is influenced by several parameters [13]. It is an empirical modeling technique in which one or more dependent variables are related to independent variables, and its uniqueness lies in the fact that statistical indicators on individual model terms and their interactions are calculated [14]. ANNs are described as a computational technique that mimics the way the human brain processes information [14, 15]. ANNs have found increasing application in various fields of research, such as engineering, agriculture, energy, vehicle fuel consumption [16] and oil and gas [17]. This is owing to their ease of use and their more accurate prediction of complex and non-linear systems with large inputs [15, 18]. The predictive abilities of both RSM and ANNs have been evaluated [14, 19–23], with the latter giving better predictions. GA is an optimization solver based on natural selection and biological evolution mechanisms such as mutation, selection, inheritance and crossover [24].
Recently, the application of GA in areas such as bioenergy engineering, solar systems [25], computational science, chemistry, economics, mathematics, bioinformatics [24], composite materials [26, 27], and thermal engineering [26, 28] has been gaining more attention.

The present study aimed to evaluate the microalgae-flocculating potential of a bio-based alkali derived from cocoa pod husk waste. The D-optimal design of the DoE was employed to study the influence of biomass concentration and flocculant dose, together with their interaction, on the flocculation of microalgae from culture broth. The modeling and predictive capabilities of both RSM and ANN were assessed. A multi-objective optimization of the flocculation process variables, biomass concentration and flocculant dose, was carried out using RSM and RSM-GA (RSM coupled with GA), with a view to examining the trade-offs between the objectives.

2 Materials and methods

2.1 Materials

The waste cocoa pod husks employed for this work were obtained from a cocoa plantation in Oyo State, Nigeria. A Thermolyne benchtop muffle furnace was used for calcining the cocoa pod husk. All chemical reagents used were of analytical grade and were purchased from Fisher Scientific International, Inc. (Hampton, New Hampshire, USA) and VWR International Inc. (Radnor, Pennsylvania, USA). The microalgae species (Nannochloropsis oculata and Dunaliella salina) used were purchased from the UTEX Culture Collection of Algae at The University of Texas at Austin, USA.

2.2 Culturing of microalgae

N. oculata and D. salina were cultured in Artificial Seawater and 2X Erdschreiber’s media, respectively. These were prepared using deionized water and synthetic seawater. Photobioreactors of 1.5 l, each provided with a magnetic stirrer, were used for culturing both species for 21 days at room temperature. Air was sparged at 0.02 vvm with 2% CO2, and light was supplied using daylight fluorescent tubes at 3500 lx with a 12:12 h light/dark cycle. The microalgae biomass concentrations (dry weight) were determined gravimetrically: 20 ml of microalgae broth was vacuum filtered using a pre-weighed membrane filter with a pore size of 0.2 µm and a diameter of 47 mm. The filter was then rinsed with deionized water, and the filtered biomass sample was dried at 80 °C in an oven overnight. Thereafter, it was cooled in a desiccator and weighed to calculate the dry weight in 20 ml of broth. This determination was performed in duplicate. The flocculation experiment was conducted at 21 days of culturing, with microalgae biomass concentrations of 816 and 825 mg/l for N. oculata and D. salina, respectively.
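The gravimetric calculation described above can be sketched as follows. This is a minimal illustration only: the function name and the filter masses are hypothetical and are not measurements from this study.

```python
def dry_weight_mg_per_l(filter_mass_mg, filter_plus_biomass_mg, sample_volume_ml):
    """Biomass dry weight (mg/l) from a filtered, oven-dried broth sample."""
    if sample_volume_ml <= 0:
        raise ValueError("sample volume must be positive")
    biomass_mg = filter_plus_biomass_mg - filter_mass_mg
    # Convert the sample volume from ml to l before dividing.
    return biomass_mg / (sample_volume_ml / 1000.0)


# Hypothetical duplicate weighings for a 20 ml sample (illustrative values only).
replicates = [
    dry_weight_mg_per_l(85.00, 101.30, 20.0),
    dry_weight_mg_per_l(85.10, 101.44, 20.0),
]
mean_dry_weight = sum(replicates) / len(replicates)
print(f"mean dry weight: {mean_dry_weight:.0f} mg/l")
```

Averaging the duplicate determinations mirrors the duplicate measurement described in the text.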

2.3 Preparation of CPADA

The cocoa pod husks were washed several times using deionized water, cut into smaller pieces and sun dried to constant weight. The particle size of the dried cocoa pod husks was further reduced by grinding, and the ground material was put in a crucible and calcined in a muffle furnace at 540 °C for 5 h. Three different concentrations of ash extract were prepared by adding 9, 12 and 15 g of the calcined sample to 120 ml of deionized water in conical flasks. The mixtures were covered to prevent contamination and allowed to sit for 48 h for maximum extraction. Thereafter, the mixtures were filtered with Whatman filter papers to remove the remaining solids. The pH of the filtrates was measured using an Orion Star A211 pH meter, and the filtrate with the highest pH was chosen for the flocculation experiment. The filtrate and the cocoa pod husk ash were also characterized as described below.

2.4 Characterization of CPADA and the ash

The ash and the filtrate, CPADA, obtained after 48 h extraction with deionized water, were analyzed for their elemental composition using scanning electron microscope-energy-dispersive X-ray spectroscopy (SEM–EDS) and inductively coupled plasma mass spectrometry (ICP-MS), respectively. The SEM–EDS values reported were obtained when the cocoa pod husk ash sample was bombarded with 20 kV electrons from a JEOL JSM 7500F field emission scanning electron microscope (equipped with a high brightness conical field emission, FE, gun and a low aberration conical objective lens) at the Texas A&M University Materials Characterization Facility. The characteristic X-rays were detected with an Oxford EDS system equipped with X-ray mapping and digital imaging. The elemental composition reported is the average of two replicate measurements obtained at different sites on the sample microstructure imaged by the SEM.

The metallic content of the ash filtrate, CPADA, was identified and quantified (concentration, ppm) by ICP-MS, using an instrument equipped with an SPS-3 autosampler, a micro-concentric nebulizer, nickel cones and a peristaltic sample delivery pump. The sample was analyzed in duplicate using helium kinetic energy discrimination (KED) mode and argon standard mode.

2.5 Physicochemical properties of CPADA extract

The pH of the filtrates was measured using an Orion Star A211 pH meter, and the filtrate with the highest pH was chosen for the flocculation experiment. The conductivity of the filtrate sample was measured using an Oakton ecoTest conductivity meter (EC1 0 µS to 20 mS), and the total dissolved solids were subsequently estimated from the conductivity value obtained. The concentration of the CPADA extract was estimated by titrating 10 ml of the extract against 1 N hydrochloric acid, on the assumption that potassium hydroxide is the dominant alkali, based on the EDS and ICP-MS results.
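The titration estimate can be illustrated with the usual equivalence relation N1V1 = N2V2. The sketch below assumes a 1:1 neutralization of potassium hydroxide by hydrochloric acid, as stated above; the titre volume is a hypothetical placeholder, since the measured titre is not given here.

```python
KOH_MOLAR_MASS_G_PER_MOL = 56.11  # K (39.10) + O (16.00) + H (1.01)


def alkali_normality(acid_normality, acid_volume_ml, extract_volume_ml):
    """Normality of the extract from N_acid * V_acid = N_extract * V_extract."""
    if extract_volume_ml <= 0:
        raise ValueError("extract volume must be positive")
    return acid_normality * acid_volume_ml / extract_volume_ml


titre_ml = 9.0  # hypothetical volume of 1 N HCl used; not a value from the study
normality = alkali_normality(1.0, titre_ml, 10.0)

# For a monobasic alkali such as KOH, normality equals molarity.
koh_g_per_l = normality * KOH_MOLAR_MASS_G_PER_MOL
print(f"{normality:.2f} N, about {koh_g_per_l:.1f} g/l as KOH")
```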

2.6 Experimental design for flocculation and RSM modeling

A three-level, two-factor D-optimal design was employed in modeling the flocculation of microalgae, and sixteen experimental conditions were generated for each of the two species of microalgae. The Design-Expert 8.0.3 Trial software (Stat-Ease Inc., Minneapolis, MN) was used for RSM modeling and optimization. The independent variables considered were biomass concentration (X1) and flocculant dose (X2), while the flocculation efficiency (Y1) and concentration factor (Y2) were the dependent variables. Table 1 shows the levels of the process parameters studied in this work. In order to relate the response variables to the two independent variables, multiple regression was used to fit the coefficients of a second-order mathematical model of the responses. The goodness of fit of the model was evaluated using a test of significance and an analysis of variance (ANOVA). Equation 1 presents the general form of the fitted second-order model.

$${\text{Y}} =\upalpha_{0} + \mathop \sum \limits_{{{\text{i}} = 1}}^{\text{n}}\upalpha_{\text{i}}\, {\text{X}}_{\text{i}} + \mathop \sum \limits_{{{\text{i}} = 1}}^{\text{n}}\upalpha_{\text{ii}}\, {\text{X}}_{\text{i}}^{2} + \mathop \sum \limits_{{{\text{i}} < j}}^{\text{n}}\upalpha_{\text{ij}}\, {\text{X}}_{\text{i}}\, {\text{X}}_{\text{j}} +\upvarepsilon$$
(1)

Y is the predicted response (flocculation efficiency or concentration factor), \(\alpha_{0}\) is the intercept, and \(\alpha_{i}\), \(\alpha_{ii}\) and \(\alpha_{ij}\) (i, j = 1, 2, … , n) are the linear, second-order and interaction coefficients, respectively. \(\varepsilon\) is the error term of the model.
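To make the structure of Eq. 1 concrete for n = 2, the following sketch fits the six coefficients by ordinary least squares through the normal equations. It is an illustration in plain Python, not the Design-Expert procedure used in this study; the factor levels and coefficients are synthetic, and all function names are hypothetical.

```python
def quadratic_terms(x1, x2):
    """Regressors of Eq. 1 for two factors: 1, X1, X2, X1*X2, X1^2, X2^2."""
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(points, responses):
    """Least-squares coefficients from the normal equations (X'X) a = X'y."""
    rows = [quadratic_terms(x1, x2) for x1, x2 in points]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic check: hypothetical three-level grid and noise-free responses
# generated from known coefficients, which the fit should recover.
true_coeffs = [2.0, 1.5, -0.5, 0.8, -1.2, 0.3]
grid = [(x1, x2) for x1 in (0.5, 1.0, 1.5) for x2 in (0.5, 1.0, 1.5)]
y = [sum(a * t for a, t in zip(true_coeffs, quadratic_terms(x1, x2)))
     for x1, x2 in grid]
fitted = fit_quadratic(grid, y)
print([round(c, 6) for c in fitted])
```

The term order (intercept, linear, interaction, squared) is chosen to match the way the fitted equations are written later in the paper.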

Table 1 The D-optimal design ranges and level of independent parameters studied with (a) D. salina and (b) N. oculata

2.7 ANN model development

The neural network toolbox of MATLAB 2018b was employed for ANN modeling. A feedforward, back-propagation multilayer perceptron neural network was used to model the microalgae flocculation process. The ANN architecture employed in the training consisted of an input layer with two neurons (biomass concentration and flocculant dose), a hidden layer with the default setting of ten neurons, and an output layer with a single neuron (flocculation efficiency or concentration factor) (Fig. 1). The transfer functions used for the hidden and output layers were hyperbolic tangent sigmoid and pure linear, respectively. There were two data sets (flocculation experimental results for D. salina and N. oculata) in this study, and each data set had two response variables. The ANN training was done separately for each response variable using the same independent variables. In each ANN training, the data set was divided using the default setting of 70%, 15% and 15% for the training, validation and testing sets, respectively. This was done with the aim of evaluating the ability of the model to predict unseen data that were not employed for the training, thus assessing the generalization of the ANN models [29]. Training was repeated until the lowest value of the mean square error was obtained and the correlation coefficient was close to 1.

Fig. 1 Artificial neural network architecture topology
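The 2-10-1 topology and the 70/15/15 data division can be sketched in plain Python as below. This is not the MATLAB toolbox implementation: the weights are random and untrained, the hyperbolic tangent is assumed to act on the hidden layer, and the rounding rule used for the split is an assumption that may differ from the toolbox default.

```python
import math
import random


def init_network(n_inputs=2, n_hidden=10, seed=0):
    """Random weights and biases for a 2-10-1 network (untrained, illustrative)."""
    rng = random.Random(seed)
    hidden = [([rng.uniform(-1, 1) for _ in range(n_inputs)], rng.uniform(-1, 1))
              for _ in range(n_hidden)]
    output = ([rng.uniform(-1, 1) for _ in range(n_hidden)], rng.uniform(-1, 1))
    return hidden, output


def forward(network, inputs):
    """Tanh hidden layer followed by a single linear output neuron."""
    hidden, (out_weights, out_bias) = network
    activations = [math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
                   for weights, bias in hidden]
    return sum(w * a for w, a in zip(out_weights, activations)) + out_bias


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and split them into training/validation/test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n_samples)
    n_val = round(val * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


network = init_network()
prediction = forward(network, [0.8, 0.5])  # hypothetical input pair
train_idx, val_idx, test_idx = split_indices(16)  # sixteen runs per species
print(len(train_idx), len(val_idx), len(test_idx))
```

With sixteen runs, the validation and test subsets contain only a few points each, which is worth keeping in mind when interpreting correlation coefficients computed on those subsets.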

2.8 Multi-objective optimization by RSM and GA

The optimization of the two parameters investigated was carried out using the developed models. Two approaches were employed to optimize the response variables: the optimization algorithm of the RSM, and the RSM-developed quadratic model equations coupled with a genetic algorithm (GA). In response surface optimization, the aim is to locate a desirable point within the design space, which can be either a maximum or a minimum [11]. To maximize the two responses (flocculation efficiency and concentration factor) by RSM, the independent variables were constrained to the ranges of values investigated, while the goal for each response was set to maximum.

In the GA optimization, the multi-objective optimization technique was employed. Generally, in multi-objective optimization, the objectives are conflicting, thus preventing simultaneous optimization of the individual objectives [30]. Many real-life problems are multi-objective in nature; in solving such problems, a single best solution with respect to all objectives may not exist [27, 31]. Instead, there may exist a set of best solutions (Pareto-optimal solutions), which are superior to the rest of the solutions in the search space when all objectives are taken into consideration, but inferior to other solutions in the space in one or more objectives [31]. Usually, decision makers are interested in knowing the alternative solutions that exist for multi-objective problems, with a view to examining the trade-offs. In view of this, the multi-objective optimization of the GA was also employed in the flocculation process parameter optimization study. The developed RSM model equations were used as fitness functions for the GA optimization. A trial and error method was employed in selecting the population type, creation function, crossover fraction, mutation function, crossover function and migration direction. The final GA parameter settings used for the optimization were population size (50), population type (double vector), creation function (uniform), crossover fraction (1.0), mutation function (uniform) and crossover function (scattered), with the others left at their default settings. The predictions from both optimization methods (RSM and RSM-GA) were validated by flocculation experiments performed in duplicate, and the average values obtained were compared with the predicted values. Design-Expert version 8.0.3 (Stat-Ease Inc., Minneapolis, MN, USA) was employed in the study of flocculation of microalgae by RSM, and the GA optimization was done with the GA toolkit in MATLAB 2018b (Mathworks Inc., Natick, MA, USA).
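The notion of a Pareto-optimal set can be illustrated with a simple dominance filter for two objectives that are both maximized. This sketch shows only the selection of non-dominated points; it is not the genetic search performed in MATLAB, and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (both objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]


# Hypothetical (flocculation efficiency in %, concentration factor) pairs.
candidates = [(95.0, 7.0), (93.0, 12.0), (90.0, 20.0), (92.0, 10.0), (88.0, 18.0)]
front = pareto_front(candidates)
print(front)
```

In this toy set, (92.0, 10.0) is dominated by (93.0, 12.0) and (88.0, 18.0) by (90.0, 20.0); the three remaining points trade efficiency against concentration factor and therefore all belong to the front.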

2.9 Model evaluation

The models used for the prediction of the flocculation process were evaluated using statistical indicators such as the correlation coefficient (R), coefficient of determination (R²), and mean square error (MSE). These indices were calculated using Eqs. 2–4 [22]:

$$R = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {q_{p,i} - q_{p,ave} } \right) \times \left( {q_{a,i} - q_{a,ave} } \right)}}{{\sqrt {\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {q_{p,i} - q_{p,ave} } \right)^{2} } \right]\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {q_{a,i} - q_{a,ave} } \right)^{2} } \right]} }}$$
(2)
$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {q_{a,i} - q_{p,i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {q_{p,i} - q_{a,ave} } \right)^{2} }}$$
(3)
$$MSE = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {q_{a,i} - q_{p,i} } \right)^{2} }}{n}$$
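(4)

where \(q_{a,i}\) and \(q_{p,i}\) are the actual (experimental) and predicted values, respectively, \(q_{a,ave}\) and \(q_{p,ave}\) are their averages, and n is the number of data points.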
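A minimal sketch of the three indicators is given below. It uses the conventional textbook definitions (Pearson correlation, one minus the ratio of residual to total sum of squares about the mean of the actual values, and the mean squared residual), which may differ in detail from the exact forms in the cited source. The data are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    a_ave, p_ave = mean(actual), mean(predicted)
    numerator = sum((p - p_ave) * (a - a_ave) for a, p in zip(actual, predicted))
    denominator = math.sqrt(sum((p - p_ave) ** 2 for p in predicted)
                            * sum((a - a_ave) ** 2 for a in actual))
    return numerator / denominator


def coefficient_of_determination(actual, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of the actual values."""
    a_ave = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - a_ave) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_square_error(actual, predicted):
    """Average of the squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical experimental and predicted flocculation efficiencies (%).
actual = [90.0, 92.0, 94.0, 88.0]
predicted = [89.0, 93.0, 94.0, 88.0]

r = correlation_coefficient(actual, predicted)
r2 = coefficient_of_determination(actual, predicted)
mse = mean_square_error(actual, predicted)
print(f"R = {r:.4f}, R2 = {r2:.2f}, MSE = {mse:.2f}")
```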

2.10 Flocculation experiment

The effects of flocculation parameters such as biomass concentration and CPADA dosage were investigated by analyzing the biomass recovery and concentration factor. About 20 ml of microalgae suspension was placed in a 50 ml beaker, and the prepared CPADA was added according to the dosage stated in the experimental design. The mixture was stirred with a magnetic stirrer at 80 rpm for 2 min, followed by gentle mixing at 10 rpm for 20 min. Thereafter, the mixture was poured into a graduated cylinder. The suspension was allowed to settle for 30 min, and a sample was taken from the middle of the clarified zone. The optical density of the samples was measured with a Unico S1200 Visible spectrophotometer. The initial height of the microalgae suspension and the settled floc height were also measured using a meter rule. These procedures were performed in duplicate for N. oculata and D. salina. The biomass recovery and concentration factor were calculated as follows:

$${\text{Biomass recovery}},\,\upeta = \frac{{{\text{OD}}_{{650{\text{i}}}} - {\text{OD}}_{{650{\text{f}}}} }}{{{\text{OD}}_{{650{\text{i}}}} }}$$
(5)
$${\text{Concentration factor}} = \frac{{{\text{H}}_{0} }}{{{\text{H}}_{\text{f}} }}$$
(6)

The variable \({\text{OD}}_{{650{\text{i}}}}\) represents optical density of microalgae suspension measured at 650 nm before the addition of CPADA and \({\text{OD}}_{{650{\text{f}}}}\) represents optical density measured at 650 nm after the addition of CPADA and settling. The \({\text{H}}_{0}\) and \({\text{H}}_{\text{f}}\) are the initial microalgae suspension and the settled floc heights respectively.
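Equations 5 and 6 translate directly into code. The sketch below uses hypothetical optical density and height readings; the function names are illustrative only.

```python
def biomass_recovery(od_initial, od_final):
    """Eq. 5: fraction of biomass removed from suspension (dimensionless)."""
    if od_initial <= 0:
        raise ValueError("initial optical density must be positive")
    return (od_initial - od_final) / od_initial


def concentration_factor(initial_height, floc_height):
    """Eq. 6: ratio of the initial suspension height to the settled floc height."""
    if floc_height <= 0:
        raise ValueError("floc height must be positive")
    return initial_height / floc_height


# Hypothetical readings: OD at 650 nm before and after settling, heights in cm.
recovery = biomass_recovery(0.80, 0.04)
cf = concentration_factor(12.0, 0.5)
print(f"recovery = {100 * recovery:.0f}%, concentration factor = {cf:.0f}")
```

Eq. 5 yields a fraction; multiplying by 100 gives the percentage form in which efficiencies are quoted in the results.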

3 Results and discussion

3.1 Characterization of CPADA and the ash

The EDS results in Table 2 indicate that carbon, oxygen, chlorine, potassium, phosphorus and silicon are the dominant elements in the ash, with potassium as the principal metallic element. The physicochemical properties of the bio-based alkali, CPADA, are shown in Table 3. The relatively strong signal intensities for potassium, oxygen, carbon and chlorine indicate higher concentrations of these elements in the ash sample (Fig. 2a). Although it is difficult to infer more quantitative data from this spectral analysis, the spectra are consistent with the chemical analysis of the ash sample in Table 2. Figure 2b shows the morphology of the ash sample in an SEM image, which reveals a fibrous, sponge-like microstructure. This indicates an increased surface area of the ash, which favors extraction of the metallic components. Moreover, the ICP-MS results in Table 2 indicate that the metallic elements present in the filtrate obtained from the ash sample are potassium, calcium, sodium, manganese and magnesium. Potassium has the highest concentration, approximately 51,489 ppm, followed by calcium and sodium with about 1450 and 210 ppm, respectively. The ICP-MS results are therefore consistent with those of the SEM–EDS. Hence, it is appropriate to submit that K was the key active element responsible for the alkaline nature of the bio-based alkali, CPADA. Previous studies have also reported that K has the highest concentration among the metallic elements present in cocoa pod husk ash samples and their respective filtrates. Betiku et al. [9] reported that calcined cocoa pod husk contains 59% K, while Taiwo and Osinowo [32] documented the composition of K in the form of K2CO3 in cocoa pod husk ash to be about 57%.

Table 2 Elemental composition and concentration of bio-based alkali, CPADA and ash
Table 3 Physicochemical properties of CPADA
Fig. 2 Elemental spectra by SEM–EDS (a) and SEM image (b) of the cocoa pod ash

3.2 Mathematical regression model development by RSM

The results of the flocculation experiments for D. salina and N. oculata using the produced bio-alkali, CPADA, are presented in Tables 4 and 5. The tables also show the experimental values and the values predicted by the RSM and ANN techniques. Modeling of the microalgae flocculation process gave mathematical equations that relate the responses (flocculation efficiency and concentration factor) to the independent variables (biomass concentration and flocculant dose) in terms of the actual values, as described in Eqs. 7–10.

D. salina:

$${\text{FE}} = 89.74 + 12.76{\text{X}}_{1} - 7.10{\text{X}}_{2} + 4.20{\text{X}}_{1} {\text{X}}_{2} - 6.25{\text{X}}_{1}^{2} + 0.39{\text{X}}_{2}^{2}$$
(7)
$${\text{CF}} = 56.99 - 37.67{\text{X}}_{1} - 13.02{\text{X}}_{2} + 7.87{\text{X}}_{1} {\text{X}}_{2} + 6.62{\text{X}}_{1}^{2} + 0.38{\text{X}}_{2}^{2}$$
(8)

N. oculata:

$${\text{FE}} = 19.39 + 139.33{\text{X}}_{1} + 54.06{\text{X}}_{2} - 10.61{\text{X}}_{1} {\text{X}}_{2} - 84.63{\text{X}}_{1}^{2} - 16.06{\text{X}}_{2}^{2}$$
(9)
$${\text{CF}} = 45.44 - 32.65{\text{X}}_{1} - 9.51{\text{X}}_{2} + 2.35{\text{X}}_{1} {\text{X}}_{2} + 9.06{\text{X}}_{1}^{2} + 1.2{\text{X}}_{2}^{2}$$
(10)

\({\text{X}}_{1}\) and \({\text{X}}_{2}\) are the biomass concentration and flocculant dose, respectively. The fitness of the models was evaluated by the values of the coefficient of determination (R²), absolute average deviation (AAD) and mean square error (MSE), which are 0.89, 0.752% and 0.945, respectively, for the D. salina flocculation efficiency model. The corresponding values obtained for the D. salina concentration factor model are 0.93, 9.012% and 4.489, respectively. Similar results were obtained for the flocculation of N. oculata. The calculated values for model fitness are R² (0.89), AAD (3.115%) and MSE (11.189) for the N. oculata flocculation efficiency model. The corresponding values of R², AAD and MSE for the N. oculata concentration factor model are 0.87, 4.156% and 0.089, respectively. For a fitted model to be considered good, it is recommended that R² should not be less than 0.80 [33]. Thus, the models obtained for the microalgae flocculation processes are considered a good fit, which implies that 89%, 93%, 89% and 87% (Tables 6 and 7) of the variability in the responses could be explained by Eqs. 7–10. The statistical significance of the regression models, the two independent factors and their possible interactions for the flocculation efficiency and concentration factor was assessed by the statistical F- and p values. The regression models are statistically significant in all cases tested; the model F-values (> 13.21), with their respective p values less than 0.05 (Tables 6, 7), are within the values recommended for a model to be significant. This implies that, for each regression model, there is less than a 5% chance that an F-value this large could occur due to noise. The individual terms in the model equations were tested with ANOVA at the 95% confidence level (p < 0.05), based on the F- and p values. A p value less than 0.05 shows that a model term is significant; otherwise, it is not significant.
Therefore, the significant model terms for the D. salina flocculation efficiency and concentration factor are \({\text{X}}_{1}\), \({\text{X}}_{2}\), \({\text{X}}_{1}{\text{X}}_{2}\) and \({\text{X}}_{1}^{2}\), while \({\text{X}}_{2}^{2}\) is insignificant for both responses. Similarly, the model terms that are significant for the flocculation efficiency and concentration factor of N. oculata are \({\text{X}}_{1}\), \({\text{X}}_{2}\), \({\text{X}}_{1}{\text{X}}_{2}\) and \({\text{X}}_{1}^{2}\), with \({\text{X}}_{2}^{2}\) not significant only for the concentration factor. In view of these results, the insignificant model terms in Eqs. 7–10 were ignored, while the \({\text{X}}_{1}^{2}\) term in Eq. 8 was retained because of its numerical significance. The model equations are therefore simplified as follows:

Table 4 The experiment and predicted values of flocculation efficiency and concentration factor according to the D-optimal design of RSM for harvesting D. salina
Table 5 The experiment and predicted values of flocculation efficiency and concentration factor according to the D-optimal design of RSM for harvesting N. oculata
Table 6 Test of significance for every regression coefficient and ANOVA for D. salina harvest with CPADA
Table 7 Test of significance for every regression coefficient and ANOVA for N. oculata harvest with CPADA

D. salina:

$${\text{FE}} = 89.74 + 12.76{\text{X}}_{1} - 7.10{\text{X}}_{2} + 4.20{\text{X}}_{1} {\text{X}}_{2} - 6.25{\text{X}}_{1}^{2}$$
(11)
$${\text{CF}} = 56.99 - 37.67{\text{X}}_{1} - 13.02{\text{X}}_{2} + 7.87{\text{X}}_{1} {\text{X}}_{2} + 6.62{\text{X}}_{1}^{2}$$
(12)

N. oculata:

$${\text{FE}} = 19.39 + 139.33{\text{X}}_{1} + 54.06{\text{X}}_{2} - 10.61{\text{X}}_{1} {\text{X}}_{2} - 84.63{\text{X}}_{1}^{2} - 16.06{\text{X}}_{2}^{2}$$
(13)
$${\text{CF}} = 45.44 - 32.65{\text{X}}_{1} - 9.51{\text{X}}_{2} + 2.35{\text{X}}_{1} {\text{X}}_{2} + 9.06{\text{X}}_{1}^{2}$$
(14)
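As a quick numerical check on what the simplification changes, the sketch below evaluates the full and reduced flocculation efficiency models for D. salina (coefficients copied from Eqs. 7 and 11) at one operating point. The point itself, 0.92 g/l and 0.50% v/v, is chosen purely for illustration, and the function name is hypothetical.

```python
def quadratic_model(coeffs, x1, x2):
    """Evaluate b0 + b1*X1 + b2*X2 + b12*X1*X2 + b11*X1^2 + b22*X2^2."""
    b0, b1, b2, b12, b11, b22 = coeffs
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2 + b11 * x1 ** 2 + b22 * x2 ** 2


# D. salina flocculation efficiency: Eq. 7 (full) and Eq. 11 (X2^2 term dropped).
FE_FULL = (89.74, 12.76, -7.10, 4.20, -6.25, 0.39)
FE_REDUCED = (89.74, 12.76, -7.10, 4.20, -6.25, 0.0)

x1, x2 = 0.92, 0.50  # biomass concentration (g/l), flocculant dose (% v/v)
fe_full = quadratic_model(FE_FULL, x1, x2)
fe_reduced = quadratic_model(FE_REDUCED, x1, x2)
difference = fe_full - fe_reduced  # contribution of the dropped X2^2 term
print(f"full: {fe_full:.2f}, reduced: {fe_reduced:.2f}, difference: {difference:.4f}")
```

At this point the full model evaluates to about 94.67 and the dropped term contributes 0.0975, so removing the insignificant term changes the prediction by roughly a tenth of a percentage point.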

The three-dimensional response surface plots (Figs. 3, 4) characterize the regression equations and depict the interaction between the responses and the independent experimental variables. These plots also assist in identifying the optimum level of each parameter studied for achieving maximum flocculation efficiency and concentration factor.

Fig. 3 Response surface plots showing interactive effect of independent variables on a flocculation efficiency, b concentration factor for D. salina

Fig. 4 Response surface plots showing interactive effect of independent variables on a flocculation efficiency, b concentration factor for N. oculata

3.3 Effect of biomass concentration and CPADA flocculant dose on Dunaliella salina

The interaction between biomass concentration and flocculant (CPADA) dose on the flocculation efficiency and concentration factor of D. salina is shown in Fig. 3a, b, respectively. It is evident from Fig. 3a, b that biomass concentration has a significant (p < 0.05) effect on both the flocculation efficiency and the concentration factor of D. salina. In all cases, the biomass concentration effect is more significant than the flocculant dose effect. As is apparent from Fig. 3a, higher flocculation efficiencies were obtained at lower values of CPADA flocculant dose and biomass concentration, while the reverse was observed at higher values. Also, for the flocculation of D. salina, an increase in biomass concentration resulted in an increase in flocculation efficiency, up to a maximum of about 94% at a biomass concentration slightly higher than 0.92 g/l (Fig. 3a). Any further increase in biomass concentration beyond this value led to a decrease in flocculation efficiency. In Fig. 3b, higher concentration factors were observed at lower biomass concentrations of D. salina than at high biomass concentrations. Meanwhile, an increase in the flocculant dose had very little effect on the D. salina concentration factor.

3.4 Effect of biomass concentration and CPADA flocculant dose on Nannochloropsis oculata

Figure 4a, b shows the interactive effect of biomass concentration and CPADA flocculant dose on the flocculation efficiency and concentration factor of N. oculata. Both independent variables showed a significant (p < 0.05, Fig. 4a) effect on the flocculation efficiency and concentration factor. The optimum N. oculata flocculation efficiency of about 95% was observed at a biomass concentration and flocculant dose of 0.82 g/l and 0.50% v/v, respectively. Starting from this optimum condition, any increase or decrease in either the biomass concentration or the CPADA flocculant dose resulted in a decrease in flocculation efficiency. A similar result was observed by Wu et al. [8], who inferred that the decrease in flocculation efficiency of both marine and freshwater microalgae with an increase in their biomass concentration can be attributed to the limited amount of magnesium in the growth medium available to flocculate the excess microalgae cells. In Fig. 4b, both flocculant dose and biomass concentration showed a significant effect on the concentration factor of N. oculata, with the latter showing a much higher significance. In this figure, an increase in CPADA flocculant dose showed little or no effect on the concentration factor, but an increase in biomass concentration resulted in an increase in concentration factor up to a maximum of 7 at about 0.82 g/l biomass concentration; a further increase beyond this level led to a decline in concentration factor. Zheng et al. [12] also reported a decrease in concentration factor at high values of biomass concentration for Chlorella vulgaris and Chlorella protothecoides.

3.5 ANN modeling, prediction and comparison with RSM

The results of ANN modeling prediction using the chosen ANN architecture (Fig. 1), with an input layer of two neurons (biomass concentration and flocculant dose), one hidden layer of 10 neurons and an output layer of one neuron (flocculation efficiency or concentration factor), are shown in Tables 4 and 5. Plots of the ANN predicted values against experimentally observed values for the flocculation efficiency and concentration factor of both species of microalgae are shown in Figs. 5, 6, 7 and 8. The correlation coefficients, R, for the training, validation, testing and whole data sets for the flocculation efficiency of D. salina are 0.999, 1.000, 1.000 and 0.999, respectively (Fig. 5). Similar values of R were obtained for the D. salina concentration factor: 0.995, 1.000, 1.000 and 0.997 for the training, validation, testing and whole data sets, respectively (Fig. 6). Moreover, the correlation coefficients, R, for the training, validation, testing and whole data sets for the flocculation efficiency of N. oculata are 0.999, 1.000, 1.000 and 0.999, respectively (Fig. 7). The values of R obtained for the N. oculata concentration factor are 0.994, 1.000, 1.000 and 0.994 for the training, validation, testing and whole data sets, respectively (Fig. 8). These values of R show that the ANN models (for both flocculation efficiency and concentration factor) gave good correlations between the experimental and predicted values for both microalgae species. Figure 9 shows the plots of MSE against epoch for the training, validation and testing of the developed ANN topology of 2-10-1. The plots show that the initially large values of MSE dropped sharply, and the best validation performances are 0.003 at epoch 2, 0.140 at epoch 2, 0.248 at epoch 4 and 0.011 at epoch 3 for the flocculation process of the microalgae species.

Fig. 5 Correlation plots of predicted and experimental observed values for FE of D. salina

Fig. 6 Correlation plots of predicted and experimental observed values for CF of D. salina

Fig. 7 Correlation plots of predicted and experimental observed values for FE of N. oculata

Fig. 8 Correlation plots of predicted and experimental observed values for CF of N. oculata

Fig. 9 MSE variation for the training, validation and testing data for the microalgae flocculation process a FE for D. salina, b CF for D. salina, c FE for N. oculata, d CF for N. oculata

The developed RSM and ANN models were evaluated for their generalization capability. This was done by calculating their respective R, R², MSE, root mean square error (RMSE) and AAD. The values obtained are displayed in Tables 8 and 9. The values of R obtained for the ANN models are higher and closer to 1 than those of the RSM models, which indicates that both sets of models fit well, but that the ANN models fit better. Similarly, the R² values obtained for the ANN models are higher than the corresponding R² values of the RSM models. The values of the other statistical indicators, MSE, RMSE and AAD, were low for all data, but the values obtained for the ANN models are much lower than those for the RSM models (Tables 8, 9). This further confirms that the ANN models have a higher generalization ability than the RSM models for the microalgae flocculation process. Parity plots of the predicted and observed values are shown in Figs. 10 and 11, and the experimental and predicted values are compared in Fig. 12. A close look at these plots shows that the predictions by ANN closely match the experimental values in the flocculation of the microalgae species studied, which further supports the higher generalization capability of ANN compared to RSM. The present work has demonstrated that ANN models gave better and more accurate predictions than RSM, as reported in the literature [19, 21, 34, 35].
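The two additional indicators used in this comparison, RMSE and AAD, can be sketched as follows. The AAD form used here (mean absolute relative deviation, in percent) is the commonly used definition and is an assumption, since the paper does not state its formula. The data are hypothetical and serve only to show how two sets of predictions would be compared.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def aad_percent(actual, predicted):
    """Absolute average deviation (%), relative to the actual values."""
    n = len(actual)
    return 100.0 * sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / n


# Hypothetical experimental values and two sets of model predictions.
actual = [80.0, 100.0, 50.0, 40.0]
rsm_pred = [84.0, 95.0, 52.0, 40.0]
ann_pred = [80.0, 99.0, 50.0, 40.0]

for name, pred in (("RSM", rsm_pred), ("ANN", ann_pred)):
    print(f"{name}: RMSE = {rmse(actual, pred):.3f}, AAD = {aad_percent(actual, pred):.2f}%")
```

Lower RMSE and AAD for one model on the same data indicate closer agreement with the experimental values, which is the basis of the comparison in Tables 8 and 9.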

Table 8 Generalization ability evaluation of the RSM and ANN models (D. salina)
Table 9 Generalization ability evaluation of the RSM and ANN models (N. oculata)
Fig. 10

Parity plots of predicted versus experimental values for RSM and ANN models (D. salina)

Fig. 11

Parity plots of predicted versus experimental values for RSM and ANN models (N. oculata)

Fig. 12

Comparison of predicted values by RSM and ANN models with experimental values: (a) FE of D. salina, (b) CF of D. salina, (c) FE of N. oculata, (d) CF of N. oculata

3.6 Multi-objective optimization of flocculation parameters

The aim of this optimization study is to simultaneously maximize flocculation efficiency and concentration factor in the harvesting of the microalgae. The results of both the RSM and RSM-GA multi-objective optimizations (Tables 10, 11) show, however, that these two objectives conflict. In practice, high flocculation efficiency is preferred to a high concentration factor, because microalgae biomass is lost during harvesting when a high concentration factor is given priority [36]. Meanwhile, a high concentration factor has been associated with a reduction in the size of microalgae dewatering equipment [37], which subsequently lowers the equipment cost. In view of this, a compromise must be made between minimizing biomass loss and reducing equipment size. Based on the RSM optimization results (Table 10), the optimum independent variables for the flocculation process are a biomass concentration of 0.54 g/l and a flocculant (CPADA) dose of 0.50% v/v for D. salina, and a biomass concentration of 0.50 g/l and a flocculant (CPADA) dose of 0.50% v/v for N. oculata. The corresponding predicted responses were flocculation efficiencies of 92 and 95% and concentration factors of 34 and 7 for D. salina and N. oculata, respectively. These conditions were selected on the basis of the highest desirability values, 0.83 and 0.99 for D. salina and N. oculata, respectively (Table 10). The genetic algorithm Pareto front for the multi-objective optimization of the microalgae flocculation process is shown in Fig. 13, and the results are summarized in Table 11. It is evident from the table and the figure that higher flocculation efficiencies correspond to lower concentration factors, and vice versa. For both microalgae species, the optimum responses chosen are flocculation efficiencies of 93 and 94% and concentration factors of 12 and 13 for D. salina and N. oculata, respectively.
The corresponding values of the optimum independent variables are a biomass concentration of 1.77 g/l and a flocculant (CPADA) dose of 1.00% v/v for D. salina, and a biomass concentration of 1.64 g/l and a flocculant (CPADA) dose of 1.17% v/v for N. oculata. The values predicted for the input variables by RSM-GA are considerably higher than those predicted by RSM. In the case of N. oculata, the prediction even falls outside the range of the biomass concentration factor considered in this study. The optimum response variables predicted by both RSM and RSM-GA were validated experimentally using two independent replicates, and the results are presented in Table 12. The experimental values obtained (Table 12) agree closely with the optimum values predicted by RSM and RSM-GA.
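The trade-off described above, where a gain in flocculation efficiency costs concentration factor, is what a Pareto front captures. The sketch below illustrates only the dominance test that defines such a front when both objectives are maximized. It is not the genetic algorithm used in this work, and the function name `pareto_front` and the candidate values are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated (FE, CF) pairs when both objectives are maximized."""
    front = []
    for i, p in enumerate(points):
        # p is dominated if another point is at least as good in both
        # objectives and strictly better in at least one of them.
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical (flocculation efficiency %, concentration factor) candidates.
candidates = [(80, 30), (85, 25), (90, 12), (88, 10), (95, 6), (84, 20)]
for fe, cf in pareto_front(candidates):
    print(f"FE = {fe}%, CF = {cf}")
```

Along the resulting front, sorted by increasing flocculation efficiency, the concentration factor can only decrease, so selecting a single operating point requires the kind of compromise discussed in this section.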

Table 10 Optimum condition predicted by RSM
Table 11 Summary of Pareto front response for the multi-objective optimization
Fig. 13

Pareto front response for the multi-objective optimization of flocculation process

Table 12 Experimental validation of optimum conditions by RSM and RSM-GA

4 Conclusion

In this study, the feasibility of using a bio-derived alkali, CPADA, to induce flocculation of microalgae has been demonstrated. The application of CPADA to induce microalgae flocculation resulted in more than 90% flocculation efficiency, although with a lower concentration factor in one of the cases tested. The potential of CPADA to induce flocculation in microalgae culture is attributed to its high potassium content. The results have shown that microalgae biomass concentration is a more significant parameter than the CPADA flocculant dose in the flocculation process. The ANN model has been shown to have higher generalization ability than the RSM model, based on the evaluated statistical indices, although both gave good and accurate predictions. In order to optimize the flocculation process, a multi-objective optimization was carried out by the RSM and RSM-GA techniques. RSM predicted lower values of the independent (input) variables with high flocculation efficiency, whereas RSM-GA predicted much higher values of the input variables with a minimal difference in predicted flocculation efficiency compared with RSM. This work confirmed CPADA as a potential bio-based alkali that can be employed in the flocculation of microalgae, which could lower the microalgae harvesting cost. The predictive and optimization abilities of RSM, ANN and GA were further established.