Introduction

Water remediation methods have been intensively researched and applied in the current industrial era, because industries generate waste products and contaminants and often use water bodies such as seas, rivers and lakes as disposal sites. The irresponsible disposal of these pollutants not only degrades the aesthetics of the environment but also harms the existing fauna and flora, which eventually affects humans at large. Water remediation methods have been reviewed in the literature (Ali et al. 2013; Robinson et al. 2001).

Phytoextraction is one of the simplest and least costly of these methods. It utilises living plants that take in the pollutant through their root systems, and these plants require only solar input and nutrients from either soil or water. Phytoextraction has been used for the remediation of wastewaters containing heavy metals (Ali et al. 2013), hydrocarbons (Agamuthu et al. 2010) and dyes (Movafeghi et al. 2013; Torbati et al. 2014). Various plants have been studied for their phytoextraction potential, including water lettuce (Lu et al. 2010), Tagetes patula (Sun et al. 2011) and sunflowers (Adesodun et al. 2010).

Azolla pinnata (AP) is a water fern found in Asia, tropical Africa and some regions of Australia. It has a symbiotic relationship with the cyanobacterium Anabaena azollae, whose nitrogen fixation ability can fulfil the nitrogen requirements of AP. This not only allows AP to thrive in water bodies that lack nutrients, but is also responsible for AP’s fast growth and high reproduction rate. Recent studies have shown that AP has the potential to be used as a water remediation material through the biosorption of textile dyes (Kooh et al. 2015, 2016b, c, d) and by phytoextraction methods (Sood et al. 2012).

Rhodamine B (RB) is a red solid dye that dissolves in water to give a fluorescent reddish-violet solution. Owing to its fluorescence, it is widely used in fluorometers, flow cytometry, ELISA and fluorescence microscopy, and it is also used in many industrial products such as paints, textiles, paper, wool, silk, cotton and leather. Like most synthetic dyes, RB is considered toxic: the LD50 for rat is 500 mg kg−1, the LC50 for rainbow trout is 217 mg L−1 and the EC50 for water flea is 22.9 mg L−1 (Safety Data Sheet: rhodamine B, Global Safety Management, Inc. 2018).

This study focuses on using live AP to remove RB from aqueous solutions by the phytoextraction method. Three parameters, namely pH, AP dosage and RB dye concentration, were investigated to determine how they affect the performance of the phytoextraction process. Artificial neural network (ANN) and random forest (RF) models were used to predict the removal efficiencies of the phytoextraction method, and the performances of the two models were compared.

The use of machine learning (ML) algorithms for modelling water remediation processes is not uncommon. Data (training sets) are fed into the algorithms, and the predicted output is compared with the experimental data to test the performance of the models. Many mathematical models are useful for modelling in water science (Dehghan and Abbaszadeh 2017); however, this study focused on ANN and RF. Monitoring biological processes is not simple because many processes are involved and the data are occasionally nonlinear. ML algorithms can therefore be advantageous because they can process nonlinear and noisy data and predict the performance of biological systems. ANN is one of the most commonly used models and is inspired by the basic function of a biological neuron (Dehghan et al. 2009). Its use has been reported in several phytoextraction studies, such as the phytoextraction of basic red 46 using watercress (Torbati et al. 2014) and Lemna minor (Movafeghi et al. 2013). RF is an improved version of decision tree algorithms that averages the output of many smaller decision trees instead of using a single massive decision tree, which can overfit the training data (Breiman 2001; Dehghanian et al. 2016). The use of RF in modelling water remediation has been limited; it was reported in an adsorption study of Congo red removal using tin sulphide-modified activated carbon (Dehghanian et al. 2016).

Materials and methods

Plant sample and chemical reagents

The AP was obtained from the Brunei Agriculture Research Center, Brunei Darussalam. The AP was sonicated for 30 s, gently washed with distilled water and gently dried with a paper towel before being introduced into disposable plastic cups with a diameter of 6.5 cm, a height of 8.5 cm and a volume capacity of 250 mL.

RB (95% dye content, Mr 479.01 g mol−1, Sigma-Aldrich) was used as purchased without further purification. The pH of the aqueous solution was adjusted using diluted 0.1 mol L−1 NaOH and HNO3 solutions (Fluka) and monitored using a Thermo Scientific Orion 2-star benchtop pH meter.

Phytoextraction procedures

The effects of initial dye concentration (5, 10, 15 and 20 mg L−1), pH (3, 4, 5, 6 and 7) and plant dosage (0.2, 0.4, 0.6 and 0.8 g, fresh mass) were investigated systematically in this study. All final dye solutions and the control contained macronutrients, namely 3.3 mM Mg(NO3)2, 2 mM CaCl2, 1.3 mM K3PO4 and 1 mM KNO3, and micronutrients, namely 5.4 μM FeCl3, 0.016 μM Cu(NO3)2, 0.038 μM Zn(NO3)2 and 0.004 μM (NH4)6Mo7O24·4H2O, according to Wagner (1997), with some modifications concerning the amount of nitrate. Nitrate was added because RB possesses antiseptic properties which can interfere with the nitrogen-fixing ability of AP’s symbiotic cyanobacterium, Anabaena azollae. The setup was placed under a balcony to avoid exposure to direct sunlight and to provide shelter during rain, and was subjected to a photoperiod of 13 h of natural light and 11 h of darkness. A Shimadzu UV-1601PC UV–visible spectrophotometer was used to determine the RB dye concentration at λmax of 555 nm. All the experiments were carried out in duplicate.

The removal efficiency was calculated by the equation:

$${\text{Removal efficiency }} = \frac{{\left( {C_{\text{i}} - C_{\text{f}} } \right) \times 100\% }}{{C_{\text{i}} }}$$
(1)

where Ci is the initial dye concentration and Cf is the final dye concentration.
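As an illustration, Eq. (1) can be evaluated with a few lines of Python. This is a minimal sketch; the function name and the example concentrations are hypothetical and are not taken from the experimental data.

```python
def removal_efficiency(c_initial: float, c_final: float) -> float:
    """Return the percentage of dye removed, following Eq. (1)."""
    if c_initial <= 0:
        raise ValueError("initial concentration must be positive")
    return (c_initial - c_final) * 100.0 / c_initial


# Hypothetical example: 10 mg/L of dye initially, 2.4 mg/L remaining.
print(removal_efficiency(10.0, 2.4))
```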

Growth rate estimations

The growth rate of AP was determined by using relative frond number (RFN) (White 1936). The equation used is as follows:

$${\text{RFN}} = \left( {{\text{Frond number at}}\,n{\text{th}}\,{\text{day}}{-}{\text{frond number at 0 day}}} \right)/{\text{frond number at 0 day}}$$
(2)

Fronds of AP were counted daily, from day 0 to day 14, for all experiments.
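Eq. (2) can be sketched in Python as follows. The function name and the frond counts in the example are hypothetical and serve only to show how the ratio behaves.

```python
def relative_frond_number(fronds_day_n: int, fronds_day_0: int) -> float:
    """Return the RFN of Eq. (2): the relative change in frond count since day 0."""
    if fronds_day_0 <= 0:
        raise ValueError("frond number at day 0 must be positive")
    return (fronds_day_n - fronds_day_0) / fronds_day_0


# Hypothetical counts: 50 fronds at day 0.
print(relative_frond_number(100, 50))  # frond number doubled
print(relative_frond_number(0, 50))    # no fronds left
```

With these inputs, a doubled frond count gives an RFN of 1.0 and the loss of all fronds gives −1.0.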

Determination of plant pigments

The plant photosynthetic pigments, namely chlorophyll a (Chla), chlorophyll b (Chlb) and total xanthophylls and carotenoids (CX), were determined spectrophotometrically using a Shimadzu UV-1601PC UV–visible spectrophotometer at wavelengths ranging from 350 to 750 nm. At the end of the phytoextraction process, the AP was washed with distilled water to remove excess dye and dried using paper towels. The AP was weighed, ground with a mortar and pestle and extracted using 10 mL of 80% acetone in a dark area. All extracts were analysed within 30 min, and the estimation of plant pigments was based on Arnon’s method (Arnon 1949).

Arnon’s expressions of plant pigment estimation are as follows:

$${\text{Chl}}_{\text{a}} \,\left( {{\text{mg g}}^{ - 1} } \right) = \left[ {\left( {12.7 \times A_{663} } \right){-}\left( {2.6 \times A_{645} } \right)} \right] \times V/m$$
(3)
$${\text{Chl}}_{\text{b}} \,\left( {{\text{mg g}}^{ - 1} } \right) = \left[ {\left( {22.9 \times A_{645} } \right){-}\left( {4.68 \times A_{663} } \right)} \right] \times V/m$$
(4)
$$C_{X} \,\left( {{\text{mg g}}^{ - 1} } \right) = \left[ {\left( {1000 \, A_{470} \times V/m} \right){-}1.90\,{\text{Chl}}_{\text{a}} {-}63.14\,{\text{Chl}}_{\text{b}} } \right]/214$$
(5)

where V (mL) is the volume of 80% acetone, m (mg) is the mass of the plant, while A663, A645 and A470 represent the absorbance at wavelengths 663 nm, 645 nm and 470 nm, respectively.
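The pigment expressions can be sketched in Python as below. The function name and the absorbance readings in the example are hypothetical. The coefficients follow Eqs. (3) to (5) as printed, with one stated assumption: the second term of the chlorophyll b expression uses the absorbance at 663 nm, as in Arnon's original expression.

```python
def pigment_contents(a663, a645, a470, volume, mass):
    """Estimate Chl a, Chl b and total xanthophylls + carotenoids (CX).

    volume and mass are taken in the units given in the text for V and m.
    """
    if mass <= 0:
        raise ValueError("plant mass must be positive")
    scale = volume / mass
    chl_a = (12.7 * a663 - 2.6 * a645) * scale
    # Assumption: second term uses A663, as in Arnon's original expression.
    chl_b = (22.9 * a645 - 4.68 * a663) * scale
    cx = (1000 * a470 * scale - 1.90 * chl_a - 63.14 * chl_b) / 214
    return chl_a, chl_b, cx


# Hypothetical absorbance readings, for illustration only.
chl_a, chl_b, cx = pigment_contents(a663=0.5, a645=0.2, a470=0.3, volume=10, mass=10)
print(chl_a, chl_b, cx)
```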

Machine learning algorithm setup procedures

The Weka software package version 3.9 (Hall et al. 2009) was used for building both the ANN and the RF models. The classifiers used for the ANN and RF models were function/multilayerPerceptron and tree/randomforest, respectively. Stratified tenfold cross-validation was applied to both algorithms: 10% of the data is held out for testing while the remainder is used for training, and the process is repeated 10 times, with each repeat holding out a different segment of the data. This procedure reduces overfitting and overtraining and gives a more generalised predicted outcome (Witten et al. 2011).
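The splitting step of tenfold cross-validation can be sketched in plain Python as follows. This is only an illustration of the idea and not Weka's implementation: the function name is hypothetical, the split is a simple shuffled k-fold without stratification, and the sample count of 154 is the number of data points reported for this study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=1):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(154, k=10):
    print(len(train), len(test))
```

Each of the ten repeats trains on roughly 90% of the data and tests on the remaining held-out fold.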

The ANN model (backpropagation) consists of three layers: input, hidden and output. The input layer consists of four neurons (time, plant dosage, pH and initial dye concentration), each with 154 values obtained from the experiments. The main parameter used for optimising the ANN model is the number of neurons in the hidden layer (0–12, at intervals of 1). The output layer consists of only one neuron (removal efficiency).

The RF model is optimised using different parameters because its architecture differs from that of the ANN. RF is an ensemble method which averages the output of many decision trees (Breiman 2001; Dehghanian et al. 2016). The RF model in Weka (no pruning by default) is built by bagging, which means that each tree is trained on a random sample of the data rather than on all of the data at once. Three parameters, namely bagsize (20–100%, at intervals of 20%), numIterations (1–100) and seed (1–5), were selected for optimising the RF algorithm. The numIterations parameter (default 100) is equivalent to the number of trees in other software packages, that is, the number of decision trees grown, each on its own random sample of the data. The seed initialises the random number generator used for the sampling, so that different seed values give different but reproducible sequences of random draws.
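The bagging idea can be sketched in plain Python as follows. This is an illustrative sketch rather than Weka's code: the function names are hypothetical, the tree-building step is omitted, and the seed here simply initialises Python's pseudo-random generator so that the same sample is drawn on every run.

```python
import random


def bootstrap_sample(rows, bag_size_percent=100, seed=1):
    """Draw a sample with replacement, sized as a percentage of the data."""
    rng = random.Random(seed)
    n_draws = max(1, round(len(rows) * bag_size_percent / 100))
    return [rng.choice(rows) for _ in range(n_draws)]


def bagged_prediction(tree_predictions):
    """Average the outputs of the individual trees (regression case)."""
    return sum(tree_predictions) / len(tree_predictions)


rows = list(range(10))  # stand-in for ten experimental records
print(bootstrap_sample(rows, bag_size_percent=60, seed=3))
print(bagged_prediction([0.7, 0.8, 0.9]))  # hypothetical outputs of three trees
```

In a full random forest, one tree would be fitted to each bootstrap sample and the averaged output would serve as the predicted removal efficiency.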

The performance of both the ANN and RF models was evaluated based on the correlation coefficient (R) and two error functions, the mean absolute error (MAE) and the root mean squared error (RMSE). The R, MAE and RMSE are all generated by the Weka software package. The R measures the statistical correlation between the experimental and the predicted values. The value of R ranged from 0 to 1, with 1 indicating a perfect correlation and 0 indicating the other extreme (Witten et al. 2011). Both MAE and RMSE are common error functions, and smaller values indicate smaller magnitudes of error.
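For readers who wish to reproduce such statistics outside Weka, the correlation coefficient and the common error functions (mean absolute error, mean squared error and its square root) can be sketched in plain Python. The function names and the example values are hypothetical, and the correlation is computed here as Pearson's coefficient, which is undefined when either series is constant.

```python
import math


def correlation_coefficient(actual, predicted):
    """Pearson correlation between experimental and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical removal efficiencies (%), for illustration only.
experimental = [10.0, 20.0, 30.0]
predicted = [11.0, 21.0, 31.0]
print(correlation_coefficient(experimental, predicted))
print(mean_absolute_error(experimental, predicted))
print(root_mean_squared_error(experimental, predicted))
```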

All experimental data were converted into values between 0 and 1, using the minimum–maximum normalisation equation as follows:

$$X_{{\mathrm{N}}} = \frac{{\left( {X_{i} - X_{{\mathrm{min} }} } \right)}}{{\left( {X_{{\mathrm{max} }} - X_{{\mathrm{min} }} } \right)}}$$
(6)

where X represents a neuron in the input or output layer, XN and Xi are the normalised and non-normalised values of the experimental data, respectively, and Xmax and Xmin are the maximum and minimum values of the experimental data (Basheer and Hajmeer 2000). The predicted data generated by both the ANN and RF models were back-calculated to the original scale before the graphs were plotted.
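Eq. (6) and its inverse, which is used for the back-calculation to the original scale, can be sketched in Python as follows. The function names are hypothetical; the example uses the five pH levels investigated in this study.

```python
def min_max_normalise(values):
    """Scale values to the range 0-1 following Eq. (6); also return min and max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("values must not all be identical")
    normalised = [(x - x_min) / (x_max - x_min) for x in values]
    return normalised, x_min, x_max


def denormalise(x_n, x_min, x_max):
    """Back-calculate a normalised value to the original scale."""
    return x_n * (x_max - x_min) + x_min


ph_levels = [3, 4, 5, 6, 7]
scaled, lo, hi = min_max_normalise(ph_levels)
print(scaled)
print([denormalise(x, lo, hi) for x in scaled])
```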

Results and discussion

Optimisation of ML models

Prior to the prediction of the phytoextractive capability of AP, the parameters of the machine learning algorithms were required to be optimised in order to obtain quality output. The summary of the optimisation of the two ML models (ANN and RF) is given in Table 1.

Table 1 Optimisation of parameters for ANN and RF algorithms

For the ANN model, only the number of neurons in the hidden layer was optimised. When the number of neurons in the hidden layer was set to zero, the algorithm reverted to a simple linear regression with an R value of 0.772, which is a poorer model than the ANN built with a hidden layer. The optimum number of neurons for modelling the phytoextractive capability of AP is 6, where the R is 0.941 and the error is the lowest (MAE = 0.59, RMSE = 0.81). For the RF model, three parameters (bagsize, numIterations and seed) were optimised. When numIterations was set to 1, the model reverted to a single-tree model with an R of 0.888, which is inferior to models with a higher numIterations setting. The optimised RF model was achieved with bagsize at 100% (default value), numIterations at 100 (default value) and seed at 3, where the R is 0.975, the MAE is 0.042 and the RMSE is 0.055.

The closeness between the experimental data and those predicted by both the ANN and RF algorithms is shown in Fig. 1, where it can be observed that the RF-predicted data are much closer to the experimental data than the ANN, while the ANN-predicted data generalised better than the RF.

Fig. 1 Plots visualising the closeness between the experimental data and the predicted data obtained from the (a) ANN and (b) RF algorithms

Effect of pH

The effect of pH on the uptake of RB dye by live AP is shown in Fig. 2. The highest dye removal efficiency occurred at pH 3.0, and increasing the pH decreased the dye removal efficiency. On the 14th day, the removal efficiencies at pH 3.0, 4.0, 5.0, 6.0 and 7.0 were 76.0%, 69.4%, 61.8%, 55.8% and 49.1%, respectively. This behaviour is due to the chemical properties of the RB dye molecule. RB dye molecules exist in cationic monomeric form at pH < 4.0, and become zwitterionic and dimeric at pH > 4.0 (Gad and El-Sayed 2009). Our previous work also reported that the cationic species of RB adsorb on AP’s surface better than the bigger, dimeric forms (Kooh et al. 2016a).

Fig. 2 Experimental data of the effect of pH on the phytoextraction of RB (initial dye = 10 ppm, plant dosage = 0.4 g), in comparison with those predicted by ANN and RF models

As shown in Fig. 2, the ANN generated smooth curves and generalised more when compared to the RF model. The curves generated by the RF are “noisy”, but are much closer to the experimental data than the ANN.

Effect of plant dosage

The effect of plant dosage on the removal efficiency of RB using AP is summarised in Fig. 3. At the end of the phytoextraction process, the plant dosage of 0.4 g gave the highest removal efficiency at 76.0%, followed by 0.6 g (71.9%), 0.8 g (70.8%) and lastly 0.2 g (67.4%).

Fig. 3 Experimental data of the effect of AP dosage on the phytoextraction of RB (initial dye = 10 ppm, initial pH = 3.0), in comparison with those predicted by ANN and RF models

The pH is a major factor influencing the removal efficiency of a phytoextractive process, as shown in Fig. 2. However, aquatic plants are known to change the pH of the water medium over time (Gopal 1987). The pH measurements of the dye solution during the phytoextraction process are summarised in Table 2, where a higher dosage of AP caused larger alterations of the pH within the first 24 h of the phytoextraction process. On day 1, the 0.6 g and 0.8 g AP dosages changed the pH of the aqueous solution from 3.0 to 4.0 and 5.0, respectively, whereas the 0.4 g AP dosage raised it only to 3.5. The large deviation from the initial pH at the higher AP dosages may have caused the phytoextraction process to operate outside its optimum pH of 3.0 (as discussed in the previous section), where the larger dimeric RB dye molecules are formed. Thus, these data explain the higher removal efficiency at the AP dosage of 0.4 g when compared to 0.6 g and 0.8 g.

Table 2 pH of dye solutions before and after phytoextraction process using AP

The RF model, again, predicted data closer to the experimental data than those generated by the ANN, although the ANN model produced smoother curves.

Effect of initial dye concentration

The effect of initial dye concentration (5–20 ppm) is summarised in Fig. 4, where higher initial dye concentrations resulted in higher amounts of dye removal. This can be explained by Fick’s law of diffusion, where the concentration gradient provides the driving force for the phytoextraction of RB dye (Frijlink et al. 2015). This behaviour was also observed in the phytoextraction of triphenylmethane dyes (malachite green and methyl violet 2B) using AP (Kooh et al. 2016a, 2018).
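For reference, Fick’s first law in one dimension can be written as below. The symbols are standard notation introduced here for illustration and are not used elsewhere in this paper: J is the diffusive flux, D is the diffusion coefficient and ∂C/∂x is the concentration gradient.

$$J = - D\,\frac{\partial C}{\partial x}$$

Under this relation, a higher initial dye concentration steepens the gradient between the bulk solution and the plant surface, which increases the flux of dye towards the plant.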

Fig. 4 Experimental data of the effect of initial dye concentration on the phytoextraction of RB using AP, in comparison with those predicted by ANN and RF models

The data predicted by the ANN model were close to the experimental data for the 5, 10 and 15 ppm concentrations, while there was a slight underestimation for the 20 ppm samples. For the RF model, however, the predicted data faithfully followed the experimental data at 5 and 10 ppm, while oddly shaped curves were generated for concentrations of 15 and 20 ppm. This behaviour is probably due to the noisier dataset at these concentrations, as shown in Fig. 4.

Reusability of AP in phytoextraction of RB

In order to determine whether AP can be reused after a phytoextraction cycle (in this case 14 days, labelled as the first cycle), the remediated RB dye solution was removed and replaced on the 15th day with a new 10 ppm RB dye solution (with nutrient media added, pH 3.0) using the same AP. The phytoextraction process was continued for another 14 days (second cycle). The data are summarised in Fig. 5, where it can be seen that the same AP was able to phytoextract RB in the second cycle at about the same level as in the first. These data hint at the usefulness and potential of AP in the continuous treatment of dye wastewater.

Fig. 5 Repeated batch of the phytoextraction of 10 ppm RB using AP (pH 3.0, 0.4 g AP)

Comparison of removal efficiency of AP with other plants via the phytoextraction method

There are currently no studies concerning the phytoextraction of RB for direct comparison, and therefore, our findings are compared with the phytoextraction of other dyes by other living plants. The removal efficiencies of 10 ppm methyl violet 2B (basic dye) and malachite green (basic dye) using 0.4 g AP (at their individual optimum pH, phytoextraction period = 7 days) were 89.8% and 84.4% (Kooh et al. 2016a, 2018), respectively, which were higher than the 76.0% obtained for the phytoextraction of RB (xanthene dye; 0.4 g AP, pH 3.0, phytoextraction period = 14 days). Duckweed (Lemna minor), under roughly similar conditions (phytoextraction period = 7 days), was reported to achieve 88% removal efficiency for 10 ppm malachite green (Torbati 2015). Another study, which involved a different species of Azolla (Azolla filiculoides), reported a removal efficiency of about 85% for 10 ppm acid blue 92 (azo dye) (2.0 g plant dosage, phytoextraction period = 6 days) (Khataee et al. 2013).

Estimation of growth by RFN and plant pigment measurements

To determine the plant health during the phytoextraction process, two growth estimators of AP (RFN and plant pigment concentration) were determined. An RFN value of 1.0 indicates that the frond number has doubled, while a value of −1.0 indicates that all the plants have died. The control (AP without addition of dye), as determined in our previous work, had an RFN of 0.17 at day 2, which gradually increased to 0.22 at day 3 and remained constant at 0.26 between days 4 and 7 (Kooh et al. 2016a). Figure 6 shows that the RFN values for the phytoextraction of RB using AP at day 2 for 5, 10, 15 and 20 ppm were 0.12, 0.03, 0.28 and 0.33, respectively, while at day 6 they were 0.32, 0.22, 0.36 and 0.34. At the end of the phytoextraction process (day 14), the RFN values were 0.32, 0.33, 0.34 and 0.31. These RFN values are higher than 0.26 (the control), which indicates that AP exposed to RB grew more than the control and hints at the possibility of AP metabolising RB and its intermediates for growth. In comparison, the RFN values for the phytoextraction (day 6) of 10 ppm methyl violet 2B and 10 ppm malachite green using AP were lower at 0.18 and 0.07, respectively, indicating that AP can tolerate RB much better than methyl violet 2B and malachite green dyes (Kooh et al. 2016a, 2018).

Fig. 6 RFN data of the phytoextraction of various concentrations of RB using AP over a period of 14 days

A summary of another growth estimator by the amount of pigments present at the end of the phytoextraction process is shown in Fig. 7. Chlorophyll and carotenoids are present in most plants, and there are two main forms of chlorophyll which are Chla and Chlb. The chlorophyll pigments are the main pigments used by plants for photosynthesis. The carotenoids are plant pigments that give the plant yellow, orange or red colours, and are generally used during photosynthesis and to protect chlorophyll from photodamage (Demmig-Adams et al. 1996).

Fig. 7 Amount of plant pigments in AP exposed to various concentrations of RB, at the end of the phytoextraction process

As determined in our previous work, the amounts of Chla, Chlb and CX in the control (without dye) at the end of the phytoextraction process were 0.24, 0.27 and 0.05 mg g−1, respectively (Kooh et al. 2018), which were all lower than those in AP exposed to RB dye (Fig. 7). These data indicate that growth in the presence of RB was higher than in the control, which is in agreement with the conclusion derived from the RFN data (Fig. 6).

Conclusions

This investigation involved the use of AP to remediate RB by the phytoextraction method, with the inclusion of the ANN and RF machine learning algorithms for predictive modelling, in which the ANN model produced a more generalised output when compared to the RF. It was found that pH had a major influence on the phytoextractive process, and that plant dosage can change the pH of the aqueous solution. The optimum condition for the phytoextraction of 10 ppm RB is pH 3.0 with a plant dosage of 0.4 g, yielding a removal efficiency of 76%. The growth estimation by RFN indicates that AP can tolerate RB at concentrations as high as 20 mg L−1, and the pigment measurements, which show higher pigment contents than the control, further reinforce the RFN data. These growth estimator data showed higher growth in the presence of RB and indicated the possibility of AP being able to utilise RB and its intermediates for growth.