Machine learning for persistent free radicals in biochar: dual prediction of contents and types using regression and classification models

Latif, Junaid; Chen, Na; Saleem, Azka; Li, Kai; Qin, Jianjun; Yang, Huiqiang; Jia, Hanzhong

doi:10.1007/s44246-024-00125-0

Original Article
Open access
Published: 22 April 2024

Volume 3, article number 39, (2024)

Carbon Research

Junaid Latif^1,2,
Na Chen^1,2,
Azka Saleem³,
Kai Li^1,2,
Jianjun Qin^1,2,
Huiqiang Yang^1,2 &
…
Hanzhong Jia^1,2

Abstract

Persistent free radicals (PFRs) are emerging substances with diverse impacts in biochar applications, so their content and types need to be predicted accurately before biochar is used, both to optimize its performance and to minimize adverse effects. This prediction task is challenging because of the nonlinear and intricate relationships among biochar variables. Herein, we compiled a dataset from peer-reviewed publications and developed supervised machine learning models to systematically predict PFRs. Notably, the extreme gradient boosting (XGBoost) model exhibited the best predictive performance for both the regression and classification tasks, achieving a test R² of 0.95 for PFR content prediction and an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.92 for PFR type prediction. Based on the XGBoost model, a graphical user interface (GUI) was developed to make PFR predictions accessible. Feature importance analysis revealed that metal/non-metal doping, pyrolysis temperature, carbon content, and specific surface area were the four most significant factors influencing PFR content. For PFR types, specific surface area, pyrolysis temperature, carbon content, and feedstock were the top-ranked influencing factors. These findings provide valuable guidance for accurately predicting both the contents and types of PFRs in biochar and hold significant potential for the efficient utilization of biochar across various applications.

Graphical Abstract

Highlights

• Recognizing dual nature of PFRs, a machine-learning framework predicts them in biochar.

• XGBoost excels, achieving an R² (0.95) for PFR content and an AUROC (0.92) for PFR type.

• Important factors of PFR: doping, pyrolysis temp, carbon, and surface area.

• GUI enhances accessibility, enabling PFR predictions before biochar preparation.

1 Introduction

Biochar is a stable form of charcoal produced by pyrolyzing biomass in a low-oxygen environment (Pan et al. 2019). It offers various benefits, including enhanced soil fertility (Yang et al. 2019), improved water retention (Hussain et al. 2020), increased crop yields (Chen et al. 2022), reduced greenhouse gas emission (Zhang et al. 2024), sequestering carbon from the atmosphere (El-Naggar et al. 2018), and contaminants removal (Liu et al. 2022). The porous structure, high surface area, and catalytic ability of biochar provide abundant sorption sites and facilitate chemical reactions, rendering biochar effective in pollutant sequestration, soil remediation, and diverse catalytic processes (Liang et al. 2021). The performance of biochar, such as its catalytic ability and adsorption capacity, is influenced by a combination of factors including its physical characteristics (e.g., porosity and surface area), chemical attributes (e.g., pH and composition), and the specific origin of feedstock (Yuan et al. 2022). During the production processes of biochar, the cleavage of covalent bonds in organic macromolecules or electron transfer between free radical precursors and metals in biomass produces a new substance in biochar, known as persistent free radicals (PFRs) (Liao et al. 2014; Ruan et al. 2019). Unlike traditionally recognized short-lived free radicals, PFRs demonstrate a prolonged lifespan ranging from hours to months (Pan et al. 2019). This extended lifespan enables PFRs to remain active in the environment, resulting in more enduring and significant effects (Fang et al. 2014).

PFRs exhibit both positive and negative effects across various biochar applications. On the positive side, PFRs transfer electrons to stable oxidants (e.g., O₂, H₂O₂, and persulfate), thereby generating diverse reactive oxygen species (ROS) that are crucial for the degradation of organic pollutants (Fang et al. 2014, 2017; Qin et al. 2017). Furthermore, PFRs effectively facilitate the reductive or oxidative transformation of heavy metals, such as Cr(VI) (Dong et al. 2014; Zhu et al. 2023) and As(III) (Dong et al. 2014), by macromolecular free radicals without oxidants (Qiu et al. 2023). Conversely, owing to their stability and persistence, PFRs can induce adverse effects through prolonged interactions. These radicals can trigger ROS production, hindering seed germination and bud growth in crops (Liao et al. 2014). They can also induce oxidative stress in human organisms, resulting in ROS production and subsequent molecular oxidation of tissues, as well as DNA damage (Chuang et al. 2017; Liu et al. 2023). Hence, where the primary objective is pollutant remediation, an elevated PFR content is advantageous because of its efficacy in long-term remediation. However, when biochar is used as a soil amendment to support crop cultivation, a reduced PFR content is preferable. Therefore, predicting PFR levels in biochar prior to its preparation is of paramount significance, and the target level should be tailored to the specific application.

Although several studies have identified key factors affecting PFRs formation in biochar, including feedstock (Wang et al. 2022), pyrolysis temperature (Zhang et al. 2022; Prasertpong et al. 2023), and substituted aromatics (Wang et al. 2022), their prediction remains challenging due to the involvement of highly variable synthesis methods, diverse precursors, and complex reaction processes. In recent years, data-driven techniques, notably employing machine learning (ML) algorithms such as random forests (RF), support vector machines (SVM), and deep learning (DL) approaches like multilayer perceptron (MLP), have gained significant attention in constructing predictive models across various domains of environmental science (Zhong et al. 2021), including contaminant monitoring (Gao et al. 2021; Ullah et al. 2023), micropollutant oxidation (Cha et al. 2020), and new materials designing (Tang et al. 2020). Recently, the scope of ML predictions has expanded to encompass diverse applications within biochar research, such as forecasting of micronutrients (Ullah et al. 2023), heavy metal immobilization and migration (Li et al. 2023), wastewater treatment (Kanthasamy et al. 2023), antibiotic adsorption (Zhang et al. 2023a), biochar functioning as biocatalyst (Wang et al. 2023), and its impacts on GHGs emissions (Han et al. 2024). These algorithms offer distinct advantages over traditional statistical methods by capturing nonlinear and complex relationships between features and target variables (Zhu et al. 2020). For example, RF utilizes recursive binary splitting of data to minimize the residual sum of squares, providing advantages such as the ability to make local and global predictions, non-biased weights, and effective management of imbalanced and small datasets (Golden et al. 2019; Konstantinov and Utkin 2021). 
SVM employs hyperplanes to separate classes and maximize the margin, making it suitable for handling complex relationships and high-dimensional data (Hussain 2019). MLP is an artificial neural network architecture comprising interconnected nodes that process input data and learn complex patterns (Bihl et al. 2023). In parallel with the adoption of ML algorithms for predictive modeling, the integration of GUI models has become pivotal. GUIs offer intuitive platforms for researchers to interact with complex ML algorithms, streamlining model development, parameter tuning, and result visualization (Suthers et al. 2021). This enhanced accessibility promotes broader adoption of ML techniques, empowering researchers to tackle complex phenomena like PFR prediction in biochar with greater efficiency and accuracy.

In order to address the dearth of studies on predicting the PFRs in biochar, we developed five ML-based regression and classification models: RF, extreme gradient boosting (XGBoost), light gradient boosting machines (LGBM), SVM, and MLP. The primary objective of this study is to elucidate the systematic application of ML tools for predictive analytics and their utilization in extracting valuable insights into the formation process of PFRs during the biochar preparation across different feedstock. A data-driven approach was applied and illustrated in Supporting Information Fig. S1. Firstly, data collection and preprocessing were conducted to ensure its quality and suitability for analysis. Secondly, a comprehensive descriptive statistical analysis was performed, and the Spearman correlation coefficient (SCC) was calculated to identify potential relationships between variables. Thirdly, both regression and classification algorithms were developed and compared to predict PFRs. Feature analysis based on the results of predictive ML models was then carried out. Lastly, a Graphical User Interface (GUI) was developed to enhance the accessibility in predicting PFRs in biochar.

2 Methodology

2.1 Data collection, imputation, preprocessing, and correlation analysis

For data collection, a comprehensive literature review was performed on articles published in the Web of Science and Scopus databases using the keywords “Biochar” AND “EPFR”, “Biochar” AND “Environmentally persistent free radicals”, “Biochar” AND “PFR”, and “Biochar” AND “Persistent free radicals”. The keyword search returned over 110 experimental works, which were then manually screened to retain those reporting data for two empirical categories, (1) biochar properties and (2) PFRs properties, covering the 9 features and the 2 predictors. As a result, 30 articles were shortlisted and deemed suitable for data collection (Table S1). All screened data were initially accepted impartially, without any prior judgment or bias regarding their validity. It should be highlighted that a substantial portion of our dataset originated from assays utilizing the DPPH standard coupled with EPR detection. Other standards, such as Cr³⁺ diamond, have seen limited use in preceding studies, with only one study identified that employed Cr³⁺ for PFRs detection. All data points were carefully checked to avoid duplicate or multiple entries. For articles whose data were not directly listed in tables or text, the open-source WebPlotDigitizer software was used to extract the necessary data from their figures.

The detailed procedure of ML exploration associated with PFRs in biochar is illustrated in Fig. S1. Ten parameters were identified from biochar properties as input features, namely pyrolysis temperature (PT [°C]), specific surface area (SSA [m² g⁻¹]), doping index (DI), feedstock (FS), retention time (RT [min]), pH, carbon content (C% [wt %]), oxygen content (O% [wt %]), hydrogen content (H% [wt %]), and nitrogen content (N% [wt %]), and 2 parameters were defined as output predictors, namely the content (10¹⁵ spins g⁻¹) and types (g-factor) of PFRs. For the classification of PFR types, three distinct g-factor classes exist: g1 denotes carbon-centered radicals with values below 2.0030, g2 corresponds to carbon-centered radicals with an adjacent oxygen atom with values in the range of 2.0030–2.0040, and g3 designates oxygen-centered radicals with g-factors exceeding 2.0040 (Tian et al. 2009; Ruan et al. 2019). Among these, g1 PFRs are more stable and less reactive than g3 PFRs, which is attributed to the lower electronegativity of carbon compared with oxygen (Dellinger et al. 2007). In the total dataset, 11, 9, and 170 data points were missing for SSA, C%, and pH, respectively. To fill these gaps and ensure a consistent dataset, an ML model was employed to predict the missing values of SSA and C% from other features, following the methodology of Yuan et al. (2021). In summary, highly correlated features, which included SSA, C%, and PT, were identified by SCC analysis (as shown in Fig. 1i). These features were used to impute missing values with the RF model. The RF model was trained on the available data points, while the missing data points were treated as testing data and predicted with the trained model. The feature pH was excluded from the dataset because too few data were available (Palansooriya et al. 2022). 
Finally, 9 input features and 2 output predictors, comprising 253 data points were obtained and used for machine learning exploration (Table S2).
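The imputation step described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `RandomForestRegressor` and synthetic stand-in values; the helper name `impute_with_rf`, the number of trees, and the toy relationships among PT, C%, and SSA are assumptions for illustration and are not taken from the paper or from Yuan et al. (2021).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def impute_with_rf(X_pred, target, random_state=0):
    """Fill NaN entries of `target` with predictions from a random forest.

    X_pred : (n, k) array of fully observed, correlated predictor columns.
    target : (n,) array in which np.nan marks the missing values.
    """
    target = np.asarray(target, dtype=float)
    missing = np.isnan(target)
    if not missing.any():
        return target.copy()
    model = RandomForestRegressor(n_estimators=200, random_state=random_state)
    # Rows with a known target act as training data ...
    model.fit(X_pred[~missing], target[~missing])
    filled = target.copy()
    # ... and rows with a missing target are treated as "test" rows.
    filled[missing] = model.predict(X_pred[missing])
    return filled


# Synthetic stand-in data (hypothetical, NOT the values of Table S2).
rng = np.random.default_rng(0)
pt = rng.uniform(200, 700, size=120)                      # pyrolysis temperature
c_pct = 30 + 0.06 * pt + rng.normal(0, 3, size=120)       # carbon content
ssa = 0.3 * pt + 0.5 * c_pct + rng.normal(0, 10, size=120)

ssa_obs = ssa.copy()
ssa_obs[rng.choice(120, size=11, replace=False)] = np.nan  # hide 11 SSA values

X_pred = np.column_stack([pt, c_pct])
ssa_filled = impute_with_rf(X_pred, ssa_obs)
```

Observed values are left untouched and only the hidden entries are replaced, which mirrors the train-on-available, predict-on-missing scheme described in the text.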

To rectify the imbalance within the dataset, specifically with respect to the DI feature, an up-sampling technique was implemented (detailed information is available in Text S1; Fernández et al. 2018). Subsequently, to help the ML models converge rapidly during training, the input features were encoded and normalized using Scikit-Learn. Our training dataset comprises 2 categorical (DI and FS) and 7 numerical features. Categorical features must be converted into numerical variables before the ML models can process them. Hence, we utilized LabelEncoder for DI and OrdinalEncoder for FS (McGinnis et al. 2018). Numerical features were uniformly scaled using StandardScaler, which standardizes each feature to zero mean and unit variance so that all features share a similar scale.
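The encoding and scaling steps named above can be sketched with scikit-learn as below. The four toy rows, the category strings for DI, and the category order passed to `OrdinalEncoder` are illustrative assumptions; the paper does not state which integer code each category received.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder, StandardScaler

# Toy rows (hypothetical values, not taken from Table S2).
di = np.array(["non-doped", "doped", "non-doped", "doped"])   # doping index
fs = np.array([["WLCB"], ["NWLCB"], ["NLCB"], ["CP"]])        # feedstock class
numeric = np.array([                                          # columns: PT, SSA
    [300.0, 12.5],
    [500.0, 80.1],
    [700.0, 210.0],
    [400.0, 45.3],
])

# LabelEncoder sorts class names alphabetically: "doped" -> 0, "non-doped" -> 1.
di_encoded = LabelEncoder().fit_transform(di)

# OrdinalEncoder with an explicit (assumed) category order.
fs_encoder = OrdinalEncoder(categories=[["WLCB", "NWLCB", "NLCB", "CP"]])
fs_encoded = fs_encoder.fit_transform(fs)

# StandardScaler centres each column to mean 0 and scales it to unit variance.
numeric_scaled = StandardScaler().fit_transform(numeric)

# Final design matrix: encoded categorical columns followed by scaled numeric ones.
X = np.column_stack([di_encoded, fs_encoded.ravel(), numeric_scaled])
```

In a train/test workflow, the scaler would normally be fitted on the training subset only and then applied to the test subset, so that test statistics do not leak into training; whether the authors did so is not stated in the text.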

Descriptive statistical analysis of the input and target features was conducted, and the SCC was used to investigate the correlation among input features. SCC values fall in the range of −1 to 1, where 0 indicates no monotonic correlation, and a value close to −1 or 1 indicates a strong negative or positive correlation, respectively.
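As a brief illustration of how the SCC is obtained, the sketch below ranks each variable (averaging tied ranks) and computes the Pearson correlation of the ranks. The function names and the six toy data points are assumptions for illustration, not values from the dataset.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values (hypothetical): SSA rises monotonically but nonlinearly with PT.
pt = [200, 300, 400, 500, 600, 700]
ssa = [2.0, 5.0, 30.0, 90.0, 150.0, 230.0]
o_pct = [40, 35, 36, 25, 20, 15]

rho_pt_ssa = spearman(pt, ssa)      # perfectly monotonic increase
rho_pt_o = spearman(pt, o_pct)      # mostly decreasing, one swapped pair
```

Because only the ranks enter the calculation, the PT–SSA pair reaches a coefficient of 1 even though the relationship is far from linear.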

2.2 Model construction and hyperparameter optimization

Five ML algorithms (RF, XGBoost, SVM, LGBM, and MLP) were compared and evaluated as regressors and classifiers to predict the contents and types of PFRs in biochar, respectively. These ML models have proven suitable and successful for midsize datasets (Zhu et al. 2019; Li et al. 2020, 2021). A description of the target features used in the ML models for PFRs prediction in biochar is available in Text S2.
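For the classification target, the g-factor thresholds given in Section 2.1 translate into a simple labelling rule. The sketch below is one possible reading; the function name `pfr_type` is hypothetical, and treating both boundaries (2.0030 and 2.0040) as part of g2 is an assumption based on the stated ranges.

```python
def pfr_type(g_factor):
    """Map a measured g-factor to the PFR class used as classification target.

    g1: carbon-centered radicals, g < 2.0030
    g2: carbon-centered radicals with adjacent oxygen, 2.0030 <= g <= 2.0040
    g3: oxygen-centered radicals, g > 2.0040
    """
    if g_factor < 2.0030:
        return "g1"
    if g_factor <= 2.0040:
        return "g2"
    return "g3"


# The minimum, mean, and maximum g-factors quoted in Section 3.1.
labels = [pfr_type(g) for g in (2.0020, 2.0033, 2.0049)]
```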

The entire dataset, containing 253 data points, was subjected to multi-training by splitting it into randomly chosen training and testing subsets. Thus, 85% of the total data points were randomly selected and labeled as training data, and the remaining 15% were labeled as test data for the final evaluation of the developed models. During the training phase, we adjusted the hyperparameters of each algorithm to minimize the mean-squared error for PFRs prediction using five-fold cross-validation. Furthermore, to ensure the robustness of the model, we conducted a Y-scrambling analysis (Moore et al. 2022). Each machine learning algorithm was tuned with its own hyperparameters. For instance, the ensemble models (RF, XGBoost, LGBM) required fine-tuning of parameters including the number of trees, the depth of each tree, and max_features. SVM hyperparameters, including epsilon (ε), kernel function, and penalty (α), were also optimized. Additionally, the MLP configuration involved fine-tuning of parameters such as hidden layer sizes, activation function, and learning rate to improve model convergence (Table S3; Palansooriya et al. 2022). To explore these parameters comprehensively, we utilized a param_grid containing various combinations of hyperparameters. GridSearchCV was then employed to search systematically through all combinations of these hyperparameters using cross-validation, aiming to identify the combination that maximizes the specified scoring metric.
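The split-and-search procedure can be sketched as follows. This is a hedged illustration only: the data are synthetic, a `RandomForestRegressor` stands in for the tuned models, and the grid values are arbitrary placeholders rather than the grids of Table S3.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 253 x 9 feature matrix (NOT the real dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(253, 9))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=253)

# 85% training data, 15% held-out test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)

# Placeholder grid over the ensemble hyperparameters named in the text.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 6],
    "max_features": [0.5, 1.0],
}

# Five-fold cross-validation; maximizing negative MSE minimizes the MSE.
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)

# The best configuration is refitted on all training data by default,
# and the untouched test subset is used once for the final evaluation.
test_r2 = search.best_estimator_.score(X_test, y_test)
```

Keeping the test subset outside the grid search means the reported test score is not influenced by hyperparameter selection.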

2.3 Performance evaluation of regression and classification models

The assessment of regression model performance relied on metrics such as the coefficient of determination (R²) and the root mean square error (RMSE) (Zhu et al. 2019; Hu et al. 2022). R² and RMSE values were calculated using Eqs. (1) and (2), respectively.

$${R}^{2}=1-\frac{{\sum }_{n=1}^{N} (\hat{{\text{y}} }-y{)}^{2}}{{\sum }_{n=1}^{N} (y-\bar{{\text{y}} }{)}^{2}}$$

(1)

$$RMSE=\sqrt{\frac{{\sum }_{n=1}^{N} (\hat{{\text{y}} }-y{)}^{2}}{N}}$$

(2)

where ŷ, y, and y̅ are the predicted, actual, and mean values of the target feature, respectively, n is the data point at any given instance, and N is the total number of data points.
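A short worked example of the two regression metrics, using the definitions of the predicted, actual, and mean values given above, is shown below. The four data points are made up for illustration.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(ss_res / len(y_true))


# Hypothetical actual and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

r2_value = r2_score(y_true, y_pred)    # SS_res = 0.10, SS_tot = 5.0
rmse_value = rmse(y_true, y_pred)      # sqrt(0.10 / 4)
```

Here the residual sum of squares is 0.10 and the total sum of squares is 5.0, so R² is 0.98 and the RMSE is about 0.158.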

The performance of classification models was evaluated by AUROC and Confusion Matrix (CM), which are described in Text S3. In the ROC curve, the false positive rate (FPR) on the x-axis and the true positive rate (TPR) on the y-axis were calculated through Eqs. (3) and (4), respectively.

$$TPR =\frac{TP}{TP+FN} \times 100 \left(\%\right)$$

(3)

$$FPR =\frac{FP}{TN+FP} \times 100 \left(\%\right)$$

(4)
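The area under the ROC curve built from these TPR and FPR values equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses that equivalence for a single binary (one-vs-rest) problem; the labels and scores are hypothetical, and applying it class by class to g1, g2, and g3 is an assumption about how a three-class AUROC would be assembled.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    labels : 1 for the positive class, 0 for every other class.
    scores : predicted score or probability for the positive class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical one-vs-rest example: 1 = "g1", 0 = "not g1".
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc_value = auroc(labels, scores)   # 8 of the 9 pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.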

2.4 Model interpretation and feature importance

The ML model was utilized to investigate the significance and influence of each input feature on the target features related to PFRs. Feature analysis was conducted using the SHapley Additive exPlanations (SHAP) method, a widely employed technique in feature analysis (Text S4; Lundberg et al. 2018; Onsree et al. 2022; Prasertpong et al. 2023). All modeling tasks were executed in Python (version 3.9), utilizing open-source libraries: Pandas and NumPy for data processing; Scikit-learn for encoding, scaling, handling the dataset’s imbalanced nature, and imputation of missing values; seaborn and matplotlib for visual representation; and SHAP for model interpretation and feature importance assessment.
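To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny hypothetical model by averaging each feature's marginal contribution over all feature orderings. It is an illustration of the underlying definition only: the toy model, its coefficients, and the single-baseline treatment of "absent" features are assumptions, and the SHAP library uses far more efficient algorithms than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Features not yet "added" keep their baseline value; each feature is
    credited with the change in prediction when it switches to x[i].
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [p / len(orderings) for p in phi]


def toy_model(features):
    """Hypothetical PFR-content model with a PT x C% interaction term."""
    pt, c_pct, doped = features
    return 0.5 * pt + 2.0 * c_pct + 100.0 * doped + 0.01 * pt * c_pct


x = [500.0, 70.0, 1.0]          # sample to explain: PT, C%, doped flag
baseline = [400.0, 50.0, 0.0]   # reference sample
phi = shapley_values(toy_model, x, baseline)
```

The three attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that SHAP summary and bar plots rely on.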

3 Results and discussion

3.1 Descriptive statistics and correlation analysis

The raw data, comprising 253 data points, underwent a descriptive analysis to obtain preliminary insights into all input features and target variables. This analysis involved computing the minimum, maximum, and average values for numerical features along with frequency counts for categorical features. The contents and types of PFRs are usually determined by the FS, PT, SSA, and other properties of biochar (Yuan et al. 2022). Concerning the output variables, the content of PFRs based on literature data ranged from 0.103 × 10¹⁵ spins g⁻¹ to 5210 × 10¹⁵ spins g⁻¹, with a mean value and standard deviation (SD) of 330 × 10¹⁵ spins g⁻¹ and 738 × 10¹⁵ spins g⁻¹, respectively (Fig. 1a). The g-factor of PFRs in the literature varied from 2.0020 to 2.0049, with a mean value of 2.0033 and an SD of 0.00060 (Fig. 1b). For the input variables, the reported values of PT varied from 200°C to 700°C (Fig. 1c). The SSA ranged from 1.65 m² g⁻¹ to 231 m² g⁻¹, with a mean value of 86.5 m² g⁻¹ and an SD of 56 m² g⁻¹ (Fig. 1d). The mean ± SD values for C%, O%, H%, and N% were 49.5 ± 20.2, 32.0 ± 11.0, 6.0 ± 2.0, and 1.95 ± 1.55 wt %, respectively (Fig. 1e and f). The categorical features FS and DI underwent ordinal and label encoding, respectively. The value counts for biochar with and without doping were 74 and 179, respectively (Fig. 1g). Meanwhile, the FS value counts for woody lignocellulosic biochar (WLCB), non-woody lignocellulosic biochar (NWLCB), non-lignocellulosic biochar (NLCB), and co-pyrolysis of different feedstock (CP) were 71, 74, 77, and 31, respectively (Fig. 1h).

The SCC was utilized to examine the relationship among features (Fig. 1i), including the two categorical features. Strong positive correlations (SCC values > 0.75) were observed between PT and SSA and between SSA and C%, while the most pronounced negative correlations were observed between C% and H% (−0.58), PT and O% (−0.53), and PT and H% (−0.51). Generally, higher PT results in biochar characterized by increased SSA and reduced atomic ratios of hydrogen to carbon (H/C) and oxygen to carbon (O/C) (Xiao et al. 2016; Wang et al. 2022). These changes are attributed to enhanced porosity, dehydration and devolatilization, carbonization, and reduction of functional groups at elevated temperatures (Bushra and Remya 2020; Yang et al. 2023). They result from the decomposition of volatile compounds, which leads to a higher proportion of carbon relative to hydrogen and oxygen in the biochar. The mix of positive and negative correlations identified among the input features supports their retention in building an effective predictive model, as each feature contributes to the model’s predictive capacity.

3.2 Development of regression and classification predictive models

Five ML models (RF, XGBoost, LGBM, SVR, and MLP) were developed and evaluated for their capability to predict the content and types of PFRs in the dataset using the input features described in the data preprocessing section. Table 1 presents a comparison of the performance of the ML models. The tree-based models demonstrated comparable performance for predicting PFRs contents (spins g⁻¹), with XGBoost and RF achieving the highest R² values of 0.98, closely followed by LGBM with an R² of 0.95. However, SVR and MLP exhibited relatively lower R² values of 0.78 and 0.87, respectively (Fig. S2 and Table 1). For predicting PFR types (g-factor), XGBoost exhibited the highest performance with an accuracy of 0.92, followed by LGBM and RF with accuracies of 0.90 (Fig. S3 and Table 1). The MLP and SVM models exhibited slightly lower accuracies of 0.87 and 0.86, respectively. Notably, the SVM-based and MLP models achieved relatively lower R² and accuracy values for PFR prediction (Table 1). This disparity in performance could be attributed to the small and imbalanced dataset used in the study, which may not fully capture the intricacies of the underlying patterns and relationships in the target population, particularly when certain classes are underrepresented (Jordan and Mitchell 2015). Previous studies have highlighted that ensemble models, such as XGBoost, tend to perform better on smaller datasets because they can handle such complexities while being less sensitive to overfitting (Golden et al. 2019; Zhu et al. 2020), whereas SVM and MLP provide optimal performance for large datasets (Padarian et al. 2019).

Table 1 Comparative evaluation of regression and classification models

Full size table

XGBoost emerged as the most proficient predictive model for both PFR content and types. To assess its feasibility on the test dataset, its performance is visually depicted in Fig. 2, which encompasses data from 77 samples. In Fig. 2a, a joint scatterplot illustrates the relationship between the actual and predicted values of PFR content in biochar. The XGBoost model exhibits remarkable predictive capability for PFRs in biochar, as evidenced by the test R² value of 0.95, highlighting its robust generalization to unseen data. Figure 2b illustrates a close match between actual values and model predictions, indicating that the XGBoost model accurately captures the underlying relationships between biochar properties and the content of PFRs. It should be noted that there are no evident signs of overfitting or underfitting in Fig. 2b, as the predicted values, represented by the blue dotted line, follow a smooth trajectory without significant deviations from the actual data points, implying a balanced model complexity (Ma et al. 2023).

For the prediction of PFRs types, Fig. 2c displays the ROC curves of the XGBoost classifier for each class of PFRs (g1, g2, and g3). The ROC curve for g1 achieved an AUC of 0.92, while g2 and g3 had AUCs of 0.89 and 0.98, respectively. Moreover, the mean AUC, averaged across all three labels, was 0.92. In Fig. 2d, the XGBoost model's classification performance was evaluated on the three labels g1, g2, and g3. The confusion matrix shows 22 correct predictions for g1, with 6 and 0 samples misclassified as g2 and g3, respectively. For g2, the model correctly predicted 23 samples but misclassified 4 as g1 and 3 as g3. Similarly, for g3, the model accurately classified 12 samples while misclassifying 1 as g1 and 1 as g2. The combined results from the AUROC and CM plots confirm the effectiveness of the XGBoost model in classifying PFRs into their respective classes.
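The confusion-matrix counts quoted above can be converted into per-class TPR and FPR values following Eqs. (3) and (4), treating each class in turn as the positive class (one-vs-rest). The sketch below is an illustrative calculation; the row and column layout (rows = actual class, columns = predicted class) and the helper name are assumptions.

```python
# Counts as reported in the text for Fig. 2d.
# Rows: actual class; columns: predicted class; order g1, g2, g3.
class_names = ["g1", "g2", "g3"]
cm = [
    [22, 6, 0],
    [4, 23, 3],
    [1, 1, 12],
]


def one_vs_rest_rates(cm, k):
    """Return (TPR %, FPR %) for class index k, as in Eqs. (3) and (4)."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # actual k, predicted otherwise
    fp = sum(row[k] for row in cm) - tp     # predicted k, actually otherwise
    tn = total - tp - fn - fp
    tpr = tp / (tp + fn) * 100
    fpr = fp / (tn + fp) * 100
    return tpr, fpr


rates = {name: one_vs_rest_rates(cm, k) for k, name in enumerate(class_names)}
```

For example, g1 has 22 true positives and 6 false negatives, giving a TPR of 22/28, while 5 of the 44 non-g1 samples are predicted as g1, giving its FPR.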

Additionally, we conducted a permutation test, specifically Y-scrambling, to further evaluate the model's performance and confirm that the models were not randomly obtained. This involved scrambling the labels of PFRs within the training set 100 times, creating 100 pseudo training sets. For each pseudo training set, the model was built using 80% of the data with optimized parameters and its performance was evaluated based on the remaining 20%. Subsequently, the predictive performances of these 100 pseudo training sets were compared with those of the original training set (Table S4). The analysis revealed a significant difference between the original data and the Y-scrambled data, indicating that the original models outperform random chance.
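The Y-scrambling procedure can be sketched as below. This is a simplified stand-in under stated assumptions: the data are synthetic, a one-feature least-squares line replaces the tuned XGBoost model, and the held-out R² is used as the performance measure; only the 100 permutations and the 80/20 split follow the text.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r2(ys, preds):
    """Coefficient of determination of predictions against actual values."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def holdout_r2(xs, ys, n_train):
    """Fit on the first n_train points and score on the remaining ones."""
    slope, intercept = fit_line(xs[:n_train], ys[:n_train])
    preds = [slope * x + intercept for x in xs[n_train:]]
    return r2(ys[n_train:], preds)


rng = random.Random(0)
xs = [rng.uniform(200, 700) for _ in range(200)]       # hypothetical feature
ys = [0.8 * x + rng.gauss(0, 20) for x in xs]          # hypothetical target
n_train = int(0.8 * len(xs))                           # 80% build, 20% evaluate

original_score = holdout_r2(xs, ys, n_train)

scrambled_scores = []
for _ in range(100):
    y_perm = ys[:]              # copy, then break the feature-target link
    rng.shuffle(y_perm)
    scrambled_scores.append(holdout_r2(xs, y_perm, n_train))
```

If the original score lies far above every score obtained from the scrambled labels, the model has learned a genuine feature-target relationship rather than a chance pattern.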

3.3 Model-based interpretation and feature exploration

Feature importance analysis was conducted with the fine-tuned XGBoost model to evaluate the relative importance of biochar properties in predicting the contents and types of PFRs by the regression and classification models (Fig. 3). The bee-swarm summary plot in Fig. 3a provides an overview of the correlation and directionality between the biochar properties and their corresponding Shapley values for predicting PFRs content. The bar plot in Fig. 3b displays the contribution and average impact of each biochar property on the prediction of the three types of PFRs using mean SHAP values. The important features for predicting the content of PFRs were ranked as follows: DI, PT, C%, SSA, FS, H%, O%, N%, and RT. Important features for the prediction of the PFR types were identified in the order of SSA, PT, C%, FS, O%, H%, N%, RT, and DI.

Based on the outcomes of the SHAP analysis, the top four influential features for PFR content (DI, PT, C%, and SSA) and for PFR types (SSA, PT, C%, and FS) were explored for their intrinsic correlation with PFRs. FS is considered to play a pivotal role in PFRs, despite the diversified elemental composition of various feedstocks. A consistent PFR trend was observed when the PT ranged from 200 to 700°C (Deng et al. 2020). Although FS appears to exert a relatively lesser influence in our study, its role in predicting PFRs is detailed in Text S5. Furthermore, to analyze the impact of each data point and its effect on predicting PFRs, SHAP partial dependence plots (SPDP) and kernel density estimation (KDE) plots were employed, as depicted in Figs. 4 and 5, considering both categorical and numerical factors. Moreover, to corroborate the reliability of the SPDP, box and count plots with binned features for PFRs content and types are presented in Figs. 4a-d and 5a-d, respectively. Remarkably, DI emerged as the most significant contributor to the content of PFRs within biochar, as evidenced in Fig. 3a. The SPDP boxplot analysis, as illustrated in Fig. 4a, revealed a positive correlation between PFRs content and DI. Regarding the variation in PFRs content within biochar, we observed a range of 1–1000 × 10¹⁵ spins g⁻¹ for non-doped biochar, 2000 × 10¹⁵ spins g⁻¹ for copper-doped biochar, and 3500–4500 × 10¹⁵ spins g⁻¹ for zinc-, nickel-, and iron-doped biochar. Notably, the highest PFR content, reaching up to 6000 × 10¹⁵ spins g⁻¹, was observed in non-metal-doped biochar enriched with N and S in the box plot of Fig. 4a (Yu et al. 2020; Zhang et al. 2022; Zhang and Zhao 2022). These findings align with previous research, affirming that doping elements enhance PFRs content in biochar. For example, Yu et al. (2020) employed electron paramagnetic resonance (EPR) spectroscopy to demonstrate the capacity of N-doping in biochar. 
In a study involving antibiotic degradation by various biochars with elemental doping, the highest PFRs content was found to be 9.23 × 10¹⁹ spins g⁻¹ in N-doped biochar, followed by 6.10 × 10¹⁹ spins g⁻¹ in S-doped and 4.36 × 10¹⁹ spins g⁻¹ in NS-doped biochar, compared to 2.45 × 10¹⁸ spins g⁻¹ in non-doped biochar (Zhang et al. 2022). This doping process redistributes electrons to neighboring carbon atoms through interconnected π-conjugated networks of polymeric carbon nitride, consequently increasing the PFRs content in biochar. Meanwhile, N or S doping also modifies the internal structures and defect values (I_D/I_G) of biochar, enhancing PFRs contents without affecting the g-factor of PFRs (Zhang and Zhao 2022).

Feature analysis identified PT as the second most important factor influencing both the content and types of PFRs (Fig. 3). The SPDP in Fig. 4b displays a unimodal distribution curve, indicating a positive association between PT and PFRs content within the temperature range of 200–500°C. This relationship significantly enhanced the model’s predictive capacity within this temperature range. However, the impact of PT on PFRs content shifted to negative as temperatures exceeded 500°C. These outcomes show that our predictions are comparable with reported results detected by EPR spectra, suggesting a favorable temperature range of 400–500°C for maximizing PFRs content (Bi et al. 2022). To elucidate the relationship between PT and PFRs types, a KDE plot (Fig. 5b) was employed. This visual representation depicted a gradual increase in g1 and g2 with rising PT, while g3 exhibited a distinct decline. Further analysis examined the specifics of the g1, g2, and g3 distributions. The count plot in Fig. 5b demonstrates that g1 predominantly emerged within the PT range of 400 to 700°C, accompanied by a gradual decrease in its concentration within this interval. In contrast, g3 was primarily observed below 500°C, while g2 was situated between 200 and 600°C. Notably, g1 and g3 were absent at 200°C and 700°C, respectively. The findings suggested that as the temperature increases, PFRs gradually transform from g2 to g3 and then to g1, indicating that elevating the pyrolysis temperature alters the type of PFRs present, consistent with prior research reported by Odinga et al. (2020). This phenomenon may be attributed to the influence of temperature on the oxygen-containing functional groups present in biochar. 
At 300 °C, PFRs primarily consist of oxygen-centered radicals (g3 > 2.0040) or carbon-centered radicals with oxygen atoms (g2: 2.0030–2.0040), originating from phenolic organic compounds in biochar that form stable phenoxy radicals through electron transfer to transition metals. As temperature surpasses 400 °C, carbon-centered radicals (g1 < 2.0030) gradually become the dominant PFRs, forming cyclopentadienyl groups at higher temperatures (Zhang et al. 2020). At 700 °C, PFRs rapidly break down, rendering their signal undetectable (Odinga et al. 2020).

The C% within biochar emerged as a pivotal predictor for PFRs, ranking third in feature importance for both PFRs contents and types (Fig. 3). Previous studies have underscored the significance of C% in influencing g1-type PFR and overall PFRs content (Zhang et al. 2022). Our findings reinforce this relationship, revealing a direct correlation between PFR content and higher C% in biochar, as depicted in Fig. 4c. This connection is strengthened by the elevated SHAP values associated with C%, which enhance the predictive capability of the ML model for PFR content. Notably, our analysis indicated that biochar samples with C% in the 40–80% range exhibit significantly higher PFRs levels. These results could explain the findings of Zhang et al. (2023b), where the addition of polystyrene (PS) during antibiotic degradation resulted in an increased C% and PFRs content within biochar. Additionally, Ni et al. (2020) found that elemental analysis of biochar following PS-addition pyrolysis revealed a remarkable 90% increase in C%, subsequently elevating PFR levels and the prevalence of g1-type (< 2.0030) PFRs within the biochar matrix. Another study corroborated the significance of C% for PFRs, indicating that higher urea proportions led to reduced C% and PFR contents but elevated N% in biochar (Bi et al. 2022). Furthermore, our SCC analysis also indicated a negative correlation between C% and N% in biochar (Fig. 1i). These insights highlight the significant role of C% in PFR formation, with N showing an inverse correlation with PFRs, likely due to the presence of distinct N-types such as pyridine N, pyrrole N, and quaternary N in N-rich biochar, which are absent in pure cellulose biochar (Tian et al. 2013, 2014). Additionally, our study reveals the complex interplay of N% in biochar, influenced by DI and urea addition. 
Non-metal-doped biochar with high N% exhibits elevated PFR content through electron redistribution (Zhang and Zhao 2022), while urea introduction increases N% but decreases PFR content (Bi et al. 2022), indicating a multifaceted relationship between DI, urea, and PFRs that may be influenced by various unknown factors. Further research is needed to elucidate the specific nitrogen-containing compounds generated by urea and their impact on PFR properties. The relationship between C% and the g-factor of PFRs is depicted in the KDE plot in Fig. 5c. Notably, an increase in C% corresponded to a decreasing KDE trend of g3 and a simultaneous increase in g2 and g1. This observation is further reinforced by the count plot, which illustrates that g1 is less likely to be present in biochar with low C% than in biochar with higher C%. Notably, a C% range of 40–80% is most conducive to supporting the presence of g1-type PFRs, while g3 is more likely to be associated with biochar containing less than 60% carbon. Additionally, g2 tends to fall within the mid-range of C%. These findings underscore the crucial role of C% in biochar, which amplifies the production of PFRs, particularly those characterized by carbon-centered radicals (g1 < 2.0030) (Ni et al. 2020; Zhang et al. 2023b).
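To make the KDE comparison concrete, the following is a minimal sketch of a per-class Gaussian kernel density estimate over C%. It is not the plotting code used in this study. The function `gaussian_kde`, the bandwidth of 5.0, and the C% values are all hypothetical placeholders chosen only so that the example runs.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centered on each observed sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical C% values per PFR type (placeholders, not data from the study).
c_percent_by_type = {
    "g1": [62.0, 68.5, 71.0, 75.2, 78.0],
    "g3": [35.0, 41.5, 44.0, 48.3, 55.0],
}
densities = {
    label: gaussian_kde(values, bandwidth=5.0)
    for label, values in c_percent_by_type.items()
}

for c in (45.0, 70.0):
    print(c, {label: round(f(c), 4) for label, f in densities.items()})
```

Comparing the two density curves at a given C% shows which type is more typical there under these assumed inputs; the choice of bandwidth controls how smooth each curve is.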

SSA of biochar was identified as an important feature in our study, ranking first and fourth in predicting the g-factor and spins g⁻¹ of PFRs in biochar, respectively (Fig. 3). SSA of biochar increases as the PT increases, leading to a higher content of carbon-centered PFRs compared to lower SSA (Zhang et al. 2022). Our results in the KDE plot (Fig. 5a) indicated a positive contribution to g1-type PFRs with increasing SSA, alongside a decrease in g2- and g3-type PFRs. Further insights from the SHAP values within the KDE plot reveal that an increase in SSA enhances the prediction accuracy of g1, while slightly reducing the predictive performance of g2 and g3, particularly up to an SSA of 150 m² g⁻¹. A closer examination using the count plots in Fig. 5a further supports these findings. They reveal that g1 tends to be more prevalent within the SSA range of 100–200 m² g⁻¹, whereas g3 was predominantly present at SSA below 50 m² g⁻¹. Interestingly, g2 appeared primarily within the SSA range of 50–150 m² g⁻¹, reinforcing the intricate interplay between SSA and the composition of PFRs in biochar. The correlation between SSA and the contents of PFRs is shown in Fig. 4d, which depicts higher PFR content within the SSA range of 100–150 m² g⁻¹. SHAP values indicated a positive influence on PFR content prediction up to an SSA of 100 m² g⁻¹, beyond which a diminishing trend became apparent, potentially impacting the model's predictive performance. While these findings may deviate from some prior studies, the limited data available at higher SSA levels could contribute to this disparity. Further analysis brought to light an elevated N% in biochar collected at increased SSA, suggesting a complex interaction. The SHAP interaction plot in Fig. S6 was used to unveil the interplay between higher N%, SSA, and PFR content, which influences the model's predictive accuracy. 
Notably, an increase in N% within nitrogen-rich biochar affected PFR contents without altering the g-factor (Bi et al. 2022). Our findings align with previous studies, suggesting that PFRs content is mainly related to the SSA of biochar and the degree of defect structure (I_D/I_G values) of raw material (Zhang and Zhao 2022). It may be attributed to the availability of more active sites in biochar, which positively contributes to the content and carbon-centered PFRs (Zhang and Zhao 2022). These results enhance our understanding of the complex relationship between SSA, N and PFRs formation in biochar.
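For readers unfamiliar with how SHAP values split a prediction among interacting features, the following sketch computes exact Shapley values by brute force for a toy two-feature function. This is the underlying definition, not the tree-based algorithm of Lundberg et al. (2018) applied to the XGBoost models. The function `toy_model`, its coefficients, and the scaled inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    # Hypothetical response with an SSA x N% interaction term (scaled inputs).
    ssa, n_percent = features
    return 2.0 * ssa + 1.0 * n_percent + 0.5 * ssa * n_percent


x = [1.0, 2.0]         # hypothetical scaled SSA and N%
baseline = [0.0, 0.0]  # hypothetical reference sample
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

In this toy case the interaction term is shared equally between the two features, and the attributions sum to the difference between the prediction and the baseline output.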

3.4 Development of graphical user interface (GUI) tool

To ensure the accessibility of our prediction models to scientists and practitioners, a web-based GUI tool was developed (refer to Fig. S8). This tool accepts biochar properties as input variables and provides two essential outputs: it employs regression analysis to predict the content of PFRs (spins g⁻¹), and it uses classification analysis to predict the types of PFRs (g-factors, including g1, g2, and g3) for practical real-world applications. The tool not only streamlines the prediction process but also equips researchers and scientists in the biochar domain with a resource for comprehensive PFR analysis. Consequently, it presents a valuable opportunity to save both time and resources in research and engineering endeavors focused on exploring PFRs within biochar materials.
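The dual-output design can be illustrated with a short sketch in which one set of inputs is passed to both a regressor and a classifier. This is not the code behind the published tool. The class `BiocharSample`, its field names, the function `predict_pfrs`, and both placeholder models are hypothetical, and the placeholders stand in for the trained XGBoost models only so that the example runs.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Union

Features = Dict[str, float]


@dataclass
class BiocharSample:
    # Hypothetical subset of input features; the real tool may use others.
    pyrolysis_temperature_c: float
    specific_surface_area_m2_g: float
    carbon_percent: float
    nitrogen_percent: float


def predict_pfrs(
    sample: BiocharSample,
    regressor: Callable[[Features], float],
    classifier: Callable[[Features], str],
) -> Dict[str, Union[float, str]]:
    """Run both models on one sample and return content and type together."""
    features = asdict(sample)
    return {
        "pfr_content_spins_per_g": regressor(features),
        "pfr_type": classifier(features),
    }


# Placeholder models; they are NOT the trained XGBoost models.
def dummy_regressor(features: Features) -> float:
    return 0.0  # dummy value, not a prediction


def dummy_classifier(features: Features) -> str:
    # Crude rule loosely following the text: g1 dominates above 400 °C.
    return "g1" if features["pyrolysis_temperature_c"] > 400 else "g3"


sample = BiocharSample(500.0, 120.0, 65.0, 1.5)
result = predict_pfrs(sample, dummy_regressor, dummy_classifier)
print(result)
```

Passing the models in as callables keeps the input handling separate from the models themselves, so either model could be swapped without changing the interface.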

4 Conclusion

The double-edged sword property of PFRs presents an intriguing environmental challenge. We aim to harness their beneficial effects, primarily in pollutant cleanup, by increasing their presence while minimizing their detrimental impact when using biochar as a soil amendment for crop growth, wherein lower PFR levels are preferred. However, achieving the desired PFR in biochar is contingent on several factors, presenting a substantial challenge (Wang et al. 2022; Zhang et al. 2022; Prasertpong et al. 2023). Moreover, the exclusive reliance on EPR for PFR detection poses limitations, as EPR is not widely accessible, is often expensive, requires expertise for spectral interpretation, and is inapplicable before biochar preparation, hindering the optimization of preparation parameters for PFRs. This underscores the importance of predictive modeling for PFRs in biochar. In addressing this knowledge gap, we embarked on an ML-based study, incorporating data sourced from reputable journals. This approach resulted in robust predictive performance. Consequently, our study illuminates the role of PFRs in biochar research, effectively filling a critical knowledge void in the field. To enhance the accessibility of PFR prediction for general applications, we have integrated prediction models in a web-based GUI tool. This tool harnesses the power of our ML models to provide a seamless and efficient prediction experience. Furthermore, we plan to incorporate some additional factors such as pH, and heavy metal contents to enhance the model's predictive capabilities for understanding the properties of PFR in various environmental matrices. We aim to advance our comprehension of the environmental implications of biochar and its intricate interactions with PFRs, ultimately contributing to more effective and sustainable environmental applications of biochar.

Availability of data and materials

The datasets used or analyzed during the current study are available in supporting information.

Abbreviations

PFRs: Persistent free radicals
ML: Machine learning
SCC: Spearman correlation coefficient
PT: Pyrolysis temperature
SSA: Specific surface area
DI: Doping index
FS: Feedstock
RT: Retention time
C%: Carbon content
O%: Oxygen content
H%: Hydrogen content
N%: Nitrogen content
GUI: Graphical user interface
R²: Coefficient of determination
RMSE: Root mean square error
AUROC: Area under the receiver operating characteristic curve
CM: Confusion matrix
SPDP: SHAP partial dependence plot
KDE: Kernel density estimation plot

References

Bi D, Huang F, Jiang M, He Z, Lin X (2022) Effect of pyrolysis conditions on environmentally persistent free radicals (EPFRs) in biochar from co-pyrolysis of urea and cellulose. Sci Total Environ 805:150339. https://doi.org/10.1016/j.scitotenv.2021.150339
Bihl T, Young II WA, Moyer A, Frimel S (2023) Artificial Neural Networks and Data Science. In: Encyclopedia of Data Science and Machine Learning. IGI Global, pp 899–921. https://doi.org/10.4018/978-1-7998-9220-5.ch052
Bushra B, Remya N (2020) Biochar from pyrolysis of rice husk biomass—characteristics, modification and environmental application. Biomass Convers Biorefinery 1–12. https://doi.org/10.1007/s13399-020-01092-3
Cha D, Park S, Kim MS, Kim T, Hong SW, Cho KH, Lee C (2020) Prediction of oxidant exposures and micropollutant abatement during ozonation using a machine learning method. Environ Sci Technol 55(1):709–718. https://doi.org/10.1021/acs.est.0c05836
Chen Q, Lan P, Wu M, Lu M, Pan B, Xing B (2022) Biochar mitigates allelopathy through regulating allelochemical generation from plants and accumulation in soil. Carbon Res 1(1):6. https://doi.org/10.1007/s44246-022-00003-7
Chuang GC, Xia H, Mahne SE, Varner KJ (2017) Environmentally persistent free radicals cause apoptosis in HL-1 cardiomyocytes. Cardiovasc Toxicol 17:140–149. https://doi.org/10.1007/s12012-016-9367-x
Dellinger B, Lomnicki S, Khachatryan L, Maskos Z, Hall RW, Adounkpe J, McFerrin C, Truong H (2007) Formation and stabilization of persistent free radicals. Proc Combust Inst 31(1):521–528. https://doi.org/10.1016/j.proci.2006.07.172
Deng R, Luo H, Huang D, Zhang C (2020) Biochar-mediated Fenton-like reaction for the degradation of sulfamethazine: Role of environmentally persistent free radicals. Chemosphere 255:126975. https://doi.org/10.1016/j.chemosphere.2020.126975
Dong X, Ma LQ, Gress J, Harris W, Li Y (2014) Enhanced Cr (VI) reduction and As (III) oxidation in ice phase: important role of dissolved organic matter from biochar. J Hazard Mater 267:62–70. https://doi.org/10.1016/j.jhazmat.2013.12.027
El-Naggar A, Lee SS, Awad YM, Yang X, Ryu C, Rizwan M, Rinklebe J, Tsang DC, Ok YS (2018) Influence of soil properties and feedstocks on biochar potential for carbon mineralization and improvement of infertile soils. Geoderma 332:100–108. https://doi.org/10.1016/j.geoderma.2018.06.017
Fang G, Gao J, Liu C, Dionysiou DD, Wang Y, Zhou D (2014) Key role of persistent free radicals in hydrogen peroxide activation by biochar: implications to organic contaminant degradation. Environ Sci Technol 48(3):1902–1910
Fang G, Liu C, Wang Y, Dionysiou DD, Zhou D (2017) Photogeneration of reactive oxygen species from biochar suspension for diethyl phthalate degradation. Appl Catal B Environ 214:34–45. https://doi.org/10.1021/es4048126
Fernández A, García S, Galar M, Prati RC, Krawczyk B, Herrera F (2018) Learning from imbalanced data sets. Springer. https://doi.org/10.1007/978-3-319-98074-4
Gao F, Shen Y, Sallach JB, Li H, Liu C, Li Y (2021) Direct prediction of bioaccumulation of organic contaminants in plant roots from soils with machine learning models based on molecular structures. Environ Sci Technol 55(24):16358–16368. https://doi.org/10.1021/acs.est.1c02376
Golden CE, Rothrock MJ Jr, Mishra A (2019) Comparison between random forest and gradient boosting machine methods for predicting Listeria spp. prevalence in the environment of pastured poultry farms. Food Res Int 122:47–55. https://doi.org/10.1016/j.foodres.2019.03.062
Han Z, Leng Y, Sun Z, Lin H, Wang J, Zou J (2024) Machine learning-based estimation and mitigation of nitric oxide emissions from Chinese vegetable fields. Environ Pollut 343:123174. https://doi.org/10.1016/j.envpol.2023.123174
Hu Y, Zhang B, Guo Q, Wang S, Lu S (2022) Characterization into environmentally persistent free radicals formed in incineration fly ash and pyrolysis biochar of sewage sludge and biomass. J Clean Prod 373:133666. https://doi.org/10.1016/j.jclepro.2022.133666
Hussain SF (2019) A novel robust kernel for classifying high-dimensional data using Support Vector Machines. Expert Syst Appl 131:116–131. https://doi.org/10.1016/j.eswa.2019.04.037
Hussain R, Ravi K, Garg A (2020) Influence of biochar on the soil water retention characteristics (SWRC): Potential application in geotechnical engineering structures. Soil Tillage Res 204:104713. https://doi.org/10.1016/j.still.2020.104713
Jordan MI, Mitchell TM (2015) Machine learning: Trends, perspectives, and prospects. Science 349(6245):255–260. https://doi.org/10.1126/science.aaa8415
Kanthasamy R, Almatrafi E, Ali I, Sait HH, Zwawi M, Abnisa F, Peng LC, Ayodele BV (2023) Biochar production from valorization of agricultural Wastes: Data-Driven modelling using Machine learning algorithms. Fuel 351:128948. https://doi.org/10.1016/j.fuel.2023.128948
Konstantinov AV, Utkin LV (2021) Interpretable machine learning with an ensemble of gradient boosting machines. Knowl-Based Syst 222:106993. https://doi.org/10.1016/j.knosys.2021.106993
Li J, Pan L, Suvarna M, Tong YW, Wang X (2020) Fuel properties of hydrochar and pyrochar: Prediction and exploration with machine learning. Appl Energy 269:115166. https://doi.org/10.1016/j.apenergy.2020.115166
Li J, Zhu X, Li Y, Tong YW, Ok YS, Wang X (2021) Multi-task prediction and optimization of hydrochar properties from high-moisture municipal solid waste: Application of machine learning on waste-to-resource. J Clean Prod 278:123928. https://doi.org/10.1016/j.jclepro.2020.123928
Li J, Pan L, Li Z, Wang Y (2023) Unveiling the migration of Cr and Cd to biochar from pyrolysis of manure and sludge using machine learning. Sci Total Environ 885:163895. https://doi.org/10.1016/j.scitotenv.2023.163895
Liang L, Xi F, Tan W, Meng X, Hu B, Wang X (2021) Review of organic and inorganic pollutants removal by biochar and biochar-based composites. Biochar 3:255–281. https://doi.org/10.1007/s42773-021-00101-6
Liao S, Pan B, Li H, Zhang D, Xing B (2014) Detecting free radicals in biochars and determining their ability to inhibit the germination and growth of corn, wheat and rice seedlings. Environ Sci Technol 48(15):8581–8587. https://doi.org/10.1021/es404250a
Liu Z, Xu Z, Xu L, Buyong F, Chay TC, Li Z, Cai Y, Hu B, Zhu Y, Wang X (2022) Modified biochar: synthesis and mechanism for removal of environmental heavy metals. Carbon Res 1(1):8. https://doi.org/10.1007/s44246-022-00007-3
Liu F, Zhu K, Wang Z, Liu J, Ni Z, Ding Y, Zhang C, Jia H (2023) Production of reactive oxygen species and its role in mediating the abiotic transformation of organic carbon in sandy soil under vegetation restoration. Carbon Res 2(1):35. https://doi.org/10.1007/s44246-023-00074-0
Lundberg SM, Erion GG, Lee S-I (2018) Consistent individualized feature attribution for tree ensembles. ArXiv Prepr ArXiv180203888. https://doi.org/10.48550/arXiv.1802.03888
Ma Y, Xie Z, Chen S, Qiao F, Li Z (2023) Real-time detection of abnormal driving behavior based on long short-term memory network and regression residuals. Transp Res Part C Emerg Technol 146:103983. https://doi.org/10.1016/j.trc.2022.103983
McGinnis WD, Siu C, Andre S, Huang H (2018) Category encoders: a scikit-learn-contrib package of transformers for encoding categorical data. J Open Source Softw 3(21):501 https://joss.theoj.org/papers/10.21105/joss.00501.pdf
Moore GJ, Bardagot O, Banerji N (2022) Deep Transfer Learning: A Fast and Accurate Tool to Predict the Energy Levels of Donor Molecules for Organic Photovoltaics. Adv Theory Simul 5(5):2100511. https://doi.org/10.1002/adts.202100511
Ni B-J, Zhu Z-R, Li W-H, Yan X, Wei W, Xu Q, Xia Z, Dai X, Sun J (2020) Microplastics mitigation in sewage sludge through pyrolysis: The role of pyrolysis temperature. Environ Sci Technol Lett 7(12):961–967. https://doi.org/10.1021/acs.estlett.0c00740
Odinga ES, Waigi MG, Gudda FO, Wang J, Yang B, Hu X, Li S, Gao Y (2020) Occurrence, formation, environmental fate and risks of environmentally persistent free radicals in biochars. Environ Int 134:105172. https://doi.org/10.1016/j.envint.2019.105172
Onsree T, Tippayawong N, Phithakkitnukoon S, Lauterbach J (2022) Interpretable machine-learning model with a collaborative game approach to predict yields and higher heating value of torrefied biomass. Energy 249:123676. https://doi.org/10.1016/j.energy.2022.123676
Padarian J, Minasny B, McBratney A (2019) Using deep learning to predict soil properties from regional spectral data. Geoderma Reg 16:e00198. https://doi.org/10.1016/j.geodrs.2018.e00198
Palansooriya KN, Li J, Dissanayake PD, Suvarna M, Li L, Yuan X, Sarkar B, Tsang DC, Rinklebe J, Wang X (2022) Prediction of soil heavy metal immobilization by biochar using machine learning. Environ Sci Technol 56(7):4187–4198. https://doi.org/10.1021/acs.est.1c08302
Pan B, Li H, Lang D, Xing B (2019) Environmentally persistent free radicals: occurrence, formation mechanisms and implications. Environ Pollut 248:320–331. https://doi.org/10.1016/j.envpol.2019.02.032
Prasertpong P, Onsree T, Khuenkaeo N, Tippayawong N, Lauterbach J (2023) Exposing and understanding synergistic effects in co-pyrolysis of biomass and plastic waste via machine learning. Bioresour Technol 369:128419. https://doi.org/10.1021/acs.est.1c08302
Qin J, Chen Q, Sun M, Sun P, Shen G (2017) Pyrolysis temperature-induced changes in the catalytic characteristics of rice husk-derived biochar during 1, 3-dichloropropene degradation. Chem Eng J 330:804–812. https://doi.org/10.1016/j.cej.2017.08.013
Qiu Y, Zhang T, Zhang P (2023) Fate and environmental behaviors of microplastics through the lens of free radical. J Hazard Mater 453:131401. https://doi.org/10.1016/j.jhazmat.2023.131401
Ruan X, Sun Y, Du W, Tang Y, Liu Q, Zhang Z, Doherty W, Frost RL, Qian G, Tsang DC (2019) Formation, characteristics, and applications of environmentally persistent free radicals in biochars: a review. Bioresour Technol 281:457–468. https://doi.org/10.1016/j.biortech.2019.02.105
Suthers PF, Foster CJ, Sarkar D, Wang L, Maranas CD (2021) Recent advances in constraint and machine learning-based metabolic modeling by leveraging stoichiometric balances, thermodynamic feasibility and kinetic law formalisms. Metab Eng 63:13–33. https://doi.org/10.1016/j.ymben.2020.11.013
Tang B, Lu Y, Zhou J, Chouhan T, Wang H, Golani P, Xu M, Xu Q, Guan C, Liu Z (2020) Machine learning-guided synthesis of advanced inorganic materials. Mater Today 41:72–80. https://doi.org/10.1016/j.mattod.2020.06.010
Tian L, Koshland CP, Yano J, Yachandra VK, Yu IT, Lee S, Lucas D (2009) Carbon-centered free radicals in particulate matter emissions from wood and coal combustion. Energy Fuels 23(5):2523–2526. https://doi.org/10.1021/ef8010096
Tian Y, Zhang J, Zuo W, Chen L, Cui Y, Tan T (2013) Nitrogen conversion in relation to NH3 and HCN during microwave pyrolysis of sewage sludge. Environ Sci Technol 47(7):3498–3505. https://doi.org/10.1021/es304248j
Tian K, Liu W-J, Qian T-T, Jiang H, Yu H-Q (2014) Investigation on the evolution of N-containing organic compounds during pyrolysis of sewage sludge. Environ Sci Technol 48(18):10888–10896. https://doi.org/10.1021/es5022137
Ullah H, Khan S, Chen B, Shahab A, Riaz L, Lun L, Wu N (2023) Machine learning approach to predict adsorption capacity of Fe-modified biochar for selenium. Carbon Res 2(1):29. https://doi.org/10.1007/s44246-023-00061-5
Wang Y, Gu X, Huang Y, Ding Z, Chen Y, Hu X (2022) Insight into biomass feedstock on formation of biochar-bound environmentally persistent free radicals under different pyrolysis temperatures. RSC Adv 12(30):19318–19326. https://doi.org/10.1039/D2RA03052G
Wang R, Zhang S, Chen H, He Z, Cao G, Wang K, Li F, Ren N, Xing D, Ho S-H (2023) Enhancing biochar-based nonradical persulfate activation using data-driven techniques. Environ Sci Technol 57(9):4050–4059. https://doi.org/10.1021/acs.est.2c07073
Xiao X, Chen Z, Chen B (2016) H/C atomic ratio as a smart linkage between pyrolytic temperatures, aromatic clusters and sorption properties of biochars derived from diverse precursory materials. Sci Rep 6(1):1–13. https://doi.org/10.1038/srep22644
Yang F, Zhang S, Sun Y, Tsang DC, Cheng K, Ok YS (2019) Assembling biochar with various layered double hydroxides for enhancement of phosphorus recovery. J Hazard Mater 365:665–673. https://doi.org/10.1016/j.jhazmat.2018.11.047
Yang J, Zhang Z, Wang J, Zhao X, Zhao Y, Qian J, Wang T (2023) Pyrolysis and hydrothermal carbonization of biowaste: A comparative review on the conversion pathways and potential applications of char product. Sustain Chem Pharm 33:101106. https://doi.org/10.1016/j.scp.2023.101106
Yu J, Zhu Z, Zhang H, Shen X, Qiu Y, Yin D, Wang S (2020) Persistent free radicals on N-doped hydrochar for degradation of endocrine disrupting compounds. Chem Eng J 398:125538. https://doi.org/10.1016/j.cej.2020.125538
Yuan X, Suvarna M, Low S, Dissanayake PD, Lee KB, Li J, Wang X, Ok YS (2021) Applied machine learning for prediction of CO₂ adsorption on biomass waste-derived porous carbons. Environ Sci Technol 55(17):11925–11936. https://doi.org/10.1021/acs.est.1c01849
Yuan J, Wen Y, Dionysiou DD, Sharma VK, Ma X (2022) Biochar as a novel carbon-negative electron source and mediator: electron exchange capacity (EEC) and environmentally persistent free radicals (EPFRs): a review. Chem Eng J 429:132313. https://doi.org/10.1016/j.cej.2021.132313
Zhang Y, Zhao J (2022) Comparison of different S-doped biochar materials to activate peroxymonosulfate for efficient degradation of antibiotics. Chemosphere 308:136442. https://doi.org/10.1016/j.chemosphere.2022.136442
Zhang Y, Sun X, Bian W, Peng J, Wan H, Zhao J (2020) The key role of persistent free radicals on the surface of hydrochar and pyrocarbon in the removal of heavy metal-organic combined pollutants. Bioresour Technol 318:124046. https://doi.org/10.1016/j.biortech.2020.124046
Zhang Y, Xu M, He R, Zhao J, Kang W, Lv J (2022) Effect of pyrolysis temperature on the activated permonosulfate degradation of antibiotics in nitrogen and sulfur-doping biochar: Key role of environmentally persistent free radicals. Chemosphere 294:133737. https://doi.org/10.1016/j.chemosphere.2022.133737
Zhang P, Liu C, Lao D, Nguyen XC, Paramasivan B, Qian X, Inyinbor AA, Hu X, You Y, Li F (2023a) Unveiling the drives behind tetracycline adsorption capacity with biochar through machine learning. Sci Rep 13(1):11512. https://doi.org/10.1038/s41598-023-38579-8
Zhang Y, Huang Y, Hu J, Tang T, Xu C, Effiong KS, Xiao X (2024) Biochar mitigates the mineralization of allochthonous organic matter and global warming potential of saltmarshes by influencing functional bacteria. Carbon Res 3(1):6. https://doi.org/10.1007/s44246-023-00087-9
Zhang Y, He R, Zhao J, Zhang X, Bildyukevich AV (2023b) Effect of aged biochar after microbial fermentation on antibiotics removal: Key roles of microplastics and environmentally persistent free radicals. Bioresour Technol 374:128779. https://doi.org/10.1016/j.biortech.2023.128779
Zhong S, Zhang K, Bagheri M, Burken JG, Gu A, Li B, Ma X, Marrone BL, Ren ZJ, Schrier J (2021) Machine learning: new ideas and tools in environmental science and engineering. Environ Sci Technol 55(19):12741–12754. https://doi.org/10.1021/acs.est.1c01339
Zhu X, Wang X, Ok YS (2019) The application of machine learning methods for prediction of metal sorption onto biochars. J Hazard Mater 378:120727. https://doi.org/10.1016/j.jhazmat.2019.06.004
Zhu X, Tsang DC, Wang L, Su Z, Hou D, Li L, Shang J (2020) Machine learning exploration of the critical factors for CO₂ adsorption capacity on porous carbon materials at different pressures. J Clean Prod 273:122915. https://doi.org/10.1016/j.jclepro.2020.122915
Zhu Y, Wei J, Li J (2023) Decontamination of Cr (VI) from water using sewage sludge-derived biochar: Role of environmentally persistent free radicals. Chin J Chem Eng 56:97–103. https://doi.org/10.1016/j.cjche.2022.06.015

Acknowledgements

We thank the authors whose work is included in the machine learning analysis and development of GUI predictive models, and express gratitude to all members of the research team for their support and suggestions.

Funding

This study was funded by the National Natural Science Foundation of China (Grant No. 42277030, 42307334), and the Introduction Plan for High end Foreign Experts (Grant No. 110000207920220115).

Author information

Authors and Affiliations

College of Natural Resources and Environment, Northwest A and F University, 3# Taicheng Road, Yangling, 712100, China
Junaid Latif, Na Chen, Kai Li, Jianjun Qin, Huiqiang Yang & Hanzhong Jia
Key Laboratory of Low-Carbon Green Agriculture in Northwestern China, Ministry of Agriculture and Rural Affairs, Yangling, 712100, China
Junaid Latif, Na Chen, Kai Li, Jianjun Qin, Huiqiang Yang & Hanzhong Jia
College of Life Sciences, Northwest A and F University, Yangling, 712100, China
Azka Saleem

Contributions

All authors contributed to the study conception and design. Data collection, methodology and analysis were performed by Junaid Latif, Azka Saleem, Kai Li, Jianjun Qin and Huiqiang Yang. Supervision and funding were provided by Na Chen and Hanzhong Jia. The first draft of the manuscript was written by Junaid Latif and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Na Chen or Hanzhong Jia.

Ethics declarations

Additional note

The link of the web-based software with GUI for the prediction of PFRs in biochar is available at https://EPFRs.pythonanywhere.com/. Additionally, we present the ML pipeline developed in this study in the form of an open-source Python notebook hosted on GitHub “https://www.github.com/junaid1990/PFRs_ML” under the file name “PFRs_ML.ipynb”.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Handling editor: Fengchang Wu.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Latif, J., Chen, N., Saleem, A. et al. Machine learning for persistent free radicals in biochar: dual prediction of contents and types using regression and classification models. Carbon Res. 3, 39 (2024). https://doi.org/10.1007/s44246-024-00125-0

Received: 25 December 2023
Revised: 25 March 2024
Accepted: 27 March 2024
Published: 22 April 2024
DOI: https://doi.org/10.1007/s44246-024-00125-0
