Background

Minimal change disease (MCD) is one of the major causes of nephrotic syndrome (NS) and commonly occurs in children and older adults. The pathogenesis of MCD is not fully understood, and its mechanisms likely differ between age subgroups. The role of podocytes in the development of proteinuria in MCD has received increasing attention in the last decade [1, 2]. MCD is the most common cause of NS in children older than 1 year of age, accounting for 70–90% of patients. In NS patients at puberty, the proportion of MCD decreases significantly as other glomerular diseases, such as membranous nephropathy, become more common causes [3, 4]. MCD is found in approximately 10–15% of adult patients with NS [5, 6] and is the cause of idiopathic NS in 90% of children. Therefore, for children with idiopathic NS, treatment is usually initiated without renal biopsy unless clinical and laboratory evidence suggests another glomerular disease. The causes of NS in adults are more varied, and renal biopsy is usually required for diagnosis [7]. Consequently, the diagnosis of MCD in adults currently relies mainly on renal biopsy.

The advantages of renal biopsy are its relative safety, simplicity, and ease of performance; however, this invasive procedure is not risk-free [8]. The most frequent clinically significant complications following percutaneous renal biopsy include haemorrhage, arteriovenous shunting, infection, nephrectomy and, in rare cases, death. Additionally, renal biopsy cannot be performed in patients with NS for whom it is contraindicated [9, 10] or who refuse the procedure, and some hospitals lack nephrologists with sufficient operative skill to perform it. Therefore, it is necessary and important to explore a non-invasive and practical classification model for distinguishing MCD from non-MCD in adult patients with NS.

It is often difficult to select appropriate predictors for such a classification model and to estimate the regression coefficients of the selected predictors correctly. The traditional method for variable selection is stepwise regression, but the resulting model often has high variance and poor flexibility. In recent years, regularized regression has been a critical breakthrough in regression analysis; its most widely recognized and used method is least absolute shrinkage and selection operator (LASSO) regression, proposed in 1996.

LASSO regression constructs a penalty function on all variable coefficients that shrinks the coefficients of relatively unimportant independent variables to zero, thereby excluding those variables from the model [11, 12]. In clinical research, it is widely used for supervised learning, which starts from the goal of predicting a known output or target. Supervised learning focuses on classification, which involves choosing the one subgroup among several that best describes a new data instance, and on prediction, which involves estimating an unknown parameter. Supervised learning is also often used to estimate risk. Through more than 20 years of unremitting effort by statisticians since R. Tibshirani proposed the LASSO concept in 1996 [13], a relatively complete body of theory has been established. These developments have made statistical inference based on penalized estimation, and hence the establishment of our diagnostic prediction model for MCD, a reality.
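For reference, the penalized objective that LASSO logistic regression minimizes can be written as below. This is the standard textbook form rather than a formula quoted from this study; the symbols (outcome $y_i \in \{0, 1\}$, predictor vector $x_i$, intercept $\beta_0$, coefficients $\beta$, and penalty weight $\lambda$) are notation introduced here for illustration.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\, \beta}
\left\{
  -\frac{1}{n} \sum_{i=1}^{n}
  \left[ y_i \left( \beta_0 + x_i^{\top} \beta \right)
         - \log\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right) \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

The first term is the average negative log-likelihood of the logistic model, and the second is the L1 penalty. As $\lambda$ grows, more coefficients $\beta_j$ are set exactly to zero, which is what removes the corresponding variables from the model.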

To the best of our knowledge, we are the first to have developed a non-invasive diagnostic model using LASSO logistic regression and determined markers for evaluating disease severity by analysing serological parameters and clinical signs in the present study. Moreover, we assessed the predictive accuracy of this model in internal patient test groups. The establishment of the model could provide physicians with an additional tool for clinical diagnostic evaluation of adult patients with NS, especially when renal biopsy is impractical for various reasons.

Method

Patients and clinical information

A total of 1009 inpatients with NS admitted from January 1, 2017 to November 5, 2019 to the Department of Nephrology, the Second Affiliated Hospital of Xi’an Jiaotong University were enrolled in this study. Among them, there were 71 patients with MCD and 938 patients with non-MCD confirmed by renal biopsy.

The inclusion criteria were as follows: a) renal biopsy performed during hospitalization and no prior renal pathologic diagnosis from another hospital, b) no immunosuppressive treatment or renal replacement therapy before admission, and c) age between 14 and 75 years. Patients meeting any of the following criteria were excluded: a) no renal biopsy report from our hospital, b) any immunosuppressive treatment or renal replacement therapy before hospitalization, c) age younger than 14 years, and d) a majority of clinical data missing.

Definition

The kidney specimen for renal pathological diagnosis was taken by ultrasound-guided percutaneous renal biopsy, which was performed before administration of immunosuppressive therapy. The specimens were examined under light microscopy and transmission electron microscopy in accordance with standard procedures and were reviewed by more than two experienced renal pathologists. The renal pathological features of MCD are as follows: no glomerular lesions or only minimal mesangial prominence examined by light microscopy; negative staining or low-intensity staining for C3 and IgM examined by immunofluorescence microscopy; and diffuse foot process effacement without electron-dense deposits determined by transmission electron microscopy [5]. The non-MCD renal pathological diagnoses for patients with NS included mesangial proliferative glomerulonephritis (MsPGN), focal segmental glomerulosclerosis (FSGS), membranous nephropathy (MN), and membranoproliferative glomerulonephritis (MPGN).

Statistical analysis

Differences between the two groups were evaluated by t-tests, Mann-Whitney U tests and chi-square tests, and logistic LASSO regression analysis and receiver operating characteristic (ROC) analysis were applied as described below. Statistical analysis was performed with IBM SPSS Statistics 20 software. Normally distributed data are expressed as the mean ± standard deviation (SD) and were evaluated using unpaired Student’s t tests. Non-normally distributed data are expressed as medians with the corresponding 25th and 75th percentiles (interquartile range) and were compared using Mann-Whitney U tests. Categorical variables were analysed using chi-square tests. Logistic LASSO regression analysis was performed to establish the diagnostic model. ROC analysis was performed to measure the diagnostic value of clinical factors, and an area under the ROC curve (AUC) greater than 0.70 was considered to indicate good discrimination [14]. A P value less than 0.05 was considered statistically significant.
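To make the AUC criterion concrete, the following minimal sketch computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is an illustration in plain Python, not the software used in the study; the function name roc_auc and the toy data are assumptions introduced here.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 1 = MCD, 0 = non-MCD.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.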

Model selection and validation

Because of the limited size of the data set, we chose cross-validation for model selection. Cross-validation repeatedly divides the available data into different training and validation sets. If the prediction model built on the training set shows the same characteristics (discrimination and consistency) on the validation set, the model is considered to perform well.

First, we randomly divided the sample data into two parts (75% training set, 25% validation set), trained the model on the training set, and verified the model and its parameters on the validation set. Because the validation set is drawn from the same cohort and comprises only a quarter of the original data, this step constitutes internal validation. We then resampled the data to generate additional training and validation sets and repeated the training and validation steps. Finally, we selected the optimal model and parameters.
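The repeated 75%/25% resampling described above can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the helper name repeated_holdout_splits, the number of repeats, and the random seed are hypothetical, and the split shown is not stratified by diagnosis.

```python
import random


def repeated_holdout_splits(n_samples, n_repeats=5, train_fraction=0.75, seed=0):
    """Yield (train_indices, validation_indices) for repeated random splits."""
    rng = random.Random(seed)
    n_train = int(round(n_samples * train_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # The first n_train shuffled indices train the model; the rest validate it.
        yield sorted(indices[:n_train]), sorted(indices[n_train:])


# 1009 is the cohort size reported in the text; n_repeats and seed are assumptions.
splits = list(repeated_holdout_splits(1009, n_repeats=3))
first_train, first_valid = splits[0]
```

Each pair of index lists would be used to fit the model on the training rows and evaluate it on the validation rows. With only 71 MCD cases in the cohort, a stratified split that preserves the MCD proportion in both parts may be preferable in practice.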

Procedures

LASSO regression analysis of the data was performed with the “glmnet” package in R. The variables that achieved the highest AUC in the LASSO regression model were selected and used to construct the logistic diagnostic model, and the nomogram was then built with the “rms” package. Seventy-five percent of the data were selected randomly as the training set, and the remaining 25% were assigned to the validation set. We then assessed how closely the predictions of each nomogram matched the actual outcomes. ROC analysis was carried out with the R package “pROC”.

Results

Patient characteristics

Table 1 shows detailed clinicopathological characteristics and differences between the MCD patients and non-MCD patients. Data are displayed as the median (interquartile range) except sex. There were 14 clinical indicators with significant differences between the two groups, including systolic blood pressure (SBP), diastolic blood pressure (DBP), haemoglobin (HB), platelets (PLT), red blood cell (RBC) count, immunoglobulin G (IgG), immunoglobulin A (IgA), immunoglobulin M (IgM), complement 3 (C3), complement 4 (C4), immunoglobulin E (IgE), complement 1q (C1q), urine protein and eGFR (P < 0.05). The median values of blood pressure in the non-MCD group, including SBP and DBP, were higher than those in the MCD group. Interestingly, although the difference in blood pressure between the MCD group and the non-MCD group was statistically significant, only slight differences in the medians between these two groups were observed (121/79 mmHg vs. 125/80 mmHg), and the blood pressure levels of most patients were within the normal range.

Table 1 Baseline patient characteristics and differences between the MCD group and non-MCD group

Logistic LASSO regression analysis and ROC analysis

LASSO is a popular method for regression analysis with high-dimensional predictors [15]. Before a model is established, as many independent variables as possible are usually included to minimize the model deviation caused by omitting important independent variables. Accordingly, all 25 variables in our study were entered into the modelling procedure. The predicted probabilities (PRE-1) were calculated and saved. Among these 25 candidate variables, DBP and blood IgG, IgM, and IgE were suggested to be significant predictors of an MCD diagnosis (Fig. 1b). We noticed that the AUC gradually increased with the number of variables used to establish the diagnostic prediction model. Cross-validation was then performed on the lambda grid points, and the lambda value with the smallest cross-validation error was selected (i.e., either 4 or 16 variables).
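The choice between two candidate lambda values can be illustrated with the convention used by cv.glmnet in the glmnet package, which reports lambda.min (best cross-validated performance) and lambda.1se (the most heavily penalized model within one standard error of the best). Whether the two values in Fig. 1a correspond exactly to these two choices is an assumption made here; the function name select_lambdas and all grid values below are hypothetical and are not results from the study.

```python
def select_lambdas(lambdas, cv_mean_auc, cv_se):
    """Return (lambda_best, lambda_1se) from cross-validated AUC on a lambda grid.

    lambda_best maximizes the mean AUC; lambda_1se is the largest lambda whose
    mean AUC is within one standard error of that maximum.
    """
    best = max(range(len(lambdas)), key=lambda i: cv_mean_auc[i])
    threshold = cv_mean_auc[best] - cv_se[best]
    within = [lam for lam, auc in zip(lambdas, cv_mean_auc) if auc >= threshold]
    return lambdas[best], max(within)


# Hypothetical grid values, for illustration only.
grid = [0.001, 0.005, 0.01, 0.03, 0.06, 0.1]
mean_auc = [0.80, 0.83, 0.82, 0.81, 0.76, 0.70]
se = [0.025, 0.025, 0.025, 0.025, 0.025, 0.025]
lambda_best, lambda_1se = select_lambdas(grid, mean_auc, se)
```

A larger lambda keeps fewer variables, so the one-standard-error choice trades a small loss in cross-validated AUC for a sparser and more interpretable model.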

Fig. 1

Construction of the MCD diagnosis model. a Statistics of the ROC curves for the 23 associated parameters between MCD and non-MCD. Two vertical lines are drawn at the values (i.e., 4 and 16) chosen by cross-validation. b LASSO coefficient profiles of the 23 MCD-associated characteristics. A vertical line is drawn at the value chosen by cross-validation

However, the modelling process should identify the set of independent variables with the strongest explanatory power for the dependent variable, that is, improve the interpretability and prediction accuracy of the model through variable selection. We calculated the AUCs of the two models and plotted their ROC curves (Fig. 2a and b). The AUCs for determining MCD with the combined “DBP + IgG + IgM + IgE” model and with the 16-parameter model were 0.88 and 0.886, respectively (Fig. 2). Because the two AUCs were nearly identical and a smaller model is less prone to overfitting, the “DBP + IgG + IgM + IgE” combination identified by logistic LASSO regression analysis was retained for distinguishing MCD from non-MCD. Finally, we refit the model on all the data using the selected lambda value.

Fig. 2

ROC curves of the two MCD classification models. a ROC curve for the “DBP+ IgG+ IgM+ IgE” combination. b ROC curve for the 16-parameter model

Nomogram for predicting the risk of MCD

To provide nephrologists with a quantitative method to predict a patient’s probability of having MCD while avoiding an unnecessary renal biopsy, we constructed a nomogram based on the validation data set and the equation from the discriminant analysis.

As shown in Fig. 3, the nomogram consists of three components: a) The variables used in the prediction model, for example, DBP and IgG. The line segment for each variable is marked with a scale representing the range of possible values of that variable, and the length of the segment reflects the size of that variable’s contribution to the outcome; b) The corresponding scores, namely, the Points axis at the top of the figure, which gives the score for each value of each variable. The sum of the individual scores over all variables is the Total Points; c) The predicted probability of the event: the “risk” line at the bottom of the figure represents the probability of MCD.
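The scoring logic of a nomogram of this kind can be sketched in a few lines. The sketch assumes the common convention in which the predictor with the largest effect over its range spans 0 to 100 points and the other predictors are scaled relative to it. The predictor names, ranges, and coefficients below are hypothetical placeholders, not the fitted model from this study.

```python
import math


def nomogram_points(value, low, high, beta, max_effect):
    """Points (0-100 scale) for one predictor value.

    The end of the axis with the lower risk contribution scores 0 points.
    max_effect is the largest |beta| * (high - low) across all predictors.
    """
    start = low if beta >= 0 else high
    return 100.0 * abs(beta) * abs(value - start) / max_effect


def predicted_probability(intercept, betas, values):
    """Probability from a logistic model, given its linear predictor terms."""
    linear = intercept + sum(b * v for b, v in zip(betas, values))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical predictors: (name, low, high, beta). Not the fitted model.
predictors = [("X1", 0.0, 10.0, 0.30), ("X2", 50.0, 100.0, -0.04)]
max_effect = max(abs(b) * (hi - lo) for _, lo, hi, b in predictors)
patient = {"X1": 5.0, "X2": 75.0}
total_points = sum(
    nomogram_points(patient[name], lo, hi, b, max_effect)
    for name, lo, hi, b in predictors
)
```

Reading the printed nomogram performs the same calculation graphically: each variable is mapped to points, the points are summed, and the Total Points axis is aligned with the risk axis through the logistic function.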

Fig. 3

Nomogram for predicting the risk of MCD

Discussion

In the present study, we established and validated a novel prognostic tool based on the “DBP + IgG + IgM + IgE” combination to improve the prediction of an MCD diagnosis and avoid unnecessary biopsies. The proposed model can successfully predict the probability of an MCD diagnosis in patients with NS.

The present study was designed as a retrospective cohort study. Compared with previous studies, our study included more parameters, including C3, C4 and C1q, etc., all of which are known as important biomarkers for common glomerular diseases with NS. As such, the model tool established from these parameters could be more powerful and accurate for the prediction of MCD diagnosis [16, 17]. Based on statistical analysis, the current understanding of the mechanisms of MCD development and practical clinical experience, 12 out of 25 useful parameters were selected as predictors for MCD diagnosis. The 12 parameters included SBP, DBP, HB, PLT, RBC, IgG, IgA, IgM, C3, C4, IgE, and C1q. Among them, four indicators, including DBP, IgG, IgM and IgE, were specifically screened. Similar results were obtained by using commonly used multiparameter analyses and logistic LASSO regression analyses, which confirmed that these four parameters were strikingly significant in differentiating an MCD from a non-MCD diagnosis. Furthermore, 25% of patient data were used to validate the derived equations for classifying MCD.

Recently published studies have identified blood pressure variability (BPV) as an important cardiovascular disease (CVD) risk factor, with evidence suggesting that it is associated with clinical outcomes [18,19,20,21]. Sethna et al. showed that the association of BPV with renal outcomes extends to patients with primary glomerulopathy. Moreover, they described a 5% increase in the occurrence of the composite endpoint (ESRD or eGFR decline < 40%) with each one-unit increase in the SD of SBP [22]. According to an observational study, aggressively lowering systolic blood pressure in a way that also lowers diastolic pressure to less than 70 mmHg may increase mortality risk in patients with chronic kidney disease [23]. Given these clinical observations, we speculate that the slight difference in blood pressure between MCD patients and non-MCD patients might affect clinical outcomes and is possibly clinically significant. We will continue to study the effect of blood pressure in this patient population.

The exact pathogenesis of MCD has not yet been well elucidated. Previous studies have shown that serum IgE might play an important role in the pathogenesis of MCD and might serve as a prognostic indicator of steroid responsiveness in MCD patients [24, 25]. A large body of evidence suggests that T cells, especially type 2 helper T (Th2) cells, play a key role in driving podocyte injury in MCD [26]. Among the cytokines released by Th2 cells, MCD is frequently associated with increased levels of interleukin-4 (IL-4) and increased production of IL-13. IL-13 regulates the switching of immunoglobulin production toward IgE [27,28,29]. Several studies have also revealed increased serum IL-13 levels in patients with MCD [30]. This is consistent with our finding that increased serum IgE levels help predict MCD. The predictive value of serum IgE for MCD diagnosis has gradually become accepted in recent years [31].

Other studies have revealed that the levels of serum IgG and IgM decrease during relapses of steroid-sensitive nephrotic syndrome [32,33,34]. Disproportionately decreased levels of IgG subclasses, especially IgG1 and IgG2, cause a decrease in serum total IgG levels during relapses [33, 35]. Decreased serum IgG levels may result from urinary loss of IgG or an impaired class switch from IgM to IgG in B cells [33]. Our study showed a significantly decreased IgG level and an increased serum IgM level in patients with MCD, as expected. Although some studies have shown glomerular deposition of these immunoglobulins in kidney specimens of MCD patients [25, 34, 36, 37], the deposits, especially of IgG or IgM, seemed more likely to be non-specific markers of any glomerular disease resulting from potential blood immunoglobulin contamination in the glomerular capillaries. In the published literature, there are few studies on the relationship between these parameters and pathological indicators reflecting the severity of the disease.

We recognize that our study has some limitations. This was a retrospective cohort study, so it is subject to the biases inherent in that design, and the sensitivity and specificity of the model for the diagnosis of MCD must be further verified in clinical practice. The model was established from data on adult patients of Chinese ethnicity, so its generalizability is limited. We will further validate the accuracy and repeatability of the prediction model for an MCD diagnosis in prospective, multicentre clinical studies.

Few clinical risk prediction models for an MCD diagnosis have been developed. The combination of multiple factors (DBP, IgG, IgM and IgE) performed better in predicting the diagnosis of MCD than the models in prior reports [16, 17, 38]. The measurement of serum immunoglobulins is cheaper and faster than that of other indexes and is readily available in general hospitals. Furthermore, we used a more rigorous statistical method to limit the influence of subjective choices on the construction of the model and randomly selected a subset of patient data to test our model internally.

Conclusions

In summary, we established, for the first time, a diagnostic model combining multiple factors (DBP, IgG, IgM and IgE) that effectively distinguishes MCD patients from non-MCD patients among adults with NS, with high sensitivity and specificity. The four-parameter classifier potentially offers clinical value in predicting a patient’s probability of having MCD and may help avoid renal biopsy in adults with NS.