Introduction

The literature has established that students’ academic achievements are influenced by the efforts exerted by the agents involved in the education process, namely, the schools students attend, the parents they live with, and the students themselves. Despite the importance of effort, research on educational achievement has not adequately investigated, either theoretically or empirically, the role of effort as an independent input in the education process. Although student effort (e.g., subjective perceived effort, objective time spent learning) plays a crucial role in educational outcomes, parental effort (e.g., family involvement, parents’ interest in their children) and school effort (e.g., classroom instruction, school management) are also vital (Gamboa & Waltenberg, 2012; Edmark & Persson, 2021; Golley & Kong, 2018; Broer et al., 2019; Dietrich et al., 2021). However, not every effort is equally important in achieving the desired outcomes. Notably, the literature has not investigated each effort’s relative importance in explaining academic achievements.

This study aims to answer two questions. First, what is the relative importance of the individual effort variables in explaining academic achievements? Second, because some fields of influence appear more critical than others in predicting the outcome variable (Adler et al., 2018), what is the collective importance of these fields of influence? The answers will demonstrate the relative contribution of each effort variable group (school effort, parental effort, and student effort) in explaining academic achievements.

To answer these questions, I use the China Education Panel Survey 2013–2015, a rich dataset that follows a cohort of sampled students throughout their junior high school years. The data are collected through comprehensive questionnaires completed by students in 7th and 9th grades, providing detailed information on their attitudes toward school and education. To construct measures of effort exerted by the different agents in the education process, I use a range of indicators. For students, I use self-reported answers to questions such as the time they spend studying and their motivation to set and work toward academic goals. For parents, I examine their level of involvement in their children’s education, such as whether they supervise their child’s homework or frequently attend meetings with teachers. To capture the level of effort exerted by schools in the education process, I adopt 15 constructed effort variables, such as the school’s implementation of interventions that support academic growth and development, the availability of academic guidance services for students, and the type of disciplinary methods employed by the school’s administration.

However, the empirical challenge is selecting economically and statistically significant effort variables that predict academic achievements from among a large number of potential predictors. Traditional statistical techniques such as ordinary least squares (OLS) regression are limited in achieving accurate variable selection and good out-of-sample performance, especially when the number of regressors is large (Steyerberg & Harrell, 2016). Furthermore, the complex interplay between effort variables and academic performance might not be captured by parametric assumptions (Roick & Ringeisen, 2017), necessitating the use of estimation techniques that offer enhanced flexibility.

Machine learning techniques are ideal for addressing these challenges and answering this paper’s two research questions. These techniques select influential features and model complex relationships between input variables and outcomes (Takeda et al., 2013). They are also ideal for managing high-dimensional data with many predictors while avoiding overfitting, a common problem in traditional statistical methods. In this study, I leverage four state-of-the-art machine learning tools (Lasso, Random Forest, AdaBoost, and Support Vector Regression (SVR)) to identify which individual effort variable or effort group is most predictive of academic performance. Lasso is a widely used regularization regression method for variable selection, which helps identify the most relevant effort variables. The remaining three methods model nonlinear relationships, which is crucial because of the complex interplay among varying effort variables and academic performance. Additionally, these methods rank variables by their predictive power, providing intuitive comparisons of the relative importance of different effort variables. Overall, using machine learning techniques, I perform a comprehensive analysis of many potential predictors and identify the most relevant effort variables for predicting academic achievement.

To assess the relative importance of each effort variable, I employ the aforementioned machine learning algorithms, using all 45 effort variables and 15 control variables to predict academic performance. I identify the top 20 most important effort variables based on the coefficient magnitudes in Lasso and SVR, the SHapley Additive exPlanations (SHAP) values in Random Forest, and the importance scores in AdaBoost. Among the top 20 predictors, most variables relate to school effort and parental effort, and some variables relate to student effort. Parents’ demanding requirement is the most significant predictor among all individual variables. Furthermore, students’ and parents’ educational expectations exert a greater influence on academic achievements than other factors do. The practice of inviting parents to school events is the third most predictive factor for students’ grades. These results indicate that if parents and schools prioritize education and highly value academic achievement, they may be more likely to provide a supportive environment (both at home and at school) and encourage students’ academic pursuits (Gbollie & Keamu, 2017).

To assess the importance of effort-related group variables, I include school effort, parental effort, and student effort variables independently in the model to predict academic performance. The results indicate that “school effort” is the most influential predictor of academic achievement, followed by parental effort, and students’ effort has a limited impact. These findings underscore the crucial role of a supportive school environment, namely, school events and teacher supervision, in promoting academic success (Park et al., 2017). Furthermore, the heterogeneity test between male and female students finds that for girls, school effort has a greater impact on academic achievement than parental effort does; the opposite is true for boys. Thus, particularly in China, where parental investments often favor boys in multi-children families (Ling, 2017), girls who receive more attention and financial incentives from schools and teachers can be more equipped to navigate these changes and achieve academic excellence (Tang & Horta, 2021). These findings suggest that gender-specific interventions and support programs are necessary to improve academic outcomes for girls during the critical period of intellectual and academic development of junior high school.

Methodologically, this study contributes to the understanding of the relative contribution of multiple variables to students’ academic achievements by using modern machine learning models (Masci et al., 2018). Traditional statistical models may provide biased results due to the unknown functional form of how effort affects grades, potential interactions between effort variables, and collinearity. By contrast, machine learning techniques offer flexibility, feature selection, model validation, and robustness to multicollinearity (Ogutu et al., 2012). Thus, the machine learning approaches in this study obtain good out-of-sample prediction accuracy by selecting relevant variables and reducing overfitting (Dalalyan et al., 2017). This strategy provides a feasible and superior approach to narrowing outcome predictors, especially in the case of large high-dimensional databases in educational research. The ability to accurately predict unequal educational outcomes deepens the understanding of the effort-related factors that drive educational success and clarifies a strategic direction for additional compensation and policy intervention.

The findings of this study contribute to the literature by providing empirical evidence for the theoretical prediction that differences in how parents, schools, and students perceive and act in achieving higher academic performance can lead to disparities in academic outcomes (e.g., Edossa et al., 2018). This study emphasizes that academic success is not solely determined by objective structural factors, such as family background and school resources (Berkowitz et al., 2017) but also by latent motivation and tangible action efforts (Gneezy et al., 2019). As such, this study underscores the importance of increasing the effort to improve academic results and highlights the necessity to stimulate effort as a more feasible and effective approach than modifying social background or school resource allocation (Yeager & Dweck, 2012).

The identification of school effort as the most significant predictor of academic success has critical policy implications. Policymakers can focus on promoting various forms of school effort, such as creating a supportive and positive learning environment, providing individualized tutoring for students, encouraging teacher supervision, and enhancing student and parental participation in school events. By prioritizing school effort, policymakers can more effectively improve academic achievement than by relying solely on material resources or higher-quality teaching faculty and facilities.

The paper is organized as follows. The theoretical backgrounds are illustrated in Sect. 2, and the data source and variables are briefly introduced in Sect. 3. The methodology includes benchmark methods, feature extraction principles, and machine learning techniques and is presented in Sect. 4. The relevant results and discussions are elaborated in Sect. 5. The conclusion and potential policy implications are provided in Sect. 6.

Theoretical background

Sociologists have long been concerned with the extent to which inequality of opportunity, caused by circumstantial factors and family endowment, contributes to inequality of outcomes. Blau and Duncan (1967) were the first to establish a dual-driven theoretical model of family resource investment and self-motivated effort from a micro perspective. They proposed the status attainment model, using path analysis to explore the extent to which the occupational attainment of the population in the United States is influenced by their family background and level of education at the micro-level (Ganzeboom et al., 1991; Winship, 1992). They regarded an individual’s academic status attainment as the result of multiple factors that emerge sequentially throughout their life cycle; thus, they developed a pathway model incorporating innate and self-induced elements and inter-and intra-generational mobility into the analysis.

Although the classical status attainment model has been developed from structural and psychosocial perspectives, this study argues that several concerns are still worth discussing and expanding. Drawing on Bourdieu’s (1984) theory of habitus and cultural capital, and Baumrind’s (1971), Lareau’s (2002), and others’ studies of family parenting styles and school effectiveness, effortful devotion, namely, cognitive capacity, non-cognitive motivation, and observable time devoted to learning, may also be considered an unavoidable factor affecting educational outcomes (Deluca & Rosenbaum, 2001; Guan et al., 2006; Inzlicht et al., 2018; Shenhav et al., 2021). However, status attainment research has largely disregarded the effort factor, instead treating human capital invested in education as an effort factor related to family background to explain offspring educational outcomes (Caldas & Bankston, 1997; Sewell & Shah, 1968; Sheldon & Epstein, 2005). Human capital input is not equivalent to the individual effort factor; it primarily serves as a transmission and mediator between paternal and offspring status (Bourdieu, 2002; Kohn et al., 1990). The effects of actual psychological and behavioral efforts as independent exogenous variables and the mechanisms via which they function have not been examined.

To improve the understanding of the actual psychological and behavioral efforts exerted during task performance, Steele’s (2020) framework on effort, which distinguishes between objective and subjective effort, offers valuable insights. Steele (2020) defined “objective effort” as tangible and measurable actions that reflect the amount of energy or work invested in a task, and “subjective effort” encompasses intangible internal experiences and attitudes associated with a task or goal. To operationalize effort in the context of this study, I adopt Steele’s (2020) definition of effort. In this study, effort is examined at the individual student level, and at the level of parents and schools. Specifically, students’ effort can be observed in how they approach education broadly, how they respond to classroom interactions with teachers, how much time they dedicate to and how much motivation they have for learning, how much inspiration they receive from family, how they conduct the necessary tests, and some other objective and perceived effort exerted to meet academic needs (Dunlosky et al., 2020; Mudrak et al., 2021). Similarly, by viewing “parents” and “schools” as behavioral agents in the same manner as individual students, their efforts to improve students’ educational outcomes during the task can also be defined as objective and subjective efforts rather than solely focusing on educational investment behaviors. These efforts can include psychologically devoting attention, stimulating motivation, instilling a sense of belief, and behaviorally spending additional time and energy on academic tasks (Stables et al., 2014; Ng & Wei, 2020).

These various efforts, shaped by family or school, modify educational behaviors that result in varying levels of academic performance and increase social status (Burić & Sorić, 2012; Zimmerman, 2013). Efforts and effort-based capability can also supplement outcome disparities caused by structural factors when acting in different directions and with different forces (Darling-Hammond, 2018). If students inherently believe in devoting attention, parents and relevant schools would spend more time and energy on academic tasks, and the likelihood of favorable outcomes for the student would increase. Enhancing students’ educational success is challenging, if not inconceivable, if the three key agents do not exert effort, regardless of the student’s family background or school quality (Richardson et al., 2012). Taking a more nuanced approach to the role of effort in educational outcomes than that in the literature can improve the understanding of how different factors interact to influence student achievement.

Therefore, highlighting the potentiality and capability of efforts to reduce outcome inequalities is rational. The learning motivations and exertions of students, parents, and schools can, to some extent, complement, compensate, and counter structural disadvantages in achieving equal outcomes (Amis et al., 2020; Hirsch, 2019). Additionally, students’ acquisition of social and academic status is assuredly an integrated process affected by circumstantial and effort-related factors (e.g., Hodge et al., 2018) and a final collaborative result among efforts of parents, schools, and individuals (De Fraja et al., 2010). The distinction among the three agents’ efforts is more akin to a spectral range than a dividing line; in reality, every action can be determined by a combination of these three components. An overemphasis on the influence of one level of factors at the expense of others may lead to reductionism or ecological fallacy in methodology (Curran & Bauer, 2011). Therefore, a thorough analysis of the three sides must be considered, especially to examine which aspect dominates students’ learning progress, resulting in disparities in student achievements under an integrated framework. More specifically, this study aims to determine the extent to which factors can have the most predictive effects on educational outcomes, while all three types of efforts are considered simultaneously. The conceptual framework for this study is illustrated in Fig. 1, which provides a comprehensive overview of the research progress.

Fig. 1 Conceptualized framework

Data and measures

Data source

This study uses data from the China Education Panel Survey (CEPS), a nationally representative survey designed and implemented by Renmin University in China. The primary aim of the survey is to investigate the impact of various factors, namely, family, school, individual, and macro-social structures, on students’ academic achievements. The survey was conducted by selecting a sample of 112 schools, 438 classes, and approximately 30,000 students by using a national sampling method.

The large sample size of the CEPS is a substantial strength of this study because it generates more accurate averages, identifies outliers, and yields reduced margins of error (Wang, 2016), enhancing the external validity of the findings. Moreover, the survey provides detailed information on three key agents’ efforts and demographic characteristics, namely, individual innate ability, family background, and school resources, which are essential for understanding the internal and external environments of students (Ma & Wu, 2019; Xu, 2016) and, thus, this analysis.

I drop observations with missing values for any relevant variable and remove extreme values to avoid potential bias from outliers. The final estimation sample comprises 24,974 students.

Measures

Dependent variable

Academic achievement To measure academic achievements, I use students’ total test scores: the sum of Chinese, Mathematics, and English scores. The data were sourced from the students’ term exam scores across two consecutive school years and provided by their respective schools. Table 1 shows that the average score for these students is 236 points, accounting for 52.4% of the maximum possible score of 450 points.

Table 1 Descriptive statistics

Predictors

Student effort Psychological and behavioral efforts play a crucial role in improving educational attainment (Schunk & DiBenedetto, 2020). Therefore, the effort students invest in their academic work is a significant factor influencing their academic achievement. To assess student effort, I use multiple survey measures, namely, students’ (1) self-reported time spent studying and completing homework assignments, (2) subjective perception of their effort levels, (3) proactive efforts to seek additional help or resources when needed, (4) motivation to set and pursue academic goals, and (5) engagement in extracurricular activities that promote academic growth and development. These measures are encoded into categorical variables, with higher values representing a greater level of student effort. I include 15 proxies for student effort.

Parental effort Parental effort is assessed based on parents’ level of involvement and support in their child’s education, and their attitudes toward their child’s academic performance (Avvisati et al., 2010). Specifically, this study measures parental effort along four dimensions: parents’ (1) engagement in their child’s studies, (2) willingness to discuss their child’s progress with teachers, (3) academic goals and career aspirations for their child, and (4) role in modeling good study habits and time management skills at home. Higher values on these measures indicate a higher level of parental effort in contributing to their child’s educational success. I include 15 variables to measure parental effort.

School effort This study measures school effort by using five indicators related to activities that extend beyond the mandatory requirements of educational institutions (Baños et al., 2019): (1) implementation of interventions that support academic growth and development; (2) parent and community involvement in school activities and events; (3) provision of academic and life guidance to students; (4) practice of grouping students based on similar abilities; and (5) disciplinary methods employed by schools, such as offering night study sessions or individualized academic tutoring. Higher values on these measures indicate greater school effort in fostering students’ academic success. I include 15 school effort variables.

Descriptive statistics

Methodology

This study incorporates 45 effort variables as the key independent variables. Understanding the relative contribution of each variable to students’ academic performance is empirically challenging. First, the functional form of how effort affects grades is unknown. Various efforts may interact and have nonlinear effects on a student’s academic performance. Assuming a simple, additive linear model using conventional OLS imposes strong parametric assumptions and might provide biased results. Multiple variables may exhibit collinearity, making isolating their marginal effects difficult. To alleviate these concerns, I employ machine learning techniques, which offer several benefits over traditional statistical approaches.

(1) Flexibility: Machine learning algorithms can learn complex and nonlinear relationships between independent and dependent variables without imposing strict assumptions.

(2) Feature selection: Machine learning can automatically identify the most relevant variables among the 45 effort variables, providing a more concise and interpretable model than those in the literature.

(3) Model validation: Machine learning models leverage techniques such as cross-validation, which helps ensure the external validation of the findings and reduces the risk of overfitting.

(4) Robustness to multicollinearity: Machine learning methods, such as regularization techniques, can manage situations where predictor variables exhibit collinearity, mitigating the adverse effects on the model’s performance.

By leveraging these advantages, machine learning techniques enable a more nuanced exploration of the relationship between various effort variables and students’ academic achievements, ultimately deepening the understanding of the factors that drive educational success.

Benchmark model: OLS

To investigate the relationship between effort factors and students’ academic achievement, I first estimate the following baseline linear regression model:

$$y_{i} = \mathbf{StudentEffort}\,\beta_{1} + \mathbf{ParentalEffort}\,\beta_{2} + \mathbf{SchoolEffort}\,\beta_{3} + \mathbf{X}\sigma + u_{i}, \tag{1}$$

where \(y_{i}\) is the total test score of student \(i\), and \(\mathbf{StudentEffort}\), \(\mathbf{ParentalEffort}\), and \(\mathbf{SchoolEffort}\) are vectors that include all student effort, parental effort, and school effort variables, respectively. \(\mathbf{X}\) is a control variable vector, namely, students’ demographics, parents’ background characteristics, class-fixed effects, and year-fixed effects. By integrating control variables in the regression, the comparison can be restricted to students with similar characteristics, which improves the precision of estimates of the effect of effort factors. \(u_{i}\) is the error term. The regression equation was estimated using ordinary least squares (OLS). To make the coefficients comparable, I normalize all the right-hand-side variables. Thus, the coefficients of the effort variables can be interpreted as the change in academic achievements associated with a one standard deviation change in the corresponding effort variable. I use this normalization procedure to compare the effects of different types of effort variables on academic performance in a standardized manner.
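The normalization step can be illustrated with a minimal sketch on synthetic data. Everything below is an assumption made for illustration: the three regressors, their scales, and the helper `standardized_ols` are hypothetical and are not the CEPS variables or the code used in this study. The point is only that z-scoring the right-hand-side variables turns each coefficient into a change in the outcome per one standard deviation of the regressor, which can reorder the apparent importance of variables measured on different scales.

```python
import numpy as np

def standardized_ols(X, y):
    """OLS on z-scored regressors; returns (intercept, per-SD coefficients)."""
    # z-score each right-hand-side column so coefficients are per 1 SD
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # prepend a column of ones for the intercept
    design = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

# Synthetic stand-in data (NOT the CEPS sample): three hypothetical proxies
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.normal(0, 1.0, n),  # hypothetical student-effort proxy, SD = 1
    rng.normal(0, 5.0, n),  # hypothetical parental-effort proxy, SD = 5
    rng.normal(0, 0.5, n),  # hypothetical school-effort proxy, SD = 0.5
])
# Raw slopes are 4, 2, 6, so per-SD effects are about 4, 10, 3
y = 236 + 4.0 * X[:, 0] + 2.0 * X[:, 1] + 6.0 * X[:, 2] + rng.normal(0, 10, n)

intercept, coefs = standardized_ols(X, y)
largest = int(np.argmax(np.abs(coefs)))  # index of the largest per-SD effect
```

In this toy setup the second regressor has the smallest raw slope but the largest standardized coefficient, which is why comparisons across effort variables are made on the standardized scale.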

Individual variable importance using machine learning tools

In this section, I use multiple machine learning techniques to examine the explanatory power of each effort variable. I first provide a brief introduction to the machine learning models used and then explain the analysis procedure.

Lasso The first method used is Lasso, a widely used regularization regression technique. Lasso regression performs both feature selection and regularization to enhance the predictive accuracy and interpretability of statistical models. The objective function of Lasso is to minimize the following:

$$SSR + \lambda \sum_{j=1}^{p} \left|\beta_{j}\right| \tag{2}$$

Lasso is similar to linear regression in that it still requires the imposition of parametric assumptions. The first term to be minimized is the sum of squared residuals (SSR), as in OLS regression. However, Lasso includes a penalty term (the second term) that guards against overfitting the data and delivers good predictive performance under approximate sparsity. A key aspect of operationalizing Lasso is tuning the “complexity cost” λ, which involves selecting the appropriate value for the penalty level. The best practice is to use cross-validation to identify the optimal value for this hyperparameter.
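To make the objective concrete, the following sketch minimizes SSR plus the λ-weighted sum of absolute coefficients by cyclic coordinate descent on synthetic data. This is one standard way to solve the Lasso problem, not necessarily the solver used in this study; the function names, the data, and the value λ = 200 are illustrative assumptions. It shows the two properties the text relies on: irrelevant coefficients are set exactly to zero, and retained coefficients are shrunk toward zero.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize SSR + lam * sum(|beta_j|) by cyclic coordinate descent.

    Assumes the columns of X and the vector y are centered (no intercept).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)  # x_j' x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # partial residual that leaves out the contribution of feature j
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            # threshold is lam / 2 because the SSR term carries no 1/2 factor
            beta[j] = soft_threshold(rho, lam / 2.0) / col_ss[j]
    return beta

def lasso_objective(X, y, beta, lam):
    resid = y - X @ beta
    return resid @ resid + lam * np.abs(beta).sum()

# Synthetic data: only the first two of six predictors matter
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=300)
y = y - y.mean()

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_lasso = lasso_cd(X, y, lam=200.0)  # lam chosen by hand for illustration
```

In practice λ would be chosen by cross-validation, as described above, rather than fixed by hand.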

Random forest The second method used is Random Forest, a tree-based ensemble method that achieves high prediction accuracy in many prediction tasks. Random Forest is a flexible nonparametric model that can manage complex interactions among variables and is well suited for high-dimensional data. It works by building an ensemble of decision trees on random subsets of the data and variables (a bagging approach rather than a boosting one). This approach helps reduce overfitting and improve the accuracy and robustness of the model. The final prediction is then made by averaging the predictions of all the decision trees in the ensemble. Random Forest also provides information on variable importance, which can help identify the most important predictors of academic outcomes. Random Forest has hyperparameters that help control overfitting, such as the number of trees in the ensemble, the maximum depth of the trees, and the minimum number of samples required to split a node. Cross-validation is used to select the optimal values of these hyperparameters.
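The averaging step can be checked directly with scikit-learn’s `RandomForestRegressor`, used here as an assumed implementation since the text does not name the software. The data are synthetic and the hyperparameter values are arbitrary placeholders for the three hyperparameters mentioned above; in the procedure described in this paper they would be chosen by cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with a nonlinear term and an interaction (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.sin(2.0 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(
    n_estimators=50,       # number of trees in the ensemble
    max_depth=4,           # maximum depth of each tree
    min_samples_split=10,  # minimum samples required to split a node
    random_state=0,
)
forest.fit(X, y)

# The forest prediction is the plain average of the individual tree predictions
tree_preds = np.array([tree.predict(X) for tree in forest.estimators_])
manual_average = tree_preds.mean(axis=0)

# Built-in impurity-based importances (these are not SHAP values)
importances = forest.feature_importances_
```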

AdaBoost The third method used is another ensemble method, AdaBoost, a boosting algorithm that iteratively combines weak learners (weak classifiers, in the original classification setting) to create a strong learner. AdaBoost is effective in a wide range of prediction tasks and is useful for identifying important predictors. It works by assigning higher weights to observations that the current set of weak learners predicts poorly (misclassified observations, in classification), emphasizing these observations in the next round of fitting. By iteratively correcting the errors of the weak learners, AdaBoost creates a strong learner that accurately predicts the outcome variable. AdaBoost also yields an importance score for each variable, which allows a focus on the most informative variables and can improve the interpretability of the model.
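As a rough illustration for a continuous outcome such as test scores, the sketch below fits scikit-learn’s `AdaBoostRegressor`, which implements the AdaBoost.R2 variant with shallow decision trees as the default weak learners. The choice of library and the synthetic data are assumptions; the text does not state which implementation was used, and the importance score computed here is scikit-learn’s, which may differ from the score described in this paper.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Synthetic data: only the first of five predictors drives the outcome
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=400)

# Each boosting round reweights observations with large errors
booster = AdaBoostRegressor(n_estimators=50, random_state=0).fit(X, y)

# Importance scores aggregated over the weak learners (normalized to sum to 1)
importances = booster.feature_importances_
most_important = int(np.argmax(importances))
```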

Support vector regression (SVR) The last method used is SVR. Rather than separating observations into classes, as a support vector classifier does, SVR fits a function in a (possibly high-dimensional) feature space such that most observations lie within a tolerance band around the fitted values, penalizing only deviations that exceed this band. An advantage of SVR is its ability to manage nonlinear relationships between the input and outcome variables: a kernel function implicitly maps the input variables into a higher-dimensional space in which such relationships can be captured more easily. SVR can also be used to gauge variable importance, here through the magnitude of its fitted coefficients. Thus, SVR is a powerful tool for predicting a continuous outcome variable and identifying the most important predictors.
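In its standard formulation, SVR is fit with an ε-insensitive loss: residuals smaller than a tolerance ε cost nothing, and larger residuals are penalized linearly. The short sketch below evaluates that loss on three made-up score predictions. The value ε = 5 and the numbers are illustrative assumptions, not settings taken from this study.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon band around the prediction, linear outside."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(residual - epsilon, 0.0)

# Hypothetical observed and predicted total scores
observed = [240.0, 236.0, 210.0]
predicted = [238.0, 236.0, 220.0]

# Absolute residuals are 2, 0, and 10, so only the last exceeds epsilon = 5
losses = epsilon_insensitive_loss(observed, predicted, epsilon=5.0)
```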

Procedures for selecting the most important variables

To gain insights into the importance of individual variables, I use the following procedures:

(1) All variables were standardized before analysis to ensure comparability.

(2) The dataset was split into a training set (80%) and a test set (20%) to evaluate the performance of the models.

(3) The aforementioned four machine learning models were trained on the training set, and hyperparameters were selected using tenfold cross-validation. In AdaBoost, decision trees were used as weak classifiers.

(4) The feature importance was sorted in descending order, and the top 20 features were selected, excluding control variables. For Lasso and SVR, the absolute value of the coefficient magnitude was used to measure variable importance. For Random Forest, the mean absolute SHAP value was used. In AdaBoost, an importance score, calculated by summing the weights of the samples misclassified by the weak classifiers in each iteration of the boosting process, was used to measure variable importance.

(5) The test mean squared error (MSE) was computed to assess the goodness of fit of the models.
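The five steps above can be sketched end to end with scikit-learn on synthetic data. This is a hedged illustration, not the code used in this study: the variable names (`effort_0`, `control_0`, and so on), the sample size, and the hyperparameter grid are hypothetical; only two of the four models are shown; the scaler is fit on the training split, which is a common leakage-avoiding choice that the text does not specify; and the forest’s impurity-based `feature_importances_` stands in for the mean absolute SHAP values described in step (4).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 8 hypothetical effort variables plus 2 controls
rng = np.random.default_rng(42)
n, n_effort, n_control = 600, 8, 2
names = [f"effort_{j}" for j in range(n_effort)]
names += [f"control_{j}" for j in range(n_control)]
X = rng.normal(size=(n, n_effort + n_control))
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + 2.0 * X[:, 8] + rng.normal(size=n)

# Steps (1)-(2): 80/20 split, then standardize with training statistics
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Step (3): hyperparameters chosen by tenfold cross-validation
lasso = LassoCV(cv=10).fit(X_tr, y_tr)
forest = GridSearchCV(
    RandomForestRegressor(n_estimators=30, random_state=0),
    {"max_depth": [3, 6]},
    cv=10,
    scoring="neg_mean_squared_error",
).fit(X_tr, y_tr)

# Step (4): rank effort variables only, excluding controls
def top_k(scores, k):
    effort_idx = [j for j, nm in enumerate(names) if nm.startswith("effort_")]
    ranked = sorted(effort_idx, key=lambda j: scores[j], reverse=True)
    return [names[j] for j in ranked[:k]]

top_lasso = top_k(np.abs(lasso.coef_), k=3)
top_forest = top_k(forest.best_estimator_.feature_importances_, k=3)

# Step (5): out-of-sample fit on the held-out 20%
mse_lasso = mean_squared_error(y_te, lasso.predict(X_te))
mse_forest = mean_squared_error(y_te, forest.predict(X_te))
```

With the 45 effort variables of this study, `k` would be 20 rather than 3.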

Group variable importance using machine learning tools

Procedures for assessing group variable importance

To investigate the relative importance of each variable group (i.e., student effort, parental effort, and school effort) in predicting academic outcomes, I use the following procedures:

(1) Again, all variables were standardized, and the dataset was split into a training set (80%) and a test set (20%).

(2) Machine learning models were trained using only the variables in each of the three groups separately: student effort, parental effort, and school effort. This method allowed for a direct comparison of the relative importance of each variable group in predicting academic outcomes.

(3) The test MSE was computed for each separate model, with the variable group with a smaller test MSE indicating a higher model fit and greater importance of the variables in that group.

By comparing the test MSE across the models, I gained insights into the relative importance of each variable group in predicting academic outcomes. These results are suitable to inform educational policies and interventions aimed at improving academic performance, such as focusing on increasing parental involvement or improving teaching practices in schools.
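A minimal sketch of this group-wise comparison follows, again on synthetic data and with scikit-learn’s `LassoCV` standing in for the four models. The column groupings and the data-generating process are assumptions constructed so that the hypothetical school block is the most informative; they say nothing about the actual CEPS results.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical grouping of six synthetic predictors into three effort blocks
groups = {"student": [0, 1], "parental": [2, 3], "school": [4, 5]}

rng = np.random.default_rng(7)
n = 2000
X = rng.normal(size=(n, 6))
# Outcome built so that the school block explains the most variance
y = (0.5 * X[:, 0] + 2.0 * X[:, 2] + 3.0 * X[:, 4] + 1.0 * X[:, 5]
     + rng.normal(size=n))

# Step (1): split 80/20 and standardize with training statistics
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Steps (2)-(3): fit one model per group and record its test MSE
group_mse = {}
for name, cols in groups.items():
    model = LassoCV(cv=10).fit(X_tr[:, cols], y_tr)
    group_mse[name] = mean_squared_error(y_te, model.predict(X_te[:, cols]))

# Smaller test MSE means the group is more predictive on its own
ranking = sorted(group_mse, key=group_mse.get)
```

Because each model sees only one group, this comparison measures how well a group predicts on its own rather than its marginal contribution given the other groups.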

Results and discussion

Benchmark model: OLS

Figure 2 presents the baseline OLS point estimates and 95% confidence intervals. To conserve space and avoid distraction from the focus of this analysis, I do not report the coefficients of control variables. Figure 2 suggests that several factors have a significantly positive impact on academic performance. Specifically, parents’ expectations for their child’s academic performance, student and parental expectations, and the presence of teachers during night study sessions have a positive and statistically significant effect on academic outcomes. Notably, parents’ expectations have the strongest positive influence on educational achievement.

Fig. 2 Standardized beta coefficient plot of OLS estimated effects on academic achievements

In contrast, certain teaching methods, such as teacher-led lectures, stratified teaching, and bilingual teaching, have a statistically significant and negative effect on academic performance. Parents’ involvement in tutoring and supervision also has an adverse impact. Furthermore, the frequency of student and parent visits to museums, and parents’ strictness regarding homework and exams have negligible effects on academic achievement; their coefficients are centered around zero. Similarly, variables such as extra homework, attending extra Chinese classes or summer/winter camps, and self-perceived persistence and faith in learning, have minimal impact on academic performance; their coefficients are small.

However, these results should be interpreted with caution. First, including too many independent variables in an OLS model can lead to overfitting and can leave nonsignificant predictors in the model. Second, the baseline linear model imposes strong parametric assumptions that may not hold in practice. It is therefore crucial to consider alternative methods that better capture the complexity of the data and identify the most important predictors of academic outcomes.

Individual variable importance

Figures 3, 4, 5, and 6 display the 20 most important predictors of academic achievement as determined by four models: Lasso, Random Forest, AdaBoost, and SVR.

Fig. 3 Coefficient (absolute) magnitude of variables assessed by Lasso

Fig. 4 Mean absolute SHAP value of variables assessed by Random Forest

Fig. 5 Importance scores of variables assessed by AdaBoost

Fig. 6 Coefficient (absolute) magnitude of variables assessed by SVR

Lasso, AdaBoost, and SVR produce remarkably similar results. Figures 3, 5, and 6 reveal that the most important predictor is parents’ requirements for their child’s academic performance (ParRequirement). Parents who set demanding requirements and actively engage in their child’s education can provide valuable support and resources that contribute to their child’s success (Boonk et al., 2018). Following ParRequirement, all three models predict that both students’ expectations (StuExpectation) and parents’ expectations (ParExpectation) are highly instrumental in forecasting academic achievement. This finding implies that students who possess higher levels of intrinsic motivation and receive encouragement from their parents tend to perform better academically (Ryan & Deci, 2020). Furthermore, the robust predictive power of schools’ practice of inviting parents to attend school events (SchClassReport) and having teachers supervise night study (SchSupervision) underscores the positive impact of a supportive school environment on students’ academic success (Deming et al., 2014). Overall, these findings stress the importance of parental involvement and supportive school environments in promoting students’ academic success. By prioritizing education and providing a supportive and engaging learning environment, parents and schools can help students reach their full potential (Gbollie & Keamu, 2017).

The Random Forest model identifies similar predictors, although their relative ranking differs slightly from that of the other three models. Figure 4 shows that student self-expectations (StuExpectation) rank first in feature importance, with a mean absolute SHAP value of more than 16. The school’s use of group discussion (SchGroupDiscussion) as a main teaching approach ranks second, plausibly because group discussion fosters a collaborative learning environment that promotes critical thinking, communication, and problem-solving skills, leading to a more engaged and active learning experience (Al-Samarraie & Saeed, 2018). Parents’ requirements (ParRequirement) and expectations (ParExpectation) regarding their child’s academic performance are also crucial, as in the other three models.
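For readers unfamiliar with the metric in Figure 4, the mean absolute SHAP value of a feature is the average, over observations, of the absolute value of that feature's SHAP contribution. The sketch below computes it from a small hand-made matrix. The numbers are invented for illustration and are not taken from the study; in practice the per-observation SHAP values would come from an explainer applied to the fitted model.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Average |SHAP| per feature across observations."""
    n = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    return {name: total / n for name, total in zip(feature_names, totals)}

# Hypothetical SHAP values: one row per student, one column per feature.
features = ["StuExpectation", "SchGroupDiscussion", "ParRequirement"]
shap_rows = [
    [20.0, -10.0, 6.0],
    [-16.0, 12.0, -8.0],
    [18.0, -14.0, 4.0],
]

importance = mean_abs_shap(shap_rows, features)
# Features ordered from most to least important.
ranked = sorted(importance, key=importance.get, reverse=True)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still registers as important.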

An alternative way to interpret the results is to count the parental, school, and student variables ranked among the top 20 most important predictors. By this measure, the machine learning models indicate that school effort is the most important factor and student effort the least important. For instance, in Lasso, 10 of the top 20 predictors relate to school effort, 7 to parental effort, and 3 to student effort. In SVR and AdaBoost, 8 variables pertain to school effort, and these hold higher relative ranks among the top 20 predictors, whereas 5 variables relate to student effort. Overall, school effort-related variables have greater predictive power. They are critical because they reflect the quality and effectiveness of the educational environment. A school that provides pupils with support and resources while promoting a positive and involved learning community is more likely to improve academic achievement than a school that does not focus on these traits (Berkowitz et al., 2017).
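The tally described above amounts to mapping each top-ranked variable to its effort group and counting. A minimal sketch follows. It assumes that the Sch/Par/Stu prefixes visible in variable names such as SchSupervision, ParRequirement, and StuExpectation identify the group, and the short ranking list is illustrative rather than the actual top 20 from any model.

```python
from collections import Counter

# Assumption: the variable-name prefix encodes the effort group.
PREFIX_TO_GROUP = {"Sch": "school", "Par": "parental", "Stu": "student"}

def effort_group(variable):
    """Return the effort group implied by a variable's name prefix."""
    for prefix, group in PREFIX_TO_GROUP.items():
        if variable.startswith(prefix):
            return group
    return "other"

def count_groups(ranked_variables, top_k=20):
    """Count how many of the top_k ranked variables fall in each group."""
    return Counter(effort_group(v) for v in ranked_variables[:top_k])

# Illustrative ranking only (not the study's full top 20).
ranking = [
    "ParRequirement", "StuExpectation", "ParExpectation",
    "SchClassReport", "SchSupervision", "SchGroupDiscussion",
]
counts = count_groups(ranking)
```

A count of this kind ignores where in the top 20 each variable sits, which is why relative rank is considered alongside it.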

Because the analysis relies on the models’ variable importance measures, it is also important to compare the four models on their in-sample and out-of-sample performance. Table 2 presents the MSE values on the training and test data, with a lower MSE value indicating better performance in predicting the outcome variable. The results demonstrate that the Random Forest model outperforms the other models in test MSE, with a relatively low value of 2318, suggesting that it fits the test data better than the other three models do. AdaBoost, another ensemble method, performs worse than Random Forest with a test MSE of 3195, although it performs slightly better than Lasso (test MSE = 3363) and SVR (test MSE = 3377). Because of its flexibility and strong performance, I use the Random Forest model to assess the importance of the variable groups in the next section. I use Lasso regression as a robustness check because of its interpretability.

Table 2 Comparison of machine learning model performance in predicting outcome variable
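To make the comparison concrete, the sketch below defines the MSE metric and ranks the four models by the test MSE values reported in the text above. The helper name `mse` and the toy scores are illustrative assumptions; only the four test MSE figures come from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Test MSE values as reported in the text.
test_mse = {"Random Forest": 2318, "AdaBoost": 3195, "Lasso": 3363, "SVR": 3377}

# Rank models from best (lowest test MSE) to worst.
ranked_models = sorted(test_mse, key=test_mse.get)
best_model = ranked_models[0]

# Tiny worked example of the metric itself (toy scores, not study data).
toy_mse = mse([60, 70, 80], [62, 67, 80])
```

Because the errors are squared, MSE is expressed in squared units of the outcome and penalizes large prediction errors more heavily than small ones.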

Group variable importance

Figures 7 and 8 display the results of the variable group importance analysis. Both the Random Forest and Lasso models indicate that the “school effort” group is the strongest predictor of educational outcomes, followed by parental effort and student effort. That is, using school effort variables alone yields better model prediction (lower test MSE) than using parental or student effort variables alone.

Fig. 7 Group variable importance assessed by Random Forest

Fig. 8 Group variable importance assessed by Lasso

Schools that devote considerable effort are more likely to motivate and engage students in their academic work than those that do not, leading to better academic results. One possible explanation for this phenomenon is that a supportive school environment can create a sense of belonging and motivation among students, resulting in increased engagement and effort in their academic pursuits (Won et al., 2018). When parents are involved in school events and teachers provide academic support outside regular class time, students may perceive that the school community values and supports their education (Đurišić & Bunijevac, 2017). This perception can lead to improved academic performance because students are more likely to take their studies seriously and strive for success. Additionally, the extra academic support and sense of community provided by a supportive school environment can help students overcome challenges and obstacles that may impede their academic progress (Darling-Hammond & Cook-Harvey, 2018).

Parental effort might be another critical variable group that affects students’ academic achievements, consistent with studies showing that parents’ involvement and high expectations are incentives for academic improvement (Fang et al., 2018). Additionally, the demanding requirements set by parents may increase students’ learning performance. This result can also be explained as follows: more effective interaction between students and parents, as a critical part of high educational investments, leads to greater attention and better support from the family (Boonk et al., 2018). In addition, when parents convey their knowledge, attitudes, and discipline toward learning, the student’s correspondingly improved performance will evoke or “demand” additional home instruction in a virtuous cycle (Soni & Kumari, 2017).

To investigate whether the impact of effort levels on academic achievement differs by gender, I conduct the same analysis separately on samples of male and female students. Figures 9 and 10 present the results. For girls, school effort is more important for academic success than parental and individual effort. For boys, parental effort is the most important factor. This gender-based difference may explain the recent trend of higher academic achievement among girls than among boys, particularly in China, where parental investments are more likely to favor boys than girls in multi-children families (Ling, 2017). Conversely, schools and teachers are more likely to provide equal incentives and resources to both genders, creating a more level playing field for girls to excel academically (Tang & Horta, 2021; Verge, 2021).

Fig. 9 Group variable importance assessed by Random Forest (by gender)

Fig. 10 Group variable importance assessed by Lasso (by gender)

Overall, this study provides valuable insights into the factors that impact academic achievement among junior high school students in China. The findings highlight the importance of group-level factors, specifically school effort, especially academic support, in predicting academic success. A supportive school environment that engages and motivates students, and a school community that values and supports education, can have a significant impact on academic outcomes. Moreover, the study reveals gender differences in the effects of effort levels on academic achievement, with school effort being the most substantial factor for girls and parental effort being the most substantial factor for boys. Thus, policymakers aiming to improve academic performance should focus on stimulating school efforts, which is a more feasible and practical goal than attempting to change the social context of families or the resources of school hierarchies. By emphasizing the importance of school effort and parental involvement, policymakers can create a more supportive and conducive learning environment for students, improving academic outcomes and opportunities for success.

Conclusion and implications

Predicting educational outcomes is crucial to policy implementation and social development. This study uses machine learning techniques to provide insights into how parental, school, and individual efforts might shape and aggravate educational inequalities. It stresses the importance of effort as a direct predictor of student academic outcomes and considers effort a vital driver of upward mobility in social and educational settings. If students’ effortful behaviors, such as determination, perseverance, and patience in learning, are rewarded, effort can compensate for the unequal educational resources that come from family background and thereby sustain or raise students’ academic status and future social status. The core finding of this research also indicates that efforts from both parents and schools, whether analyzed through their individual variables or as two variable groups, are decisive factors in improving educational outcomes. Therefore, future studies of social and educational inequality must consider that the effects of different kinds of effort may differ across socioeconomically heterogeneous groups. Encouraging effort in groups with comparatively disadvantaged social or academic status might be faster and more reliable than waiting for their economic circumstances to improve.

This study emphasizes the critical role of school efforts in improving educational outcomes. By contrast, in China, the government has primarily focused on policies aimed at strengthening school resource-based effectiveness, such as the “Quality Education” initiative that invests in teacher training and educational materials (Wang et al., 2019) and the “Double Reduction” policy that transfers academic responsibilities to families (Eryong & Li, 2022). Schools should also prioritize implementing intramural motivational strategies to enhance students’ self-efficacy and susceptibility to incentives (Hong et al., 2017). This approach would address outcome and effort disparities by improving the quality of in-school education across all stages of schooling. Findings in Zhu (2019) and Fu (2020) support the effectiveness of this approach. Therefore, schools should carefully and thoughtfully design and implement motivational strategies to create a positive, supportive learning environment for their students.

On the basis of this research, I propose the following strategies to improve school efforts and students’ academic performance:

(1) Increase the number of internal school programs, such as workshops, assemblies, and classroom discussions, emphasizing the link between school effort and student academic achievement.

(2) Provide personalized instruction, extracurricular events, and one-on-one support from tutors or teachers for students in need.

(3) Create a positive learning atmosphere that motivates students to participate in their studies and be responsible for their academic growth. This strategy can be accomplished by promoting an engaging and inclusive school context, providing opportunities for student leadership and collaboration, and recognizing and celebrating students’ achievements.

Furthermore, it is critical to recognize the limitations of a prediction task. One is the distinction between correlation and causation: non-causal estimates based on statistical relationships between effort-related factors and students’ academic achievements may not reliably identify the variables’ underlying causal impacts. As a result, the findings of this study should be interpreted with caution and supplemented with other research methods, such as randomized controlled trials, to establish causal relationships. Another possible limitation is that predictive models are reliable only within the range of the data used to train them, so extrapolating beyond that range can produce erroneous predictions. Thus, additional data, including data from other sources, must be collected to provide a comprehensive picture of the predicted effort variables.