Introduction

According to the Education for All (EFA) report, knowledge increases the stock of human capital in an economy (Karoui et al., 2018; Kim et al., 2021) and raises the probability that resources are distributed equally regardless of gender, caste, color, or region (Heb, 2020; de Bruin et al., 2020). Gender equality in education is indispensable for improving economic growth in developing countries such as Pakistan, which possesses rich human capital (Asif et al., 2019). Patriarchy, cultural norms, regional conflicts, son preference, and traditional notions of womanhood centered on procreation, domestic chores, and early marriage are deeply rooted in society (Ashraf, 2018). The impediments that women face are interconnected and stem from prevailing gender differences and from insufficient investment in education (Kleven et al., 2019) at the household and state levels; these impediments also hold back economic growth in Pakistan (ur Rahman et al., 2018).

Some educational initiatives are working effectively in Pakistan but have not yet fully achieved their goals. These include alternative learning programs (ALPs) for formal schools; digital innovation programs developed through collaboration between UNICEF and UNESCO that target the Sustainable Development Goals (Ministry of Federal Education, 2022); an EU partnership to implement a 5-year development program (Education Ministry of Balochistan, 2021); the Ilm-Possible Project (Footnote 1) for zero OOSC (out-of-school children); and equity-based critical learning (STEM, 2021). However, 22.84 million children of secondary school age have never enrolled in formal education (UNESCO Pakistan Country Strategic Document, 2018–2022). In addition, the literacy rate has declined from 62 to 58 % (World Bank Statistics, 2022), which has contributed to global inequality (Paris21 Strategy Agenda, 2030). This situation raises the question of whether existing educational policies and projects are adequate for curbing gender inequality across the provinces of Pakistan (see Fig. 1).

Fig. 1. Literacy rate by province and gender in Pakistan. Source: Author construction based on PSLM data, Pakistan Bureau of Statistics.

Figure 2 displays the trend in per capita income from 2005 to 2019, an indicator closely associated with educational achievement. Per capita income dropped sharply after 2010, recovered in 2012, and declined again after 2016.

Pakistan has been ranked 151st out of 153 countries on the Gender Parity Index. It has also been found that 21 % of boys and 32 % of girls in primary education have experienced gender-based discrimination (Human Rights Watch, 2018). Likewise, boys are 15 % more likely than girls to have the opportunity to attend school, as parents view boys as financial assets. Evidence suggests that when household income is equally distributed, girls outperform boys in grades (Yi et al., 2015), yield higher marginal returns to education (Whalley & Zhao, 2013), and contribute to environmental sustainability (Heb, 2020). The economic benefits of female education are as high as those of male education (Minasyan et al., 2019; Sen et al., 2019), particularly with respect to tertiary education (Alfalih et al., 2021; Wu et al., 2020). In addition, although Pakistan has the largest young population in Asia, approximately 80 % of the female population has never participated in the labor market, and 130 million girls (aged 6–17 years) have never attended any form of educational institution (World Bank, 2020). Moreover, the latent demand for schooling remains tied to the socioeconomic status and purchasing power of the household (Asif et al., 2019). Parental and household treatment effects can therefore generate a considerable gender gap, which requires thorough investigation at the micro level.

The aim of this study is to examine the relationship between gender differences in education and household income in Pakistan. In studies of human capital development, measuring gender differences with microdata through qualitative and quantitative approaches is not easy (Najeeb, 2020). Nor is it straightforward to investigate the circumstances that lead households to invest more in a male child than in a female child. Findings in this area remain inconclusive, reflecting a lack of household-level research in Pakistan (Minasyan, 2019). In addition, many studies of the effect of household income on education suffer from bias arising from measurement error and spurious relationships. Some studies use proxy variables, such as permanent income (Kingdon, 2005), or ignore endogeneity while controlling for children’s cognitive skills (Chevalier et al., 2002). Others address potential endogeneity by exploiting sector- or community-based union membership (Chevalier, 2013), government tax changes (Paul, 2002), or rented or owned land, while cautioning about weak instruments (Okabe, 2016).

This study measures education achievement with two outcome variables: education attainment (a categorical variable), estimated with an ordered logit model, and current enrollment (a binary variable), estimated with a logit model. It seeks to identify the causes of the prevailing gender differences in Pakistan by examining the per capita income and socioeconomic characteristics of households. It addresses potential endogeneity with a novel approach for a non-linear model and examines existing inequalities and gender effects within households. The study finds a positive and robust relationship between gender and education attainment, and a significant transition from primary- to tertiary-level education associated with household per capita income; this contradicts the results of Munshi (2017). The relationship between gender and current enrollment is significantly negative, in contrast to the findings of Maitra (2003). After potential endogeneity is addressed with the two-stage residual inclusion (hereafter 2SRI) method, the results contradict those of prior studies (Chevalier et al., 2002; Maitra, 2003) and establish a clear link between education and income along with other socioeconomic characteristics. The findings show that, at the micro level, inequalities in education reduce education attainment and current enrollment more strongly for girls than for boys. Gender decomposition reveals that individual and community attributes favor boys’ education over girls’.

This study contributes to the literature in the following ways. First, because limited information is available about children’s environments and family structures, the factors that influence education achievement risk being mis-specified. This is why it is vital to focus on the determinants of human capital at the micro level. Most existing studies examine education and gender inequality in terms of their impact on economic growth across countries (Assoumou-Ella, 2019; Evans et al., 2021) or within a country at the macro level (Rammohan et al., 2018), and focus on only one education level (Lloyd et al., 2005). This study is the first attempt to highlight the importance of the gender gap in both education attainment and current enrollment and to confirm whether such a gap exists. It does so by examining the link between per capita income and the socioeconomic characteristics of households using a repeated cross-sectional dataset that has received little academic attention in relation to Pakistan. Second, this study develops an empirical strategy for non-linear models that addresses potential endogeneity with the 2SRI approach, which has largely been overlooked. It exploits exogenous variation from income shocks, windfall income, and non-labor resources to examine the potential endogeneity between income and education (Banzragch et al., 2019; Chevalier et al., 2002). Lastly, while previous studies have argued that gender inequality influences economic growth (Kopnina, 2020), some of these studies contain troubling contradictions (Sirine, 2015), some do not find that gender inequality affects economic growth to a considerable degree (Maitra, 2003), and some investigate only its unidirectional effect (Tansel & Bodur, 2012). This study captures the effect of discrimination in education investment between boys and girls through education inequality measures and a gender decomposition estimated at the household level. It also adopts alternative specifications of gender inequality to examine the economic returns to education.

The rest of this study is structured as follows. The “Literature Review” section explains the importance of gender equality with reference to previous studies. The “Methodology and Data” section describes the methodology and the data used in this study. The “Empirical Results and Discussion” section presents the results and analysis, and the “Conclusion and Policy Implications” section concludes the study while also discussing policy implications and the limitations of the study.

Literature Review

Education is an essential element of the Cobb–Douglas production function (Saleem et al., 2019) that can improve human capital, promote economic growth, and curb poverty in the long term (Arshed et al., 2019). Many countries have experienced improvements in enrollment rates; however, corresponding economic growth has proved difficult to achieve. This human capital mechanism can be revisited and revised by focusing on the equal distribution of education through an economic and sustainable approach (Livingstone, 2018). Duflo et al. (2021) examine the impact of free secondary education on economic welfare gains after the target of universal primary education (UPE) has been met. Using data on secondary high schools from 54 districts in Ghana, they examine 1500 students enrolled in a scholarship program. They find that the program increased secondary-level education attainment by 27 % and also led to better learning skills, lower rates of early marriage, and reduced fertility among girls. This suggests a potential movement toward more equal treatment of the genders within households. However, they do not find any significant influence of education attainment on future employment. Using the Barro–Lee dataset of education attainment, Evans et al. (2021) estimate the gender gap and its effects on long-term economic growth. Instead of the gender gap ratio, they use the difference in education attainment between men and women. Their findings indicate that low levels of education among women explain why the gender gap has become so pronounced in many countries. This gap is highly correlated with women’s age and per capita income.

Kopnina (2020) discusses the sustainable educational goals that are indispensable for progressive universal education and economic growth. The study identifies alternative measures that might influence the circular economy and argues that gender differences will decrease as a result of investment in female education. It endorses the term “empowerment education,” particularly in reference to women who lack empowerment with regard to financial independence and social status. It proposes that female education directly influences food patterns, the efficient consumption of household and natural resources, and renewable energy use, which together can help sustain a growing population. Likewise, de Bruin et al. (2020) find that education and income can promote sustainability and reduce gender inequality. They use age, education, and different types of work to analyze the gender-differentiated impact of these factors on economic change.

Rammohan et al. (2018) examine gender disparity in education in India using district-level data and ordinary least squares (OLS) regression. They use data on the gap between male and female education attainment, GDP per capita, and ethnicity, and find that households in wealthier districts are more inclined to educate their daughters than those in poorer districts. Sahoo and Klasen (2021) focus on female participation in STEM streams using the variables female, siblings, age, parental education, test scores, household size, and ethnicity. They reveal that girls are 20 % less likely than boys to enroll in STEM streams; a plausible explanation for this lower participation lies in parental preferences and income disparity within the household. Maitra (2003) uses a probit model and a censored probit model simultaneously and finds no gender difference in the current enrollment rates of boys and girls (aged 6–12 years) but a larger gap in grade attainment for girls (aged 13–24 years). The data come from the Matlab Health and Socio-Economic Survey (MHSS) of rural Bangladesh, which covers 149 villages. The explanatory variables include religion, household size, number of siblings, the household head’s education level and occupation, the log of per adult household expenditure, and household characteristics such as the number of bedrooms, access to water and a toilet, and the availability of electricity. The endogeneity of income is addressed by including the residual term from the log of per adult household expenditure.

Davis et al. (2019) use the World Values Survey (1981–2014) to capture individual effects on women’s status. They argue that individual decision-making can increase women’s education attainment, shape their choice to bear children, and advance economic sustainability through, for example, urbanization and the provision of basic necessities. These effects yield economic benefits that further support gender equality and discourage the traditional role of women in society. Robb et al. (2012) examine gender differences in education attainment using data on university graduates and an ordered probit model. They find that female students perform better than their male counterparts but are less likely to obtain a first-class degree. Factors such as the type of institution, individual ability, and subject choice do not explain the gender inequality, although they widen the gender gap in educational performance. The predicted probabilities from their study indicate that the likelihood of attaining a first-class degree is 5 % for female students, compared with 8 % for male students. Other studies also argue that reducing gender differences in education achievement can have transitional and long-term effects on women’s empowerment (Kabeer, 2021), legal protection (Durrani et al., 2018), employment (Najeeb et al., 2020), and sustainable growth (Heb, 2020).

Prior Literature in the Context of Pakistan

In the context of Pakistan, Ashraf et al. (2018) apply Dickey–Fuller generalized least squares (DF-GLS) to examine the impact of secondary school attainment on gender inequality. They draw on multiple sources of data about Pakistan, namely, economic surveys, the National Assembly of Pakistan database, and the Pakistan Social and Living Standards Measurement (PSLM) survey, and use the Gender Inequality Index (GII) as the dependent variable. The findings show that economic deprivation can decrease women’s labor force participation and education attainment. Notably, the external or spillover effects of education attainment on gender inequality are also crucial to understanding the lower purchasing power of the household. Qureshi (2007) conducts a bivariate regression analysis using the Learning and Educational Achievements in Punjab Schools (LEAPS) dataset to investigate whether the education attainment of an older sister affects the education attainment of younger children in the household. The study mainly describes a spillover effect in education that has gone largely unnoticed, so its full economic benefit is not realized. It takes into account age, the father’s education level, the household head’s education level, the number of children, household infrastructure, regional languages, and the number of districts in the province. The findings reveal that an educated older sister increases the years of schooling of younger boys by 0.2 %, which could become potential human capital in the future labor market. However, the study does not analyze the spillover effect that an educated older brother has on a younger sister.

Asif et al. (2019) demonstrate that investment in education without gender bias has a strong and significant impact and creates other avenues for sustainable growth in Pakistan. Likewise, some studies investigate education investment to explore other dimensions, including welfare gains from eradicating hunger (Ali et al., 2021), awareness of climate change through energy consumption and recyclable goods (Ali et al., 2019), the transformation of society into one with equal rights and zero violence (Durrani et al., 2018), female leadership in entrepreneurial decision-making (Shaheen and Ahmad, 2022), and voluntary effort toward food security and patterns of daily life (Qazlbash et al., 2021). Mahmood et al. (2012) use time-series data (1971–2009) to investigate the relationship between human capital investment and economic growth. Their autoregressive distributed lag (ARDL) and OLS models show a positive relationship between high enrollment rates and economic growth in both the short and long term.

Zaman (2010) proposes a similar strategy and also suggests that female education is correlated with economic development. Interestingly, Lloyd et al. (2005) find that parents tend to prefer that girls and boys attend separate schools, although the availability of primary schools and the type of school (public or private) also play key roles. The study of ur Rahman et al. (2018) finds that raising the education level of a household offers a way out of the vicious cycle of poverty. Using logistic regression, they find a negative relationship between education and poverty in Pakistan. They emphasize the role education plays in supplying potential human capital for the labor market and in generating new and improved employment opportunities that lead to better living standards and economic well-being. However, a key issue with the aforementioned studies is that some do not propose well-specified econometric strategies for tackling gender differences in education, others fail to address potential endogeneity in non-linear models, and some are unable to decompose gender effects within the household.

Methodology and Data

Data and Variables

This study uses repeated cross-sectional data from the PSLM survey conducted by the Pakistan Bureau of Statistics (PBS) of the Government of Pakistan for the seven fully available rounds from 2005 to 2019 (2005–2006, 2007–2008, 2010–2011, 2011–2012, 2013–2014, 2015–2016, and 2018–2019). The survey, which began in 2004, was designed to provide accurate social and economic indicators for the country at the provincial and district levels. The sample size of the PSLM surveys is approximately 80,000 households, and the total number of observations after pooling the data is 1,011,849.

This study uses two models to measure the education achievement of boys and girls in alternative ways: the first model uses education attainment (the highest completed level of schooling; ages 9–24 years), and the second uses current enrollment (ages 5–24 years). In the first model, boys and girls are restricted to the 9–15, 16–19, and 20–24 age groups for primary-, secondary-, and tertiary-level education attainment, respectively. These age bands account for an additional year for class repetition, late admission into school, the completion standards of the Pakistan education system, and the traditional school-entry ages adopted in past studies (Maitra et al., 2003). In addition, individuals who are household heads or working persons are excluded. In the first model, education attainment is a categorical outcome variable estimated with the ordered logit model and defined as:

$$Education\ attainment=\left\{\begin{array}{l}\; 0= No\ education\\ {}\; 1= Primary\ education\ \left( Grade\ 1-5\right) \\ {}\; 2= Secondary\ education\ \left( Grade\ 6-12\right)\\ {}\;3= Tertiary\ education\ \left( Grade\ 13-16\right)\end{array}\right.$$

In the second model, current enrollment is a dichotomous outcome variable estimated with the logit model and defined as:

$$Current\ enrollment=\left\{\begin{array}{l}\; 1= Currently\ enrolled\ in\ school\ or\ other\ institutions\\ {}\; 0= Otherwise \end{array}\right.$$
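The outcome definitions and sample restrictions above can be expressed as a short coding sketch. It assumes, hypothetically, that each person record provides the highest completed grade (0 to 16), age, and flags for household-head and working status; these inputs and the helper names are illustrative and are not the actual PSLM variable names. The age bands that tie each attainment level to an age group are not encoded here.

```python
# Hypothetical helpers: inputs (highest grade, age, head/working flags) are
# illustrative and do not correspond to actual PSLM variable names.

def attainment_category(highest_grade: int) -> int:
    """Map the highest completed grade (0-16) to the ordered attainment outcome."""
    if not 0 <= highest_grade <= 16:
        raise ValueError(f"grade out of range: {highest_grade}")
    if highest_grade == 0:
        return 0  # no education
    if highest_grade <= 5:
        return 1  # primary (grades 1-5)
    if highest_grade <= 12:
        return 2  # secondary (grades 6-12)
    return 3      # tertiary (grades 13-16)


def current_enrollment(enrolled_now: bool) -> int:
    """Binary outcome: 1 if currently enrolled in school or another institution."""
    return 1 if enrolled_now else 0


def in_sample(age: int, is_head: bool, is_working: bool, model: str) -> bool:
    """Apply the model's age window and exclude household heads and working persons."""
    lo, hi = {"attainment": (9, 24), "enrollment": (5, 24)}[model]
    return lo <= age <= hi and not is_head and not is_working


# Example: a 7-year-old enters the enrollment sample but not the attainment sample.
print(in_sample(7, False, False, "enrollment"), in_sample(7, False, False, "attainment"))
```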

The explanatory variables include a gender dummy variable and the age of boys and girls, depending on the model. Age enters as a linear and a quadratic term to control for birth cohort effects and capture non-linear effects on education achievement. Because age contributes positively to cognitive skills and human capital, a negative coefficient on age squared indicates that the marginal returns to age decrease over time.
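As a worked illustration of the quadratic age term, the marginal effect of age on the latent index (the log-odds scale in the logit and ordered logit models) is b_age + 2·b_age_sq·age, which turns negative beyond age = −b_age/(2·b_age_sq). The coefficients below are hypothetical values chosen for illustration, not estimates from this study.

```python
def age_marginal_effect(b_age: float, b_age_sq: float, age: float) -> float:
    """Derivative of b_age*age + b_age_sq*age**2 with respect to age (latent index scale)."""
    return b_age + 2.0 * b_age_sq * age


def age_turning_point(b_age: float, b_age_sq: float) -> float:
    """Age at which the marginal effect of age on the latent index reaches zero."""
    if b_age_sq == 0:
        raise ValueError("no quadratic term, so no turning point")
    return -b_age / (2.0 * b_age_sq)


# Hypothetical coefficients (not estimates from this study):
b_age, b_age_sq = 0.6, -0.02
for a in (10, 15, 20):
    print(a, age_marginal_effect(b_age, b_age_sq, a))
print("turning point:", age_turning_point(b_age, b_age_sq))
```

With a positive linear term and a negative quadratic term, the effect of an extra year of age shrinks as age rises, which is the diminishing-returns pattern described above.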

Other explanatory variables include the marital status of household members (Kingdon, 2005). This study uses a series of dummy variables for the education level of individuals, including the head of the household, parents, and other household members (i.e., those older than 24). This is because the joint family is the predominant family structure in Pakistan, and the head of the household is usually not the father but rather an elderly family member. Likewise, the head’s personal treatment of children and decision-making influence education achievement. In addition, using parental education instead of maternal education is feasible for gender difference analysis and avoids multicollinearity. Several checks for multicollinearity are run, including the variance inflation factor (VIF) and the correlation matrix. Under the usual rule of thumb, the VIF for each predictor should be below 10; the VIF is 7.02 for the education attainment model and 4.12 for the current enrollment model (Footnote 2). The siblings variable is used to control for the reciprocal relationship between the quantity and quality of education (Hazarika, 2001; Maitra, 2003). Occupational heterogeneity is controlled for through the professions of household members (McNabb, 2002), ranging from high-salaried (officer) to low-salaried (laborer) occupations.
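A minimal pure-Python sketch of the VIF check: each predictor is regressed on all the others plus an intercept, and VIF = 1/(1 − R²). The helper names and the example columns are hypothetical; in practice a statistics package would be used, and the study's own predictors are not reproduced here.

```python
def _solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small system a x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def vif(columns):
    """Return VIF_j = 1 / (1 - R_j^2) for each predictor column, where R_j^2 comes
    from an OLS regression of column j on the other columns plus an intercept."""
    k, n = len(columns), len(columns[0])
    result = []
    for j in range(k):
        y = columns[j]
        X = [[1.0] + [columns[m][i] for m in range(k) if m != j] for i in range(n)]
        p = len(X[0])
        xtx = [[sum(row[a] * row[c] for row in X) for c in range(p)] for a in range(p)]
        xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
        beta = _solve(xtx, xty)
        y_bar = sum(y) / n
        sst = sum((yi - y_bar) ** 2 for yi in y)
        sse = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
        r2 = 1.0 - sse / sst
        result.append(1.0 / (1.0 - r2))  # undefined under perfect collinearity (sse = 0)
    return result


# Hypothetical predictor columns (e.g., household size and number of siblings):
household_size = [4.0, 6.0, 5.0, 8.0, 7.0, 3.0]
siblings = [1.0, 3.0, 2.0, 5.0, 3.0, 1.0]
print(vif([household_size, siblings]))
```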

The variable of interest in this study is the per capita income of the household. It represents the household’s potential investment in education, which can maximize economic returns and minimize gender inequality. The availability of electricity, gas, and broadband internet access serves as a proxy for household infrastructure and technological advancement. Internet access is of particular interest because it may affect digital education, sustainable development, and the response to micro- and macro-level crises such as health emergencies. The high demand to shift education from formal to virtual platforms during the COVID-19 pandemic opened up new dimensions for the acquisition of skills and knowledge. The other control variables are the dependency ratio, household size, ownership of a house and of any establishment other than agricultural land (ur Rahman et al., 2018), and ownership of cultivated land for the household’s own use (Sawada, 2009). Finally, community characteristics are controlled for by including dummy variables for location and for the provinces of the country (Hazarika, 2001).

Empirical Strategy

The Model

The ordered logit model for education attainment assumes an unobserved continuous latent variable, \(Y_i^{\ast}\), that depends on the observed explanatory variables (xi) and an unobserved error term (εi). The range of \(Y_i^{\ast}\) is divided into adjacent intervals that correspond to the four observed categories of education attainment (Y), namely, 0 = no education, 1 = primary education, 2 = secondary education, and 3 = tertiary education. The structural model for latent education is:

$${Y}_i^{\ast }={x}_i\beta +{\varepsilon}_i$$
(1)

where β is the vector of parameters to be estimated and εi is the disturbance term, which is assumed to be independent across observations. \(Y_i^{\ast}\) itself is not observed; only the category into which it falls is observed.

The observed categories are determined as follows:

$${Y}_i=0\kern0.5em if-\infty <{x}_i\beta +{\varepsilon}_i<{\tau}_0\kern6.25em for\kern0.5em \left( no\ education\right)$$
(2)
$${Y}_i=1\kern0.5em if\kern0.5em {\tau}_0<{x}_i\beta +{\varepsilon}_i<{\tau}_1\kern6.75em for\kern0.5em \left( primary\ education\right)$$
(3)
$${Y}_i=2\kern0.5em if\kern0.5em {\tau}_1<{x}_i\beta +{\varepsilon}_i<{\tau}_2\kern7em for\kern0.5em \left( secondary\ education\right)$$
(4)
$${Y}_i=3\kern0.5em if\kern0.5em {x}_i\beta +{\varepsilon}_i>{\tau}_2\kern9.5em for\kern0.5em \left( tertiary\ education\right)$$
(5)

where Y is the category of education attainment and τ denotes the threshold parameters, which govern the transition from one category of education attainment to the next. The thresholds must therefore satisfy τ0 < τ1 < τ2. Because εi is assumed to follow a logistic distribution, the resulting probabilities are:

$$P\left({Y}_i=0\right)=P\left({Y}_i^{\ast}\le {\tau}_0\right)$$
(6)
$$P\left({Y}_i=1\right)=P\left({Y}_i^{\ast}\le {\tau}_1\right)-P\left({Y}_i^{\ast}\le {\tau}_0\right)$$
(7)
$$P\left({Y}_i=2\right)=P\left({Y}_i^{\ast}\le {\tau}_2\right)-P\left({Y}_i^{\ast}\le {\tau}_1\right)$$
(8)
$$P\left({Y}_i=3\right)=P\left({\tau}_2\le {Y}_i^{\ast}\right)$$
(9)

Hence, the probability of each outcome can be written as:

$$P\left({Y}_i=j\right)=F\left({\tau}_j-{x}_i\beta \right)-F\left({\tau}_{j-1}-{x}_i\beta \right)$$
(10)

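where, by convention, \(\tau_{-1}=-\infty\) and \(\tau_3=+\infty\), and F(·) is the logistic cumulative distribution function: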

$$F\left(\cdot \right)=\frac{\exp \left(\cdot \right)}{1+\exp \left(\cdot \right)},\quad \text{so that}\quad P\left({Y}_i=j\right)=\frac{1}{1+{e}^{-{\tau}_j+{x}_i\beta }}-\frac{1}{1+{e}^{-{\tau}_{j-1}+{x}_i\beta }}$$
(11)

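The log-likelihood function for the ordered logistic regression, where N is the number of observations and \(d_{ij}=1\) if \(Y_i=j\) and \(d_{ij}=0\) otherwise, is:

$$\ln L=\sum\nolimits_{i=1}^{N}\sum\nolimits_{j=0}^{3}{d}_{ij}\ln \left[F\left({\tau}_j-{x}_i\beta \right)-F\left({\tau}_{j-1}-{x}_i\beta \right)\right]$$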
(12)
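The category probabilities in Eq. 10–11 and the log-likelihood in Eq. 12 can be evaluated as follows for given values of xiβ and the thresholds. This is an illustrative sketch only, with hypothetical index values and thresholds; it does not estimate β or τ, which in practice would be obtained by maximizing this log-likelihood with a statistics package.

```python
import math


def logistic_cdf(t: float) -> float:
    """F(t) = exp(t) / (1 + exp(t)), written to avoid overflow; F(-inf)=0, F(inf)=1."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def category_probs(xb: float, taus: list) -> list:
    """Eq. 10: P(Y_i = j) = F(tau_j - x_i*beta) - F(tau_{j-1} - x_i*beta),
    with tau_{-1} = -inf below the lowest threshold and +inf above the highest."""
    if any(lo >= hi for lo, hi in zip(taus, taus[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cuts = [-math.inf, *taus, math.inf]
    return [logistic_cdf(cuts[j + 1] - xb) - logistic_cdf(cuts[j] - xb)
            for j in range(len(cuts) - 1)]


def log_likelihood(xbs: list, ys: list, taus: list) -> float:
    """Eq. 12: sum over individuals of ln P(Y_i = y_i); d_ij selects the observed category."""
    return sum(math.log(category_probs(xb, taus)[y]) for xb, y in zip(xbs, ys))


# Hypothetical index values x_i*beta and thresholds tau_0 < tau_1 < tau_2:
taus = [-0.5, 1.2, 3.0]
print(category_probs(0.4, taus))
print(log_likelihood([0.4, -1.0, 2.5], [1, 0, 3], taus))
```

Setting the outer cut points to ±infinity lets the single difference formula cover the lowest and highest categories, which is why Eq. 6 and Eq. 9 are special cases of Eq. 10.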

The ordered logit model can thus be viewed as a set of cumulative binary logit equations, one for each threshold, that share a common slope vector β (Williams, 2005). The econometric model is specified as:

$$Education\ Achievement=f\left( PC\ Income, Individuals, HH, Provinces+{\varepsilon}_i\right)$$
(13)

Endogeneity Bias

The main econometric challenge is to address potential endogeneity. Per capita income is likely to be related to unobservable factors that affect education achievement in many ways but are not included in the regression. Errors in measuring per capita income may also bias the results. In addition, income and education achievement may be causally related, and this relationship might also be influenced by parental economic circumstances, social status, and spurious third factors such as personal preferences. Reverse causality arises if a boy’s or girl’s poor educational performance lowers household income, and vice versa. The model may therefore suffer from omitted variable bias and reverse causality. The literature treats per capita income as an endogenous variable that has been instrumented by parental and household characteristics, including employment, education, and farming activities (Bratti, 2007; Hoogerheide, 2012). Other studies examine its causal relationship with income shocks (Coelli, 2005), differences in household income, rainfall, and climate change in relation to productivity concerns (Fichera et al., 2015).

In the first model (education attainment), income shocks, such as the unemployment of the household head, and the non-labor resources of grandparents in the household are used as instruments for per capita income (Behrman et al., 1997). Unemployment of the household head is unlikely to influence the total years of schooling of boys and girls in a joint family structure, where the parents are responsible for meeting educational expenditures. Similarly, the permanent or non-labor income of the grandparents is an exogenous and strong instrument that does not directly affect the total years of schooling of boys and girls (Bratti, 2007). However, these instruments may affect the current enrollment of boys and girls, so other exogenous variables are needed. The potential endogeneity in the current enrollment model is therefore captured by a different set of exogenous variables: first, the difference between the household’s per capita income (as recorded in the PSLM survey) and the country’s per capita income, and second, windfall income. The difference in per capita income is a proxy for an income shock that is unrelated to agricultural goods and instead reflects, retrospectively, whether households have or do not have wages. This difference may represent the transitional effect of the household’s financial condition (Björkman-Nyqvist, 2013; Sawada, 2009) in developing countries such as Pakistan. Similarly, windfall income consists mainly of the household’s unearned or non-labor income, which includes lottery wins, inheritances, gifts, unexpected charity payments, and irregular sources of income (Kingdon, 2005; Powdthavee et al., 2013), all of which are exogenous.

Another source of endogeneity might arise from the relationship between education spending and current enrollment in the logit regression. The literature instruments education spending with, for example, community, labor, or industry union membership of the household head, which is unavailable in the PSLM dataset; some studies instead use the head’s occupation (Maitra, 2003). When education spending is instrumented with the household head’s occupation, the null hypothesis of exogeneity is not rejected (p-value = 0.93). Nevertheless, this study controls for educational spending by adding occupational dummy variables, home ownership, and land cultivation (Maitra, 2003; Shea, 2000). Other individual and socioeconomic characteristics are treated as exogenous. OLS regression (for instrument validation) and alternative approaches to potential endogeneity, namely, the control function approach, two-stage least squares (2SLS) (ignoring the nature of the outcome variable), and the IV probit model (with the outcome variable split into a binary variable where necessary), are also examined.

2SRI

To apply the 2SRI method, the first step is to find valid instruments, as in standard IV estimation; the method differs from standard IV estimation, however, in that the first-stage residuals, rather than the predicted values of the endogenous variable, enter the second stage. The instruments are chosen so that they predict the endogenous variable while remaining uncorrelated with the error term of the outcome equation. The rationale for this method (Terza, 2018) is that the traditional linear instrumental variable estimator is inappropriate for correcting endogeneity in non-linear models. Its core advantage is that the estimated coefficient on the first-stage residuals provides a direct test for the presence of endogeneity in the model (Hausman, 1978). In this method, the first stage is an OLS regression that predicts the endogenous variable from the instruments and the remaining explanatory variables. The second stage is estimated with the ordered logit model, including the first-stage residuals. Finally, the whole two-stage procedure is bootstrapped to obtain standard errors that account for the use of estimated residuals in the second stage. The latent model is established by splitting the explanatory variables into exogenous and endogenous variables, Xex and Xen, so that the equation becomes:

$${Y}_i^{\ast }={X}_{ex}^{\prime }{\beta}_{ex}+{X}_{en}^{\prime }{\beta}_{en}+{\varepsilon}_i$$
(14)

The first-stage equation of the 2SRI method is estimated for income, the endogenous variable, by an OLS regression on all the exogenous variables and instruments. It takes the form:

$${X}_{en}={X}_{ex}^{\prime }{\beta}_{ex}+ Z\gamma +{v}_i$$
(15)

where Cov(Xen, Z) ≠ 0 and Cov(εi, Z) = 0; β and γ are coefficient vectors; and vi and εi are error terms. The second stage of the 2SRI method estimates the outcome variable using the first-stage residuals as control variables along with the other explanatory variables. The model is described as:

$${Y}_i^{\ast }={X}_{ex}^{\prime }{\beta}_{ex}+{X}_{en}^{\prime }{\beta}_{en}+\varphi \hat{v_i}+{\varepsilon}_i^{\ast }$$
(16)

This method also provides a simple test of endogeneity: if the coefficient φ on the first-stage residuals is statistically significant, estimates from the model without the residuals would be biased, and including the residuals controls for the endogeneity (Terza, 2018; Akarçay-Gürbüz & Polat, 2017).
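The two stages and the bootstrap can be illustrated with a minimal sketch. It makes several assumptions not taken from the study: the outcome is binary (as in the current-enrollment logit) rather than ordered, the data are simulated, the logit is fitted by a hand-written Newton-Raphson routine, and the function names (`two_sri`, `bootstrap_se`) are hypothetical. It shows how the first-stage residual v̂ enters Eq. (16) as a control and how φ flags endogeneity.

```python
import numpy as np


def logistic_cdf(t):
    """Logistic CDF, 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))


def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Binary logit by Newton-Raphson; X must already contain a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = logistic_cdf(X @ beta)
        score = X.T @ (y - p)
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_sri(y, x_en, X_ex, Z):
    """Stage 1 (Eq. 15): OLS of x_en on [X_ex, Z]; keep the residuals v_hat.
    Stage 2 (Eq. 16, binary outcome): logit of y on [X_ex, x_en, v_hat].
    Returns [beta_ex..., beta_en, phi]."""
    W = np.column_stack([X_ex, Z])
    gamma, *_ = np.linalg.lstsq(W, x_en, rcond=None)
    v_hat = x_en - W @ gamma
    return fit_logit(np.column_stack([X_ex, x_en, v_hat]), y)


def bootstrap_se(y, x_en, X_ex, Z, n_boot=50, seed=0):
    """Resample observations and re-run BOTH stages, so the standard errors
    reflect the sampling noise in the first-stage residuals."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = [two_sri(y[i], x_en[i], X_ex[i], Z[i])
             for i in (rng.integers(0, n, size=n) for _ in range(n_boot))]
    return np.std(draws, axis=0, ddof=1)


# Hypothetical simulated data: x_en stands in for (log) per capita income,
# z for an instrument, and the shared shock v makes x_en endogenous.
rng = np.random.default_rng(42)
n = 20_000
x_ex, z, v = rng.normal(size=(3, n))
x_en = 0.5 * x_ex + 1.0 * z + v
y = (0.5 * x_ex + 1.0 * x_en - 1.5 * v + rng.logistic(size=n) > 0).astype(float)

X_ex = np.column_stack([np.ones(n), x_ex])
Z = z[:, None]
b_naive = fit_logit(np.column_stack([X_ex, x_en]), y)
b_2sri = two_sri(y, x_en, X_ex, Z)
se = bootstrap_se(y, x_en, X_ex, Z)

print(f"naive logit beta_en: {b_naive[2]:.3f}  (simulated true value 1.0)")
print(f"2SRI        beta_en: {b_2sri[2]:.3f}")
print(f"phi = {b_2sri[3]:.3f}, bootstrap z = {b_2sri[3] / se[3]:.1f}")
```

In this simulation the naive logit understates the income coefficient because the omitted shock v is correlated with x_en, while the 2SRI estimate should recover it and φ should be clearly significant. Resampling whole observations and re-running both stages is one simple way to obtain standard errors that reflect the generated regressor; it is not necessarily the exact bootstrap scheme used in the study.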

Education and Inequality Parameters

The Gini coefficient of education, average years of schooling, and the standard deviation of schooling are the inequality parameters considered, chosen in light of the country’s education system and structure, the efficiency of learning performance, and variations in gender-specific investment in education (Digdowiseiso, 2010; Thomas et al., 2001).Footnote 3 Accounting for these inequalities may help reveal the socioeconomic and intrahousehold factors behind the differential treatment of girls’ education. The extended model can therefore be described as:

$$Education\ Achievement=f\left( PC\ Income, Inequality, Individuals, HH, Provinces\right)+{\varepsilon}_i$$
(17)
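The three inequality parameters can be computed from a vector of individual years of schooling, as in the minimal sketch below. Its assumptions are as follows. The grouping unit over which the measures are computed (household, district, or province) is left open. The Gini is written in pairwise form, whereas Thomas et al. (2001) use a grouped form over attainment levels with population shares, which should give the same value when each person receives weight 1/n. The function names are illustrative.

```python
import numpy as np


def education_gini(years):
    """Gini coefficient of years of schooling: sum of absolute differences over
    all ordered pairs, divided by 2 * n^2 * mean."""
    y = np.asarray(years, dtype=float)
    mu = y.mean()
    if mu == 0:
        return 0.0  # nobody has any schooling: treated as zero inequality
    abs_diffs = np.abs(y[:, None] - y[None, :])
    return abs_diffs.sum() / (2 * len(y) ** 2 * mu)


def education_inequality(years):
    """Return (Gini, average years of schooling, population standard deviation)."""
    y = np.asarray(years, dtype=float)
    return education_gini(y), y.mean(), y.std()


# Hypothetical group of four people: three with no schooling, one with 10 years
gini, mean_years, sd_years = education_inequality([0, 0, 0, 10])
print(gini, mean_years, round(sd_years, 2))   # Gini = 0.75, mean = 2.5, SD ≈ 4.33
```

A perfectly equal distribution gives a Gini of zero, and the value rises toward one as schooling concentrates in fewer people.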

The gender decomposition uses the basic model of each specification and compares means, coefficients (Kingdon, 2005), and interactions with the boy dummy variable (Maitra, 2003). The results are further decomposed into gender effects using a variant of the Oaxaca decomposition (Dong et al., 2009; Golsteyn et al., 2014; Pal, 2004). This approach is generally used to examine gender effects on economic returns and the wage gap (Oaxaca, 1973); in this study, the standard approach is modified to examine gender effects on education achievement. The probability of education attainment, denoted AT, is determined separately for girls and boys with characteristics Xg and Xb, respectively. Assuming \(\Pr \left( AT|{X}_i,{\theta}_i^{\ast}\right)\) is the expected probability of AT and \({\theta}_i^{\ast }\) is the vector of maximum likelihood estimates of the parameters of the ordered logit model for i = g, b (girls and boys, respectively), the expected AT for any individual is:

$${AT}_g^{\ast }=\sum\nolimits_{j=0}^3\Pr \left({AT}_j|\ {X}_g,{\theta}_g^{\ast}\right)$$
(18)
$${AT}_b^{\ast }=\sum_{j=0}^3\Pr \left({AT}_j|\ {X}_b,{\theta}_b^{\ast}\right)$$
(19)

Using expected education attainment for the boys’ and girls’ samples, respectively, one can decompose the boy-girl differential in alternative ways as follows:

$${\displaystyle \begin{array}{c}{AT}_b^{\ast }-{AT}_g^{\ast }=\sum\limits_{j=0}^3\left[ \Pr \left({AT}_j |{X}_b,{\theta}_b^{\ast}\right)-\Pr \left({AT}_j|\ {X}_g,{\theta}_b^{\ast}\right)\right]+\sum\limits_{j=0}^3\left[\Pr \left({AT}_j|{X}_g,{\theta}_b^{\ast}\right)-\Pr \left({AT}_j|{X}_g,{\theta}_g^{\ast}\right)\right]\\ {}= Explained\ Variation+ Unexplained\ Variation\end{array}}$$
(20)
$${\displaystyle \begin{array}{c}{AT}_b^{\ast }-{AT}_g^{\ast }=\sum\limits_{j=0}^3 \left[ \Pr \left({AT}_j |{X}_b,{\theta}_g^{\ast} \right)-\Pr \left({AT}_j | {X}_g,{\theta}_g^{\ast} \right) \right]+\sum\limits_{j=0}^3 \left[ \Pr \left( {AT}_j | {X}_b,{\theta}_b^{\ast} \right) -\Pr \left( {AT}_j | {X}_b,{\theta}_g^{\ast} \right) \right] \\ {}= Explained\ Variation+ Unexplained\ Variation\end{array}}$$
(21)

In brief, the explained variation is attributable to differences in the characteristics of boys and girls, while the unexplained variation is attributable to the different treatment of boys and girls within the household. The unexplained component is obtained by allowing the parameters to vary while holding the characteristics constant; the explained component holds the parameters fixed and lets the characteristics vary. A similar approach is adopted for current enrollment.
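The level-by-level terms inside the sums of Eqs. (20) and (21) can be computed once ordered logit parameters θ = (β, cut points) have been estimated separately for boys and girls. The minimal sketch below assumes that predicted probabilities are averaged over each sample rather than evaluated at mean characteristics. All data and parameter values are hypothetical. The four `mean_probs(X, θ)` combinations correspond to the four scenarios (girls or boys, each with girls’ or boys’ parameters) reported for Table 9.

```python
import numpy as np


def ordered_logit_probs(X, beta, cuts):
    """Predicted P(AT = j | X), j = 0..J-1, for each row of X under an ordered logit."""
    xb = X @ beta
    n = len(xb)
    cdf = 1.0 / (1.0 + np.exp(-(cuts[None, :] - xb[:, None])))     # n x (J-1)
    return np.column_stack([cdf, np.ones(n)]) - np.column_stack([np.zeros(n), cdf])


def mean_probs(X, theta):
    """Sample-average predicted probability of each attainment level."""
    beta, cuts = theta
    return ordered_logit_probs(X, beta, cuts).mean(axis=0)


def decompose(X_b, X_g, theta_b, theta_g, reference="boys"):
    """Level-by-level explained and unexplained terms of Eq. (20)
    (reference='boys') or Eq. (21) (reference='girls')."""
    if reference == "boys":
        explained = mean_probs(X_b, theta_b) - mean_probs(X_g, theta_b)
        unexplained = mean_probs(X_g, theta_b) - mean_probs(X_g, theta_g)
    else:
        explained = mean_probs(X_b, theta_g) - mean_probs(X_g, theta_g)
        unexplained = mean_probs(X_b, theta_b) - mean_probs(X_b, theta_g)
    return explained, unexplained


# Hypothetical characteristics and parameters (levels: none, primary, secondary, tertiary)
rng = np.random.default_rng(1)
X_b = rng.normal(0.2, 1.0, size=(500, 2))
X_g = rng.normal(0.0, 1.0, size=(500, 2))
theta_b = (np.array([0.8, 0.5]), np.array([-1.0, 0.5, 2.0]))
theta_g = (np.array([0.6, 0.4]), np.array([-0.8, 0.7, 2.2]))

total = mean_probs(X_b, theta_b) - mean_probs(X_g, theta_g)
for ref in ("boys", "girls"):
    explained, unexplained = decompose(X_b, X_g, theta_b, theta_g, reference=ref)
    print(ref, "explained:", np.round(explained, 4), "unexplained:", np.round(unexplained, 4))
```

By construction, the explained and unexplained terms add up to the raw boy-girl difference at every level, whichever group’s parameters serve as the reference. The split between the two terms, however, depends on that choice, which is why both Eq. (20) and Eq. (21) are reported.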

The alternative specifications use OLS regression to explore the impact of gender inequalities in education achievement on household income. This study uses three measures of gender difference. For education attainment, the first indicator, the gender gap (Footnote 4), is calculated as the difference in illiteracy rates between girls and boys (Cooray, 2011). The second indicator, gender difference (Footnote 5), measures the difference in education attainment between girls and boys (Baliamoune-Lutz & McGillivray, 2015), while the third, the gender gap ratio (Footnote 6), is constructed from the difference between boys’ and girls’ education attainment (Digdowiseiso, 2010). Analogous inequality measures are also computed for the current enrollment of boys and girls aged 5–24. The alternative specification for the linear regression model is defined as:

$$PC\ Income=f\left( Gender\ Differences, Individuals, HH, Provinces\right)+{\varepsilon}_i$$
(22)
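The first two gender-difference measures can be sketched as follows. The exact definitions are given in the footnotes and are not reproduced here, so this example makes two assumptions: signs are chosen so that a positive value means girls are behind, and the measures are computed over an unspecified grouping unit. The third measure, the gender gap ratio, is omitted because its exact form is not stated in the text. The function names are hypothetical.

```python
import numpy as np


def illiteracy_gap(literate, girl):
    """Girls' illiteracy rate minus boys' illiteracy rate (positive = girls behind)."""
    literate = np.asarray(literate, dtype=bool)
    girl = np.asarray(girl, dtype=bool)
    return (1.0 - literate[girl].mean()) - (1.0 - literate[~girl].mean())


def attainment_difference(attainment, girl):
    """Boys' mean attainment level minus girls' mean attainment level (positive = girls behind)."""
    attainment = np.asarray(attainment, dtype=float)
    girl = np.asarray(girl, dtype=bool)
    return attainment[~girl].mean() - attainment[girl].mean()


# Hypothetical group of eight children; attainment coded 0 = none ... 3 = tertiary
girl = [1, 1, 1, 1, 0, 0, 0, 0]
literate = [1, 0, 0, 1, 1, 1, 0, 1]
attain = [1, 0, 0, 2, 2, 3, 0, 1]
print(illiteracy_gap(literate, girl))        # 0.5 - 0.25 = 0.25
print(attainment_difference(attain, girl))   # 1.5 - 0.75 = 0.75
```

Measures computed this way for each unit would then enter Eq. (22) as regressors for (log) per capita income.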

Furthermore, the robustness of the education achievement results is examined using several alternative specifications: ordered probit and probit models, an alternative explanatory variable (household per capita expenditure), and provincial heterogeneity.

Descriptive Statistics

Table 1 reports detailed descriptive statistics for the selected variables. On average, 10 % of boys and girls attain a primary level of education, and 2.1 % attain a tertiary level. Girls make up 49 % of the sample in the first model (education attainment) and 48.8 % in the second model (current enrollment). On average, 38.9 % of boys and girls are currently enrolled in education, and mean log per capita income is 8.8 (see Fig. 2). Overall, the age of household members has a nonlinear relationship with education level (see Fig. 3). The models use the ages of boys and girls according to each model’s sample criteria: the mean age is 15.95 years in the first model and 13.59 years in the second. The sample has a high share of low-salaried occupations (for example, machine operators), accompanied by a high dependency ratio of 41.6 %. In total, 44.5 % of the population lives in urban areas, 80.6 % have electricity, and 37.7 % have access to gas supplies. Punjab accounts for the largest share of the population among the provinces.

Table 1 Description and summary statistics of selected variables
Fig. 2

Household’s income in Pakistan. Source: Author construction based on data from PSLM Bureau of Statistics, Pakistan. Figure 2 displays the trend of per capita income, a key determinant of educational achievement, from 2005 to 2019. Per capita income dropped sharply after 2010, recovered in 2012, and declined again after 2016

Fig. 3

Education attainment by age (2005–2019). Source: Author construction based on data from PSLM Bureau of Statistics, Pakistan. Figure 3 shows the predictive margins of education level by the age of household members. The probability of primary education attainment decreases after 25 years of age, whereas the opposite holds for the tertiary level. The probability of attaining secondary education also increases with age

Empirical Results and Discussion

Determining Education Attainment and Current Enrollment Levels

Table 2 reports the average marginal effects of the ordered logit model for no education and for primary-, secondary-, and tertiary-level education attainment, as functions of household per capita income and various socioeconomic characteristics. In the full sample models, being a girl increases primary-, secondary-, and tertiary-level education attainment by 0.4, 0.5, and 0.2 percentage points, respectively (significant at the 1 % level), which contradicts the findings of Munshi (2017). Per capita income, on average, increases the likelihood of primary-, secondary-, and tertiary-level education attainment by 0.1, 0.2, and 0.1 percentage points, respectively. Age is more likely to increase secondary- and tertiary-level attainment. The findings reveal that the transition in education attainment is progressive from the primary to the secondary level, but it does not occur in the same proportion from the secondary to the tertiary level. The nonlinear effect of age and squared age can be interpreted in two ways. First, as age increases, the rate of transition between education attainment levels decreases. Second, the squared-age term is negatively related to education attainment.
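Average marginal effects of this kind can be computed from estimated ordered logit coefficients and cut points. The minimal sketch below assumes those estimates are already available and uses hypothetical values, not the Table 2 estimates. For category j, the derivative of P(y = j) with respect to a covariate is [f(κ_{j−1} − x′β) − f(κ_j − x′β)] times the derivative of the index x′β. With a squared-age term, that index derivative is β_age + 2β_age²·age, which produces the nonlinear age effect described above.

```python
import numpy as np


def logistic_cdf(t):
    return 1.0 / (1.0 + np.exp(-t))


def logistic_pdf(t):
    e = np.exp(-np.abs(t))           # the logistic density is symmetric in t
    return e / (1.0 + e) ** 2


def category_probs(xb, cuts):
    """P(y = j | x), j = 0..J-1, for an ordered logit with index xb and ascending cut points."""
    n = len(xb)
    cdf = logistic_cdf(cuts[None, :] - xb[:, None])                   # n x (J-1)
    return np.column_stack([cdf, np.ones(n)]) - np.column_stack([np.zeros(n), cdf])


def average_marginal_effects(xb, cuts, dxb):
    """Average over individuals of dP(y = j)/dx for every category j.
    dxb is d(xb)/dx: a scalar beta_k for a linear term, or an array such as
    beta_age + 2 * beta_age2 * age when x also enters as a square."""
    n = len(xb)
    dens = logistic_pdf(cuts[None, :] - xb[:, None])                  # n x (J-1)
    lower = np.column_stack([np.zeros(n), dens])   # f(kappa_{j-1} - xb); zero below the first cut
    upper = np.column_stack([dens, np.zeros(n)])   # f(kappa_j - xb); zero above the last cut
    slope = np.broadcast_to(np.asarray(dxb, dtype=float), (n,))
    return ((lower - upper) * slope[:, None]).mean(axis=0)


# Hypothetical values (NOT the Table 2 estimates); categories 0..3 =
# none, primary, secondary, tertiary.
rng = np.random.default_rng(0)
n = 2_000
log_inc = rng.normal(8.8, 0.7, n)
age = rng.integers(5, 25, n).astype(float)
b_inc, b_age, b_age2 = 0.4, 0.6, -0.02
cuts = np.array([2.0, 4.0, 6.5])
xb = b_inc * (log_inc - 8.8) + b_age * age + b_age2 * age ** 2

ame_inc = average_marginal_effects(xb, cuts, b_inc)
ame_age = average_marginal_effects(xb, cuts, b_age + 2 * b_age2 * age)
print("AME of log income, percentage points:", np.round(100 * ame_inc, 2))
print("AME of age, percentage points:       ", np.round(100 * ame_age, 2))
print("age at which the age profile of the index peaks:", -b_age / (2 * b_age2))
```

The marginal effects across the four categories sum to zero, because probability shifted into higher levels must come out of lower ones. Multiplying by 100 expresses them in percentage points, the unit used in Table 2.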

Table 2 Average marginal effects for education attainment: ordered logit model regression

Meanwhile, the presence of an educated head of household significantly improves primary-, secondary-, and tertiary-level education attainment, by 5.0, 10.4, and 4.1 percentage points, respectively. Other household members are likely to increase secondary- and tertiary-level education attainment by 20.9 and 11.5 percentage points, respectively. The results show larger marginal effects on education attainment for technicians (low-salaried) than for managers (high-salaried), suggesting that households in lower-salaried occupations are strongly motivated to maximize the household’s human capital. In addition, electricity, internet access, and gas supply are highly likely to enhance education attainment. On average, living in an urban area increases primary-, secondary-, and tertiary-level education attainment by 0.2, 0.3, and 0.1 percentage points, respectively.

In models 5 to 8 (girls), per capita income significantly increases every level of education attainment, with the largest effect, 0.2 percentage points, at the secondary level. Age has a significant and nonlinear effect. Being married decreases the probability of education attainment by 1.6 and 5.0 percentage points at the tertiary and secondary levels, respectively. Interestingly, parental education has a positive influence, but it is significant only for secondary education attainment, at 23.3 percentage points. The presence of an educated head of household and of other educated members also has a positive and significant effect. Across occupations, the results indicate increases in tertiary education attainment of 19.6, 23.7, 8.9, and 1.5 percentage points for clerks, officers, managers, and machine operators, respectively. Household size is inversely related to girls’ education attainment, particularly at the secondary level. Household infrastructure has a positive effect on girls’ education attainment. This may indicate that sustainable access to household resources such as electricity and gas can support female education, which in turn can promote gender equity and economic returns.

In models 9 to 12 (boys), the impact of household per capita income on education attainment is comparable to that for girls. Household income increases boys’ secondary- and tertiary-level education attainment by 0.2 and 0.1 percentage points, respectively. Parental education is unlikely to increase the probability of boys’ education attainment. The presence of an educated head of household increases education attainment by 4.9, 9.5, and 3.2 percentage points at the primary, secondary, and tertiary levels, respectively. Similarly, the presence of household members with numeracy skills and with secondary education increases secondary-level education attainment by 10.9 and 22.2 percentage points, respectively. Occupational heterogeneity has a strong impact on education attainment: officers and clerks significantly improve boys’ primary- and secondary-level attainment, and clerks are highly likely to increase tertiary-level attainment, by 15.4 percentage points. Compared with Punjab, provinces such as KPK and Balochistan are less likely to increase primary, secondary, and tertiary education attainment.

Table 3 reports the average marginal effects from the current enrollment models estimated by logit regression.

Table 3 Average marginal effects for current enrollment: logit model regression

In the full model, the estimate for the variable girl is highly significant and negative, in contrast to past studies (Maitra, 2003): being a girl decreases the probability of current enrollment by 0.8 percentage points. A unit increase in per capita income is more likely to improve current enrollment for girls than for boys, with an increase of 0.4 percentage points for girls. Age has a nonlinear effect through its squared term, and current enrollment decreases with age. Additionally, being married decreases girls’ probability of current enrollment by 15.6 percentage points. Current enrollment increases for boys if there are educated household members; however, this is not the case for certain occupations such as clerks and machine operators.
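For a 0/1 regressor such as girl, a logit marginal effect is usually computed as an average discrete change rather than a derivative. Each individual’s predicted enrollment probability is evaluated with the dummy set to 1 and to 0, and the difference is averaged. The minimal sketch below assumes fitted logit coefficients are available. The design matrix and coefficient values are hypothetical, not the Table 3 estimates.

```python
import numpy as np


def logistic_cdf(t):
    return 1.0 / (1.0 + np.exp(-t))


def ame_dummy(X, beta, col):
    """AME of a 0/1 regressor in a logit: the average over individuals of
    P(y = 1 | d = 1, x_i) - P(y = 1 | d = 0, x_i), other covariates kept as observed."""
    X1, X0 = X.copy(), X.copy()
    X1[:, col], X0[:, col] = 1.0, 0.0
    return np.mean(logistic_cdf(X1 @ beta) - logistic_cdf(X0 @ beta))


def ame_continuous(X, beta, col):
    """AME of a continuous regressor entering linearly: mean of p(1 - p) * beta_k."""
    p = logistic_cdf(X @ beta)
    return np.mean(p * (1.0 - p)) * beta[col]


# Hypothetical design: [constant, girl dummy, centred log per capita income]
rng = np.random.default_rng(3)
n = 5_000
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.normal(0.0, 0.7, n)])
beta = np.array([-0.45, -0.04, 0.30])   # hypothetical coefficients, not Table 3 estimates

print(f"girl:       {100 * ame_dummy(X, beta, 1):+.2f} percentage points")
print(f"log income: {100 * ame_continuous(X, beta, 2):+.2f} percentage points per unit")
```

Averaging over observed covariates, rather than evaluating at sample means, matches the "average marginal effect" label used in the tables. Whether the study computed its dummy effects as discrete changes is an assumption here.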

Other indicators associated with physical capital, such as ownership of an establishment or land, are negatively related to current enrollment. This suggests that education is not a primary objective among landowners, as they are less concerned about employment. For these households, the educational transition from primary to higher grades is less valuable than monetary assets, and most people are reluctant to leave their ancestral profession if it is associated with land cultivation. Household infrastructure is likely to benefit girls more than boys, whereas living in an urban location is highly likely to increase boys’ enrollment. The dependency ratio has larger marginal effects on boys’ current enrollment, which further supports the premise of this study. In most households in Pakistan, male earners bear all the expenditures; therefore, parents prefer to invest in boys’ education for potential job opportunities and long-run financial support. The number of siblings is positively related to current enrollment and reveals a stronger quantity-quality trade-off, particularly among girls. The marginal effect is higher in KPK province, possibly owing to the framework of free and accessible education in place since 2013 (KPK Government Statistics, 2021).

Dealing with Endogeneity Bias

Table 4 shows the average marginal effects from the 2SRI approach with ordered logit regression after addressing endogeneity. In the full sample, household per capita income increases education attainment at each level by more than in the earlier results. The increase in primary-level attainment is especially large, at 11.2 percentage points; secondary- and tertiary-level attainment increase by 15.9 and 4.9 percentage points, respectively. The marginal effect of gender on secondary-level attainment is almost twice the previous estimate. Other indicators with larger marginal effects are an educated head of household, household size, and infrastructure. After introducing an income shock (unemployment of the household head) and non-labor resources, the results show a positive relationship between education and urbanization. This yields two arguments. First, an income shock is likely to mobilize household members’ human potential to confront the household’s economic burden. Second, non-labor resources have a positive impact on the population by increasing non-market activities, as time allocation shifts from work to leisure.

Table 4 Average marginal effects for education attainment by IV approach: 2SRI/ordered logit model regression

In models 5 to 8 (girls), the results are significant, with larger marginal effects than in the full sample. Household income causes a sharp increase in secondary-level education attainment, of 10.8 percentage points. The results show a negative relationship between being married and girls’ education attainment, especially at the primary level. This may be because married persons, particularly women, are quite young and lack both educational awareness and sufficient resources. These results may indicate the need for household awareness programs to encourage women’s education and discourage early marriage. On the other hand, a significantly smaller household size is associated with higher primary-level education attainment.

Per capita income has a larger impact on boys’ education attainment than on girls’, indicating household preferences. Per capita income increases boys’ primary-, secondary-, and tertiary-level education attainment by 13.0, 17.9, and 4.9 percentage points, respectively. An educated head of household has a strong positive effect on boys’ education attainment, whereas the effect is the opposite for girls’ primary-level attainment. The results show that intermediate internet access is more effective for girls than for boys. Meanwhile, household size also has a positive effect on boys’ education attainment, as boys are potential sole breadwinners for their families. Living in an urban location offers boys better career prospects, consistent with its positive correlation with education attainment.

Table 5 shows the average marginal effects for current enrollment after addressing potential endogeneity. In the full sample, the effect of per capita income on current enrollment is four times larger than in the results reported in the “Determining Education Attainment and Current Enrollment Levels” section, at 4.4 percentage points for boys and girls. Being a girl reduces the probability of current enrollment by 0.3 percentage points. Parental education has a significant effect for boys, revealing a gender bias in investment in education. Similar results hold for educated household members and the occupations of those living in the household. The results also show a wider gap in current enrollment in Sindh and Balochistan, where girls are highly unlikely to enroll in any kind of educational institution.

Table 5 Average marginal effects for current enrollment by IV approach: 2SRI/logit model regression

Estimations of Education Attainment and Current Enrollment by Inequalities

Table 6 reports the average marginal effects for education attainment from the ordered logit model incorporating different educational inequalities: the Gini coefficient, average years of schooling, and standard deviation (panels A, B, and C). Only the results for the inequality measures are reported here; full results are available on request. In the girls’ sample, panel A shows that the Gini coefficient is highly significant and sharply decreases tertiary- and secondary-level education attainment, by 0.6 and 1.6 percentage points, respectively. In panel B, average years of schooling are positively related to secondary- and tertiary-level attainment. In panel C, the standard deviation decreases secondary- and tertiary-level attainment by 0.1 percentage points each. In the boys’ sample, panel A shows that the Gini coefficient decreases secondary- and tertiary-level attainment, with marginal effects slightly larger than those for girls. In panel B, average years of schooling improve boys’ secondary- and tertiary-level attainment equally, while no significant effect is found in panel C.

Table 6 Average marginal effects for education attainment with education inequalities: ordered logit model regression

Table 7 shows the relationship between current enrollment and educational inequalities. In panel A, educational inequalities affect both boys and girls, but the marginal effect of the Gini coefficient is larger for boys. In panel B, the marginal effect of average years of schooling on current enrollment is higher for girls by 0.6 percentage points, indicating that girls are almost 0.4 times more likely to enroll in school. The standard deviation has an insignificant impact on boys’ current enrollment but a significant one for girls: a unit increase in the standard deviation decreases girls’ probability of current enrollment by 0.2 percentage points.

Table 7 Average marginal effects for current enrollment with education inequalities: logit model regression

Explaining the Gender Gap and its Decomposition

Table 8 provides mean statistics and differences in the coefficients in relation to education attainment.

Table 8 Gender differences of selected variables: mean, coefficient, and interaction estimations

In panel A, most household characteristics favor girls, including personal attributes such as age and infrastructure, whereas per capita income, educated members, the head, and urbanization yield higher mean probabilities for boys’ education attainment. The last column shows the boy-girl difference, obtained by interacting the boy dummy variable with each explanatory variable as an additional regressor in the basic full-sample ordered logit model. These estimates favor girls’ education attainment with respect to the education level of their parents and the head of household, and household characteristics. Panel B provides mean statistics and coefficient differences for current enrollment. Personal attributes such as age, marital status, and infrastructure have higher mean probabilities for girls. The last column shows that an educated head, urbanization, and provinces favor boys.

Table 9 presents the gender differences in education attainment and current enrollment by predicted probabilities, using a variant of the Oaxaca decomposition under four scenarios: (i) girls with parameters estimated from the girls’ equation, (ii) girls with parameters from the boys’ equation, (iii) boys with parameters from the boys’ equation, and (iv) boys with parameters from the girls’ equation (Pal, 2004). Boys’ predicted probabilities are roughly halved when girls’ parameters are used. Conversely, girls’ probability of education attainment almost doubles when boys’ parameters are used. A similar increase is observed in girls’ current enrollment using boys’ parameters, while boys’ current enrollment probabilities are roughly halved using girls’ parameters. The differences are presented with boys as the reference. Finally, the explained and unexplained variations of the gender difference are estimated: the explained variation in education attainment and current enrollment is −142.8 and 41.4 %, respectively (Dong et al., 2009). The unexplained variation, generally regarded as discrimination, is larger in both models and highlights the different treatment of boys and girls in the household. However, this study interprets this variation as gender differences that may be due to unobservable factors and imperfectly observed attributes.

Table 9 Gender decomposition by predicted probabilities

Alternative Specification and Robustness Tests

In Tables 10 and 11, the estimates are presented for education attainment and current enrollment using other models such as ordered probit and probit models (McNabb et al., 2002), and other variables such as per capita expenditure and permanent income (non-labor assets) are included.

Table 10 Average marginal effects for education attainment: ordered probit model, per capita expenditure, and permanent income
Table 11 Average marginal effects for current enrollment: ordered probit model, per capita expenditure, and permanent income

In both models, the results are highly significant and provide additional support for the previous estimates. The variable girl is more likely to increase education attainment at the secondary level. The marginal effect of a unit increase in per capita income is slightly higher in the probit model. Per capita expenditure has a positive impact on girls’ education, particularly on secondary-level attainment. In the robustness test that incorporates the household’s permanent income, the variable gender is positively and significantly related to education attainment. A unit increase in permanent income raises primary- and secondary-level attainment more for boys, and it sharply increases boys’ current enrollment. Other robustness tests, including provincial heterogeneity, the control function approach, IV probit, 2SLS, and the determinants of education attainment and current enrollment for boys and girls in a different age group (13–24), are available on request.

Table 12 presents results for the alternative specification, where per capita income is the dependent variable and gender inequalities (in education attainment and current enrollment) are the variables of interest. This specification can also be interpreted in terms of the future earning potential of girls and boys. For education attainment, in panel A, the gender gap in illiteracy decreases income by approximately 11.3 % more for girls than for boys. In panels B and C, the gender difference decreases income by 3.2 and 1.2 % for girls. For current enrollment, in panels A, B, and C, each gender inequality measure reduces household income more among girls than among boys, by 7.1, 3.0, and 1.7 %, respectively.

Table 12 Relationship between gender differences in education and income: alternative specification/ordinary least squares regression

Conclusion and Policy Implications

Despite its human capital potential, Pakistan struggles with extreme poverty, socioeconomic disparity, and gender inequality at the grassroots level (Ali et al., 2021; Asif et al., 2019). Addressing these issues requires understanding the importance of distributing household resources for education equally regardless of gender, which builds a sustainable economic structure toward global equality (Kopnina, 2020). This study examines education achievement and the underlying gender differences using two models: education attainment and current enrollment. The findings highlight the relationship between education and income, along with other household characteristics. The study addresses potential endogeneity using the 2SRI approach and examines gender and educational inequalities at the micro level.

The findings demonstrate that household income has a significant positive impact on the education attainment and current enrollment of boys and girls. The transition in education attainment from the primary to the tertiary level proceeds progressively, which supports past studies (Duflo et al., 2021; Wu et al., 2020). However, the transition from primary to secondary education is larger than that from secondary to tertiary. Community and individual attributes favor education investment in boys, indicating household and socioeconomic preferences. Girls can improve their education when personal and household attributes are available (Yi et al., 2015). Other findings from the education attainment and current enrollment models point to a demographic framework that encourages a sustainable environment, with declines in household size and the dependency ratio (Heb, 2020; Asif et al., 2019; Fichera et al., 2015). These findings contradict those of past studies (Munshi, 2017) and establish a link between temporary residents (daughters) and the occupations of households, whereby lower-salaried households and deprived areas can significantly improve female education attainment and current enrollment.

The findings show a negative relationship between the Gini coefficient and education attainment, with a wider gap at the secondary and tertiary levels, supporting the results of the basic model. The standard deviation of educational inequality is higher for girls, which further confirms the existence of gender differences in education. Likewise, the alternative specifications show that gender inequalities reduce the potential economic returns to education. The findings support those of Pfeffer et al. (2018) regarding discouraging wealth accumulation in the form of physical capital and increasing investment in female education (Kopnina, 2020). Such investment could transform Pakistan’s developing society through public policies for women’s empowerment (United Nations Education, 2030), gender equality (Arshed et al., 2019), poverty alleviation (ur Rahman et al., 2018), and sustainable development (Sen, 2019). This study therefore offers the following recommendations for policymakers seeking to promote gender equality:

  • Implement cooperative projects by federal and local governments that supply free, digital, and up-to-date education in schools, colleges, and universities to improve transition levels, with a particular focus on areas with poor infrastructure, highly deprived regions, and mobility-restricted areas.

  • Adopt targeted policies to minimize education and gender gaps between those enrolled and not enrolled in education by supporting low-income households through the allocation of funds, scholarships, and incentives.

  • Reform educational strategies to provide cost-effective education in collaboration with parents, teachers, and schools, with the aim of creating advanced and scientific curricula aligned with the Sustainable Development Goals.

  • Craft awareness campaigns to eradicate gender-specific investment in education, encourage talented females to enter tertiary-level education in particular, and address socioeconomic challenges by establishing reliable and organized educational committees in each province.

Finally, some potential limitations should be noted, as they may open new avenues for future research. Future quantitative research could examine household characteristics beyond those considered in this study, as well as upcoming survey rounds.