1 Introduction

With the progress of information and communication technology (ICT) since the 1970s, Internet usage has expanded worldwide (OECD 2018). The gender digital gap in Internet access rose in developed countries in the early stages of ICT development (Bimber 2000; DiMaggio et al. 2001; Fatehkia et al. 2018) but narrowed with the increasing diffusion of digital technologies (Haight et al. 2014; Ono and Zavodny 2007; Rice and Katz 2003). Women in developing countries have a significantly lower likelihood of Internet access than men, and this gender disparity in Internet use can enlarge the overall socio-economic gender gap (Alozie and Akpan-Obong 2017; Broadband Commission 2013; Hafkin and Huyer 2007; OECD 2018). Although research has established that Internet use can affect wages (Krueger 1993; Liu et al. 2021; Miller and Mulvey 1997; Pabilonia and Zoghi 2005, etc.) and employment (Alam and Mamun 2017; Atasoy 2013; Dettling 2017; Mao and Zeng 2017, etc.), empirical studies on the impact of Internet use on the gender wage gap are scarce. This study attempts to bridge this gap by providing new evidence from China, a country that has seen rapid Internet diffusion and gender wage gap growth in the last two decades.

China’s gender disparity in Internet use can be illustrated with the Statistical Report on the Development of the Internet in China No. 45 (CNNIC 2020), which reveals that the number of Internet users in China reached 904 million in April 2020, of which 48.1% were women (30.4% in 2000).Footnote 1 These statistics suggest the existence of a gender disparity in Internet access in China.

Additionally, the gender wage gap in China has expanded since the 1980s (Gustafsson and Li 2000; Iwasaki and Ma 2020; Ma 2018). Several empirical studies reveal that the main determinants are gender differences in human capital and workplace discrimination against women (Gustafsson and Li 2000; Li and Yang 2010; Ma et al. 2013). Others indicate that occupational segregation and industry/ownership sector segmentation also contribute to the creation of the gender wage gap (Ge 2007; Li and Ma 2006; Liu et al. 2000; Ma 2018; Meng and Zhang 2001; Wang and Cai 2008; Wang 2005). However, empirical studies on the effects of Internet use on the gender wage gap are scarce (Qi and Liu 2020; Wu 2021; Zhuang et al. 2016) and limited to cross-sectional data analyses, which can result in endogeneity problems.

This study contributes to the literature in four ways. First, in contrast to earlier studies (Qi and Liu 2020; Wu 2021; Zhuang et al. 2016), we examine the influence of Internet use on the gender wage gap in China using panel data methods, namely the fixed effects (FE) model, the random effects (RE) model, and the lagged variable (LV) model, to address endogeneity problems. This study is the first to investigate the association between Internet use and the gender wage gap in China using panel data methods. Second, besides investigating the extensive margin effect of Internet use (whether the Internet is used), we also ascertain the intensive margin effects (the frequency of using the Internet) for different purposes (work and study, communication, leisure) on the gender wage gap, which have not been considered in previous studies. Third, our study is the first to decompose the determinants of the gender wage gap into two components, the gender disparity in Internet access (e.g., the gender disparity in the proportion of workers using the Internet) and the gender difference in return to Internet use (e.g., the gender difference in the wage increase associated with Internet use), based on the IV method. Since policy implications differ across these two components, it is meaningful to investigate how each contributes to the effect of Internet use on the gender wage gap. For example, if the gender disparity in Internet access is found to be the main contributor, a policy promoting Internet accessibility among women is expected to reduce the wage gap. In contrast, if the main contributor is the gender difference in the return to Internet use, reducing discrimination against women in the workplace may reduce the wage gap. Fourth, although Eastin et al. (2015), OECD (2018), Riggins and Dewan (2005), and Scheerder et al. (2017) have discussed divisions in Internet use across education, age, and gender groups, they have not investigated the effect of Internet use on the gender wage gap across education and age cohort groups. The gender differences in Internet access and return to Internet use may differ among education and age cohort groups; therefore, the effects of Internet use on the gender wage gap may also differ across these groups. This study compares the effects of Internet use among educational attainment and age cohort groups to provide more evidence for understanding digital division issues in depth. Additionally, we use the three latest waves of national longitudinal survey data (2014, 2016, and 2018), which can provide new information.

The study resulted in four main findings. First, when addressing endogeneity problems, the return to Internet use is higher for men than for women, unlike the results using the ordinary least squares (OLS) model, which indicate the opposite. In this case, the individual heterogeneity problem considerably affects the results, thus suggesting a bias in earlier studies. Second, the results indicate that the gender disparities in return to Internet use are higher in the low-education and older age cohort than in others. Third, both the components—the gender disparity in Internet access and the gender difference in return to Internet use—widen the gender wage gap, with the gender difference in return to Internet use having a higher impact. Fourth, the effects of both components on the gender wage gap vary with the educational attainment and age cohorts.

The remainder of this paper is organized as follows. Section 2 discusses the channels whereby Internet use influences wages and summarizes the empirical literature on the issue. Section 3 introduces the data and the methodology. Section 4 presents and discusses the empirical results, and Sect. 5 concludes the study.

2 Literature review

2.1 Channels of influence of Internet use on the gender wage gap

Several economic theories can explain the gender wage gap. First, gender differences in the endowment of factors such as human capital can contribute to a gender wage gap. Based on the human capital theory (Becker 1964; Mincer 1974), the individual wage level in a perfectly competitive labor market is determined by the workers’ labor productivity. Labor productivity is related to a worker’s human capital (e.g., education and years of work experience). When men have higher educational attainment than women, they may earn a higher wage.

Second, according to the employer discrimination hypothesis (Becker 1957), whenever employers, customers, and colleagues discriminate against women, it can create a gender wage gap. During the economic transition period (after 1978), with the enforcement of the Opening-up Policy, the private sector (i.e., the privately-owned enterprises, the Township and Village Enterprises,Footnote 2 etc.) has developed since the 1990s, and the proportion of workers in the private sector to total workers in urban areas has increased from 27.2% in 1985 to 67.4% in 2020 (NBS 2021).Footnote 3 In the planned economy period (1949–1977), women’s employment promotion policies, such as the equality employment policy, were enforced in the public sector (e.g., the state-owned enterprises: SOEs; the government organizations) by the Chinese government, which led to less discrimination against women (Gustafsson and Li 2000; Ma 2018, 2021a, b). With the progress of SOE reforms, most SOEs have changed to privately-owned enterprises. Although the influences of the equality employment policies are still greater in the SOEs even in the current period, when discrimination against women persists in the private sector, a gender wage gap can arise. Iwasaki and Ma (2020) report that the gender wage gap is higher in privately-owned enterprises than in SOEs, and the gender wage gap expanded during the economic transition period from the 1990s to the 2010s, suggesting that the discrimination against women has become severe with the progress of the market-oriented economic reform in China.

Third, the statistical discrimination hypothesis (Arrow 1973; Phelps 1972) suggests that because of information asymmetry, employers must make employment and wage decisions for employees, both men and women, based on the average values of some unobservable factors (e.g., work effort, probability of turnover). If the employer perceives that the probability of doing housework (e.g., childcare, geriatric care, domestic cleaning, cooking) is higher for women than for men, they may set a lower wage level for women, thus generating a gender wage gap.

In China, factors such as the deregulation of the one-child policy, lack of formal public childcare institutions (e.g., kindergarten), population aging, and insufficiency of institutional care for the elderly have increased the responsibilities of childcare and geriatric care for women (Connelly et al. 2018; Cook and Dong 2011; Ma 2021a, b). Family care has decreased the female labor force participation and reduced women’s work efforts in China (Chen and Fan 2016; Chen et al. 2016; Chen 2019; Connelly et al. 2018; Ma 2021a, b). Therefore, statistical discrimination may contribute to widening the gender wage gap in China.

Fourth, the crowding hypothesis (Bergmann 1974) states that gender occupational segregation persists in the labor market: women concentrate in female-dominated occupations (e.g., staff, service job), whereas men concentrate in male-dominated occupations (e.g., manager, technician). When the wage levels of male-dominated occupations are higher than those of female-dominated occupations, the gender wage gap arises. Evidence from empirical studies of developed countries supports the crowding hypothesis (Brown et al. 1980; Kidd and Shannon 1996; Kidd 1993; Miller 1987). Meng (1998) and Li and Ma (2006) also report that gender occupational segregation exists in China and contributes to the formation of the gender wage gap.

Finally, the monopsony power hypothesis suggests that imperfect competition may lead to a gender wage gap. When a firm has the monopsony power in the labor market and sets a lower wage level for women, the gender wage gap arises (Hirsch 2010). Vick (2017) reports that the monopsony power hypothesis is supported in Brazil.

Based on these theories and hypotheses, we consider that Internet use could contribute to the formation of the gender wage gap in China through two channels: (i) the explained component (e.g., gender disparity in Internet access), and (ii) the unexplained component, which includes workplace discrimination against women (e.g., gender difference in return to Internet use as reflected in the wages).

Regarding the first component–the effect of gender disparity in Internet access on the gender wage gap, (i) based on human capital theory, the wage level in a perfectly competitive labor market is determined by a worker’s labor productivity; the labor productivity is related to a worker’s human capital. Internet use can be considered an element of human capital: Internet users are usually considered to be those with higher skills (or higher productivity) than non-users. If the percentage of Internet users in the women’s group is lower than that in the men’s, the gender wage gap arises. (ii) According to the crowding hypothesis, men may occupy male-dominated occupations (e.g., manager, technician), which have a higher likelihood of using the Internet for work, which may lead to the gender divisions in Internet access.

To consider the second component–the effect of gender differences in return to Internet use on the gender wage gap, (i) according to the employer discrimination and statistical discrimination hypotheses, when workplace discrimination exists, wage levels tend to be set lower for women than men despite similar endowments (e.g., Internet use skills). (ii) Based on the crowding hypothesis, if Internet use is evaluated higher for male-dominated occupations than female-dominated occupations by a firm, the gender wage gap arises. (iii) According to the monopsony power hypothesis, if a firm has monopsony power and values Internet use more highly for men than women, the gender wage gap widens.

Although it is assumed that the two components may contribute to the formation of the gender wage gap, no empirical study has investigated the issue. In this study, we estimate the contribution rates of these two components in the subsequent decomposition analysis and provide new evidence for understanding the causes of the gender wage gap in China in depth.

2.2 Empirical studies on the gender wage gap in China

There are numerous empirical studies on the gender wage gap in developed countries and China (Iwasaki and Ma 2020).Footnote 4 We summarize only the main ones in China as follows.

Gustafsson and Li (2000), Liu et al. (2000), Li and Yang (2010), Li et al. (2014), and Ma et al. (2013) use the Blinder–Oaxaca model (Blinder 1973; Oaxaca 1973) for the decomposition analysis, thereby demonstrating that both the explained (e.g., gender differences in educational attainment) and unexplained (e.g., returns to education) components affect the gender wage gap. Most studies show that the contribution of the unexplained component is higher than that of the explained component, which suggests that workplace discrimination against women is the primary cause of the gender wage gap in China. The contribution rates of the unexplained component in the gender wage gap for the local urban resident group were reported to be 52.49% in 1988 and 63.20% in 1995 (Gustafsson and Li 2000); and 52.0% in 1995, 69.0% in 2002, and 77.7% in 2007 (Li et al. 2014). The rate for the migrant group was 74.32%–84.38% in 2008 (Li and Yang 2010), and that for all residents was 49.18% in 1996 (Meng and Zhang 2001). In addition, the values by wage percentiles were 86.08–101.80% in 2006 and 45.31–91.73% in 2009 (Ma et al. 2013). A few empirical studies also focus on the effect of segmentation by sector on the gender wage gap in China. Li and Ma (2006) and Wang (2005) analyze the influence of occupational segregation on the gender wage gap. Ge (2007), Ma (2018), and Wang and Cai (2008) explore the impact of segmentation by industry or enterprise ownership sector type on the gender wage gap in China and report that the unexplained component in intra-sector differentials drives the gender wage gap in China.

2.3 Empirical studies on Internet use and gender wage gap

Empirical studies on Internet use and the gender wage gap are scarce in developed countries and China. Borghans et al. (2014) and Ge and Zhou (2020) find that Internet skills and their return affect the changes in the gender wage gap in developed countries. Beaudry and Lewis (2014) hold that changes in return to computer skills are an important factor that explains the decline in the gender wage gap. Using industrial data, Ge and Zhou (2020) report that robot use reduces the gender wage gap, while an increase in computer capital raises the gender wage gap in the US. However, these studies are not based on the decomposition method. Therefore, the contribution rates of the gender disparity in Internet access and gender difference in return to Internet use in forming the gender wage gap are unclear.

Regarding China, only three studies focus on the issue. Zhuang et al. (2016) use data from the Third Survey on Chinese Women’s Social Status and apply the propensity score matching method to estimate the wage function. They find that the Internet use wage premium for women is 90.6% of that for men, suggesting that the return to Internet usage is lower for women than men. Using data from 2010, 2013, and 2015 Chinese General Social Surveys (CGSS) and the Blinder–Oaxaca decomposition method, Qi and Liu (2020) report that both Internet access and the return to Internet usage reduced the gender wage gap in 2013 and 2015. Wu (2021) uses data from the 2017 CGSS, the OLS model, and the Blinder–Oaxaca decomposition method to analyze the effect of Internet use on the gender wage gap and reveals that the contribution rate of gender disparity in Internet access is -4.34%, which reduces the gender wage gap.

Although these studies provide some evidence on the effects of Internet use on the gender wage gap in China, they have not addressed the individual heterogeneity problem owing to the use of cross-sectional analysis methods. Additionally, the intensive margin effects of Internet use (e.g., the frequency of Internet use) for different purposes (e.g., work and study, leisure) were not considered. Moreover, the differences in the effects of Internet use by educational attainment and age cohort were not considered. This study addresses these gaps.

3 Methodology

3.1 Model

To estimate the gender disparity in return to Internet use, we estimate the wage function. The OLS model is expressed by Eq. (1).

$$Ln{W}_{i}={a}_{0}+{\beta }_{1}{Int}_{i}+{\beta }_{2}{Female}_{i}+{\beta }_{3}{Female*Int}_{i}+\gamma \sum {X}_{i}+{v}_{i}$$
(1)

In Eq. (1), \(i\) represents the individual; \(Int\) is the Internet use variable; \(Female\) is the female dummy; \(Female*Int\) is the interaction term of Internet use and the female dummy variable; \(Ln{W}_{i}\) denotes the logarithmic value of wages; \(\sum X\) represents a set of factors (e.g., education, years of work experience, occupation) that affect wages; \(\beta\) and \(\gamma\) are the estimated coefficients; and \(v\) denotes the error term. \({\beta }_{1}\) is the return to Internet use for men, the sum of \({\beta }_{1}\) and \({\beta }_{3}\) is the return to Internet use for women, and \({\beta }_{3}\) is the gender difference in return to Internet use.
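The reading of the interaction coefficients in Eq. (1) can be checked on a toy case. The sketch below assumes no control variables, so that the model with \(Int\), \(Female\), and \(Female*Int\) is saturated and its OLS coefficients equal differences of the four cell means. The function name and the log wages are hypothetical and are not taken from the CFPS.

```python
from statistics import mean

def interaction_coefficients(rows):
    """rows: iterable of (log_wage, internet, female) with 0/1 dummies.

    Assumption: no other controls, so OLS on Int, Female and Female*Int
    reproduces the differences between the four cell means exactly.
    """
    cells = {}
    for log_wage, internet, female in rows:
        cells.setdefault((internet, female), []).append(log_wage)
    m = {key: mean(values) for key, values in cells.items()}
    a0 = m[(0, 0)]                             # male non-users
    beta1 = m[(1, 0)] - m[(0, 0)]              # return to Internet use, men
    beta2 = m[(0, 1)] - m[(0, 0)]              # female gap among non-users
    beta3 = (m[(1, 1)] - m[(0, 1)]) - beta1    # gender difference in return
    return a0, beta1, beta2, beta3

# Hypothetical log wages, for illustration only.
rows = [
    (2.0, 0, 0), (2.2, 0, 0),   # male non-users
    (2.6, 1, 0), (2.8, 1, 0),   # male users
    (1.8, 0, 1), (2.0, 0, 1),   # female non-users
    (2.2, 1, 1), (2.4, 1, 1),   # female users
]
a0, beta1, beta2, beta3 = interaction_coefficients(rows)
return_men = beta1             # return to Internet use for men
return_women = beta1 + beta3   # return to Internet use for women
```

In this toy data the return is larger for men than for women, so \({\beta }_{3}\) is negative; with controls added, the coefficients would have to be obtained by regression rather than from cell means.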

However, an endogeneity problem may exist in Eq. (1) for three reasons. First, omitted variables may influence both the likelihood of using the Internet and the wage level. We constructed variables to control for factors that may affect the wage level as much as possible. However, some unobservable variables may also affect the results. We used the IV method, expressed by Eqs. (2–4), to address this issue.

$$\mathrm{Pr}\left({Int}_{i}=1\right)={a}_{1}+{\beta }_{Z}{Z}_{i}+{\beta }_{1}{Int}_{i}+{\beta }_{2}{Female}_{i}+{\beta }_{3}{Female*Int}_{i}+\gamma \sum {X}_{i}+{u}_{i}$$
(2)
$$Ln{W}_{i}={a}_{0}+{\beta }_{1}{\widehat{Int}}_{i}+{\beta }_{2}{Female}_{i}+{\beta }_{3}{Female*Int}_{i}+\gamma \sum {X}_{i}+{\delta }_{i}$$
(3)
$$corr\left(Z,\delta \right)=0\ \mathrm{and}\ corr\left(Z,u\right)\ne 0$$
(4)

In Eqs. (2–4), \(u\) and \(\delta\) denote the error terms, and \({Z}\) represents the IV. Previous studies generally used the Internet penetration rate at the regional (provincial or local) level and attitudes toward the importance of using the Internet as IVs (e.g., Cao and Jiang 2020; Zhao and Li 2020). We performed several tests for these IVs, but the results rejected the hypothesis that they are exogenous, suggesting that they are not valid for this study. We therefore used two variables as IVs: (i) the provincial optical cable circuit in 1999 and (ii) the provincial long-distance cable line length in 1999. Both are the oldest data that we could obtain from the government’s official dataset.Footnote 5 It can be assumed that Internet installations in the recent survey years (2014, 2016, and 2018) are closely related to the regional telecommunication capability in the past (such as the optical cable circuits or the long-distance cable line length 15 to 19 years earlier), while the influence of past regional telecommunication capability on an individual’s recent income level is small, which may satisfy the conditions for an IV from the econometric perspective. We performed several tests, including the Durbin–Wu–Hausman test, the overidentification test of the exclusion restriction (Hansen’s J statistic), and the weak identification test (Cragg–Donald Wald F statistic). The results indicated that the two IVs are appropriate (see Tables 2, 3, 4 and 5 and discussions in Sect. 4).
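The logic of the IV approach can be illustrated with a deliberately simplified sketch. It assumes a single instrument, a single continuous regressor, and no control variables, in which case the IV estimate reduces to the ratio cov(Z, Y) / cov(Z, X); the models in this study use two instruments and a full set of controls. All names and numbers are hypothetical. The unobserved confounder `c` is built to be uncorrelated with the instrument `z`, and the true slope is 2.

```python
def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    return covariance(x, y) / covariance(x, x)

def iv_slope(z, x, y):
    """Simple IV (Wald-type) estimator with one instrument z for x."""
    return covariance(z, y) / covariance(z, x)

# Hypothetical data: c is an unobserved confounder, uncorrelated with z.
z = [0, 0, 1, 1]                                # instrument
c = [-1, 1, -1, 1]                              # omitted variable
x = [zi + ci for zi, ci in zip(z, c)]           # endogenous regressor
y = [2 * xi + 3 * ci for xi, ci in zip(x, c)]   # outcome, true slope = 2

ols_estimate = ols_slope(x, y)    # biased upward by the confounder
iv_estimate = iv_slope(z, x, y)   # recovers the true slope in this toy case
```

Because the confounder raises both `x` and `y`, the simple regression overstates the slope, while the instrument isolates only the variation in `x` that is unrelated to the confounder.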

Second, as \(v\) in Eq. (1) includes the errors related to individual-specific and time-invariant factors (\({\rho }_{i}\)) and the idiosyncratic error (\(\varepsilon\)), an individual heterogeneity problem may arise in Eq. (1). We use the FE or RE model to address this heterogeneity problem. We perform the Hausman specification test to judge the validity of the FE and RE models. In Eq. (5), \(t\) denotes the longitudinal survey year.

$$Ln{W}_{it}=\alpha +{\beta }_{1}{Int}_{it}+{\beta }_{2}{Female*Int}_{it}+\gamma \sum {X}_{it}+{\rho }_{i}+{\varepsilon }_{it}$$
(5)
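To show why the FE model removes the individual-specific term \({\rho }_{i}\), the following minimal sketch applies the within transformation: each variable is demeaned by person, which cancels any time-invariant component, and the slope is then estimated from the demeaned data. It assumes a single regressor, and the panel values are hypothetical (true slope 0.5, with a person effect that is correlated with the regressor).

```python
from collections import defaultdict

def pooled_slope(panel):
    """Simple regression slope ignoring the panel structure."""
    xs = [x for _, x, _ in panel]
    ys = [y for _, _, y in panel]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def within_slope(panel):
    """Fixed-effects (within) slope; panel is a list of (person_id, x, y)."""
    groups = defaultdict(list)
    for pid, x, y in panel:
        groups[pid].append((x, y))
    sxy = sxx = 0.0
    for obs in groups.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            # Demeaning by person removes the time-invariant effect rho_i.
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx

# Hypothetical panel: y = rho_i + 0.5 * x, with rho_A = 1 and rho_B = 3.
panel = [
    ("A", 0, 1.0), ("A", 0, 1.0), ("A", 1, 1.5),
    ("B", 0, 3.0), ("B", 1, 3.5), ("B", 1, 3.5),
]
fe_estimate = within_slope(panel)
pooled_estimate = pooled_slope(panel)
```

In this toy panel the pooled slope exceeds the within slope because person B has both a higher fixed effect and more frequent use; the within estimator is unaffected by that correlation.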

Third, the endogeneity issue may arise due to reverse causality. It is assumed that Internet use may affect wages (explored in this study). However, wages may also affect the likelihood of Internet use. For instance, the likelihood of using the Internet for work and study may be greater for high-wage workers than low-wage workers. Since there is a two-way relationship between Internet use and wage, we use a one-period lagged variable (LV) model to address the potential reverse causality. We assume that Internet use in period \(t-1\) may affect the wage level in period \(t\). However, wages in period \(t\) cannot influence the likelihood of using the Internet in period \(t-1\). In Eq. (6) below, \(t\) represents the recent period (e.g., 2018), \(t-1\) represents the prior period (e.g., 2016), and \({Int}_{t-1}\) expresses the Internet use in the prior period (\(t-1\)).

$$Ln{W}_{it}=\alpha +{\beta }_{1}{Int}_{it-1}+{\beta }_{2}{Female*Int}_{it-1}+\gamma \sum {X}_{i}+{\rho }_{i}+{\varepsilon }_{it}$$
(6)
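Constructing the lagged regressor for a model such as Eq. (6) amounts to pairing each person-wave with the same person's record from the previous wave. The sketch below assumes waves two years apart (as with 2014, 2016, and 2018) and drops observations that have no prior wave; the field names and records are hypothetical.

```python
def add_lagged_internet(records, gap=2):
    """Attach Internet use from the previous wave to each record.

    records: list of dicts with keys 'pid', 'year', 'internet', 'log_wage'.
    gap: assumed spacing between survey waves, in years.
    Records without an observation in the previous wave are dropped.
    """
    by_key = {(r["pid"], r["year"]): r for r in records}
    lagged = []
    for r in records:
        previous = by_key.get((r["pid"], r["year"] - gap))
        if previous is not None:
            lagged.append({**r, "internet_lag": previous["internet"]})
    return lagged

# Hypothetical records, for illustration only.
records = [
    {"pid": 1, "year": 2014, "internet": 0, "log_wage": 2.0},
    {"pid": 1, "year": 2016, "internet": 1, "log_wage": 2.1},
    {"pid": 1, "year": 2018, "internet": 1, "log_wage": 2.4},
    {"pid": 2, "year": 2016, "internet": 0, "log_wage": 1.9},
    {"pid": 2, "year": 2018, "internet": 1, "log_wage": 2.0},
]
lagged_sample = add_lagged_internet(records)
```

The first observed wave of each person is lost, so the lagged sample is smaller than the original panel.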

Then, to investigate the effect of Internet use on the gender wage gap, we use the Blinder–Oaxaca decomposition method (Blinder 1973; Oaxaca 1973) to decompose the determinants into two components: (i) the explained component, comprising the gender endowment differences (e.g., the gender disparity in Internet access), and (ii) the unexplained component, composed of the gender differences in the evaluated price of each factor (e.g., the gender difference in the return to Internet use). Equations (7) and (8) describe the model:Footnote 6

$$\overline{LnW_{m}} - \overline{LnW_{f}} = \beta_{\mathrm{m}}\left(\sum \bar{H}_{m} - \sum \bar{H}_{f}\right) + \left(\beta_{\mathrm{m}} - \beta_{\mathrm{f}}\right)\sum \bar{H}_{f}$$
(7)
$$\overline{LnW_{m}} - \overline{LnW_{f}} = \beta_{\mathrm{f}}\left(\sum \bar{H}_{m} - \sum \bar{H}_{f}\right) + \left(\beta_{\mathrm{m}} - \beta_{\mathrm{f}}\right)\sum \bar{H}_{m}$$
(8)

In Eqs. (7) and (8), \(\overline{LnW_{m}} - \overline{LnW_{f}}\) is the gender wage gap; \(\beta_{\mathrm{m}}\) and \(\beta_{\mathrm{f}}\) are the coefficients of each factor calculated based on the male and female wage functions, respectively; and \(\bar{H}\) denotes the mean value of each factor, including Internet use. \(\beta_{\mathrm{m}}(\sum \bar{H}_{m} - \sum \bar{H}_{f})\) in Eq. (7) or \(\beta_{\mathrm{f}}(\sum \bar{H}_{m} - \sum \bar{H}_{f})\) in Eq. (8) is the explained component, and \((\beta_{\mathrm{m}} - \beta_{\mathrm{f}})\sum \bar{H}_{f}\) in Eq. (7) or \((\beta_{\mathrm{m}} - \beta_{\mathrm{f}})\sum \bar{H}_{m}\) in Eq. (8) is the unexplained component, which includes discrimination against women in the workplace. The two equations differ only in whether the male or the female coefficients are used to value the endowment difference; in both, the two components sum to the male-minus-female gap.
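The decomposition can be traced on a small numeric case. The sketch below computes the twofold Blinder–Oaxaca split of the male-minus-female gap under both weighting choices: with the male coefficients as the reference, the endowment difference is valued at \(\beta_{\mathrm{m}}\) and the coefficient difference at female means; with the female coefficients as the reference, the endowment difference is valued at \(\beta_{\mathrm{f}}\) and the coefficient difference at male means. The coefficients and means are hypothetical (constant, Internet use, years of schooling) and are not estimates from this study.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def oaxaca_twofold(beta_m, beta_f, mean_m, mean_f, reference="male"):
    """Return (gap, explained, unexplained) for the male-minus-female gap."""
    gap = dot(beta_m, mean_m) - dot(beta_f, mean_f)
    diff_means = [m - f for m, f in zip(mean_m, mean_f)]
    diff_betas = [m - f for m, f in zip(beta_m, beta_f)]
    if reference == "male":
        explained = dot(beta_m, diff_means)     # endowments at male prices
        unexplained = dot(diff_betas, mean_f)   # price gap at female means
    else:
        explained = dot(beta_f, diff_means)     # endowments at female prices
        unexplained = dot(diff_betas, mean_m)   # price gap at male means
    return gap, explained, unexplained

# Hypothetical values: [constant, Internet use, years of schooling].
beta_m = [1.0, 0.30, 0.05]
beta_f = [0.9, 0.20, 0.06]
mean_m = [1.0, 0.60, 10.0]
mean_f = [1.0, 0.55, 9.0]

gap, expl_m, unexpl_m = oaxaca_twofold(beta_m, beta_f, mean_m, mean_f, "male")
_, expl_f, unexpl_f = oaxaca_twofold(beta_m, beta_f, mean_m, mean_f, "female")
```

Both weighting choices reproduce the same total gap but split it differently, which is why results are commonly reported under both references.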

To compare the effects of Internet use by group, we also estimate the models separately for each educational attainment and age cohort group.

3.2 Data

Three waves of data (2014, 2016, and 2018) from the China Family Panel Studies (CFPS 2020) dataset are used in this study. We use the CFPS for three reasons. First, the CFPS is a nationally representative longitudinal survey of Chinese communities, families, and individuals launched in 2010 by the Institute of Social Science Survey, Peking University, China. Previous studies (e.g., Qi and Liu 2020) used Chinese General Social Survey (CGSS) data, but because the CGSS is a cross-sectional survey, they could not address the individual heterogeneity problem. In contrast, the CFPS allows us to apply panel data methods (e.g., the FE, RE, and LV models) to deal with the endogeneity problems and thus to provide more robust results on the issue. Second, although the China Health and Retirement Longitudinal Survey (CHARLS) is a longitudinal survey with information on Internet use, it targets individuals aged 45 and older, whereas the CFPS covers all age groups. Therefore, we can use the CFPS data to compare the differences in the effect of Internet use among the younger, middle-aged, and older generations. Third, the CFPS provides rich information on Internet use, such as whether the respondent used the Internet and the frequency of using the Internet for different purposes (e.g., work and study, communication, and leisure); the latter is a question item unique to the CFPS, and this study is the first to use it to examine the issue.

The CFPS is designed for individual-, family-, and community-level longitudinal data collection in contemporary China and provides information on Internet use, wages, and other factors (education, years of work experience, sex, occupation, industry sector, etc.). The 2010 CFPS baseline survey data was obtained through multi-stage probability sampling with implicit stratification. Multi-stage sampling reduces the operational cost of the survey and permits the analysis of the social context. In the 2010 baseline survey, the CFPS successfully interviewed nearly 15,000 families and 30,000 individuals within these families, with an approximate response rate of 79%. The respondents were tracked through annual follow-up surveys. The CFPS 2010 covers 25 provinces and municipalities. Only the latest three waves (2014, 2016, and 2018) of the CFPS, which include the survey item on Internet use, have been used in this study. The CFPS sample sizes are 37,147 (2014), 36,892 (2016), and 37,354 (2018).

The logarithmic value of the hourly wage is used as the dependent variable. The wages for 2014, 2016, and 2018 have been adjusted using the regional Consumer Price Index (CPI) in the rural and urban areas published by China’s National Bureau of Statistics (NBS 1999) to account for inflation, using the 2014 CPI as the baseline.

We calculate the hourly wage as a dependent variable based on the annual earned income and corresponding working hours.
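As a rough illustration of how such a dependent variable can be built, the sketch below deflates annual earned income to base-year prices, divides by annual working hours, and takes the logarithm. The function name, the CPI figure, and the way annual hours are measured are assumptions for illustration; they are not the exact CFPS variable definitions.

```python
import math

def log_real_hourly_wage(annual_income, annual_hours, cpi, base_cpi=100.0):
    """Log of the CPI-deflated hourly wage.

    cpi is the price index of the survey year relative to base_cpi
    (assumed here: base year = 100). Returns None when income or hours
    are not positive, since the logarithm is then undefined.
    """
    if annual_income <= 0 or annual_hours <= 0:
        return None
    real_income = annual_income * base_cpi / cpi   # base-year prices
    return math.log(real_income / annual_hours)

# Hypothetical worker: 41,200 yuan earned over 2,000 hours, CPI = 103.
example = log_real_hourly_wage(41200, 2000, cpi=103.0)
missing = log_real_hourly_wage(0, 2000, cpi=103.0)
```

Here the real income is 40,000 yuan in base-year prices, so the real hourly wage is 20 yuan before taking logs.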

The key independent variable is an Internet use dummy variable based on the questionnaire item: “Did you use the Internet in the past year?” (1 = has used the Internet in the past year, 0 = otherwise); we primarily use this variable to estimate the extensive margin effect of Internet use, as in the existing literature. Based on the questionnaire items on the frequency of Internet use by purpose (work, study, communication, shopping, entertainment), we constructed three new indicators to investigate the intensive margin effect of Internet use in this study: (a) frequency of using the Internet for work, including work and study, (b) frequency of using the Internet for communication, and (c) frequency of using the Internet for leisure, including shopping and entertainment. The five CFPS question items ask respondents to “please answer the frequency of using the Internet for study, work, communication, entertainment, commercial activity (e.g., Internet payment, shopping)”, with the options (i) almost every day; (ii) 3–4 times a week; (iii) 1–2 times a week; (iv) 2–3 times a month; (v) once a month; (vi) once every few months; (vii) never use. We re-coded each frequency as 7 = almost every day; 6 = 3–4 times a week; 5 = 1–2 times a week; 4 = 2–3 times a month; 3 = once a month; 2 = once every few months; 1 = never use. For (a) and (c), each of which combines two items, we summed the two scores and used their arithmetic mean in the analysis.
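The construction of the three intensity indicators can be sketched as follows. The mapping assumes a descending 7-to-1 score over the seven response categories, and the purpose keys (`work`, `study`, `communication`, `entertainment`, `commerce`) and label strings are hypothetical stand-ins for the CFPS variable names and codes.

```python
# Assumed scoring: higher value = more frequent use.
FREQUENCY_SCORE = {
    "almost every day": 7,
    "3-4 times a week": 6,
    "1-2 times a week": 5,
    "2-3 times a month": 4,
    "once a month": 3,
    "once every few months": 2,
    "never": 1,
}

def intensity_indices(answers):
    """Build the three intensity indicators from five frequency answers.

    answers: dict mapping each purpose to one of the frequency labels.
    Indicators (a) and (c) average two items; (b) uses a single item.
    """
    score = {purpose: FREQUENCY_SCORE[label] for purpose, label in answers.items()}
    return {
        "work_study": (score["work"] + score["study"]) / 2,
        "communication": float(score["communication"]),
        "leisure": (score["entertainment"] + score["commerce"]) / 2,
    }

# Hypothetical respondent, for illustration only.
respondent = {
    "work": "almost every day",
    "study": "once a month",
    "communication": "3-4 times a week",
    "entertainment": "1-2 times a week",
    "commerce": "never",
}
indices = intensity_indices(respondent)
```

Averaging keeps indicators (a) and (c) on the same 1-to-7 scale as indicator (b), so their coefficients are comparable in magnitude.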

We constructed an interaction term of Internet use and a female dummy variable to investigate the gender difference in return to Internet use in wage functions.

Based on economic theories and existing studies, we identified a set of variables that may affect wages as specified by the wage functions: years of schooling, years of work experience and its squared term, ethnicity (1 = Han, 0 = minority), party membership (1 = member of the Communist Party of China, 0 = non-member), urban residence (1 = urban resident, 0 = rural resident), occupation (1 = manager and technician, 0 = otherwise), industry sector (1 = manufacturing industry sector, 0 = otherwise), workplace ownership (1 = state-owned sector, 0 = otherwise), region (west, central, east), and survey year.

As described in Sect. 3.1, we used two variables: the provincial optical cable circuit and the provincial long-distance cable line length in 1999, as IVs in this study.

This analysis is limited to respondents aged 16–60 years and excludes missing values; the longitudinal sample size is 18,381.Footnote 7

4 Empirical results and discussion

4.1 Descriptive statistics

Figure 1 shows the kernel density of the logarithmic value of wages for the Internet-using and non-using groups. The wage level is higher for men than for women in both groups, and the gender wage gap is larger in the Internet-using group than in the non-using group. The results indicate that a gender wage gap exists and that it differs between the Internet-using and non-using groups.

Fig. 1 Kernel density of the logarithm of wages for Internet-using and non-using groups. Source: Authors’ calculations based on the data from CFPS of 2014, 2016, and 2018. M: male workers; F: female workers

The proportions of individuals using the Internet by gender are summarized in Table 1. In general, an increase is observed from 2014 to 2018 for both men and women. However, the percentage is approximately 5% higher for men than for women in each of the three years, suggesting the existence of gender disparity in Internet access.

Table 1 The proportion of using the Internet by sex

In terms of the disparity by educational attainment group,Footnote 8 the percentage of individuals using the Internet is higher for men than for women in the low- and high-education groups, whereas the opposite holds in the middle-education group. Regarding the disparity by age cohort, the percentage of individuals using the Internet is higher for men than for women in each age cohort. The gender disparity in Internet access is largest in the middle-aged generation born during 1970–1989 and smallest in the younger generation born after 1990.Footnote 9 The results indicate that the gender disparities in Internet access differ by educational background and age cohort. Therefore, heterogeneity across these groups should be considered in the analysis.

4.2 Basic results

Table 2 presents the results of the wage function analysis using five models: the OLS (Model 1), IV (Model 2), LV (Model 3), RE (Model 4), and FE (Model 5).Footnote 10 Regarding the appropriateness of the IVs, the endogeneity test result (Durbin–Wu–Hausman test) is statistically significant at the 1% level; therefore, the null hypothesis that all the variables are exogenous is rejected. The Hansen J statistic is not significant, suggesting that the IVs are exogenous in the second-stage estimation. Furthermore, the Cragg–Donald Wald F statistic is 26.393, which is larger than 10, suggesting that the weak identification problem can be neglected. Therefore, the IVs can be considered valid. The results of the F test and the Breusch–Pagan Lagrange multiplier test indicate that both the FE and RE models are more appropriate than the OLS model. The Hausman test results (2308.67, p = 0.000) suggest that the FE model is more appropriate than the RE model. The main findings are summarized as follows.

Table 2 The gender differences in return to Internet use

First, the coefficients of Internet use are significantly positive in Models 1–5, suggesting that Internet use may increase wage levels even after individual heterogeneity and other endogeneity problems are addressed. This result can be explained by human capital theory: Internet use skill is a kind of human capital that can potentially increase labor productivity. Additionally, Internet use may have a signaling effect, whereby an employer may judge that an Internet-using worker possesses higher skills than a non-Internet-using worker.

Second, in terms of the gender differences in return to Internet use, the coefficients of the interaction term of Internet use and the female dummy variable in Model 1 (OLS), Model 3 (LV), and Model 4 (RE) indicate that the return to Internet use is significantly greater for women than for men. However, once the endogeneity problem is addressed, the wage premium is significantly greater for men than for women in Model 2 (IV). Additionally, although the interaction coefficient is not significant in Model 5 (FE), it is negative. These results suggest that the OLS estimates are biased. When appropriate models (e.g., the IV or FE model) are used to address the endogeneity problem, the return to Internet use is higher for men than for women.
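The contrast between the pooled OLS and FE estimates can be illustrated with a minimal sketch. It assumes synthetic panel data (not the CFPS sample), a hypothetical unobserved individual effect `alpha` that raises both wages and the likelihood of Internet use, and illustrative true coefficients of 0.2 for Internet use and −0.1 for its interaction with the female dummy. The sketch only shows how the within transformation removes a time-invariant individual effect that biases pooled OLS; it does not reproduce the estimates or the sign reversal reported in Table 2.

```python
import numpy as np

def within_transform(a, ids):
    """Subtract each person's own mean (FE 'within' transformation).

    ids must be integer person identifiers coded 0..N-1.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        means = np.bincount(ids, weights=a) / np.bincount(ids)
        return a - means[ids]
    return np.column_stack([within_transform(a[:, j], ids) for j in range(a.shape[1])])

# Synthetic three-wave panel; all values are hypothetical.
rng = np.random.default_rng(1)
n_people, n_waves = 4000, 3
ids = np.repeat(np.arange(n_people), n_waves)
female = np.repeat(rng.integers(0, 2, size=n_people), n_waves).astype(float)
alpha = np.repeat(rng.normal(size=n_people), n_waves)      # unobserved individual effect
internet = (alpha + rng.normal(size=ids.size) > 0).astype(float)
log_wage = (0.2 * internet - 0.1 * internet * female
            + alpha + 0.5 * rng.normal(size=ids.size))

# Pooled OLS: constant, female dummy, Internet use, interaction term.
X_ols = np.column_stack([np.ones(ids.size), female, internet, internet * female])
beta_ols, *_ = np.linalg.lstsq(X_ols, log_wage, rcond=None)

# FE: the time-invariant female dummy drops out after demeaning,
# but the interaction term remains identified because Internet use varies over time.
X_fe = within_transform(np.column_stack([internet, internet * female]), ids)
y_fe = within_transform(log_wage, ids)
beta_fe, *_ = np.linalg.lstsq(X_fe, y_fe, rcond=None)

print(f"Pooled OLS return to Internet use: {beta_ols[2]:.3f}")
print(f"FE return to Internet use:         {beta_fe[0]:.3f}")
print(f"FE interaction (Internet x female): {beta_fe[1]:.3f}")
```

In this design the pooled OLS coefficient of Internet use is pushed upward by `alpha`, whereas the FE estimates stay close to the values used to generate the data.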

4.3 Estimations on the intensive margin effect of Internet use

Table 2 analyzes the extensive margin effects of Internet use on the gender wage gap using a binary variable (whether the individual used the Internet). However, the intensive margin effect of Internet use (e.g., the frequency of using the Internet) on the wage gap should also be considered. Additionally, the effects of Internet use on the wage gap may differ by the purpose of Internet use. For example, a group that frequently uses the Internet for work or study is expected to obtain higher earned income than a group that frequently uses it for leisure (e.g., entertainment, shopping). To investigate the intensive margin effects of Internet use, we replace the binary variable of Internet use in Table 2 with three indices and re-estimate the models: (i) the frequency of using the Internet for work and study, (ii) the frequency of using the Internet for communication, and (iii) the frequency of using the Internet for leisure, including shopping and entertainment. The results are presented in Table 3.

Table 3 Frequency of Internet use for different purposes

The results from the Hausman specification test, F-test, and the Breusch–Pagan Lagrange multiplier test indicate that the FE model is more appropriate than the OLS and RE models. Regarding the validity of the IV method, the results in the first stage estimations indicate that both IVs significantly affect the likelihood of using the Internet at the 1% level; the results from the Durbin–Wu–Hausman test, Hansen J statistic, and Cragg-Donald Wald F statistic values indicate that the IV method is valid. Therefore, we report the results using the FE and IV models in Table 3.Footnote 11
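The logic of the first-stage and weak-identification checks can be sketched as follows. The example is a minimal two-stage least squares (2SLS) illustration on synthetic data, not the estimation used in this study: the two instruments, the unobserved `ability` term, the true coefficient of 0.3, and the continuous Internet use variable are all assumptions made for illustration. The sketch reports the ordinary first-stage F statistic for the excluded instruments, which is a simpler check than the Cragg-Donald Wald F statistic reported in the tables.

```python
import numpy as np

def two_stage_least_squares(y, X_exog, x_endog, Z):
    """2SLS with one endogenous regressor.

    X_exog must include a constant column; Z holds the excluded instruments.
    Returns (beta, first_stage_F), with the endogenous coefficient last in beta.
    """
    n = len(y)
    W = np.column_stack([X_exog, Z])                 # full instrument set
    # First stage: project the endogenous regressor on all instruments.
    pi_hat, *_ = np.linalg.lstsq(W, x_endog, rcond=None)
    x_hat = W @ pi_hat
    # F test that the excluded instruments are jointly zero in the first stage.
    rss_unrestricted = np.sum((x_endog - x_hat) ** 2)
    gamma, *_ = np.linalg.lstsq(X_exog, x_endog, rcond=None)
    rss_restricted = np.sum((x_endog - X_exog @ gamma) ** 2)
    q = Z.shape[1]
    f_stat = ((rss_restricted - rss_unrestricted) / q) / (
        rss_unrestricted / (n - W.shape[1]))
    # Second stage: replace the endogenous regressor with its fitted value.
    X2 = np.column_stack([X_exog, x_hat])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta, f_stat

# Synthetic data; all values are hypothetical.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=(n, 2))                          # two hypothetical instruments
ability = rng.normal(size=n)                         # unobserved confounder
internet = 0.8 * z[:, 0] + 0.5 * z[:, 1] + ability + rng.normal(size=n)
log_wage = 1.0 + 0.3 * internet + 2.0 * ability + rng.normal(size=n)

const = np.ones((n, 1))
beta_ols, *_ = np.linalg.lstsq(np.column_stack([const, internet]), log_wage, rcond=None)
beta_iv, first_stage_f = two_stage_least_squares(log_wage, const, internet, z)

print(f"OLS estimate:  {beta_ols[1]:.3f}")
print(f"2SLS estimate: {beta_iv[1]:.3f}")
print(f"First-stage F: {first_stage_f:.1f}")
```

A first-stage F statistic well above 10 corresponds to the rule of thumb applied above; the point estimates from this manual second stage are the 2SLS estimates, but its naive standard errors would not be valid and are therefore not computed here.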

First, the results from the IV method indicate that the coefficients of Internet use are positive and significant at the 1% level in Models 1–3. In the FE model, the coefficient is also positive and significant at the 10% level in Model 2. These findings are consistent with the results in Table 2.

Second, the results from the IV method show that the coefficients of the interaction term of Internet use and the female dummy variable are negative and significant at the 1% level in Models 1–3, but they are not significant in the FE model. These results are consistent with those in Table 2.

4.4 Estimations considering heterogeneous group

The results for the heterogeneous groups based on educational attainment and age cohort are summarized in Tables 4, 5.Footnote 12 The results from the Hausman specification test, F-test, and the Breusch–Pagan Lagrange multiplier test indicate that the FE model is more appropriate than the OLS and RE models. Regarding the validity of the IV method, the results in the first stage estimations indicate that both IVs significantly affect the likelihood of using the Internet at the 1% level; the results from the Durbin–Wu–Hausman test, Hansen J statistic, and Cragg-Donald Wald F statistic values indicate that the IV method is valid in most cases. Therefore, we report the results using the FE and IV models in Tables 4, 5.

Table 4 The gender differences in return to Internet use by educational group
Table 5 The gender disparity in return to Internet use by age cohort

Table 4 presents the results for the low-, middle-, and high-education groups. To secure sufficient sample sizes for the analysis and to reflect the distribution of workers by educational attainment level,Footnote 13 we divided the sample into three groups: (i) the low-education group (elementary school and lower); (ii) the middle-education group (junior and senior high school); and (iii) the high-education group (college and higher). The main findings are as follows.

First, in terms of the return to Internet use by educational group, the results from the IV method show that the coefficients of Internet use are positive and significant at the 1% level in the middle- and high-education groups, whereas the coefficient is not significant for the low-education group. The results from the FE model reveal that the Internet use coefficient is significant only for the middle-education group, at the 5% level. These results suggest that the return to Internet use is more pronounced for the middle- and high-education groups. Additionally, the individual heterogeneity problem considerably affects the results.

Second, in terms of the gender difference in return to Internet use by educational group, the results from the IV method show that the coefficient of the interaction of Internet use and the female dummy is negative and significant at the 5% level for the middle- and high-education groups, whereas it is insignificant for the low-education group. This indicates that, when other factors are held constant, the return to Internet use is lower for female workers than for male workers, and that the gender difference in return to Internet use is larger in the middle- and high-education groups than in the low-education group. However, the results from the FE model are not significant in any of the three educational groups.

Third, a comparison of the magnitudes of the coefficients from the IV method shows that the gender difference in return to Internet use is larger in the high-education group (−18.215) than in the middle-education group (−6.856). Possible explanations are that workplace discrimination against highly educated women, which may stem from the glass-ceiling problem, or gender occupational segregation is more serious in the high-education group than in the low- and middle-education groups.

Table 5 summarizes the results for three age cohorts: the younger (born after 1990), middle-aged (born in 1970–1989), and older (born before 1969) generations. First, the results from the IV method show that the coefficients of Internet use are positive and significant at the 1% level in both the middle-aged and older cohorts, whereas the coefficient is not significant for the younger cohort. The FE model results reveal that the coefficients of Internet use are significant for the younger and middle-aged cohorts at the 5% and 10% levels, whereas the coefficient is not significant for the older cohort. The results suggest that, in general, Internet use has a positive effect on wages in each age cohort, and that the influence of individual heterogeneity on the return to Internet use is greater for the older cohort.

Second, in terms of the gender difference in return to Internet use by age cohort, the results from the IV method show that the interaction coefficients of Internet use and the female dummy are negative and significant at the 1% or 5% level for both the middle-aged and older cohorts, but not significant for the younger cohort. The results from the FE model are not significant in any of the three age cohorts.

Third, based on the magnitude of the coefficients from the IV method, the gender difference in return to Internet use is larger in the older cohort than in the other age cohorts. This may be because discrimination against female workers is more serious among the older generation than among the younger and middle-aged generations.

4.5 Decomposition results

The results in Table 1 indicate that Internet access differs by gender, and those in Tables 2, 3 suggest that the return to Internet use is different for men and women. However, how the two components affect the formation of the gender wage gap is unclear. Therefore, we conduct a decomposition analysis to calculate the contribution rates of these two components (Table 6). We also perform the decomposition analyses based on the educational attainment and age cohort (Tables 7, 8).

Table 6 Decomposition results of Internet use and the gender wage gap
Table 7 Decomposition results of Internet use and the gender wage gap by educational group
Table 8 Decomposition results of Internet use and the gender wage gap by age cohort

Table 6 reports the decomposition results for the total sample. First, the explained and unexplained components contribute to the formation of the gender wage gap. The influence is slightly less for the former (46.3%) than the latter (53.7%).

Second, in terms of the effects of Internet use, both the gender disparity in Internet access and the gender difference in the return to Internet use contribute to widening the gender wage gap, and the contribution rates of both are higher than those of the other factors (e.g., education, occupation). Additionally, the contribution rate is larger for the return to Internet use (72.4%) than for Internet access (50.1%). The results indicate that although both components drive the gender wage gap in China, the effect of the gender difference in return to Internet use is greater, suggesting that workplace discrimination against women in terms of Internet use is a serious issue in China.
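The split of the wage gap into an explained (access) component and an unexplained (return) component can be illustrated with a minimal two-fold Oaxaca-Blinder sketch. The example makes several assumptions that are not taken from this study: the data are synthetic, the wage equations are estimated by OLS rather than the IV method used for Table 6, male coefficients serve as the reference structure, and Internet use is the only covariate. Contribution rates for individual variables can be positive or negative and need not each stay below 100%, because other factors may offset them.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def oaxaca_blinder(X_m, y_m, X_f, y_f):
    """Two-fold decomposition with male coefficients as the reference.

    Returns per-variable (explained, unexplained) contributions to the
    mean gap y_m.mean() - y_f.mean(). The first column is the constant.
    """
    b_m, b_f = ols(X_m, y_m), ols(X_f, y_f)
    xbar_m, xbar_f = X_m.mean(axis=0), X_f.mean(axis=0)
    explained = (xbar_m - xbar_f) * b_m      # differences in endowments (access)
    unexplained = xbar_f * (b_m - b_f)       # differences in coefficients (returns)
    return explained, unexplained

# Synthetic data; all values are hypothetical.
rng = np.random.default_rng(2)
n = 3000
internet_m = (rng.random(n) < 0.60).astype(float)   # higher access for men
internet_f = (rng.random(n) < 0.50).astype(float)
y_m = 2.0 + 0.30 * internet_m + 0.4 * rng.normal(size=n)   # higher return for men
y_f = 1.9 + 0.20 * internet_f + 0.4 * rng.normal(size=n)
X_m = np.column_stack([np.ones(n), internet_m])
X_f = np.column_stack([np.ones(n), internet_f])

explained, unexplained = oaxaca_blinder(X_m, y_m, X_f, y_f)
gap = y_m.mean() - y_f.mean()
access_share = explained[1] / gap        # contribution rate of Internet access
return_share = unexplained[1] / gap      # contribution rate of return to Internet use

print(f"Gap: {gap:.3f}")
print(f"Access share: {access_share:.1%}, return share: {return_share:.1%}")
```

Because each wage equation includes a constant, the explained and unexplained contributions sum exactly to the mean gap, which provides a simple consistency check on any implementation.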

Table 7 shows the decomposition results by educational attainment group. The results indicate that the effects of gender differences in both Internet access and the return to Internet use differ across educational groups.

First, the gender disparity in Internet access reduces the wage gap in all three educational groups; its effect is greater in the high-education group than in the low- and middle-education groups.

Second, the gender difference in return to Internet use widens the gender wage gap in the high-education group, whereas it reduces the wage gap in the low- and middle-education groups.

Hence, these results suggest that, in terms of Internet use, workplace discrimination against female workers is greater in the high-education group than in the other groups, which widens the gender wage gap in China.

Table 8 presents the results of the decomposition analysis for three age cohorts: (i) the older cohort (born before 1969), (ii) the middle-aged cohort (born between 1970 and 1989), and (iii) the younger cohort (born after 1990).

First, the gender disparity in Internet access widens the gender wage gap in all three age cohorts; the effects are greater for the middle-aged (54.6%) and older (47.3%) cohorts than for the younger cohort (16.3%).

Second, the gender difference in return to Internet use reduces the gender wage gap in all three cohorts; the effect is greater for the younger cohort.

The results of Table 8 suggest that the gender disparity in Internet access in the middle-aged and older cohorts and the workplace discrimination against younger women in terms of Internet use widen the gender wage gap in China.

These results differ from those of Qi and Liu (2020), who report that both components reduce the gender wage gap in the younger, middle-aged, and older groups. There are two possible reasons. First, the method of analysis differs: this study conducts the decomposition analysis based on the IV method, whereas Qi and Liu (2020) use the OLS model, so an endogeneity problem such as unobservable omitted variables might affect the earlier results. Second, the period of analysis in this study is 2014–2018, whereas it is 2010–2015 in Qi and Liu (2020). As Internet diffusion and Internet technology progress, the effects of both components may change over time. The results in this study suggest that the gender divide in Internet access may have become more severe in the middle-aged and older cohorts in the recent period (2014–2018).

4.6 Further discussions on the limitations of this study

It should be noted that this study has several limitations. First, although we used the IV, LV, RE, and FE models to attempt to address the endogeneity problem, future research should also explore the causal association between Internet use and the gender wage gap.

Second, the gender gap in the return to Internet usage may also result from the gender disparity in Internet-using abilities (or skills). Ge and Zhou (2020) report that the firm attributes (e.g., capital, trade exposure) and firm technology level (e.g., robot or computer use situations) affect the gender wage gap. Future studies can conduct a detailed survey on individuals’ Internet use skills and workplace technology levels.

Third, although the occupation, industry sector, and enterprise ownership type were used to control the influence of the workplace on the gender wage gap, other factors (e.g., wage and employment systems, enterprise attributes) may also affect the wage gap. We should conduct an employer-employee survey on the issue in the future.

Fourth, some studies have found that the work preferences differ by gender (e.g., Beblo and Görges 2018). This gender preference disparity in using new technology can also affect Internet access and return to Internet use. Further research should consider the influence of personality factors and self-selection on the issue.

Finally, owing to the limited survey period, we investigated the issue only for the recent period (2014, 2016, and 2018); a longer-term analysis remains a task for future research.

5 Conclusions

Using national longitudinal data from the CFPS for 2014, 2016, and 2018, this study empirically analyzed the influence of Internet use on the gender wage gap in China while addressing endogeneity problems. It yields the following four main conclusions.

First, according to the results derived from the OLS model, the return to Internet usage is higher for women than men, meaning the results are similar to those of earlier studies. However, when longitudinal survey data is used to address heterogeneity and other endogeneity issues based on the IV and FE models, the results show that the return to Internet usage is higher for men than women. The individual heterogeneity problem considerably affects the estimations, thus suggesting an estimation bias in the existing literature. The results based on the frequency of Internet use for different purposes confirm the conclusions.

Second, the gender difference in return to Internet use varies across groups: it is larger in the middle- and high-education groups and in the middle-aged and older cohorts than in the low-education group and the younger cohort.

Third, the decomposition results indicate that, in general, the two components (the gender disparity in Internet access and the gender difference in return to Internet use) drive the gender wage gap in China; the effect of gender difference in the return to Internet use is greater.

Fourth, the influence of Internet usage on the gender wage gap varies with educational attainment and age cohort. For example, the gender difference in the return to Internet use widens the gender wage gap in the high-education group, whereas it reduces the wage gap in the low- and middle-education groups; the influence of the gender disparity in Internet access on the formation of the gender wage gap is greater for the middle-aged and older generations than for the younger generation.

The study highlights two policy implications. First, a large gender difference in return to Internet use exists even when other factors (e.g., education, occupation) are held constant, which points to workplace discrimination against women in terms of Internet use, especially against more highly educated women. Stronger enforcement of equality policies, such as the equal employment policy and the “equal pay for equal work” policy, can be expected to reduce the gender wage gap. Moreover, the discrimination against women may arise because women bear more family responsibilities than men (Connelly et al. 2018; Ma 2021a, b). Thus, policies that reduce the burden of childcare and elder care, such as promoting the establishment of public kindergartens and long-term care insurance, are also expected to narrow the gender wage gap in the long run. Second, the results suggest that policies aimed at reducing Internet access disparities for particular groups, such as women in the middle-aged and older generations, may contribute to reducing the gender wage gap.

Despite the limitations mentioned above, this study investigated the influence of Internet use on the gender wage gap and provided new evidence from China on the determinants of the gender wage gap in the era of the digital economy. We anticipate that the insights into how the gender disparity in Internet access and the gender difference in return to Internet use (including discrimination against women in terms of Internet use in the workplace) contribute to the formation of the gender wage gap in China can provide valuable lessons for other countries as well.