1 Introduction

The digital economy has become an important driver of productivity growth and economic development in the new era of industrialization. The last two decades have witnessed great changes in the economy, politics, culture, and ecology, propelled by digital technological progress that affects all production processes and every aspect of people’s livelihoods. The application of digital technologies has fundamentally reshaped lifestyles and industrial production methods, creating many new modes of living and services unseen in the last century. China attaches great importance to the development of the digital economy and has made great efforts to promote the integration of digital technologies such as big data, artificial intelligence and cloud computing with the real economy. As a result, the digital economy has expanded rapidly in China. From 2012 to 2021, the scale of China’s digital economy grew from 11 trillion yuan to over 45 trillion yuan, and its share of GDP increased from 21.6% to 39.8% (State Council Information Office of China 2022), attracting serious attention from policy makers and academic researchers.

The purpose of this paper is to study this new form of economic growth and the changes in income distribution it has caused, paying special attention to the impact of the digital economy on the wage gap between high- and low-skilled workers. The digital economy is a new form of economic growth following agriculture and industry, in which data has become a new factor of production. Although the digital economy is becoming increasingly important, its effect on income distribution is poorly understood, because most previous studies have focused on how digital technologies affect productivity. Much of the extant literature examines the impact on human labor of a particular aspect of the digital economy, such as automation or artificial intelligence (Acemoglu and Restrepo 2018, 2020), rather than the digital economy as a whole. Its impact on human labor, chiefly on labor demand and wages, remains unclear. On the one hand, some theories show that the development of digital technologies such as automation reduces labor demand (Jones et al. 2014; Jones and Sloane 2010). On the other hand, other theories show that the application of digital technologies creates jobs and hence increases wage incomes (Bessen 2019). After reviewing earlier studies, Acemoglu and Restrepo (2019) conclude that the net impact of automation on labor demand depends on the balance of three countervailing effects. However, extant empirical studies have not reached a consistent conclusion regarding the relationship between labor demand and automation. As for the impact of AI on wages, some studies suggest that the advancement of AI applications raises worker wages (Luo and Guo 2022), while others suggest that AI advancement has a U-shaped relationship with worker wages (He 2021).
However, few studies have examined how this impact on wage incomes differs between high- and low-skilled workers. This paper aims to fill this gap by using a large panel dataset from the China Family Panel Studies (CFPS) covering 2010–20 to systematically examine the relationship between digital economic development and the wage gap between these two types of workers in the Chinese context.

We measure the digital economy as a whole, which allows a more comprehensive assessment of each province’s digital economic development and avoids relying on a single digital technology, whose impact on workers may not accurately reflect changes within the region. We suggest that workers’ reactions to digital technologies shape how the digital economy affects income distribution. On the one hand, as the digital economy develops, the application of advanced digital technologies causes unemployment and places ever-rising pressure on employment. It is costly for unemployed workers to find new jobs that match their current skills, and a slow job search may cause or intensify labor misallocation, with serious implications for income distribution and for wage differentials between high- and low-skilled workers. On the other hand, facing the new employment situation created by the digital economy, workers may respond actively to the possibility of being replaced by machines. Workers who are more familiar with digital technologies, or who learn more about them, are better able to adapt to digitally supported modes of production, whereas workers who are unfamiliar with digital technologies may face mismatches between technologies and skills that put them at a disadvantage in the labor market. To verify the role of labor skills in the relationship between digital economic development and wage incomes, we divide workers into low- and high-skilled groups and use the wage gap between them to capture the inequality in income distribution resulting from the application of digital technologies.

We draw samples of the Chinese labor force from the China Family Panel Studies (CFPS) for 2010–20 and match them with a digital economy index constructed for 30 provinces. We estimate an ordinary least squares (OLS) model of the relationship between digital economic development and the wage gap between high- and low-skilled workers and find that the development of the digital economy widens the wage gap between these two groups. The results remain robust after controlling for individual characteristics, province-specific variables, and province, year and industry fixed effects.

To address possible endogeneity, we also apply an instrumental variable approach to the baseline regression model. The instruments are the digital economy index lagged by one period and the interaction of topographic relief with national IT service revenue. Using two-stage least squares (2SLS) estimation, the effect of the digital economy on the wage gap between high- and low-skilled workers remains positive and statistically significant, although the point estimate is smaller.
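The logic of 2SLS can be illustrated with a minimal Python sketch on simulated data. It is not the paper’s specification: it assumes a single endogenous regressor x and two excluded instruments, z1 and z2, which stand in hypothetically for the lagged digital economy index and the relief-by-IT-revenue interaction, whereas the paper instruments an interaction term in a model with controls and fixed effects. Because x is correlated with an unobserved confounder u, OLS is biased upward, while 2SLS should recover the assumed true slope of 0.3.

```python
import numpy as np

def two_stage_least_squares(y, X_exog, x_endog, Z):
    """Point estimates from 2SLS with one endogenous regressor.

    y: (n,) outcome; X_exog: (n, k) exogenous regressors including a constant;
    x_endog: (n,) endogenous regressor; Z: (n, m) excluded instruments.
    Returns coefficients ordered as [X_exog..., x_endog].
    """
    # First stage: project the endogenous regressor on all exogenous variables
    W = np.column_stack([X_exog, Z])
    gamma, *_ = np.linalg.lstsq(W, x_endog, rcond=None)
    x_hat = W @ gamma
    # Second stage: replace the endogenous regressor by its first-stage fitted values
    X2 = np.column_stack([X_exog, x_hat])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta

rng = np.random.default_rng(42)
n = 20_000
z1 = rng.normal(size=n)      # hypothetical stand-in for the lagged index
z2 = rng.normal(size=n)      # hypothetical stand-in for relief x IT revenue
u = rng.normal(size=n)       # unobserved confounder
x = 0.5 * z1 + 0.5 * z2 + u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 0.3 * x + u + rng.normal(size=n)         # true effect of x is 0.3

const = np.ones((n, 1))
beta_ols, *_ = np.linalg.lstsq(np.column_stack([const, x]), y, rcond=None)
beta_iv = two_stage_least_squares(y, const, x, np.column_stack([z1, z2]))
print(f"OLS slope: {beta_ols[1]:.3f}, 2SLS slope: {beta_iv[1]:.3f}")
```

Only point estimates are returned; standard errors from a naive second-stage regression are invalid, so inference would normally rely on a dedicated IV routine, such as `IV2SLS` in the Python package linearmodels.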

Having shown that digital economic development widens the wage gap between high- and low-skilled workers, we posit two transmission mechanisms closely related to worker responses to digital technologies: (1) labor misallocation and (2) mismatch between technologies and labor skills, reflected in worker responses to the risk of being replaced by machines. First, labor misallocation is a passive reaction to digital economic development that affects low-skilled workers more strongly, widening the wage gap between high- and low-skilled workers. Second, the mismatch between technologies and labor skills arises from workers’ active responses to digital economic development. Facing the risk of being replaced by machines, high-skilled workers use the Internet more frequently than their low-skilled counterparts to improve their abilities, for example by studying and obtaining information. The resulting mismatch between technologies and skills among low-skilled workers widens the wage gap between the two groups.

The marginal contributions of this paper are as follows: (1) we construct a new index measuring digital economic development; (2) we use a new, large panel dataset to explore the impact of digital economic development on the wage gap between high- and low-skilled workers in the Chinese context; and (3) we identify two mediating mechanisms, labor misallocation and skill mismatch, through which digitalization affects the wage gap.

The rest of this paper is structured as follows. Section 2 reviews the relevant literature relating to digital economy and labor employment. Section 3 provides a theoretical analysis regarding digitalization and wage inequality. Section 4 describes the data and empirical models. Section 5 discusses the empirical results with robustness tests. Section 6 concludes with policy implications.

2 Literature review

There are two strands of literature related to digitalization and wage incomes: one concerns how to measure the digital economy, and the other concerns the impact of the digital economy on the labor market.

There is still no standard method for measuring the digital economy. Existing measurements are conducted at the national, provincial, and city levels. National-level measurements usually use the indexing method (Wang et al. 2021b; Dahlman et al. 2016; Bukht and Heeks 2017), the national economic accounting method (Gao and Xu 2021; Liu et al. 2022), the digital economy value-added measurement method (Chen and Kong 2021) and the satellite account construction method (Barefoot et al. 2018). Provincial-level measurements usually use the indexing method (Wu and Wang 2022; Wan and Luo 2022) and the digital economy value-added measurement method (Zhu et al. 2021). City-level measurements use the indexing method (Liu and Yin 2021).

The indexing method is used at all levels, but the indicators included and the calculations used to synthesize the index vary across approaches. To construct an indicator system, the extant literature mainly draws on digital economy reports issued by official statistical agencies. For example, Wu and Wang (2022) use the digital industries classified by the National Bureau of Statistics of China as first-level indicators. Another popular source of first-level indicators is the reports published by the China Academy of Information and Communications Technology (Wan and Luo 2022). Some studies select indicators based on their own theoretical frameworks, but all of them build on three components: “digital foundation”, measured by indicators of digital infrastructure; “digital industries”, measuring the digital technologies used in industries; and “digital governance”, the government’s use of digital technologies (Wang and She 2021; Deardorff 2017; Mesenbourg 2001). To assign weights to each indicator during index synthesis, the extant literature often uses the entropy method (Wu and Wang 2022; Liu and Yin 2021), principal component analysis (Zhang and Chen 2021), or equal weighting (Wang et al. 2021b).

The second strand of literature concerns the impact of the digital economy on the labor market. However, few studies have directly measured the digital economy as a whole and explored its effect on labor. Most studies focus on the impact of automation and artificial intelligence on the labor market, and there is no clear conclusion regarding the relationship between digital technologies and labor.

Most of the extant literature discusses the impact of the digital economy on labor from three aspects: wages, employment and worker behavior. Some studies explore the effects of artificial intelligence on wages. For example, using panel data on Chinese listed manufacturing companies from 2011–19, Luo and Guo (2022) find that the advancement of AI applications raises employee wages, whereas He (2021), using data on Chinese listed artificial intelligence enterprises, argues that AI advancement has a U-shaped relationship with employee wages. To examine the wage gap between different types of workers, studies classify employees by skill (Zhou and Wang 2021) or by health status (Hallock et al. 2022). Concerning the impact of digital technologies on employment, there is a long-standing debate over whether machines bring about unemployment (Arntz et al. 2016; Baumol 1967) or create new jobs (Sorgner 2017). Acemoglu and Restrepo (2019) develop an integrated model that describes three effects. The productivity effect, arising from the cost decline when cheap machines replace human labor, expands the economy and raises labor demand. The reinstatement effect also raises labor demand by creating new tasks that cannot be automated. The displacement effect works in the opposite direction, reducing the set of jobs as machines substitute for labor. The ultimate impact of automation on employment depends on the net of these three effects. As for the impact of AI on employee behavior, the relationship between digital technology and worker behavior is ambiguous.
Some studies construct an analytical framework of employee behavior and suggest that knowledge-based employees with higher education and learning ability engage in more autonomous learning because they fear their jobs may be replaced by AI (Lv and Li 2020), whereas other studies, using the concept of “technical hollowing-out”, suggest that the diffusion of AI reduces knowledge-based employees’ job autonomy and increases their reliance on judgments made by AI (Wang 2019).

Another theory that explains how the digital economy may affect the labor market is directed technical change, which has been used to explain the rise in the wages of certain skilled workers since the 1980s. Its central result is that technologies can be directed toward workers of different skill levels, high- or low-skilled. Workers whose skills complement the directed technologies widely used in production experience rising wages and employment opportunities (Wang et al. 2021a; Kiley 1999). The theory of endogenously directed technical change further suggests that which directed technologies are applied depends on the relative supply of workers with different skill levels. For example, during the first industrial revolution, low-skilled workers were abundant, so it was profitable to use technologies directed toward low-skilled workers, which induced directed technical change (Acemoglu 2002). Models of endogenously directed technical change are still evolving in the new era of digital technological progress (Guellec 2021).

The study most closely related to this paper is Bai and Zhang (2021), which examines the impact of digital economic development on the wage equity of workers at different skill levels in China during 2002–13. They divide workers into high-, medium- and low-skilled groups by education and job title and find that the digital economy widens the wage gap but narrows the welfare gap between workers with different skill levels. The wage gap is driven by improvements in production efficiency induced by digital economic development, and the welfare gap by the application of digital governance. However, their model and empirical analysis do not consider worker responses to the development of the digital economy.

Extant studies provide useful insights from multiple perspectives on digitalization and the labor market, but they focus on particular aspects of the digital economy, such as automation and AI, and shed limited light on the relationship between digital economic development and labor employment, especially the wage gap between workers with different skills. Moreover, the mechanisms through which digital economic development affects the wage gap between workers with different skills have not been fully discussed. Finally, most relevant studies do not use recent data, so their findings may not reflect the latest developments in rapidly evolving digital technologies. This paper aims to address these gaps.

3 Theoretical analysis

3.1 Digital economy and the wage gap between high- and low-skilled workers

Extant research suggests that, whatever the impact of digital economic development on the labor market, it entails a restructuring of the factors involved in production and services. Acemoglu and Restrepo (2019) identify three effects of automation on employment, each related to the reallocation of capital and labor. Under the productivity effect, replacing human labor with cheap machines lowers costs and expands the economy, which raises labor demand in the same sector or in other sectors where human labor has a comparative advantage over machines. The reinstatement effect creates new tasks in sectors where human labor has a comparative advantage over machines, enabling labor to move to these sectors. The displacement effect substitutes capital for labor by replacing human labor with cheap machines. Because the reallocation of factors changes factor payments, it also redistributes income.

China is still in the early stage of digital economic development, in which the reinstatement effect is not yet fully at play and the displacement effect still dominates (Cai and Huang 2019). The displacement effect causes unemployment and affects low-skilled workers more strongly. We therefore expect digital economic development to widen the wage gap between high- and low-skilled workers.

The theory of directed technical change also sheds light on the relationship between the digital economy and wage inequality. Using a model of endogenously directed technical change, Loebbing (2022) shows that if the aggregate production function is quasi-convex, a rise in the skill premium can accompany an increase in the relative supply of skilled labor, because the rise in the relative supply of skilled labor improves the productivity of capital. The digital economy involves directed technical change that benefits high-skilled workers more than their low-skilled counterparts. Following Loebbing (2022), the development of the digital economy should therefore raise the wages of high-skilled workers through a rise in the skill premium.

According to Guellec (2021), how the new value created by digital innovation is distributed depends on the nature of workers’ jobs. In “winner-take-all” markets, the rents from digital innovation go mainly to key members of the winning firms who have capital and skills, such as investors, shareholders, managers and the most important employees. Shareholders benefit from rising dividends and share prices attributed to digital innovation, and managers benefit from rising compensation designed to mitigate the principal-agent problem (Jensen and Murphy 1990). Average employees rarely receive these rents. The distribution thus varies across job types, which can be used to classify worker skills (Bai and Zhang 2021).

Therefore, this paper proposes Hypothesis 1: Digital economic development will widen the wage gap between high- and low-skilled workers.

3.2 Theoretical mechanisms of digital economy affecting workers’ wage gap

3.2.1 Mismatch between technologies and labor skills

When labor skills cannot meet the requirements of the digital technologies applied in the workplace, the adjustment of labor demand slows down, widening the wage gap between high- and low-skilled workers (Acemoglu and Restrepo 2019; Goldin and Katz 2007). For example, AI, a digital technology characterized by skill bias, requires a large number of skilled workers who can readily adapt to it. Because cultivating such talent takes a long time, there is a lag between the application of AI and the adjustment of human capital (Nedelkoska and Quintini 2018). This slow adjustment weakens the productivity effect and the reinstatement effect of digital technologies, with consequences for workers’ wage gap. Because low-skilled workers adapt less readily to digital technologies, they are more likely to be affected by the mismatch between technologies and labor skills.

To measure the mismatch between technologies and labor skills, we treat it as arising from an active reaction of human labor to the development of the digital economy. Although Internet use has spread rapidly, its frequency still varies between workers with different skill levels, partly because high- and low-skilled workers respond differently to the risk of being replaced by machines. For example, high-skilled workers with better education and training learn more independently because they fear their jobs may be replaced by artificial intelligence (Lv and Li 2020), and the Internet is an important channel through which they learn or obtain information to reduce the risk of job loss. The frequency of Internet use as a response to the threat of machine substitution is thus a possible mechanism. High-skilled workers are more likely to use the Internet to improve themselves and are therefore more likely to earn more and/or keep their jobs, whereas low-skilled workers are less likely to improve themselves through the Internet and are more exposed to job substitution by machines. The less able low-skilled workers are to use digital technologies to improve themselves, the greater the mismatch between their skills and job requirements. This could be a mechanism through which digital economic development widens the wage gap between high- and low-skilled workers.

Therefore, this paper proposes Hypothesis 2.1: digital economic development intensifies the mismatch between technologies and labor skills, which widens the wage gap between high- and low-skilled workers.

3.2.2 Labor misallocation

According to the theory posited by Acemoglu and Restrepo (2019), the displacement effect of digital technologies can worsen labor misallocation. Digital technologies such as automation put some workers out of work by replacing them with cheap machines. When these displaced workers return to an increasingly competitive labor market, it is difficult and costly for them to find alternative jobs suited to their skills, and severe labor misallocation emerges during the job search. Shen (2019) even predicts that by 2050 a new “useless class” will emerge, consisting of people who cannot find employment because machines have taken over their jobs.

In this paper, we regard growing labor misallocation as a passive reaction of human labor. In the model of endogenously directed technical change set out by Loebbing (2022), the application of technologies in firms depends on the relative supply of labor at different skill levels, so the theory of directed technical change needs to account for changes in the labor market rather than focusing only on final outcomes such as wages. Increasing labor misallocation can be regarded as one such labor market change driven by the adoption of digital technologies. Based on the theory of endogenously directed technical change, labor misallocation, which is related to the relative supply of skilled labor, plays a role in how digitalization affects the wage differential between workers with different skills.

Extant research suggests that low-skilled workers are more likely to be affected. According to Guellec (2021), average employees, or low-skilled workers, face more pressure in the labor market and are more likely to hold temporary jobs. They find it harder to obtain suitable employment and are in a weak position to negotiate their wages. High-skilled workers, by contrast, are better able to withstand the risk of unemployment and have more opportunities for re-employment. Low-skilled workers are thus more likely to be replaced by machines, and the ensuing labor misallocation also depresses their wages. We therefore suggest that labor misallocation is another mechanism through which digital economic development widens the wage gap between high- and low-skilled workers.

Therefore, this paper proposes Hypothesis 2.2: digital economic development intensifies labor misallocation, widening the wage gap between high- and low-skilled workers.

4 Data description

4.1 Data sources

This study combines micro-individual data with provincial-level data. The micro-individual data come from the tracking surveys of the China Family Panel Studies (CFPS) conducted in 2010, 2011, 2012, 2014, 2016, 2018, and 2020. CFPS is a national, comprehensive social tracking survey project in China that provides panel data to researchers. The CFPS sample covers 25 provinces (excluding Hong Kong, Macao, Taiwan, Tibet, Xinjiang, Qinghai, Ningxia, Hainan and Inner Mongolia), which account for 95% of the Chinese population, so it can be regarded as nationally representative (Xie et al. 2014). The database covers a wide range of variables, and respondents include both urban and rural residents. The micro-individual data used in this paper cover respondents’ incomes, ages, health status, marital status, and similar characteristics. The sample consists of respondents who were paid to work during the survey period, excluding private business owners and the self-employed, and we drop observations with missing data on key variables such as education level and income. The provincial data, used to measure the digital economy index at the provincial level, are obtained from public data of the National Bureau of Statistics of China, provincial statistical yearbooks, the China Statistical Yearbook, the China Information Yearbook, and the Yearbook of China Information Industry. Given data availability and continuity, we calculate the digital economy index for 30 provinces in China (excluding Hong Kong, Macao, Taiwan, and Tibet) for 2010–20. After matching with the CFPS samples, the final sample covers 25 provinces in China from 2010–20.

4.2 Empirical model

The baseline regression model, estimated by ordinary least squares (OLS), is given in Eq. (1).

$${\mathrm{lnincome}}_{\mathrm{icpt}}={\mathrm{\alpha }}_{0}+{\mathrm{\alpha }}_{1}{\mathrm{college}}_{\mathrm{it}}\times {\mathrm{digital}}_{\mathrm{pt}}+{\mathrm{\alpha }}_{2}{\mathrm{college}}_{\mathrm{it}}+{\mathrm{\alpha }}_{3}{\mathrm{digital}}_{\mathrm{pt}}+\mathrm{\alpha }\,\mathrm{CVs}+{\updelta }_{\mathrm{c}}+{\updelta }_{\mathrm{t}}+{\updelta }_{\mathrm{p}}+{\upvarepsilon }_{\mathrm{icpt}}$$
(1)

The dependent variable is the logarithm of the real annual income of individual i working in industry c in province p in year t. The core explanatory variable is the interaction between \({\mathrm{college}}_{\mathrm{it}}\), which indicates whether individual i has college education or above in year t and thus represents worker skill, and \({\mathrm{digital}}_{\mathrm{pt}}\), the level of the digital economy in province p where individual i is located in year t. Because college is a dummy variable equal to 1 for high-skilled workers and 0 for low-skilled workers, the college wage premium implied by Eq. (1) is α2 + α1·digital, so the estimated coefficient α1 captures how the wage gap between high- and low-skilled workers changes with the level of the digital economy; a positive α1 means the gap widens as the digital economy develops. The control variables, denoted CVs, include micro-individual and province-level characteristics. Industry fixed effects \({\updelta }_{\mathrm{c}}\), year fixed effects \({\updelta }_{\mathrm{t}}\), and province fixed effects \({\updelta }_{\mathrm{p}}\) control for unobservable factors that vary by industry, year, and province. We use heteroscedasticity-robust standard errors.
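A minimal Python sketch of estimating Eq. (1) on simulated data follows. All values are hypothetical: 25 provinces, 7 survey waves, a 20% share of college workers, a true α1 of 0.5, and error variance that differs by skill group so that robust standard errors matter. Industry fixed effects and the control variables are omitted for brevity, and HC1 is assumed as the heteroscedasticity-robust variant, since the text does not specify one.

```python
import numpy as np

def ols_hc1(y, X):
    """OLS coefficients with HC1 heteroscedasticity-robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)          # sum of e_i^2 * x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)     # HC1 small-sample scaling
    return beta, np.sqrt(np.diag(cov))

def dummies(codes, n_levels):
    """One-hot encode integer codes 0..n_levels-1, dropping level 0 as the reference."""
    D = np.zeros((codes.size, n_levels - 1))
    rows = np.flatnonzero(codes > 0)
    D[rows, codes[rows] - 1] = 1.0
    return D

rng = np.random.default_rng(0)
n, n_prov, n_year = 20_000, 25, 7
prov = rng.integers(0, n_prov, n)                       # province of each worker
year = rng.integers(0, n_year, n)                       # survey wave of each worker
digital_py = rng.uniform(0.03, 0.9, (n_prov, n_year))   # hypothetical province-year index
digital = digital_py[prov, year]
college = (rng.random(n) < 0.2).astype(float)           # 1 = high-skilled
true_alpha1 = 0.5
lnincome = (9.0 + true_alpha1 * college * digital + 0.4 * college + 0.2 * digital
            + rng.normal(0, 0.3, n_prov)[prov]          # province effects
            + np.linspace(0, 0.5, n_year)[year]         # year effects
            + rng.normal(0, 0.5 + 0.3 * college))       # heteroscedastic errors

X = np.column_stack([np.ones(n), college * digital, college, digital,
                     dummies(prov, n_prov), dummies(year, n_year)])
beta, se = ols_hc1(lnincome, X)
print(f"alpha1 = {beta[1]:.3f} (robust SE {se[1]:.3f})")
```

Province and year effects enter as dummy variables with one reference level each, which is equivalent to including the fixed effects \({\updelta }_{\mathrm{p}}\) and \({\updelta }_{\mathrm{t}}\) alongside a constant.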

4.3 Variable measures

4.3.1 Dependent variable

The dependent variable is the logarithm of real annual salary income, defined as “lnincome”. The CFPS survey provides “salary income obtained in the past 12 months” of individuals. Taking 2010 as the base period, salary income is deflated by the Consumer Price Index (CPI) of each province to get the real annual income in comparable prices.
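Deflating to 2010 prices can be sketched as follows. As an illustration only (the text does not state the format of the provincial CPI data), the sketch assumes that CPI is reported with the previous year = 100, so the annual values must first be chained into a price level with 2010 = 1 before nominal income is divided by it. The CPI values are hypothetical.

```python
def chain_cpi_index(cpi_prev_year_100, base_year):
    """Convert annual CPI (previous year = 100) into a price level with base_year = 1.0.

    cpi_prev_year_100 maps each year after base_year to its CPI relative to the
    previous year (e.g. 102.5 means prices rose 2.5%). Every year from
    base_year + 1 to the last year must be present.
    """
    index = {base_year: 1.0}
    for year in range(base_year + 1, max(cpi_prev_year_100) + 1):
        index[year] = index[year - 1] * cpi_prev_year_100[year] / 100.0
    return index

def real_income(nominal, year, index):
    """Express a nominal income observed in `year` in base-year prices."""
    return nominal / index[year]

# Hypothetical provincial CPI values (previous year = 100)
cpi = {2011: 105.0, 2012: 102.0}
idx = chain_cpi_index(cpi, base_year=2010)
print(idx[2012], real_income(10710.0, 2012, idx))
```

In a provincial panel, the same chaining would be applied separately to each province’s CPI series.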

4.3.2 Core explanatory variables

The core explanatory variable, college*digital, is the interaction between the worker skill level and the digital economy index. Worker skill level (college) is represented by education. Following Acemoglu (2002), a worker whose education level is below college is defined as low-skilled, and a worker with college education or above is defined as high-skilled. The variable college is therefore a dummy variable equal to 1 if the education level is college or above and 0 otherwise. The digital economy index (digital) is a composite evaluation index obtained by constructing a digital economy indicator system and using the entropy method to weight each indicator.

The digital economy index, which measures the level of digital economic development in each province, is built from a digital economy indicator system that combines the approaches of Wu and Wang (2022) and Wang et al. (2021b). From Wang et al. (2021b), we draw digital industry indicators, such as e-commerce and market scale, and digital environment indicators, such as digital application; the digital foundation indicators follow Wu and Wang (2022). The system has three levels of indicators: the first level comprises digital foundation, digital industry, and digital environment, and the second- and third-level indicators are listed in Table 1. Missing data are filled using linear interpolation and the linear trend method. After standardizing all indicators, we use the entropy method to calculate the weight of each indicator and synthesize the digital economy index.
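The gap-filling and entropy-weighting steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper’s exact procedure: min-max standardization is used, all indicators are treated as positive (a higher value means a more developed digital economy), all province-year observations are pooled into one matrix so that the index is comparable across years, and the toy data are hypothetical.

```python
import numpy as np

def fill_linear(series):
    """Fill interior gaps by linear interpolation and leading/trailing gaps by a linear trend."""
    s = np.asarray(series, dtype=float).copy()
    t = np.arange(s.size)
    observed = ~np.isnan(s)
    # Interior gaps: interpolate between the nearest observed neighbours
    interior = ~observed & (t > t[observed].min()) & (t < t[observed].max())
    s[interior] = np.interp(t[interior], t[observed], s[observed])
    # Edge gaps: extrapolate a least-squares linear trend fitted to observed points
    slope, intercept = np.polyfit(t[observed], s[observed], 1)
    edges = np.isnan(s)
    s[edges] = intercept + slope * t[edges]
    return s

def entropy_index(X):
    """Entropy-method weights and composite index.

    X: (n_obs, n_indicators); rows are province-year observations and every
    indicator is assumed positive. Returns (weights, index) with weights
    summing to 1 and index values in [0, 1].
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)        # avoid dividing by zero
    Z = (X - lo) / span                             # min-max standardization to [0, 1]
    col_sum = Z.sum(axis=0)
    P = Z / np.where(col_sum > 0, col_sum, 1.0)     # each observation's share of the indicator
    # Entropy with the convention 0 * ln(0) = 0
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    E = -plogp.sum(axis=0) / np.log(n)
    E = np.where(col_sum > 0, E, 1.0)               # constant indicators carry no information
    D = 1.0 - E                                     # degree of divergence
    W = D / D.sum()
    return W, Z @ W

# Hypothetical toy panel: 4 province-years x 3 indicators (the third is constant)
X = np.array([[0.0, 1.0, 5.0],
              [0.0, 2.0, 5.0],
              [0.0, 3.0, 5.0],
              [1.0, 4.0, 5.0]])
weights, index = entropy_index(X)
filled = fill_linear([np.nan, 2.0, np.nan, 4.0, np.nan])
print(weights, index, filled)
```

The entropy method gives larger weights to indicators whose values are more unevenly spread across observations, so the constant third indicator receives zero weight and the highly concentrated first indicator receives the largest.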

Table 1 Digital economy indicator system

The other component of the core explanatory variable is the labor skill variable. Following the classification by Acemoglu (2002), workers with less than college education are defined as low-skilled, and those with college or higher education as high-skilled.

4.3.3 Control variables

The micro-individual variables include age; gender, equal to 1 for men and 0 otherwise; marital status (spouse), equal to 1 if the individual has a spouse and 0 otherwise; household registration (hukou), equal to 1 if the individual holds a non-agricultural hukou and 0 otherwise; and health status (health), coded from 1 to 5, where 1 means “healthy”, 2 “average”, 3 “relatively unhealthy”, 4 “unhealthy”, and 5 “very unhealthy”.

The province-specific variables include the level of economic development (lnGDPpc), measured as the logarithm of GDP per capita; the level of government involvement in local economic activities (fiscal), measured as the ratio of fiscal expenditure to GDP, which may reflect the government’s attitude toward, and spending on, digital economy infrastructure; industrial structure (indstru), measured as the ratio of tertiary-industry value added to secondary-industry value added; the economic growth rate (GDPgrowth), measured by the GDP growth rate; foreign investment utilization (FDI), measured as the ratio of FDI to GDP; and research funding abundance (RD), measured as the ratio of R&D expenditure to GDP. Variables requiring deflation are deflated to 2010 prices.

4.3.4 Mechanism variables

The first mechanism variable is the importance of the Internet for information access (netgetim), taken from the CFPS database. Respondents are asked how important the Internet is for their personal access to information and answer on a 5-point scale, with higher scores indicating that the Internet is more important to them as a source of information.

The second mechanism variable is the frequency with which workers use the Internet for learning (netstudyfre), also taken from the CFPS database. Respondents report how often they use the Internet for learning: never, occasionally, several times a month, several times a week, or almost every day. In the empirical analysis we code these answers as an ordinal variable taking values from 1 to 5.

The final mechanism variable is the logarithm of the labor misallocation index. Following Cui et al. (2019), we compute for each province a labor misallocation index that is a function of the labor price distortion coefficient \(\gamma_{Lp}\): \(\mathrm{lmis}_p = 1/\gamma_{Lp}\), where \(\gamma_{Lp} = \left(\frac{L_p}{L}\right) \Big/ \left(\frac{s_p \beta_{Lp}}{\beta_L}\right)\). Here \(\frac{L_p}{L}\) is the share of total labor used in province p; \(\frac{s_p \beta_{Lp}}{\beta_L}\) is the share of labor that province p would use if labor were efficiently allocated; \(\beta_{Lp}\) is the labor output elasticity of province p, estimated from a production function; and \(\gamma_{Lp}\) reflects the degree of labor misallocation. A larger absolute value of lmis indicates more severe labor misallocation, and a smaller value indicates less.
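The index can be computed from provincial labor, output, and estimated labor elasticities. This sketch assumes two definitions the text leaves implicit: s_p is province p's share of total output, and β_L is the output-weighted average of the provincial elasticities, β_L = Σ_p s_p β_Lp, so that the efficient labor shares sum to one. Under these assumptions lmis_p equals 1 when a province's actual labor share matches its efficient share. The function name and toy numbers are hypothetical.

```python
import numpy as np

def labor_misallocation(L, Y, beta_Lp):
    """Return (gamma_Lp, lmis_p = 1 / gamma_Lp) for each province (illustrative sketch).

    L: labor input by province; Y: output by province;
    beta_Lp: estimated labor output elasticity by province.
    Assumed definitions: s_p = Y_p / sum(Y) and beta_L = sum_p s_p * beta_Lp.
    """
    L, Y, b = (np.asarray(a, dtype=float) for a in (L, Y, beta_Lp))
    s = Y / Y.sum()                     # output share of each province (assumed)
    beta_L = (s * b).sum()              # output-weighted aggregate labor elasticity (assumed)
    actual_share = L / L.sum()          # L_p / L
    efficient_share = s * b / beta_L    # s_p * beta_Lp / beta_L
    gamma = actual_share / efficient_share
    return gamma, 1.0 / gamma

Y = [100.0, 200.0, 300.0]
beta = [0.5, 0.5, 0.5]
# Labor allocated in proportion to efficient shares: no misallocation
gamma_eff, lmis_eff = labor_misallocation([10.0, 20.0, 30.0], Y, beta)
# Equal labor across provinces despite unequal output: misallocation
gamma_mis, lmis_mis = labor_misallocation([20.0, 20.0, 20.0], Y, beta)
ln_lmis = np.log(lmis_mis)  # the logarithm enters the mechanism regression
```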

Table 2 reports descriptive statistics for the relevant variables. About 20 percent of the workers are high-skilled, that is, have a college education or higher. Over the ten-year sample period, the digital economy index of the 30 provinces ranges from 0.027 to 0.935, indicating large differences in digital economic development between provinces. About 59 percent of the respondents are men, and the average age is about 39. The mean of “health” is 2.590, which lies between “average” (2) and “relatively unhealthy” (3). Overall, the sample is consistent with a population able to participate in the labor market.

Table 2 Descriptive statistics

5 Empirical results

5.1 Baseline regression

To examine the effect of digital economic development on the wage gap between high- and low-skilled workers, we first estimate Eq. (1) by ordinary least squares (OLS). Table 3 reports the baseline results. Column (1) includes neither control variables nor fixed effects; column (2) adds individual control variables; column (3) further adds province-level control variables; and column (4) includes all control variables together with industry, year, and province fixed effects. Columns (5) and (6) add industry-year fixed effects while adjusting the other fixed effects. In all six columns, the estimated coefficient of the core explanatory variable, the product of the digital economy index and the worker skill variable, is positive and significant at the 1% level, indicating that digital economic development widens the wage gap between high- and low-skilled workers and supporting Hypothesis 1. The control variables capture other determinants of income. In columns (2)-(6), the coefficient of “gender” is about 0.46: holding all other variables constant, men earn about 46 log points more than women, or roughly 58% more (exp(0.46) - 1 ≈ 0.58). The coefficient of “age” implies that each additional year of age reduces income by about 0.7%. The coefficient of “spouse” implies that workers with a spouse earn about 40 log points more than those without one. Holding a non-agricultural hukou (“hukou” = 1) rather than an agricultural one, and being in better health, are also associated with higher income.
Among the province-level controls, column (3) shows that the level of economic development (lnGDPpc), government involvement in local economic activity (fiscal), foreign investment utilization (FDI), and research funding abundance (RD) are all positively correlated with workers' income, whereas a higher ratio of tertiary- to secondary-industry value added (indstru) and a higher economic growth rate (GDPgrowth) are associated with lower income.
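Eq. (1) is not reproduced in this section, so the sketch below assumes a specification of the form ln income = α + β·(digital × high) + δ·digital + θ·high + controls + province and year fixed effects + ε, estimated on simulated data with NumPy least squares. It only illustrates how the interaction term and fixed-effect dummies enter the design matrix; the variable names, the data-generating process, and the coefficient values are hypothetical, and standard errors are omitted. The helper at the end converts a log-income coefficient into an exact percentage change.

```python
import numpy as np

def dummies(codes):
    """One-hot encode integer codes, dropping the first level (absorbed by the intercept)."""
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, 1:]).astype(float)

def ols(y, X):
    """OLS point estimates via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n, n_prov, n_year = 5000, 25, 6
prov = rng.integers(0, n_prov, n)
year = rng.integers(0, n_year, n)
digital = rng.uniform(0.0, 1.0, (n_prov, n_year))[prov, year]  # hypothetical province-year index
high = (rng.uniform(size=n) < 0.2).astype(float)                # about 20% high-skilled
male = (rng.uniform(size=n) < 0.59).astype(float)
# Simulated log income; 0.5 is the "true" coefficient on digital x high-skill
ln_income = (1.0 + 0.3 * digital + 0.4 * high + 0.5 * digital * high
             + 0.46 * male + rng.normal(0.0, 0.5, n))

# Columns: constant, core interaction, main effects, a control, province and year dummies
X = np.column_stack([np.ones(n), digital * high, digital, high, male,
                     dummies(prov), dummies(year)])
coef = ols(ln_income, X)
interaction_hat, male_hat = coef[1], coef[4]

def log_points_to_percent(b):
    """Exact percentage change implied by a coefficient b in a log-income regression."""
    return 100.0 * (np.exp(b) - 1.0)
```

For small coefficients the log-point and percentage readings are close; for a coefficient of 0.46 they differ by about 12 percentage points.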

Table 3 Effect of digital economic development on the wage gap between high- and low-skilled workers

5.2 Robustness test

We use the following methods for the robustness test. The results of the robustness test are shown in Panel A of Table 4.

Table 4 Robustness test

First, we replace the measure of digital economic development. Following Zhao et al. (2020), we measure digital economic development by Internet development and digital financial inclusion. Internet development comprises four indicators: (1) the number of Internet broadband access users, (2) the proportion of workers in computer services and software, (3) total telecommunications services per capita, and (4) the number of mobile phone users. Digital financial inclusion has a single indicator, the Peking University Digital Financial Inclusion Index of China (PKU_DFIIC). The five indicators are weighted by principal component analysis. We recompute the digital economy index from these indicators and re-estimate the baseline regression with it. As column (1) shows, the estimated coefficient of the core explanatory variable is almost unchanged and remains positive and significant at the 1% level, supporting the conclusions of the baseline model.
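The text states only that the five indicators are weighted by principal component analysis. One common variant, assumed here, standardizes the indicators, takes the loadings of the first principal component of their correlation matrix, and rescales those loadings to sum to one. The simulated single-factor data and all names are hypothetical.

```python
import numpy as np

def pca_index(X):
    """Composite index from first-principal-component loadings (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-scores of each indicator
    corr = np.corrcoef(Z, rowvar=False)                 # indicators are in columns
    eigvals, eigvecs = np.linalg.eigh(corr)             # eigenvalues in ascending order
    v = eigvecs[:, -1]                                  # loadings of the first component
    if v.sum() < 0:
        v = -v                                          # eigenvector sign is arbitrary
    w = v / v.sum()                                     # rescale loadings to sum to one
    explained = eigvals[-1] / eigvals.sum()             # variance share of the first component
    return Z @ w, w, explained

rng = np.random.default_rng(1)
factor = rng.normal(size=200)                 # hypothetical latent digital development
loadings = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
X = 10.0 + factor[:, None] * loadings + rng.normal(scale=0.3, size=(200, 5))
index, w, explained = pca_index(X)
```

Other normalizations (for example, weighting several components by their eigenvalues) are also used in practice and would give somewhat different weights.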

Second, inspection of the data suggests that some reported incomes may not be accurate. To limit the influence of extreme outliers, we winsorize the dependent variable (lnincome) at the 1st and 99th percentiles. As column (2) shows, the estimated coefficient of the core explanatory variable changes considerably but remains positive and significant at the 5% level.
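Winsorizing at the 1st and 99th percentiles replaces values below the 1st percentile with the 1st percentile and values above the 99th percentile with the 99th percentile, keeping every observation. A minimal NumPy sketch, assuming percentiles pooled over the whole sample (the text does not say whether they are computed by year) and NumPy's default linear interpolation between order statistics; the toy data are hypothetical.

```python
import numpy as np

def winsorize(x, lower_pct=1.0, upper_pct=99.0):
    """Clip values to the given lower and upper percentiles of x."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)

values = np.append(np.arange(100.0), 1e6)  # toy data: 0..99 plus one extreme outlier
w = winsorize(values)
```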

Finally, we exclude respondents younger than 18 or older than 60, who are less likely to participate in the labor market. As column (3) shows, the estimated coefficient of the core explanatory variable is smaller than in the baseline regression but remains positive and significant at the 1% level.

The OLS estimates may suffer from an endogeneity problem caused by reverse causality between digital economic development and the wage gap between high- and low-skilled workers. According to the theory of directed technical change, technical change is endogenous: it responds to changes in the relative supply of labor at different skill levels. An increase in the relative supply of high-skilled workers induces skill-biased technical change, so changes in the labor force may themselves drive digital economic development. Moreover, a larger relative supply of high-skilled workers raises productivity and the so-called skill premium; in other words, directed technical change may itself widen the wage gap between workers of different skill levels. A positive OLS correlation therefore does not by itself establish that digital economic development causes the wage gap to widen.

To address this potential endogeneity, we estimate two-stage least squares (2SLS) regressions with two instrumental variables. Following Chen (2022) and Guo and Zeng (2022), the first instrument is the one-period lag of the digital economy index. The second instrument is the interaction between topographic relief and national IT services revenue. Topographic characteristics are determined outside the economic system, yet terrain affects the construction of digital infrastructure and hence the level of local digital development. Because our sample varies over time while the topographic relief of a province is constant, we interact relief with national IT services revenue to introduce time variation. National IT services revenue is the sum of provincial IT services revenues; we regard it as satisfying the exclusion restriction because the revenue generated in other provinces should not directly affect the current province. At the same time, national IT services revenue reflects the development of the digital economy, so the instrument satisfies the relevance condition of being strongly correlated with the endogenous variable.
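The two-stage logic can be sketched with simulated data. This is a minimal illustration, not the authors' estimation code: it assumes one endogenous regressor, a constant as the only exogenous control, and hypothetical data in which an unobserved confounder biases OLS upward. The terrain instrument is built as the product of a time-invariant provincial relief value and a year-varying national revenue series, mirroring the construction described above; the lagged-index instrument is simulated as random noise. Only point estimates are computed, since standard errors from a manual second stage are not valid 2SLS standard errors.

```python
import numpy as np

def two_sls(y, x_endog, Z_excl, W):
    """2SLS point estimates with one endogenous regressor (illustrative sketch).

    Z_excl: (n, k) excluded instruments; W: (n, p) exogenous regressors including a constant.
    Returns [coef on x_endog, coefs on W].
    """
    Z = np.column_stack([W, Z_excl])
    first, *_ = np.linalg.lstsq(Z, x_endog, rcond=None)   # first stage
    x_hat = Z @ first                                      # fitted endogenous regressor
    second, *_ = np.linalg.lstsq(np.column_stack([x_hat, W]), y, rcond=None)
    return second

rng = np.random.default_rng(7)
n, n_prov, n_year = 20000, 25, 10
prov = rng.integers(0, n_prov, n)
year = rng.integers(0, n_year, n)
relief = rng.uniform(0.0, 1.0, n_prov)[prov]       # time-invariant terrain (hypothetical)
it_revenue = np.linspace(1.0, 2.0, n_year)[year]    # national series, varies by year only
iv_terrain = relief * it_revenue                    # interaction instrument
iv_lag = rng.normal(size=n)                         # stand-in for the lagged digital index
u = rng.normal(size=n)                              # unobserved confounder
x = 0.8 * iv_lag - 0.5 * iv_terrain + 0.7 * u + rng.normal(size=n)
y = 1.0 + 0.5 * x + u + rng.normal(size=n)          # true effect of x is 0.5

W = np.ones((n, 1))
beta_iv = two_sls(y, x, np.column_stack([iv_lag, iv_terrain]), W)[0]
beta_ols = np.linalg.lstsq(np.column_stack([x, W]), y, rcond=None)[0][0]
```

In this simulation OLS overstates the effect because x and y share the confounder u, while 2SLS recovers a value close to 0.5.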

The 2SLS results are presented in Panels A and B of Table 4. The instruments pass both the over-identification test and the weak-instrument test. The estimated coefficient of the core explanatory variable in column (4) is slightly smaller than in the baseline regression but remains positive and significant at the 1% level. This confirms the robustness of the baseline results and supports a causal interpretation of the effect of digital economic development on workers' wage gap.

Taken together, these tests confirm the robustness of the baseline conclusion that the growth of the digital economy widens the wage gap between high- and low-skilled workers.

5.3 Tests for mechanisms

To investigate the mechanisms through which digital economic development affects the wage gap between high- and low-skilled workers, we consider two responses of human labor to digital economic development: the mismatch between technologies and workforce skills, and labor misallocation.

We represent the mismatch between technologies and workforce skills by workers' responses to the risk of unemployment, that is, of being replaced by machines. Specifically, we focus on workers' use of the Internet for self-improvement, measured by the importance of the Internet as a source of information and the frequency of using the Internet for learning. Each of these mechanism variables is interacted with the core explanatory variable. Columns (1) and (2) of Table 5 report the results: column (1) uses the frequency of Internet use for learning, and column (2) the importance of the Internet for accessing information. In both regressions, after accounting for these mechanism variables, digital economic development widens the wage gap between high- and low-skilled workers, supporting Hypothesis 2.1. The mismatch between technologies and labor skills may therefore be one channel through which digital economic development affects the wage gap. A possible explanation is that digital economic development makes the Internet more accessible while raising the risk of losing jobs to machines; high-skilled workers use the Internet more than their low-skilled peers to learn and acquire information, improving their skills and eventually widening the wage gap between the two groups.

Table 5 Tests for mechanisms

For the labor misallocation mechanism, we measure the degree of labor misallocation by the logarithm of the labor misallocation index. The key regressor in this test is the interaction between this variable and the core explanatory variable of the baseline regression. Column (3) of Table 5 shows that its estimated coefficient is positive and significant at the 5% level, supporting Hypothesis 2.2. In line with the theory of Acemoglu and Restrepo (2019), one interpretation is that digital economic development leads machines to displace certain groups of workers from their jobs, and these workers cannot immediately find the most suitable new jobs, which intensifies labor misallocation. Because low-skilled workers are more likely than high-skilled workers to be replaced by machines, the wage gap between them widens.
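The mechanism regressions add an interaction between each mechanism variable and the core explanatory variable. The sketch below assumes that the mechanism variable also enters on its own, alongside the main effects of the digital index and the skill dummy; the paper does not spell out the full specification, and the simulated data, variable names, and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
digital = rng.uniform(0.0, 1.0, n)                 # hypothetical digital economy index
high = (rng.uniform(size=n) < 0.2).astype(float)   # high-skilled dummy
mech = rng.integers(1, 6, n).astype(float)         # a mechanism variable coded 1-5, e.g. netstudyfre
core = digital * high                              # core explanatory variable of the baseline model
core_x_mech = core * mech                          # mechanism interaction term
ln_income = (1.0 + 0.3 * digital + 0.4 * high + 0.2 * core
             + 0.05 * mech + 0.1 * core_x_mech + rng.normal(0.0, 0.5, n))

# Columns: constant, mechanism interaction, core term, mechanism and main effects
X = np.column_stack([np.ones(n), core_x_mech, core, mech, digital, high])
coef, *_ = np.linalg.lstsq(X, ln_income, rcond=None)
mech_coef = coef[1]   # estimate of the simulated 0.1 interaction effect
```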

5.4 Heterogeneity analysis

The analysis above examines the impact of digital economic development on the wage gap between high- and low-skilled workers, and its mechanisms, for the full sample. To examine whether the effect differs across groups, we divide the full sample into sub-samples by gender, age, region, and urban or rural area, and find differences in the magnitude of the effect between groups.
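The heterogeneity analysis re-estimates the same model within each sub-sample. A minimal sketch of that split-sample procedure, using the age split at 40 as an example; it assumes a stripped-down model without controls or fixed effects, and the simulated group-specific effects are hypothetical.

```python
import numpy as np

def interaction_coef(ln_income, digital, high):
    """OLS coefficient on digital x high, with both main effects and a constant."""
    X = np.column_stack([np.ones(len(ln_income)), digital * high, digital, high])
    coef, *_ = np.linalg.lstsq(X, ln_income, rcond=None)
    return coef[1]

def by_group(groups, ln_income, digital, high):
    """Re-estimate the interaction coefficient separately within each sub-sample."""
    return {g: interaction_coef(ln_income[groups == g], digital[groups == g], high[groups == g])
            for g in np.unique(groups)}

rng = np.random.default_rng(5)
n = 20000
age = rng.integers(18, 61, n)                                  # ages 18 to 60
age_group = np.where(age > 40, "above40", "40andbelow")
digital = rng.uniform(0.0, 1.0, n)
high = (rng.uniform(size=n) < 0.2).astype(float)
effect = np.where(age_group == "above40", 0.6, 0.3)           # hypothetical group-specific effects
ln_income = 1.0 + 0.3 * digital + 0.4 * high + effect * digital * high + rng.normal(0.0, 0.5, n)
estimates = by_group(age_group, ln_income, digital, high)
```

Comparing point estimates across sub-samples, as here, shows where the effect is larger; a formal test of equality would require an additional step.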

5.4.1 Gender

We divide the full sample into two sub-samples by gender and estimate the baseline model separately for each. Table 6 presents the results. The estimated coefficients in columns (1) and (2) are both positive and significant at the 1% level. However, the coefficient in column (1) is smaller than that in column (2), indicating that digital economic development widens the wage gap between high- and low-skilled workers more among men than among women.

Table 6 Heterogeneity analysis

5.4.2 Age

To test whether the effect of digital economic development on workers of different skill levels varies with age, we divide the full sample into workers aged 40 and below and workers above 40, and estimate the baseline model separately for each sub-sample. Columns (3) and (4) of Table 6 present the results. The estimated coefficient for workers aged 40 and below (column 3) is positive and significant at the 5% level, whereas that for workers above 40 (column 4) is positive and significant at the 1% level. The magnitudes of the coefficients also indicate that digital economic development widens the wage gap between skill levels more among workers above 40 than among younger workers. This finding suggests that older low-skilled workers may be more vulnerable to wage shocks from digitalization, possibly because they are less able than their younger counterparts to upgrade their skills.

5.4.3 Region

We divide the full sample into three sub-samples by region: workers from the eastern, central, and western regions. Columns (5)-(7) of Table 6 present the results of estimating the baseline model on each sub-sample. The estimated coefficients indicate that digital economic development widens the wage gap between skill levels more in the eastern region, which has the highest level of digital development, than in the other regions.

5.4.4 Area

To test whether the effect of digital economic development differs between urban and rural areas, we divide the full sample into two sub-samples by hukou, which indicates a worker's registered place of residence. The first sub-sample contains workers with a non-agricultural hukou, representing urban areas, and the second contains workers with an agricultural hukou, representing rural areas. Columns (8) and (9) of Table 6 present the results of estimating the baseline model on each sub-sample. The estimated coefficient for urban areas (column 8) is positive and significant at the 1% level, whereas that for rural areas (column 9) is significant only at the 10% level. The magnitudes of the coefficients also indicate that digital economic development widens the wage gap between skill levels more in urban than in rural areas.

6 Conclusions

As digital economic development becomes increasingly important, it is necessary to study this new form of economic growth and the changes in income distribution it brings. This paper focuses specifically on whether digital economic development affects the wage gap between high- and low-skilled workers in China.

Using CFPS data for 2010–2020 covering 25 Chinese provinces (including province-level municipalities and autonomous regions), this paper explores the impact of digital economic development on the wage gap between high- and low-skilled workers. The empirical results show that the higher the level of digital economic development, the wider this wage gap. The paper also identifies and tests mechanisms underlying this relationship, both reflecting how human labor responds to digital economic development: the mismatch between technologies and labor skills, which we regard as an active response, and labor misallocation, which we regard as a passive response. The mismatch between technologies and labor skills is measured by workers' responses to the risk of being replaced by machines. The results suggest that, when facing the risk of job loss from digital economic development, high-skilled workers are more likely to improve themselves by using the Internet. They are thus better able to keep their jobs and/or earn higher wages, widening the wage gap between them and their low-skilled counterparts in the process of digitalization. The second mechanism is labor misallocation arising from the time needed to search for jobs in the new labor market environment created by digital economic development. The displacement effect of digital economic development makes low-skilled workers more likely to be replaced by machines and unable to find suitable jobs immediately, putting them at a greater disadvantage in alternative employment and wage income.

These findings have important policy implications. They suggest that governments at all levels, as well as enterprises, should take on more responsibility for assisting workers in the labor market as the digital economy develops. Skill training and Internet infrastructure should be made more widely available to everyone, particularly less-educated workers, so that they can adapt to a new and challenging labor market. Targeted policies for older workers, women, and workers in the less developed central and western regions are also warranted.