1 Introduction

Education, or human capital, is widely recognized as one of the most important factors of economic development (e.g. Benhabib and Spiegel 1994; Barro 1999; Krueger and Lindahl 2001; Pritchett 2001; Acemoglu and Dell 2010; Gennaioli et al. 2013; Delgado et al. 2014) and has been repeatedly analyzed in empirical convergence research as a conditioning factor of convergence between countries and regions (e.g. Mankiw, Romer and Weil 1992). A high level of human capital plays a major role in the success of countries and regions. Education is also an important factor in economic and social exclusion in many dimensions. It generates positive externalities as well: Rauch (1993) confirms that higher average levels of human capital in US cities are associated with higher wages. Dalmazzo and de Blasio (2005) criticize the exclusive focus on wages; they additionally consider rents and confirm the existence of human capital externalities. Only in recent years have empirical works appeared that verify the existence of a relationship between the convergence of income and the convergence of human capital or education, which is different from using human capital or education as a conditioning factor explaining convergence. Wolff (2000) finds convergence of the levels of education between 24 OECD countries in the period 1950–1990 and confirms its impact on labor productivity convergence. Berry and Glaeser (2005) confirm the divergence of human capital between US cities and suggest that it is related to the decline of the income convergence process. This does not mean, however, that the convergence of human capital and income will proceed in the same way. Rattsø and Stokke (2014) find separate convergence of income and education for 89 Norwegian NUTS 4 regions and do not find a statistically significant relationship between these two processes: the education level increases in large cities with a limited increase of income in these regions, while the fastest income growth is observed in regions with a relatively low and stable education level. However, the dynamics of the convergence processes of both phenomena are not formally compared to each other, due to the lack of appropriate methods for measuring the similarity of convergence processes. This paper fills the existing methodological gap and proposes methods of testing the similarity of convergence processes based on a distribution analysis.

Educational reforms introduced in Poland in 1999 aimed at equalizing educational opportunities (the quality of human capital) for students with differing socio-economic status. As a result, over the last two decades, Polish pupils have made significant progress in their educational performance, which is confirmed by international PISA surveys. Simultaneously, Poland has achieved a significant improvement in income-related indicators at the national level. Previous research shows that the educational reform in Poland led to a reduction of the spatial differentiation of educational achievements of Polish pupils (Drucker and Horn 2016), while conclusions regarding regional income convergence in Poland are not unequivocal and depend on the analyzed regional level.

The purpose of this article is to verify whether there is a statistical relationship between the patterns of educational achievements convergence and income convergence processes in Poland on the regional (subregions, NUTS 3) and local (counties, LAU 1) level. The data on educational achievements are lagged because we compare the achievements of 15-year-old pupils with the income of the working population, proxied by personal and corporate tax revenues of the municipalities of their residence. The proposed methodology for comparing the convergence processes of two phenomena includes two alternative methods of whole-distribution analysis: transition matrices and their continuous equivalent, kernel estimators of conditional density functions.

This article is innovative in several ways. First, we introduce a novel approach to measuring the similarity of the convergence processes of two phenomena and call it parallel convergence. The proposed methodology has further applications: it can be used to compare the convergence of a single phenomenon in different periods, in different groups of regions, or on different regional levels, to mention just a few. Second, we address an important research question: whether the impressive progress in educational achievements and income observed for Poland at the country level is accompanied by similar convergence patterns of both phenomena on the regional and local level.

The paper is structured as follows. Section 2 briefly describes the theoretical background, while Sect. 3 discusses the data in detail. Section 4 introduces the concept of parallel convergence and the distribution-dynamics-based methodology for testing it. The convergence processes for income and educational achievements are documented using transition matrices in Sect. 5 and kernel density estimates in Sect. 6. The relationship between exam-results convergence and income convergence is examined in Sect. 7. Concluding remarks are summarized in the last section.

2 Theoretical background

There are empirical studies devoted to human capital convergence (e.g. Sab and Smith 2001; Cuaresma 2005), despite the absence of economic theories predicting the convergence or divergence of this phenomenon between countries and regions. Like other types of capital, human capital grows thanks to investment and learning. School education has the greatest impact on its development (Burgess 2016). In addition, cross-country comparisons suggest that what really matters to growth is not the level of human capital or educational attainment, but its quality, that is, educational achievements. Schoellman (2012), Delgado et al. (2014), and Hanushek (2016) show that educational achievements, measured by average test results or acquired competences, provide a more reliable measure of human capital than the number of years of schooling. The rate of economic growth is more strongly correlated with educational achievements than with the acquired level of education. Last but not least, mean years of schooling or tertiary education attainment, typically included in empirical growth regressions as a proxy for human capital, are difficult to apply at a local level. Availability of these data at the county level in Poland is limited to census years, which take place once in a decade, while comparable data on standardized exam scores of 15-year-old pupils are available on all regional and local levels on a year-to-year basis.

The research on human capital convergence at the regional level leads to ambiguous conclusions. Wheeler (2006) and Hammond and Thompson (2010) show tendencies toward the divergence of human capital between US cities and states, as measured by the share of highly qualified employees. In turn, Südekum (2008), analyzing the regions of West Germany in 1977–2002, confirms the convergence of human capital: regions with a large initial share of highly qualified employees had a higher total increase in employment, but a lower increase in the number of high-skilled jobs.

Decentralization of education is probably the most important process from the point of view of the territorial convergence/divergence of educational quality. Prior to 1990, the structure of the education system in Poland corresponded to the needs of a centrally planned economy, with a large number of specialized vocational schools subordinated to different sectoral ministries. Primary education, general secondary schools, and the universities were managed by the Ministry of Education. Starting from 1990, a gradual process of education decentralization began, in line with the practice in other EU/OECD countries seeking to strengthen democratic participation and local decision making. In 1999 two reforms were introduced in Poland: in administration and in schooling. As part of the first reform, a county level (LAU 1) of local government was established. It became the statutory duty of counties to run basic vocational schools, technical colleges as well as general and profiled high schools. Meanwhile, the education reform involved creating middle (lower secondary) schools as an intermediate level between primary and upper-secondary education, with the primary goal of equalizing the educational opportunities for students with differing socio-economic status. With 40% of the population living in rural areas, and a very fragmented school network, Poland struggled to provide primary education of acceptable quality in all localities. The reform expanded general compulsory education, as the duration of education based on the common core curriculum for the whole cohort increased from eight to nine years. Before 1999 compulsory education consisted of a one-track primary and secondary level school which lasted for 8 years, usually until the age of 15. This 8-year general school was replaced by a 6-year primary school and a 3-year lower secondary school. These could be followed by upper secondary tracks: an academic secondary track (3-year liceum), a secondary vocational track (4-year technikum) or a basic 3-year vocational track.

One of the most applauded outcomes of the educational reform was the notable improvement of Poland’s rank in the PISA test, and the reduction of inequality in cognitive skills as measured by PISA (Jakubowski et al. 2016). Sitek (2016), using PISA data, shows that the transition from the two-stage to the three-stage model has led to a reduction in the dispersion of exam results as a result of improving the performance of weaker students, as well as a reduction in differences between students with low and high socio-economic status.

Therefore, convergence of educational achievements on both the regional and the local level is expected. We also aim to study the causal effects of the reform, namely its impact on the process of regional and local convergence of income. Drucker and Horn (2016) provide some estimates of the effect of lower secondary schools on Poland’s labour market. Using a quasi-panel of observations to estimate the treatment effect by the difference-in-differences method, these authors find evidence that the introduction of lower secondary schools caused an increase in the probability of employment and in wages. Our aim is to verify this relationship and its dynamics at a more aggregate level.

The purpose of this article is to verify whether there is a statistical relationship between the patterns of regional and local convergence of educational achievements and income convergence processes in Poland. In particular, we aim to test whether regional convergence of income mimics (with a reasonable delay) the patterns of convergence of exam results.

3 Data

The empirical analysis in this paper is applied on the regional (72 NUTS 3 regions) and local (380 LAU 1) level and uses the data on income and educational achievements.

Educational achievements are measured by the result of the math-science part of the lower secondary school final exam, taken when pupils are fifteen years old. This exam is the last verification of educational achievements in the course of education in Poland that is standardized and fully comparable for a full cohort of students of a given age. We are aware that the educational achievement of fifteen-year-olds is not directly related to the regional and local income generation process but, as mentioned earlier, consistent and comparable data on the qualifications of the current labor force (even educational attainment, i.e. years of schooling, share of the labor force with higher education, or similar) are not available for Poland on NUTS 3 and LAU 1 levels on a year-to-year basis. Annual data on the educational attainment of the working-age population are available only down to the NUTS 2 level (Labour Force Survey), while detailed data down to municipalities are available only for the years of the national census (2002, 2011). We focus on the math-science part of the test as the achievements in humanities are considered less reliable and more prone to measurement errors.

Standardized and comparable data on the results of that exam are available for the years 2002–2015 at all regional levels (source: Polish Central Examination Board). The data on exam results are available down to the level of municipalities (LAU 2), but this level seems to be too disaggregated. Many workers commute across municipalities, so it is more appropriate to analyze income and education dynamics for more aggregated spatial units that form common labor markets.

In the assumptions and practice of the Polish system of external examinations, the main focus was on the comparability of exam results in a particular year. The consequence of this assumption is the inability to compare results over time. For this reason in the further part of the study the relative results of the exams were analyzed in relation to the national average in a particular year.

There is a time lag between the education process (especially at the secondary level) and the use of acquired knowledge and skills at work. In recent decades most students in Poland have continued their education at the tertiary level and entered the labour market at the age of 24–25. Therefore, in the empirical analysis the exam scores are lagged by 10 years, by which time we expect the main share of the cohort to have entered the labour market. Taking into account the availability of the data, the convergence of income in the period 2013–2018 is compared with the dynamics of exam results in 2003–2008. However, some young adults do not enter tertiary education, or combine studying and working. For this reason we alternatively consider lagging the exam scores by 6 years and analyze the dynamics of the distributions in the 3-year periods 2003–2006, 2006–2009, 2009–2012 for education against 2009–2012, 2012–2015, 2015–2018 for income respectively. Basic statistical measures for the relative exam results on local (LAU 1) and regional (NUTS 3) levels are presented in Table 1.

Table 1 Basic summary statistics for relative exam results in LAU 1 and NUTS 3 regions

One can clearly see the stability of the average exam result over time, while the distribution becomes more homogeneous—the extremes seem to converge to the mean. The convergence of exam results in the first part of the analyzed period is clearly visible in the values of the coefficient of variation, which decreases in the first few years and then stabilizes. The spatial distribution of relative exam results in 2003 and 2008 is presented in the appendix on Fig. 3 (LAU 1) and A2 (NUTS 3).

In order to measure the well-being consistently on both regional levels we proxy income by the share in revenues from personal income tax (PIT) and corporate income tax (CIT) received by municipalities (LAU 2)—source: Local Data Bank, Polish CSO. Income of the population accounts for over 60% of the GDP. PIT is related to wages and salaries, which largely reflect the costs of employment. At the same time, the amount of wages and salaries, and thus also personal taxes, depends on the gross value added generated in a region, both in industry and services. Therefore, tax revenues are used as an indicator of the relative level of economic activity and applied to divide the regional GDP to the local level (Panek et al. 2014, BEA 2019). The use of corporate taxes allows us to take into account also the value added generated by business entities, including the self-employed.

Each municipality obtains approx. 40% of the revenues from PIT and CIT paid by taxpayers residing in this municipality. These data are currently available for the period 2009–2019. The tax system and the tax level in Poland are the same in all regions. Polish residents are taxed on their total income according to a progressive tax scale with one threshold and two tax rates (17% and 32%). In the years 2009–2019 the share of taxpayers whose personal income exceeded the threshold was only 4–5%, and there are no data on how this share differs across regions. We aggregated these data to LAU 1 and NUTS 3 units. To account for differences in employment rates, they were divided by the size of the working population. As in the case of exam results, the income data for each region were divided by the national average in a particular year (100 = the national average).

Alternatively we considered using average remunerations reported by the Polish CSO down to LAU 1 regions for the period 2002–2019, but these data have important limitations. Average remunerations reported by Polish CSO do not take into account business entities employing up to 9 people. It is problematic, especially in growing regions, where the number of small firms is likely to increase and in that case, income growth is not picked up by growth of average remuneration. One could also consider regional GDP as an alternative measure of income, but it is not measured on the local level (LAU 1).

Basic statistical measures for the relative income on local (LAU 1) and regional (NUTS 3) levels are presented in Table 2.

Table 2 Basic summary statistics for relative income in LAU 1 and NUTS 3 regions

One can see that the average income consistently increases over the whole period, while the maximum is stable (LAU 1) or becomes lower (NUTS 3). For both regional levels the coefficient of variation decreases over time, indicating gradual reduction in cross-regional income inequalities. The spatial distribution of relative income in 2013 and 2018 is presented in the appendix on Fig. 5 (LAU 1) and A4 (NUTS 3).

4 Parallel convergence within the distribution dynamics framework

The proposed methodology for comparing the convergence processes of two phenomena includes two alternative methods of the whole distribution analysis—transition matrices and their continuous equivalent—kernel estimators of conditional density functions. Both methods allow the identification of mobility of analyzed regions within the distribution—in contrast to simpler methods of measuring convergence—beta convergence which concentrates on the relationship between the average growth rate of the analyzed phenomenon and its initial value, or sigma convergence which monitors a selected measure of dispersion and its changes over time (compare the comments to the values of the coefficient of variation in the Data section).

In addition, they also allow for determination of long-term (ergodic) distributions. Parallel convergence will be observed if transition matrices estimated separately for each of the phenomena are identical. One can also test the equality of ergodic distributions resulting from transition matrices or kernel density estimates. Comparing ergodic distributions requires normalization of the initial distributions to assure an identical starting point.
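For contrast with the distribution-based methods, the sigma-convergence measure mentioned above (the coefficient of variation of relative values) can be sketched in a few lines of Python. This is only an illustration: it assumes that the national average is approximated by the unweighted mean across regions, which the text does not specify, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def relative_to_average(values):
    """Express each region's value relative to the average (100 = average).

    Assumption: the average is the unweighted mean across regions.
    """
    v = np.asarray(values, dtype=float)
    return 100.0 * v / v.mean()

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    v = np.asarray(values, dtype=float)
    return v.std(ddof=1) / v.mean()

# Toy data (not from the paper): three regions observed in two years.
toy = {2003: [80.0, 100.0, 120.0], 2008: [90.0, 100.0, 110.0]}
cv_by_year = {
    year: coefficient_of_variation(relative_to_average(v))
    for year, v in toy.items()
}
```

A falling coefficient of variation signals sigma convergence, but it says nothing about which regions move within the distribution, which is the information that transition matrices and conditional densities add.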

I suggest that the term “parallel convergence” be used for the situation when identical (indistinguishable in the statistical sense) dynamics of economic convergence processes is observed for the two studied phenomena in the analyzed period. The work of Rattsø and Stokke (2014) was among the inspirations for the concept of parallel convergence. These authors use a simplified approach to verify the similarity of income and education dynamics (Rattsø and Stokke 2014) for 89 Norwegian NUTS 4 regions. The authors do not analyze the similarity of convergence patterns for two phenomena, but instead test the similarity of convergence patterns of income in two subgroups of regions.

Following Magrini (2009), I suggest that the analysis of the whole distribution dynamics is the most appropriate method to verify (parallel) convergence. First, a discrete approach is applied, based on transition matrices introduced in convergence research by Quah (1997). The initial distribution is divided into several intervals (groups—usually equal-sized in the initial period). Based on that division the transition matrix (M) is estimated. This shows how the whole distribution (d) evolves over time:

$$ {\text{d}}_{{\text{t}}} = {\text{ M }} \times {\text{ d}}_{{{\text{t}} - {1}}} $$
(1)

The elements of a transition matrix M are the probabilities of transition between different groups. From them one can infer the percentage of countries or regions which, being initially in a particular class i, stay in it or move to other groups (j):

$$ p_{{{\text{ij}}}} = \, P\left( {X_{{{\text{t2}}}} = \, j \, | \, X_{{{\text{t1}}}} = \, i} \right) $$
(2)

The transition matrix allows estimation of the ergodic vector, which describes the long-run evolution of the income distribution but should not be treated as a long-run forecast for the distribution. The ergodic vector should rather be interpreted as a synthetic indicator of tendencies in the analyzed period. Convergence takes place if the probabilities in the ergodic vector move towards the group including the average (100% for relative data). High probabilities on the diagonal of the transition matrix indicate a strong persistence of the distribution.
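The discrete approach described above can be illustrated with a minimal Python sketch. It is not the code used in the paper: the data are synthetic, the function names are hypothetical, and class boundaries are taken as quantiles of the initial distribution. The matrix `P` is row-stochastic, with `P[i, j]` estimating the probability in Eq. (2); under this convention the matrix M acting on a column vector in Eq. (1) corresponds to the transpose of `P`.

```python
import numpy as np

def transition_matrix(x_start, x_end, n_classes=5):
    """Estimate a row-stochastic transition matrix between quantile classes.

    Class boundaries are quantiles of the initial distribution, so each
    class holds roughly the same number of regions in the initial period.
    Assumes every class contains at least one region initially.
    """
    x_start = np.asarray(x_start, dtype=float)
    x_end = np.asarray(x_end, dtype=float)
    probs = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]
    edges = np.quantile(x_start, probs)               # interior boundaries
    i = np.searchsorted(edges, x_start, side="right")  # initial class
    j = np.searchsorted(edges, x_end, side="right")    # final class
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (i, j), 1)
    return counts / counts.sum(axis=1, keepdims=True), counts

def ergodic_vector(P):
    """Stationary distribution pi satisfying pi = pi P.

    Computed as the left eigenvector of P for the eigenvalue closest to 1.
    """
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

# Synthetic relative values for 380 units (illustrative only).
rng = np.random.default_rng(1)
start = rng.normal(100.0, 10.0, 380)
end = 100.0 + 0.5 * (start - 100.0) + rng.normal(0.0, 5.0, 380)
P, counts = transition_matrix(start, end)
pi = ergodic_vector(P)
```

Comparing `pi` with the uniform initial shares (0.2 per quintile class) shows whether long-run mass moves toward the classes around the average.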

Transition matrices have an important limitation regarding the arbitrary selection of boundaries of intervals that define individual groups. This limitation is overcome by the alternative continuous approach, i.e. the full conditional density function. This can be treated as the equivalent of the transition matrix with an infinite number of rows and columns. Let us indicate the initial distribution of the analyzed variable by x, and the final distribution of the same variable by y. Then the conditional distribution of y for the known x can be defined using the following formula:

$$ f\left( {y{|}x} \right) = \frac{{f\left( {x,y} \right)}}{{f_{{\text{x}}} \left( x \right)}} $$
(3)

where $f_x(x)$ is the marginal distribution of the variable in the initial period, while $f(x, y)$ is the joint distribution of x and y. To estimate the conditional density function, one needs to replace the numerator and denominator of the above expression with nonparametric estimators. The marginal distribution of the variable in the initial period is estimated using a two-step adaptive kernel density estimation for one-dimensional distributions:

$$ \hat{f}_{{{\text{xA}}}} \left( x \right) = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \frac{1}{{h_{{\text{x}}} w_{{\text{i}}} }}K\left( {\frac{{x - x_{{\text{i}}} }}{{h_{{\text{x}}} w_{{\text{i}}} }}} \right) $$
(4)

where n is the number of observations, $h_x$ is the optimal estimation bandwidth for the initial distribution of the variable, and $K(\cdot)$ is the kernel function. In the first stage of the adaptive method the weights $w_i$ assume the value of 1 for all observations. The joint distribution of the variable in the initial and final period, i.e. the numerator of Eq. (3), is estimated using the formula:

$$ \hat{f}\left( {x,y} \right) = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \frac{1}{{h_{{\text{y}}} h_{{\text{x}}} w_{{\text{i}}}^{{2}} }}K\left( {\frac{{y - y_{{\text{i}}} }}{{h_{{\text{y}}} w_{{\text{i}}} }}} \right)K\left( {\frac{{x - x_{{\text{i}}} }}{{h_{{\text{x}}} w_{{\text{i}}} }}} \right) $$
(5)

where $h_y$ is the optimal bandwidth for the distribution of the variable in the final period. The joint density function is also initially estimated without differentiating weights for individual observations. Then, the initial estimate of the joint density is used to calculate the weights that differentiate the bandwidth locally, according to the expression:

$$ w_{{\text{i}}} = \sqrt {\frac{{\tilde{f}_{{\text{g}}} }}{{\hat{f}_{{\text{K}}} \left( {x_{{\text{i}}} , y_{{\text{i}}} } \right)}}} $$
(6)

where the denominator of the formula under the root is the estimate of the joint density function for the observation i calculated using the fixed bandwidth on the basis of Eq. (5), and the numerator is the geometric mean of this estimator calculated for individual data points. The estimation of the final conditional density function is made using the weights from Eq. (6) in Eqs. (4) and (5) and finally calculating the ratio of both formulas according to Eq. (3). On the basis of the kernel density estimator one can also determine the ergodic distribution using the method described by Johnson (2005).
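The two-stage procedure in Eqs. (3)–(6) can be sketched as follows. This is a hedged illustration rather than the implementation behind the reported results: it assumes a Gaussian kernel and a rule-of-thumb bandwidth in place of the unspecified "optimal" bandwidths, all function names are hypothetical, and the data are synthetic.

```python
import numpy as np

def gauss(u):
    """Gaussian kernel K(u) (assumed; the text does not name the kernel)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def rule_of_thumb_bandwidth(v):
    """Rule-of-thumb bandwidth, a stand-in for the 'optimal' one."""
    return 1.06 * v.std(ddof=1) * v.size ** (-0.2)

def marginal_density(x_eval, x, hx, w):
    """Eq. (4): adaptive kernel estimate of the initial-period density."""
    u = (x_eval[:, None] - x[None, :]) / (hx * w)
    return np.mean(gauss(u) / (hx * w), axis=1)

def joint_density(x_eval, y_eval, x, y, hx, hy, w):
    """Eq. (5): adaptive kernel estimate of the joint density."""
    ux = (x_eval[:, None] - x[None, :]) / (hx * w)
    uy = (y_eval[:, None] - y[None, :]) / (hy * w)
    return np.mean(gauss(ux) * gauss(uy) / (hx * hy * w ** 2), axis=1)

def conditional_density(x, y, x_grid, y_grid):
    """Eq. (3): f(y|x) on a grid, rows indexed by x_grid, columns by y_grid."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    y_grid = np.asarray(y_grid, dtype=float)
    hx, hy = rule_of_thumb_bandwidth(x), rule_of_thumb_bandwidth(y)
    # Stage 1: pilot joint density at the data points with all weights = 1.
    pilot = joint_density(x, y, x, y, hx, hy, np.ones_like(x))
    # Eq. (6): weights from the geometric mean of the pilot estimate.
    geo_mean = np.exp(np.mean(np.log(pilot)))
    w = np.sqrt(geo_mean / pilot)
    # Stage 2: re-estimate numerator and denominator with local bandwidths.
    gx, gy = np.meshgrid(x_grid, y_grid, indexing="ij")
    joint = joint_density(gx.ravel(), gy.ravel(), x, y, hx, hy, w)
    marg = marginal_density(x_grid, x, hx, w)
    return joint.reshape(gx.shape) / marg[:, None]

# Synthetic example: final values regress toward the mean of 100.
rng = np.random.default_rng(0)
x0 = rng.normal(100.0, 10.0, 200)
y0 = 100.0 + 0.5 * (x0 - 100.0) + rng.normal(0.0, 5.0, 200)
x_grid = np.linspace(85.0, 115.0, 7)
y_grid = np.linspace(0.0, 200.0, 1001)
cond = conditional_density(x0, y0, x_grid, y_grid)
```

Because the same weights enter Eqs. (4) and (5), each row of `cond` integrates to one over y, so every row can be read as the distribution of final values for regions starting at the corresponding x.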

In the case of transition matrices, if the initial distribution of each of the two variables is divided into the same number of quantile groups, then the initial distributions can be assumed to be identical. Therefore, with the same starting point, a comparison of the transition matrices estimated separately for each phenomenon, as well as a comparison of the resulting ergodic vectors, makes it possible to assess the similarity of convergence patterns. To verify the equality of two transition matrices the Pearson $\chi^2$ goodness-of-fit test might be applied; see, e.g., Cochran (1952) and Anderson and Goodman (1957). The use of this test to verify the null hypothesis that the estimated (empirical) transition matrix is equal to an exogenously defined matrix is discussed e.g. by Bickenbach and Bode (2001). The test statistic has the form:

$$ Q^{*} = \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{{j \in F_{{\text{i}}} }} n_{i} \times \frac{{\left( {p_{{{\text{ij}}}} - p_{{{\text{ij}}}}^{0} } \right)^{2} }}{{p_{{{\text{ij}}}}^{0} }} \sim {\text{asy }}\chi^{2} \left( {\mathop \sum \limits_{i = 1}^{N} (f_{{\text{i}}} - 1)} \right) $$
(7)

where N is the number of groups (rows of the matrix), $n_i$ is the number of observations in the i-th row of the empirical transition matrix, $p_{ij}$ are the probabilities from the estimated (empirical) transition matrix, $p_{ij}^{0}$ are the probabilities from the exogenous transition matrix, $F_i = \{j: p_{ij}^{0} > 0\}$ is the set of column indices of the exogenous transition matrix for which the elements of row i are greater than zero, and $f_i$ is the size of the set $F_i$. The test is therefore carried out only for those transition probabilities that are non-zero in the exogenously specified matrix. In the case of testing the equality of two empirical transition matrices, I propose that the test should be applied twice, each time taking one of the compared matrices as exogenous.
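The statistic in Eq. (7) and its two-directional use can be written out as a short Python sketch. It uses the standard Pearson form with a squared deviation in the numerator; the function names are hypothetical, the inputs are matrices of transition counts, and every row is assumed to contain at least one observation.

```python
import numpy as np

def q_statistic(counts, P0):
    """Pearson-type statistic comparing an empirical matrix with P0.

    counts : matrix of observed transition counts (rows = initial class)
    P0     : reference (exogenous) transition matrix
    Returns the statistic and its degrees of freedom, sum_i (f_i - 1).
    """
    counts = np.asarray(counts, dtype=float)
    P0 = np.asarray(P0, dtype=float)
    n_i = counts.sum(axis=1)
    P_hat = counts / n_i[:, None]
    mask = P0 > 0                      # only cells in the sets F_i
    terms = np.zeros_like(P0)
    terms[mask] = (P_hat[mask] - P0[mask]) ** 2 / P0[mask]
    q = float((n_i[:, None] * terms).sum())
    dof = int((mask.sum(axis=1) - 1).sum())
    return q, dof

def compare_two_matrices(counts_a, counts_b):
    """Apply the test twice, each time treating one matrix as exogenous."""
    counts_a = np.asarray(counts_a, dtype=float)
    counts_b = np.asarray(counts_b, dtype=float)
    P_a = counts_a / counts_a.sum(axis=1, keepdims=True)
    P_b = counts_b / counts_b.sum(axis=1, keepdims=True)
    return q_statistic(counts_a, P_b), q_statistic(counts_b, P_a)

# Toy 2x2 example (not from the paper).
q, dof = q_statistic([[8, 2], [3, 7]], [[0.5, 0.5], [0.5, 0.5]])
```

A p-value follows from the upper tail of the $\chi^2$ distribution with `dof` degrees of freedom, for instance via `scipy.stats.chi2.sf(q, dof)`. Cells with zero probability in the reference matrix are skipped, so the two directions of the comparison can differ in both the statistic and the degrees of freedom.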

The comparison of convergence patterns might be complemented by a test of the equality of ergodic vectors. Parallel convergence occurs if, for the same initial distribution of both phenomena, the obtained ergodic distributions are also identical. In this case, a standard $\chi^2$ test of the compatibility of ergodic distributions (proportions of observations) in two subgroups can be used. Because the sample (number of territorial units) might be small in this case, I propose using the tests discussed by Wilcox et al. (2013). The authors describe two new methods for comparing discrete distributions for a small sample. The first of the proposed approaches (referred to as ‘method M’) is an extension of the Storer and Kim (1990) method for comparing two independent binomial proportions. The second method (called ‘method B’) is a multiple-comparison procedure for all corresponding pairs of probabilities from both measurements. It uses the Storer and Kim (1990) method in combination with the Hochberg (1988) approach, which allows for controlling the probability of one or more type I errors.

For conditional density functions I suggest verifying parallel convergence with a formal test of the equality of ergodic distributions. In order to compare the ergodic distributions for two different phenomena, they should be estimated in such a way that they correspond to identical initial distributions. Therefore, the first step of the proposed test is the estimation of ergodic distributions for comparable (standardized) initial distributions. Obviously, standardization does not necessarily lead to a statistical equality of the initial distributions of both phenomena, and testing the equality of ergodic distributions in order to verify parallel convergence only makes sense if the initial distributions are identical. Therefore, I suggest that the equality of the standardized initial distributions be tested first. Only when the equality of the initial distributions is confirmed should an appropriate test of the equality of ergodic distributions be performed. Not rejecting the null hypothesis in this test is interpreted as parallel convergence. We use the standard Kolmogorov–Smirnov and Anderson–Darling tests (Engmann and Cousineau 2011) for comparing distributions. Even if the test concludes that the convergence patterns are similar, it is not necessarily the same regions that catch up in terms of income and in terms of education.
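The first step of this procedure, testing the equality of the standardized initial distributions, might look like the sketch below. It assumes that standardization means subtracting the mean and dividing by the standard deviation (the text does not define it), uses `scipy.stats.ks_2samp` for the Kolmogorov–Smirnov test only, and works on synthetic samples; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

def standardize(values):
    """Z-score standardization (an assumed reading of 'standardized')."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

def initial_distributions_equal(a, b, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test on standardized samples.

    Returns (not_rejected, statistic, p_value); not_rejected is True when
    the null hypothesis of equal distributions is not rejected at alpha.
    """
    result = ks_2samp(standardize(a), standardize(b))
    return bool(result.pvalue >= alpha), result.statistic, result.pvalue

# Synthetic samples: a bell-shaped one and a strongly skewed one.
rng = np.random.default_rng(0)
symmetric = rng.normal(100.0, 10.0, 2000)
skewed = rng.exponential(1.0, 2000)
ok, stat, p = initial_distributions_equal(symmetric, skewed)
```

Only when this first test does not reject equality would the same kind of comparison be applied to the ergodic distributions; `scipy.stats.anderson_ksamp` offers a k-sample Anderson–Darling test that could be used alongside it.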

In the empirical part of the article, the distribution dynamics for each of the phenomena were first examined separately for the direct transition: between 2003 and 2008 for exam results and between 2013 and 2018 for relative income. Next, the tests of the similarity of convergence processes for income and educational achievements were applied using the above-mentioned methods.

5 Convergence measured within transition matrices

Table 3 and Table 4 show transition matrices for the mobility in the distribution of relative exam results in LAU 1 and NUTS 3 regions respectively for the direct transition between 2003 and 2008. In each case the range of relative exam result has been divided into five groups with boundaries determined on the basis of the quintiles of the initial distribution (boundaries reported in column headers). Probabilities on the diagonal of the transition matrices show the stability of the distribution—the probability of staying in the same group. It did not differ much between groups, apart from the first group of the lowest exam scores. LAU 1 regions with the best exam results maintained a very good result with a probability of 35.5% (33.3% for NUTS 3 regions), while the ones with the lowest results with a probability of 23.7% (26.7% for NUTS 3). The probabilities in the ergodic distribution are similar for both analyzed territorial levels and suggest that in the long run, middle groups 2 and 3 would slightly increase their size, while the groups with the highest and the lowest exam result would become smaller. More regions would have exam results closer to the average, which indicates tendencies toward the convergence of educational achievements. Mobility between groups is relatively strong—almost all cells in the transition matrices contain non-zero probabilities.

Table 3 Transition matrix for relative exam results in LAU 1 regions (2003–2008, direct transition)
Table 4 Transition matrix for relative exam results in NUTS 3 regions (2003–2008, direct transition)

Tables 5 and 6 present the results of the analogously estimated transition matrices with five classes based on the quintiles of the initial distribution for relative income in LAU 1 and NUTS 3 regions respectively. Mobility between groups is much weaker than in the case of relative exam results: many more cells in the transition matrices contain zero probabilities.

Table 5 Transition matrix for relative income in LAU 1 regions (2013–2018, direct transition)
Table 6 Transition matrix for relative income in NUTS 3 regions (2013–2018, direct transition)

Relatively high probabilities on the diagonal of the transition matrices show the stability of the distribution, but mainly in the extreme income groups: the probability of staying in the same group is highest for the richest regions (86.8% for LAU 1 and 73.3% for NUTS 3) and the poorest regions (61.8% for LAU 1 and 66.7% for NUTS 3). The results also indicate a slightly higher probability of transition to upper (richer) classes for the three lowest groups. Together with the relative stability of the richest group, this gives for NUTS 3 regions an ergodic distribution with a strong concentration of long-term probabilities in group 4 (52.3%). As this group includes regions with income close to the average (100%), these results indicate the occurrence of income convergence at the regional level. At the local (LAU 1) level one can observe a similarly strong concentration of ergodic probability in the richest group of regions (55.6%), which confirms the tendency of relative enrichment of poorer regions and therefore convergence.

Therefore, the analysis of transition matrices confirms that convergence of educational achievements in 2003–2008 is followed by income convergence in 2013–2018 at both considered regional levels. However, the question remains whether these processes are interrelated.

6 Convergence within kernel density estimates

Estimated conditional density functions for direct transitions for relative income and relative exam results are shown in Figs. 1 and 2 (for LAU 1 and NUTS 3 regions respectively). The density functions for exam results are clearly more parallel to the horizontal axis, representing the initial result of the exam, especially for LAU 1 regions. This means that the distribution of educational achievements in the final period (vertical axis) was characterized by less variation than in the initial period, which indicates convergence. However, in the case of the density functions for income, a different pattern is visible: a much stronger concentration of the density along the diagonal, i.e. higher persistence of the income distribution. Weak tendencies for convergence are nevertheless visible here too, as the extreme ends of the density function for income are more parallel to the horizontal axis. The highest incomes in the final period are less extreme than in the initial period (the right part of the density function falls below the diagonal). Similarly, the lowest incomes in the final period are also less extreme than in the initial period (the leftmost part of the density function climbs above the diagonal).
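To make the reading of such plots concrete, the following sketch estimates a conditional density of the final value given the initial value on a grid. It is only an illustration under stated assumptions: it uses a Gaussian kernel with SciPy's default bandwidth, which is not necessarily the estimator or bandwidth used for Figs. 1 and 2, and the function name `conditional_density` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gaussian_kde

def conditional_density(initial, final, x_grid, y_grid):
    """Estimate f(final = y | initial = x); rows index x_grid, columns y_grid."""
    joint = gaussian_kde(np.vstack([initial, final]))   # bivariate KDE
    xx, yy = np.meshgrid(x_grid, y_grid, indexing="ij")
    f_xy = joint(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    # Divide each row by its integral over y so that it is a proper density.
    area = trapezoid(f_xy, y_grid, axis=1)
    return f_xy / area[:, None]

# Synthetic illustration (hypothetical data): partial regression to the mean.
rng = np.random.default_rng(1)
initial = rng.normal(100, 15, size=300)
final = 100 + 0.5 * (initial - 100) + rng.normal(0, 5, size=300)

x_grid = np.linspace(70, 130, 25)
y_grid = np.linspace(60, 140, 81)
dens = conditional_density(initial, final, x_grid, y_grid)

# Ridge of the conditional density: most likely final value for each initial value.
ridge = y_grid[np.argmax(dens, axis=1)]
print(np.round(ridge, 1))
```

A ridge that stays close to the diagonal (`ridge` roughly equal to `x_grid`) signals persistence, while a ridge that is flatter than the diagonal, approaching a line parallel to the horizontal axis, signals convergence.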

Fig. 1 Kernel density estimates based on a direct transition of relative income and relative exam results in LAU 1 regions

Fig. 2 Kernel density estimates based on a direct transition of relative income and relative exam results in NUTS 3 regions

Therefore, the analysis of kernel density functions confirms the conclusions drawn from the transition matrices. Convergence of educational achievements is followed 10 years later by income convergence at both considered regional levels. In addition, we performed similar analyses (transition matrices and kernel density estimates) for a 6-year lag between exams and income, based on 3-yearly transitions of the distribution of exam results in 2003–2006, 2006–2009, 2009–2012 and the distribution of income in 2009–2012, 2012–2015, 2015–2018 respectively. The conclusions did not change. The details of 3-yearly transitions with a 6-year lag are not presented here, but are considered in the next section, when formal tests of similarity in convergence patterns are applied.

7 Testing parallel convergence between income and exam results

The initial overview of transition matrices and the resulting ergodic vectors suggests that identical dynamics of distribution for income and exam results is not observed. The test of equality of the transition matrices is applied twice, each time treating one of the matrices as exogenous. The results are presented in the first part of Table 7. In addition, the methodology of parallel convergence was used to verify whether the patterns of convergence of a particular phenomenon (income or exam results) are identical at both regional levels (results in the last four rows of Table 7).
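The idea of testing one estimated matrix against another that is treated as exogenous can be illustrated with a Pearson-type chi-square statistic, which is often used for this purpose. The exact statistic behind Table 7 is not restated in this section, so the form below is an assumption, and the function name `chi2_transition_test` and the example counts are hypothetical. For each row i, observed transition counts are compared with the counts expected under the reference probabilities.

```python
import numpy as np
from scipy.stats import chi2

def chi2_transition_test(counts, P_ref):
    """Test H0: the chain that generated `counts` has transition matrix P_ref.

    counts[i, j] is the number of observed moves from class i to class j;
    P_ref is treated as known (exogenous). Returns (statistic, df, p-value).
    """
    counts = np.asarray(counts, dtype=float)
    P_ref = np.asarray(P_ref, dtype=float)
    positive = P_ref > 0
    # A move that the reference matrix rules out contradicts H0 outright.
    if np.any(counts[~positive] > 0):
        return np.inf, 0, 0.0
    expected = counts.sum(axis=1, keepdims=True) * P_ref
    stat = np.sum((counts[positive] - expected[positive]) ** 2 / expected[positive])
    # Each row contributes (number of positive reference cells - 1) degrees of freedom.
    df = int(np.sum(positive.sum(axis=1) - 1))
    return stat, df, chi2.sf(stat, df)

# Hypothetical two-class example, applied in both directions.
counts_a = np.array([[30, 10], [10, 30]])
counts_b = np.array([[20, 20], [20, 20]])
P_a = counts_a / counts_a.sum(axis=1, keepdims=True)
P_b = counts_b / counts_b.sum(axis=1, keepdims=True)

print(chi2_transition_test(counts_a, P_b))  # matrix B treated as exogenous
print(chi2_transition_test(counts_b, P_a))  # matrix A treated as exogenous
```

Running the test in both directions mirrors the procedure in the text: because the reference matrix is treated as fixed, the two directions need not give the same statistic.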

Table 7 Results of tests for the equality of transition matrices for convergence of income and exam results based on LAU 1 and NUTS 3 regions

Parallel convergence for income and exam results is rejected for both territorial levels. However, considering exam results and income separately, one cannot reject the hypothesis that convergence patterns are the same for LAU 1 as for NUTS 3 regions. In addition, the equality of ergodic vectors was also verified. The results of the joint test for all ergodic probabilities (method B, Wilcox et al. 2013) are presented in Table 8.

Table 8 Results of tests for the equality of ergodic vectors for income and exam results (method B)

The equality of ergodic vectors is rejected for the parallel convergence between income and educational achievements. However, again one cannot reject that convergence patterns for exam results were identical in LAU 1 and NUTS 3 regions. The results of the tests of equality of individual ergodic probabilities (method M, Wilcox et al. 2013) are presented in Table 9.

Table 9 Results of tests for the equality of individual ergodic probabilities for income and exam results (method M)

In the case of testing for the parallel convergence of income and exam results, the equality of ergodic probabilities was rejected for most groups. Therefore, using tests for transition matrices, one can conclude that the convergence patterns of income and exam results were not identical, i.e. their convergence was not parallel. However, when convergence patterns for relative exam results in NUTS 3 and LAU 1 regions are compared, one cannot reject that each element in the ergodic vector is identical for both regional levels (Table 10).

Table 10 Tests of equality of initial and final standardized distributions for income and exam results based on LAU 1 and NUTS 3 regions

Similarities of convergence patterns were also verified by means of formal tests of equality of density functions. Initial distributions were standardized to make them comparable in terms of the mean and standard deviation, and the ergodic distributions resulting from the mobility analysis were standardized accordingly. A graphical comparison of empirical cumulative distribution functions (ECDF) for initial and ergodic distributions is presented in the appendix in Figs. 9 and 10 for LAU 1 and NUTS 3 regions respectively. Initial distributions are quite similar, except for a small deviation in the left tail. Their equality is confirmed for NUTS 3 regions and for LAU 1 regions only by the KS test at the 1% significance level. In contrast, ergodic ECDFs are clearly different: the distribution for income is shifted to the right as compared with the distribution for exam results. The formal tests of their equality indicate that they are not identical.
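The comparison of standardized distributions can be sketched with a two-sample Kolmogorov-Smirnov (KS) test. This is a simplified illustration that works on raw samples; how the ergodic distributions were turned into ECDFs is not described in this section, so that step is not reproduced. The helper names `standardize` and `compare_shapes` and the example data are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def standardize(x):
    """Rescale a sample to zero mean and unit (sample) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def compare_shapes(a, b):
    """Two-sample KS test on standardized samples: compares shape only."""
    result = ks_2samp(standardize(a), standardize(b))
    return result.statistic, result.pvalue

# Hypothetical samples: one symmetric, one strongly right-skewed.
symmetric = np.linspace(0.0, 1.0, 200)
skewed = np.exp(np.linspace(0.0, 5.0, 200))

print(compare_shapes(symmetric, 3 * symmetric + 10))  # same shape after rescaling
print(compare_shapes(symmetric, skewed))              # different shapes
```

Because location and scale are estimated from the same data before testing, the reported p-values are best read as approximate.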

Therefore, there was no parallel convergence of income and educational achievements in Poland on the regional and local level. The process of convergence of income did not mimic the patterns of convergence of educational achievements, neither with a 10-year delay nor with a 6-year delay. Weak tendencies for convergence were observed for each phenomenon separately, but they were not driven by the same regions: convergence patterns were different. Different regions catch up with the average educational level and with the average income level.

8 Conclusions

The hypothesis assuming identical convergence patterns for income and educational achievements on the regional and local level in Poland has not been confirmed. Despite the occurrence of convergence for both phenomena, it has a different course for each of them. The patterns of convergence of income do not mimic the convergence process for exam results. The underlying pattern is that the level of educational achievements increases in northern and western Poland, while central, eastern and south-eastern regions are lagging behind, excluding cities (see maps A5 and A6 in the Appendix). This can be at least partly attributed to the long-lasting regional differences in Poland. In the 19th century Poland was divided among its three neighboring powers: Russia, Prussia, and Austria. Up until the reunification of Poland in 1918 the three regions were exposed to very different political and administrative cultures and experienced very different rates of economic growth, with territories under Russian rule being generally less advanced economically and lagging in terms of the development of modern social and political structures. Empirical research shows that territorial differences in many socio-economic phenomena, including education, in today’s Poland clearly reflect the historical partitions (Bukowski 2019). Differences between the historical districts have persisted until the present mostly through the transmission of norms and beliefs. This seems to be the case here as well, as the regions where the (relative) level of educational achievements decreases coincide almost perfectly with the former Russian partition.

In turn, (large) cities face a decrease of relative income, which contrasts with the concentration of human capital in cities. These results are in line with the conclusions of Rattsø and Stokke (2014) obtained for Norwegian regions. There is a long lag between the accumulation of human capital (education) and its use in the labor market. In addition, migration for work or studies, especially of more motivated individuals, means that human capital is ultimately used in regions other than those where it was generated. Among the economic policy implications of this study, one should mention the recommendation that public institutions in Poland regularly monitor and collect comparable data on the quantity and quality of human capital in regional and local terms. Moreover, in order to keep high-quality human capital in the regions where it was generated, public policy should create incentives for people entering the labor market to set up a business, and create regulations conducive to remote work on a larger scale, not only in exceptional times such as the pandemic.