1 Introduction

Our understanding of the mechanisms and patterns of international migration over time is impeded both by the lack of data and by inconsistencies in the measurement and collection of the data that are available. In fact, it is well known that the patterns of migration vary significantly depending on which country is reporting the data (Kupiszewska and Nowok 2008; Nowok et al. 2006; Zlotnik 1987). Considering that international migration is the main factor contributing to population growth in Europe, this is very unfortunate. In response to the problem of inconsistent migration data, we have developed a methodology for harmonising the data available to us from countries in Europe. More specifically, we make use of doubly counted information obtained from migrant sending and migrant receiving countries to estimate adjustment factors necessary for producing a consistent set of migration flows. These estimated flows are benchmarked to a particular definition.

Harmonisation of migration data is required for the development of policies on immigration (Kraier et al. 2006). Differences in both the concepts and techniques used to measure migration make any international comparison of migration difficult. There has been a lot of work on data issues and migration definitions, for example, see Champion (1994), Kelly (1987), Kraly and Gnanasekaran (1987), Poulain (1993), Poulain et al. (2006), Raymer and Willekens (2008), United Nations (2002) and Willekens (1994, 1999). Several international institutes such as the International Labour Organization, the Organisation for Economic Co-operation and Development, the United Nations and the European Commission have all invested heavily in the harmonisation of international migration data, but without much success or progression (Bilsborrow et al. 1997; Herm 2006a; Fassmann 2009). In fact, the situation today in terms of migration definitions and measurement is not much better than it was, say, 20 years ago.

Recently, some renewed efforts have been made to improve the migration data situation in Europe. In 2007, the European Parliament adopted a new regulation on migration statistics. This regulation provides clear definitions of immigration and emigration (Official Journal of the European Union 2007), and lists the migration data that must be supplied to Eurostat, the statistical office of the European Union (EU), by Member States. However, this regulation leaves the Member States free to decide how they will provide these data, including the use of estimation methods (Fassmann 2009). The methodology presented in this paper should help national statistical offices to improve and harmonise the data they currently provide to international organisations, such as Eurostat.

The migration definition set out in the 2007 Regulation corresponds to the definition recommended by the United Nations (1998), where an international migrant is defined as ‘a person who moves to a country other than that of his or her usual residence for a period of at least a year’. One problem affecting the implementation of this definition is that some countries are unable to identify their nationals who have left (Fassmann 2009). Furthermore, many European countries exclude the immigration of nationals from the published statistics, as they are not considered to be ‘migrants’. Another important obstacle has to do with the recommended duration of residence in the country of destination. It may take up to 2 years to identify all persons who have stayed at least 1 year, as they may arrive anytime during the annual time period of interest. This means that the publication of migration statistics based on the actual duration of stay may be delayed for some time. To provide statistics to the user community in a quicker fashion, many countries simply count those migrants who have stayed for at least 3 months, which leads to higher numbers than if the 1-year criterion was applied. Other countries use the intended duration of stay as the criterion (Fassmann 2009).

Many European countries do not have reliable statistics on emigration. This is mainly caused by the fact that migrants have little incentive to report their move to the administration of the country they have emigrated from. Moreover, it is difficult to count persons leaving the country because they are no longer present in the country collecting the data. In this situation, comparisons of sending country data with receiving country data provide important information on the degree of underestimation found in reported emigration flows (UNECE 2009). In fact, the analysis of the so-called ‘double-entry matrix’ of migration flows produced by UNECE since the early 1970s, and more recently by Eurostat, has been found to be very useful and informative. Kelly (1987) and Poulain (1999), for example, have used the information contained in this matrix to assess the degree of harmonisation amongst reported data. In doing so, the possibility that very narrow or loose definitions of migration may be used for reported immigration statistics must be taken into account, which results in lower or higher levels of migration flows, respectively, in relation to, say, the United Nations’ recommended 1-year definition (UNECE 2009).

The aim of this paper is to illustrate how reliable estimates of harmonised migration statistics may be obtained from a set of origin–destination flows, where two reported flows are available for each particular flow, i.e., from receiving and sending countries. The new method that we present is based on earlier efforts by Poulain (1993, 1999), and is applied to reported flows between 19 European countries from 2002 to 2007. Note, however, that this paper does not consider flows outside the 19 country system, or those that are missing. Raymer (2008) describes a method for estimating missing migration flow data.

2 Comparability of International Migration Data

The reliability of migration statistics can be measured by how well they correspond to a particular country’s definition or concept of migration. However, as definitions differ across countries, reliability does not guarantee comparability. Moreover, under-registration, under-coverage and accuracy of the collection system also affect the measurement of migration (Bilsborrow et al. 1997; Nowok et al. 2006). First, there may be under-registration of migrants. This may be the case if the data depend on declarations by the migrants themselves. The willingness to report changes in places of residence varies both between countries and between groups of migrants. In general, migrants have more incentive to report their arrival than their departure, as there are usually direct benefits in doing so (e.g., access to social services). Therefore, immigration statistics are generally considered more reliable than emigration statistics (Thierry et al. 2005; UNECE 2009). Second, there may be under-coverage. This measurement category refers to the non-inclusion of particular migrant groups. Here, the differences are most often caused by the absence or inclusion of nationals, students, asylum seekers or irregular (illegal) migrants in the data. In general, asylum seekers are included only when they have been granted refugee status and received a temporary or permanent residence permit. However, in some instances, they are registered at an earlier stage of the asylum process. In other instances, even recognised refugees are not included. Irregular migrants are generally not included in migration statistics, as they are especially difficult to measure (for obvious reasons). In fact, Spain is the only EU country that includes irregular migrants in the official statistics. Finally, data based on sample surveys may be unreliable due to sampling errors. Furthermore, unless the sample size is very large, the data are likely to show irregularities in the patterns across ages or in the distribution of origins or destinations over time, as flows of migrants represent a relatively small proportion of the overall population being surveyed.

The main sources of the differences in the definitions used by EU countries to measure migration are the concepts of place of residence and duration of stay (Zlotnik 1987; Bilsborrow et al. 1997; Kupiszewska and Nowok 2008). The de jure (legal) approach to residence implies that in order to become a resident, a migrant must comply with certain regulations, which tend to differ between nationals and foreigners, and among foreigners, between EU- and non-EU-nationals. For example, it is not uncommon for emigrants to be registered in their country of citizenship (origin) even after several years of living abroad (Thierry et al. 2005). Thus, having a place of residence does not necessarily imply a presence in that country. The de facto (actual) approach is connected with physical presence in a country, usually for a specified minimum period of time. To prevent the delay caused by measuring actual duration of stay, most European countries use the intended duration of stay instead (Nowok et al. 2006). Alternatively, the intended duration of stay may be used to provide provisional statistics, which are updated at a later point with the actual duration of stay statistics. Another group of countries measure ‘permanent’ change of residence only (e.g. Poland and Slovakia), which is very restrictive and tends to produce flow levels that are much lower relative to other definitions. The duration of stay criterion used by the majority of EU countries is between 3 months and 1 year. Only three countries (Cyprus, Sweden and the UK) strictly apply the 1-year criterion to both immigration and emigration, and to both nationals and non-nationals (Thierry et al. 2005). In fact, some countries do not take duration of stay into account at all. Germany is such an example, where everybody taking up a residence is counted as a migrant.

Because of differences in definition, coverage, registration and accuracy of the collection mechanism, the origin–destination matrix of migration flows between European countries based on emigration data reported by the countries of origin tends to differ from the matrix based on immigration data reported by the countries of destination. With respect to definitions, the differences are expected to be systematic over time. For example, the German definition is wider than the Dutch definition which, in turn, is wider than that of Sweden. In fact, Germany reports higher figures than the Netherlands, and the figures of the Netherlands are higher than those reported by Sweden (Kupiszewska and Nowok 2008). A comparison of the size of these reported flows provides information on the effects of differences in definition on the size of migration flows (Bilsborrow et al. 1997; UNECE 2009). However, as mentioned above, not all differences can be explained by differences in definition. In some cases, countries report relatively large percentages of unknown countries of origin or destination. Furthermore, sudden jumps in observations may be caused by changes in definitions or by changes in the registration method.

Data on immigration and emigration flows by country of origin and destination are usually presented in an origin–destination matrix with off-diagonal entries containing the number of people moving from any origin i to any destination j in a given calendar year. For this study, we have collected migration data for the 19 countries set out in Table 1. As each flow can be reported by both sending and receiving countries, two migration tables may be produced. Such data are set out in Tables 2 and 3. Here, the average 2002–2007 values of migration between the 19 European countries set out in Table 1 are presented. Table 2 contains flows reported by the countries of destination and Table 3 contains the flows reported by the countries of origin. Clearly, there are large differences between the two sets of reported numbers (see, e.g., Spain to the United Kingdom or Poland to Germany).

Table 1 List of European countries reporting both immigration flows by country of origin and emigration flows by country of destination, 2002–2007
Table 2 Reported migration by country of destination, averages 2002–2007
Table 3 Reported migration by country of origin, averages 2002–2007

3 Method

The differences between reported immigration and emigration numbers are useful for improving and harmonising the migration data. If reported emigration numbers for a given country turn out to be systematically lower than the corresponding immigration numbers reported by the countries of destination, this suggests that the reported emigration numbers are too low. Adjusting these numbers in an upward direction moves them closer to the actual numbers. The same applies to reported immigration numbers. For each country we can estimate one adjustment factor for immigration and one for emigration in such a way that the adjusted immigration and emigration numbers are closer to each other than the reported numbers. To prevent arbitrary judgments biasing the results, we believe the adjustment factors for immigration and emigration flows should be estimated simultaneously. Moreover, it should be noted that immigration is not necessarily recorded more accurately than emigration. In some situations, sending country data may be considered better (Nowok et al. 2006).

Poulain (1993, 1999) was the first to develop a method to adjust reported immigration and emigration numbers for the purpose of obtaining a consistent set of migration flows. ‘Correction factors’ were estimated by minimising the sum of squares \( \sum\limits_{i,j} (\hat{\alpha }_{j} I_{ij} - \hat{\beta }_{i} E_{ij} )^{2} \), where \( I_{ij} \) denotes migration from country i to country j reported by the receiving country j, \( E_{ij} \) denotes the same flow reported by the sending country i, \( \alpha_{j} \) is the adjustment factor for all immigration to country j and \( \beta_{i} \) is the adjustment factor for all emigration from country i. Poulain and Dal (2008) refined this method by dividing the squared differences by the sum of the reported numbers, i.e.,

$$ \sum\limits_{i,j} {(\hat{\alpha }_{j} I_{ij} - } \hat{\beta }_{i} E_{ij} )^{2} /(I_{ij} + E_{ij} ) . $$

This refinement prevents flows from (or to) large countries from biasing the estimates.
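As an illustration, the weighted criterion of Eq. 1 can be evaluated for any candidate set of adjustment factors. The sketch below is not taken from Poulain and Dal (2008): the function name `weighted_sse`, the three-country matrices and the factor values are made up for illustration, with rows as origins and columns as destinations.

```python
def weighted_sse(I, E, alpha, beta):
    """Weighted sum of squared differences between adjusted flows (Eq. 1).

    I[i][j]: flow from i to j as reported by receiving country j.
    E[i][j]: the same flow as reported by sending country i.
    alpha[j], beta[i]: candidate adjustment factors.
    """
    n = len(I)
    total = 0.0
    for i in range(n):
        for j in range(n):
            weight = I[i][j] + E[i][j]
            if i == j or weight == 0:
                continue  # skip the diagonal and cells with no reported flow
            diff = alpha[j] * I[i][j] - beta[i] * E[i][j]
            total += diff ** 2 / weight
    return total


# Hypothetical reported flows for three countries (not real data).
I = [[0, 150, 30], [90, 0, 15], [40, 100, 0]]
E = [[0, 80, 40], [36, 0, 12], [10, 20, 0]]

# Criterion without any adjustment, and with the factors used to build the toy data.
unadjusted = weighted_sse(I, E, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
adjusted = weighted_sse(I, E, [1.0, 0.8, 2.0], [1.5, 2.5, 4.0])
print(unadjusted, adjusted)
```

Because the toy matrices were generated from a single set of factors, the adjusted criterion is essentially zero; with real data a positive residual remains and the factors are chosen to make it as small as possible.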

Various constraints have been tried by Poulain and colleagues (Abel 2009). For instance, following the iterative approach to harmonising migration flows suggested by van der Erf and van der Gaag (2007), Poulain and Dal (2008) proposed that the estimates should be normalised to Swedish immigration data, as they are generally considered to be highly reliable and in agreement with the UN recommended measure, as well as with the new EU regulation (Herm 2006b). The parameters \( \alpha_{j} \) and \( \beta_{i} \) may be estimated by solving a system of linear equations, which result from applying the method of Lagrange multipliers. Multiplying \( I_{ij} \) by \( \hat{\alpha }_{j} \) and \( E_{ij} \) by \( \hat{\beta }_{i} \) produces two sets of migration flow estimates from country i to country j. The final set of estimates is obtained by simply taking the average of the two, i.e., \( \hat{n}_{ij} = (\hat{\alpha }_{j} I_{ij} + \hat{\beta }_{i} E_{ij} )/2 \), where \( \hat{n}_{ij} \) denotes the harmonised migration flows. Note, Poulain and Dal (2008) applied their correction method first to countries with relatively reliable data to prevent countries with less reliable data influencing the overall patterns. Here, the main concern is that the less reliable data have origin–destination patterns that are not consistent with the actual patterns. Thus, less reliable flows were adjusted in a hierarchical fashion, i.e., by using the harmonised reliable data as a basis.
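For reference, the linear system mentioned above can be written out under the assumption that the normalisation is imposed as a single equality constraint, \( \hat{\alpha }_{s} = 1 \), with s denoting Sweden. The symbols \( L \), \( \lambda \) and the indicator \( \mathbf{1}_{[j = s]} \) are introduced here for illustration only and are not taken from the cited papers. The Lagrangian is

$$ L = \sum\limits_{i,j} \frac{(\hat{\alpha }_{j} I_{ij} - \hat{\beta }_{i} E_{ij} )^{2} }{I_{ij} + E_{ij} } + \lambda (\hat{\alpha }_{s} - 1), $$

and setting its partial derivatives to zero gives

$$ \frac{\partial L}{\partial \hat{\alpha }_{j} } = 2\sum\limits_{i} \frac{I_{ij} (\hat{\alpha }_{j} I_{ij} - \hat{\beta }_{i} E_{ij} )}{I_{ij} + E_{ij} } + \lambda \, \mathbf{1}_{[j = s]} = 0,\quad \frac{\partial L}{\partial \hat{\beta }_{i} } = - 2\sum\limits_{j} \frac{E_{ij} (\hat{\alpha }_{j} I_{ij} - \hat{\beta }_{i} E_{ij} )}{I_{ij} + E_{ij} } = 0. $$

Together with \( \hat{\alpha }_{s} = 1 \), this is a set of 2N + 1 equations that are linear in the 2N adjustment factors and \( \lambda \), with all sums taken over i ≠ j.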

There are several limitations in the model described above. First, the reported numbers included in the denominator of Eq. 1 are known to be incorrect (Abel 2009). Second, the row and column totals of the two estimated matrices are not equal. As a result, the row and column totals of the average harmonised migration matrix do not correspond to the row and column totals estimated using the adjustment factors. Finally, the method can only be applied to a limited set of countries with reasonably reliable data. This implies that the estimates of the adjustment factors depend on the selection of countries, which may not reflect the broader patterns of interest. For these reasons, we have revised Poulain’s method in two important ways. First, the row-sums and column-sums of the two estimated matrices are set to be equal. Second, we introduce additional constraints on individual cells in the migration matrices, so that more countries (with less reliable data) may be included.

The adjustment factors for our method can be estimated by solving a system of linear equations and imposing a constraint. If we have an N × N receiving country matrix and an equivalent N × N sending country matrix, the adjustment factors for receiving country data, \( \alpha_{j} \), and the adjustment factors for sending country data, \( \beta_{i} \), can be estimated by

$$ \sum\limits_{j} {\hat{\alpha }_{j} I_{ij} = } \hat{\beta }_{i} \sum\limits_{j} {E_{ij} }\quad {\text{for }}i = { 1}, \ldots , \, N;\quad i \, \ne \, j $$
$$ \hat{\alpha }_{j} \sum\limits_{i} {I_{ij} = } \sum\limits_{i} {\hat{\beta }_{i} E_{ij} } \quad{\text{for }}j = { 1}, \ldots , \, N;\quad i \, \ne \, j $$

Equation 2 states that for each country the emigration total estimated on the basis of the adjusted matrix of flows reported by receiving countries equals the emigration total estimated on the basis of the adjusted matrix of flows reported by sending countries. Equation 3 does the same for immigration totals.

Equations 2 and 3 can be written as a homogeneous system of 2N linear equations with 2N unknowns, i.e.,

$$ \begin{array}{l} \hat{\alpha }_{2} I_{12} + \hat{\alpha }_{3} I_{13} + \cdots + \hat{\alpha }_{N} I_{1N} - \hat{\beta }_{1} \sum\limits_{j} {E_{1j} } = 0 \\ \vdots \\ \hat{\alpha }_{1} I_{N1} + \hat{\alpha }_{2} I_{N2} + \cdots + \hat{\alpha }_{N - 1} I_{N,N - 1} - \hat{\beta }_{N} \sum\limits_{j} {E_{Nj} } = 0 \\ \hat{\alpha }_{1} \sum\limits_{i} {I_{i1} } - \hat{\beta }_{2} E_{21} - \hat{\beta }_{3} E_{31} - \cdots - \hat{\beta }_{N} E_{N1} = 0 \\ \vdots \\ \hat{\alpha }_{N} \sum\limits_{i} {I_{iN} } - \hat{\beta }_{1} E_{1N} - \hat{\beta }_{2} E_{2N} - \cdots - \hat{\beta }_{N - 1} E_{N - 1,N} = 0 \end{array} $$

This system has an infinite number of solutions for \( \alpha_{j} \) and \( \beta_{i} \). For each set of values \( \hat{\alpha }_{j} \) and \( \hat{\beta }_{i} \) that solve this system, \( k\hat{\alpha }_{j} \) and \( k\hat{\beta }_{i} \) are solutions as well. In order to find a unique solution one restriction needs to be imposed. In accordance with Poulain and Dal (2008), we assume that the adjustment factor for Swedish immigration is equal to one, since Sweden uses a definition of migration that is consistent with the new EU regulation and the quality of Swedish immigration data is considered to be adequate. This also means that the resulting estimates are harmonised in line with the new European regulation.
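One way to see Eqs. 2 and 3 at work is a small numerical sketch. The code below does not solve the homogeneous system directly. Instead it alternates between Eq. 2 (solved for the sending factors) and Eq. 3 (solved for the receiving factors) and rescales after every sweep so that the reference country's immigration factor equals one. This alternating scheme, the function name `estimate_factors` and the three-country data are assumptions made for illustration, not the authors' implementation; the toy flows were built from known factors so that the result can be checked.

```python
def estimate_factors(I, E, ref, sweeps=200):
    """Alternate between Eq. 2 and Eq. 3, keeping alpha[ref] equal to one.

    I[i][j]: flow i -> j reported by receiving country j.
    E[i][j]: flow i -> j reported by sending country i.
    Assumes every country has at least one positive off-diagonal flow
    in its row of E and in its column of I.
    """
    n = len(I)
    alpha = [1.0] * n
    beta = [1.0] * n
    for _ in range(sweeps):
        # Eq. 2: beta_i = sum_j alpha_j I_ij / sum_j E_ij  (j != i)
        beta = [sum(alpha[j] * I[i][j] for j in range(n) if j != i)
                / sum(E[i][j] for j in range(n) if j != i)
                for i in range(n)]
        # Eq. 3: alpha_j = sum_i beta_i E_ij / sum_i I_ij  (i != j)
        alpha = [sum(beta[i] * E[i][j] for i in range(n) if i != j)
                 / sum(I[i][j] for i in range(n) if i != j)
                 for j in range(n)]
        # Normalisation: the reference country's immigration factor is one.
        scale = alpha[ref]
        alpha = [a / scale for a in alpha]
        beta = [b / scale for b in beta]
    return alpha, beta


# Hypothetical reported flows for three countries (not real data).
I = [[0, 150, 30], [90, 0, 15], [40, 100, 0]]
E = [[0, 80, 40], [36, 0, 12], [10, 20, 0]]

alpha, beta = estimate_factors(I, E, ref=0)
print([round(a, 3) for a in alpha])
print([round(b, 3) for b in beta])
```

With this toy input the sweeps settle on receiving factors of 1, 0.8 and 2 and sending factors of 1.5, 2.5 and 4, which are the values used to construct the data. Changing `ref` rescales all factors by a common constant, which is the scale indeterminacy described above.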

The basic assumption underlying our estimation procedure (as described above) is that the distributions of reported immigration by country of origin and reported emigration by country of destination correspond to the distribution of actual migration flows under the harmonised definition. This implies that the reported emigration of country A is x% higher or lower than the actual number (based on the standard definition) for all countries of destination. The same assumption applies to receiving country numbers. However, as we find in the next section, the estimated receiving country flows by country of origin and the estimated sending country flows by country of destination are not always consistent with each other. In a number of cases, specific origin–destination flows have to be considered separately. For that reason, we introduce additional constraints, corresponding to particular origin–destination flows that differ from the remaining flows.

Let us assume that the estimated receiving country migration flow from country p to q, \( \hat{\alpha }_{q} I_{pq} \), differs substantially from the estimated sending country flow, \( \hat{\beta }_{p} E_{pq} \). To make them consistent, we can multiply \( \hat{\alpha }_{q} I_{pq} \) by \( \hat{\gamma }_{pq} \) or \( \hat{\beta }_{p} E_{pq} \) by \( \hat{\delta }_{pq} \) so that both estimates of migration are equal. The question whether we should adjust the estimate based on the reported receiving country or the estimate based on the reported sending country depends on our knowledge of the data.

Given the estimated values of \( \hat{\alpha }_{q} \) and \( \hat{\beta }_{p} \), we can calculate the value of \( \hat{\gamma }_{pq} \) easily from \( \hat{\gamma }_{pq} = \hat{\beta }_{p} E_{pq} /(\hat{\alpha }_{q} I_{pq} ) \) or the value of \( \hat{\delta }_{pq} \) from \( \hat{\delta }_{pq} = \hat{\alpha }_{q} I_{pq} /(\hat{\beta }_{p} E_{pq} ) \). However, introducing \( \hat{\gamma }_{pq} \) or \( \hat{\delta }_{pq} \) changes the estimates of \( \hat{\alpha }_{q} \) or \( \hat{\beta }_{p} \). This also means that the row and column totals of both estimated migration matrices no longer tally. Therefore, we adjust the system of linear Eqs. 2 and 3 by adding constraints on individual cells of the matrices. If we assume that the emigration number reported by country p needs to be adjusted, Eqs. 2 and 3 can be rewritten as

$$ \sum\limits_{j} {\hat{\alpha }_{j} I_{ij} =\,} \hat{\beta }_{i} \sum\limits_{j} {E_{ij} (1 + \hat{\delta }_{pq}^{*} D_{ij} )}\quad {\text{for }}i = { 1}, \ldots , \, N;\quad i \, \ne \, j $$
$$ \hat{\alpha }_{j} \sum\limits_{i} {I_{ij} = } \sum\limits_{i} {\hat{\beta }_{i} E_{ij} (1 + \hat{\delta }_{pq}^{*} D_{ij} )}_{{}} \quad{\text{for }}j = { 1}, \ldots , \, N;\quad i \, \ne \, j $$

where \( D_{ij} = 1 \) if i = p and j = q, \( D_{ij} = 0 \) otherwise, and \( \hat{\delta }_{pq}^{*} = \hat{\delta }_{pq} - 1 \).

The equations including \( I_{pq} \) and \( E_{pq} \) in the system of Eq. 4 can be rewritten as follows:

$$ \hat{\alpha }_{1} I_{p1} + \cdots + \hat{\alpha }_{q} I_{pq} + \cdots + \hat{\alpha }_{N} I_{pN} - \hat{\beta }_{p} E_{p1} - \cdots - \hat{\delta }_{pq} \hat{\beta }_{p} E_{pq} - \cdots - \hat{\beta }_{p} E_{pN} = 0 $$
$$ \hat{\alpha }_{q} \sum\limits_{i} {I_{iq} } - \hat{\beta }_{1} E_{1q} - \cdots - \hat{\delta }_{pq} \hat{\beta }_{p} E_{pq} - \cdots - \hat{\beta }_{N} E_{Nq} = 0 $$

In contrast with Eq. 4, these are non-linear equations, because they include the term \( \hat{\delta }_{pq} \hat{\beta }_{p} E_{pq} \). The values of the coefficients can be estimated by an iterative procedure. The model can be extended in a straightforward way to include additional constraints. However, for any particular country, the number of constraints should not be too high, as this reduces the available information to estimate α and β.
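The text does not spell out the iterative procedure, so the following is only a sketch of one possible reading. It assumes that each flagged cell (p, q) is required to satisfy the equality of the two adjusted estimates, i.e., that the receiving-country estimate for the cell equals the sending-country estimate multiplied by the cell coefficient. Under that assumption the flagged cell adds the same amount to both sides of the margin equations, so it can be left out while the country factors are estimated and the cell coefficient is computed afterwards. The function name `estimate_with_cell_constraints`, the alternating sweeps and the toy data are all hypothetical.

```python
def estimate_with_cell_constraints(I, E, ref, cells, sweeps=200):
    """Estimate alpha, beta and one coefficient delta per flagged cell (p, q).

    Assumption: alpha[q] * I[p][q] == delta * beta[p] * E[p][q] for each
    flagged cell, so the cell is excluded from the row and column margins.
    Every row of E and column of I must keep a positive unflagged flow.
    """
    n = len(I)
    skip = set(cells) | {(k, k) for k in range(n)}
    alpha = [1.0] * n
    beta = [1.0] * n
    for _ in range(sweeps):
        # Row margins without the flagged cells, solved for beta.
        beta = [sum(alpha[j] * I[i][j] for j in range(n) if (i, j) not in skip)
                / sum(E[i][j] for j in range(n) if (i, j) not in skip)
                for i in range(n)]
        # Column margins without the flagged cells, solved for alpha.
        alpha = [sum(beta[i] * E[i][j] for i in range(n) if (i, j) not in skip)
                 / sum(I[i][j] for i in range(n) if (i, j) not in skip)
                 for j in range(n)]
        scale = alpha[ref]  # keep the reference immigration factor at one
        alpha = [a / scale for a in alpha]
        beta = [b / scale for b in beta]
    # Cell coefficient: delta_pq = alpha_q I_pq / (beta_p E_pq).
    delta = {(p, q): alpha[q] * I[p][q] / (beta[p] * E[p][q]) for p, q in cells}
    return alpha, beta, delta


# Hypothetical data: the sending-country report for flow 1 -> 0 is an outlier.
I = [[0, 150, 30], [90, 0, 15], [40, 100, 0]]
E = [[0, 80, 40], [90, 0, 12], [10, 20, 0]]

alpha, beta, delta = estimate_with_cell_constraints(I, E, ref=0, cells=[(1, 0)])
print([round(a, 3) for a in alpha], [round(b, 3) for b in beta], delta)
```

In this toy case the outlying cell no longer distorts the sending factor of country 1, and its coefficient comes out below one, which lowers the effective adjustment for that single flow. Flagging many cells for one country leaves few flows to determine its factors, which is the loss of information noted above.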

4 Data

The sending and receiving country migration data have been provided by the national statistical institutes of the EU Member States in response to annual rounds of data collection conducted jointly by five international organisations and coordinated by Eurostat (Kupiszewska and Nowok 2008). As concerns Europe, Eurostat processes and disseminates data received from 37 countries on their website (epp.eurostat.ec.europa.eu). Data sources used by EU member states to produce migration statistics are very diverse (Kupiszewska and Nowok 2008; Nowok et al. 2006). The major types of sources are population registration systems, statistical forms, other administrative registers related to foreigners (such as alien registers, residence permit registers and registers of asylum seekers), sample surveys and censuses. Thirteen EU countries use a population register as the source of migration statistics. Alien registers and residence permit registers are used in seven countries, sometimes in addition to population registers. These registers only provide information on the migration of non-nationals. Cyprus and the UK rely on passenger surveys conducted at the borders, while Portugal and Ireland rely on household surveys. Greece, France and Portugal do not have any data on migration by nationals. Some countries derive their emigration statistics from data on residence permits by assuming a migrant has left the country when a residence permit has expired. Moreover, they often assume that the country of next residence is the country of their citizenship. The result, we believe, is an overestimation of actual emigration to those particular countries. Finally, several countries include in their so-called ‘administrative corrections’ emigration that has not been declared, which cannot be disaggregated by country of next residence.

Data on immigration by country of previous residence or emigration by country of next residence are not always available or complete (Nowok et al. 2006). Thus, the sending country and receiving country matrices, when combined into a double-entry matrix may be incomplete. For some countries, a large share of emigrants have an unknown country of destination: around 75% in Slovenia, 40% in Luxembourg, 35% in Austria, 31% in the Netherlands and 39% in Spain, for example. Fortunately, the estimation of adjustment factors takes this into account.

In the next section, we present our harmonised estimates of migration between 19 European countries that provide data on both immigration by country of origin and emigration by country of destination for the calendar years 2002–2007. The reported data contain both nationals and non-nationals. Table 1 provides a list of the countries. Although there are some data for Ireland, Portugal and Romania, these have not been used because they cover only a part of the migration flows (e.g. only foreigners or nationals). For Iceland, Italy and Luxembourg, data for 1 or more years in the period 2002–2007 are missing. For these countries, the adjustment factors are estimated for averages over the available years.

5 Results

The results presented in this section are obtained by applying the estimation method described in Sect. 3. Table 2 shows the average values of migration between 19 European countries reported by receiving countries for the years 2002–2007 and Table 3 shows the corresponding numbers reported by the sending countries. The countries listed in the row headings refer to origins and those listed in the column headings refer to destinations. A comparison of Tables 2 and 3 reveals large differences between numbers reported by sending and receiving countries. According to the numbers reported by receiving countries, 671,315 migrants per year moved between these 19 countries, whereas the numbers reported by sending countries total 499,105. For 11 countries, the reported receiving country immigration totals are higher than the corresponding sending country totals. For example, Germany reported that 256,221 immigrants arrived from the 18 countries in this study, whereas these countries reported that only 66,905 emigrants moved to Germany. Poland reported that 22,306 persons emigrated to the other 18 countries which, for their part, reported receiving 217,977 immigrants from Poland, suggesting that Polish emigration data are around 10 times too low. For 15 of the 19 countries, the emigration total reported by the sending country is lower than the corresponding totals reported by receiving countries. Keep in mind that receiving country data should not always be considered better than sending country data. Consider, for example, the flows from Poland to Germany in Tables 2 and 3. Here, Germany received an average of 136,927 migrants from Poland, whereas Poland reported that they only sent an average of 14,417. This difference could be explained by the duration criteria used by these countries, with Germany having a very loose definition (instant) and Poland having a very restrictive definition (permanent). So, in comparison with the harmonised definition of a 1-year period, Germany’s reported number is considered too high and Poland’s too low.

The estimated adjustment factors are set out in Table 4. We indicated above that in order to estimate the adjustment factors a restriction was introduced, i.e., the adjustment factor for Swedish immigration is set equal to one. For 16 of the 19 countries, the \( E_{ij} \) adjustment factor exceeds one, indicating that sending country numbers tend to be underestimated. However, Table 4 also shows that \( I_{ij} \) numbers seem to be underestimated in the majority of countries as well. This may seem contradictory since for 11 of the 19 countries the reported immigration totals exceed the corresponding emigration numbers reported by the sending countries. This is because the reported receiving country numbers should be compared with the adjusted sending country numbers rather than the reported numbers. For example, the immigration total reported by the UK (107,897) exceeds the reported emigration from sending countries to the UK (52,567). The reported emigration to the UK includes 5,219 emigrants from Poland to the UK. However, since the reported emigration from Poland is too low (the adjustment factor equals 10.64, see Table 4) the reported emigration from Poland to the UK is adjusted from 5,219 to 55,506. Moreover, the adjustment factor for Spanish emigration data equals 4.32, so the reported emigration from Spain to UK is adjusted from 3,430 to 16,792. For several other countries, emigration to the UK is adjusted upwards as well. As a consequence, the adjusted emigration numbers to the UK exceed the total of immigration reported by the UK and thus the reported immigration is adjusted upwards as well. Note that the adjustment factors for immigration for most countries are closer to one than the adjustment factors for emigration, which indicates that the reported immigration numbers are more accurate than the emigration numbers.

Table 4 Estimates of adjustment factors for immigration and emigration, 2002–2007

Multiplying the reported numbers in Table 2 by the adjustment factors for receiving country data and the reported numbers in Table 3 by the adjustment factors for sending country data results in two tables for which the row and column totals are equal (not presented here for space reasons). The differences between the cells in these two matrices are considerably smaller than those in Tables 2 and 3. In fact, the root mean squared error (RMSE) is reduced from 8,966 to 2,131. In other words, the differences between the two reported migration flow tables are reduced by about 76%. However, we still found some substantial differences in the two estimated migration flow tables. For example, the migration from Poland to Germany estimated on the basis of German immigration data equals 141,035, whereas the estimate based on Polish emigration data is equal to 153,399. These differences reflect the fact that the distribution of reported Polish emigration by country of destination is not consistent with the share of immigration from Poland in the total reported immigration numbers of other countries. As a result, the estimate of the migration flow from Poland to Germany based on Polish data exceeds that based on German data, whereas for most other countries, the adjusted Polish emigration numbers are lower than the corresponding adjusted immigration numbers. This means that one substantial inconsistency in the estimates is likely to influence the estimates of other migration flows. To prevent such inconsistencies from affecting the overall estimates, we have added constraints to individual cells (flows) in the model.
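The RMSE comparison can be sketched as follows. The text does not state which cells enter the calculation, so this example assumes the RMSE is taken over all off-diagonal cells of the two matrices; the function names `rmse_offdiag` and `adjust`, the toy flows and the factors are hypothetical.

```python
import math


def rmse_offdiag(A, B):
    """Root mean squared difference between two flow matrices, diagonal excluded."""
    n = len(A)
    squares = [(A[i][j] - B[i][j]) ** 2
               for i in range(n) for j in range(n) if i != j]
    return math.sqrt(sum(squares) / len(squares))


def adjust(I, E, alpha, beta):
    """Return the two adjusted matrices: alpha_j * I_ij and beta_i * E_ij."""
    n = len(I)
    adj_I = [[alpha[j] * I[i][j] for j in range(n)] for i in range(n)]
    adj_E = [[beta[i] * E[i][j] for j in range(n)] for i in range(n)]
    return adj_I, adj_E


# Hypothetical reported flows and adjustment factors (not real data).
I = [[0, 150, 30], [90, 0, 15], [40, 100, 0]]
E = [[0, 80, 40], [36, 0, 12], [10, 20, 0]]
alpha = [1.0, 0.8, 2.0]
beta = [1.5, 2.5, 4.0]

before = rmse_offdiag(I, E)
adj_I, adj_E = adjust(I, E, alpha, beta)
after = rmse_offdiag(adj_I, adj_E)
print(before, after)
```

In this constructed example the two adjusted matrices coincide, so the RMSE drops to essentially zero. With real data a residual remains, and large cell-level residuals are what point to flows such as Poland to Germany.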

The introduction of constraints to individual cells in the matrix allows us to consider special cases, such as the Poland to Germany flow described above. In total, we found six migration flows where the estimates differed by more than 10,000. Specifically, these flows were Poland to Germany, Poland to UK, Germany to Poland, Germany to UK, Czech Republic to Slovakia and UK to Poland. After identifying the flows with large differences, we then had to decide whether the constraint should be applied to the numbers of the receiving country or of the sending country. Since reported emigration numbers are generally considered to be less reliable than reported immigration numbers, we apply the constraints to the sending country data, except for the Germany to Poland and UK to Poland flows (i.e., Poland’s immigration data are considered to be of lower quality than the corresponding emigration data reported by both Germany and the UK).

The adjustment factors taking into account the six constraints on individual flows are set out in Table 5. The coefficients (Lagrange multipliers) for the Poland to Germany and Poland to UK flows are both equal to 0.42. This raises the adjustment factor for emigration from Poland from 10.64 (Table 4) to 18.31 (Table 5), while at the same time, the adjustment factor for Polish emigration to Germany and the UK falls to 7.69 (i.e., 18.31 × 0.42). For Polish immigration, the adjustment factor becomes smaller. The high adjustment factor for Polish receiving data was mainly a consequence of the big difference between the two figures for migration from Germany to Poland. Including a constraint for this flow raises the adjustment factor for Poland’s reported flow from Germany by a factor of 1.74 (i.e., the adjustment factor of 14.25 is multiplied by 1.74 to get 24.80). In contrast, the adjustment factor for Poland’s reported flow from the UK falls to 10.40 (i.e., 14.25 × 0.73). For the Czech Republic, the reported emigration numbers are considerably lower than the corresponding reported immigration numbers with one big exception: the reported number of emigrants to Slovakia is relatively large. Clearly, the emigration flows from the Czech Republic to all other countries need to be adjusted by a different factor than the emigration flow to Slovakia.
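The arithmetic behind the flow-specific factors for Poland can be written out explicitly. The symbols below are introduced only for this illustration and are not necessarily those used in the model specification: $s_i$ denotes the adjustment factor for sending country $i$, $r_j$ the adjustment factor for receiving country $j$, and $\lambda_{ij}$ the coefficient attached to a constrained flow from $i$ to $j$.

```latex
\begin{align*}
s_{\mathrm{PL}}\,\lambda_{\mathrm{PL},\mathrm{DE}} &= 18.31 \times 0.42 = 7.6902 \approx 7.69
  && \text{(Polish emigration data, flow to Germany)} \\
r_{\mathrm{PL}}\,\lambda_{\mathrm{DE},\mathrm{PL}} &= 14.25 \times 1.74 = 24.795 \approx 24.80
  && \text{(Polish immigration data, flow from Germany)}
\end{align*}
```

A coefficient below one lowers the factor for the constrained cell relative to the country's general factor, and a coefficient above one raises it.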

Table 5 Estimates of adjustment factors for immigration and emigration, 2002–2007, including six additional constraints on individual flows

The adjustment factors in Table 5 illustrate how substantial improvements in the estimated adjustment factors can be made by introducing constraints on specific ‘problem’ flows in the matrix. For example, the inclusion of a constraint for the migration flow from the Czech Republic to Slovakia lowered the adjustment factor for Slovakia’s receiving migration data from 18.90 to 8.34. Another example is Germany’s receiving data. Here, the adjustment factor is reduced from 1.03 to 0.81. This is mainly explained by the reduction of the estimate of Polish emigration to Germany. Since Germany has a wide definition of migration, one would expect the adjustment factor to be below one. Thus, the adjustment factors in Table 5 appear more plausible than those set out in Table 4.

The harmonised migration tables that used the additional constraints are set out in Tables 6 and 7. The introduction of these constraints led to a further strong reduction in the differences between the two tables, as indicated by the RMSE, which fell from 2,131 to 952, or by a further 54%. To obtain a final single set of harmonised flows, we believe it is better to rely on Table 6 than on Table 7. Table 6 gives more weight to the receiving country data, which we consider more reliable. Poulain (1993, 1995), on the other hand, advocated taking the average of the two estimated matrices. That approach implies that the origin–destination patterns in the reported sending country data are as reliable as those in the reported receiving country data.

Table 6 Estimated migration by country of origin and destination, including constraints on six individual flows, 2002/2007, based on numbers reported by receiving countries
Table 7 Estimated migration by country of origin and destination, including constraints on six individual flows, 2002/2007, based on numbers reported by sending countries

The average adjustment factors estimated for the period 2002–2007 (Table 5) can be applied to the annual reported migration data to create a time series of harmonised flows. In Fig. 1, the estimated total immigration and emigration flows for Germany from and to the other 18 countries in this study are compared. As expected, the estimated numbers are lower than the reported numbers because the definition for Germany is much wider than the harmonised definition. The figure also shows that estimated emigration increases more gradually over time than the reported numbers. In Fig. 2, the immigration and emigration flows for the UK are presented. Here, the average levels of the reported and estimated numbers do not differ much, but the estimated flows show a more gradual pattern over time than the reported flows. One reason for the sharp fluctuations in the reported numbers is that they are based on sample surveys.
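Applying a period-average factor to annual data amounts to a simple rescaling of each year's reported number. The sketch below assumes exactly that. The function name and the annual totals are hypothetical, and only the factor of 0.81 for Germany's receiving data is taken from the text.

```python
def harmonise_series(reported_by_year, adjustment_factor):
    """Apply one period-average adjustment factor to every annual reported flow."""
    return {year: reported * adjustment_factor
            for year, reported in reported_by_year.items()}

# Hypothetical annual reported immigration totals; not figures from the paper.
reported_immigration = {2002: 300_000, 2003: 280_000, 2004: 290_000}

# 0.81 is the Table 5 factor for Germany's receiving data quoted in the text.
estimated_immigration = harmonise_series(reported_immigration, 0.81)

for year in sorted(estimated_immigration):
    print(year, round(estimated_immigration[year]))
```

Flows subject to one of the six cell-specific constraints would need their own flow-specific factor rather than the country's general factor, which this sketch does not handle.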

Fig. 1 Reported and estimated immigration from and emigration to 18 European countries, Germany, 2002–2007

Fig. 2 Reported and estimated immigration from and emigration to 18 European countries, United Kingdom, 2002–2007

6 Discussion

The aim of this paper has been to obtain a reasonable and consistent set of international migration statistics. For this purpose we have developed a model using statistical information from different countries. The method is based on an idea originally proposed by Poulain (1993, 1995). Our method differs from his in three important ways. First, we have estimated a set of adjustment factors for receiving and sending country data in a way that ensures consistency in the two sets of marginal totals. Second, we have introduced additional constraints on special origin–destination cases where the average adjustment factors do not apply. This allows us to include countries with less reliable data in our analysis. Third, instead of calculating the arithmetic average of the two estimated matrices, we believe it is better to use the matrix giving more weight to the reported immigration numbers (i.e. Table 6). In this way we take advantage of the fact that the information on countries of origin in receiving country data tends to be more reliable than the country of destination information in sending country data. Finally, our estimates are consistent with the harmonised migration definition based on an (intended) minimum duration of stay of 12 months.

Due to differences in definition, coverage and registration, the origin–destination matrix of migration flows between European countries based on receiving country data tends to differ from the matrix based on sending country data. Germany has a wide definition of migration because it does not include a time constraint, so the reported numbers may well include short-term migrants. In contrast, Poland has a very narrow definition of migration and, as a consequence, the reported numbers are very low. By comparing corresponding reported immigration and emigration flows for 19 European countries, we have assessed to what extent German migration statistics are higher than they would be under a harmonised definition and to what extent Polish migration statistics are lower.

However, the large differences between European countries cannot be explained by differences in definitions alone. First, these differences cannot explain why emigration flows are more likely to be underestimated than immigration flows. Second, whereas 11 countries employ a duration limit that is shorter than that of the harmonised definition (Kupiszewska and Wisniowski 2009), only five of these countries have an adjustment factor for immigration below one. The other six countries with durations of 6 months or shorter have adjustment factors for immigration greater than one. These include Austria, Czech Republic, Italy, Luxembourg, the Netherlands and Slovenia. Thus, to an important extent, the differences must also be caused by problems of coverage. This is confirmed by a study comparing migration statistics between Sweden, Denmark and Belgium which suggests that less than 25% of differences are due to differences in the duration criterion (Nowok et al. 2006). The effects of differences in definition and coverage may offset each other to some extent. One would expect the under-registration of short-term migrants to exceed that of long-term migrants. A wide definition of migration (i.e. a short duration of stay) would lead to a higher reported number of migrants than would be expected on the basis of the harmonised definition. Under-registration, however, would lead to a smaller number. This may explain why the adjustment factors for Germany are not as low as one might expect from applying the very wide definition.

The main reason for the relatively low numbers reported by sending countries is that emigrants do not have strong incentives to report leaving a country. In particular, this applies to EU citizens who can live in another EU country without asking for a residence permit. One solution might be to introduce a removal card system (Nowok et al. 2006). Here, any person leaving country A would be required to fill in a form to be given to the authorities in country B at arrival. After country B has determined whether or not the person is an international migrant under a harmonised definition, it would then inform country A of the arrival. The Nordic countries have such a system and their immigration and emigration statistics are mutually consistent (Herm 2006a). However, policy makers tend to be more interested in migrants from outside Europe and asylum seekers than intra-European migrants, and therefore such a system is not likely to have a high priority in the future. As long as such a system is lacking, cross-country comparability of migration statistics can only be achieved by comparing statistics from different countries. To the extent that the differences between countries are caused by differences in definitions and coverage, the differences may be expected to remain systematic over time. The method developed in this paper aims to assess the size of these systematic differences. Table 5 shows that for 10 out of the 19 countries in this study, the adjustment factor for sending country data exceeds two, meaning that reported emigration numbers are underestimated by more than 50% in relation to the 1-year duration definition. As a consequence, reported net migration totals may be overstated.
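The link between an adjustment factor and the implied undercount, and its effect on net migration, can be made concrete with a short sketch. The function names and all flow numbers are hypothetical. The only assumption carried over from the text is that the harmonised flow equals the reported flow multiplied by the adjustment factor.

```python
def undercount_share(adjustment_factor):
    """Share of the harmonised flow that is missing from the reported number,
    assuming harmonised = reported * adjustment_factor."""
    if adjustment_factor <= 0:
        raise ValueError("adjustment factor must be positive")
    return 1.0 - 1.0 / adjustment_factor

def net_migration(immigration, emigration):
    """Net migration as immigration minus emigration."""
    return immigration - emigration

# A factor of exactly two corresponds to an undercount of 50%.
share_at_two = undercount_share(2.0)

# Hypothetical country: immigration reported accurately, emigration factor of 2.
reported_immigration, reported_emigration = 1_000, 400
reported_net = net_migration(reported_immigration, reported_emigration)
adjusted_net = net_migration(reported_immigration, reported_emigration * 2.0)
print(share_at_two, reported_net, adjusted_net)
```

In this toy case the reported net migration (600) is three times the adjusted figure (200), which illustrates how under-reported emigration can inflate net migration totals.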

In addition to ‘correcting’ the reported receiving and sending country migration data for differences in definition and coverage, our method contributes to producing estimates that tend to fluctuate less strongly over time. One clear example concerns the UK. Since the UK uses a general purpose passenger survey, the reported flows fluctuate considerably over time. Moreover, flows to some (smaller) countries may not be observed in some years. We believe our method produces more stable estimates of migration flows for the UK (and other countries relying on sample data). Interestingly, the estimated adjustment factors for the UK are close to one. This implies that the sample survey used for estimating migration to and from the UK provides a reasonably reliable estimate of total migration flows on average, but that the annual estimates are affected by sizeable random fluctuations.

The adjustment factors shown in Table 5 can be used to adjust migration numbers to and from countries not included in the matrix, so that total immigration and emigration numbers and total net migration can be estimated for the 19 countries in this study. Before doing so, one first has to make sure that the share of unknowns in the migration statistics can be distributed evenly across all origins or destinations. If so, the adjustment factors will take this into account. Thus, for estimating total immigration and emigration numbers, the adjustment factors should be applied to total migration numbers excluding unknowns.
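One way to read this rule is sketched below. The function name and the numbers are hypothetical, and the sketch assumes that flows with unknown origin or destination are simply subtracted from the reported total before the country's adjustment factor is applied.

```python
def estimate_total(reported_total, unknown, adjustment_factor):
    """Apply the adjustment factor to the reported total net of flows
    with unknown origin or destination."""
    if unknown < 0 or unknown > reported_total:
        raise ValueError("unknown flows must lie between 0 and the reported total")
    return adjustment_factor * (reported_total - unknown)

# Hypothetical example: 1,000 reported emigrants, 100 with unknown destination,
# and an adjustment factor of 2.0 for this country's sending data.
estimated_emigration = estimate_total(1_000, 100, 2.0)
print(estimated_emigration)
```

The sketch is only appropriate when the unknowns can reasonably be assumed to be spread evenly across origins or destinations, as stated above.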

The matrix may be extended to include flows with missing data. Raymer (2008) developed a two-step estimation method for countries with missing data (see also De Beer et al. 2009; Raymer and Abel 2008). The first step estimates missing immigration and emigration totals based on harmonised migration flows and covariate information. The second step uses the origin–destination interaction patterns of the harmonised migration flows and covariate information to estimate the missing interaction patterns. This estimation step takes into account the fact that migration is relatively high, for example, between neighbouring countries and countries belonging to a similar language group.

Finally, work is currently being carried out to integrate harmonisation and estimation of missing data into a single (Bayesian) model that also includes measures of uncertainty and expert judgements. The Integrated Modelling of European Migration (IMEM) project recently funded by New Opportunities for Research Funding Agency Co-operation in Europe (NORFACE) is expected to develop such a model (see http://www.norface.org/migration12.html) over the next couple of years. We hope this study will provide an important foundation for work such as this, and other projects aiming to improve our knowledge and understanding of the complexity of international migration.