An Integrated Approach to Surveying Emigrants Worldwide

This chapter describes the research design applied in the research project The Emigrant Communities of Latvia: National Identity, Transnational Relations and Diaspora Politics, which forms the empirical core of this volume. It discusses this methodology in the context of other migration studies and major surveys on migration. Compared to previous studies, The Emigrant Communities of Latvia is the most inclusive in terms of its target audience. All Latvians and Latvian nationals abroad were invited to participate in the survey, applying a broad and open definition of ‘Latvian diaspora’ based on personal identification with the Latvian nation and/or citizenship. Being Web-based, the survey did not impose any limitations as to geographic location, aiming at all countries in the world. Combining a wide range of respondent recruitment channels and techniques and supported by a media campaign, the survey reached 14,068 respondents in 118 countries. Innovative solutions were used to increase response rates and to decrease attrition. Several research topics in this study required separate qualitative research approaches. Thus, 159 partly structured in-depth interviews were also conducted in the countries where the Latvian diaspora is largest, as well as in-depth interviews with return migrants and diaspora policy experts. The new methodology has far-reaching potential to be applied to the study of other migrant groups in Europe and beyond. Importantly, The Emigrant Communities of Latvia project has tested and empirically proven the potential of Web surveys for collecting the opinions of large populations of migrants in many countries.

The EU Labour Force Survey (LFS) has significant methodological drawbacks and limitations linked to the fact that it is not aimed specifically at migrants (European Commission 2008; Marti and Rodenas 2007). For example, it does not include information on the purpose of immigration, language skills or the migrants' situation before migrating. Another limitation is that the LFS focuses mainly on labour market outcomes and provides little insight into other aspects that have recently become a matter of increasing concern, mainly those linked to socio-cultural integration (Bijl and Verweij 2012; Bilgili et al. 2015; Ersanilli and Koopmans 2011). Another large-scale pan-European survey, the European Union Statistics on Income and Living Conditions (EU-SILC), is also hampered by under-representation and a small number of immigrants (Eurostat 2011). As an alternative, some researchers (Aleksynska 2011; Connor and Koenig 2013; Dronkers and Vink 2012; Wright and Bloemraad 2012) pool data from the small sub-samples of migrants in several waves of major cross-sectional surveys (usually the European Social Survey). However, this approach is problematic due to differences in measurement time, definitions and questions, the lack of migration-relevant control variables and, most importantly, problems with matching 'pooled-over-time' data (Bilgili et al. 2015; Ersanilli and Koopmans 2013).
A small but growing number of studies employ a double comparative design that looks at more than one immigrant group and more than one destination country (Aleksynska 2011; Fleischmann and Dronkers 2007; Vink et al. 2013; Voicu and Comsa 2014), on the premise that the situation of immigrants may be affected by the country from which they come (the 'origin effect'), the country to which they migrate (the 'destination effect') and the specific relations between origins and destinations (the 'community effect'). Among the most prominent of such studies are: LIMITS, The Immigrants and Ethnic Minorities in European Cities: Life courses and Quality of Life in a World of Limitations study (2004); SCIICS, the Six Country Immigrant Integration Comparative Survey (2008) (Crul et al. 2012; Ersanilli and Koopmans 2013); TIES, The Integration of the European Second Generation survey (2007) (Reichel 2010; Westin 2015); MAFE, the Migration between Africa and Europe project (2008-2010) (Crul et al. 2012; Schoumaker and Beauchemin 2015); and SCIP, The Causes and Consequences of Early Socio-Cultural Integration Processes among New Immigrants in Europe panel study (2013) (Platt et al. 2015). Unfortunately, due to financial and methodological limitations, these and most other existing comparative surveys (e.g., Eurostat/NIDI 2000; Koopmans 2010; Phinney et al. 2006; YMOBILITY), including those conducted with migrants from ECE (Ambrosini et al. 2012; CRONEM 2006; Kogan 2003), cover just a handful of destinations and therefore, strictly speaking, cannot mathematically disentangle the effects of the various contextual factors that vary across countries (Bloemraad and Wright 2014; Koopmans 2013).
The only solution that would allow the direct measurement of the effects of various contextual features, while also controlling for other micro- and macro-level confounders, is multilevel regression analysis that includes a significant number of destination countries (Arzheimer 2009; Bilgili et al. 2015; van Tubergen et al. 2004).

(In 2014, a special module on migration, The Labour Market Situation of Migrants and their Immediate Descendants, was again conducted as part of the LFS; however, its questions are retrospective and very limited in scope, relating mainly to the labour market.)
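To make the argument concrete, a generic two-level random-intercept specification is sketched below. The notation is assumed for illustration and is not the model estimated in this volume: $y_{ij}$ is an outcome for migrant $i$ in destination country $j$, $\mathbf{x}_{ij}$ are individual-level controls, $\mathbf{z}_j$ are country-level contextual features (for example, integration policies), and $u_j$ is a country-specific random intercept.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{z}_{j} + u_j + \varepsilon_{ij},\\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{aligned}
```

Because the contextual effects $\boldsymbol{\gamma}$ are identified only from variation between countries, their precision depends on the number of destination countries $J$ rather than on the number of respondents; with only a handful of destinations there are too few country-level observations to separate several contextual factors from one another.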
In order to obtain reliable results on migrants, sample size and sample design are of crucial importance. Due to the lack of reliable sampling frames from which to sample migrants in the majority of EU countries, previous quantitative studies of emigrants in Europe have relied on methods such as simple snowball sampling, respondent-driven sampling (for example SCIP), Time-Location Sampling or quota sampling based on census data and recruiting respondents at places they usually attend. Due to the high costs of fieldwork involving face-to-face interviews with small minority groups, these methods are usually applied in a narrow geographic space (a selected number of cities or neighbourhoods) and as such are not suited for analysing the effect of, for example, policies or other macro-level factors measured at the national level. Overall, tracing the 'liquid' East-West migrants at a particular place of residence might not be the most appropriate strategy (Eade and Garapich 2009).
Some researchers have used telephone surveys and name sampling from published phone books, registers and/or directories. In a few countries (e.g., the Netherlands) researchers have been able to randomly select respondents from official databases. Unfortunately, such sampling frames are available to researchers in only a few countries and cannot ensure a broad representation of countries. A very promising approach was undertaken by the SEEMIG project's LFS pilot survey 'Migrations' in 2013, which tried to build a sample of emigrants from Hungary and Serbia based on referrals and contact information on relatives abroad provided by LFS respondents. Unfortunately, this approach did not provide the expected results (Fassmann and Musil 2013). Instead, it demonstrated that it is not realistic to build a large representative sample of emigrants through a big, highly formalised national survey. One can conclude that none of these approaches is able to achieve a significant sample size in many countries without incurring costs so high that they would render the study unfeasible.
The solution applied in The Emigrant Communities of Latvia project includes several novel elements and tackles many of the problems of the previous studies. It draws on the fact that the Internet and social media have become an inseparable part of many migrants' lives. With the prevalence of Internet use, online surveys are becoming increasingly popular and commonplace. The biggest advantage of web surveys is the possibility of achieving a large sample in a substantial number of countries. However, web surveys have other advantages that are expected to increase respondents' willingness to cooperate and to answer the questions truthfully. These are: (i) the possibility of anonymity, which should ensure a better representation of irregular migrants than in previous studies; (ii) the ability of respondents to fill in the questionnaire at any time, and even to stop and continue later; (iii) the possibility of using simple and anonymous referrals, i.e., to 'share' the survey via Facebook, Twitter, etc. Methodological studies have shown that the way web surveys are conducted is unlikely to lead to distortions in comparison with other survey modes (Grandcolas et al. 2003).
The greatest risks associated with web surveys are the potential bias caused by self-selection and the difficulty of reaching certain socio-demographic groups via the Internet (Askitas and Zimmermann 2015; Bethlehem 2010). However, Eurostat data on Internet use are encouraging, as they show that in the EU 78% of people aged 16 or older have used the Internet during the last 3 months (Eurostat 2014). In the 16-24 age group, 94% are regular Internet users and 89% participate in social networking. Considering that most emigrants are young people (Fuller and Ward 2011) and that the Internet is important for migrants as a cheap means of communication with their friends and families at home, the percentage of Internet users among migrants, especially young migrants, can be predicted to be very high. Nevertheless, certain discrepancies and imbalances in the representation of various socio-demographic groups among survey respondents might remain.

Geographic Coverage and the Target Group
The Emigrant Communities of Latvia survey had the widest possible geographic coverage. It did not impose any limitations as to the geographic location of respondents, aiming at all countries in the world. Any Latvian or Latvian national abroad could participate in the survey, regardless of his or her current country of residence. The majority of our respondents, reflecting the Latvian diaspora in general, come from the UK, Ireland, the US, Germany, Norway, Sweden, Denmark, the Netherlands, Belgium, Russia, Canada, Finland, France and Austria, and in total 118 countries are represented in the dataset. For comparison, we also show, in Table 2.1, the distribution of Latvian nationals in different countries around the world according to official statistics. The Emigrant Communities of Latvia is the most inclusive migration study so far in terms of its target audience. All Latvians and Latvian nationals abroad were invited to participate in the survey, applying a broad and open definition of 'Latvian diaspora' based on identification with the Latvian nation and/or citizenship. Some respondents belonged to a minority ethnic group yet still considered themselves 'Latvians' or 'Latvian nationals'. Others may have given up their Latvian citizenship, or never had it in the first place, yet this did not preclude them from feeling part of the Latvian diaspora. Nine hundred and three respondents (6.4% of the total) belong to the 'old diaspora', i.e., those who left Latvia before 1991, whereas the majority are members of the 'new diaspora' (Fig. 2.1).
In general surveys (e.g., the EU Labour Force Survey or EU-SILC), people who are unable to communicate in the survey language are sometimes not interviewed, which excludes a significant proportion of migrants. This is not the case for our survey.

In this survey we also take into account the liquid nature and diverse patterns of migration. An increasing number of emigrants do not settle permanently in just one country but alternate between countries or have a home in both. According to our survey, such people make up 17% of emigrants (Fig. 2.2), and they were also included in the survey.
The lower age limit of the survey was set at 15 years because, in Latvia, parental consent would be required for younger children. The few respondents who were under 15 were excluded from the dataset.
A bias in the sample might occur because people with plenty of free time are more likely to complete a survey than, for example, those who are very busy and/or at work. This survey applied an innovative approach, offering respondents the choice of filling in a shorter version of the questionnaire (20 min) or the full version (30 min). Those who chose the shorter version were presented with one of two rotating modules, while the core questions were maintained for all respondents. This methodological innovation allowed more questions to be included in the survey and helped reduce the loss of respondents due to attrition. Of our respondents, 66% chose to fill in the full version. After the survey period, the average interview length was calculated at 35 min, showing a high level of motivation among respondents to voice their opinions. Our survey design also made it possible to take a break from filling in the questionnaire and return to it later.
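The split between core questions and rotating modules can be sketched as a small assignment routine. This is a hypothetical illustration, not the survey platform's actual logic: the module names are placeholders, the full version is assumed to contain both modules, and short-version respondents are assumed to alternate between modules in order of arrival (random assignment would serve equally well).

```python
from dataclasses import dataclass

CORE_BLOCKS = ["core"]                        # asked of every respondent
ROTATING_MODULES = ["module_A", "module_B"]   # hypothetical module names


@dataclass
class ModuleRotator:
    """Assigns questionnaire blocks; short-version respondents alternate between modules."""
    short_count: int = 0  # number of short-version questionnaires assigned so far

    def blocks_for(self, version: str) -> list[str]:
        if version == "full":
            # Assumption: the full version contains the core and both modules.
            return CORE_BLOCKS + ROTATING_MODULES
        if version == "short":
            # Core questions plus one module, rotating A, B, A, B, ...
            module = ROTATING_MODULES[self.short_count % len(ROTATING_MODULES)]
            self.short_count += 1
            return CORE_BLOCKS + [module]
        raise ValueError(f"unknown questionnaire version: {version!r}")
```

Because every respondent answers the core block, core items can be analysed on the full sample, while each rotating module is still answered by all full-version respondents plus roughly half of the short-version respondents.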
Researchers prepared a list of dissemination channels through which information about the survey could be sent. It included 187 different diaspora organisations, diaspora associations (choirs, dance collectives, etc.), Latvian cultural centres, parishes and other organisations popular among the Latvian diaspora. In most cases they were contacted electronically, but sometimes information pamphlets and posters were delivered physically, to be distributed among members of these organisations. Information pamphlets and posters were also distributed, with the help of the Ministry of Foreign Affairs, to almost all Latvian embassies in Europe and placed there for visitors to see (Fig. 2.3). This was an efficient way of disseminating information because parliamentary elections took place during the fieldwork, so many members of our target group visited embassies to vote at the polling stations.
In addition, online groups of Latvian diaspora members were researched, and information about the survey distributed to them too. Information about the survey was distributed to 37 representatives of diaspora newspapers. Many re-published the press releases and placed the information banners on their website, asking readers to participate in the survey. With the help of the state language agency, the information was sent out to the Latvian school network abroad, which includes more than 100 weekend schools. In order to inform more people about the project, distribute information about how to take part in the survey and raise motivation to participate, researchers engaged in regular interviews with various media, including releasing some initial results. Interviews were given both to Latvian and Russian media. Three press releases were prepared and distributed, informing potential respondents about the survey. Researchers also took part in several conferences presenting interim as well as final results. The link to the questionnaire together with an invitation to participate in the survey was placed on the project website www.migracija.lv, in Latvian, Russian and English. People filling in the questionnaires could also Tweet information about the project from the website, or share it on Facebook, Google+, etc. with their friends and acquaintances, which many did.
Many respondents were recruited via the social networking site draugiem.lv, which is one of the most popular social networking sites in Latvia. Considering that some emigrants might prefer other social networking sites, respondents were also recruited by placing information about the survey on facebook.com, vkontakte.com, odnoklassniki.ru and latviesi.com.

Fig. 2.3 Information materials used to recruit respondents
Another important channel for recruiting respondents was online news sites. The three largest news portals in Latvia, Delfi, TvNet (and Apollo) and Inbox, displayed information about the project on their websites in Latvian and Russian for almost the entire period of fieldwork.
Information banners were also placed on other websites frequented by Latvians abroad: the Ministry of Foreign Affairs of the Republic of Latvia, the State Employment Agency, the Latvian Association of Local and Regional Governments and several municipality websites.
In order to reach emigrants who are comparatively inactive, i.e., those who do not read news portals, use social networking sites or attend any institutions or organisations, information about the survey was also distributed using Google AdWords. Invitations to take part in the survey were shown to people who used Google Search from outside Latvia and searched (in Latvian or Russian) for keywords such as Latvian embassy, Latvia, news in Latvia, work in the UK, Latvians in Ireland, Latvijas Radio 2, etc.
The statistical overview in Table 2.2 shows that 23.6% of respondents whose path to the questionnaire could be identified clicked on the link on the project website www.migracija.lv. These are people who heard or read about the project in the media, saw the information posters in embassies or organisations, or were told about the survey by their friends or relatives. Another 14.7% used the direct link to the questionnaire; most likely they found the link in one of the media publications or were sent it by their friends. Approximately 10% of those whose path to the questionnaire could be identified were informed about, and attracted to, the survey [...]. Among the Russian-language recruiting channels, the most important was the news portal Delfi RUS, followed by Odnoklassniki and Vkontakte. These figures do not give a very precise account of how many respondents each of these portals or sources attracted, as the information may have been seen and interest created by one source while the respondent clicked through to the questionnaire from another (e.g., the project website).
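Shares like those in Table 2.2 are computed only over respondents whose entry path is known. The sketch below assumes, hypothetically, that each entry carries the referring URL (empty or `None` when unknown) and uses an illustrative host-to-channel table; the chapter does not describe how the paths were recorded or how direct-link visits were distinguished, so the sketch does not attempt the latter. Like any last-click attribution, it credits only the final source, which is exactly the imprecision noted above.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical host-to-channel table, for illustration only.
HOST_TO_CHANNEL = {
    "migracija.lv": "project website",
    "draugiem.lv": "draugiem.lv",
    "facebook.com": "Facebook",
    "rus.delfi.lv": "Delfi RUS",
    "odnoklassniki.ru": "Odnoklassniki",
}


def channel_of(referrer):
    """Map a referrer URL to a channel label, or None if the path cannot be identified."""
    if not referrer:
        return None
    host = (urlparse(referrer).hostname or "").removeprefix("www.")
    return HOST_TO_CHANNEL.get(host, "other")


def channel_shares(referrers):
    """Percentage of respondents per channel, among those whose path could be identified."""
    labels = [c for c in map(channel_of, referrers) if c is not None]
    if not labels:
        return {}
    counts = Counter(labels)
    return {channel: 100 * n / len(labels) for channel, n in counts.items()}
```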
The fieldwork took place between 4th August and 31st October 2014. To increase response rates, the deadline for filling in the questionnaire was extended twice.

Cleaning the Dataset and Final Sample Size
The dataset was rigorously cleaned before analysis commenced. The initial dataset contained 15,760 entries.
• First, we excluded 1235 questionnaires in which the respondent had answered only the first few questions. We assumed that most of these were people simply checking what the survey was about, so their answers would not be reliable.
• 408 entries were identified as duplicates and deleted.
• Five entries were excluded because the respondents did not meet the age requirement (under 15 years of age).
• 43 questionnaires were excluded on the basis of low reliability: the logical checks developed to test the consistency of answers flagged them as 'not reliable'.
The total number of interviews in the final dataset was 14,068. Of these, 9284 respondents (66% of the total) filled in the questionnaire to the end and 4784 partially completed it. This substantial number of respondents makes it the largest survey of emigrants from a single country ever conducted in Europe. Based on estimates of the size of the Latvian diaspora, more than 5% of Latvian diaspora members abroad participated in the survey.
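The exclusion steps listed above can be expressed as a single filtering pass. This is a hypothetical sketch: the chapter does not say how duplicates were detected or how many answers counted as 'only the first few questions', so the `fingerprint` field and the `MIN_ANSWERED` cut-off are assumptions, and the steps are applied in the order listed, keeping the first copy of any duplicate.

```python
from dataclasses import dataclass
from typing import Optional

MIN_ANSWERED = 5   # hypothetical cut-off for "only the first few questions"
MIN_AGE = 15       # lower age limit of the survey


@dataclass
class Entry:
    fingerprint: str            # hypothetical key used to detect duplicate submissions
    n_answered: int             # number of questions answered
    age: Optional[int]          # None if the respondent did not report an age
    passes_logic_checks: bool   # result of the consistency checks
    completed: bool             # True if the questionnaire was filled in to the end


def clean(entries):
    """Apply the exclusion steps in order; return kept entries, drop counts, complete and partial counts."""
    dropped = {"too_few_answers": 0, "duplicate": 0, "under_age": 0, "unreliable": 0}
    seen = set()
    kept = []
    for e in entries:
        if e.n_answered < MIN_ANSWERED:
            dropped["too_few_answers"] += 1
            continue
        if e.fingerprint in seen:
            dropped["duplicate"] += 1
            continue
        seen.add(e.fingerprint)
        if e.age is not None and e.age < MIN_AGE:
            dropped["under_age"] += 1
            continue
        if not e.passes_logic_checks:
            dropped["unreliable"] += 1
            continue
        kept.append(e)
    complete = sum(e.completed for e in kept)
    return kept, dropped, complete, len(kept) - complete
```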

Correcting the Biases by Using Survey Weights
The various groups in the diaspora population differ both in the intensity of their Internet use and in their willingness to volunteer as survey participants. Self-selection associated with web surveys (Bethlehem 2010; Grandcolas et al. 2003) is known to lead to under-representation of certain socio-demographic groups (McCollum and Apsite-Berina 2015). In The Emigrant Communities of Latvia survey, men were under-represented relative to women (the inclusion probability was 1.8 times lower for men than for women); older respondents were under-represented relative to younger respondents (the inclusion probability of those aged 55 or older was 2.6 times lower than that of those aged 15-24); and individuals with lower educational attainment were under-represented relative to those with higher educational attainment (the inclusion probability was 4.5 times lower) (Goldmanis 2015). However, the largest discrepancy was observed with regard to ethnicity: the inclusion probability of Russians was 6.6 times lower than that of Latvians (overall, 21% of respondents spoke Russian at home before leaving the country). No imbalance was observed with regard to the type of settlement. With unequal inclusion probabilities, the unweighted sample would be likely to yield biased (and inconsistent) estimates of population parameters. To correct for this, we applied survey weights that were inversely proportional to the respondents' estimated inclusion probabilities, conditional on a series of socio-demographic variables, including sex, age, level of education and occupation. It is well known that if these control variables capture most of the variation in inclusion probabilities, the weighted data will yield (approximately) unbiased and consistent estimators (Horvitz and Thompson 1952).
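Inverse-probability weighting can be sketched in a few lines. The functions below are a generic illustration, not the project's weighting code: `ht_total` is the Horvitz-Thompson estimator of a population total, and `weighted_mean` is its ratio (Hajek) form, which divides by the estimated population size. A useful property of the ratio form is that only relative inclusion probabilities matter, so a statement such as 'men were 1.8 times less likely to be included' is enough to reweight a mean even when absolute probabilities are unknown.

```python
def ipw_weights(inclusion_probs):
    """Horvitz-Thompson weights: the inverse of each respondent's inclusion probability."""
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return [1.0 / p for p in inclusion_probs]


def ht_total(values, inclusion_probs):
    """Horvitz-Thompson estimator of a population total."""
    return sum(w * y for w, y in zip(ipw_weights(inclusion_probs), values))


def weighted_mean(values, inclusion_probs):
    """Hajek (ratio) estimator of a population mean; invariant to rescaling all probabilities."""
    w = ipw_weights(inclusion_probs)
    return sum(wi * y for wi, y in zip(w, values)) / sum(w)
```

For instance, giving men a hypothetical inclusion probability of 0.01 and women 0.018 reproduces the 1.8 ratio; doubling both values would leave `weighted_mean` unchanged but halve `ht_total`.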
The conditional inclusion probabilities were estimated on the basis of official statistics on the distribution of immigrants from each country of origin in each country of destination, as provided by several sources: To approximate the joint distribution of the various control variables, a raking (data balancing) algorithm was applied to produce a joint distribution whose marginal distributions correspond to those given by the external data (as in Battaglia et al. 2004). If the socio-demographic variables used to compute the weights fully determined the inclusion probabilities, the weighted data would be fully representative of the underlying population (i.e., they would yield fully unbiased and consistent estimates of all population parameters of interest). However, we have to concede that in practice these inclusion probabilities will also be affected by a series of additional factors that we were unable to correct for with survey weights, either because these factors were truly unobservable or latent (such as a respondent's intrinsic propensity to volunteer to participate in surveys) or because we had no reliable data on their distribution in the population (as was the case with the distribution of Latvian immigrants by occupation in the aforementioned Latvian survey). Hence, some residual deviations from full representativeness will remain. However, these deviations are likely to be minor, of a similar order of magnitude to the deviations that non-response would cause in a simple random sample.
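Raking is usually implemented as iterative proportional fitting: the weights are repeatedly rescaled so that the weighted total of each category matches an external target, one variable at a time, until all margins agree. The function below is a minimal sketch of that generic technique, not the procedure of Battaglia et al. (2004) or of this project; it assumes positive starting weights, that every category present in the sample has a target, and that the targets are mutually consistent.

```python
from collections import defaultdict


def rake(weights, categories, targets, max_iter=1000, tol=1e-10):
    """Iterative proportional fitting (raking) of respondent weights.

    weights:    initial weights, one per respondent (e.g. inverse inclusion probabilities)
    categories: {variable: [category of respondent 0, category of respondent 1, ...]}
    targets:    {variable: {category: population total from external statistics}}
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        worst = 0.0
        for var, labels in categories.items():
            # Current weighted total of each category present in the sample.
            current = defaultdict(float)
            for wi, c in zip(w, labels):
                current[c] += wi
            # Rescale everyone in a category so that its total hits the target.
            factors = {c: targets[var][c] / current[c] for c in current}
            worst = max(worst, max(abs(f - 1.0) for f in factors.values()))
            w = [wi * factors[c] for wi, c in zip(w, labels)]
        if worst < tol:  # every margin was already (almost) matched during this pass
            break
    return w
```

Only the marginal targets are needed, which is the point made above: the joint distribution of the control variables is approximated without external data on their cross-classification.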
The latter point is worth reiterating. While an inherently self-selected sample, such as that of a web survey, might seem fundamentally different from a properly random sample (even with non-response), the stochastic processes determining the final sample are in fact almost identical in both cases, as long as there is substantial non-response in the simple random sample and every individual in the population has a positive probability of being included in the 'self-selected' web sample. Regardless of whether the respondents' choice is one of opting in (as in the web survey) or opting out (as in the simple random sample), this choice results in final inclusion probabilities that depend on the characteristics of individual respondents. Correcting for variation in these probabilities in the case of a web survey is exactly equivalent to using post-stratification weighting to correct for non-response in the case of a random sample. The differences between the two cases are only ones of degree, with variations in inclusion probabilities likely to be larger for the self-selected sample. The bias can increase if a study relies on just one source for recruiting respondents. Hence, to improve the representativeness of the sample and to reach respondents who differ in age, gender, occupation and other characteristics, it is important to employ a wide range of recruitment channels and communication platforms and to aim for as large a sample as possible, as The Emigrant Communities of Latvia study did.
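The equivalence can be seen in the post-stratification formula itself: each respondent's weight is the population share of their cell divided by the sample share of that cell, and this formula is the same whether respondents ended up in the sample by opting in or by not opting out. The sketch below is a generic illustration with hypothetical cell labels. It also shows why every population group needs a positive inclusion probability: a cell with no respondents cannot be reweighted at all.

```python
from collections import Counter


def poststratification_weights(cells, population_shares):
    """Weight = (population share of a respondent's cell) / (sample share of that cell)."""
    n = len(cells)
    sample_counts = Counter(cells)
    missing = set(population_shares) - set(sample_counts)
    if missing:
        # A population cell with no respondents cannot be corrected by weighting.
        raise ValueError(f"no respondents in cells: {sorted(missing)}")
    return [population_shares[c] / (sample_counts[c] / n) for c in cells]
```

When the population shares sum to one, the weights sum to the sample size, so weighted and unweighted analyses work on the same scale.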

Data Storage and Protection
The Emigrant Communities of Latvia project treats the confidentiality of data and the protection of respondents' identities with the utmost care. The dataset is stored on a safe server at the Institute of Philosophy and Sociology, accessible only to a restricted group of researchers. To protect the identity of respondents, the interviews were anonymised before being placed on the safe server by deleting any information with the potential to identify the respondent (such as their e-mail address if the respondent wrote it in the questionnaire, their IP address, token information, etc.). In addition, all researchers signed a confidentiality declaration committing to non-disclosure of any information that could potentially identify respondents and agreeing not to share the dataset outside the team of researchers for two years after the end of the project.
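The anonymisation step can be sketched as dropping identifying fields from each exported record and scrubbing e-mail addresses that respondents typed into free-text answers. The field names and the simple e-mail pattern are assumptions for illustration; the survey platform's actual export columns are not given here, and a pattern like this will not catch every identifier (names or addresses in free text, for example, still need manual review).

```python
import re

# Hypothetical names of directly identifying fields in the exported data.
IDENTIFYING_FIELDS = {"email", "ip_address", "token"}

# Simple pattern for e-mail addresses typed into open-ended answers.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def _scrub(value):
    """Replace e-mail addresses inside string values; leave other values unchanged."""
    return EMAIL_RE.sub("[removed]", value) if isinstance(value, str) else value


def anonymise(record: dict) -> dict:
    """Return a copy of the record without identifying fields and with e-mails scrubbed."""
    return {k: _scrub(v) for k, v in record.items() if k not in IDENTIFYING_FIELDS}
```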
The personal data of respondents are not available and will not be made available to any other organisations or institutions (state or other) outside the University of Latvia and the team of project researchers. The data are analysed only in aggregated form, following best scientific practice.

Target Group and Recruitment of Respondents
As part of the project, 159 partly structured in-depth interviews were conducted in the countries where the Latvian diaspora is largest: the United Kingdom, Ireland, the United States, Germany, Sweden and Norway. In addition, in-depth interviews with return migrants (18) and diaspora policy experts (16) were conducted in Latvia. The target group of the in-depth interviews was representatives of the 'new diaspora', i.e., those who left Latvia after 1991. The 'old diaspora' has been covered much more extensively by in-depth interviews in previous research, for example by Baiba Bela, Ilze Garoza, Māra Zirnīte, Ieva Garda and others (Bela 2010; Zirnīte 2010; Zirnīte and Lielbārdis 2015).
Several researchers and experts were involved in the collection of data, and the methodology was strictly coordinated between them. Respondents were recruited using social networking sites (facebook.com, linkedin.com, maminklub.lv, draugiem.lv), organisations, institutions and in some cases snowballing and personal referrals. In cases where personal referrals were used, researchers avoided interviewing close friends and relatives. In instances where institutions, organisations and experts needed to be contacted, researchers agreed between themselves who the contact points would be in order to avoid inconsistencies in communication.
One of the priorities of the research team was to ensure the diversity of respondents in terms of:
• Age
• Gender
• Social class/employment status
• Time spent abroad
• Family status (e.g., children/no children)
This strategy ensured that the interviews provided insight into the motivations and attitudes of people with different life experiences and socio-economic backgrounds. Most researchers applied grounded theory (Strauss and Corbin 1990), aiming to achieve 'theoretical sampling' and 'data saturation' as precisely as possible when recruiting respondents.
No monetary compensation was offered to respondents but where possible researchers left behind information booklets about the project, as well as business cards with their contact information in case respondents had any questions. In some cases, token symbols of gratitude were left in the form of chocolates or sweets. Respondents were also informed about the quantitative survey and invited to participate in that too.

Interview Guidelines
To ensure that information on certain themes and issues could be compared across countries, some topics were included in all of the in-depth interviews with emigrants. Most of these topics also mirror the topics of the quantitative survey, which supports the integration of quantitative and qualitative data. Hence, the in-depth interviews have the potential to provide a deeper understanding of the quantitative data. With some variations, the topics included in all in-depth interviews with emigrants were as follows:
• Descriptions of the migration experience, motivation for emigration and, where applicable, return migration;
• Articulation of identity, sense of belonging, historical memory, celebration of festivities;
• Significance of family, children, parents, social networks and the maintenance of social contacts in emigration and after returning to Latvia; social networking online, use of social media;
• Education in Latvia and abroad;
• Employment, professional mobility and acquisition of information on employment opportunities;
• Return migration plans: evaluation and impact on personal decisions on whether or not to return.
Interviews were conducted as partly structured in-depth interviews, following interview guidelines. The method also allowed for some flexibility with regard to getting more detailed information on some emerging topics important for a better understanding of the specific research question. The guidelines differed from one location and one researcher to the next, depending on the main topic of interest. Draft guidelines were developed on each of the aforementioned topics which the researchers built on in their interviews, in addition to the main prescribed topics of the interview. The full guidelines were checked and approved by the coordinators of the qualitative research group. The length of the interviews with adults ranged from 26 min to 2 h 16 min, with most interviews taking slightly more than 1 h. Interviews with children were shorter.

Data Storage and Protection
All in-depth interviews were transcribed and stored on a safe server at the Institute of Philosophy and Sociology, accessible only to the administrative assistant and a restricted group of project researchers. Researchers prepared a description of each interview (an interview protocol) including basic information on the interview and the respondent, such as:
• The language, length and place of the interview, and the interviewer;
• The respondent's place of birth, country of emigration, time spent abroad, age, education, gender, family status, children, employment status, citizenship and history of activism;
• The main topics of the interview, including the respondent's opinion or experience with regard to each topic.
The interview protocols are important for the in-depth understanding and interpretation of answers in the light of the respondent's socio-demographic characteristics, as well as the specific circumstances that the respondent is or was in. These protocols also make it easier to find information in the interview material, for example, if a researcher wants to analyse what people with certain characteristics say about a topic in different countries, or how respondents with different characteristics feel about it.
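An interview protocol can be thought of as a small structured record, which is what makes the look-ups described above straightforward. The sketch below models only a hypothetical subset of the protocol fields (the entry-code format, field names and topic labels are illustrative assumptions) together with a filter that returns the protocols covering a topic for respondents with given characteristics.

```python
from dataclasses import dataclass, field


@dataclass
class InterviewProtocol:
    code: str              # pseudonymous entry code (hypothetical format)
    language: str          # language of the interview
    country: str           # country of emigration
    age: int
    gender: str
    years_abroad: float
    topics: dict[str, str] = field(default_factory=dict)  # topic -> summary of opinion/experience


def find(protocols, topic, **criteria):
    """Return protocols that cover `topic` and match every given attribute value."""
    return [
        p for p in protocols
        if topic in p.topics and all(getattr(p, k) == v for k, v in criteria.items())
    ]
```

A call such as `find(protocols, "return", country="UK")` would list the UK interviews in which return migration was discussed.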
Before being placed on the safe server the interviews were anonymised, in order to protect the identity of respondents. In addition, all researchers signed confidentiality declarations, committing to non-disclosure of the personal information of their respondents.
Agreement was reached with the Latvian National Oral History Centre about the possibility of archiving and depositing the interviews in the Centre's Archive (www.dzivesstasts.lv). This would allow the interview material to have more impact on the scientific community and to be preserved for many years as a testimony of our time. A consent form was prepared and presented to the respondents. Respondents were asked whether they would agree to their interview being deposited in the National Oral History Centre Archive (led by Dr. Māra Zirnīte) and, if so, in what form it could be accessed (including whether or not the respondent's name could be disclosed) and by whom (for instance, just the researcher, the project researchers, University of Latvia researchers or anyone). They were also asked to specify any other limitations on the use of the interview. If the respondent did not agree to the interview being included in the Archive, their wish was respected and the interview was not deposited. The same applied to interviews for which consent forms were not offered or collected. If the respondent allowed the interview to be deposited in the Archive but did not permit disclosure of their name, their anonymity was ensured: the consent form is not publicly available, and the entry was saved under a pseudonym and an entry code.
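The deposit rules described above amount to a small decision procedure. The sketch below is a hypothetical encoding of them: the field names and access-level labels are illustrative, and an interview without a consent form is represented by `None`. Interviews are deposited only with explicit consent, and the respondent's name appears only if disclosure was permitted; otherwise the entry carries a pseudonym.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Consent:
    deposit_allowed: bool            # may the interview be deposited in the Archive?
    name_disclosure_allowed: bool    # may the respondent's real name be shown?
    access_level: str                # e.g. "researcher", "project", "university", "anyone" (hypothetical labels)
    restrictions: str = ""           # any other limitations specified by the respondent


def archive_entry(real_name: str, consent: Optional[Consent], pseudonym: str, entry_code: str):
    """Return the archive entry to deposit, or None if the interview must not be deposited."""
    if consent is None or not consent.deposit_allowed:
        # No consent form, or consent refused: the interview is not deposited.
        return None
    shown_name = real_name if consent.name_disclosure_allowed else pseudonym
    return {
        "name": shown_name,
        "entry_code": entry_code,
        "access": consent.access_level,
        "restrictions": consent.restrictions,
    }
```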

Conclusions and Discussion
The Emigrant Communities of Latvia project has made an important theoretical and methodological contribution to the field of migration studies, and has laid the foundations for future research on emigrants, particularly from the perspective of sending countries.
The main contribution of the project concerns quantitative data collection. Compared to previous studies, it has a number of important methodological advantages:
1. By conducting a survey aimed specifically at emigrants, we avoided the main limitation typical of general surveys (ESS, ISSP, Eurobarometer), namely that the sub-groups of immigrants are too small for meaningful analysis (Ersanilli and Koopmans 2013; Kraler and Reichel 2010).
2. By developing a new questionnaire instead of relying on existing data sources, we were able to include all the necessary items and crucial social background variables that available studies such as the EU LFS do not always cover (Ersanilli and Koopmans 2013; Kraler and Reichel 2010; Reichel 2010; Westin 2015).
In surveys such as the LFS, people who are unable to communicate in the official language or languages of the country are not interviewed, which effectively excludes a significant proportion of migrants. This results in a bias against immigrants whose proficiency in the language of their country of residence is not good enough to answer survey questions (Chiswick et al. 2004; Dronkers and Vink 2012; Platt et al. 2015). This survey avoids that bias: the questionnaire was produced in three languages, namely the official language of the country of origin (Latvian), English and Russian.
Immigrants with an unstable or irregular legal status in the country of residence might avoid participating in regular population surveys (Dronkers and Vink 2012). The anonymity provided by a web survey can encourage them to participate.
Harmonisation of translations, methods and weighting is often problematic in major cross-national surveys. In our case, data collection and weighting were centrally coordinated, careful translation procedures were applied, and each respondent completed the questionnaire in the language they understood best. The quality of the questionnaire was further tested using cognitive interviews and web probing (Behr et al. 2012; Willis 2005).
Although this study employed a sophisticated procedure to calculate statistical weights, reaching people who do not use the Internet remains a legitimate concern in studies of this kind. This applies especially to marginal groups, such as the poor and those with little education, people living on the street, Roma communities and those working in low-paid agricultural jobs in remote rural areas, as well as to countries where Internet penetration is lowest. These groups, which are likely to be under-represented or missing in a web survey, may be especially important for certain kinds of analysis. To address this drawback of web surveys, future studies should ideally include a supplementary survey of non-Internet users, targeting people who do not use the Internet at all or hardly ever (e.g., those who have not used it in the past three months).
Another challenge is that studies conducted at a single point in time cannot overcome the endogeneity problem or rule out reverse causality between integration policies and societal outcomes, as this relationship may be bi-directional or dynamic (Bilgili et al. 2015). Hence, it is important to have information on immigrants at various points in the settlement process (Platt et al. 2015). Following newcomers who arrived in the country at a given point in time provides the best data for evaluating the integration process and makes it possible to reveal the factors behind different life trajectories (Bilgili et al. 2015; Kraler and Reichel 2010; Reichel 2010; Wingens et al. 2011). In contrast, a simple comparison of two moments in time, as in cross-sectional studies, partly concerns different groups of individuals and cannot distinguish the time effect (the effect of length of residence in the country) from the cohort effect (the effect of arriving in the country in a particular period). Despite the clear advantages of longitudinal data, such data remain rare in migration studies (Kraler and Reichel 2010). Researchers sometimes use a synthetic cohort design combining different surveys (Martinovic et al. 2009; Beauchemin et al. 2010), but this is not an ideal solution. Research should therefore, whenever possible, aim at a longitudinal panel design. In The Emigrant Communities of Latvia survey, respondents were asked whether they would take part in future studies on migration and, if so, to leave an e-mail address to which an invitation could be sent. Fifty-four percent of all respondents (7,649 in total) left an e-mail address for this purpose; an even higher proportion agreed to be contacted again in a recent study of Polish migrants in the UK (Platt et al. 2015).
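Why a single cross-section cannot separate the time effect from the cohort effect can be made explicit with a stylised linear model. The model is an illustrative assumption, not one estimated in the project: y is an outcome for respondent i, t is the survey year, c_i is the year of arrival (the cohort) and d_i is the length of residence, so that d_i = t - c_i by definition. For the panel case, y_{i,t} denotes the outcome of respondent i observed in year t.

```latex
\begin{aligned}
d_i &= t - c_i \\
y_i &= \alpha + \beta\, d_i + \gamma\, c_i + \varepsilon_i \\
    &= (\alpha + \beta t) + (\gamma - \beta)\, c_i + \varepsilon_i
       \qquad \text{(single survey year } t\text{)} \\
y_{i,t_2} - y_{i,t_1} &= \beta\,(t_2 - t_1) + (\varepsilon_{i,t_2} - \varepsilon_{i,t_1})
       \qquad \text{(same person observed at } t_1 < t_2\text{)}
\end{aligned}
```

In a single cross-section t is the same for everyone, so only the difference γ - β can be estimated: a different outcome among earlier arrivals may reflect longer residence, a different arrival cohort, or both. Following the same individuals over time identifies β from within-person change, but only under the further assumption that there is no separate period effect; a change in conditions affecting everyone between t_1 and t_2 would be absorbed into the same term.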
In contrast to previous studies (e.g., Schneider and Holman 2011), subsequent waves of the study should ideally include those who have already returned home or re-emigrated (using an adapted return-migrant questionnaire, similar to Krings et al. 2013). This would avoid the potential bias that arises because those who are unsuccessful (e.g., the unemployed) or, conversely, those who have achieved their emigration goals are likely to return to their home countries (Kleinepier et al. 2015; Stark 1991). To ensure the comparability of the first and subsequent waves and to enable comparison of different newcomer cohorts, the next waves should not focus only on those first-wave respondents who expressed interest in further participation, but should essentially replicate the research design of the first wave, a strategy similar to that used in the POLPAN longitudinal panel survey.
The use of qualitative methods in this study has also yielded important insights, in particular for situations in which information is collected in different national contexts by researchers focusing on connected yet different themes. Interview guidelines and methods must be coordinated and carefully planned to allow overarching comparisons between contexts. Depositing qualitative interviews in a data archive has not yet become standard practice among researchers, yet it would be invaluable in making the collected material available for future use by other scholars and, where respondents agree, by the general public. Consent forms should always be used and should specify in advance the permissions and limitations that apply to the use of each interview. Interview protocols containing the main information on respondents are useful for navigating quickly through the information collected.
Overall, this new methodology of surveying migrants has far-reaching potential to be applied to the study of various migrant groups in Europe and beyond. Importantly, the study described has tested and empirically proven the potential of Web surveys in collecting the opinions of large populations of migrants, and has provided insight into calculating survey weights for multiple countries based on external data.
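As a minimal illustration of what weighting a multi-country sample against external data can involve, the sketch below uses simple cell-based post-stratification: every respondent in a cell (here, a country of residence) receives the ratio of that cell's share in an external population estimate to its share in the sample. This technique is an assumption chosen for illustration and is not the procedure used in the project; the function name `poststratification_weights` and all figures are hypothetical.

```python
from typing import Dict


def poststratification_weights(
    sample_counts: Dict[str, int],
    population_totals: Dict[str, float],
) -> Dict[str, float]:
    """Cell-based post-stratification: weight = population share / sample share.

    sample_counts     -- respondents per cell (e.g. country of residence)
    population_totals -- external estimate of the target population per cell
    """
    extra = set(sample_counts) - set(population_totals)
    if extra:
        # Every sampled cell needs an external benchmark.
        raise ValueError(f"no population total for cells: {sorted(extra)}")

    n = sum(sample_counts.values())          # unweighted sample size
    big_n = sum(population_totals.values())  # benchmark population size

    weights: Dict[str, float] = {}
    for cell, pop in population_totals.items():
        count = sample_counts.get(cell, 0)
        if count == 0:
            # An empty cell cannot be weighted up; it would have to be merged
            # with a neighbouring cell or covered by another method.
            raise ValueError(f"no respondents in cell {cell!r}")
        weights[cell] = (pop / big_n) / (count / n)
    return weights


if __name__ == "__main__":
    # Hypothetical figures for two countries of residence.
    sample = {"country_A": 600, "country_B": 400}
    population = {"country_A": 80_000, "country_B": 20_000}
    w = poststratification_weights(sample, population)
    print(w)
    # By construction, the weighted sample size equals the unweighted n.
    print(sum(sample[c] * w[c] for c in sample))
```

With these hypothetical figures, country_A (80% of the benchmark population but 60% of the sample) receives a weight of about 1.33 and country_B a weight of 0.5. Real multi-country designs typically weight on several variables at once, for example by raking (iterative proportional fitting), and external estimates of emigrant populations are themselves uncertain.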
The importance of evidence-based policy-making is being acknowledged by a growing number of experts, and in this context studies like The Emigrant Communities of Latvia play a crucial role. The strong response from the project's partners has been truly encouraging, showing that the Latvian diaspora has not lost touch with its homeland and that there is great potential for future cooperation in research and beyond.