1 Introduction

Migrant surveys present a number of methodological challenges, including a possible lack of sampling frames, migrants’ status as a hard-to-reach and mobile population, and, in most cases, multilingualism. Depending on the national context, the extent of these difficulties may vary. As far as sampling frames are concerned, few countries maintain centralized population registers that provide accurate information on migrants. The situation in Denmark, Sweden, Italy, and Spain is better than in most countries, since their registers provide data on inhabitants’ characteristics, including various categories of non-citizens. Nevertheless, using these registers as sampling frames demands some caution (Careja & Bevelander, 2018; UNECE, 2019), since they tend, on the one hand, to underrepresent short-term migrants and migrants whose legal status is unclear, meaning they are not readily “observable” by the State. On the other hand, these registers tend to overrepresent foreign-born persons whose emigration from a receiving country is not always easy to track. Population registers in many other countries are even more problematic with respect to migrant sampling. They may, for example, provide a limited number of categories that can be used to identify migrants, or they enforce rules that do not enable or motivate migrants to register. In some cases, there is no national register but several registers specific to particular communities (Salentin & Schmeets, 2017; Sanguilinda et al., 2017), and in the absence of a population register, scholars can use censuses and other statistical data as sampling frames (Kühne & Kroh, 2017; Reichel & Morales, 2017). Even migration-focused statistics do not, in most cases, represent the entirety of the migrant population of a receiving country accurately, especially in cases where a considerable share of migrants are undocumented, as in the USA (Hoefer et al., 2012).

In situations in which population registers or other data sources do not contain categories that enable an explicit differentiation between migrants and non-migrants, onomastic sampling can be used. While this method is effective in some contexts (Prandner & Weichbold, 2019; Salentin, 2014), it does not work in others, for example, in countries with a large non-migrant multiethnic population. Russia is one such case, since it has large numbers of people without a migrant background who have names and surnames that nonetheless closely resemble those of migrants from Central Asia and the Caucasus, due to the shared nomenclature traditions of Islamic culture.

Other methods such as random route walking (Reichel & Morales, 2017), time-location sampling (Agadjanian & Zotova, 2012), the area cluster approach (Vigneswaran, 2009), and respondent-driven sampling (Zotova et al., 2016) have limited efficiency when migrants come from different countries and are spatially dispersed in the receiving state. Moreover, the conditions of work and accommodation for low-qualified migrants in certain contexts can imply limited access for researchers. Such contexts may include “closed” workplaces, e.g., construction sites, large bazaars, and the “back-offices” of the catering industry. Long working hours, workplace residency, or employer-organized dormitories may constitute further barriers, as may low levels of trust and a reluctance to open apartment doors to unfamiliar persons. Random digit dialing can be too laborious if the share of migrants in the population is low, necessitating a high number of calls and screening to recruit a sufficient number of survey participants.

With respect to these circumstances, a new source of hope for migration scholars has been the development of digital information and communication technologies (ICT) and the rapid increase of the Internet penetration rate (World Bank, 2019). The active utilization of ICT both in the “mainstream” survey industry (Toepoel, 2015) and among migrants (Bucholtz, 2018; Dekker et al., 2015) has paved the way for the use of ICT in migration studies. As of today, this is done in a variety of ways. Researchers conduct ethnographic studies of online migrant communities (Mateos & Durand, 2012) and use big data on social networking sites (SNS) and other online services to estimate the number of migrants from specific countries of origin in destination countries (Spyratos et al., 2018; Zagheni et al., 2017), or even to assess the degree of their integration (Dubois et al., 2018; Herdağdelen et al., 2016). Significantly, researchers also conduct surveys online. In this latter case, SNS assume a greater importance (Hu & Wang, 2015; Wei & Gao, 2017) as a venue for recruiting respondents through posts in specific interest groups (Moreh, 2019), snowball sampling (Herz, 2015) or using special targeting instruments provided by SNS for advertising purposes (du Plooy et al., 2018; Pötzschke & Braun, 2017).

The advantages of sampling migrants using SNS and surveying them online are clear: researchers can contact a geographically dispersed population within a short timeframe (McGhee et al., 2017; Sue & Ritter, 2012). Another advantage is that online, researchers can interact with a population that may otherwise be difficult to reach for various reasons. As mentioned previously, these difficulties may include the “closed” character of migrants’ workplaces and accommodation or inadequate documentation (cf. research on Internet surveys of illicit drug users [Miller & Sønderlund, 2010; Temple & Brown, 2011]). Importantly, web-based surveys are better equipped to address sensitive questions than are other survey modes (Milton et al., 2017), which can be important for migration studies (for example, for questions about migrants’ documentation or lack thereof). Moreover, online surveys make it easier to coordinate and organize surveys across multiple countries simultaneously. Because survey software offers multilingual interfaces, such surveys do not require the deployment of interviewers who speak all of the migrants’ native languages. Indeed, interviewers are not required at all. Researchers conducting such surveys would most likely require translators during the development stage of a questionnaire, the testing of its online implementation, and when cleaning and coding the data (unless, of course, the researcher is proficient in the languages spoken by the target population).

The advantages of web-based surveys in combination with SNS-based recruitment make this method promising for a significant number of research contexts, including Russia. The country does not have a population register, its census data on migrants tend to be inaccurate and incomplete (Mkrtchyan, 2011), and its statistics on foreign citizens are too limited to serve as a sampling frame, not to mention the even more inadequate data on persons with a migrant background (those who are naturalized or who are migrants’ descendants). Migrants in Russia are dispersed across the country, regions, and cities (Footnote 1). At the same time, the Internet and SNS penetration rate among migrants is quite high. According to our bilingual face-to-face survey of Kyrgyz migrants in Moscow in 2014 (Varshaver et al., 2014), 63% of respondents used SNS. Today, several years later, we can expect this figure to be higher for three reasons: the penetration rate of the Internet has continued to rise (World Bank, 2019); there are more SNS today, some of which were not included in our questionnaire in 2014 (e.g., Instagram); and migrants to Russia are mostly young (Footnote 2) and thus generally more internet “savvy.” In total, from 2016 onwards, we conducted five web-based surveys that employed SNS-based targeting of first- and second-generation migrants from Central Asia and the South Caucasus (Footnote 3).

While research has indicated the advantages of sampling migrants using targeting on SNS (du Plooy et al., 2018; Pötzschke & Braun, 2017), the drawbacks of this method, and even more so potential solutions to them, remain an almost unexplored field. Among the most significant of these drawbacks is the possibility that the method may result in biased samples. We found only one paper that explicitly addresses the problem of biases: in their study of Polish migrants in the UK, McGhee et al. (2017) gathered data online via Polish-language Facebook groups and Polish online media. Before presenting their substantive results, the authors compared the composition of their sample with the Annual Population Survey (“the largest ongoing household survey in the UK, based on interviews with the members of randomly selected households” [Nomis. Official labour market statistics, 2020]). They documented several discrepancies in the socio-demographic characteristics of the two samples in terms of gender, age, and education, but concluded that the offline Annual Population Survey also did not provide a representative picture of migrants. Significantly, they did not propose any further steps to remedy the problem.

In this chapter, we step away from a focus on the “bright side” of the method and instead explore the biases it may present and propose solutions that may help to counteract them. We base our analysis on material from the five web-based migrant surveys we conducted using targeting on SNS. First, we describe the procedure for surveying migrants by targeting them on SNS, then we provide an outline of the major challenges we identified, and lastly, we delineate possible solutions, which we illustrate with material from one of the surveys. We conclude that, at present, the range of biases remains more considerable than the opportunities to adjust for them. Thus, it could be time to concede this difficulty and instead direct our efforts to exploring other approaches to data analysis and presentation that are more suitable for contexts of uncertainty.

2 Online Survey with Targeting on SNS: Description of the Procedure

In this section, we provide a general outline of the procedure for conducting surveys using targeting methods on SNS. This includes creating a questionnaire and an advertisement, as well as defining targeting criteria.

A starting point is uploading a questionnaire to a special online service for conducting surveys (e.g., SurveyMonkey) and creating an advertisement that includes a link to the survey on an SNS. An advertisement consists of an image, explanatory text, and a motivational button (Fig. 3.1). In our surveys, the pictures may contain national symbols (such as flags) or photographs of people and landscapes (Fig. 3.2). The explanatory text is intended to appeal to a target audience and thus to serve as an additional sorting and attracting mechanism alongside the specified targeting variables. If the rules of the SNS allow for it, migrants’ native languages can be used in the advertisement (Footnote 4). After clicking on the advertisement, a user is directed to the survey’s landing page, which includes a language selection menu (if necessary) and a welcome text providing information about the survey. On the last page of the questionnaire, researchers may express their gratitude to the survey participants and share links to their website and SNS pages for the project and/or research team.

Fig. 3.1 Example of a banner used for male respondents of Armenian descent to advertise a survey of second-generation migrants on SNS (2017)

Fig. 3.2 Example of a banner used to advertise a survey of Tajik female first-generation migrants on SNS (2018)

The next step involves choosing targeting criteria that effectively define the audiences on a given SNS that the researchers seek to reach via advertisements. SNS differ in the targeting options they support. The most popular SNS worldwide, Facebook, shares an advertising platform with Instagram. This platform offers several ready-made targeting criteria that can be appropriate for migration studies, such as “expats from country X in country Y,” but they do not cover all nationalities comprehensively. For example, criteria for migrants from Central Asian countries in Russia are not available. In such cases, researchers can use other features of targeting, for example, the “interests” category. When conducting a Facebook survey of second-generation migrants from Central Asia, South Caucasus, and Ukraine in Russia, we chose an intersection of two features: location (Russia) and interests (interests that can be described using keywords related to the country or culture of their parents). As an example, our list of interests for targeting second-generation migrants from Kyrgyzstan included Kyrgyzstan, Kyrgyz language, and three major cities—Bishkek, Osh, and Talas.
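The intersection of the two features can be pictured with a minimal sketch. The user records and the `in_target_audience` helper are hypothetical and serve only as an illustration; how Facebook actually matches users to interests is not public, so this should not be read as a description of the platform's algorithm.

```python
# Interest keywords taken from the Kyrgyzstan example in the text.
KYRGYZ_INTERESTS = {"Kyrgyzstan", "Kyrgyz language", "Bishkek", "Osh", "Talas"}


def in_target_audience(user, location="Russia", interests=KYRGYZ_INTERESTS):
    """Return True if the user is in the location AND shares at least one interest."""
    return user["location"] == location and bool(interests & set(user["interests"]))


# Hypothetical user records (not real platform data).
users = [
    {"id": 1, "location": "Russia", "interests": ["Bishkek", "Football"]},
    {"id": 2, "location": "Russia", "interests": ["Cooking"]},
    {"id": 3, "location": "Kyrgyzstan", "interests": ["Osh"]},
]
audience_ids = [u["id"] for u in users if in_target_audience(u)]
```

In this toy example only the first record satisfies both conditions, which is the sense in which the two features are intersected rather than combined.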

However, in some countries Facebook may be significantly less popular as compared with other social media or altogether unavailable. Such cases include China and Iran (where Facebook is banned) and the post-Soviet countries where Facebook does operate, but on a smaller scale than other mostly local social media. In Russia, as of 2018, the most popular SNS are the following (in descending order): Vkontakte (36 million users who posted at least once during the month preceding the study), Instagram (24 million), Odnoklassniki (16 million), Facebook (2 million), Twitter (0.8 million), Moi Mir (0.099 million) (Brand Analytics, 2018). The prominence of Odnoklassniki, Vkontakte, and Instagram remains true for the three Central Asian states sending migrants to Russia—Kyrgyzstan, Tajikistan, and Uzbekistan (The Open Asia, 2020). In the South Caucasus, the situation is different: in Armenia, Vkontakte is considered the most popular SNS, but Facebook outstrips Odnoklassniki (Sputnik Armenia, 2018), whereas in Azerbaijan the largest share of the social media market is held by Facebook and Instagram (Midia.Az, 2018). Data on SNS usage among migrants is unavailable, but we can hypothesize that their situation would broadly reflect the SNS ratings in Russia and their country of origin. Thus, depending on the focus of the study, we used a combination of Odnoklassniki, Vkontakte, Instagram, and Facebook for our various surveys of first- and second-generation migrants.

The two main Russian SNS, Vkontakte and Odnoklassniki, are owned by the Mail.ru Group and offer quite similar targeting options (both differ from Facebook). Mail.ru Group has its own advertising platform, MyTarget, which until 2019 was the only means of advertising on Odnoklassniki (since then Odnoklassniki has developed its own advertising provisions). The users of MyTarget can access all of the advertising options provided by Odnoklassniki but only some of the options provided by Vkontakte. The latter, however, also offers separate advertising options. Neither Vkontakte nor MyTarget supports targeting criteria such as “expats from X in Y.” Moreover, their “interests” categories are in a fixed format (menu selection) and do not contain any country/culture-related options. However, from our fieldwork, we know that at least some people with a migrant background in Russia (both first- and second-generation) participate in social media groups with ethnic connotations, such as “Uzbeks in Moscow” or a group of Tajik humorous anecdotes. Therefore, we compiled lists of such groups on Vkontakte and Odnoklassniki and used them as a basis for targeting along with other requirements relevant for each situation (e.g., age, gender, location, or place of residence).

In terms of geographical criteria, both Vkontakte and MyTarget differ from Facebook. When setting up an advertisement campaign on Facebook, advertisers (or researchers) select locations and define whether targeted users reside there, have recently been there, or are currently travelling there. Vkontakte and MyTarget offer two options with respect to geography. The first involves setting up an ad campaign that will be disseminated across specific countries, regions, or cities, but without an option to specify users’ relations with these locations. In choosing users according to this criterion, Vkontakte analyzes the information that users provide on their profiles, whereas MyTarget analyzes IP addresses, i.e., users’ current locations. The second option involves selecting one or several points on a map with a radius of up to 10 km (MyTarget) or up to 40 km (Vkontakte) and defining users’ relations with these locations. When launching a national campaign on Vkontakte or MyTarget, it is not feasible to select the specific relations users have with locations, and so the possible choices are restricted to either a user’s designated place of residence (Vkontakte) or current location (MyTarget). In certain contexts, current location may serve as a useful parameter, whereas in others, it may be too crude, or even misleading. An example of the latter situation would be research contexts involving significant tourist flows between countries, alongside migrant flows. However, this is not the case for migration flows between Central Asia or the South Caucasus and Russia, where economic migrants constitute the vast majority of flows.

3 Biases and the Search for Solutions

One of the most serious challenges involved in targeting respondents with a migrant background on SNS for web-based surveys concerns biases. In this section, we discuss these biases and propose possible solutions for mitigating them.

The biases manifest in the sampling of migrants on SNS can be shown as a complicated structure with several layers (Fig. 3.3). To begin, not all migrants use the Internet (1–2). Among those who do, not all are registered on SNS, and some of those who are registered do not use their accounts (2–3). Those who have accounts on SNS may follow different patterns of online behavior, and therefore, may be classified in different ways by the SNS. Such differences may structure their potential to be selected as part of the target audience for a given advertisement (3–4). Moreover, migrants may have different habits in their SNS usage (3–5). For example, one individual may have accounts on several SNS, whereas another may have only one; one may be highly active on SNS, participating in multiple groups, whereas another may log in only to exchange messages. Due to these variations, individuals’ chances of seeing an ad are not equal. In addition, some users who qualify to be targeted and could be exposed to an ad are not selected by the advertisement algorithms of the SNS (4–5). Users also may either click on an ad, skip it, or miss it altogether while inattentively scrolling their news feed (5–6). Finally, not all of the users who click on a given ad necessarily proceed to the survey questionnaire (6–7) and, among those who do, not all will go on to complete it (7–8). Based on these observations, we can hypothesize that the respondents who eventually complete a survey could differ considerably from our hypothetical sampling frame.

Fig. 3.3 Biases of a web-based migrant survey using targeting on SNS
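To make the cumulative effect of these layers concrete, the following minimal sketch treats each transition in Fig. 3.3 as a pass-through probability and compares two groups. All rates and population shares are invented for illustration; no such estimates are available from our surveys, and the group labels are assumptions.

```python
from math import prod

# Hypothetical pass-through probabilities for the seven transitions in Fig. 3.3:
# (1-2) uses the Internet, (2-3) active on SNS, (3-4) classified into the target
# audience, (4-5) shown the ad, (5-6) clicks, (6-7) starts, (7-8) completes.
# These numbers are invented for illustration only.
LAYER_RATES = {
    "younger": [0.95, 0.90, 0.60, 0.50, 0.05, 0.80, 0.50],
    "older":   [0.70, 0.60, 0.50, 0.50, 0.04, 0.80, 0.45],
}
POPULATION_SHARE = {"younger": 0.5, "older": 0.5}  # assumed, not measured


def completion_probability(rates):
    """Probability that a member of the group passes every layer."""
    return prod(rates)


def sample_composition(population_share, layer_rates):
    """Expected share of each group among completed questionnaires."""
    expected = {
        group: population_share[group] * completion_probability(layer_rates[group])
        for group in population_share
    }
    total = sum(expected.values())
    return {group: value / total for group, value in expected.items()}


composition = sample_composition(POPULATION_SHARE, LAYER_RATES)
```

Even though the two hypothetical groups are equally large in the population and differ only moderately at each layer, the differences multiply, and the younger group ends up with well over half of the completed questionnaires.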

Addressing this complicated set of biases demands different approaches. To begin, we need a better understanding of migrants’ Internet and SNS usage and engagement, which would help with estimating the probability of the participation of various categories of migrants in surveys. This probability is determined by various factors: some migrants do not use the Internet or social media at all; some indicate that they come from a specific country of origin, whereas others hide this fact; some spend a lot of time online; some pay scant attention to advertisements when scrolling their feed; and so on. These factors correspond to layers 1–6 in Fig. 3.3. To the best of our knowledge, no comprehensive study on this topic has been conducted with respect to the Russian context, and internationally the number of such studies remains low (Madianou & Miller, 2013; Law & Chu, 2008). However, we can use the small amount of data that we do have to hypothesize which characteristics differentiate those migrants who use SNS from those who do not (layers 1–3 in Fig. 3.3). Thus, we performed a regression analysis on data from a face-to-face survey of Kyrgyz migrants in Moscow in 2014 (Footnote 5) (Varshaver et al., 2014) where we asked whether our respondents used Vkontakte, Odnoklassniki, Facebook, or Moi Mir (the most popular SNS at the time) (Table 3.1). As mentioned above, 63% of these respondents used SNS. In our regression model, the dependent variable was having an account on at least one of the four specified SNS. The independent variables included age, Russian language proficiency, education level, gender, urban/rural place of birth, region of birth in Kyrgyzstan (south or north), income, and number of trips to Russia (Footnote 6).

Table 3.1 Results of logistic regression analysis (dependent variable: “usage of SNS,” 0 no, 1 yes)

The two statistically significant factors are age and Russian language proficiency: SNS are more actively used by those who speak fluent Russian and those who are younger. While it is not surprising that age is a significant factor in these calculations, we can only hypothesize as to why Russian language proficiency matters, especially given that education does not. One plausible explanation is that even though all the popular SNS currently offer an option to set up an interface in almost any language of the former Soviet Republics, when the SNS were first introduced in the post-Soviet space, they were initially only supported in Russian and, therefore, were most accessible for those with Russian proficiency. If this is true, the significance of the Russian language may now be lower than it was in 2014. However, if other explanations are plausible, we can expect the significance of the Russian language to remain high. When assessing these findings, it is important to bear in mind the limitations of the 2014 survey. First, the data are by now somewhat outdated, since the SNS landscape is very dynamic. Second, since the survey was limited to Kyrgyz migrants and only those residing in Moscow, we can hypothesize that other migrant groups in different locations could entail quite different outcomes. Regarding other ethnic groups, e.g., Tajik and Uzbek, gender may also be a factor in SNS usage due to the different constructions of gender relations among the Kyrgyz, Tajik, and Uzbek migrants (Rocheva & Varshaver, 2017). Third, since the survey was conducted in the vicinity of randomly selected metro stations, it may have omitted those migrants who rarely use the metro, e.g., drivers (including taxi drivers), janitors who use bikes, housewives, and so forth. Nevertheless, the results of this survey indicate that online surveys using SNS-based sampling can yield findings that are biased towards those who are younger and speak better Russian.

Another approach to mitigating a selection bias stemming from SNS targeting methods would be to conduct two surveys with the same questionnaire and target population: one would be a face-to-face or telephone survey designed to be as close to random as possible, and the other would be an online survey using SNS-based sampling. A comparison of the results would enable the calculation of propensity weights for subsequent use (cf. Lee, 2006; Terhanian & Bremer, 2012). We can hypothesize two designs for a propensity score adjustment that appear to be feasible, although resource-intensive, for the Russian context. The first would involve conducting a random face-to-face survey of foreign students (since statistics exist that reveal the distribution of foreign students across Russian universities) in parallel with an online survey of foreign students on several SNS. The second would be to study several locations with a high concentration of migrants, using both an online survey with targeting on SNS and a face-to-face random survey.
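The logic of such a two-survey comparison can be sketched as follows. This is a deliberately simplified, cell-based variant and not the procedure described by Lee (2006) or Terhanian and Bremer (2012), which typically rely on a modelled propensity score; the cell labels and counts are invented for illustration.

```python
from collections import Counter


def propensity_weights(reference_cells, online_cells):
    """Cell-based propensity weights for respondents of the online survey.

    Each argument is a list with one cell label per respondent (for example,
    a combination of age group and questionnaire language). Within a cell,
    the propensity p is the share of pooled respondents who come from the
    online survey, and the weight is (1 - p) / p.
    """
    reference = Counter(reference_cells)
    online = Counter(online_cells)
    weights = {}
    for cell, n_online in online.items():
        n_reference = reference.get(cell, 0)
        p = n_online / (n_online + n_reference)
        weights[cell] = (1 - p) / p  # 0.0 if the cell is absent from the reference
    return weights


# Hypothetical data: the online sample over-represents young Russian speakers.
reference_sample = ["young_ru"] * 30 + ["young_uz"] * 30 + ["old_ru"] * 20 + ["old_uz"] * 20
online_sample = ["young_ru"] * 60 + ["young_uz"] * 20 + ["old_ru"] * 15 + ["old_uz"] * 5

weights = propensity_weights(reference_sample, online_sample)
```

In this toy case over-represented cells receive weights below one and under-represented cells weights above one, so that the weighted online sample reproduces the cell distribution of the reference survey. The adjustment can only be as good as the reference survey and the variables used to define the cells.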

Moreover, it would be useful to study how SNS construct their different target variables (e.g., how Facebook defines who is an “expat from Poland in the UK” or who has interests related to “Armenia, Yerevan, and the Armenian language”) and how SNS select which users are exposed to an ad among all those to whom the specified target variables apply. This study would correspond with layers 3–5 in Fig. 3.3. To date, SNS have refrained from disclosing this information, since they consider it to be commercially sensitive and have not shown much interest in cooperating with researchers regarding these matters.

Finally, yet importantly, we need to further explore the interaction between a respondent and a questionnaire, including its welcome text, so as to understand better who is more likely to leave a survey page before starting the questionnaire, or to drop out of a survey without completing it (layers 6–8 in Fig. 3.3). The matter of suitable questionnaire length is one example of conventional wisdom on these matters, but there may be further issues of specific relevance for respondents with a migrant background.

These directions for further research would appear to be long-term goals. Are there any “tactical” steps apart from them that we can take to enhance the results from research using SNS targeting methods? We think there are.

First, we can assess whether any dropout bias exists by comparing “completers” with those who drop out. Second, we can do an external validation that compares our results with data from a different source, such as available statistics. Where these statistics are lacking, such validation can take less conventional forms. For example, when we carried out a survey of second-generation migrants and their local peers in Russia using SNS-based sampling, and we lacked statistics on second-generation migrants, we compared the distribution of our respondents across various Russian regions, and the distribution of their ethnicities, with the corresponding characteristics of the Russian population provided in the Russian census. Drawing on this comparison, we checked our results for any significant discrepancies and identified only those we had expected and were able to account for.
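A simple way to operationalize such an external validation is a goodness-of-fit comparison between the sample and a benchmark distribution. The sketch below computes a Pearson chi-square statistic; the region labels, sample counts, and benchmark shares are hypothetical and do not come from our surveys or from the census.

```python
def chi_square_gof(sample_counts, benchmark_shares):
    """Pearson chi-square statistic of sample counts against benchmark shares."""
    n = sum(sample_counts.values())
    statistic = 0.0
    for category, share in benchmark_shares.items():
        expected = n * share
        observed = sample_counts.get(category, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical respondent counts by region and hypothetical benchmark shares.
sample_counts = {"Region A": 120, "Region B": 80, "Region C": 60, "Region D": 40}
benchmark_shares = {"Region A": 0.35, "Region B": 0.30, "Region C": 0.20, "Region D": 0.15}

statistic = chi_square_gof(sample_counts, benchmark_shares)
CRITICAL_VALUE_DF3 = 7.815  # chi-square critical value, 3 degrees of freedom, alpha = 0.05
discrepancy_flagged = statistic > CRITICAL_VALUE_DF3
```

A statistic below the critical value, as in this toy case, would suggest no discrepancy beyond what chance could produce. With a non-probability sample the test is best read as a descriptive screening device rather than a formal inference.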

A third possible tactical intervention involves weighting the data according to the results of a dropout analysis or external validation. A set of methods is available for reducing bias in online surveys; these methods compare the survey data with other data considered representative of the target population and adjust the survey data accordingly. Adjustment experiments reveal that a basic weighting procedure obtains almost the same results as more complicated procedures. Moreover, it has been found that the complexity of statistical procedures is less important to an effective bias reduction than is the choice of variables for the adjustment (Mercer et al., 2018). With respect to the choice of variables, optimal results are achieved when researchers take into account not only demographic variables, but also behavioral and attitudinal ones (e.g., political views, health, internet usage, etc.) (Taylor, 2000; Mercer et al., 2018). However, one problem is that in migration studies, the range of external representative data is very limited, and so it is difficult to devise an extensive set of behavioral and attitudinal variables for weighting. Nonetheless, weighting can be based on limited statistics and results from a dropout analysis.
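One basic weighting procedure is raking (iterative proportional fitting), which adjusts weights until the weighted sample matches known margins one variable at a time. The sketch below is a minimal illustration under the assumption that target shares are available for every category present in the sample; the respondents and target margins are hypothetical.

```python
def rake(respondents, margins, iterations=50):
    """Rake weights so that weighted shares match the target margins.

    respondents: list of dicts, e.g. {"gender": "m", "age": "young"}
    margins: target shares per variable, e.g. {"gender": {"m": 0.7, "f": 0.3}}
    """
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for variable, targets in margins.items():
            total = sum(weights)
            current = {}
            for w, r in zip(weights, respondents):
                current[r[variable]] = current.get(r[variable], 0.0) + w
            # Scale each weight so the category's weighted total hits its target.
            weights = [
                w * targets[r[variable]] * total / current[r[variable]]
                for w, r in zip(weights, respondents)
            ]
    return weights


def weighted_share(respondents, weights, variable, category):
    """Weighted share of respondents falling into one category."""
    hit = sum(w for w, r in zip(weights, respondents) if r[variable] == category)
    return hit / sum(weights)


# Hypothetical sample of 100 respondents and hypothetical population margins.
respondents = (
    [{"gender": "m", "age": "young"}] * 40 + [{"gender": "m", "age": "old"}] * 10
    + [{"gender": "f", "age": "young"}] * 30 + [{"gender": "f", "age": "old"}] * 20
)
margins = {"gender": {"m": 0.7, "f": 0.3}, "age": {"young": 0.6, "old": 0.4}}
weights = rake(respondents, margins)
```

After raking, the weighted gender and age shares of the toy sample agree with the target margins. In practice, the choice of margins matters more than the procedure, as the adjustment experiments cited above indicate.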

These tactical actions can be undertaken in different combinations, depending on the data. In the next section, we further illustrate their applicability by commenting on a survey of migrants from Uzbekistan in Russia.

4 Illustrative Example: Survey of Migrants from Uzbekistan in Russia

In March–June 2017, we conducted research into the labour market participation of migrants in Russia. This included a survey of Uzbek migrants (May 2017), who constitute the largest group of labour migrants in the country. We implemented the online questionnaire in SurveyMonkey and disseminated advertisements targeting participants on two SNS: Odnoklassniki and Vkontakte. We targeted SNS users who were at least 18 years old and who participated in or liked groups/pages with ethnic connotations (e.g., “Uzbeks in Moscow”). The advertisements were shown to those users who resided or were located at the time in Russia. We were less concerned with the risk of getting too many people who were just visiting Russia as guests or tourists, since the majority of those travelling from Uzbekistan to Russia did so at the time, and still do, for economic reasons.

The choice of SNS is an important step. Ideally, we would have known the shares of Uzbek migrants using different SNS and set up our campaigns to reach the necessary number of respondents on each SNS. However, so far, we only know the number of users of different SNS in Uzbekistan. Over two million Uzbek users of Odnoklassniki visit this SNS monthly (77% are male) (Odnoklassniki, 2019), whereas Vkontakte has one million monthly Uzbek users (Infocom, 2018). Instagram and Facebook have 0.89 and 0.72 million Uzbek users, respectively (Infocom, 2018). This ranking was also supported in our interviews for previous research projects. Therefore, for this survey, we chose the two most popular SNS, Odnoklassniki and Vkontakte, and decided to allow for a “natural flow” by assuming that readiness to respond would be equivalent on the two SNS and that each would therefore contribute a number of respondents proportional to the popularity of these sites among these migrants.

Language is another issue. Odnoklassniki and Vkontakte limit the usage of languages other than Russian. This limitation was problematic, since we wanted to stress the Uzbek character of our advertisement. To circumvent these restrictions, we chose to use the colors of the Uzbek flag in the advertisement and included our question, within the image, in Uzbek: “Do you work in Russia?” («Россияда ишлаяпсизми?») (Footnote 7). However, the main text accompanying the ad was in Russian: “Do you work in Russia? Complete this survey. Help to make migrants’ life better!” (Fig. 3.4). At the beginning of the survey, respondents could choose whether they wanted the questionnaire to be displayed in Uzbek or Russian.

Fig. 3.4 Screenshot of an ad for our survey of Uzbek migrants in Russia

In total, 1099 individuals chose a language on the survey landing page (51% selected Uzbek and 49% Russian), and 865 responded to the first question. A total of 388 went on to complete the survey. With respect to the 1099 individuals, 66% (729 individuals) were recruited through Odnoklassniki and 34% (370 individuals) through Vkontakte.

The accuracy of our targeting can be measured along four dimensions. First, of the 803 individuals who responded to a question regarding their current location, only 5 (0.6%) said they were not presently in Russia. Second, out of the 865 respondents who provided their year of birth, 17 (2%) were underage (younger than 18 years old). The third dimension relates to migrant background and the Uzbek/Uzbekistan connection, which can be measured in two ways: place of birth and citizenship. Out of the 865 respondents who replied to the question asking for their place of birth, 77% were born in Uzbekistan and another 16% in neighboring countries in Central Asia. Out of the 865 respondents who provided an answer to the question about citizenship, 655 (76%) had Uzbek citizenship. Last, we targeted individuals who were not just visiting Russia. Out of the 701 respondents who answered the question as to what they were doing in Russia, 661 people (79%) said they were working, 10% were working and studying, 6% were studying, and 4% were occupied with housework; 1% chose the option “other,” mentioning that they simply lived in Russia, and only 1 person defined himself as a guest.

Thus, our cleaned dataset included respondents who were at least 18 years of age, who were located or resided in Russia, who were not just visiting, and who had Uzbek citizenship and/or were born in Uzbekistan. In the entire dataset, 540 respondents fit this description, but not all of them completed the survey. A total of 303 respondents fit this description and completed the survey, so they are included in our final database. Overall, regardless of their characteristics, 388 respondents completed the survey. The ratio between 303 and 388 (78%) can be understood as the accuracy rate. Who are the 85 respondents who completed the survey but did not fit the selection criteria? The majority (67 out of 85) consisted of individuals from two countries that neighbor Uzbekistan (Tajikistan and Kyrgyzstan), of whom 26 self-identified as Uzbeks. So, if we take into account respondents who self-identified as Uzbeks yet come from other Central Asian countries, the accuracy rate rises to 84% (326 respondents out of 388). Nonetheless, in the subsection that follows, we use our stricter criteria for inclusion.
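The accuracy rates follow directly from the counts reported in the text. As a minimal sketch, the Python snippet below recomputes them; the variable and function names are ours and purely illustrative.

```python
# Counts as reported in the text (variable names are illustrative).
completed_all = 388        # completed the survey, regardless of characteristics
completed_fit = 303        # completed and met the strict selection criteria
completed_fit_broad = 326  # reported count under the broader, self-identification criterion


def rate(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


accuracy_strict = rate(completed_fit, completed_all)
accuracy_broad = rate(completed_fit_broad, completed_all)
not_fit = completed_all - completed_fit  # completed but outside the strict criteria

print(accuracy_strict, accuracy_broad, not_fit)  # prints: 78 84 85
```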

4.1 Dropout Rate and Language Bias

The completion rate for the respondents who fit our selection criteria was 56% (303 responded to the last question, out of the 540 individuals who responded to the minimal list of questions necessary to identify them as a fit). Although a conventional standard does not exist for the completion/dropout rate for web-based surveys, and even less so for migrants, some researchers have indicated 60% as an acceptable completion rate for web-based non-probability panels for the general population (Liu & Wronski, 2018). As an example, a web-based survey of Polish migrants in other European countries had a completion rate of 72% (Pötzschke & Braun, 2017), but web-based surveys of “hidden” populations, such as drug users, have had completion rates as low as 38.3% (Temple & Brown, 2011).

As we hypothesized previously, the completion rate could become a source of bias if respondents with different characteristics drop out to different degrees. To test this hypothesis, we performed a logistic regression analysis (Table 3.2). The dependent variable was whether or not participants responded to the final question, whereas the independent variables included gender, age, questionnaire language, education level, and whether a respondent came to the survey via MyTarget/Odnoklassniki or Vkontakte. The only significant variable was language: respondents who filled out the questionnaire in Russian were twice as likely to complete it.

Table 3.2 Results of a regression analysis (dependent variable “presence of response to the final question,” 0 no, 1 yes)
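To illustrate how an effect of this kind can be read, the sketch below computes an odds ratio from a two-by-two table of questionnaire language by completion. The counts are hypothetical and chosen only so that the odds ratio equals 2; they are not the study's data. The sketch also ignores the other covariates in the regression and assumes that "twice as likely" refers to an odds ratio, which is the usual reading of logistic regression output. For a logistic regression with a single binary predictor, the exponentiated coefficient equals this two-by-two odds ratio.

```python
import math

# Hypothetical counts, for illustration only (not the study's data).
counts = {
    "russian": {"completed": 200, "dropped": 100},
    "uzbek": {"completed": 100, "dropped": 100},
}


def odds(group):
    """Odds of completing: completed divided by dropped."""
    return group["completed"] / group["dropped"]


def probability(group):
    """Share of the group that completed the questionnaire."""
    return group["completed"] / (group["completed"] + group["dropped"])


odds_ratio = odds(counts["russian"]) / odds(counts["uzbek"])
# With one binary predictor, this is the logistic regression coefficient.
log_odds_ratio = math.log(odds_ratio)
# Ratio of completion probabilities, which differs from the odds ratio.
risk_ratio = probability(counts["russian"]) / probability(counts["uzbek"])

print(f"odds ratio = {odds_ratio:.2f}, risk ratio = {risk_ratio:.2f}")
```

With these hypothetical counts, an odds ratio of 2 corresponds to completion shares of about 67% versus 50%, so the ratio of probabilities is smaller than the ratio of odds.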

4.2 External Validation

The two sets of data to which we compared our sample were the limited migration statistics and exposure data provided by the SNS. The official statistics provided by the former Federal Migration Service (FMS) of the Russian Federation until 2016 indicate the number of male and female foreign citizens of different age groups who were in Russian territory on a specific date, but do not disaggregate them by goals of entry. These statistics also do not include those who obtained Russian citizenship. Nevertheless, we can use the data for Uzbek citizens who were in Russian territory in 2016. The exposure data reveal how many users with different demographic characteristics were exposed to a given advertisement.

Our data (N = 303) included 72% male and 28% female respondents. Although men predominated, a comparison with the migration statistics showed that women were overrepresented in our survey (Table 3.3): migration from Uzbekistan and other Central Asian countries to Russia is predominantly male (Rocheva & Varshaver, 2017). As regards age groups, our dataset included fewer respondents aged 40 years and older than the migration statistics report for this group, and more respondents aged 30–39 years. At the same time, quite surprisingly, the share of the youngest group, aged 18–29, was almost the same in the statistics and our dataset.

Table 3.3 Comparison of our dataset with migration statistics according to gender and age groups

The overrepresentation of women in our dataset can be explained by several factors, including women’s more active usage of SNS or SNS groups with ethnic connotations, or their higher inclination to participate in (and complete) online surveys, as is suggested in previous research (Smith, 2008). In our dropout/completion analysis (Table 3.2), we found that gender was not a significant factor. We can assess whether women are more likely than men to participate in a survey if exposed to an advertisement by calculating a conversion rate, i.e., the ratio of respondents who completed a questionnaire to those who were exposed to an advertisement on SNS (Table 3.4). Whereas the Odnoklassniki conversion rate was higher for females than for males, the exact opposite was true in Vkontakte. Thus, we cannot conclude that females are more likely to participate in a survey after being exposed to an advertisement.

Table 3.4 Conversion rates for Odnoklassniki and Vkontakte according to gender

Interestingly, the share of women among the users who were exposed to an advertisement was similar (28–30%) on both SNS, and higher than the share of women among migrants in the available statistics. This finding leads us to suggest that women may be more active users of SNS or of groups with ethnic connotations on SNS, which, in turn, may help to explain the overrepresentation of women in our dataset as compared with the migration statistics.
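The conversion rate used in this subsection is the ratio of completed questionnaires to users exposed to an advertisement. The sketch below shows the calculation by SNS and gender. All counts are hypothetical and were chosen only to reproduce the pattern described in the text (a higher rate for women on Odnoklassniki and for men on Vkontakte); they are not the values of Table 3.4.

```python
# Hypothetical exposure and completion counts (illustrative only).
exposed = {
    ("odnoklassniki", "female"): 30_000,
    ("odnoklassniki", "male"): 70_000,
    ("vkontakte", "female"): 15_000,
    ("vkontakte", "male"): 35_000,
}
completed = {
    ("odnoklassniki", "female"): 75,
    ("odnoklassniki", "male"): 140,
    ("vkontakte", "female"): 12,
    ("vkontakte", "male"): 42,
}


def conversion_rate(n_completed, n_exposed):
    """Completed questionnaires per user exposed to the advertisement."""
    return n_completed / n_exposed


rates = {key: conversion_rate(completed[key], exposed[key]) for key in exposed}

for (sns, gender), value in sorted(rates.items()):
    print(f"{sns:14s} {gender:6s} {value:.4%}")
```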

4.3 Weighting

We have been able to identify several biases in our sample. First, respondents who selected the Uzbek language questionnaire were less likely to complete the survey. Second, women were more prone to participate in the survey than men on Odnoklassniki, whereas the opposite was true for Vkontakte. Third, in comparison with migration statistics, there were more females and people aged 30–39 and fewer people aged 40 and older in our dataset. Ideally, we would adjust our dataset for these identified biases sequentially, one by one, as if these were “layers” we wanted to restore. However, the procedure of weighting allows for only one step, not several. Moreover, we could not accommodate gender differences according to both the migration statistics and conversion rates within this step. Therefore, we opted for a combination of language dropout data and gender and age proportions from the migration statistics. Before we describe the weighting procedures, we need to make an assumption. Since migration statistics are based on current nationality, and since our database contained citizens of both Uzbekistan and Russia, we had to assume that the proportions of men and women of different ages were equivalent among those who retained Uzbek citizenship and those who were naturalized.

The three variables we used for weighting were age, gender, and language. Since our sample was not that large, we used three age groups instead of four: 18–29 years old, 30–39 years old, and 40 years and older. We used the following formula to calculate the weighting coefficients (w):

$$ w = \frac{k \cdot m}{n} $$

where k is the share of the women/men of a specific age group in the total number of Uzbek citizens according to statistics; m is the share of female/male respondents of a specific age group who selected the Uzbek/Russian language, even though they might not have answered the rest of the questions; n is the share of female/male respondents of a specific age group in our final sample who selected the Uzbek/Russian language.
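The formula can be illustrated with a short Python sketch. All shares below are hypothetical and serve only to show the mechanics. The sketch also assumes one particular reading of the definitions: k is the population share of a gender and age group, m is the share of that group's respondents who selected a given language on the landing page, and n is the share of the final sample falling into the gender, age, and language cell. Under this reading, k · m is the target share of a cell, and the weighted cell shares sum to 1.

```python
# Hypothetical shares, for illustration only (not the study's data).
# k: share of each gender/age group among Uzbek citizens in the statistics.
k = {
    ("male", "18-29"): 0.40, ("male", "30-39"): 0.22, ("male", "40+"): 0.20,
    ("female", "18-29"): 0.07, ("female", "30-39"): 0.05, ("female", "40+"): 0.06,
}
# m: within each gender/age group, share who selected Uzbek on the landing
# page (the Russian share is taken as 1 minus this value).
m_uzbek = {
    ("male", "18-29"): 0.55, ("male", "30-39"): 0.50, ("male", "40+"): 0.45,
    ("female", "18-29"): 0.50, ("female", "30-39"): 0.40, ("female", "40+"): 0.45,
}
# n: share of each gender/age/language cell in the final sample.
n = {
    ("male", "18-29", "uzbek"): 0.14, ("male", "18-29", "russian"): 0.18,
    ("male", "30-39", "uzbek"): 0.10, ("male", "30-39", "russian"): 0.16,
    ("male", "40+", "uzbek"): 0.05, ("male", "40+", "russian"): 0.09,
    ("female", "18-29", "uzbek"): 0.03, ("female", "18-29", "russian"): 0.07,
    ("female", "30-39", "uzbek"): 0.03, ("female", "30-39", "russian"): 0.08,
    ("female", "40+", "uzbek"): 0.02, ("female", "40+", "russian"): 0.05,
}


def weighting_coefficient(k_share, m_share, n_share):
    """w = k * m / n, as in the formula in the text."""
    return k_share * m_share / n_share


weights = {}
for (gender, age, language), n_share in n.items():
    uzbek_share = m_uzbek[(gender, age)]
    m_share = uzbek_share if language == "uzbek" else 1 - uzbek_share
    weights[(gender, age, language)] = weighting_coefficient(
        k[(gender, age)], m_share, n_share
    )

# Sanity check: weighted cell shares should add up to 1.
weighted_total = sum(n[cell] * weights[cell] for cell in n)

for cell, w in weights.items():
    print(cell, round(w, 2))
```

A weight above 1 means the cell is underrepresented in the final sample relative to its target share, and a weight below 1 means it is overrepresented.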

After weighting, the dataset included a higher proportion of those who selected the Uzbek language, fewer women, and more respondents of an older age (Table 3.5).

Table 3.5 Results of weighting

It would have been productive to compare the characteristics of the weighted database with other migration statistics (regional distributions, occupations, etc.), but we had already used all the available statistical variables (gender and age) for weighting. However, we were able to check how the variables in the dataset changed. We found that the weighting changed the distributions of the variables that were connected with a general orientation towards Russia or the country of origin. First, the amount of remittances increased from 15,301 rubles before the weighting to 16,586 rubles after the weighting (approximately $212 and $229, respectively). Second, after the weighting, there was a larger share of those willing to return to the country of origin and a smaller share of those willing to stay in Russia or to live transnationally in two countries (Table 3.6).

Table 3.6 Comparison of plans for the future, before and after the weighting

To summarize, targeting the participants of the Uzbek-connotated groups and pages on the two SNS popular in Russia and Uzbekistan proved to be an efficient method of sampling. We were able to get responses from our target group: those who were born in Uzbekistan and/or had Uzbek citizenship, who were at least 18 years of age, who were currently located or resided in Russia, and who were neither guests nor tourists. However, this method is associated with some biases, which we were able to partially compensate for using weighting. The weighting of the dataset according to gender, age, and survey language altered the distributions of some variables. Among these, some were associated with orientations towards Russia or Uzbekistan, namely, remittance behavior and migration intentions.

5 Discussion and Conclusion

This chapter contributes to the growing body of literature demonstrating the effectiveness of targeting on SNS as a sampling strategy for online migrant surveys in various contexts (Pötzschke & Braun, 2017). Its main goal, however, was to foster a discussion of the serious challenges of biased samples associated with this method, and to propose possible approaches to address this problem. The range of methods developed by scholars to adjust for biases in non-random surveys of other target populations (Baker et al., 2013) has limited applicability in the field of migrant studies due to a lack of sampling frames and, more generally, to limited knowledge about the characteristics of this target population. Methods such as propensity score adjustment are not easily applicable in our field, at least thus far. We have demonstrated how weighting based on dropout analysis and external validation can be used as hands-on solutions. Still, we need to note that weighting can in some cases exacerbate biases if the underlying assumptions are incorrect. For example, our calculations of the weighting coefficients showed that we needed to increase the share of older respondents in accordance with migration statistics. This increase was based on the assumption that the older respondents in our sample did not differ significantly from older migrants who did not take part in the survey. However, if this assumption were to prove incorrect, for example, if significant differences existed between the older migrants who used SNS and those who did not (and thus did not have an opportunity to participate), our weighting would not have been an effective method of adjustment.

Transparency in the assumptions made and, more broadly, in the descriptions of the design and data analysis, is deemed an essential element for non-random survey designs. Thus, it is necessary for the assessment of a given study and its results, as well as for the advancement of the method (Baker et al., 2013). In the case of a study using targeting on SNS, scholars alone cannot provide fully transparent descriptions of their sampling strategy, since there remains an important piece of the puzzle which is lacking with respect to the operations of the SNS. For now, SNS do not disclose how they construct target variables or how they select users who are exposed to an advertisement among all those who fit the targeting criteria. Besides improving transparency, this information would help researchers to estimate the probability of different users’ participation in a survey.

A realistic assessment of the present situation leads us to concede that the number of biases is considerable, whereas our opportunities to adjust for them are rather modest. If we resign ourselves to this fact, we might instead direct our efforts to the exploration of other approaches to analyzing and presenting the data that may be a better fit for contexts involving high levels of uncertainty. At least two potential sources of inspiration stand out for this strand of exploration: fuzzy set theory and Bayesian statistics.

Fuzzy set theory was introduced in the 1960s (Zadeh, 1965), and since then, it has attracted the attention of different fields, including the social sciences (Ragin & Pennings, 2005; Smithson & Verkuilen, 2006). As a soft computing method, fuzzy set theory was developed to work with linguistic categories that often have blurred boundaries, as well as to deal with imprecise and incomplete data. Unlike in classical (crisp) sets, in which an element either belongs (1) or does not belong (0) to a set, an element's degree of membership in a fuzzy set can vary on the interval from 0 to 1 (with 0.5 being the point of least certainty), as described by a membership function. In addition to being applicable to categories such as “poor” (Lemmi & Betti, 2006) or “migrant,” which do not always allow for strict definitions (using crisp logic, spending one more day in another country could change your status), fuzzy logic can help scholars to depart from the conventional manner of working with data. Such conventions imply the provision of exact figures for “well integrated” migrants or “average remittances.” Using fuzzy logic can enable researchers to aim for a formulation of tendencies and approximate assessments. Thus, instead of providing a “precise” figure for the average income of a migrant, a scholar can define limits or provide an approximate figure, perhaps indicating a “credible” interval, so as to emphasize the character of the data and, more broadly, the context of uncertainty.
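The contrast between crisp and fuzzy membership can be sketched in a few lines of Python. The example uses length of stay as the only criterion for the category “migrant”; the thresholds (183, 90, and 365 days) and the piecewise-linear form are our own hypothetical choices, made only to show how a membership function behaves.

```python
def crisp_migrant(days_abroad, threshold=183):
    """Crisp set: 1 if the stay reaches the (hypothetical) threshold, else 0."""
    return 1 if days_abroad >= threshold else 0


def fuzzy_migrant(days_abroad, lower=90, upper=365):
    """Piecewise-linear membership in the fuzzy set 'migrant', from 0 to 1.

    The bounds are hypothetical: no membership at or below `lower` days,
    full membership at or above `upper` days, linear in between.
    """
    if days_abroad <= lower:
        return 0.0
    if days_abroad >= upper:
        return 1.0
    return (days_abroad - lower) / (upper - lower)


for days in (60, 182, 183, 300, 400):
    print(days, crisp_migrant(days), round(fuzzy_migrant(days), 3))
```

Under the crisp definition, one additional day (182 versus 183) switches the status from 0 to 1, whereas the fuzzy membership changes only marginally between the same two values.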

In recent decades, Bayesian statistics has advanced due to improvements in computing technologies and algorithms, yet its usage in the social sciences remains, to date, very modest (Lynch & Bartlett, 2019; Western, 1999). Unlike more conventional frequentist statistics, analysis using Bayesian logic implies not only working with a specific dataset, but also taking into account “priors,” which may include results from previous studies, as well as experts’ or scholars’ assessments of probability. Bayesian statistics can thus be used to “blend” several data sources, for example, SNS data and migration statistics to estimate the number of migrants in different states in the USA (Alexander et al., 2020), or various opinion polls regardless of their representativeness (Roshwalb et al., 2012). Regarding an online migrant survey that uses targeting on SNS, priors may include the results of a study of migrants’ SNS usage practices or exposure data provided by the SNS. Bayesian statistics can also be useful when a survey targets individuals on several SNS whose users differ considerably in their characteristics.
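The basic mechanics of combining a prior with survey data can be shown with a conjugate beta-binomial update, sketched below in Python. All numbers are hypothetical: the prior stands in for an external source that suggests a share of women of about 20%, held with the weight of roughly 100 observations, and the data stand in for a survey with 30 women among 100 respondents. This is an illustration of the updating logic only, not the model used in the studies cited above, and it does not by itself correct for a biased sample.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior: about 20% women, worth roughly 100 observations.
prior_a, prior_b = 20, 80
# Hypothetical survey data: 30 women and 70 men.
women, men = 30, 70

post_a, post_b = beta_binomial_update(prior_a, prior_b, women, men)
prior_mean = beta_mean(prior_a, prior_b)
sample_share = women / (women + men)
posterior_mean = beta_mean(post_a, post_b)

print(prior_mean, sample_share, posterior_mean)  # prints: 0.2 0.3 0.25
```

The posterior mean lies between the prior mean and the sample share, and how close it sits to either depends on how much weight the analyst gives the prior relative to the sample size.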

To conclude, we have demonstrated a complicated set of biases that scholars face when conducting an online migrant survey based on SNS targeting, as well as extant possibilities for adjusting for some of these biases. However, these remedies seem insufficient to redress the potentially strong and rather unpredictable distortions caused by those biases. Therefore, we surmise that it might be time to explore other avenues of working with such data in contexts of uncertainty, for example, fuzzy set theory and Bayesian statistics.