## Abstract

The use of informal job search methods is prevalent in many countries. There is, however, no consensus in the literature on whether it actually matters for wages and, if it does, what the underlying mechanisms are. We empirically examine these issues specifically for rural migrants in urban China, a country where one of the largest domestic migrations in human history has occurred over the past decades. We find a significant wage penalty for migrant workers who conducted their search through informal channels, despite the popularity of these channels. Our further analysis suggests two potential reasons for the wage penalty: (1) the informal job search sends a negative signal (of workers’ inability to successfully find a job in a competitive market) to potential employers, resulting in lower wages, and (2) there exists a trade-off between wages and search efficiency for quicker entry into the local labor market. We also find some evidence that the informal job search may lead to low-skilled jobs with lower wages. We do not find strong evidence supporting alternative explanations.

## Notes

For example, Corcoran et al. (1980) and Granovetter (1995) both report that more than 50% of all new jobs are found through friends and relatives in developed countries such as the USA. Holzer (1988) finds that 36% of firms filled their job vacancies with referred applicants (Ioannides and Loury 2004).

The network effect on domestic migration has also been documented in other contexts, e.g., in the USA (Millimet and Ye 2014).

For example, Kanbur and Zhang (2005) find that the Gini coefficient increased from 22.4 in 1952 to 29.3 in 1978 and to 37.2 in 2000. Gustafsson et al. (2008, p.1) state that “Income inequality is ... now considered high by international standard. ... [I]n China the speed with which the increase has occurred, and the level to which inequality has risen, is striking.”

Using household survey data, Yang (1999) finds that income inequality in rural areas accounts for a sizable share of the overall inequality index in China.

This possibility has been examined in international contexts. For example, Korenman and Turner (1996) find that differential returns to employment contacts between groups could help to explain part of the racial differences in wages in the USA.

There is also other evidence of productivity-enhancing positive network effects by taking into account spatial components in the network (e.g., Hellerstein et al. (2014)).

For example, the literature has generally found a large, positive impact of marriage on wages (e.g., Maasoumi et al. (2009)) and significant effects of hukou on one’s social and economic circumstances (e.g., Chan (2010)). The ratio of migrants in the village may also fail to be exogenous. For example, Chen et al. (2010, p.3) note that “clustered migration may be driven by villagers having similar individual characteristics or facing similar institutional environments.”

For example, even with multiple IVs, the traditional overidentification tests cannot necessarily help to test the validity of the IVs. Remember that the traditional exogeneity/overidentification test relies on the assumption that a subset of the IVs are valid; the idea behind the test is that if all IVs are valid, then the estimates using the full and subset of IVs should not differ statistically (Wooldridge 2010, p.134–137). However, if *all* IVs are invalid in similar ways, we should expect them to deliver similar estimates. As a result, the traditional exogeneity tests may still conclude that they are valid ones. Wooldridge (2010) gives an example of estimation of returns to education where both mother’s and father’s educations are used as IVs, while they may be correlated and invalid in similar ways.

As detailed in Section 4, we also control for destination fixed effects. Given the log nature of the dependent variable, any destination-specific variables (such as the cost of living in each destination) will also be absorbed by the destination fixed effects.

As a referee notes, the observed choice of job search method can also be thought of as a result of both employee self-selection (e.g., in pursuit of better labor market outcomes such as wages) and employer selection (in pursuit of, e.g., low cost in the case of informal employee search through referrals, or more effective selection in the case of formal employee search for much sought-after skills). While we do not formally model such joint determination of the search choice, our formulation implicitly takes this possibility into account, since the comparisons are based on wage offers obtained through different methods, and wage offers capture all aspects of employer selection. More importantly, employer selection decisions are usually exogenous to individual decisions, and thus omitting them should not affect the estimates in our paper, which are based on individual choices. Formal models of employer selection can certainly provide potentially useful sources of identification, e.g., external IVs, and richer information on the determination process. However, this would require much more detailed information than is usually available in most datasets, and we therefore leave this potential extension for future research.

The self-serving bias refers to individuals attributing their successes (in our case, locating a good job) to internal or personal factors but attributing their failures to external or situational factors (Campbell et al. 2000). As a result, while some respondents who actually used networks to find a (good) job may report that they found their jobs on their own in a competitive market, others who found a (bad) job may report it as a result of the use of social networks.

To see this, \(\mathbb {E}[Informal_{i}|X_{i}]={\sum }_{j=0,1} j \times Pr[Informal_{i} = j | X_{i}] = 1\times Pr[Informal_{i} = 1 | X_{i}] + 0 \times Pr[Informal_{i} = 0 | X_{i}]=Pr[Informal_{i} = 1 | X_{i}]= Pr[X_{i} \lambda - v_{i} \geq 0] = Pr[v_{i} \leq X_{i} \lambda ] = Pr[S(X_{i}\theta ) v^{*}_{i} \leq X_{i} \lambda ] = Pr\left [ v^{*}_{i} \leq \frac {X_{i} \lambda }{S(X_{i}\theta )}\right ] = F\left [\frac {X_{i} \lambda }{S(X_{i}\theta )}\right ]\).
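As a numerical check of the identity in the note above, the following minimal sketch compares the closed form \(F[X_{i}\lambda / S(X_{i}\theta )]\) with a simulated frequency of \(Informal_{i} = 1\). It assumes, purely for illustration, a standard normal \(F\) and \(S(\cdot ) = \exp (\cdot )\); the function names and the covariate and coefficient values are hypothetical and are not taken from the paper.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF (the assumed F in this sketch)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def choice_probability(x, lam, theta):
    """Closed form F(X*lambda / S(X*theta)) with the assumed S = exp."""
    index = sum(xi * li for xi, li in zip(x, lam))
    scale = math.exp(sum(xi * ti for xi, ti in zip(x, theta)))
    return normal_cdf(index / scale)


def simulated_frequency(x, lam, theta, draws=200_000, seed=0):
    """Share of draws with X*lambda - v >= 0, where v = S(X*theta) * v_star."""
    rng = random.Random(seed)
    index = sum(xi * li for xi, li in zip(x, lam))
    scale = math.exp(sum(xi * ti for xi, ti in zip(x, theta)))
    hits = 0
    for _ in range(draws):
        v = scale * rng.gauss(0.0, 1.0)  # heteroskedastic error
        if index - v >= 0:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    # Hypothetical covariates and coefficients, for illustration only.
    x, lam, theta = (1.0, 0.5), (0.8, -0.3), (0.2, 0.1)
    print("closed form:", choice_probability(x, lam, theta))
    print("simulated:  ", simulated_frequency(x, lam, theta))
```

The two numbers should agree up to simulation noise, which shrinks as `draws` grows.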

Specifically, the LM test is calculated by taking *N* (sample size) multiplied by \(R^{2}\) from an artificial regression of ones on the product of generalized residual and explanatory variables and the product of generalized residual, the single index from the probit model, and the explanatory variable potentially causing heteroskedasticity. The test statistic is \(\chi ^{2}\) with *J* degrees of freedom (the number of explanatory variables potentially causing heteroskedasticity). See Verbeek (2004, p. 201) for more detail.

An IV identifies only the wage effects for the sub-population whose decision of utilizing the informal job search method is indeed influenced by this particular IV, the so-called *local average treatment effect* (LATE) (Imbens and Angrist 1994).

Note that the 2007 RUMiCI data are the same data as the widely used national representative data, Chinese Household Income Project 2007.

Urban-urban migrants account for only 1% of the sample.

As noted in Granovetter (1974, p.25), “wages or, in more refined formulations, the total benefits accruing to a worker by virtue of holding a given job” reflect the price of labor.

Since the data were collected from the destination provinces, there are very few observations for some of the provinces of origin. We therefore group them by region as follows: North Coast, Central Coast, South Coast, Central, Northwest, Southwest, and Northeast.

Unfortunately, we cannot conduct such an exercise for other variables such as occupation, since IV estimates for these variables are generally non-existent in the literature. The lack of such estimates also indicates the challenge that we face in correctly controlling for these variables in our baseline estimations.

## Appendix: Simulation results for the impacts of misspecification on KV-IV estimates

In our empirical analysis, we follow Millimet and Tchernis (2013) and employ a parametric variant of the KV-IV approach by specifying both the distribution and heteroskedastic error functions. This practice has two distinct advantages: it greatly reduces the computational burden, and it eliminates the need for continuously distributed exogenous variables in the semi-parametric approach. As argued in Millimet and Tchernis (2013), this practice is innocuous. Wooldridge (2010, p.939) and Angrist and Pischke (2009) also note that the consistency of IV estimates does not depend on the correct specification of the first-stage equation. In other words, misspecification of the distribution and heteroskedastic functions should not necessarily impact the KV-IV estimates, provided that the other assumptions hold.

Monte Carlo (MC) simulations corroborate this theoretical point. Specifically, we undertake two sets of MC experiments using simulated data. The finite sample performances of the KV-IV estimator under correct specifications have been shown elsewhere (e.g., Millimet and Tchernis (2013)).

The first set of MC experiments considers only the impact of misspecifying the distribution function, while the second set considers the impact of misspecifying both the distribution and the heteroskedastic functions. We perform this exercise twice for each case, using 1000 simulations with sample sizes of 4200 (roughly the size in our application) and 10,000.

The first MC design is based on the following data-generating process:

Note that Eq. 12 specifies the heteroskedastic function, while Eq. 15 specifies the distribution function (which is chi-squared distributed). The endogeneity arises because of Eq. 13.

For comparison, we choose the best scenario as our benchmark where the correctly specified predicted value, \(F\left (\frac {x_{1} + x_{2}}{\exp (x_{1}+x_{2})}\right )\) (where *F*(⋅) is the cumulative chi-squared distribution with 1 degree of freedom), is used as the IV. This is the best scenario not only because we use the correct specification, but also because we use the true parameter values (as opposed to the estimated ones); this is denoted as **True IV**. We then compare the benchmark results to the results using our misspecified parametric IV, \({\Phi }\left (\frac {\widehat {\lambda _{1}}x_{1} + \widehat {\lambda _{2}}x_{2}}{\exp (\widehat {\theta _{1}}x_{1}+\widehat {\theta _{2}}x_{2})}\right )\) (where Φ(⋅) is the cumulative distribution function for standard normal variables); this is denoted as **Parametric IV**.
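The benchmark comparison can be sketched in a few lines. The block below is a minimal illustration under stated assumptions, not the authors' simulation code: the data-generating equations are not reproduced in this section, so the covariate distribution (standard normal), the outcome equation, the error correlation `rho`, and the treatment effect `beta` are all hypothetical. Only the form of the **True IV**, \(F\left (\frac {x_{1} + x_{2}}{\exp (x_{1}+x_{2})}\right )\) with a chi-squared(1) \(F\), is taken from the text. The **Parametric IV** would additionally require estimating a heteroskedastic probit and is omitted here.

```python
import math
import random


def chi2_1_cdf(z):
    """CDF of a chi-squared variable with 1 degree of freedom."""
    return math.erf(math.sqrt(z / 2.0)) if z > 0 else 0.0


def solve(a, b):
    """Solve the square system a @ beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def iv_estimate(instruments, regressors, y):
    """Just-identified IV: solve (Z'W) beta = Z'y. OLS is the case Z = W."""
    k = len(regressors[0])
    zw = [[sum(z[i] * w[j] for z, w in zip(instruments, regressors))
           for j in range(k)] for i in range(k)]
    zy = [sum(z[i] * yi for z, yi in zip(instruments, y)) for i in range(k)]
    return solve(zw, zy)


def one_replication(n, rng, beta=1.0, rho=0.5):
    """Return (OLS bias, true-IV bias) for one simulated sample."""
    regressors, instruments, y = [], [], []
    for _ in range(n):
        # ASSUMPTION: standard normal covariates (not specified in the text).
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        s = x1 + x2
        v_star = rng.gauss(0.0, 1.0) ** 2  # chi-squared(1) error
        d = 1.0 if s - math.exp(s) * v_star >= 0 else 0.0
        # ASSUMPTION: the outcome error is correlated with v_star (endogeneity).
        u = (rho * (v_star - 1.0) / math.sqrt(2.0)
             + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        # ASSUMPTION: linear outcome equation with unit covariate coefficients.
        y.append(beta * d + x1 + x2 + u)
        regressors.append([1.0, d, x1, x2])
        true_iv = chi2_1_cdf(s / math.exp(s))  # F((x1+x2)/exp(x1+x2))
        instruments.append([1.0, true_iv, x1, x2])
    ols = iv_estimate(regressors, regressors, y)[1]
    iv = iv_estimate(instruments, regressors, y)[1]
    return ols - beta, iv - beta


def monte_carlo(reps=50, n=4200, seed=0):
    """Average OLS and true-IV bias over `reps` simulated samples."""
    rng = random.Random(seed)
    biases = [one_replication(n, rng) for _ in range(reps)]
    return (sum(b[0] for b in biases) / reps,
            sum(b[1] for b in biases) / reps)


if __name__ == "__main__":
    ols_bias, iv_bias = monte_carlo()
    print("average OLS bias:    ", ols_bias)
    print("average true-IV bias:", iv_bias)
```

In this assumed design, the covariates enter the outcome equation directly, so the instrument is identified only through the nonlinearity of \(F(\cdot )\) and the heteroskedastic scale, which is the feature the KV-IV approach relies on.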

The second MC design is based on the following data-generating process:

Note that the true heteroskedastic function is now different (Eqs. 18 vs. 12). Therefore, for the second MC design, the best scenario uses \(F\left (\frac {x_{1} + x_{2}}{(x_{1}+x_{2})^{2}}\right )\) (where *F*(⋅) is the cumulative chi-squared distribution with 1 degree of freedom) as the IV. For **Parametric IV**, we continue to use \({\Phi }\left (\frac {\widehat {\lambda _{1}}x_{1} + \widehat {\lambda _{2}}x_{2}}{\exp (\widehat {\theta _{1}}x_{1}+\widehat {\theta _{2}}x_{2})}\right )\). Note that we now misspecify both the distribution and the heteroskedastic functions.

The results are presented in Table 13; OLS results are also included for comparison. The simulations indicate that, in the presence of endogeneity, OLS is severely biased, which is not surprising. What is surprising is the outstanding performance of the parametric approach relative to the best scenario. First, in the smaller sample (similar in size to ours), the parametric IV, although misspecified, performs extremely well and similarly to the true IV (the best scenario). Second, as the sample size increases, both IVs have an average bias nearing zero, and the difference between the two IVs becomes even smaller and barely distinguishable. In sum, these MC results corroborate the theoretical expectations above.

