1 Introduction

Ethnic and racial discrimination in labor markets, as manifested in wage and occupational attainment gaps, has been widely examined (e.g., Altonji and Blank 1999; Antecol and Bedard 2004; Atal et al. 2009). In India too, labor market discrimination against historically disadvantaged caste groups, i.e., the former untouchables (Scheduled Castes or SCs) and marginalized tribal groups (Scheduled Tribes or STs), is well documented, with SCs and STs earning significantly lower wages and being allocated to less prestigious jobs as compared to upper castes, after controlling for their productive characteristics (Banerjee and Knight 1985; Madheswaran and Attewell 2007; Das and Dutta 2007).Footnote 1 However, the disadvantage faced by these groups may not be limited merely to wage employment and could extend to the realm of self-employment as well. While there is a sizable literature from the USA that studies racial differences in entrepreneurship in terms of business creation rates, survival, employment, profits and net worth (e.g., Fairlie 2004, 2006; Ahn 2011; Lofstrom and Bates 2013), examination of such issues in the Indian context is relatively recent, due to data constraints.Footnote 2

Our paper attempts to fill this gap for India, by assessing caste discrimination in household non-farm businesses (‘businesses’ hereafter), which has been possible due to the recent availability of good-quality earnings data for such businesses. Given the small scale of operations of these businesses, catering mostly to customers in the local community, it is highly plausible that businesses owned by low-caste owners face discrimination at the hands of customers, suppliers and lenders, since their caste status is easily identifiable, unlike in large businesses with complex ownership and management structures, where observing the caste of the owners might be less straightforward. However, discrimination could be directed toward larger low-caste businesses too: In personal interviews, rich SC entrepreneurs have discussed their individual battles with caste discrimination as they started their businesses.Footnote 3 There are other ethnographic accounts as well (Jodhka 2010; Prakash 2010) that indicate the presence of persistent disadvantage and discrimination in the self-employment arena, which forms the motivation for the present study.

To the best of our knowledge, ours is the first paper to examine caste gaps in earnings from household businesses for India. We use the India Human Development Survey data for 2004–2005 and employ two methodologies for understanding the earnings structure of businesses: OLS estimation of mean earnings for businesses owned by SCs and STs and non-SCST businesses; and quantile regressions for a distributional analysis to look beyond the mean and to understand ‘what happens where’ in the earnings distribution. Correspondingly, we use decomposition strategies to decompose the earnings gap between SCST and non-SCST businesses into explained and unexplained components (with the latter being indicative of discrimination), at the mean and at various quantiles of the earnings distribution.Footnote 4

Our main findings are as follows. There are clear differences in observable characteristics between SCST and non-SCST businesses. The latter are more urban, record a larger number of total man-hours, have better educated and richer owners, and are more likely to have a business in a fixed workplace. These disparities get reflected in both indicators of business performance in the data (gross receipts and net income), such that SCSTs, on average, perform significantly worse than non-SCSTs. The Blinder–Oaxaca decomposition reveals that depending on the specification of variables, assuming that the non-discriminatory earning structure is that of non-SCSTs, at least 20 % of the net income gap could be attributed to the unexplained or the discriminatory component. Unconditional caste gaps in earnings are higher at lower percentiles than at the higher percentiles. Thus, we find some evidence supporting a ‘sticky floor,’ a phenomenon observed in the context of gender wage gaps in developing countries (e.g., Chi and Li 2008; Carrillo et al. 2014). Quantile decompositions based on our preferred specification reveal that the unexplained component is significant in the middle part of the distribution (viz., between the fourth and eighth deciles), where it hovers around 15 % of the total gap in earnings.

In addition to contributing to the broader literature on racial and ethnic disparities in small business ownership from a developing country perspective, this paper has significant policy implications, particularly in the context of the current discourse on ‘Dalit Capitalism’ in India—inspired by ‘Black Capitalism’ in the USA—by the Dalit Indian Chamber of Commerce and Industry (DICCI).Footnote 5 DICCI believes that Dalits should enter business and industry sectors as entrepreneurs and use this route to become ‘job givers, and not job seekers’ especially for others in their own community and enhance their wealth, instead of being dependent on the state for benefits. However, the majority of Dalit businesses are small, owner-operated, survivalist household enterprises that do not have the potential to generate either employment or wealth (Deshpande and Sharma 2013). Further, our results indicate that discriminatory tendencies that exist in labor markets may characterize business operations as well.

The rest of this paper is organized as follows: Sect. 2 contains a literature review; Sect. 3 outlines the methodology; Sect. 4 discusses the data and descriptive statistics; Sect. 5 presents the results, while Sect. 6 discusses our findings. Section 7 concludes.

2 Review of related literature

Iyer et al. (2013) and Thorat and Sadana (2009), in descriptive analyses using Indian Economic Census data, document caste differences in non-agricultural enterprise ownership and performance.Footnote 6 They find SCs and STs to be underrepresented relative to their population shares. Enterprises owned by SCSTs are smaller in terms of number of workers, hire mostly family labor, rely less on external sources of finance and operate mostly in the unregistered unorganized sector as compared to enterprises owned by non-SCSTs. Deshpande and Sharma (2013) examine unit-level data from two successive censuses of the micro-, small and medium enterprises (MSME) sector to study the nature of participation of marginalized groups in self-employment and find that the MSME sector exhibits very clear differences along business owners’ caste and gender, in virtually all business characteristics.

This evidence of systematic differences, however, does not prove discrimination; all the gaps in performance could, in principle, be accounted for by differences in characteristics of SCST and non-SCST businesses.Footnote 7 For example, in the USA, racial disparities in asset ownership and family background in self-employment (with blacks being more disadvantaged than whites) are among the most important factors leading to differences in business creation and performance (Dunn and Holtz-Eakin 2000; Hout and Rosen 2000). However, even after controlling for differences in characteristics, a significant proportion of the performance gap remains unexplained and that could be on account of discrimination or some unobserved differences in behavior such as ability and risk aversion or some factors not amenable to measurement.

Discrimination manifests itself in self-employment primarily in the form of consumer and credit market discrimination. For example, Borjas and Bronars (1989) study consumer discrimination and find that relative gains of entering self-employment are reduced for ethnic minorities because they have to compensate white consumers by lowering prices charged for goods and services. Coate and Tennyson (1992) study credit discrimination assuming that lenders are unable to observe entrepreneurial ability. Individuals from a group discriminated against in the labor market will receive less favorable terms in the credit market since lenders know that for such individuals, the opportunity cost of entering self-employment is lower, and, thus, they are willing to take more risks. Such groups will be charged higher interest rates, thereby reducing the expected returns from self-employment. Empirical analyses using data from the USA show that the probability of loan denials and the rates of interest charged on approved loans are higher for black-owned businesses than for white-owned ones (Blanchflower et al. 2003) and that the probability of loan renewals is lower for black- and Hispanic-owned businesses (Asiedu et al. 2012). Section 6 discusses the evidence from Prakash (2010), Jodhka (2010) and Kumar (2013), among others, to understand possible channels of discrimination against Dalit businesses in India.

3 Methodology

3.1 Blinder–Oaxaca decomposition framework

We first use the Blinder–Oaxaca method to decompose the mean earnings gap from self-employment between SCSTs and non-SCSTs into portions attributable to differences in characteristics (the explained component or composition effect) and differences in returns to these endowments (the unexplained component or coefficients effect) (Blinder 1973; Oaxaca 1973). While the unexplained component can be attributed to discrimination, it is highly plausible that this residual also includes the effects of either unmeasurable or unobservable characteristics. All decomposition exercises are subject to this caveat. However, it is equally true that some pre-market discrimination affects the formation of characteristics, and thus, the explained component also embodies the effects of past discrimination. Therefore, estimates of the unexplained component from decomposition exercises should not be taken as precise measurements of ‘true’ discrimination, but as rough estimates, providing orders of magnitude.

This method involves estimating earnings equations separately for individuals i of the different groups g, SCSTs (group s) and non-SCSTs (group n):

$$w_{ig} = X_{i}^{g} \beta^{g} + u_{i}^{g}$$

where g = (n, s) denotes the two groups. The dependent variable w is the natural log of earnings. $X_{i}$ is the vector of covariates for individual i, which contains characteristics that would determine earnings. β is the corresponding vector of coefficients, and u is the random error term.

The gross difference in earnings between the two groups can be written as:

$$G = \bar{X}^{n} \hat{\beta }^{n} - \bar{X}^{s} \hat{\beta }^{s}$$

In order to decompose this gap, some assumptions have to be made about the earnings structure that would prevail in the absence of discrimination, and counterfactual earnings functions have to be constructed accordingly. One counterfactual could be constructed by assuming that the non-discriminatory earnings structure is the one applicable to non-SCSTs.Footnote 8 In that case, the counterfactual earnings equation of the SCSTs would be written as:

$$w_{is}^{c} = X_{i}^{s} \beta^{n} + v_{i}^{s}$$

Adding and subtracting the mean counterfactual earnings, $\bar{X}^{s} \hat{\beta }^{n}$, on the right-hand side of Eq. (2), we arrive at:

$$G = \bar{w}^{n} - \bar{w}^{s} = \left( {\bar{X}^{n} - \bar{X}^{s} } \right)\hat{\beta }^{n} + \bar{X}^{s} \left( {\hat{\beta }^{n} - \hat{\beta }^{s} } \right)$$

where the first term on the right-hand side represents the part of the earnings differential due to differences in characteristics and the second term represents differences due to varying returns to the same characteristics. The second term is the unexplained component and is considered to be a reflection of discrimination.
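Written out term by term, the step from Eq. (2) is as follows. The first display only expands the algebra described above; the second is the analogous expression obtained when the SCST earnings structure is instead taken as the non-discriminatory one. Both use only the symbols already defined and are given here for comparison.

$$G = \bar{X}^{n} \hat{\beta }^{n} - \bar{X}^{s} \hat{\beta }^{n} + \bar{X}^{s} \hat{\beta }^{n} - \bar{X}^{s} \hat{\beta }^{s} = \left( {\bar{X}^{n} - \bar{X}^{s} } \right)\hat{\beta }^{n} + \bar{X}^{s} \left( {\hat{\beta }^{n} - \hat{\beta }^{s} } \right)$$

$$G = \bar{X}^{n} \hat{\beta }^{n} - \bar{X}^{n} \hat{\beta }^{s} + \bar{X}^{n} \hat{\beta }^{s} - \bar{X}^{s} \hat{\beta }^{s} = \left( {\bar{X}^{n} - \bar{X}^{s} } \right)\hat{\beta }^{s} + \bar{X}^{n} \left( {\hat{\beta }^{n} - \hat{\beta }^{s} } \right)$$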

The decomposition is sensitive to the choice of the non-discriminatory earnings structure, as the two counterfactuals yield different estimates. To get around this ‘index number problem,’ one solution is to use the pooled estimates as the single counterfactual (Oaxaca and Ransom 1994). Another solution, suggested by Cotton (1988), is to construct the non-discriminatory earnings structure as a convex linear combination of the earnings structures of both groups.
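As a purely illustrative sketch (not the estimation code used in this paper), the twofold decomposition can be reproduced for a single covariate plus an intercept. The data values, the function names `ols_simple` and `blinder_oaxaca`, and the use of years of education as the only regressor are assumptions made for the example. It shows that the explained and unexplained parts always add up to the raw gap, while their split depends on the reference earnings structure.

```python
from statistics import mean


def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def blinder_oaxaca(x_n, y_n, x_s, y_s, reference="n"):
    """Twofold decomposition of mean(y_n) - mean(y_s).

    reference="n": non-SCST coefficients are the non-discriminatory structure.
    reference="s": SCST coefficients are the non-discriminatory structure.
    Returns (gap, explained, unexplained).
    """
    a_n, b_n = ols_simple(x_n, y_n)
    a_s, b_s = ols_simple(x_s, y_s)
    xbar_n, xbar_s = mean(x_n), mean(x_s)
    gap = mean(y_n) - mean(y_s)
    if reference == "n":
        explained = (xbar_n - xbar_s) * b_n
        # The intercept difference belongs to the coefficients effect.
        unexplained = (a_n - a_s) + xbar_s * (b_n - b_s)
    else:
        explained = (xbar_n - xbar_s) * b_s
        unexplained = (a_n - a_s) + xbar_n * (b_n - b_s)
    return gap, explained, unexplained


# Hypothetical data: years of education (x) and log net income (y).
x_n = [4, 6, 8, 10, 12, 14]
y_n = [9.8, 10.1, 10.5, 10.6, 11.0, 11.3]
x_s = [2, 4, 5, 6, 8, 10]
y_s = [9.2, 9.5, 9.6, 9.9, 10.0, 10.3]

for ref in ("n", "s"):
    gap, explained, unexplained = blinder_oaxaca(x_n, y_n, x_s, y_s, ref)
    print(ref, round(gap, 3), round(explained, 3), round(unexplained, 3))
```

The two reference choices return the same gap but different explained shares, which is the index number problem described above.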

3.2 Quantile regression decomposition framework

Generalizing the traditional Blinder–Oaxaca decomposition to analyze earnings gaps at different parts of the earnings distribution, Machado and Mata (2005) proposed a decomposition method that involves estimating quantile regressions separately for the two subgroups and then constructing a counterfactual using covariates of one group and returns to those covariates for the other group.

The conditional earnings distribution is estimated by quantile regressions. The conditional quantile function $Q_{\theta }(w|X)$ can be expressed using a linear specification for each group as follows:

$$Q_{\theta } \left( {w_{g} |X_{g} } \right) = X_{i,g}^{T} \beta_{g,\theta } \;{\text{for}}\;{\text{each}}\;\theta \in (0,1)$$

where g = (n, s) denotes the two groups, w is the natural log of earnings, $X_{i}$ is the set of covariates for individual i, and $\beta_{\theta }$ are the coefficient vectors that need to be estimated at the different quantiles θ. The quantile regression coefficients can be interpreted as the returns to various characteristics at different quantiles of the conditional earnings distribution.

Next, Machado and Mata (2005) construct the counterfactual unconditional earnings distribution using estimates for the conditional quantile regressions, which consists of the following steps:

1. Generate a random sample of size m from a uniform distribution U[0,1].
2. For each group, separately estimate m different quantile regression coefficients.
3. Generate a random sample of size m with replacement from the empirical distribution of the covariates for each group, $X_{s,i}$ and $X_{n,i}$.
4. Generate the counterfactual of interest by multiplying different combinations of quantile coefficients and distribution of observables between group s and group n after repeating this last step m times.

Standard errors are computed using a bootstrapping technique.
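The four steps can be sketched in a few lines under a strong simplifying assumption: a single 0/1 covariate (say, urban location), so that the quantile regression on an intercept and the dummy is saturated and reduces to cell-wise sample quantiles. The function names, the data and this simplification are hypothetical and serve only to keep the example dependency-free. The specifications in this paper involve many covariates and would need a proper quantile regression routine.

```python
import random


def empirical_quantile(values, theta):
    """Sample quantile: the order statistic at position floor(theta * n)."""
    ordered = sorted(values)
    idx = min(int(theta * len(ordered)), len(ordered) - 1)
    return ordered[idx]


def saturated_qr(x, w, theta):
    """Quantile regression of w on an intercept and one 0/1 dummy.

    Because the model is saturated, the fit equals the sample quantile of
    each cell. Both cells must be non-empty.
    Returns (intercept, dummy coefficient).
    """
    q0 = empirical_quantile([wi for xi, wi in zip(x, w) if xi == 0], theta)
    q1 = empirical_quantile([wi for xi, wi in zip(x, w) if xi == 1], theta)
    return q0, q1 - q0


def machado_mata_counterfactual(x_s, x_n, w_n, m, seed=0):
    """Simulate log earnings of group s under group n's returns."""
    rng = random.Random(seed)
    thetas = [rng.random() for _ in range(m)]               # step 1
    betas_n = [saturated_qr(x_n, w_n, t) for t in thetas]   # step 2 (group n)
    x_draws = [rng.choice(x_s) for _ in range(m)]           # step 3 (group s)
    # Step 4: pair each coefficient draw with one covariate draw.
    return [b0 + b1 * x for (b0, b1), x in zip(betas_n, x_draws)]


# Hypothetical data: x = 1 if urban, w = log net income.
x_n = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
w_n = [9.0, 9.4, 9.7, 10.1, 10.0, 10.3, 10.6, 10.9, 11.2, 11.5]
x_s = [0, 0, 0, 0, 0, 0, 1, 1, 1]

cf = machado_mata_counterfactual(x_s, x_n, w_n, m=1000)
print("counterfactual median:", empirical_quantile(cf, 0.5))
```

Quantiles of the simulated sample `cf` estimate the counterfactual unconditional distribution. Repeating the whole procedure on bootstrap resamples would give standard errors.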

This simulation-based estimator relies on the generation of a random sample with replacement to construct the counterfactual unconditional earnings distribution and comes at the cost of increased computational time. Melly (2006) proposed a procedure that is less computationally intensive and faster by integrating the conditional earnings distribution over the entire range of covariates to generate the marginal unconditional distribution of log earnings. This procedure uses all the information contained in the covariates and makes the estimator more efficient than the one suggested by Machado and Mata (2005). The Melly (2006) and Machado and Mata (2005) decompositions are numerically identical when the number of simulations in the latter goes to infinity.

We construct a counterfactual for the SCST group using the characteristics of SCSTs and the earning structure for non-SCSTs here:

$$CF_{\theta }^{s} = X_{s,i}^{T} \beta_{n,\theta }$$

This yields the following decomposition:

$$\Delta _{\theta } = \left( {Q_{n,\theta } - {\text{CF}}_{\theta }^{s} } \right) + \left( {{\text{CF}}_{\theta }^{s} - Q_{s,\theta } } \right)$$

The first term on the right-hand side represents the effect of characteristics (explained component) and the second the effect of returns to characteristics (coefficients effect or unexplained component).

4 Data and descriptive statistics

4.1 Data

We use the India Human Development Survey (IHDS) for 2004–2005, which is a nationally representative data set covering 41,554 households across 1504 villages and 971 urban neighborhoods in 33 states of India. The modules of the survey collect data on a wide range of questions relating to economic activity, income and consumption expenditure, asset ownership, social capital, education, health, marriage and fertility, etc.

The survey module on household non-farm businesses does not identify the primary decision-maker in the business. However, we can identify specific members in the household who worked in the business and the amount of time they spent, in terms of days per year and hours per day. Using that information, we assume that the person who has spent the maximum number of hours in the business is the de facto decision-maker.
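A minimal sketch of this assignment rule is given below. The record layout (keys `id`, `days_per_year`, `hours_per_day`) and the function name are hypothetical and do not correspond to actual IHDS variable names. Ties are broken in favor of the member listed first, which is an assumption of the sketch.

```python
def de_facto_decision_maker(members):
    """Return the id of the member with the most annual hours in the business.

    members: list of dicts with hypothetical keys
    'id', 'days_per_year' and 'hours_per_day'.
    """
    def annual_hours(member):
        return member["days_per_year"] * member["hours_per_day"]

    # max() keeps the first member in case of a tie.
    return max(members, key=annual_hours)["id"]


household = [
    {"id": "A", "days_per_year": 300, "hours_per_day": 8},   # 2400 hours
    {"id": "B", "days_per_year": 200, "hours_per_day": 10},  # 2000 hours
    {"id": "C", "days_per_year": 100, "hours_per_day": 4},   # 400 hours
]
print(de_facto_decision_maker(household))
```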

We restrict the sample to those states where there are at least 50 household businesses, leaving us with 22 states.Footnote 9 We consider only male businesses (i.e., where men are the primary decision-makers) in the main analysis because factors affecting selection into self-employment vary along lines of gender; additionally, in order to delineate the effect of caste, we need to hold gender constant, so as not to confound the effect of overlapping identities.Footnote 10

The survey canvasses information on two measures of financial performance of the business: net income and gross receipts. Our primary dependent variable is the log of net income from the business over the last 12 months. Net income is computed as gross receipts less hired workers’ wages less cost of materials, rent, interest on loans, etc. One issue on which the data are patchy is the use of unpaid family labor in these businesses, which would affect the calculation of net income. While some businesses in the data report the individual components as well as a net income, others report only the net income. However, our queries with the IHDS team revealed that when hired labor is not reported, it cannot be assumed that no labor was actually hired. Thus, the data do not allow us to clearly distinguish between hired and unpaid family labor, resulting in the inability to estimate ‘true’ net income. We therefore use the net income figures in the data as reported. While expenditure-based indicators have been found to be more reliable than income-based measures in developing countries (the latter being prone to recall errors, non-response and deliberate mis-reporting), for an analysis focusing on enterprise performance, income is the most appropriate outcome to consider.

As explanatory variables, we use individual-specific variables such as age, marital status and standard years of education completed of the decision-maker; household-specific variables such as wealth (proxied by asset ownership), rural/urban status, whether someone close to or within the household is an official of the village panchayat/nagarpalika/ward committee and membership in the following: business or professional group; credit or savings group; caste association; development group and agricultural, milk or other co-operative; and business-specific variables such as number of family members who worked in the business, total number of hours put into the business, work place type and industry type.Footnote 11 Admittedly, business-specific variables and some household-specific variables such as membership in different types of networks and wealth are potentially endogenous with respect to business performance. However, as Fortin et al. (2011) argue, decompositions are accounting exercises that allow one to quantify the contribution of factors to the difference in outcome between two groups without necessarily shedding any light on the mechanisms explaining the relationship between such factors and outcomes.

As our sample is limited to only those households that operate businesses, a potential limitation of our estimations is that coefficients of earnings regressions may be biased since individuals and households do not randomly select into self-employment. Unfortunately, our data set does not provide us with suitable instruments to correct for selection.

4.2 Descriptive statistics

Table 1 lists the summary statistics for the whole sample and for the sample of SCST and non-SCST businesses separately. Of the total 7288 businesses, 1300 are owned by SCSTs (17.8 %) and the remaining 5988 by non-SCSTs (82.2 %).Footnote 12

Table 1 Summary statistics

In terms of performance, the average net income for non-SCST businesses (Rs. 45,218) is 1.76 times that for SCST businesses (Rs. 25,640). A similar pattern can be seen in the average gross receipts. Figure 1 plots the kernel density distribution of log income for SCST and non-SCST businesses. The distribution of incomes of non-SCST businesses lies distinctly to the right of the SCST businesses.

Fig. 1 Kernel density of log income

This large difference in business performance could be on account of a variety of characteristics, in most of which there are clear differences between SCSTs and non-SCSTs. The primary decision-maker is on average 39 years old, and 86 % of them are married. These numbers are similar across SCST and non-SCST decision-makers. However, average years of education differ significantly by caste, with 8.3 years for non-SCSTs and 5.7 years for SCSTs.

There is a distinctly different pattern in the rural–urban distribution across castes with 33 % of SCST households and 53 % of non-SCST households being located in urban areas. There is also disparity in material standard of living as reflected in asset ownership, in that out of the 16 assets in the questionnaire, non-SCSTs own approximately 8, while SCSTs own around 5.Footnote 13 We create a wealth index using principal components analysis and divide the sample into three groups following Filmer and Pritchett (2001): those lying in the bottom 40 % (poor), middle 40 % (middle) and the top 20 % (rich). By this somewhat arbitrary definition, 65.2 % of SCST households fall in the poor category, while 34.6 % of non-SCST households are poor. 27.4 and 42.7 % of SCSTs and non-SCSTs, respectively, are in the middle, and 7.4 % of SCSTs and 22.7 % of non-SCSTs are rich.
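The 40/40/20 grouping can be illustrated as follows. The sketch assumes that a wealth index score has already been computed for every household in the pooled sample (the principal components step is not shown). The function name and the example scores are hypothetical.

```python
def wealth_groups(scores):
    """Label each household 'poor' (bottom 40 %), 'middle' (next 40 %)
    or 'rich' (top 20 %) according to its rank in the wealth index."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, i in enumerate(order, start=1):
        # Integer arithmetic avoids floating-point issues at the cut-offs.
        if 5 * rank <= 2 * n:
            labels[i] = "poor"
        elif 5 * rank <= 4 * n:
            labels[i] = "middle"
        else:
            labels[i] = "rich"
    return labels


# Hypothetical index scores for ten households.
scores = [0.3, -1.2, 2.5, 0.0, -0.7, 1.1, -2.0, 0.8, 1.9, -0.1]
print(wealth_groups(scores))
```

Because the cut-offs are defined on the pooled sample, the shares of poor, middle and rich households within each caste group need not be 40, 40 and 20 %.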

We also examine networks since these can affect the decision to become self-employed, as well as the prospective success of the business (Allen 2000). In general, participation in such networks is low. 8 % of all businesses are members of business or professional groups with membership of SCST businesses being below average (5 %). Participation in credit or savings groups does not differ by caste, covering roughly 7 % of owners. Membership in caste associations is 14 and 12 % for non-SCST and SCST businesses, respectively. Membership in development groups and co-operatives is minuscule across the board. In terms of political networks, 12.5 % of SCSTs have someone in, or close to, their household who has been an official in local bodies, while for non-SCSTs, the corresponding figure is 10.6 %.Footnote 14 Overall, there is no discernible pattern in network participation of the two groups in our data.

These gaps in performance could also be related to other characteristics, such as (a) the number of family members who worked in the business: SCST businesses have a greater than average number of family members working in the business (1.47), as compared to non-SCST businesses (1.37); and (b) the total number of hours put in by everyone working in the business: Non-SCST businesses record 1.3 times more hours than their SCST counterparts.

In terms of business location, about 25 % of businesses are home-based, and this proportion does not differ by caste. 34 % of SCSTs and 20 % of non-SCSTs have mobile workplaces, while the proportions of non-SCSTs and SCSTs with fixed workplaces are 55 and 39 %, respectively. To the extent a fixed workplace indicates permanency, it suggests that non-SCST businesses are more stable and less makeshift.

The most important sector for these businesses is ‘wholesale, retail trade and restaurants and hotels,’ which includes activities such as running of ‘kirana’ (neighborhood grocery) stores, other grocery and general stores, and petty shops. 56 % of non-SCST businesses and 44.5 % of SCST businesses are involved in this sector. About 13 % of businesses are in manufacturing activities, and this proportion does not vary by caste. The major activities here are blacksmiths, carpenters and flour mills. About 16 % of businesses are in the ‘community, social and personal services’ sector. This includes activities such as barbers, cycle repair shops and tailoring. These examples also corroborate our intuition that these businesses are engaged in low-end activities, and are more survivalist than entrepreneurial.

Approximately 6.5 % of businesses are in the ‘transport, storage and communication’ sector, with the proportion being the same across castes. Overall, only 4 % of businesses are in the primary sector (agriculture, hunting, forestry and fishing), but 15 % of SCST businesses are in this sector. Proportions in ‘construction’ and ‘financing, insurance, real estate and business services’ are small, involving only about 2 % of businesses each. Businesses engaged in ‘mining and quarrying’ and ‘electricity, gas and water’ sectors are practically non-existent, as expected, since these highly capital-intensive activities are not conducive to self-employment.

5 Results

5.1 Earnings function estimates

Table 2 reports the OLS estimates with log income as the dependent variable, for the pooled sample, and separately by caste. We present estimates using two specifications. Specification 1 uses only exogenous explanatory variables. These include age and age squared (as proxies for experience), whether married or not, years of education, whether urban or not, and state of residence. Specification 2 is more exhaustive and also includes potentially endogenous variables. In addition to variables in the first specification, we include the asset ownership/wealth index, memberships in business or professional groups, credit or savings groups, caste associations, development groups, co-operatives and political networks, the number of hours spent by everyone working in the business, the number of family members working in the business, whether the workplace is fixed or moving (reference category is home-based) and industry type.Footnote 15

Table 2 OLS estimation: pooled sample and caste-wise

The SCST dummy is negative and significant in both specifications, indicating that, ceteris paribus, belonging to these marginalized groups is negatively correlated with income. As expected, earnings have a quadratic relationship with the decision-maker’s age such that earnings initially increase with age and start to decline thereafter. Businesses operated by more educated owners also perform better, indicating that more years of formal education may be associated with higher managerial ability and business acumen. Households owning more assets are able to overcome liquidity constraints more easily, and we find that asset-rich households own more profitable businesses. Businesses in urban locations perform better, possibly due to proximity to markets and easier availability of information. The number of hours spent working is positively correlated with income, as expected. Businesses that operate from other fixed locations (outside of the home) or that are mobile are associated with higher incomes than home-based businesses.

Pooled regressions impose the restriction that the returns to included characteristics are the same for the two caste groups. Since this assumption is not realistic, particularly in the Indian context, we also report caste-specific OLS regressions. Caste-specific OLS estimates indicate that some variables—particularly those related to memberships in business or professional groups, development groups and credit groups—correlate in different ways with performance of SCST and non-SCST businesses.

5.2 Decomposition of the mean earnings gap

The results of the Blinder–Oaxaca decomposition with log income as the dependent variable are presented in Table 3.Footnote 16 Panel A and Panel B of Table 3 display the decomposition results using specification 1 and specification 2, respectively. Within each of these panels, we report results using: coefficients from a pooled model over both groups as the reference coefficients; non-SCST coefficients, i.e., how SCST businesses would fare if they were treated like non-SCST businesses; and SCST coefficients, i.e., how non-SCSTs would fare if they were treated like SCSTs.

Table 3 Blinder–Oaxaca decomposition of log income

It is apparent that as more variables are added in moving from Panel A to Panel B, the explained proportion increases and the unexplained proportion decreases significantly. In Panel B, in the presence of non-SCST coefficients, the unexplained component is 19 %. Using SCST coefficients, we see that the unexplained proportion is 10.05 %, and for the pooled model, the corresponding value is 16 %. Following Banerjee and Knight (1985), we can take the geometric mean of the estimates based on the SCST coefficients and non-SCST coefficients to yield a single estimate of the unexplained component which amounts to 13.8 %.
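The single estimate quoted above can be checked directly from the two reported shares. The snippet below is only an arithmetic illustration, and the helper name `geometric_mean` is chosen here for convenience.

```python
import math


def geometric_mean(a, b):
    """Geometric mean of two positive numbers."""
    return math.sqrt(a * b)


# Unexplained shares (in %) from Panel B: non-SCST and SCST coefficients.
combined = geometric_mean(19.0, 10.05)
print(round(combined, 1))  # 13.8
```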

Which of the variables contributes the most to the explained component? The lower panel of Table 3 shows the contribution of selected significant characteristics to the overall explained part of the income gap. Using the first specification, years of education contributes 39–42 % of the explained component, depending on the counterfactual earnings structure. Urban location also accounts for 37–44 %. However, in the second specification, the importance of years of education and urban location declines significantly to around 5 and 8 %, respectively, and number of hours and asset ownership emerge as the dominant variables, each accounting for approximately 30–40 % of the explained component. Since variables such as years of education and urban location are generally positively correlated with asset ownership, it is not surprising that upon controlling for the latter, the relative importance of education and location declines.

5.3 Quantile regressions

For quantile regressions, we use the same two specifications of the earnings function that we used for the OLS regressions. The average gap in log incomes of non-SCST-owned and SCST-owned businesses is 0.75, which corresponds to a gap of 112 % in raw net incomes of the two types of businesses. This is instructive, but when we juxtapose this against the log income gap for the different quantiles, we see that restricting the analysis to only mean gaps misses a large part of the bigger picture. Broadly speaking, as Fig. 2 indicates, while the uncontrolled log income gap is positive throughout the distribution, the gap is higher for low-income businesses as compared to high-income businesses, with the gap for those at the 10th percentile (300 %) and 25th percentile (154 %) being substantially higher than the gap at the 75th and 90th percentiles (87 and 66 %, respectively). This phenomenon of higher gaps at lower levels of the earnings distribution is similar to the ‘sticky floor’ phenomenon observed in the gender wage gap literature. Sticky floors are broadly defined as declining earning gaps as one moves from lower to higher quantiles of the earnings distribution (e.g., Arulampalam et al. 2007).Footnote 17 Unlike gender wage gaps in most developed countries that are characterized by ‘glass ceilings’ (i.e., increasing wage gaps as one moves from lower to higher quantiles), several developing countries reveal a sticky floor, for instance India (Khanna 2013; Deshpande et al. 2015), China (Chi and Li 2008), Bangladesh (Nordman et al. 2015) and Vietnam (Pham and Reilly 2007). In fact, Carrillo et al. (2014) find that gender wage gaps in poorer and more unequal countries exhibit sticky floors, whereas glass ceilings characterize richer and less unequal ones, using a sample of 12 Latin American countries.
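The conversion between a gap in log incomes and a percentage gap in raw incomes used above follows from exponentiating the log difference. The short sketch below illustrates the arithmetic, and the function names are hypothetical.

```python
import math


def log_gap_to_percent(log_gap):
    """Percentage by which one income exceeds the other, given the log gap."""
    return (math.exp(log_gap) - 1) * 100


def percent_to_log_gap(percent):
    """Inverse conversion: log gap implied by a percentage gap."""
    return math.log(1 + percent / 100)


print(round(log_gap_to_percent(0.75)))     # 112
print(round(percent_to_log_gap(154), 2))   # 0.93
```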

Fig. 2 Caste log income gap across percentiles and average gap

Tables 4 and 5 report quantile regression results for the two specifications, respectively, for the pooled model at the 10th, 25th, 50th, 75th and 90th percentiles. The estimates show that controlling for various characteristics reduces but does not eliminate the caste gap observed in Fig. 2. In Table 5, even with the most inclusive specification, the caste dummy remains negative and significant, but compared to Table 4, its magnitude is much smaller at each of the percentiles. The sticky floor no longer prevails as in Table 4. The caste income gap increases from 10 % at the 10th percentile to 16 % at the median, declines to 10 % at the 75th percentile and increases again up to 14 % at the 90th percentile.

Table 4 Quantile regression: specification 1 (pooled sample)
Table 5 Quantile regression: specification 2 (pooled sample)
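
The distinction between a sticky floor and a glass ceiling can be made concrete with a small sketch. The Python snippet here applies the broad definitions used in the text (gaps that decline, or rise, as one moves up the distribution) to the percentage gaps quoted in this section. The function name is hypothetical, and requiring strict monotonicity is a simplifying assumption of the sketch rather than a criterion taken from the paper.

```python
def classify_gap_profile(gaps):
    """Label a sequence of gaps ordered from the lowest to the highest quantile.

    Declining gaps are labeled a 'sticky floor', rising gaps a 'glass ceiling'.
    Strict monotonicity is a simplifying assumption of this sketch.
    """
    pairs = list(zip(gaps, gaps[1:]))
    if all(a > b for a, b in pairs):
        return "sticky floor"
    if all(a < b for a, b in pairs):
        return "glass ceiling"
    return "neither"


# Uncontrolled gaps (%) at the 10th, 25th, 75th and 90th percentiles (Fig. 2).
raw_gaps = [300, 154, 87, 66]
# Gaps (%) with the full set of controls at the 10th, 50th, 75th and
# 90th percentiles, as quoted in the text for Table 5.
controlled_gaps = [10, 16, 10, 14]

print(classify_gap_profile(raw_gaps))         # sticky floor
print(classify_gap_profile(controlled_gaps))  # neither
```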

Results of caste-specific quantile regressions are reported in Tables 6 and 7. Results from Table 7 indicate that while being married and years of education are associated positively with income for non-SCSTs, they are not significant for SCSTs. Being located in urban areas and the number of hours spent in the business seem to confer greater benefits at the lower end of the earnings distribution than at the higher end, for both groups. On the other hand, gains from asset ownership are mostly increasing across the distribution for both groups. Returns to other fixed or moving workplaces appear higher at all percentiles for SCSTs as compared to non-SCSTs.

Table 6 Caste-wise quantile regressions: specification 1
Table 7 Caste-wise quantile regressions: specification 2

5.4 Quantile decompositions of log income gaps

We conduct the quantile decompositions separately using both specifications.Footnote 18 Table 8 shows the summary results with the raw difference, characteristics effect and coefficients effect at the 10th, 25th, 50th, 75th and 90th percentiles using the non-SCST coefficients. As noted previously, another set of estimates could be obtained using the SCST coefficients (Table 9).

Table 8 Quantile decompositions of log income (non-SCST coefficients)
Table 9 Quantile decompositions of log income (SCST coefficients)
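
To illustrate why the choice of reference coefficients yields two sets of estimates, the sketch here performs a mean (Oaxaca-Blinder style) decomposition with a single regressor under each counterfactual. It is only the mean analogue of the idea, not the quantile decomposition method used for Tables 8 and 9, and the data are synthetic values invented for illustration, not IHDS data. All names are hypothetical.

```python
from statistics import fmean


def ols_fit(x, y):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    x_bar, y_bar = fmean(x), fmean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope


def decompose(x_hi, y_hi, x_lo, y_lo, reference="hi"):
    """Split the mean gap (hi minus lo) into characteristics and coefficients effects."""
    a_hi, b_hi = ols_fit(x_hi, y_hi)
    a_lo, b_lo = ols_fit(x_lo, y_lo)
    dx = fmean(x_hi) - fmean(x_lo)
    if reference == "hi":
        # Differences in characteristics valued at the higher-earning group's returns.
        explained = dx * b_hi
        unexplained = (a_hi - a_lo) + fmean(x_lo) * (b_hi - b_lo)
    else:
        # Differences in characteristics valued at the lower-earning group's returns.
        explained = dx * b_lo
        unexplained = (a_hi - a_lo) + fmean(x_hi) * (b_hi - b_lo)
    return explained, unexplained


# Synthetic data (NOT from the IHDS): years of education and log income.
x_non, y_non = [4, 8, 10, 12, 16], [8.5, 8.8, 9.1, 9.2, 9.6]
x_sc, y_sc = [0, 2, 5, 8, 10], [7.9, 8.0, 8.3, 8.4, 8.7]

raw_gap = fmean(y_non) - fmean(y_sc)
for ref in ("hi", "lo"):
    explained, unexplained = decompose(x_non, y_non, x_sc, y_sc, reference=ref)
    # Shares of the raw gap attributed to characteristics and to coefficients.
    print(ref, round(explained / raw_gap, 3), round(unexplained / raw_gap, 3))
```

Under either reference group the two components sum to the raw gap, but the split between them generally differs, which is why results are reported for both counterfactuals.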

In both Tables 8 and 9, as with the mean decomposition, we note that upon including more explanatory variables in specification 2, the proportion explained increases. Based on non-SCST coefficients and focusing our attention on specification 2 (Table 8), we find that the raw log income gap shows a generally declining trend, first decreasing from 0.95 at the 10th percentile to 0.57 at the median and then remaining fairly flat thereafter. The proportion of the income gap due to differences in characteristics lies in the range of 84–87 % and then increases to about 91 % at the 90th percentile. Using the SCST coefficients (Table 9), the proportion explained is slightly lower. Differences in characteristics account for about 70 % of the gap at the 10th and 25th percentiles, 77 % at the median and 82 % at the 90th percentile.

Mirroring these trends, we find using both counterfactuals that the unexplained component is somewhat larger at the lower end of the conditional earnings distribution than at the higher end. In Table 8, using the second specification, the unexplained share declines from 12 % at the 10th percentile to approximately 8 % at the 90th percentile. With the non-SCST earnings structure, the coefficients effect is significant for businesses in the middle range of the earnings distribution (median and third quartile), whereas with the SCST earnings structure, the unexplained component is statistically significant throughout the distribution.

Figures 3 and 4 plot the raw gap, the contribution of characteristics and that of coefficients at each percentile of the earnings distribution using the second specification for the non-SCST and SCST coefficients, respectively.

Fig. 3 Quantile decomposition of log income gap: non-SCST coefficients

Fig. 4 Quantile decomposition of log income gap: SCST coefficients

6 Discussion

As mentioned earlier, the unexplained component is the residual gap that remains after all characteristics are accounted for. It measures the influence of other unmeasured and unobserved factors, including discrimination. For instance, the IHDS data do not contain information on family background in running an enterprise, type of customer base, risk aversion of the owner, etc., all of which previous literature has shown to affect enterprise performance. Similarly, there are characteristics such as ability or motivation, which cannot be measured but can affect the earnings gap. Conversely, as noted earlier, the explained component could itself include pre-market discrimination.

While we cannot test empirically the channels through which discrimination manifests itself, there exist studies that qualitatively document the presence of active discrimination against SCST businesses. Prakash (2010), in his 2006–2007 survey of 90 Dalit businesses in 13 districts spread across 6 states in India, reports difficulty in obtaining initial formal credit to set up an enterprise, resulting in informal loans being taken at high interest rates. Those who did successfully obtain institutional credit were either in partnerships with upper castes or had local political contacts that facilitated loan approvals. Kumar (2013), using data from the 2002–2003 All-India Debt and Investment Survey, finds that public sector banks operating in areas with more upper castes tend to discriminate more against low-caste loan applicants. Prakash (2010) also cites Dalit entrepreneurs who reported often charging less for their products than their upper-caste peers so that customers ‘forget’ their castes. Jodhka (2010), through detailed interviews with Dalit entrepreneurs in two towns in northwest India, finds that caste works as a direct and indirect barrier in the successful running of their businesses. Most of them report difficulties, on account of their Dalit identity, in mobilizing finance and getting a space to start their enterprise. A majority of them felt that their caste identity was perceived as more important than their professional identity, which led to them being seen as ‘odd actors’ in the local community.

Another unobserved factor that could constrain enterprise performance is geographical segregation. Residential segregation is still prevalent in India, with Dalits living in their own segregated neighborhoods. If the main customer base of SCST enterprises is their own community, and given that SCSTs are on average poorer and have lower purchasing power, they may have to keep their prices low in order to cater to members of their own group. Clark and Drinkwater (2000), for example, explain that ethnic enclaves can be a source of advantage or disadvantage. On the one hand, a concentration of co-ethnics provides a captive market for producing ethnic goods that hold particular appeal for the community; on the other hand, if the ethnic group is poor, then businesses set up by members of these groups may actually languish.

7 Conclusion

In this paper, our objective has been to assess the presence of caste-based discrimination in small household businesses using the large-scale nationally representative India Human Development Survey of 2004–2005. Our results show that SCST businesses fare significantly worse in terms of owner’s education, household economic status and business characteristics, as compared to their non-SCST counterparts. Using the non-SCST counterfactual earnings structure, at least 20 % of the mean earnings gap between businesses owned by SCSTs and non-SCSTs cannot be explained by differences in characteristics. Further, we find that there is substantial heterogeneity in raw earnings gaps across the earnings distribution revealing a sticky floor, thereby necessitating the use of quantile regression-based decomposition methods. These indicate that the proportion of the earnings gap on account of differences in characteristics is generally increasing in the higher deciles of the conditional earnings distribution.

This paper focuses on one part of the IHDS data set, viz., data related to household non-farm business, where we see clear evidence of caste-based disparities in earnings and other business characteristics, as well as the existence of discrimination. Desai and Dubey’s (2011) analysis based on the entire IHDS data set suggests that our findings fit into the larger pattern of persistence of caste inequalities, which results in inequalities in opportunities as well as inequalities in outcomes. They find an increase in civic and political participation by marginalized groups, but also document how economic and educational disparities continue to flourish.

This paper is the first to examine this question for India, and its findings confirm patterns that have been observed in the context of racial and ethnic differences in entrepreneurship in other countries such as the USA and UK. However, whereas in the USA, for instance, a number of migrant groups such as the Koreans and Japanese have used self-employment as a way to achieve upward economic and social mobility, that does not appear to be the case in India, as suggested by our findings and also those in Iyer et al. (2013) and Deshpande and Sharma (2013). This also suggests that the exuberance surrounding Dalit Capitalism may be somewhat misplaced, since the reality of most SC and ST businesses is in stark contrast to the success of a few Dalit billionaires.

The simultaneous existence of discrimination against SCs and STs in self-employment and wage employment presents serious challenges for public policy, further complicated by the existence of ‘pre-market’ discrimination against Dalits, which results in lower and poorer-quality educational and skill attainment. Caste-based job quotas in India target salaried public sector employment and may not be the appropriate instrument to tackle discrimination faced by the self-employed. One recent move directed at the self-employed is a 2012 public procurement policy for micro- and small enterprises (MSEs) that mandates that 4 % of government procurement be from MSEs owned by SCs and STs. Other multi-pronged measures, based on more research, need to be devised to tackle discrimination in both spheres. For instance, the entrepreneurial success of migrant groups in countries such as the USA and UK indicates that Fairlie’s (2006) suggestion of stimulating business creation in sectors with high growth potential (e.g., construction, wholesale trade and business service) might be one effective element of public policy for promoting job creation and increasing earnings, especially in areas where marginalized groups are concentrated.

A larger question is the relationship between earnings and wealth, and whether an increase in earnings (from businesses and elsewhere) is sufficient to close the wealth gap between communities. Barsky et al. (2002) find that roughly two-thirds of the mean difference in wealth between blacks and whites in the USA can be explained by differences in earnings from all sources, which suggests that substantial wealth gaps remain even after controlling for earning differences. Whether an increase in business ownership by SCs and STs translates into narrowing wealth gaps would have to be the subject matter of a future exercise.