1 Introduction

1.1 Certification and social equity

The core logic of sustainability certification is to promote responsible production by verifying that products are produced in accordance with agreed-upon environmental and social requirements. The social aspirations of certification, in particular, have prompted critiques of how effectively certification itself delivers social equity, i.e. fairness in processes of decision-making and distribution of benefits, costs, and risks (McDermott 2013; Pinto et al. 2014; Kalonga et al. 2015). There is a growing body of research on sustainability certification’s impacts on social equity (e.g. Durst et al. 2006; Alemagi et al. 2012; Foley and McCay 2014; Kalonga et al. 2015; Tysiachniouk and McDermott 2016; Bullock et al. 2018) and on small-scale producers (Bolwig et al. 2009; Loconto and Dankers 2014; DeFries et al. 2017).

Coffee has been a particularly vibrant sector for social and environmental certification and is indicative of the rising prominence of trans-national private governance of global trade (Giovannucci and Ponte 2005; Auld 2010; Gereffi 2014). Coffee—the world’s second most traded commodity by value (Ogden et al. 2013; Tucker 2017)—was one of the first crops to be targeted for sustainability certification, and the world market share of sustainable coffee adhering to social, environmental, and economic standards has grown rapidly in recent years (Panhuysen and Pierrot 2014; Oya et al. 2017; Tayleur et al. 2017). The original motivations for certifying coffee production were social in nature, with pioneering labels such as Max Havelaar and FairTrade aiming to give farmers a fair price and improved access to foreign buyers (Giovannucci and Koekoek 2003; Tayleur et al. 2017). Other certification programmes, including the Rainforest Alliance’s (RA’s) Sustainable Agriculture Network (SAN), were driven primarily by non-governmental organisations (NGOs), with a broader mandate to incorporate both environmental and social concerns (Sasser 2003; Cashore et al. 2004; Auld et al. 2008; Kolk 2013). Nevertheless, across all certification schemes, the way in which earnings from the lucrative coffee crop are distributed along the supply chain is the subject of much contestation (Dicum and Luttinger 1999; Charveriat 2001; Gresser and Tickell 2002; Talbot 2004; Raynolds et al. 2007; Raynolds 2009; Pinto et al. 2014).

Equity has been a particularly strong focus of research within the coffee sector (Bray et al. 2002; Raynolds 2002; Bacon 2008; Lyon et al. 2010). Possible reasons for this equity focus include the major role that smallholders play in coffee production worldwide (Charveriat 2001; Gresser and Tickell 2002; IBGE 2006; Potts et al. 2014); the volatility of global coffee prices and associated labour market insecurity (Gereffi and Korzeniewicz 1994; Ponte and Gibbon 2005; Malan 2013; Kolk 2013; Jena et al. 2017); and the narrow share of profits allocated to farmers in the sector (e.g. Raynolds et al. 2007; Bacon 2008; Beuchelt and Zeller 2011; Rueda and Lambin 2013). As a result, it is important to understand where, when, and for whom coffee certification is burdensome or unattainable, and how this may relate to social outcomes and thus equitable economic development. At the producer level, researchers have debated the balance between certification’s social benefits, especially for workers and communities, and its costs, particularly for certified smallholders (e.g. Barbosa de Lima et al. 2009; Arnould et al. 2009; Bolwig et al. 2009; Preißel and Reckling 2010; Blackman and Rivera 2010).

1.2 Brazilian coffee and SAN/RA group certification

Brazil is the world’s primary supplier of both coffee (producing around one-third of the world’s coffee) and certified coffee beans (Potts et al. 2014). The majority of its coffee producers are smallholders (Watson and Achinelli 2008; Saes 2008), with an estimated 85% of Brazilian coffee farmers holding and cultivating less than 50 ha (IBGE 2006, 2013). These smallholders face difficulties accessing the financial resources to undergo certification (Nordlund and Egelyng 2008; Pinto et al. 2014). Despite Brazil’s economic leadership in agricultural commodities production (Alexandratos and Bruinsma 2012; Lapola et al. 2014), sustainability issues such as deforestation, pesticide use, hazardous work, and poor labour conditions have been common challenges for Brazilian agriculture (Kruger 2007; Nepstad et al. 2009; Martinelli et al. 2010).

The SAN standard addresses these challenges in three ways: through criteria that prescribe improved (1) environmental and (2) social performance, and (3) through management criteria that outline what is needed to adequately plan for and monitor improved socio-environmental performance. At the core of the management criteria is the requirement for a farm management system that integrates the agronomic, operational, environmental, and social dimensions of production (SAN 2011a).

At the landscape level, concerns have been raised that the high costs of certification favour large producers, and that insufficient numbers of resource-poor smallholders have been certified across both farm and forest landscapes (e.g. Renard 2005; Raynolds et al. 2007; McDermott 2013; Handschuch et al. 2013; Nelson and Phillips 2018). While the focus of the SAN standard is on improving on-farm management, its broader mission is to promote sustainability at a larger, landscape level.

The SAN developed the concept of “group certification” in 2008 as a means to increase smallholder participation through economies of scale (Milder et al. 2010; McDermott 2013; Hidayat et al. 2015; Kissinger et al. 2014). A study of SAN/RA-certified coffee in Brazil indicated that group certification is at least partially fulfilling this goal of increasing smallholder access to certification since more small- and medium-scale producers receive group certification than individual certification (Pinto et al. 2014). Indeed, in 2013, certified coffee groups in Brazil were responsible for almost half of the volume of SAN/RA-certified coffee (Pinto 2014).

Group certification requires, firstly, that farmers organise themselves into a formal and legally recognised producer group, such as a cooperative (ISEAL 2008). Group certification functions as an internal quality system (Fouilleux and Loconto 2017), whereby a designated group leader prepares the group for a later external audit on the group’s overall levels of compliance. The group is responsible for ensuring that all of its members comply with the criteria, which makes for a more challenging auditing process, burdening a group’s leaders with significant responsibilities (Mutersbaugh 2002; Winters et al. 2015). Every year, a farm group must be audited for at least 2 days by at least two auditors (SAN 2011a, v. 3).

Apart from the lack of financial resources, another core issue relevant to certification equity is the difficulty smallholders experience in complying with procedural requirements (Durst et al. 2006; Alemagi et al. 2012; Bakker 2014), which we refer to in this paper as management requirements. Typically these resource-poor producers do not have adequate means to adapt to new management requirements (Plouffe et al. 2011), i.e. requirements that focus on plans and procedures to achieve or verify desired outcomes, but do not directly deliver substantive sustainability outcomes (Cashore 1997; Blowfield 1999; Elliott 2000; McDermott et al. 2010). Thus, from the perspectives of social equity and development outcomes, it is critical to consider which certification criteria are associated with achieving high social performance (the social outcomes that must be achieved for certification, such as good community relations, responsible labour practices, health, education), and how this may vary by farm type and size.

1.3 Focus of this paper

This paper returns to certified coffee’s social justice origins by focusing on the SAN standard used to certify coffee farms in Brazil.

Many agricultural certification standards, such as FairTrade, Utz, and Organic, have converged by combining what we label “management” (procedural) and “social” (performance) criteria (Turcotte et al. 2014). The SAN/RA standard contains comprehensive procedural and performance requirements based on social issues (Potts et al. 2014, Table 3.8). In this study we examine compliance with management criteria and social performance criteria, and the relationship between the two, in order to gain insight into some of the challenges associated with certification compliance and attainment, particularly from a social equity perspective. (We study management requirements set out in the written SAN standard, and not those associated with the process of certification auditing or verification; consideration of such process requirements is beyond the scope of this study and merits additional research.)

To gain insight into the roles of procedural- and performance-based standards (Blowfield 1999; Cashore et al. 2004; Gulbrandsen 2005, 2010; McDermott et al. 2008) in the context of achieving social equity through certification, we investigate which are the most prevalent non-compliances with SAN/RA’s procedural management and social performance criteria and assess whether compliance with social performance criteria appears to increase with compliance with management criteria. This analysis of non-compliance builds upon a number of previous studies, primarily in the forest sector, which found that, by and large, certified companies have had to make more changes to their documentation and monitoring than to on-the-ground practices (Gulbrandsen 2005; Newsom et al. 2006; Auld et al. 2008, Sect. 3.3). Similarly, Rametsteiner and Simula (2003, p. 95) commented that improvements were mainly with “internal auditing and monitoring”. However, to date, little research has been done to assess the correlation between certification’s management requirements and socio-environmental performance.

In the literature there is a dearth of analyses of audits, with a few exceptions (e.g. Locke et al. 2007; Distelhorst et al. 2015; Toffel et al. 2015). Our study is based exclusively on audit data and so helps to fill this gap; to our knowledge, this is the first study to analyse such a large certification dataset for any agricultural commodity. Furthermore, this study takes a landscape-level approach by assessing all Brazilian coffee farms that sought SAN/RA certification from 2006 to 2014. Our sample, therefore, provides a complete picture of compliance across all farm sizes and types. To explore landscape-level equity, we compare the compliance results for individual and group certified farms. We also assess whether compliance is related to the size of an individual farm, i.e. whether larger farms comply less or more, on average, than smaller farms. Lastly, we study trends in compliance over time to understand more about farms’ performance with each annual audit, in part because the standard becomes more demanding over time with the addition of new rules and new critical criteria.

2 Materials and methods

Evidence as to whether certification achieves its intended sustainability objectives is mixed (Blackman and Rivera 2011; Loconto and Dankers 2014; van Rijsbergen et al. 2016) and lacking in rigour (Oya et al. 2017; DeFries et al. 2017; Nelson and Phillips 2018). The lack of large-scale assessments of sustainability certification’s impacts could be impeding improvements to the standards (Milder et al. 2012; Milder et al. 2015) and could be contributing to fragmented policy messages (Newton et al. 2013). Therefore, we propose and develop a systematic and scalable approach to evaluate the impacts of the SAN standard. Better evidence on impacts, and across longer time frames, could position certification standards as “powerful adaptive management” frameworks which could support sustainability efforts more widely (Milder et al. 2015, p. 312).

Our analysis, at its core, is based on the issuance of non-compliances during audits. Non-compliances are issued by accredited auditors of the certifying body in the course of their certification assessments to indicate non-compliance with the SAN standards’ criteria and to define the steps that certified operations must take to achieve future compliance. Non-compliances may be “major” or “minor” depending on their significance. An analysis of non-compliances can be used to compare performance across criteria and between producer types. Since certified operations are audited annually, an analysis of major and minor non-compliances can also be used to assess reported changes in performance over time. Farms, whether certified as groups or as individuals, need to achieve a minimum of 80% compliance with the SAN criteria, for each annual audit (SAN 2011a, b). Farms that achieve at least 80% compliance—and moreover, comply with every so-called critical criterion (see “Appendix”)—are awarded the Rainforest Alliance Certified™ seal.
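The two conditions for the seal described above can be written as a short decision rule. The following Python sketch is illustrative only: the function name, the argument names, and the criterion labels are assumptions, and the rule encodes only the two conditions stated in this section, not the full SAN scoring system.

```python
def earns_seal(overall_compliance_pct, critical_results, threshold_pct=80.0):
    """Return True if an audit meets both conditions described in the text.

    overall_compliance_pct: overall compliance with the SAN criteria, in percent.
    critical_results: hypothetical mapping {criterion_label: bool}, True when
        the farm complies with that critical criterion.
    """
    meets_threshold = overall_compliance_pct >= threshold_pct
    meets_critical = all(critical_results.values())
    return meets_threshold and meets_critical


# Made-up audits; "critical A" and "critical B" are placeholder labels.
print(earns_seal(86.0, {"critical A": True, "critical B": True}))   # True
print(earns_seal(86.0, {"critical A": True, "critical B": False}))  # False
print(earns_seal(79.5, {"critical A": True, "critical B": True}))   # False
```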

All the audit data used in this study were provided by IMAFLORA. Until December 2014, IMAFLORA was the only certification body in Brazil accredited by the SAN and RA to verify that farms comply with SAN standards; therefore, our data cover every Brazilian coffee farm that sought SAN/RA certification from 2006 to 2014, inclusive. IMAFLORA has recorded all the non-compliances on the farms they have audited from 2003 until the end of 2014. These audit results have been complete since 2006. (The audit results from 2003 to 2005 were not recorded systematically.) This complete recording of farm non-compliances across 9 years represented a unique opportunity for a rigorous and quantitative analysis of the non-compliances, including a time-series analysis of non-compliance trends.

2.1 Classifying social and management criteria

We classified the management and social performance criteria of the SAN standard in consultation with practising auditors of the SAN standard at IMAFLORA Brazil. Management criteria are those which require farmers to adhere to procedures, usually by way of implementing management systems, planning, and record-keeping. For a farm to prepare for a SAN/RA audit, it needs to adopt procedures and protocols that integrate the agronomic, operational, environmental, and social dimensions of production (SAN 2011a). To comply with the certification criteria, many farms need to initiate a new system for measuring and monitoring their performance. The management criteria, listed using the SAN two-tier numbering system, are the following: 1.1; 1.2; 1.3; 1.4; 1.5; 1.6; 1.7; 1.8; 1.9; 2.1; 2.9; 4.1; 4.6; 5.1; 5.4; 6.1; 6.2; 6.18; 7.2; 8.1; 9.1; 9.2; 10.1; refer to “Appendix” and/or SAN (2011a) for more details on these criteria. These management criteria comprise most of the criteria in Principle 1 of the SAN standard (SAN’s management system), as well as those criteria related to management across SAN’s other nine principles.

Social performance-based criteria are those which address desired social outcomes, such as livelihood guarantees, fair labour practices, community relations, health, potable water, education, and raising environmental awareness. In tandem with auditing practitioners, we grouped all the SAN criteria that pertained to socially sustainable outcomes. The SAN standard is designed to improve social outcomes on farms with more than one-third of SAN’s criteria being social in nature. There is broad agreement among actors that the social criteria with which farms must comply in order to achieve the SAN/RA certification are stringent (Giovannucci and Ponte 2005; Raynolds et al. 2007; Newton et al. 2013; Englund and Berndes 2015). SAN’s social performance criteria are as follows: 5.2; 5.3; 5.5; 5.6; 5.7; 5.8; 5.10; 5.11; 5.12; 5.13; 5.14; 5.15; 5.16; 5.17; 5.18; 6.3; 6.4; 6.5; 6.6; 6.7; 6.8; 6.9; 6.10; 6.11; 6.12; 6.13; 6.14; 6.15; 6.16; 6.17; 6.19; 7.1; 7.3; 7.4; 7.5; 7.6; as before, more details on these criteria may be found in “Appendix” and/or SAN (2011a). These social performance criteria comprise most of the criteria of principles 5, 6, and 7 (referring to labour, occupational health and safety, and wider community relations, respectively) except those previously classified as management.

2.2 Stratification by group and individual farms

Pinto et al.’s (2014) study of SAN/RA-certified coffee in Brazil found that more small-scale producers receive group certification than individual certification. Since group certification was developed by SAN to increase smallholders’ participation in certification, we stratify our analysis to distinguish between individual- and group-certified farms. We then consider the findings on compliance with social criteria for group certifications as particularly relevant to smallholders.

Beyond the binary farm profile category of whether a farm was a group farm or an individual farm, and building on Pinto et al.’s (2014) study, we also wanted to know how diverse farm sizes could increase social equity across Brazil’s SAN-certified coffee landscape. Therefore, we conducted another, supplementary analysis on farm size. Three area variables (total area, production area, and protected area) were recorded in the IMAFLORA data. Protected area, as defined by the SAN standard and also by Brazil’s Forest Code, is the land that is under protection to conserve biodiversity or environmental services; production area is the land that is used for the cultivation of coffee. We only performed this analysis for individual farms, since the interpretation of recorded areas for farm groups was not straightforward: for example, “500 ha” could have been the result of ten farms of 50 ha each, or two farms of 100 ha and ten farms of around 30 ha, etc., allowing infinitely many different combinations. With farm groups, the areas of individual member farms were not recorded in the audit data. Nevertheless, it is worth noting, especially given our later discussion of the role of group certification in landscape-level equity, that certified groups of farms are known to span diverse area distributions (e.g. a mix of smaller and larger farms vs. many similar-sized farms) and to range in number from as few as four member farms to more than forty member farms (Pinto et al. 2014, Table 2).

2.3 Data and analysis

We wrote a series of computer scripts using the MATLAB programming language to process the available data and compute the following, for both individual farms and farm groups:

  1. for each audit (one farm or farm group, 1 year), a “social non-compliance score”; and

  2. for each audit (one farm or farm group, 1 year), a “management non-compliance score”.

In Sect. 2.1 we listed the criteria we deemed (based on consultations with SAN standard auditors) to fall on the axes of (1) management and (2) social performance. In order to attach more weight to major non-compliances, and in adherence to SAN policy (SAN 2011a, v. 3), we penalised each farm 1 point for each major non-compliance and only 0.5 points for each minor non-compliance. Thus, for example, a farm found to have three major and two minor social non-compliances would end up with an aggregated social non-compliance score of 4 points, while a farm found to have five minor social non-compliances would end up with an aggregated social non-compliance score of 2.5 points (2.5 points being “better” than 4 in the sense that it encapsulates lower overall non-compliance). The relative weights attached to major and minor non-compliances (in a ratio of 2:1) are mathematically consistent with the ranges chosen to represent major versus minor (0–49% vs. 50–99%) non-compliances. This weighting is also the same one used by SAN auditors to calculate their aggregated performance scores that span all possible criteria from the audit.
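The scoring rule can be illustrated with a minimal Python sketch (the study itself used MATLAB scripts, which are not reproduced here). The criterion sets are copied from Sect. 2.1; the function name, the record layout, and the example findings are assumptions made purely for illustration.

```python
# Criterion classification copied from Sect. 2.1 (SAN two-tier numbering).
MANAGEMENT_CRITERIA = frozenset(
    "1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.1 2.9 4.1 4.6 5.1 5.4 "
    "6.1 6.2 6.18 7.2 8.1 9.1 9.2 10.1".split()
)
SOCIAL_CRITERIA = frozenset(
    "5.2 5.3 5.5 5.6 5.7 5.8 5.10 5.11 5.12 5.13 5.14 5.15 5.16 5.17 5.18 "
    "6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12 6.13 6.14 6.15 6.16 6.17 "
    "6.19 7.1 7.3 7.4 7.5 7.6".split()
)

# 1 point per major and 0.5 points per minor non-compliance, as in the text.
WEIGHTS = {"major": 1.0, "minor": 0.5}


def audit_scores(findings):
    """Return (social, management) weighted non-compliance scores for one audit.

    findings: hypothetical list of (criterion, severity) pairs for one audit.
    Criteria in neither set (e.g. purely environmental ones) are ignored.
    """
    social = management = 0.0
    for criterion, severity in findings:
        weight = WEIGHTS[severity]
        if criterion in SOCIAL_CRITERIA:
            social += weight
        elif criterion in MANAGEMENT_CRITERIA:
            management += weight
    return social, management


# Made-up audit: three major and two minor social findings, one minor
# management finding, and one finding outside both sets.
example = [
    ("5.2", "major"), ("5.5", "major"), ("6.3", "major"),
    ("6.13", "minor"), ("7.1", "minor"),
    ("1.1", "minor"),
    ("3.1", "major"),
]
print(audit_scores(example))  # (4.0, 0.5)
```

The social score of 4 points reproduces the worked example in the text (three major and two minor social non-compliances).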

Thus, we ended up with:

  • 367 social and 367 management scores for individual farms (from 9 years’ worth of data, for 80 farms)

  • 68 social and 68 management scores for farm groups (from 6 years’ worth of data, for 23 groups)

We then wrote further MATLAB scripts to process these new data, in conjunction with various other data available directly in the auditors’ spreadsheets, in order to answer a number of research questions. These research questions are summarised in Table 1, along with brief notes on how we addressed the questions computationally.

Table 1 Specific questions, grouped according to broader themes, which we sought to address with the data available to us; and a brief summary of how these questions were addressed computationally

3 Results

3.1 Non-compliance frequency

3.1.1 Most frequent non-compliances

The most prevalent social non-compliances, analysed separately for individual farms and farm groups, are presented in Tables 2 and 3, respectively. The most prevalent management non-compliances, also analysed separately for individual farms and farm groups, are presented in Tables 4 and 5, respectively.

Table 2 The five most frequent social non-compliances for individual farms, from 2006 to 2014
Table 3 The five most frequent social non-compliances for group farms, from 2008 to 2014
Table 4 The five most frequent management non-compliances for individual farms, from 2006 to 2014
Table 5 The five most frequent management non-compliances for group farms, from 2008 to 2014

The “weighted occurrence” listed in each of these four tables is computed as follows:

$$ \text{Weighted occurrence} = 1 \times (\#\ \text{of major non-compliances}) + \frac{1}{2} \times (\#\ \text{of minor non-compliances}). $$

Thus, as with all other analyses where we combine information on major (0–49%) and minor (50–99%) non-compliances, we attach twice as much weight to a major non-compliance as we do to a minor non-compliance. So, for example, four minor non-compliances would contribute the same to the total weighted occurrence as two major non-compliances.

The “relative frequency” listed in each of these four tables is computed as follows:

$$ \text{Relative frequency for one criterion's non-compliance}\ (\%) = \frac{\text{Weighted occurrence (for that criterion)}}{\text{Highest weighted occurrence (for all criteria)}} \times 100\%. $$

Thus, if the criterion with the highest weighted occurrence (i.e. most non-compliance) in a given table has a weighted occurrence of 130, its relative frequency will be 100%; and all other criteria will be weighted relative to that one, so that a criterion with a weighted occurrence of 65, say, will have a relative frequency of 50%. We present these relative frequencies for ease of comparing non-compliances across tables, even when different numbers of audits are involved (since, for example, there are far more data spanning more years for individual farms than for farm groups).
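As a hedged illustration of the two definitions above, the following Python sketch computes the weighted occurrence and relative frequency for a set of criteria. The function name, the input layout, and the counts are assumptions; the counts are made up so that the output matches the worked example (130 and 65) rather than any value from the audit data.

```python
def relative_frequencies(counts):
    """Map each criterion to (weighted occurrence, relative frequency in %).

    counts: hypothetical mapping {criterion_label: (n_major, n_minor)}.
    """
    weighted = {
        criterion: 1.0 * n_major + 0.5 * n_minor
        for criterion, (n_major, n_minor) in counts.items()
    }
    highest = max(weighted.values(), default=0.0)
    if highest == 0:
        # No non-compliances at all: every relative frequency is zero.
        return {criterion: (0.0, 0.0) for criterion in weighted}
    return {
        criterion: (score, 100.0 * score / highest)
        for criterion, score in weighted.items()
    }


# Made-up counts chosen to reproduce the worked example in the text.
table = relative_frequencies({"criterion A": (100, 60), "criterion B": (50, 30)})
print(table["criterion A"])  # (130.0, 100.0)
print(table["criterion B"])  # (65.0, 50.0)
```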

3.1.2 Average compliance: individual versus farm groups

In Table 6 we present weighted occurrence scores (as defined in Sect. 3.1.1) for management and social non-compliances, averaged over all farms, though with the averaging performed separately for individual farms and group farms.

Table 6 Weighted occurrence scores (as defined in Sect. 3.1.1) for management and social non-compliances, averaged over all farms, though with the averaging performed separately for individual farms and group farms

The data may be interpreted as follows. Based on the SAN-RA criteria chosen for our analysis (as per Sect. 2.1 and detailed in “Appendix”), the maximum possible weighted social non-compliance score for any audit was 21.5; the corresponding weighted maximum for management non-compliance was 12.5. Each individual farm had a weighted average of 2.93 management non-compliances flagged per audit (where a minor non-compliance counts 0.5 points vs. 1 point for a major non-compliance), while farm groups had an average of 3.29 management non-compliances flagged per audit. Similarly, each individual farm had, on average, 2.14 social non-compliances flagged per audit, while farm groups had an average of 2.21 social non-compliances flagged per audit.

Superficially, then, it would appear that farm groups perform marginally worse in terms of both social and management compliance, when compared to individual farms. However, it is important to perform statistical tests to ascertain whether the difference between these mean scores is statistically significant.

An unpaired two-sample t test indicated, however, that there was no statistically significant difference in the levels of either management (p = 0.20) or social (p = 0.76) compliance; note that the p values we quote here and hereafter are all two-tailed p values. This means that the observed data are consistent with a null hypothesis of equal compliance between individual and group farms.
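The comparison of mean scores can be sketched in Python as follows. This sketch assumes the pooled-variance (equal-variance) form of the unpaired t test, which the text does not specify, and it approximates the two-tailed p value with the standard normal distribution, which is reasonable only for large samples such as the several hundred audits considered here. The function name and the sample values are illustrative and are not the audit data.

```python
import math
from statistics import mean, variance


def pooled_t_test(sample_a, sample_b):
    """Unpaired two-sample t test with pooled variance (assumed form).

    Returns (t statistic, degrees of freedom, approximate two-tailed p value).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t_stat = (mean(sample_a) - mean(sample_b)) / std_err
    # Large-sample approximation: with many degrees of freedom the t
    # distribution is close to the standard normal, so the normal tail is used.
    p_two_tailed = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, df, p_two_tailed


# Tiny made-up samples, used only to show the call; with so few points the
# normal approximation to the p value is rough.
t_stat, df, p_value = pooled_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, df)
```

For small samples an exact p value would require the cumulative distribution function of the t distribution, which the Python standard library does not provide.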

3.2 Social and management correlations

Having looked at social and management non-compliances separately, we next turned to the question of how the two might be related. In other words, did farms that performed better in terms of social performance compliance also fare better in terms of management compliance? To answer this question, we calculated the coefficients of (linear) correlation between weighted social non-compliance scores and weighted management non-compliance scores. As for all our analyses, we performed this calculation for both the individual farms’ and farm groups’ datasets. We also performed this correlation analysis using two different approaches, in which:

  1. we treated the result from one audit (for one farm, in 1 year) as one data point in our correlation analysis; and

  2. we first averaged all audits over all years for a given farm, to form a single data point in the analysis.

Approach (1) had the advantage of giving us more data points in the correlation analysis, whereas approach (2) had the advantage of aggregating all available data for a given farm, which meant we could “average out” possible fluctuations due to, for example, one farm having different auditors from year to year.
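The difference between the two approaches can be made concrete with a small Python sketch. The audit records below are made up for illustration, and the record layout and variable names are assumptions; only the Pearson formula and the two ways of forming data points follow the description above.

```python
import math
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical audit records: (farm_id, year, management_score, social_score).
audits = [
    ("farm_a", 2010, 4.0, 3.0),
    ("farm_a", 2011, 2.0, 2.0),
    ("farm_b", 2010, 1.0, 0.5),
    ("farm_b", 2011, 1.0, 1.5),
    ("farm_c", 2011, 5.0, 3.5),
]

# Approach (1): every audit is one data point.
r_per_audit = pearson([a[2] for a in audits], [a[3] for a in audits])

# Approach (2): average each farm's audits first, then correlate the means.
by_farm = defaultdict(list)
for farm_id, _year, management, social in audits:
    by_farm[farm_id].append((management, social))
farm_management = [mean([m for m, _ in rows]) for rows in by_farm.values()]
farm_social = [mean([s for _, s in rows]) for rows in by_farm.values()]
r_per_farm = pearson(farm_management, farm_social)

print(round(r_per_audit, 3), round(r_per_farm, 3))
```

In this toy dataset, averaging per farm removes the year-to-year scatter within each farm, which is the motivation given for approach (2).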

In Fig. 1 we illustrate the results of the analysis using approach (1); we plot the results of 367 audits for individual farms and 68 audits for farm groups. In both the individual and group cases, we see that there is a positive linear correlation between social non-compliance and management non-compliance. This means that, on average, audits with better management compliance also showed better social compliance; conversely, where management compliance was worse, social compliance was generally also worse. The coefficients of linear correlation (Pearson product–moment correlation coefficients) we calculated were 58% in the case of individual farms and 65% in the case of farm groups. These correlations were found to be highly statistically significant (p ≪ 0.001) in both cases.

Fig. 1 Management versus social non-compliance scores, for both individual farms (left panel) and group farms (right panel), where each dot represents a single audit in a single year. The solid lines indicate best-fit linear models, while the dotted lines indicate 90% confidence intervals, such that approximately 90% of data points lie within these bounds. The colours combine social and management compliance and are included here for easy visual identification of, for example, regions of the plots corresponding to the most compliant (blue dots) and less compliant (red dots) audit results

In Fig. 2 we illustrate the results of the analysis using approach (2); we plot the aggregated scores for 80 individual farms and 23 farm groups. Again, in both the individual and group cases, we see that there is a positive linear correlation between social compliance and management compliance. The coefficients of linear correlation we calculated were 65% in the case of individual farms and 81% in the case of farm groups. Again, these correlations were found to be highly statistically significant (p ≪ 0.001) in both cases.

Fig. 2 Management versus social non-compliance scores, for both individual farms (left panel) and group farms (right panel), where one dot represents the aggregated data (over all years) for one farm, or farm group. The solid lines indicate best-fit linear models, while the dotted lines indicate 90% confidence intervals such that approximately 90% of data points lie within these bounds. The colouring is as per Fig. 1

The apparent increase in the strength of the correlation between management and social compliance when we aggregated all available audit data for a single farm or farm group supported our hypothesis that approach (2) allowed us to iron out “noise” in the audit data that might be attributable to auditor subjectivity (e.g. in assigning whether a non-compliance was major or minor), different auditors being used from year to year, etc. In any case, our analysis revealed that higher management compliance and higher social compliance generally go hand in hand.

It is worth noting that although the social–management correlation appears superficially to be stronger for group farms than for individual farms, statistical testing (using the Fisher r-to-z transformation) of the correlation coefficients obtained with approaches (1) and (2) did not provide strong evidence of a difference between the social–management correlation for individual farms and that for farm groups: we calculated p = 0.36 for approach (1) and p = 0.10 for approach (2). In other words, given our data, there is no strong evidence to suggest that the social–management correlation (“better management compliance is associated with better social compliance”) is different for individual farms than for farm groups.
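For readers unfamiliar with the Fisher r-to-z comparison of two independent correlation coefficients, a minimal Python sketch is given below. The function name is an assumption and the inputs are arbitrary illustrative values, not the coefficients or sample sizes of this study, so the output is not meant to reproduce the p values quoted above.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two independent correlation coefficients.

    Returns (z statistic, two-tailed p value).
    """
    # Fisher transformation: atanh(r) is approximately normal with
    # variance 1 / (n - 3).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    std_err = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / std_err
    p_two_tailed = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_two_tailed


# Illustrative inputs only.
z_stat, p_value = compare_correlations(0.30, 103, 0.60, 103)
print(round(z_stat, 2), round(p_value, 4))
```

A p value above the chosen significance level means that the difference between the two coefficients is within what sampling variation alone could produce.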

3.3 Farm size versus compliance

In the previous section, we presented the results of an analysis where we correlated social and management compliance scores with each other. We present now a similar analysis, in which we correlated the data we had available concerning individual farm areas with other available variables. We only performed this analysis for individual farms, due to lack of comparable, quality data on the size of each farm seeking certification in a group.

We had three area variables (total area, production area, and protected area) to correlate with three other variables (aggregate performance scores, social non-compliance scores, and management non-compliance scores, where the latter two are the same scores that appear in Figs. 1 and 2), giving us a total of nine correlation coefficients of interest. (Note that all the area variables will be correlated with each other as well, since production area and protected area add together to give total area, but these correlations are trivial and of little interest to this study.)

We found that the production area variable was in all cases more strongly correlated with the non-area variables than was either total area or protected area; as we discuss in Sect. 4.3, this finding is consistent with the way in which farmers must adhere to Brazil’s national Forest Code, as well as the agronomic fact that coffee is grown at higher altitudes, where some of the land is steep and thus not suitable for coffee cultivation. Accordingly, we present only the correlation coefficients for production area correlated with the non-area variables. These results are presented in Table 7.

Table 7 Pearson correlation coefficients between production area, management non-compliance score, and social non-compliance score

The data in Table 7 may be interpreted as follows. There exists a statistically significant negative correlation between production area and social non-compliance score. Thus, larger farms (that is farms with larger production areas) comply—on average—slightly more in terms of social performance criteria. Equivalently, smaller farms comply—on average—slightly less in terms of social performance indicators. There is also some evidence to suggest that larger farms may comply more in terms of management criteria, although this correlation between farm size and management compliance is not statistically significant (a larger sample size might in future permit more robust conclusions about the apparent correlation).

3.4 Changes over time

For the 80 individual farms in our study, the median year of first audit was 2009, and our dataset contained an average of 4.6 years of audits per farm. Only four out of the 80 individual farms had any gaps in their audits, i.e. a year between the first recorded audit and the last recorded audit in which the farm was not audited; and in all four cases, there was only a single gap of 1 year without audit. The remaining 76 individual farms were audited every single year after their year of first audit.

For the 23 farm groups in our study, the median year of first audit was 2011, and the dataset contained an average of 3.0 years of audits per farm group. Three out of the 23 farm groups had a gap of a single year in their audits; the remaining 19 farm groups were audited every single year after their year of first audit.

In order to study whether farms’ social and management compliance generally improved or deteriorated from year to year, we computed the difference in social and management compliance scores between consecutive audits. To take into account the fact that audits very occasionally did not happen in consecutive years, we normalised the difference by the number of years between the audits:

$$ \text{Improvement}\,(\text{year } X \to \text{year } Y) = \frac{\text{compliance}(\text{year } Y) - \text{compliance}(\text{year } X)}{Y - X}. $$

Thus, for example, if a farm’s social compliance (measured by weighted occurrence of major and minor non-compliances) improved by 3 points between 2006 and 2007, the improvement would be computed as 3 points/year. If another farm’s management compliance decreased by 4 points between 2012 and 2014 (with no data being available for 2013), the “improvement” would be computed as − 2 points/year, with the negative sign indicating a deterioration rather than improvement.
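
The two worked examples above can be checked with a short Python sketch of the normalised improvement formula. The function name `improvement_per_year` and the starting scores (10 and 20) are hypothetical; only the differences between scores matter, and the sketch assumes a compliance score on which higher values mean better compliance.

```python
def improvement_per_year(year_x, score_x, year_y, score_y):
    """Change in compliance between two audits, normalised by the years elapsed."""
    if year_y <= year_x:
        raise ValueError("year_y must be later than year_x")
    return (score_y - score_x) / (year_y - year_x)

# Consecutive audits: a 3-point gain between 2006 and 2007.
print(improvement_per_year(2006, 10, 2007, 13))  # 3.0

# One-year gap: a 4-point loss between 2012 and 2014 (no audit in 2013).
print(improvement_per_year(2012, 20, 2014, 16))  # -2.0
```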

We plot the results of this analysis in Fig. 3. For both social and management compliance, and for both individual farms and farm groups, it is apparent that both improvements and deteriorations in compliance are observed. On the whole, though, the average change is positive, i.e. on average, farms became more compliant (on both social and management levels) over time.

Fig. 3 Histogram of improvements from year to year, where we take one audit in one year to represent one data point (as in Fig. 1). Clockwise from top left, the average improvements are: + 0.25/year, + 0.19/year, + 0.71/year, and + 0.74/year

For audits of individual farms, the average improvements were + 0.25/year and + 0.19/year for social and management compliance, respectively. For farm groups, these averages were + 0.71/year and + 0.74/year, respectively. However, the first two improvements were not statistically significantly different from 0, while the latter two were significant only at the 10% level (p < 0.1). In future, analysis of more data should help us draw more definitive conclusions.

As in Sect. 3.2, we also performed an analysis where we aggregated all the years of available data for a given farm or farm group, such that a single data point in our analysis was taken to be the average of all the yearly improvements for that farm or farm group. As before, adopting this approach allowed us to iron out “noise” in the audit data that might be attributable to auditor subjectivity, as well as to make a more straightforward interpretation about whether a single entity (farm or farm group) was likely to improve or deteriorate over time.
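
The aggregation described above can be sketched in Python as follows. This is an illustrative sketch only: the record layout `(farm_id, year, score)`, the function names, and the data are hypothetical, and the sketch assumes at most one audit per entity per year.

```python
from collections import defaultdict

def yearly_improvements(audits):
    """Normalised improvements between consecutive audits of one entity.

    audits: list of (year, score) tuples, in any order.
    """
    ordered = sorted(audits)
    return [
        (s2 - s1) / (y2 - y1)
        for (y1, s1), (y2, s2) in zip(ordered, ordered[1:])
    ]

def mean_improvement_per_farm(records):
    """Map each farm (or farm group) to the mean of its yearly improvements."""
    by_farm = defaultdict(list)
    for farm_id, year, score in records:
        by_farm[farm_id].append((year, score))
    result = {}
    for farm_id, audits in by_farm.items():
        steps = yearly_improvements(audits)
        if steps:  # an entity with a single audit yields no data point
            result[farm_id] = sum(steps) / len(steps)
    return result

# Hypothetical audit records (NOT from the study).
records = [
    ("farm_A", 2009, 80), ("farm_A", 2010, 83), ("farm_A", 2012, 81),
    ("farm_B", 2011, 70), ("farm_B", 2012, 74),
    ("farm_C", 2013, 90),
]
averages = mean_improvement_per_farm(records)
print(averages)  # {'farm_A': 1.0, 'farm_B': 4.0}
```

Each entity contributes exactly one value, so entities with many audits do not dominate the histogram.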

The results of this analysis are presented in Fig. 4. For individual farms, the average yearly improvements were + 0.36/year and + 0.23/year, for social and management compliance, respectively. For farm groups, these averages were + 0.87/year and + 1.16/year, respectively. Only the latter two improvements were found to be statistically significantly different from 0 (at the 10% significance level). Once again, the availability of more data might help us draw more definitive conclusions in future.

Fig. 4 Histogram of improvements from year to year, where we take the aggregated year-to-year changes for a single farm or farm group to represent one data point (as in Fig. 2). Clockwise from top left, the average improvements are: + 0.36/year, + 0.23/year, + 0.87/year, and + 1.16/year

Because all of these averages are positive, the interpretation is that, on balance, farms and farm groups improved over time in terms of both social and management compliance (although some farms and farm groups deteriorated and some remained more or less unchanged over time). However, given that some of the improvements were not statistically distinguishable from zero, a more cautious interpretation would be that farm compliance did not decline over time.

It is worth noting that the average year-on-year changes were always higher (by a factor of a few) for farm groups than for individual farms and that the individual farms’ improvements were not even statistically significant. For example, the average farm group improved management compliance by 1.16 points/year, whereas this average was only 0.23 points/year for individual farms. This finding can be combined with the findings in Sect. 3.1: though farm groups generally start off at a “lower baseline” (lower social compliance, lower management compliance) than individual farms, their compliance appears to improve more rapidly than individual farms’ compliance. We discuss possible reasons for this phenomenon in Sect. 4.4.

Finally, we mention briefly that as an alternative to the above analysis, we ran a linear regression analysis, i.e. we fitted management and social compliance scores as a linear function of time (year), for both individual and group farms, using ordinary least squares estimators for the slope and intercept parameters in the fitted models. Consistent with the above findings, we found that the slope (corresponding to average change in scores over time) had a positive value significantly different from 0 (at the 10% level) only in the case of group farms, for both management and social compliance. The fitted slopes for the individual farms were positive, but not significantly different from 0.
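
For readers unfamiliar with the ordinary least squares estimators mentioned above, a minimal Python sketch follows. The function name `ols_fit` and the data are hypothetical and are not drawn from the audit dataset; the sketch computes only the point estimates and omits the significance test on the slope.

```python
def ols_fit(years, scores):
    """Ordinary least squares fit of score = intercept + slope * year."""
    n = len(years)
    if n != len(scores) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(years) / n
    mean_s = sum(scores) / n
    # Closed-form OLS estimators for simple linear regression.
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(years, scores))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_t
    return slope, intercept

# Hypothetical compliance scores for five audit years (NOT from the study).
years = [2009, 2010, 2011, 2012, 2013]
scores = [70.0, 72.0, 71.0, 74.0, 75.0]

slope, intercept = ols_fit(years, scores)
print(f"slope = {slope:.2f} points/year")  # slope = 1.20 points/year
```

The fitted slope plays the same role as the average yearly improvement in the preceding analysis: a positive value indicates that scores rose over time on average.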

4 Discussion

4.1 Non-compliance frequency

The five most frequent management non-compliances were the same, and in the same order, for both individual- and group-certified coffee farms. These five non-compliances related to management systems, ecological pest control, continual improvement, socio-environmental systems, and labour payment. Given the implausibility of this shared ordering being a random coincidence across the two broad profile types, these non-compliances may point to areas in need of particular attention when assisting farms to prepare for initial certification, or indeed to improve their certification performance over time.

In terms of social performance-based criteria, four of the five criteria with which farms most frequently failed to comply were the same for group and individual farms. The most frequent non-compliance for both group and individual farms was with criterion 6.6, which focuses on human hygiene facilities and the provision of protective equipment. In Brazil, the laws on human hygiene (Codex Alimentarius 2003) and protective equipment in the workplace and on farms are very stringent (da Cunha et al. 2014; Rodrigues 2015, pers. comm.). Given that criterion 6.6 proved difficult to comply with, we suggest that the SAN scheme (via its auditors or information documentation) could offer technical assistance to farmers to help them meet this requirement.

The above similarities between group and individual farms for both management and social performance criteria were consistent with the findings of Pinto et al. (2014). These similarities between group and individual farms also demonstrate group certification’s social-levelling potential, since resource-poor farmers with smaller holdings performed very similarly to larger individual farms with larger revenues despite having to surmount financial, administrative, and other obstacles to undergo certification.

4.2 Social and management correlations

Over 9 years of audit data for the SAN/RA certificate, we found that there was a statistically significant, positive linear correlation between the two types of non-compliance scores for both individual and group farms. We found that farms that performed better in terms of social compliance also fared better in terms of management compliance, which suggests that management and social criteria could be mutually supportive. The positive correlation between social compliance and management compliance was stronger for group farms.

However, the existence of a correlation across a large sample of Brazilian coffee farms does not, by itself, imply a causal relationship. A causal relationship between good management and good social performance is certainly plausible; it would be compatible with established relationships between management and performance outcomes in other certified sectors (Samson and Terziovski 1999; Singels et al. 2001; Melnyk et al. 2003); and in the case of our data, the association (correlation) is strong. As such, a number of the Bradford Hill criteria for causality would be satisfied (Hill 1965). Nevertheless, further work is required to study more closely the temporality of possible cause and effect (cause should precede effect), something that would likely require a greater quantity and breadth of audit data than existed at the time of this study, as well as, importantly, to rule out other possible explanations for the correlation. Since the group and individual samples are unmatched, there may be one or more unobserved geographic or socio-economic variables that could explain the observed differences.

4.3 Farm size versus compliance

The stratification of producer types between producers that seek certification individually and those in groups is one clear dichotomy that reveals a difference at the farm level. Beyond this binary division of producer profiles, we examined how farm size affected compliance with the SAN standard. The size of production area was considered, since it is a reflection of the earnings from coffee and can highlight issues of socio-economic equity in certification. Rather than choosing another descriptive delineation like the number of members in a group, or the total area of a farm holding, we considered the size of the production area as a proxy for asymmetries and inequalities in resources. For context, Pinto et al.’s (2014) study of all SAN/RA-audited coffee farms in Brazil in 2011 showed that the majority of individually certified farms are large (> 450 ha, 73%). The same study showed, by contrast, that group-certified farms are mostly small (31–120 ha, 35%), medium (121–450 ha, 35%), or mini (< 30 ha, 15%).

We showed that there exists a statistically significant, negative correlation between production area and social non-compliance score. Thus, larger farms performed, on average, slightly better in terms of social compliance. Equivalently, smaller farms performed, on average, slightly worse in terms of social compliance. There was also some evidence to suggest that larger farms performed better in terms of management compliance, but more data would be required to determine whether this correlation is statistically significant. These results are consistent with the broader literature discussed in this article’s introduction, which finds that smallholders have greater difficulty in meeting certification requirements since compliance “requires significant capital investment” (Winters et al. 2015, p. 597). As before, we draw conclusions here only about correlations; pinpointing the cause(s) of the higher compliance observed for larger farms would require further research.

4.4 Changes over time

Beyond looking at the sustainability of operations at two time points, e.g. before and after certification’s auditing process (Newsom et al. 2006), our study analysed compliance at multiple time points across 9 years of auditing performance. We found year-on-year increases in compliance with certification standards. For SAN-certified Brazilian coffee, both individual farms and farm groups improved on average over time, in terms of both social and management compliance. These year-on-year improvements were set against a backdrop of the SAN certificate becoming harder to attain and maintain, with more critical criteria and new rules.

Certification standards change over time, as they respond to changes in the sector, incorporate new information, or sometimes, compete with other standards for legitimacy and predominance in the marketplace (Cashore et al. 2004). The SAN standard changed with new rules, new thematic issues, and new foci. Furthermore, farmers must adhere to a mandatory continuous improvement plan whereby all non-compliances must be eliminated within a two-year period, or within a period agreed with the certification team (IMAFLORA 2005, 2008). As such, one could infer that farms and farm groups who engage with the SAN/RA certification system become more engaged with sustainable agriculture methods.

Average improvements were consistently greater for farm groups than for individual farms. At the same time, farm groups started at a lower baseline (i.e. lower social compliance, lower management compliance) than individual farms. Group farms also undergo a more difficult audit process. Beyond having to comply with at least 80% of the SAN/RA criteria as individual farms must, groups must also comply with an additional “Group Certification Standard” and must appoint a group auditor to conduct internal audits (SAN 2011b). Despite these extra requirements, group farms’ compliance levels were no worse than those of individual farms.

Despite our cautiously optimistic appraisal of changes in compliance over time, we emphasise that the inferred improvements (or in the case of individual farms, lack of deterioration) were, in substantive terms, rather modest. Again, though, all improvements, whether by group or individually certified farms, were made within a system in which certification becomes more difficult to acquire year after year. The lack of deterioration over time is reassuring, especially since farms that undergo certification were generally already committed to environmental and social issues before the certification process (COSA 2014; Hardt et al. 2015). Improvement despite increasing difficulty would suggest that those farms that undergo the SAN audit learn about sustainable management of their farms as well as how to achieve (socially sustainable) outcomes.

The improvements over time also allow us to address some concerns regarding selection bias, raised, inter alia, by Blackman and Rivera (2011), Rueda and Lambin (2013), Tscharntke et al. (2015), Tayleur et al. (2017), and DeFries et al. (2017). Even if it happened to be true (though we have no data to support this hypothesis) that the individual farms applying for certification were already farming in a way where they were able to comply with certification’s sustainability demands, this would not in itself be an indictment of the certification scheme. On the contrary, our results show that certified farms (whether or not their farming practices were sustainable ab initio) actually, on average, improved at each audit while participating in the certification scheme, which is a positive and indeed unanticipated outcome. The results from the farm groups are even more noteworthy: they bear testimony to smallholders (who likely would not otherwise have had access to the certification process for financial or administrative reasons, or even due to illiteracy and innumeracy) starting from a relatively low compliance baseline and improving steadily over time, while benefiting from sustainable agriculture knowledge gained during certification. Also, since farmer uptake of the SAN/RA standard increases each year, Rainforest Alliance Certified™ coffee is no longer niche but mainstream (Potts et al. 2014; Tayleur et al. 2017), and landscape-level social equity is improving; with the increasing mainstreaming of SAN/RA certification, any concern that those electing for certification are a very narrow, unrepresentative sample of farmers becomes less tenable (see Footnote 2).

The changes in compliance across 9 years of Brazilian SAN coffee audits in our analyses are consistent with Bakker’s (2014) study on group certification, which found that certified smallholders and group farms perceived an improvement in their social well-being, particularly in the form of increased knowledge in agricultural practices, health, and occupational hazards.

The improvements in compliance with the SAN standard indicate sustainable outcomes as per the stipulation of each criterion. But there are also sustainable improvements that go beyond those stipulated in the standards. Locke et al. (2007) found that producers who underwent the audit process had to invest heavily in training for better labour conditions, which then had a positive effect on their staff and also on their suppliers. This wider impact of certification’s sustainable improvements is consistent with the Frenkel and Scott (2002) study on improved working conditions after adhering to a code of conduct.

5 Conclusions

5.1 Summary of findings

Environmental and social certification is expanding rapidly across numerous commodities, making it important to understand what certification delivers in terms of social sustainability and equity outcomes. This paper presented a quantitative analysis of a large dataset to chart the evolution of compliance with certification standards over time and over a large number of farm profiles.

In this paper’s analysis, we drew on the example of Brazil’s SAN-certified coffee sector to provide a comprehensive account of the role of management and social criteria in improving on-farm social sustainability. Although one might expect that the small-scale farms comprising farm groups could be disadvantaged (due to economies of scale), we found that when these small-scale farms were certified in groups, they were able to comply with SAN’s management and social performance requirements to become certified. Indeed, farms seeking certification as a group were found to exhibit levels of compliance with both management and social performance requirements on par with the compliance levels of individual farms. Our findings suggested that adopting certification’s management systems on farms contributes to positive social outcomes, and this was the case for diverse farm profiles (large, medium, and small farms; individual and group farms).

Strikingly, the most frequent non-compliances for management criteria were identical and in the same order of frequency for both group and individual farms. Moreover, the most common non-compliances for social performance criteria were also identical, albeit in slightly different order of frequency, for both group and individual farms. From these results, one could infer that the group certification tool has had some success in delivering social equity between these two farm profiles, not least because group certification provided a means for small-scale farms to access certification in the first place.

Our findings indicate that adopting certification’s management systems on farms is associated with positive social outcomes for diverse farmer profiles. Achieving landscape-level equity across one agricultural commodity requires that diverse profiles of farmers are able to manage their workers and farm operations to deliver social sustainability criteria as required by standards such as the SAN standard. Because the analysis explored diverse farmer profiles by aggregating on-farm audit data that spanned all the coffee farms that had sought SAN certification across Brazil (up to 2014), we were also able to make some inferences about social equity at the landscape level, particularly via the social-levelling tool of group certification.

Whereas some certification schemes are weighted heavily in favour of either procedural or performance criteria, the SAN standard combines both procedural (management) and performance (in this analysis, social performance) criteria. Given our findings regarding the correlation between procedural (management) and performance criteria, we conclude that certification standards that balance both types of criteria appear important in delivering sustainability outcomes and in helping smallholders meet certification standards. Similarly, other agricultural policies could be informed by SAN certification and be designed to foster management systems that are associated with improved social sustainability.

5.2 Limitations of this study

It is also critical to acknowledge the limitations of this study. For instance, our data came from a single auditing and certification mechanism—the SAN standard. However, we believe our findings are relevant to a significant segment of coffee producers worldwide since around one-third of global coffee beans are Brazilian (Potts et al. 2014) and also around one-third of global coffee is certified (ibid.). The relevance of the research we present extends beyond the coffee sector. There has been a proliferation of standards in many industries. Among agricultural products, coffee certification is the most widespread and most mature, thus offering an example for other sectors to emulate (Panhuysen and Pierrot 2014).

At several points we presented various statistically significant correlations; we emphasised, though, that while causal links are plausible between some of the variables in question (e.g. adherence to management criteria leading to improved social outcomes), further work is required to demonstrate causality. Similarly, where differences between individual and group farms were observed, we could not make any deductions about the causes of those differences: to do so would require further study of unmeasured, though possibly relevant, variables that could explain these differences. The common methods used to analyse unbalanced panel datasets (see, for example, Finkel 1995), popular though they may be, would still not have permitted unequivocal identification of causal relationships within our data.

Another key limitation of our analysis is that it was conducted through the lens of the SAN standard, focused on social criteria as per the definitions contained in that standard. There are some social issues the standard does not cover; for instance, the SAN standard does not require a living wage, a contentious and fiercely debated issue in Brazil’s labour and welfare arena. Likewise, the issue of women’s voice and participation in coffee cultivation is given limited coverage: only one SAN criterion currently addresses this in the form of a clause on non-discrimination (SAN 2011a, p. 24, criterion 5.2). On the other hand, given the relatively comprehensive scope and widely regarded stringency of the SAN standard’s criteria that are oriented towards social equity (Giovannucci and Ponte 2005; Raynolds et al. 2007; Newton et al. 2013; see also “Appendix”), along with our generally positive findings regarding compliance from a diversity of farms through many years of audits, we may infer that a degree of social equity has been achieved.

With these limitations in mind, the research we have presented here does nevertheless represent a novel analysis of a large certification dataset and lays foundations for future research. The analytical methods we used could be applied more generally to other certification datasets across a wide range of sectors.

5.3 Future work

One could envisage extending this study with a complementary evaluation of the impacts of SAN certification; such a study might make use of interviews with farmers, probing social equity issues broader than those currently contained in the standard as per Barbosa de Lima et al.’s (2009) impact assessment (see chapter 4 on SAN-certified coffee in Brazil). While audit data may indicate an improvement in compliance or a move towards social equity, interview data could elucidate the causal mechanisms driving such changes in the first place. IMAFLORA auditors do conduct interviews with farm workers as part of their audits (used in part to assess compliance with certain criteria), but notes from these interviews were not available for this analysis, nor were they expected to be standardised for cross-farm comparison. Farmer confidentiality may also be an issue. Nevertheless, such holistic analyses could contribute to improvements in the SAN standard and other certification schemes. In the shorter term, the authors hope to present a similar analysis of the SAN standard to the one in this paper, but this time focused on environmental rather than social outcomes.