1 Introduction

Researchers in the fields of industrial organization and management have long been interested in investigating complementary relations between various organizational practices. Complementarity is understood in this context to exist if the implementation of one practice increases the marginal or incremental return to other practices. Joint implementation of several practices may result in economies of scope (Baumol et al. 1988). The implementation of one practice might also decrease the marginal or incremental return to other practices. This is the case of substitutability (or subadditivity). Examples of studies of complementarity are the relationships between human resource practices and firm strategy (Ichniowski et al. 1997), firms’ internal R&D and external technology sourcing (Arora and Gambardella 1990), process and product innovation (Miravete and Pernias 2004), labor skill and innovation strategies (Leiponen 2005), different government innovation policies (Mohnen and Röller 2005), information technology, workplace reorganization, and new product and service innovations (Black and Lynch 2001; Bresnahan et al. 2002; Caroli and Van Reenen 2001), adoption of different information technologies in emergency health care (Athey and Stern 2002), different types of labor in the determination of trade patterns (Grossman and Maggi 2000) and use of external knowledge across different stages of new product development (Love and Roper 2009).

There are two econometric approaches used to test for complementarity: the “adoption” or “correlation” approach and the “production function” approach (e.g. Athey and Stern 1998). The former has been popular among empirical researchers due to its simplicity (Arora 1996). The adoption approach tests conditional correlations based on the residuals of reduced form regressions of the practices of interest on all exogenous variables. However, although this test can serve as supportive evidence of complementarity, it cannot serve as a definitive test. Estimated correlations between residuals may be the result of common omitted exogenous variables or measurement errors. Even in the case of well-measured correlation between practices, decision makers may not have been sufficiently well informed such that they chose efficiency or output enhancing combinations of practices.

The “production function” approach, in which organizational performance is related to combinations of organizational practices, does not have these drawbacks and can serve as a direct test for complementarity or substitutability.Footnote 1 However, no easily executable testing procedure has been available to test for complementarity or substitutability with more than two practices.Footnote 2 Studies adopting the production function approach have limited analysis to the estimation of pair-wise interaction effects, either including all pair-wise terms (e.g. Caroli and Van Reenen 2001), or estimating only the pair-wise interaction of interest (e.g. Bresnahan et al. 2002). This approach ignores the impact of additional cross-terms (e.g. a triple term in the case of three practices), examines only a partial expression for the cross derivative, and is prone to an omitted variable bias that affects all coefficients. As noted by Athey and Stern (1998), a proper complementarity or substitutability test requires a testing framework that considers the complete set of organizational practices. In this paper we develop such a test based on a multiple-inequality restrictions framework corresponding to a definition of strict supermodularity or submodularity (Milgrom and Roberts 1990). We provide Monte Carlo results comparing the power of this test with the performance of the two pair-wise tests.

2 Complementarity and substitutability

We describe the definitions and conditions concerning complementarity and substitutability both for the case of continuously measured practices and the case of dichotomous practices. Consider an objective function f whose value is determined by the practices \( x_{p} \) (p = 1,…,n). If the practices are measured continuously, the following definition of complementarity holds (e.g. Baumol et al. 1988)Footnote 3:

Definition 1

(continuous practices) Practices \( x_{i} \) and \( x_{j} \) are considered complementary in the function f if and only if \( \partial^{2} f/\partial x_{i} \partial x_{j} \ge 0 \) for all values of \( (x_{1} , \ldots ,x_{n} ) \) with the inequality holding strictly for at least one value.

This definition is demanding in the sense of requiring the cross derivative to be non-negative for all possible or observed values of practices. The definition of substitutability is identical to Definition 1 except that the inequality sign is reversed (\( \le \) instead of \( \ge \)). We use a cross-term specification of the objective function f to test for complementarity or substitutability. The expressions for n equal to 2, 3 and 4 are:

$$ f(x_{1} ,x_{2} ) = \alpha_{0} + \alpha_{1} x_{1} + \alpha_{2} x_{2} + \alpha_{12} x_{1} x_{2} $$
(1)
$$ f(x_{1} ,x_{2} ,x_{3} ) = f(x_{1} ,x_{2} ) + \alpha_{3} x_{3} + \alpha_{13} x_{1} x_{3} + \alpha_{23} x_{2} x_{3} + \alpha_{123} x_{1} x_{2} x_{3} $$
(2)
$$ f(x_{1} ,x_{2} ,x_{3} ,x_{4} ) =\, f(x_{1} ,x_{2} ,x_{3} ) + \alpha_{4} x_{4} + \alpha_{14} x_{1} x_{4} + \alpha_{24} x_{2} x_{4} + \alpha_{34} x_{3} x_{4} + \alpha_{134} x_{1} x_{3} x_{4} + \alpha_{124} x_{1} x_{2} x_{4} + \alpha_{234} x_{2} x_{3} x_{4} + \alpha_{1234} x_{1} x_{2} x_{3} x_{4} $$
(3)

The cross-derivatives \( \partial^{2} f/\partial x_{1} \partial x_{2} \) are equal to \( \alpha_{12} \) for Eq. 1, \( \alpha_{12} + \alpha_{123} x_{3} \) for Eq. 2 and \( \alpha_{12} + \alpha_{123} x_{3} + \alpha_{124} x_{4} + \alpha_{1234} x_{3} x_{4} \) for Eq. 3, respectively. This implies that there is complementarity for the case of two practices if \( \alpha_{12} > 0 \). In the case of three practices there are two conditions: \( \alpha_{12} + \alpha_{123} \min (x_{3} ) \ge 0 \) and \( \alpha_{12} + \alpha_{123} \max (x_{3} ) \ge 0 \), with at least one of the inequalities holding strictly. In the case of four practices there are four conditions, obtained by evaluating the cross-derivative at each combination of the minimum and maximum of \( x_{3} \) and \( x_{4} \). We will concentrate upon the case of three and four practices, although the arguments can easily be extended to higher numbers of practices. Figure 1 shows areas of complementarity and substitutability (or neither) in the case of three practices and \( x_{3} \in [0,1] \). The latter can be seen as an adoption rate of a practice, running from 0% (no adoption) to 100% (complete adoption).Footnote 4 The areas of complementarity and substitutability include the bold lines but not the origin (0,0).

Fig. 1 Areas of complementarity and substitutability
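The three-practice conditions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_three_practices` and its arguments are hypothetical and do not come from the paper. Because the cross-derivative \( \alpha_{12} + \alpha_{123} x_{3} \) is linear in \( x_{3} \), evaluating it at the two endpoints of the range of \( x_{3} \) is sufficient.

```python
def classify_three_practices(alpha12, alpha123, x3_min=0.0, x3_max=1.0):
    """Classify practices 1 and 2 from the cross-derivative alpha12 + alpha123 * x3.

    Hypothetical helper: checks the two endpoint conditions for x3 in
    [x3_min, x3_max] and requires at least one strict inequality.
    """
    at_min = alpha12 + alpha123 * x3_min
    at_max = alpha12 + alpha123 * x3_max
    if at_min >= 0 and at_max >= 0 and (at_min > 0 or at_max > 0):
        return "complementarity"
    if at_min <= 0 and at_max <= 0 and (at_min < 0 or at_max < 0):
        return "substitutability"
    # Mixed signs, or the origin (0, 0), where neither definition applies.
    return "neither"
```

With the default range \( x_{3} \in [0,1] \), the pair (1.0, -1.0) lies on a boundary line and is still classified as complementarity, while the origin is classified as neither, in line with the description of Fig. 1.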

If the practices take on discrete values (with the step size chosen equal to one), we replace the derivative in Definition 1 by a difference. If we consider the first two practices, without loss of generality, the following definition holds:

Definition 2

(discrete practices) Practices \( x_{1} \) and \( x_{2} \) are considered complementary in the function f if and only if \( f(x_{1} + 1,x_{2} + 1,x_{3} , \ldots ,x_{n} ) + f(x_{1} ,x_{2} ,x_{3} , \ldots ,x_{n} ) \ge f(x_{1} + 1,x_{2} ,x_{3} , \ldots ,x_{n} ) + f(x_{1} ,x_{2} + 1,x_{3} , \ldots ,x_{n} ) \) for all values of \( (x_{1} , \ldots ,x_{n} ) \) with the inequality holding strictly for at least one value.

The case of dichotomously measured practices (practice is used or not) is a special case of this definition. In that case functions (1), (2), and (3) can also be conveniently rewritten in terms of the possible combinations of practices (cf. Mohnen and Röller 2005). With two practices the collection of possible combinations is defined in the usual binary order as \( D = \{ \,(0,0),\,(0,1),\,(1,0),\,(1,1)\,\} \). We introduce the indicator function \( I_{D = (r,s)} \), equal to one when the combination is \( (r,s) \), else zero. Similarly, we have \( I_{D = (r,s,t)} \) for the case of three practices. The function f is rewritten as:

$$ f(x_{1} ,x_{2} ) = \sum\limits_{r = 0}^{1} {\sum\limits_{s = 0}^{1} {\beta_{rs} I_{{(x_{1} ,x_{2} ) = (r,s)}} } } $$
(4)
$$ f(x_{1} ,x_{2} ,x_{3} ) = \sum\limits_{r = 0}^{1} {\sum\limits_{s = 0}^{1} {\sum\limits_{t = 0}^{1} {\beta_{rst} I_{{(x_{1} ,x_{2} ,x_{3} ) = (r,s,t)}} } } } $$
(5)

The conditions of complementarity now correspond to \( \alpha_{12} = f(1,1) - f(1,0) - f(0,1) + f(0,0) = \beta_{11} + \beta_{00} - \beta_{10} - \beta_{01} > 0 \) for two practices and \( \alpha_{12} = \beta_{110} + \beta_{000} - \beta_{100} - \beta_{010} \ge 0 \) and \( \alpha_{12} + \alpha_{123} = \beta_{111} + \beta_{001} - \beta_{101} - \beta_{011} \ge 0 \) for three practices, with one of the two inequalities holding strictly.
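As a small numerical check of these identities, the sketch below builds the eight cell values \( \beta_{rst} \) of Eq. 5 from a set of coefficients of Eq. 2 and recovers \( \alpha_{12} \) and \( \alpha_{12} + \alpha_{123} \) from the two contrasts. The function names and the example coefficient values are assumptions made for illustration only.

```python
from itertools import product


def cell_values(a0, a1, a2, a3, a12, a13, a23, a123):
    """Return beta[(r, s, t)] = f(r, s, t) for the cross-term function of Eq. 2."""
    beta = {}
    for r, s, t in product((0, 1), repeat=3):
        beta[(r, s, t)] = (a0 + a1 * r + a2 * s + a3 * t
                           + a12 * r * s + a13 * r * t + a23 * s * t
                           + a123 * r * s * t)
    return beta


def interaction_contrasts(beta):
    """Contrasts between practices 1 and 2, evaluated at x3 = 0 and at x3 = 1."""
    at_x3_0 = beta[(1, 1, 0)] + beta[(0, 0, 0)] - beta[(1, 0, 0)] - beta[(0, 1, 0)]
    at_x3_1 = beta[(1, 1, 1)] + beta[(0, 0, 1)] - beta[(1, 0, 1)] - beta[(0, 1, 1)]
    return at_x3_0, at_x3_1


# Illustrative coefficients (not taken from the paper).
beta = cell_values(a0=0.0, a1=1.0, a2=0.5, a3=-1.0,
                   a12=0.5, a13=0.0, a23=1.0, a123=-0.5)
contrast_0, contrast_1 = interaction_contrasts(beta)
# contrast_0 equals alpha12 and contrast_1 equals alpha12 + alpha123.
```

In this example the first contrast is strictly positive and the second is zero, so the two practices would count as complementary under the definition used here.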

3 The testing procedure

In the case of two practices the test for global complementarity is a one-sided t-test of the null hypothesis \( \alpha_{12} = 0 \) in Eq. 1. However, in the general case of n practices, the number of constraints that have to be tested simultaneously is \( 2^{n - 2} \). One approach is to apply statistical tests along the lines of Gouriéroux et al. (1982), Kodde and Palm (1986) and Wolak (1989).Footnote 5 This procedure is followed by Mohnen and Röller (2005) for dichotomously measured practices. The critical values of such tests are, however, cumbersome to derive, which limits applicability. In addition, the test requires software able to do linear regression under inequality constraints. We propose a simpler procedure, which we explain for three and four practices (for five practices, see the Appendix), all measured in the unit interval [0,1]: \( 0 \le x_{3} ,x_{4} \le 1 \). This also includes the case of dichotomously measured practices. Our procedure is a separate induced test, where a combined hypothesis is accepted if all the separate hypotheses are accepted (Savin 1980). For three practices we have:

$$ y = \alpha_{1} x_{1} + \alpha_{2} x_{2} + \alpha_{3} x_{3} + \alpha_{12} x_{1} x_{2} + \alpha_{13} x_{1} x_{3} + \alpha_{23} x_{2} x_{3} + \alpha_{123} x_{1} x_{2} x_{3} + \varepsilon $$
(6)

where \( \varepsilon \sim {\text{N}}(0,\sigma_{\varepsilon }^{2} ) \). There is complementarity between practices 1 and 2 if \( \alpha_{12} \ge 0 \) and \( \alpha_{12} + \alpha_{123} \ge 0 \) with at least one of the two inequalities holding strictly. Now we rewrite Eq. 6 into:

$$ y =\, \alpha_{1} x_{1} + \alpha_{2} x_{2} + \alpha_{3} x_{3} + \alpha_{12} (x_{1} x_{2} - x_{1} x_{2} x_{3} ) + \alpha_{13} x_{1} x_{3} + \alpha_{23} x_{2} x_{3} + (\alpha_{12} + \alpha_{123} )x_{1} x_{2} x_{3} + \varepsilon $$
(7)

The test can now be executed using linear regression and considering the significance of the coefficients of the variables \( x_{1} x_{2} - x_{1} x_{2} x_{3} \) and \( x_{1} x_{2} x_{3} \). Say that the t-value of the former is \( t_{1} \) and of the latter \( t_{2} \); then the new test indicates complementarity if either “\( t_{1} > t_{c} \) and \( t_{2} > - t_{d} \)” or “\( t_{1} > - t_{d} \) and \( t_{2} > t_{c} \)”, where \( t_{c} \) and \( t_{d} \) are the critical t-values depending upon the significance level. The test indicates substitutability if either “\( t_{1} < - t_{c} \) and \( t_{2} < t_{d} \)” or “\( t_{1} < t_{d} \) and \( t_{2} < - t_{c} \)”. For four practices we have:
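The three-practice procedure can be sketched as follows. This is a minimal illustration, assuming NumPy is available and using ordinary least squares with conventional standard errors; the function names `ols_t_values` and `three_practice_test` are hypothetical, and the default critical values are the ones used later in the Monte Carlo experiments. As in Eq. 7, no constant term is included.

```python
import numpy as np


def ols_t_values(X, y):
    """t-values of OLS coefficients using the usual homoskedastic variance estimate."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)
    return coef / np.sqrt(sigma2 * np.diag(xtx_inv))


def three_practice_test(x1, x2, x3, y, t_c=1.96, t_d=1.65):
    """Apply the decision rule to the regressors of Eq. 7 (sketch, not the authors' code)."""
    x12 = x1 * x2
    x123 = x1 * x2 * x3
    # Column 3 carries alpha12, column 6 carries alpha12 + alpha123.
    X = np.column_stack([x1, x2, x3, x12 - x123, x1 * x3, x2 * x3, x123])
    t = ols_t_values(X, y)
    t1, t2 = t[3], t[6]
    if (t1 > t_c and t2 > -t_d) or (t1 > -t_d and t2 > t_c):
        return "complementarity"
    if (t1 < -t_c and t2 < t_d) or (t1 < t_d and t2 < -t_c):
        return "substitutability"
    return "neither"
```

The rewritten regressors mean that the two restrictions are read directly off two coefficients, so any regression routine that reports t-values would serve equally well.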

$$ \begin{aligned} y =& \alpha_{1} x_{1} + \alpha_{2} x_{2} + \alpha_{3} x_{3} + \alpha_{4} x_{4} + \alpha_{12} x_{1} x_{2} + \alpha_{13} x_{1} x_{3} + \alpha_{14} x_{1} x_{4} + \alpha_{23} x_{2} x_{3} + \alpha_{24} x_{2} x_{4} \\ &+ \alpha_{34} x_{3} x_{4} + \alpha_{123} x_{1} x_{2} x_{3} + \alpha_{124} x_{1} x_{2} x_{4} + \alpha_{134} x_{1} x_{3} x_{4} + \alpha_{234} x_{2} x_{3} x_{4} + \alpha_{1234} x_{1} x_{2} x_{3} x_{4} + \varepsilon \\ \end{aligned} $$
(8)

This can be rewritten into:

$$ \begin{aligned} y =& \alpha_{1} x_{1} + \alpha_{2} x_{2} + \alpha_{3} x_{3} + \alpha_{4} x_{4} + \alpha_{12} (x_{1} x_{2} + x_{1} x_{2} x_{3} x_{4} - x_{1} x_{2} x_{3} - x_{1} x_{2} x_{4} ) + \alpha_{13} x_{1} x_{3} \\ &+ \alpha_{14} x_{1} x_{4} + \alpha_{23} x_{2} x_{3} + \alpha_{24} x_{2} x_{4} + \alpha_{34} x_{3} x_{4} + (\alpha_{12} + \alpha_{123} )(x_{1} x_{2} x_{3} - x_{1} x_{2} x_{3} x_{4} ) \\ &+ (\alpha_{12} + \alpha_{124} )(x_{1} x_{2} x_{4} - x_{1} x_{2} x_{3} x_{4} ) + \alpha_{134} x_{1} x_{3} x_{4} + \alpha_{234} x_{2} x_{3} x_{4} + (\alpha_{12} + \alpha_{123} + \alpha_{124} + \alpha_{1234} )x_{1} x_{2} x_{3} x_{4} + \varepsilon \\ \end{aligned} $$
(9)

The test on complementarity is whether \( \alpha_{12} \ge 0 \) and \( \alpha_{12} + \alpha_{123} \ge 0 \) and \( \alpha_{12} + \alpha_{124} \ge 0 \) and \( \alpha_{12} + \alpha_{123} + \alpha_{124} + \alpha_{1234} \ge 0 \) with at least one of the four inequalities holding strictly. Hence, we use linear regression and consider significance of the coefficients of the four variables \( x_{1} x_{2} + x_{1} x_{2} x_{3} x_{4} - x_{1} x_{2} x_{3} - x_{1} x_{2} x_{4} \), \( x_{1} x_{2} x_{3} - x_{1} x_{2} x_{3} x_{4} \), \( x_{1} x_{2} x_{4} - x_{1} x_{2} x_{3} x_{4} \) and \( x_{1} x_{2} x_{3} x_{4} \). Denote the t-values of these coefficients as \( t_{1} \), \( t_{2} \), \( t_{3} \) and \( t_{4} \). The test indicates complementarity if one of the following four conditions holds: \( (t_{1} > t_{c} )\, \wedge \,(t_{2} > - t_{d} )\, \wedge \,(t_{3} > - t_{d} )\, \wedge \,(t_{4} > - t_{d} ) \) or \( (t_{1} > - t_{d} )\, \wedge \,(t_{2} > t_{c} )\, \wedge \,(t_{3} > - t_{d} )\, \wedge \,(t_{4} > - t_{d} ) \) or \( (t_{1} > - t_{d} )\, \wedge \,(t_{2} > - t_{d} )\, \wedge \,(t_{3} > t_{c} )\, \wedge \,(t_{4} > - t_{d} ) \) or \( (t_{1} > - t_{d} )\, \wedge \,(t_{2} > - t_{d} )\, \wedge \,(t_{3} > - t_{d} )\, \wedge \,(t_{4} > t_{c} ) \). Testing for substitutability means that we reverse the inequality signs and the signs of the critical values, as in the three-practice case. The literature on Bonferroni procedures is now relevant for determining the probability of a type I error for the significance level of the combined hypothesis. Given a significance level for the combined hypothesis of A and a total of \( 2^{n - 2} \) constraints, the (original) Bonferroni procedure suggests a significance level for the separate hypotheses of \( A/2^{n - 2} \), see e.g. Olejnik et al. (1997), p. 391.Footnote 6 This adjustment limits the overall probability of a type I error.
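The decision rule and the Bonferroni adjustment generalize naturally to any number of restrictions, which the following sketch illustrates. It assumes large-sample (standard normal) two-sided critical values, which is an assumption of this example rather than a statement from the paper, and the names `bonferroni_critical_value` and `induced_test` are hypothetical. Since \( t_{c} > 0 \), the condition "one t-value exceeds \( t_{c} \) and all others exceed \( -t_{d} \)" is the same as "some t-value exceeds \( t_{c} \) and all exceed \( -t_{d} \)".

```python
from statistics import NormalDist


def bonferroni_critical_value(overall_level, n_practices):
    """Two-sided normal critical value at level A / 2**(n - 2) (normal approximation)."""
    n_constraints = 2 ** (n_practices - 2)
    per_test_level = overall_level / n_constraints
    return NormalDist().inv_cdf(1.0 - per_test_level / 2.0)


def induced_test(t_values, t_c, t_d):
    """Decision rule for any number of restrictions; assumes t_c >= t_d >= 0."""
    if any(t > t_c for t in t_values) and all(t > -t_d for t in t_values):
        return "complementarity"
    if any(t < -t_c for t in t_values) and all(t < t_d for t in t_values):
        return "substitutability"
    return "neither"


# Critical values for an overall level A = 10%.
t_d = bonferroni_critical_value(0.10, 2)       # unadjusted 10% value
t_c_three = bonferroni_critical_value(0.10, 3)
t_c_four = bonferroni_critical_value(0.10, 4)
```

For five or more practices only the number of t-values passed to `induced_test` and the value of `n_practices` change.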

Our test procedure performs a multiple-restrictions test directly connected to the definition of complementarity and substitutability. We compare the performance of the multiple-restrictions test with two alternative test procedures used in recent empirical work. The “single cross-term” test procedure only incorporates the cross term of two practices in the estimated equation, and infers complementarity from the estimated coefficient of the cross-term (e.g. Bresnahan et al. 2002). The “all cross-term” test follows the same procedure but incorporates all pair-wise cross-terms \( x_{i} x_{j} \) (\( i \ne j \)) in one equation (e.g. Caroli and Van Reenen 2001). Another recently proposed procedure is the one by Mohnen and Röller (2005). This procedure tests for strict complementarity and substitutability (where all ‘larger than’ and ‘smaller than’ signs are hypothesized to hold) and therefore is not directly comparable. The procedure is also limited to discrete practices (dummy variables) and, by using the Kodde and Palm (1986) critical values, has a sizeable inconclusive area. Such inconclusive test outcomes become more likely as the number of inequality constraints increases. Furthermore, the test is relatively complicated to execute, requiring optimization under inequality constraints, and difficult to extend to higher numbers of practices.

The performance function in the case of three practices is given in Eq. 6. The “single cross-term” test imposes \( \alpha_{13} = \alpha_{23} = \alpha_{123} = 0 \) and judges complementarity to exist if \( \alpha_{12} > 0 \). This is a simple t-test. The “all cross-term” test applies the same criterion but only imposes \( \alpha_{123} = 0 \). The “single cross-term” and “all cross-term” tests therefore suffer from omitted-variable bias whenever the excluded coefficients are in fact non-zero. However, since these tests involve restricted estimation, the estimators of \( \alpha_{12} \) are likely to have smaller variance (e.g. Judge et al. 1982, chapter 22). In the next section we devise a Monte Carlo experiment to compare the performance of the three test procedures, which face a trade-off between bias and precision. Since almost all empirical studies of complementarity in the literature examine the impact of using a certain practice or not, we focus our Monte Carlo experiment on the case of dichotomous variables.

4 Monte Carlo experiments

The data for our experiments are generated for samples of 1,000 and 5,000 observations. These are common sample sizes when investigating complementarities between organizational practices.Footnote 7 We describe the Monte Carlo experimental procedure for three practices. In the first step the coefficients \( \alpha_{1} \) through \( \alpha_{123} \) are randomly and independently drawn from the standard normal distribution and then rounded to whole or half numbers. In the second step, variables \( z_{1} \), \( z_{2} \), \( z_{3} \) are drawn from the multivariate standard normal distribution. Variables \( x_{1} \), \( x_{2} \), \( x_{3} \) are equal to one when \( z_{1} > 0 \), \( z_{2} > 0 \) and \( z_{3} > 0 \), respectively, else zero. In order to mimic empirical research settings, the correlation structure between the practices is allowed to depend on the presence of complementarity or substitutability. Organizations are more likely to simultaneously adopt two practices if these are complementary. If the draws of \( \alpha_{1} \) through \( \alpha_{123} \) indicate complementarity, the correlation coefficient between \( x_{1} \) and \( x_{2} \) is set at 0.5, and in the case of substitutability at −0.5. The correlation coefficient is set at zero if the draw indicates no complementarity or substitutability.Footnote 8 Eq. 6 is used to generate data for y. For four practices a similar procedure and Eq. 8 are used.
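The data-generating steps for three practices can be sketched as below, assuming NumPy. Several details are assumptions of this example and not taken from the paper: the function names, the choice that \( x_{3} \) is uncorrelated with the other practices, and the way the target correlation between the binary \( x_{1} \) and \( x_{2} \) is reached, namely by setting the latent correlation of \( z_{1} \) and \( z_{2} \) to \( \sin (\pi r/2) \), which yields a correlation of r between two indicators split at zero.

```python
import numpy as np


def draw_coefficients(rng):
    """alpha1, alpha2, alpha3, alpha12, alpha13, alpha23, alpha123, rounded to halves."""
    return np.round(rng.standard_normal(7) * 2.0) / 2.0


def true_state(alpha12, alpha123):
    """True state for dichotomous practices: conditions at x3 = 0 and x3 = 1."""
    c0, c1 = alpha12, alpha12 + alpha123
    if c0 >= 0 and c1 >= 0 and (c0 > 0 or c1 > 0):
        return "complementarity"
    if c0 <= 0 and c1 <= 0 and (c0 < 0 or c1 < 0):
        return "substitutability"
    return "neither"


def generate_sample(alpha, n_obs, sigma, rng):
    """Generate (x1, x2, x3, y, state) following Eq. 6 (illustrative sketch)."""
    a1, a2, a3, a12, a13, a23, a123 = alpha
    state = true_state(a12, a123)
    target = {"complementarity": 0.5, "substitutability": -0.5, "neither": 0.0}[state]
    # Assumption: latent correlation that gives corr(x1, x2) = target after thresholding.
    rho = np.sin(np.pi * target / 2.0)
    cov = np.array([[1.0, rho, 0.0], [rho, 1.0, 0.0], [0.0, 0.0, 1.0]])
    z = rng.multivariate_normal(np.zeros(3), cov, size=n_obs)
    x1, x2, x3 = (z > 0).astype(float).T
    y = (a1 * x1 + a2 * x2 + a3 * x3 + a12 * x1 * x2 + a13 * x1 * x3
         + a23 * x2 * x3 + a123 * x1 * x2 * x3
         + sigma * rng.standard_normal(n_obs))
    return x1, x2, x3, y, state
```

A full experiment would repeat these two steps for each draw, apply the competing tests to the generated sample, and compare each outcome with `state`.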

The outcomes of the tests are established using 10% two-sided significance levels. This means that the critical level is equal to 1.65 for the pair-wise tests. We also use \( t_{d} = 1.65 \), but \( t_{c} \) equal to 1.96 for the multiple-restrictions test when there are three practices and 2.24 when there are four practices. The latter follow from the \( A/2^{n - 2} \) formula with A equal to 10% and n equal to 3 and 4, respectively. The pair-wise tests consider the sign and t-statistic for \( \hat{\alpha }_{12} \). The above procedure has been repeated 10,000 times for models with different explanatory power. Tables 1, 2, 3 and 4 present the results of the Monte Carlo experiments for models with three different values of \( \sigma_{\varepsilon } \): 0.25, 1 and 3.5. These correspond to values for R-squared of approximately 90, 50 and 10% in the case of three practices (Tables 1 and 2). The explanatory power is higher in the case of four practices, with R-squared around 95, 67 and 18%, respectively (Tables 3 and 4). In Tables 1 and 3 we consider 1,000 observations and in Tables 2 and 4 we consider 5,000 observations. In each of the experiments we compare the results of the tests with the true states of complementarity and substitutability.

Table 1 Monte Carlo experiment for three practices and 1,000 observations (10,000 draws)
Table 2 Monte Carlo experiment for three practices and 5,000 observations (10,000 draws)
Table 3 Monte Carlo experiment for four practices and 1,000 observations (10,000 draws)
Table 4 Monte Carlo experiment for four practices and 5,000 observations (10,000 draws)

Our multiple-restrictions test outperforms both the “single cross-term” and “all cross-term” tests in the large majority of cases. Only for a model with a low fit (\( \sigma_{\varepsilon } \) equal to 3.5) and a relatively low number of observations vis-à-vis the number of practices do the pair-wise tests appear to perform better. The pair-wise tests perform especially poorly in the case of four practices, where three conditions must hold in addition to \( \alpha_{12} > 0 \). The pair-wise tests also perform relatively poorly in the high explanatory power models (\( \sigma_{\varepsilon } \) equal to 0.25 or 1). Clearly, the problem of bias is more important than the lower variance of \( \hat{\alpha }_{12} \) in those cases. The pair-wise tests perform much better in relative terms for the models with low \( R^{2} \). The “single cross-term” test shows the highest percentage of correct predictions with, for example, 63.5% in Table 1 and 71.0% in Table 3. Hence, the simpler tests that restrict some of the parameters to zero benefit from having low variance, although at the expense of some bias. We conclude that our multiple-restrictions test is a clearly improved testing framework for complementarity or substitutability, but only for models in which practices have a noticeable impact on performance. Otherwise, for three practices, pair-wise tests appear as easily executed alternatives with relatively good predictive power.

5 Conclusion

Recent empirical studies of organizational performance have been concerned with establishing potential complementarity between more than two organizational practices adopted simultaneously. These papers have drawn conclusions on the basis of potentially biased estimates of pair-wise interaction effects between such practices. This paper developed a consistent and simple testing framework based on multiple inequality constraints that derives from the definition of (strict) supermodularity as suggested by Athey and Stern (1998), and compared the performance of this test with previously used methods. Monte Carlo results show that this multiple-restrictions test is generally superior for performance models in which the practices have a noticeable impact on performance.