Causal Inference of Social Experiments Using Orthogonal Designs

Orthogonal arrays are a powerful class of experimental designs that has been widely used to determine efficient arrangements of treatment factors in randomized controlled trials. Despite its popularity, the method is seldom used in social sciences. Social experiments must cope with randomization compromises such as noncompliance that often prevent the use of elaborate designs. We present a novel application of orthogonal designs that addresses the particular challenges arising in social experiments. We characterize the identification of counterfactual variables as a finite mixture problem in which choice incentives, rather than treatment factors, are randomly assigned. We show that the causal inference generated by an orthogonal array of incentives greatly outperforms a traditional design.


Introduction
This paper investigates the problem of making causal inferences in social experiments under noncompliance. We develop two themes motivated by C.R. Rao's fundamental contributions to the characterization of distributions and the study of experiments. We use instrumental variables to characterize the identification of causal parameters as the solution to a mixing distribution problem. We then explore orthogonal array designs to correct for the selection bias generated by noncompliance.
Statisticians widely use Rao's research on orthogonal arrays to design efficient arrangements of treatment factors in randomized controlled trials (RCTs). See, e.g., Stinson (2004). Despite its popularity, Rao's research has not been broadly applied to evaluate treatment effects in social sciences. Social experiments are commonly plagued by randomization compromises, such as noncompliance, that often prevent the use of elaborate designs. This paper uses recently developed econometric tools to repurpose Rao's original ideas into a novel framework where orthogonal arrays of incentives play a central role in solving compliance problems in social experiments.
In his M.A. thesis at Calcutta University, C. R. Rao (1943) introduced a powerful class of experimental designs called orthogonal arrays. This design employs combinatorial arrangements of factors (or treatments) for each randomization arm. Rao developed the theory of orthogonal arrays in a series of seminal papers (C. R. Rao 1946a, b, 1947, 1949).
The following matrix is an example of an orthogonal array:

      A = [ 0  0  0
            1  1  0
            1  0  1
            0  1  1 ]

Matrix A is a 2-level orthogonal array because it only uses two elements, 0 and 1. Any two columns of the matrix display all the possible combinations of zeros and ones, that is, (0, 0), (0, 1), (1, 0), and (1, 1). The matrix has four "runs" (rows) corresponding to treatment conditions and three "factors" (columns) corresponding to treatments. The matrix is classified as OA(4, 3, 2, 2), where the third entry, 2, is the level and the fourth entry, 2, is the strength, which is the number of columns in which we are guaranteed to see all the possible combinations of zeros and ones. Orthogonal arrays such as OA(4, 3, 2, 2) are widely used to design experiments that determine the optimum mix of factors (or treatments) that maximize production yield. In these experiments, the researcher can choose the combination of inputs in each randomization arm.
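The strength-2 property can be checked mechanically: project the array onto every pair of columns and verify that each combination of levels appears equally often. A minimal Python sketch (the array below is one valid OA(4, 3, 2, 2); the function name is ours):

```python
from itertools import combinations, product

# A 2-level orthogonal array with 4 runs (rows) and 3 factors (columns),
# i.e., OA(4, 3, 2, 2). Strength 2 means: in any 2 columns, every
# combination of levels (0,0), (0,1), (1,0), (1,1) appears equally often.
A = [
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]

def is_orthogonal_array(runs, levels=2, strength=2):
    """Check the strength-t property: every t-column projection
    contains all level combinations the same number of times."""
    n_factors = len(runs[0])
    for cols in combinations(range(n_factors), strength):
        counts = {combo: 0 for combo in product(range(levels), repeat=strength)}
        for row in runs:
            counts[tuple(row[c] for c in cols)] += 1
        if len(set(counts.values())) != 1:  # combinations not balanced
            return False
    return True

print(is_orthogonal_array(A))  # True
```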
A fundamental difference between RCTs in the natural and social sciences is that social scientists often cannot force compliance with intended treatments. In a natural experiment, the experimenter can determine the treatment of each randomization unit. In a social experiment, the randomization units consist of economic agents. The experimenter can attempt to persuade agents but can seldom impose an intended treatment status on them. The final treatment status depends on the agent's decision to comply or not comply with the initial treatment assignment.
Noncompliance violates the principle of randomization that secures the identification of causal effects in perfectly implemented RCTs. Agents that choose to deviate from their assigned treatment may differ from those who do not. The compliance decision introduces the danger of an unobserved confounding variable that may cause both the treatment choice and the outcomes of interest. Noncompliance prevents the use of sophisticated designs, making it especially difficult to reap the benefits of Rao's orthogonal array design.
We present a novel approach to Rao's orthogonal array design to aid the nonparametric identification of causal effects in RCTs with noncompliance. We draw on research by Heckman and Pinto (2018) and Pinto (2021a) and use a choice-theoretic instrumental variable (IV) model. The identification of causal parameters hinges on methods that control for unobserved characteristics of agents. We use discrete instruments to generate a finite partition of unobserved variables. This partition enables us to characterize the identification of causal parameters as a problem of identifying a mixture of unobserved distributions. The partition induced by the instruments enables us to determine the necessary and sufficient conditions for identifying counterfactual outcomes. We use this framework to investigate how the orthogonal design of choice incentives outperforms the traditional approach to social experiments.
Section Causal Model with Choice and Compliance presents a choice-theoretic causal model using instrumental variables. Section Using IV to Control for Unobserved Variables explains how to nonparametrically control for an agent's unobservable characteristics using discrete instruments. Section Identification as a Mixture Problem describes the identification of causal effects as a problem of identifying a finite mixture of unobserved distributions. Section Using Rao's Orthogonal Design to Address Identification Problems Arising from Noncompliance in Social Experiments explains how to use Rao's orthogonal design to identify and estimate causal parameters. Section Conclusion concludes.

Causal Model with Choice and Compliance
In social experiments, the treatment status is typically determined by agents' decisions to comply with the treatment choice. This generates the problem of selection bias, which makes it difficult to identify causal effects. Economists have long used instrumental variables to solve the problem of selection bias and to identify causal effects in choice models. This paper examines the case of multivalued-choice models with categorical instrumental variables and heterogeneous agents.

Decision-Theoretic Foundation
The economic literature offers several theoretical foundations to model an economic agent ω's treatment choice t among the available treatments in a choice set T.
The classical microeconomic theory assumes a rational agent that maximizes the utility among available choices. Agents, however, do not need to be rational to generate predictable choice behavior (Thaler 2016). As noted by Becker (1962), the key features of choice theory are a notion of preferences based on the agent's information set and some choice constraints, such as a budget set, that shape the agent's behavior, whether rational or not.
We do not assume the full rationality of agents, but we allow for purposive actions under different information and constraint sets. We adopt a flexible choice equation consistent with a broad array of decision mechanisms. We denote the preferences of an agent over the choice set T by an unobserved random vector V of arbitrary but finite dimension. Choice constraints are indexed by the elements z in a finite set Z. We keep the information sets of agents implicit, so that the treatment choice of agent ω given a restriction z ∈ Z is expressed as

(1) T_ω(z) = f_T(z, V_ω, X_ω).

We map the choice behavior onto a standard IV model where treatment values t ∈ T and restriction indexes z ∈ Z become potential values in the support of the random variables T and Z, respectively. We use X for the random vector of baseline variables that occur prior to treatment choice. All variables are defined on the probability space (Ω, F, P), and Z_ω, T_ω, V_ω, X_ω denote the realized values of random variables Z, T, V, X for an agent ω ∈ Ω.

The Instrumental Variable Model
The IV model has been a standard analytical framework in economics since Reiersöl (1945). In the economic context, the IV model consists of four observed variables: (1) an instrument Z taking N_Z discrete values in the support supp(Z) = {z_1, …, z_{N_Z}}; (2) a treatment choice T taking N_T discrete values in supp(T) = {t_1, …, t_{N_T}}; (3) a real-valued outcome Y in ℝ; and (4) a pre-treatment random vector X of finite dimension taking values in ℝ^|X|. Notationally, we use D_t = 𝟙[T = t], t ∈ supp(T), and D_z = 𝟙[Z = z], z ∈ supp(Z), as indicators of treatment and instrument values, respectively. Observed variables are related according to two policy-invariant equations that determine causal relationships among the variables:

(2) Choice Equation: T = f_T(Z, V, X),
(3) Outcome Equation: Y = f_Y(T, V, ε_Y, X),

where ε_Y is an unobserved error term in ℝ. As mentioned, the choice Eq. (2) is general and might be motivated by several choice mechanisms, including utility maximization (see, e.g., McFadden 1981). The unobserved random vector V subsumes not only the agent's preferences but all the unobserved (by the analyst) variables that affect both the choice T and outcome Y. Vector V is a confounder, and it is the source of selection bias. Choice probability P(T = t | Z = z, X) is the propensity score of choosing t given z and X.

Notes: Our analysis holds if outcome Y represents a vector-valued variable denoting multiple outcomes. The indicator function 𝟙[A] equals one if event A occurs and zero otherwise. By policy-invariant, we mean functions whose maps remain invariant under manipulation of the arguments; this is the notion of autonomy developed by Frisch (1938) and Haavelmo (1944) — for a recent discussion of these conditions, see Heckman and Pinto (2015) and Pinto and Heckman (2021). Error terms such as ε_Y are often called "shocks" in structural equation models; f_T is a deterministic function that can be interpreted as a random function if we introduced a shock ε_T of arbitrary dimension as one of its arguments.
The two main assumptions of the IV model are:

(4) Independence: Z ⟂⟂ (V, ε_Y) | X;
(5) IV Relevance: P(T = t | Z = z, X) is a positive and non-degenerate function of z, for all (t, z) ∈ supp(T) × supp(Z).

Independence condition (4) states that the instrument Z is statistically independent of the confounder V and error term ε_Y conditioned on baseline variables X. Given that V is arbitrary, we can, without loss of generality, assume that V and ε_Y are statistically independent; that is, V ⟂⟂ ε_Y | X. The independence condition implies that the instrument affects the outcome only through its impact on the treatment T.
IV relevance (5) guarantees that there exist agents who will choose t for any instrumental value z. The condition rules out the possibility that equivalent instrumental values have an identical impact on the treatment. We also assume as a regularity condition that the outcome's second moment exists: E(Y²) < ∞. To simplify notation, we henceforth suppress the background variables X. Our analysis can be interpreted as conditioned on such variables.

Counterfactuals
Counterfactual choice is defined by fixing Z in the choice Eq. (2) to a value z ∈ supp(Z); that is, T(z) = f_T(z, V). The counterfactual outcome is defined by fixing T in (3) to a value t ∈ supp(T); that is, Y(t) = f_Y(t, V, ε_Y). The observed choice T and outcome Y can be described as switching regressions (Quandt 1958, 1972) by the following equations:

(6) T = Σ_{z ∈ supp(Z)} D_z · T(z),
(7) Y = Σ_{t ∈ supp(T)} D_t · Y(t).

Equation (6) describes choice T as the counterfactual choice T(z) multiplied by the indicator D_z that takes value one if Z = z and zero otherwise. Equation (7) describes the outcome Y in terms of the counterfactual outcomes Y(t) multiplied by the choice indicator D_t.
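The switching-regression representation can be made concrete with a toy sketch: hold one agent's counterfactuals fixed and let the realized instrument select which of them is observed (all values below are hypothetical):

```python
# Hypothetical counterfactual choices T(z) and outcomes Y(t) for one agent.
T_of_z = {"z1": "t1", "z2": "t2"}   # counterfactual choices T(z)
Y_of_t = {"t1": 5.0, "t2": 8.0}     # counterfactual outcomes Y(t)

def observed(z):
    """Eqs. (6)-(7): only the indicator of the realized arm/choice is 'on',
    so the observed pair (T, Y) is the corresponding counterfactual pair."""
    T = T_of_z[z]   # Eq. (6): T = sum over z of D_z * T(z)
    Y = Y_of_t[T]   # Eq. (7): Y = sum over t of D_t * Y(t)
    return T, Y

print(observed("z1"))  # ('t1', 5.0)
print(observed("z2"))  # ('t2', 8.0)
```

In any cross-section, only one realization of z per agent is available, which is exactly why the remaining counterfactuals stay unobserved.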
Fixing is a causal operation that captures the notion of external (ceteris paribus) manipulation. It is a central concept in the study of causality and dates back to Haavelmo (1943). See Heckman and Pinto (2015) for a recent discussion of fixing and causality.

The independence condition (4) generates two useful relations regarding counterfactuals:

(8) Exogeneity: Z ⟂⟂ (Y(t), T(z)) for all (t, z) ∈ supp(T) × supp(Z);
(9) Matching: Y(t) ⟂⟂ T | V for all t ∈ supp(T).

The exogeneity condition (8) is commonly used to describe IV models. It states that the instrument Z is independent of the counterfactuals. The matching property (9) states that controlling for the confounder V renders the outcome counterfactuals Y(t) statistically independent of the treatment choice T.

Causal Inference
Causal analysis seeks to make inferences about counterfactual outcomes Y(t). The causal effect of switching the treatment from t to t′ for agent ω is given by Y_ω(t′) − Y_ω(t). A fundamental problem in causal inference is that, in any cross-section, we only observe a single outcome for each agent ω. Causal inference copes with this problem by focusing on the evaluation of average causal effects, specifically, the causal effect over a sub-population Ω′ ⊆ Ω of the agents:

(10) E(Y(t′) − Y(t) | ω ∈ Ω′).

If Ω′ = Ω in (10), we obtain the average treatment effect of t′ versus t on the outcome Y.

Controlling for Unobservables
The identification of causal effects hinges on our ability to control for the confounder V. By conditioning on V, we are able to relate the counterfactual outcome E(Y(t) | V) to observed data:

(11) E(Y(t) | V) = E(Y(t) | T = t, V) = E(Y | T = t, V),

where the first equality is due to matching property (9) and the second equality is due to (7). If V were observed, we would be able to identify the counterfactual expectation E(Y(t) | T = t, V) by the conditional expectation E(Y | T = t, V). In addition, if V were observed, we would be able to identify its probability distribution. The counterfactual mean E(Y(t)) could be evaluated by integrating the conditional expectation E(Y | T = t, V) over the unconditional distribution of V:

(12) E(Y(t)) = ∫ E(Y(t) | V = v) dF_V(v) = ∫ E(Y | T = t, V = v) dF_V(v),

where the second equality is due to (9), and dF_V(v) denotes the probability density of the confounder V at point v.

The Identification Problem
Unfortunately, when V is not observed, the conditional expectation of the outcome E(Y | T = t) does not identify the counterfactual mean E(Y(t)):

(13) E(Y | T = t) = ∫ E(Y | T = t, V = v) dF_{V | T = t}(v) ≠ ∫ E(Y | T = t, V = v) dF_V(v) = E(Y(t)).

This mismatch prevents the identification of causal effects and can promote misleading conclusions. For instance, the difference-in-means estimator for the binary treatment T ∈ {0, 1} evaluates the following parameter:

(14) E(Y | T = 1) − E(Y | T = 0).

An identification problem arises because agent self-selection induces a correlation between choice T and the unobserved variables in V. Large values of the difference in means in (14) could arise from the difference between the distributions of V conditioned on the treatment choices instead of the impact of the treatment on the outcome.
RCTs are supposed to solve the problem of selection bias by randomly assigning the treatments. The randomization secures statistical independence between the treatment T and the unobserved characteristics of the agents, namely, the confounder V. The independence relationship V ⟂⟂ T implies that the distribution of V conditional on T is equal to the unconditional distribution of V, and therefore, the outcome difference-in-means identifies the average treatment effect.
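A small simulation illustrates both points: under self-selection, the difference in means (14) mixes the treatment effect with the shift in the distribution of V, while randomization recovers the average treatment effect. All numbers below are hypothetical:

```python
import random

random.seed(42)

# Hypothetical population: V is the unobserved confounder; the true
# treatment effect Y(1) - Y(0) equals 1 for every agent.
agents = []
for _ in range(100_000):
    v = random.random()
    agents.append((v, v, v + 1.0))   # (V, Y(0), Y(1))

def diff_in_means(data, treated_flags):
    treated = [y1 for (v, y0, y1), d in zip(data, treated_flags) if d]
    control = [y0 for (v, y0, y1), d in zip(data, treated_flags) if not d]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Self-selection: agents with high V take the treatment, so V and T correlate.
self_selected = [v > 0.5 for (v, y0, y1) in agents]
# Randomization: treatment assignment is independent of V.
randomized = [random.random() < 0.5 for _ in agents]

print(diff_in_means(agents, self_selected))  # close to 1.5: inflated by selection on V
print(diff_in_means(agents, randomized))     # close to 1.0: the true ATE
```

Here the biased estimate exceeds the true effect because treated agents have systematically larger V, not because the treatment is more effective.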
Noncompliance in RCTs potentially compromises the independence relationship between agents' unobserved variables V and their final treatment assignment T. Effectively, noncompliance transforms the intended RCT experiment into an IV model where the randomization arms determine the instrumental variable.

Using IV to Control for Unobserved Variables
Identification strategies in IV models use instruments Z to control for the unobserved confounder V (Heckman and Pinto 2015). One approach assumes parametric models that impose functional restrictions on the choice Eq. (2) and the outcome Eq. (3). An example of this approach is Two-Stage Least Squares (Theil 1958, 1971). Heckman and Pinto (2018) propose a nonparametric approach that explores the choice behavior induced by the instrument Z. They use counterfactual choices to determine a partition of supp(V) that renders T statistically independent of the counterfactual outcomes Y(t). This independence property enables them to characterize the observed data as a mixture of unobserved counterfactuals over the partition sets of supp(V). We use this characterization to determine the necessary and sufficient conditions to point-identify counterfactual outcomes. Additional notation is necessary to introduce their results.

The Response Vector
We control for the unobservables V using a partition of supp(V) generated by the choice variation induced by the instrument. A central concept in our analysis is the response vector. This is the N_Z-dimensional random vector of counterfactual choices T(z) across all the instrumental values z_1, …, z_{N_Z}:

(15) S = [T(z_1), …, T(z_{N_Z})]′.

The support of the response vector is given by supp(S) = {s_1, …, s_{N_S}}, and each element s ∈ supp(S) is called a response-type. The response vector for an agent ω is given by S_ω = [T_ω(z_1), …, T_ω(z_{N_Z})]′. It lists the treatment choices that agent ω would take if it were to face each instrumental value. Response vector S has been used by several authors in distinct fields, starting with Robins and Greenland (1992) and Balke and Pearl (1993), who studied bounds for causal effects in the binary choice model. Angrist et al. (1996) use response-types to study the identification of a binary choice model.
Response vectors are called "principal strata" by Frangakis and Rubin (2002) and can be understood as the control functions of Heckman and Robb (1985) and Powell (1994). Our approach differs from these interpretations. We use the response vector S as a criterion to control for the unobserved confounding variable V.
Equation (16) expresses the response vector S as a function of V, while Eq. (17) expresses choice T as a function of the response vector S and the instrument Z:

(16) S = h(V), for some function h: supp(V) → supp(S),
(17) T = Σ_{i=1}^{N_Z} D_{z_i} · S_i, where S_i denotes the ith coordinate of S.

Figure 1 displays these causal relationships graphically as directed acyclic graphs (DAGs).
The response-types can be viewed as "types" in the sense of Keane and Wolpin (1997).

Equation (18) lists three useful properties of the response vector S:

(18) (i) Z ⟂⟂ S; (ii) Y(t) ⟂⟂ T | S, for all t ∈ supp(T); (iii) T is deterministic conditional on (Z, S).

Property (i) states that the response vector is independent of the IV. This independence relationship stems from V ⟂⟂ Z in (4) and from the fact that S is a function of V. Property (ii) states a matching condition in which S plays the role of a balancing score for V. The relationship stems from (Y(t), V) ⟂⟂ Z and from the fact that S is a function of V, while T is a function of Z and V. Indeed, conditioned on S, T depends only on Z, which is independent of Y(t). The last property (iii) is due to the fact that T is deterministic given Z and S.
The properties of the response vector in (18) enable us to describe a coarse partition of supp(V) that renders the treatment statistically independent of counterfactual outcomes. According to (16), each v ∈ supp(V) corresponds to one and only one response-type s ∈ supp(S) such that h(v) = s. Thus, for each response-type s_n ∈ supp(S), we can define a subset V_n ⊂ supp(V) as:

(19) V_n = {v ∈ supp(V) : h(v) = s_n},

and their union spans the full set; that is,

(20) ⋃_{n=1}^{N_S} V_n = supp(V), with V_n ∩ V_{n′} = ∅ for n ≠ n′.

Fig. 1 IV Models with and without the Response Vector S. These two diagrams depict equivalent IV models as DAGs. Arrows represent direct causal relations. Circles represent unobserved variables. Squares represent observed variables. The error term ε_Y is kept implicit. The left-hand side diagram shows the standard IV model without the response vector S, while the right-hand side diagram includes the response vector S.
Note that the events S = s_n and V ∈ V_n are equivalent. The matching property (ii) in (18) states that Y(t) ⟂⟂ T | (S = s_n), so

(21) E(Y(t) | S = s_n) = E(Y | T = t, S = s_n).

Equations (19)-(21) imply that the treatment T can be understood as being randomly assigned when we condition on the subset of agents that share the same response-type s. If response-types were observed, we could use (ii) in (18) to identify the expected value of counterfactual outcomes by taking the expected values of the observed outcome conditioned on the treatment choice and the response-types.
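The conditioning argument behind (21) can be checked by simulation: within a response-type, the treatment depends only on the randomized instrument, so conditioning on the type recovers the counterfactual mean. A sketch with two hypothetical response-types over two arms:

```python
import random

random.seed(1)

# Two hypothetical response-types over instruments {z1, z2} with T in {0, 1}:
# a "never-taker" chooses 0 in both arms; a "complier" follows the arm.
TYPES = {"never": (0, 0), "complier": (0, 1)}

population = []
for _ in range(200_000):
    s = random.choice(["never", "complier"])
    z = random.choice([0, 1])                  # randomized instrument
    t = TYPES[s][z]                            # T is deterministic given (Z, S)
    y1 = (2.0 if s == "complier" else 3.0) + random.gauss(0.0, 1.0)  # Y(1)
    population.append((s, t, y1))

# Matching property (ii): Y(1) independent of T given S, so among compliers
# the observed treated mean matches E(Y(1) | S = complier).
treated = [y1 for s, t, y1 in population if s == "complier" and t == 1]
all_compliers = [y1 for s, t, y1 in population if s == "complier"]
treated_mean = sum(treated) / len(treated)
type_mean = sum(all_compliers) / len(all_compliers)

print(abs(treated_mean - type_mean) < 0.05)  # True: the two means agree
```

The catch, discussed next, is that the conditioning variable S is itself unobserved.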
A significant challenge is that the response-types that determine the partition of the support of V are not observed. Nevertheless, the partition substantially simplifies the identification problem. It reframes the identification of counterfactuals as a problem of identifying a finite mixture of unobserved distributions.

Identification as a Mixture Problem
We gain a deeper understanding by reframing the identification problem as a particular case of the identification of unobserved mixture distributions (B. L. S. P. Rao 1992). The general mixture model is given by:

(22) F(Y) = ∫_{θ ∈ Θ} F_θ(Y) dG(θ),

where F(Y) stands for the cumulative distribution function (cdf) of an observed outcome Y, (F_θ(Y))_{θ ∈ Θ} is a collection of cdf's indexed by a random variable θ that takes values in the (possibly infinite) set Θ, and G denotes the cdf of θ. F(Y) is a mixture distribution, the cdf's (F_θ(Y))_{θ ∈ Θ} are component distributions, G is the mixing distribution, and θ is the unobserved latent (or mixing) variable. B. L. S. P. Rao (1992) notes that if the mixing distribution G is finite, then a necessary and sufficient condition for its identification is that the family of cdf's (F_θ(Y))_{θ ∈ Θ} be linearly independent as functions of Y. We use the mixture model (22) as a starting point.
As mentioned, the identification of causal parameters hinges on controlling for unobserved variables V. A natural candidate for the values of θ in (22) are the elements v ∈ supp(V). We replace the cdf's in (22) by the expectation of φ(Y), where φ: supp(Y) → ℝ is an arbitrary real-valued function:

(23) E(φ(Y)) = ∫_{v ∈ supp(V)} E(φ(Y) | V = v) dF_V(v),
(24) E(φ(Y)) = Σ_{n=1}^{N_S} E(φ(Y) | V ∈ V_n) P(V ∈ V_n).

Equation (23) describes the expected outcome using the mixture model in (22), where θ stands for the elements v ∈ supp(V). Equation (24) uses the partition of supp(V) in (19) to generate a discrete mixing distribution across the partition sets of the support of V. Condition (21) in Section Using IV to Control for Unobserved Variables enables us to express the conditional expectation E(φ(Y) | T = t) in terms of the conditional counterfactuals E(φ(Y(t)) | V):

(25) E(φ(Y) | T = t) = Σ_{n=1}^{N_S} E(φ(Y(t)) | S = s_n) P(S = s_n | T = t).

Equation (25) relates a single conditional outcome expectation with several outcome counterfactuals for each choice value t ∈ supp(T). The equation alone does not provide sufficient information from observed data to secure the identification of the counterfactual outcomes. The instrumental variable Z generates additional variation in observed quantities (left-hand side of (25)) without increasing the number of unobserved counterfactuals (right-hand side of (25)):

(26) E(φ(Y) | T = t, Z = z) = Σ_{n=1}^{N_S} E(φ(Y(t)) | S = s_n, T = t, Z = z) P(S = s_n | T = t, Z = z)
(27) = Σ_{n=1}^{N_S} E(φ(Y(t)) | S = s_n) P(S = s_n | T = t, Z = z)
(28) = Σ_{n=1}^{N_S} E(φ(Y(t)) | S = s_n) P(T = t | S = s_n, Z = z) P(S = s_n | Z = z) / P(T = t | Z = z)
(29) = Σ_{n=1}^{N_S} E(φ(Y(t)) | S = s_n) P(T = t | S = s_n, Z = z) P(S = s_n) / P(T = t | Z = z).

Equation (26) rewrites (25) conditioning on instrument Z. Equation (27) uses the fact that Z ⟂⟂ S and that V ∈ V_n and S = s_n are equivalent events. Equation (28) uses Bayes' rule to rewrite the conditional probability P(S = s_n | T = t, Z = z). Equation (29) employs Z ⟂⟂ S again and invokes the fact that T is deterministic when conditioned on S and Z, so that P(T = t | S = s_n, Z = z) ∈ {0, 1}. The response vector S enables us to connect observed data with a mixture of counterfactual outcomes conditioned on response-types. This produces our main equation:

(30) E(φ(Y) | T = t, Z = z) · P(T = t | Z = z) = Σ_{n=1}^{N_S} E(φ(Y(t)) | S = s_n) · P(T = t | S = s_n, Z = z) · P(S = s_n).

If φ(Y) = Y, (30) generates an equality relating the expected values of observed outcomes with expected counterfactual outcomes. Setting φ(Y) = 𝟙[Y ≤ y] relates the cdf of the observed outcome with the unobserved cdf of counterfactual outcomes.
Setting φ(Y) to 1 in (30) generates the propensity score equality:

(31) P(T = t | Z = z) = Σ_{n=1}^{N_S} P(T = t | S = s_n, Z = z) · P(S = s_n).

Replacing φ(Y) by any variable X such that X ⟂⟂ T | S generates an equation that relates baseline variables with response-types:

(32) E(X | T = t, Z = z) · P(T = t | Z = z) = Σ_{n=1}^{N_S} E(X | S = s_n) · P(T = t | S = s_n, Z = z) · P(S = s_n).

Identification Criteria
We now investigate the necessary and sufficient conditions for identifying counterfactual outcomes and response-type probabilities. To do so, we express our main Eq. (30) as a system of linear equations.
Observed parameters are stacked in vectors P_Z(t) and Q_Z(t) below:

(33) P_Z(t) = [P(T = t | Z = z_1), …, P(T = t | Z = z_{N_Z})]′,
(34) Q_Z(t) = [E(φ(Y) | T = t, Z = z_1), …, E(φ(Y) | T = t, Z = z_{N_Z})]′,

where P_Z(t) is the vector of observed propensity scores, and Q_Z(t) is the vector of outcome expectations. The unobserved parameters are stacked in the vectors P_S and Q_S(t) below:

(35) P_S = [P(S = s_1), …, P(S = s_{N_S})]′ and Q_S(t) = [E(φ(Y(t)) | S = s_1), …, E(φ(Y(t)) | S = s_{N_S})]′,

where P_S is the vector of response-type probabilities, and Q_S(t) is the vector of counterfactual outcomes conditioned on response-types. Response matrix R stacks the response-types in supp(S) as columns:

R = [s_1, s_2, …, s_{N_S}].

Matrix R has dimension N_Z × N_S. The entry in the ith row and nth column of R, denoted by R[i, n], is the treatment value T that obtains when Z = z_i and S = s_n, i ∈ {1, …, N_Z}, n ∈ {1, …, N_S}. We use R[i, ⋅] to denote the ith row of R and R[⋅, n] to denote the nth column of R. IV relevance condition (5) prevents identical rows in R.
We use B_t = 𝟙[R = t] to denote a binary matrix of the same dimension as R that takes value 1 if the respective element in R is equal to t and zero otherwise. An entry of B_t is given by B_t[i, n] = 𝟙[R[i, n] = t]. Let B_T = [B_{t_1}′, …, B_{t_{N_T}}′]′ be the (N_Z ⋅ N_T) × N_S binary matrix that stacks the matrices B_t across the treatment values, and let P_Z be the (N_Z ⋅ N_T) × 1 vector that stacks the propensity scores P_Z(t) across the treatment values: P_Z = [P_Z(t_1)′, …, P_Z(t_{N_T})′]′. In this notation, Eqs. (30) and (31) can be written in matrix form by the following equations:

(36) P_Z = B_T P_S,
(37) Q_Z(t) ⊙ P_Z(t) = B_t (Q_S(t) ⊙ P_S), t ∈ supp(T),

where ⊙ denotes the Hadamard (element-wise) multiplication.
The response matrix R and the binary matrices B_t, t ∈ supp(T), are deterministic, as T is known given Z and S. If B_t and B_T were invertible, Q_S(t) and P_S would be identified. However, such inverses do not always exist. In their place, we can use generalized inverses. Let B_T⁺ and B_t⁺ be the Moore-Penrose pseudo-inverses of B_T and B_t, t ∈ supp(T). Under this notation, we can state the following result:

Theorem T-1 In the IV model (4)-(5), the unobserved parameters P_S and Q_S(t) solve the linear systems (36)-(37), whose solution sets are characterized by:

(38) P_S ∈ {B_T⁺ P_Z + K_T λ ; λ ∈ ℝ^{N_S}} and Q_S(t) ⊙ P_S ∈ {B_t⁺ (Q_Z(t) ⊙ P_Z(t)) + K_t λ ; λ ∈ ℝ^{N_S}},

where K_T = I_{N_S} − B_T⁺ B_T and K_t = I_{N_S} − B_t⁺ B_t.
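Identification checks of this kind reduce to linear algebra on the binary matrices B_t. A NumPy sketch (the response matrix below is a hypothetical example we introduce for illustration):

```python
import numpy as np

# Hypothetical response matrix R: rows are instrument values z_1, z_2;
# columns are response-types s_1, s_2, s_3; entries are treatment values.
R = np.array([
    [0, 0, 1],
    [0, 1, 1],
])

def projection_matrix(R, t):
    """Compute K_t = I - B_t^+ B_t with B_t = 1[R = t]. A linear combination
    xi' of the unobserved parameters is identified exactly when xi' K_t = 0."""
    B_t = (R == t).astype(float)
    return np.eye(R.shape[1]) - np.linalg.pinv(B_t) @ B_t

K_1 = projection_matrix(R, t=1)

# N_S = 3 exceeds N_Z = 2, so B_1 cannot have full column-rank and K_1 != 0:
print(np.allclose(K_1, 0))       # False: not every parameter is identified

# Some combinations are still identified, e.g. the one picking out type s_3,
# whose column of B_1 lies in the row space of B_1:
xi = np.array([0.0, 0.0, 1.0])
print(np.allclose(xi @ K_1, 0))  # True: this combination is identified
```

The full-column-rank case corresponds to K_t being the zero matrix, in which case every coordinate is identified.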

Matrices K_T and K_t are orthogonal projection matrices that depend only on matrices B_T and B_t, t ∈ supp(T). Theorem T-1 provides the general conditions for the identification of response probabilities and counterfactual means:

Corollary C-1 In the IV model (4)-(5), if there exists a real-valued vector ξ of dimension N_S such that ξ′K_t = 0, then ξ′(Q_S(t) ⊙ P_S) is identified by ξ′B_t⁺(Q_Z(t) ⊙ P_Z(t)). Analogously, if ξ′K_T = 0, then ξ′P_S is identified by ξ′B_T⁺P_Z.
Proof See Heckman and Pinto (2018) or Appendix A.2. ◻

Corollary C-1 shows that the nonparametric identification of counterfactuals depends only on properties of the response matrix R. If B_T had full column-rank, then B_T⁺B_T = I_{N_S} and K_T = 0. In this case, each response-type probability is identified. Indeed, ξ′P_S is identified for any real vector ξ of dimension N_S, including those that indicate each of the response-type probabilities. Binary matrix B_T contains each B_t, t ∈ supp(T). Thus, the conditions for identifying response-type probabilities are weaker than those for identifying counterfactual outcomes. In particular, a full-rank B_T does not imply that the matrices B_t, t ∈ supp(T), are full-rank. Therefore, the identification of the response-type probabilities does not automatically identify the corresponding mean counterfactual outcomes. Corollary C-2 formalizes this discussion.

Corollary C-2 The following relationships hold for the IV model (4)-(5):
where ι is an N_S-dimensional vector of ones.
Proof See Heckman and Pinto (2018) or Appendix A.3. ◻

Versions of Corollary C-2 are found in the literature on the identifiability of finite mixtures (see, e.g., Yakowitz and Spragins 1968 and B. L. S. P. Rao 1992). Given binary matrices B_T and B_t, t ∈ supp(T), the problem of identifying P_S and Q_S(t) is equivalent to the problem of identifying finite mixtures of distributions where B_T and B_t play the roles of kernels of mixtures. Mixture components are the corresponding counterfactual outcomes conditional on the response-types, and mixture probabilities are the response-type probabilities.

Understanding the Identification Challenge
The identification criteria of Corollary C-1 show that the identification of causal parameters depends solely on the properties of the response matrix R. In particular, the identification of the counterfactual outcomes in Q_S(t) depends on the column-rank of the binary matrix B_t. If the column-rank of B_t is N_S (full column-rank), then B_t⁺B_t = I_{N_S} and K_t = 0. In this case, ξ′Q_S(t) and ξ′P_S are identified for any real vector ξ of dimension N_S, including all unit vectors with a value of 1 in the nth entry and 0 elsewhere. In summary, all counterfactual outcomes in Q_S(t) would be identified.
These identification criteria pose a major identification problem. The column rank of any binary matrix B_t is less than or equal to its row dimension N_Z. On the other hand, the dimension of Q_S(t) is the number of response-types N_S, which usually far exceeds the number of IV-values N_Z. For instance, under no restrictions, the total number of potential response-types is N_T^{N_Z}. Thus, a requirement for generating any identification result on counterfactual outcomes is to reduce the number of response-types that the choice model admits.
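The count N_T^{N_Z} follows from enumerating all maps from instrument values to treatments. A quick sketch for a 4-arm, 3-treatment design:

```python
from itertools import product

def unrestricted_response_types(treatments, instruments):
    """Each response-type is one map z -> T(z), i.e. one tuple
    (T(z_1), ..., T(z_{N_Z})); without restrictions there are N_T ** N_Z."""
    return list(product(treatments, repeat=len(instruments)))

types = unrestricted_response_types(["t1", "t2", "t3"], ["z1", "z2", "z3", "z4"])
print(len(types))  # 81 = 3 ** 4, far more than the 4 available IV values
```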
A common approach to decreasing the number of response-types is to impose functional restrictions on the choice equation. Heckman and Pinto (2018) and Pinto (2021a) adopt a different approach that relies on economic choice theory. They combine choice incentives with revealed preference analysis to generate choice restrictions that systematically eliminate potential response-types.

Using Rao's Orthogonal Design to Address Identification Problems Arising from Noncompliance in Social Experiments
We propose a novel application of Rao's orthogonal design (C. R. Rao 1946a, b, 1947, 1949). Rao's methodology is traditionally applied to investigate the effects of combinations of treatment factors. The method determines randomization groups exposed to an orthogonal arrangement of treatment factors.
Similar to Rao's work, ours uses an RCT setting. Our method differs from Rao's original methodology in two ways: (1) we consider the possibility of noncompliance; and (2) the orthogonal array design is not used to combine treatment factors but to determine choice incentives across a finite number of treatment alternatives.
We use revealed preference analysis to translate choice incentives into choice restrictions that eliminate response-types. This elimination process generates the response matrix R , which contains all the necessary information to examine the nonparametric identification of causal parameters.

Examining Choice Incentives Determined by an Orthogonal Array
Noncompliance in social experiments effectively transforms the original RCT into an IV model where each instrumental value represents a randomization arm. Noncompliance implicitly adds a choice stage, which we model explicitly in the IV framework. The experimenter cannot impose a treatment status upon participants but rather incentivizes them toward a treatment choice. In this setup, orthogonal arrays play the role of the incentive matrix of Pinto (2021a). Each factor stands for a treatment choice and each run stands for a randomization arm that incentivizes one or several treatment alternatives.
We illustrate the method using the orthogonal array OA(4, 3, 2, 2) discussed in Section Introduction. This design can be understood as an RCT with four randomization arms Z ∈ {z_1, z_2, z_3, z_4} and three treatment statuses T ∈ {t_1, t_2, t_3}, where z_1 denotes the control group that offers no incentive toward any choice, z_2 incentivizes participants toward choices t_1 and t_2, z_3 incentivizes them toward choices t_1 and t_3, and z_4 incentivizes them toward choices t_2 and t_3. This incentive pattern is described by an ordinal incentive matrix:

               t_1  t_2  t_3
          z_1 [ 0    0    0 ]
(45)  L = z_2 [ 1    1    0 ]
          z_3 [ 1    0    1 ]
          z_4 [ 0    1    1 ]

Each column displays which choices are incentivized across all values of the instruments. The incentive matrix L in (45) is an orthogonal array of type OA(4, 3, 2, 2). The factors refer to treatment choices; the runs, to instrumental values.

Choice Restrictions
Classical revealed preference analysis can be used to translate choice incentives into choice restrictions. Pinto (2021a) shows that the Weak Axiom of Revealed Preference (WARP) and Normal Choice generate the choice rule described below:

(46) T_ω(z) = t and L[z′, t] − L[z, t] ≥ L[z′, t′] − L[z, t′] ⟹ T_ω(z′) ≠ t′, for all t′ ≠ t.

Choice rule (46) is intuitive. It states that if an agent chooses choice t under z, and the change from z to z′ induces greater incentives toward t than toward t′, then the same agent does not choose t′ under z′.
Choice rules like (46) restrict R . They enable analysts to translate any incentive matrix into a set of choice restrictions and generate a response matrix. A simple algorithm efficiently implements the task of moving from an incentive matrix to a response matrix. We now clarify this process.
Consider an agent who would choose t1 if assigned to z1; that is, T(z1) = t1. We seek to examine whether this agent would switch to t2 or t3 if assigned to z2, z3, or z4.
The first row of Table 1 compares the incentive gains for choosing t1 and t2 when the instrument changes from z1 to z2. The incentives to choose t1 and t2 both increase, which satisfies the incentive requirement of choice rule (46). Therefore, an agent who chooses t1 under z1 does not choose t2 under z2. This choice restriction is summarized as T(z1) = t1 ⇒ T(z2) ≠ t2. The second row compares the incentives to choose t3 for the same instrumental change (z1 to z2). The incentive to choose t1 increases, while the incentive to choose t3 does not. Choice rule (46) applies, and the agent does not switch to t3; that is, T(z1) = t1 ⇒ T(z2) ≠ t3. The third and fourth rows of Table 1 compare the incentives for choosing t1 versus t2 (third row) and t1 versus t3 (fourth row) when the instrument changes from z1 to z3. The incentive to choose t1 increases, while the incentives to choose t2 or t3 do not. Choice rule (46) holds, and the agent chooses neither t2 nor t3; namely, T(z1) = t1 ⇒ T(z3) ≠ t2 and T(z3) ≠ t3. The last two rows investigate the instrumental change from z1 to z4. The incentives to choose t2 or t3 increase, while the incentive to choose t1 does not. The incentive requirement of choice rule (46) is not satisfied, and therefore no choice restriction is generated.
Table 2 presents all the choice restrictions generated by applying choice rule (46) to each combination of treatment pairs (t, t′) ∈ {t1, t2, t3}² and each pair of instrumental values (z, z′) ∈ {z1, z2, z3, z4}².
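The algorithm that moves from an incentive matrix to choice restrictions can be sketched in a few lines. The 0/1 encoding of the incentive matrix and the weak-inequality reading of "greater incentives" in rule (46) are assumptions of this sketch; with them, the code reproduces the restrictions discussed above:

```python
import numpy as np

# Incentive matrix L of (45): rows z1..z4, columns t1..t3 (assumed layout).
L = np.array([[0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 1]])

def choice_restrictions(L):
    """Apply choice rule (46): if the agent chooses t under z and the move
    from z to z' raises the incentive for t at least as much as for t',
    then the agent does not choose t' under z'.
    Returns 0-indexed tuples (z, t, zp, tp) meaning T(z)=t => T(zp) != tp."""
    n_z, n_t = L.shape
    restrictions = []
    for z in range(n_z):
        for zp in range(n_z):
            if z == zp:
                continue
            gain = L[zp] - L[z]  # incentive change for each choice
            for t in range(n_t):
                for tp in range(n_t):
                    if t != tp and gain[t] >= gain[tp]:
                        restrictions.append((z, t, zp, tp))
    return restrictions

R = choice_restrictions(L)
# (0, 0, 1, 1) encodes: T(z1) = t1  =>  T(z2) != t2, as in Table 1.
print((0, 0, 1, 1) in R)  # True
```

Note that the change from z1 to z4 generates no restriction for an agent choosing t1, because the incentive gain for t1 (zero) is smaller than the gains for t2 and t3.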

Generating the Response Matrix
The choice restrictions of Table 2 can be used to determine the set of admissible response-types that the response-vector S = [T(z1), T(z2), T(z3), T(z4)]′ can take. The first panel of Table 2 examines the case where T(z) = t1 for z ∈ {z1, z2, z3, z4}. The first restriction states that if T(z1) = t1, then T(z2) = T(z3) = t1. Given T(z1) = t1, there are only three possible response-types that comply with this choice restriction: s1 = [t1, t1, t1, t1]′, s2 = [t1, t1, t1, t2]′, and s3 = [t1, t1, t1, t3]′. The second and third choice restrictions of Table 2 are subsumed by the first. The fourth choice restriction implies that the only admissible response-type for which T(z4) = t1 is s1 = [t1, t1, t1, t1]′.
The second panel of Table 2 examines the case where T(z) = t2 for z ∈ {z1, z2, z3, z4}; the third panel examines the case where T(z) = t3. We apply the elimination analysis of the first panel to the second and third panels. Under no choice restrictions, each of the four counterfactual choices (T(z1), T(z2), T(z3), T(z4)) can take any of the three treatment values (t1, t2, or t3), so the total number of potential response-types is 3⁴ = 81. The choice restrictions in Table 2 eliminate 72 of the 81 possible response-types. The nine response-types that survive this elimination process and comply with all 12 choice restrictions of Table 2 are displayed in the response matrix below:

(47)
        s1  s2  s3  s4  s5  s6  s7  s8  s9
z1  [   t1  t1  t1  t2  t2  t2  t3  t3  t3  ]
z2  [   t1  t1  t1  t2  t2  t2  t1  t2  t3  ]
z3  [   t1  t1  t1  t1  t2  t3  t3  t3  t3  ]
z4  [   t1  t2  t3  t2  t2  t2  t3  t3  t3  ]
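The elimination count can be checked mechanically by enumerating all 81 candidate response-types and discarding those that violate some instance of rule (46). As before, the 0/1 incentive encoding and the weak-inequality reading of the rule are assumptions of this sketch:

```python
import itertools

import numpy as np

# Incentive matrix L of (45): rows z1..z4, columns t1..t3 (assumed layout).
L = np.array([[0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 1]])
n_z, n_t = L.shape

def admissible(s):
    """A response-type s = (T(z1), ..., T(z4)), 0-indexed, is admissible
    when it violates no instance of choice rule (46)."""
    for z in range(n_z):
        for zp in range(n_z):
            if z == zp:
                continue
            gain = L[zp] - L[z]
            t, tp = s[z], s[zp]
            # Rule (46) forbids T(zp) = tp when the gain for t dominates.
            if t != tp and gain[t] >= gain[tp]:
                return False
    return True

types = [s for s in itertools.product(range(n_t), repeat=n_z) if admissible(s)]
print(len(types))  # 9 of the 3**4 = 81 candidates survive
```

The survivors include (0, 0, 0, 0), i.e. s1 = [t1, t1, t1, t1]′, and its two z4-switching variants s2 and s3, matching the first-panel analysis above.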

Identification and Estimation
Theorem T-2 uses the identification criteria in C-1 to recover all causal parameters that are identified.
Theorem T-2 The response matrix (47) enables the identification of the following causal parameters:
1. All response-type probabilities P(S = sj), j = 1, …, 9.
2. The expectation (and distribution) of the following counterfactual outcomes:

[Table: identified counterfactual outcomes, by response-types and treatment choices]

Proof See Appendix A.5. ◻

The response matrix (47) enables the researcher to use well-known econometric methods to evaluate causal effects. For instance, the first row (z1) and the last row (z4) of the response matrix (47) differ for two response-types: s2 and s3 take the value t1 under z1 and the values t2 and t3 under z4, respectively. It is easy to show that the 2SLS estimator that uses the t1-indicator D_t1 = 1[T = t1] as the treatment and employs only the IV-values z1 and z4 evaluates the causal effect of choosing t1 versus not choosing t1 for response-types s2 and s3, where the relevant counterfactual is the outcome of not choosing t1. The same reasoning applies to the 2SLS estimator that uses the indicator D_t3 as the treatment and employs data from z1 and z2 only: it evaluates the causal effect of choosing t3 versus not choosing t3 for response-types s7 and s8.

The benefits of the orthogonal design become clear when we compare it with a traditional design. Table 3 presents all the choice restrictions generated by applying choice rule (46) to each combination of treatment pairs (t, t′) ∈ {t1, t2, t3}² and each pair of instrumental values (z, z′) ∈ {z1, z2, z3, z4}² of incentive matrix (50). The choice restrictions of Table 3 eliminate 69 of the 81 possible response-types. The 12 admissible response-types that comply with all the choice restrictions in Table 3 are presented in response matrix (51). Table 4 presents all the response-type probabilities and counterfactual outcomes that are identified by response matrix (51). Response matrix (51) does not generate a single point-identified response-type probability, nor any point-identified counterfactual outcome. By choosing an orthogonal design for the incentive matrix, we secure the identification of causal parameters. Using a traditional design, we do not.
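The 2SLS logic for the arms z1 and z4 reduces to a Wald ratio with a binary instrument, which can be illustrated on simulated data. The response-type shares, the outcome equation, and the effect size below are illustrative assumptions of this sketch, not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulate response-types s1, s2, s3 with assumed shares. s1 always picks
# t1; s2 switches to t2 under z4; s3 switches to t3 under z4.
shares = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
s = rng.choice(list(shares), p=list(shares.values()), size=n)
z = rng.choice(["z1", "z4"], size=n)  # use only the two relevant arms

choice = np.where((s == "s1") | (z == "z1"), "t1",
                  np.where(s == "s2", "t2", "t3"))
d = (choice == "t1").astype(float)  # treatment indicator D_t1 = 1[T = t1]

# Assumed outcome equation: the effect of choosing t1 versus not is 2.0
# (for the switchers s2 and s3, the response-types the estimand targets).
y = 1.0 + 2.0 * d + rng.normal(size=n)

# Wald / 2SLS estimate with binary instrument 1[Z = z4]:
zi = (z == "z4").astype(float)
wald = (y[zi == 1].mean() - y[zi == 0].mean()) / (
    d[zi == 1].mean() - d[zi == 0].mean())
print(round(wald, 1))  # close to 2.0
```

Under z1 everyone chooses t1, while under z4 only s1 does, so the first stage is driven entirely by the switchers s2 and s3, and the Wald ratio recovers their causal effect.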
Appendix B applies our analysis to the study of Latin squares. We refer to Pinto and Navjeevan (2022) for further discussion of how economic incentives shape choice restrictions in the IV model with multiple choices and heterogeneous agents.

Conclusion
This paper provides a novel application of Rao's fundamental work on the design of experiments using orthogonal arrays. Rao's seminal ideas are widely used to determine efficient arrangements of treatment factors in RCTs. His method is well suited for experiments where the analyst can reliably assign treatment factors to randomization units. Unfortunately, social scientists can seldom impose treatment statuses. Most social experiments are consequently plagued by noncompliance, which undermines the random assignment of treatment statuses.
We repurpose Rao's original ideas to address the common challenges that noncompliance generates. We use a novel framework whereby orthogonal arrays denote a pattern of choice incentives. We combine the IV framework of Heckman and Pinto (2018) with the recently developed econometric tools in Pinto (2021a, b) and Pinto and Navjeevan (2022) to translate choice incentives into choice restrictions. These restrictions determine the set of economically justifiable counterfactual choices, which, in turn, enable the identification of causal parameters. We then show the benefits of using orthogonal arrays (rather than traditional approaches) for identifying causal parameters.
Our method broadly applies to IV models with multiple treatments, categorical instruments, and heterogeneous agents. We establish a tight link between the problem of the unobserved mixture of distributions and the identification of counterfactuals. We explore the notion of a response matrix, which contains all the information necessary to examine the nonparametric identification of model counterfactuals. We apply mixture-model methods to these matrices to prove the identification of causal parameters.