Scale characteristics of variable returns-to-scale production technologies with ratio inputs and outputs

Applications of data envelopment analysis (DEA) often include inputs and outputs represented as percentages, ratios and averages, collectively referred to as ratio measures. It is known that conventional DEA models cannot correctly incorporate such measures. To address this gap, the authors have previously developed new variable and constant returns-to-scale models and computational procedures suitable for the treatment of ratio measures. The focus of this new paper is on the scale characteristics of the variable returns-to-scale production frontiers with ratio inputs and outputs. This includes the notions of the most productive scale size (MPSS), scale and overall efficiency as measures of divergence from MPSS. Additional development concerns alternative notions of returns to scale arising in models with ratio measures. To keep the exposition as general as possible and suitable in different contexts, we allow all scale characteristics to be evaluated with respect to any selected subsets of volume and ratio inputs and outputs, while keeping the remaining measures constant. Overall, this new paper aims at expanding the range of techniques available in applications with ratio measures.

units (DMUs) operate or describe the quality of inputs and outputs involved in the production process. For example, in the context of assessment of school performance, ratio measures may include the average income per capita in the catchment area of a school, proportion of pupils with special needs and the percentage of students achieving a certain high level in exams.
The two standard DEA models are based on the assumptions of constant and variable returns-to-scale (CRS and VRS). These models were introduced in DEA by Charnes et al. (1978) and Banker et al. (1984) and can be seen as continuing the earlier developments in the economics literature by Afriat (1972), Shephard (1974), Färe and Lovell (1978) and Färe et al. (1983a). It has long been realised that the standard CRS and VRS DEA models are generally not suitable if some data are given in the form of ratios; see, e.g., Dyson et al. (2001), Cooper et al. (2007), and Emrouznejad and Amin (2009). The main reason for this is that production technologies with ratio data do not generally satisfy the assumption of convexity, which is incorporated in the VRS and CRS technologies. In the case of CRS, a further problem arises because ratio measures are generally not scalable in the same way as conventional volume inputs and outputs such as costs, labor and physical levels of production or services. As also noted by Pastor et al. (2013), the input and output projections of inefficient DMUs obtained in the CRS model may result in target values located outside the range of values observed in the empirical data set. This becomes a particular problem in the case of ratio measures, such as percentages, which may have a natural upper bound such as unity or 100%.
In order to overcome the highlighted problems with the use of ratio data in DEA, Olesen et al. (2015) introduced new ratio-VRS (R-VRS) and ratio-CRS (R-CRS) production technologies that incorporate both volume and ratio inputs and outputs as native types of data, i.e., without any modification. Both technologies are formally derived from the explicitly stated sets of production axioms. In particular, to allow a different treatment of ratio measures compared to volume measures, Olesen et al. (2015) utilize the axiom of selective convexity of Podinovski (2005), instead of the conventional convexity assumption. In the case of R-CRS, the standard axiom of full proportionality (scalability) is replaced by the axiom of selective proportionality, which also distinguishes between the volume and ratio inputs and outputs.
The R-VRS and R-CRS models with ratio data are further explored by Olesen et al. (2017). The latter paper discusses efficiency concepts in models with ratio data, including the new notion of potential ratio efficiency, and computational approaches for their testing. In the most recent paper on this subject, Olesen et al. (2022) explore the geometric structure of the R-VRS technology and the R-CRS technology with fixed ratio inputs and outputs. In particular, they prove that the R-VRS technology is the union of a finite number of specially constructed standard VRS technologies.
In the current paper we consider scale characteristics of the production frontier of the R-VRS technology that have not yet been explored. This includes the notions of the most productive scale size (MPSS), scale and overall efficiency, and the related notions of returns to scale (RTS). All these notions have conceptually been defined and discussed in the literature; see, e.g., Frisch (1965) and Färe et al. (1983b, 1985). Methods of their evaluation in the standard VRS technology (see, e.g., Banker 1984; Banker and Thrall 1992; Førsund and Hjalmarsson 2004; Chambers and Färe 2008) and in the large classes of polyhedral and convex technologies (Podinovski et al. 2016; Podinovski 2017) have also been developed. However, these methods do not generally apply to the R-VRS technology (because it is not a convex technology), and even the known conceptual characteristics such as scale efficiency may not appear to be uniquely defined. In our paper we address these issues.
Several contributions of our paper are worth highlighting. First, following the approach of Banker (1984), we define and interpret scale efficiency in the R-VRS technology as a measure of divergence from MPSS. A challenging problem here is that the cone and CRS extensions of the R-VRS technology do not coincide. What appears to be a straightforward extension of the traditional approach to the evaluation of scale efficiency turns out to be a conceptual dilemma.
Second, to make the exposition as general as possible, we give the definition of MPSS and scale efficiency with respect to any selected subsets of volume and ratio inputs and outputs, while keeping the remaining measures constant. Conceptually, this is a generalization of the approach of Banker and Morey (1986) who consider technical and scale efficiency with respect to discretionary inputs and outputs only. However, because the R-VRS technology has a more complex structure than the standard VRS technology, the actual extension of the approach of Banker and Morey (1986) to the R-VRS technology is not straightforward.
In the additional development, we explore the notion of RTS. We differentiate between the cases in which the types of RTS are evaluated with respect to volume inputs and outputs only and the general case involving ratio measures. In the former case, we explore the local characterization of RTS with respect to the selected inputs and outputs. In the latter case, we show that the conventional local RTS characterization becomes trivial and uninformative. Instead, we employ the global RTS characterization of production frontiers developed by Podinovski (2004a, b) whose types are indicative of the direction to MPSS.
The methodology developed in our paper extends the well-known approaches to the evaluation of scale efficiency and RTS to the technologies with ratio inputs and outputs. Furthermore, because our approach allows for a selection of inputs and outputs with respect to which we measure scale characteristics, it now becomes possible to explore the relationship between the volume inputs and outputs, while keeping the socio-economic and quality characteristics of the production process (represented by percentages) constant. Alternatively, it is also possible to explore the relationship between the socio-economic factors and quality characteristics of the production process, for the given levels of volume inputs and outputs. We illustrate the usefulness of such an approach in an application to secondary schools in England.
We proceed as follows. In Sect. 2, we introduce basic definitions and notation. In Sect. 3, we briefly outline the R-VRS technology and give a clarifying illustrative example.
Section 4 contains the main theoretical results of our paper. We first define the partial cone extension of the R-VRS technology, which is subsequently used to define MPSS and overall and scale efficiency of DMUs with regard to arbitrary subsets of volume and ratio inputs and outputs. We also show that the cone and CRS extensions of the R-VRS technologies are generally different sets and discuss implications of this result.
In Sect. 5, we operationalize the models developed for the assessment of scale characteristics in the R-VRS technology. In Sect. 6, we show that several known technologies and methods of evaluation of scale characteristics are special cases of our newly developed approach. In Sect. 7, we consider local and global RTS characterizations in the R-VRS technology. In Sect. 8, we consider an application in the context of secondary education which illustrates the evaluation of scale characteristics in the R-VRS technology. Section 9 contains concluding remarks.
All mathematical proofs are given in Appendix A. An additional example clarifying the discussion in Sect. 4 is considered in Appendix B. The full data set used in the application is given in Appendix C. The GAMS code used for computations in the application is available online.

Preliminaries
Following the notation introduced by Olesen et al. (2015), let $T \subset \mathbb{R}^m_+ \times \mathbb{R}^s_+$ be a production technology with the sets $I = \{1, \dots, m\}$ and $O = \{1, \dots, s\}$ of nonnegative inputs and outputs, respectively. We denote by $I^V \subseteq I$ and $O^V \subseteq O$ the subsets of volume inputs and outputs. The complementary subsets $I^R = I \setminus I^V$ and $O^R = O \setminus O^V$ include ratio inputs and outputs. We assume that both sets $I = I^V \cup I^R$ and $O = O^V \cup O^R$ are not empty, although any of their subsets $I^V$, $I^R$, $O^V$ and $O^R$ may be empty.
DMUs are elements of technology $T$ and are stated in the form $(X, Y) = (X^V, X^R, Y^V, Y^R)$, where $X \in \mathbb{R}^m_+$ and $Y \in \mathbb{R}^s_+$ are the vectors of inputs and outputs, and their subvectors $X^V$, $X^R$, $Y^V$ and $Y^R$ correspond to the sets of volume and ratio measures $I^V$, $I^R$, $O^V$ and $O^R$, respectively.
Let $(X_j, Y_j)$ be observed DMUs, where $j \in J = \{1, \dots, n\}$. We assume that each observed DMU has at least one strictly positive input and at least one strictly positive output, i.e., $X_j \ne \mathbf{0}$ and $Y_j \ne \mathbf{0}$, for all $j \in J$. (We use bold symbols $\mathbf{0}$ and $\mathbf{1}$ to denote the vectors of zeros and ones whose dimensions are clear from the context.) Denote by $(X_o, Y_o)$ the particular DMU whose efficiency or scale efficiency is being considered. This may be any DMU from technology $T$, including one of the observed DMUs.
Often, ratio inputs and outputs have certain upper bounds, typically either unity or 100%. Following Olesen et al. (2015), we state these bounds in the form

$$X^R \le \bar{X}^R, \qquad Y^R \le \bar{Y}^R, \tag{1}$$

where each component of vectors $\bar{X}^R$ and $\bar{Y}^R$ can be either finite or $+\infty$. (Vector inequalities mean that the specified inequality is true for each component, e.g., $X^R \le \bar{X}^R$ means that $X^R_i \le \bar{X}^R_i$, for all $i \in I^R$.) We naturally assume that the vectors of ratio inputs and outputs $X^R_j$ and $Y^R_j$ of all observed DMUs $j \in J$ satisfy the inequalities (1).

The R-VRS technology
Olesen et al. (2015) note that the conventional VRS technology of Banker et al. (1984) should generally not be used if some inputs and outputs are ratio measures. The main reason for this is that ratio measures cannot be assumed to satisfy the axiom of convexity, which is explicitly required by the definition of the VRS technology. Instead, Olesen et al. (2015) demonstrate that one should exclude the ratio inputs and outputs from the convexity assumption, while keeping the convexity property only for the volume measures. For a formal definition of the R-VRS technology, Olesen et al. (2015) assume the following three axioms, the last of which is a special case of the axiom of selective convexity introduced by Podinovski (2005).
Axiom 3 (Selective convexity) Let $(\tilde{X}, \tilde{Y}) \in T$ and $(\hat{X}, \hat{Y}) \in T$. Assume that $\tilde{X}^R = \hat{X}^R$ and $\tilde{Y}^R = \hat{Y}^R$. Then $\gamma(\tilde{X}, \tilde{Y}) + (1 - \gamma)(\hat{X}, \hat{Y}) \in T$, for any $\gamma \in [0, 1]$.
Axiom 3 reflects the fact that, although we cannot generally assume that convex combinations of DMUs in the presence of ratio data remain in technology $T$, we can nevertheless do so provided the combined DMUs have identical ratio inputs and outputs. Importantly, a combination of Axioms 2 and 3 also allows us to form convex combinations of DMUs that have different ratio inputs and outputs. This is illustrated by the following example.

Example 1 Consider DMUs A and B shown in Table 1, which are in some technology $T$ with two inputs and two outputs. Input 1 and Output 1 are volume measures, Input 2 is a ratio measure (proportion), and Output 2 is a ratio measure (percentage). We assume that technology $T$ satisfies Axioms 1-3.
Suppose we wish to form a convex combination of DMUs A and B taken with the weights 1/3 and 2/3, respectively. Note that we cannot use Axiom 3 directly, because the ratio input and output of DMUs A and B are not identical. Instead, we first employ Axiom 2 and reduce the ratio output of DMU A from 90 to 70%, to match the ratio output of DMU B. Similarly, we use Axiom 2 to raise the ratio input of DMU B from 0.3 to 0.5, to make it equal to the ratio input of DMU A.
The resulting DMUs can be stated as $A^* = (6, 0.5, 4, 70\%)$ and $B^* = (3, 0.5, 1, 70\%)$. By Axiom 2, both DMUs $A^*$ and $B^*$ are in technology $T$. Because the ratio input and output of DMUs $A^*$ and $B^*$ are identical, by Axiom 3, any convex combination of these DMUs is in technology $T$. In particular, using the weights 1/3 and 2/3 for DMUs $A^*$ and $B^*$, respectively, we obtain DMU C shown in Table 1.
The above example shows that, although we cannot form convex combinations of DMUs with different ratio inputs and outputs, we can still form their ratio-convex (R-convex) combinations as defined by Olesen et al. (2017). Namely, in an R-convex combination of DMUs, the volume inputs and outputs form conventional convex combinations, while the ratio inputs are taken at their maximum levels across all combined DMUs, and the ratio outputs are taken at their minimum levels. In other words, the weighted average used for volume measures is replaced by the operations of maximum for the ratio inputs and minimum for the ratio outputs.
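The R-convex combination described above can be illustrated by a minimal Python sketch. The function name `r_convex` and the dictionary layout are hypothetical, the values of DMUs A and B are inferred from the description of Example 1 (Table 1 itself is not reproduced here), and restricting the maximum and minimum to DMUs with a positive weight is an assumption of this sketch.

```python
from fractions import Fraction

def r_convex(dmus, weights):
    """R-convex combination: weighted average on volume measures,
    maximum on ratio inputs, minimum on ratio outputs.
    Each DMU is a dict with keys 'xv', 'xr', 'yv', 'yr' (lists of numbers)."""
    assert all(w >= 0 for w in weights) and sum(weights) == 1
    # Assumption: only DMUs with a positive weight enter the max/min.
    active = [d for d, w in zip(dmus, weights) if w > 0]

    def weighted_average(key):
        size = len(dmus[0][key])
        return [sum(w * d[key][k] for d, w in zip(dmus, weights)) for k in range(size)]

    def aggregate(key, op):
        size = len(dmus[0][key])
        return [op(d[key][k] for d in active) for k in range(size)]

    return {
        "xv": weighted_average("xv"),
        "xr": aggregate("xr", max),
        "yv": weighted_average("yv"),
        "yr": aggregate("yr", min),
    }

# DMUs A and B as implied by the text of Example 1 (ratio output in percent).
A = {"xv": [6], "xr": [Fraction(1, 2)], "yv": [4], "yr": [90]}
B = {"xv": [3], "xr": [Fraction(3, 10)], "yv": [1], "yr": [70]}

C = r_convex([A, B], [Fraction(1, 3), Fraction(2, 3)])
print(C)
```

With these inputs the sketch gives volume input 4, ratio input 1/2, volume output 2 and ratio output 70, which is the convex combination of $A^*$ and $B^*$ taken with the weights 1/3 and 2/3.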
In line with the minimum extrapolation principle used by Banker et al. (1984), Olesen et al. (2015) give the following definition.
Definition 1 The R-VRS technology $T^{R}_{VRS}$ is the intersection of all technologies (sets) $T \subset \mathbb{R}^m_+ \times \mathbb{R}^s_+$ that satisfy Axioms 1-3.
Following Olesen et al. (2015), technology $T^{R}_{VRS}$ can equivalently be stated by the system of conditions (2).
In this statement of technology $T^{R}_{VRS}$, the first two vector inequalities (2a) and (2b) describe conventional convex combinations of volume inputs and outputs of the observed DMUs taken with the weights $\lambda_j \ge 0$, $j \in J$, that add up to 1 as in equality (2e).
The role of inequalities (2c) and (2d) becomes clear from their restatement as conditions (3). Conditions (3) imply that an observed DMU $(X_j, Y_j)$, $j \in J$, may be used in the convex combination of volume inputs and outputs in constraints (2a) and (2b) with a positive $\lambda_j$ only if DMU $(X_j, Y_j)$ is not worse than the DMU $(X, Y)$ on all ratio inputs and outputs, i.e., only if its ratio inputs are not larger and its ratio outputs are not smaller than those of $(X, Y)$.
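For orientation, one system that is consistent with the verbal description of conditions (2) and (3) can be sketched as follows. The assignment of the labels (2a)-(2g) to individual constraints and the product form of (2c) and (2d) are assumptions inferred from the surrounding text, not a quotation of the original displays.

```latex
\begin{align*}
T^{R}_{VRS} = \Bigl\{ (X, Y) \in \mathbb{R}^m_+ \times \mathbb{R}^s_+ \;\Bigm|\;
 & \textstyle\sum_{j \in J} \lambda_j X^V_j \le X^V, && \text{(2a)} \\
 & \textstyle\sum_{j \in J} \lambda_j Y^V_j \ge Y^V, && \text{(2b)} \\
 & \lambda_j \bigl( X^R_j - X^R \bigr) \le \mathbf{0}, \quad \forall j \in J, && \text{(2c)} \\
 & \lambda_j \bigl( Y^R_j - Y^R \bigr) \ge \mathbf{0}, \quad \forall j \in J, && \text{(2d)} \\
 & \textstyle\sum_{j \in J} \lambda_j = 1, && \text{(2e)} \\
 & X^R \le \bar{X}^R, && \text{(2f)} \\
 & Y^R \le \bar{Y}^R, && \text{(2g)} \\
 & \lambda_j \ge 0, \quad \forall j \in J \Bigr\}.
\end{align*}
% Restatement of (2c) and (2d) in the spirit of conditions (3):
\[
\lambda_j > 0 \;\Longrightarrow\; X^R_j \le X^R \ \text{ and } \ Y^R_j \ge Y^R,
\qquad \forall j \in J.
\]
```

Under this reading, a DMU with a positive weight must weakly dominate $(X, Y)$ on every ratio measure, which matches the maximum and minimum operations of the R-convex combination.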
It is clear that technology $T^{R}_{VRS}$ stated by conditions (2) consists of all R-convex combinations of observed DMUs and all DMUs outperformed (dominated) by them, subject to the upper bounds (1) on the ratio measures. This is similar to the conventional VRS technology, which includes all convex combinations of observed DMUs and all DMUs outperformed by them.
It is worth noting that $T^{R}_{VRS}$ is generally not a convex set, but is a closed set (Olesen et al. 2015).
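The screening role of the ratio conditions can be sketched in Python: for a given DMU $(X, Y)$, only the observed DMUs that are not worse on all ratio measures may receive a positive weight. The function name `eligible_peers` and the data layout are hypothetical, and the ratio values of DMUs A and B are those implied by Example 1.

```python
def eligible_peers(observed, target):
    """Indices j of observed DMUs that may receive lambda_j > 0 for the target:
    ratio inputs not larger and ratio outputs not smaller than the target's."""
    peers = []
    for j, dmu in enumerate(observed):
        inputs_ok = all(xj <= x for xj, x in zip(dmu["xr"], target["xr"]))
        outputs_ok = all(yj >= y for yj, y in zip(dmu["yr"], target["yr"]))
        if inputs_ok and outputs_ok:
            peers.append(j)
    return peers

# Ratio input (proportion) and ratio output (percent) implied by Example 1.
observed = [
    {"xr": [0.5], "yr": [90]},  # DMU A
    {"xr": [0.3], "yr": [70]},  # DMU B
]

print(eligible_peers(observed, {"xr": [0.5], "yr": [70]}))  # both A and B qualify
print(eligible_peers(observed, {"xr": [0.5], "yr": [80]}))  # only A qualifies
print(eligible_peers(observed, {"xr": [0.4], "yr": [70]}))  # only B qualifies
```

Once the eligible DMUs are identified, the remaining volume constraints are those of a conventional VRS technology generated by the eligible DMUs only.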

Example 2 Consider DMUs A and B with one volume input, one volume output and one ratio output as shown in Table 2. Technology $T^{R}_{VRS}$ generated by these DMUs is shown in Fig. 1. Note that the technology in Fig. 1 satisfies Axioms 1-3. In particular, Axiom 3 of selective convexity is satisfied because, for each level of ratio output, the corresponding section of technology $T^{R}_{VRS}$ is a convex set. For example, the section corresponding to the output level 0.25 of DMU A is the convex polyhedron KCADE. The section corresponding to the output level 1.5 of DMU B is the polyhedron HFBG. However, the whole technology $T^{R}_{VRS}$ is not a convex set.
Remark 1 Technology $T^{R}_{VRS}$ may be seen as a generalization of several technologies. Suppose that there are no ratio inputs and outputs. In this case the inequalities (2c), (2d), (2f) and (2g) are omitted and $T^{R}_{VRS}$ is the conventional VRS technology of Banker et al. (1984). If there are no volume inputs and outputs, then $T^{R}_{VRS}$ is the free disposal hull (FDH) of Deprins et al. (1984), with the additional upper bounds (1) on the ratio measures. If we have only volume and ratio inputs and volume outputs, i.e., there are no ratio outputs, technology $T^{R}_{VRS}$ is similar to model (7) stated by Ruggiero (1996) in which the ratio factor $Z$ characterizes the quality of the environment.
A subtle difference between our approach and the approach of Ruggiero (1996) is that we consider any ratio input and output (including those of environmental nature) as yet another dimension of technology $T^{R}_{VRS}$. In contrast, Ruggiero (1996) does not formally include the environmental factor $Z$ as an input of the technology but instead defines the conventional VRS technology $T_{VRS}(Z)$, for every value of $Z$ treated as a parameter. It is clear that the parametric family of technologies $T_{VRS}(Z)$ of Ruggiero is the collection of the sections of technology $T^{R}_{VRS}$ defined in the volume input and output dimensions for each fixed value of parameter $Z$.
The treatment of environmental factors (and any other ratio measures) as inputs and outputs of technology $T^{R}_{VRS}$ allows us, if required, to account for such factors in the evaluation of efficiency and various scale characteristics. For example, we may explore the question of optimal scale of production and returns to scale with regard to several inputs and outputs, including environmental factors. These possibilities are considered in the general setting in the subsequent sections.

Scale efficiency in the R-VRS technology
In this section we follow the approach of Banker (1984) and show how the simultaneous development of the notions of MPSS and scale efficiency in the standard VRS technology could be extended to technology $T^{R}_{VRS}$.

The general setting
To keep the exposition as general as possible, we consider scale properties of the production frontier with respect to some selected nonempty subsets of inputs $I'$ and outputs $O'$, where $I' \subseteq I$ and $O' \subseteq O$. In different applications, the sets $I'$ and $O'$ may represent discretionary inputs and outputs in the sense explored by Banker and Morey (1986) and Golany and Roll (1993). These sets may also include measures that are not discretionary, such as certain exogenous factors.
For example, consider a scenario in which the policy maker uses a DEA model with several volume and ratio inputs and outputs for the assessment of school efficiency. Suppose that this model uses the percentage of families from the higher socio-economic background as an exogenously fixed non-discretionary ratio input $x$ and the percentage of school graduates going to university $y$ as a discretionary ratio output representing quality of education. Suppose that the policy maker is interested in the relationship between these two factors on the production frontier (i.e., among the efficient schools only) while keeping all the other inputs and outputs constant. This leads to the question of optimal scale and returns to scale defined in the selective (partial) sense, for which $I' = \{x\}$ and $O' = \{y\}$, even though the input $x$ is not discretionary.
For a DMU $(X_o, Y_o) \in T^{R}_{VRS}$, define vectors $X_o(\phi)$ and $Y_o(\psi)$, where $\phi, \psi \ge 0$ are scaling factors, as follows:

$$X_{oi}(\phi) = \begin{cases} \phi X_{oi}, & i \in I', \\ X_{oi}, & i \in I \setminus I', \end{cases} \qquad Y_{or}(\psi) = \begin{cases} \psi Y_{or}, & r \in O', \\ Y_{or}, & r \in O \setminus O'. \end{cases} \tag{4}$$

The output radial efficiency (technical efficiency) of DMU $(X_o, Y_o)$ measured with respect to the selected subset of outputs $O'$ is defined as the inverse of the optimal value of the following program (in its statement, for consistency of notation maintained throughout this paper, we change variable $\psi$ to $\eta$):

$$\max \ \eta \quad \text{subject to} \quad (X_o, Y_o(\eta)) \in T^{R}_{VRS}. \tag{5}$$

Assessing the efficiency of DMU $(X_o, Y_o)$ by program (5) is straightforward. This requires replacing the DMU $(X, Y)$ in conditions (2) by DMU $(X_o, Y_o(\eta))$, and maximizing $\eta$ subject to the resulting conditions (Olesen et al. 2015, 2017).
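The selective scaling of inputs and outputs described above can be sketched in Python. The function name `scale_selected` is hypothetical, indices are 0-based, and the numerical values are illustrative only.

```python
def scale_selected(vector, selected, factor):
    """Scale only the components whose (0-based) index is in `selected`;
    all remaining components are kept constant."""
    return [factor * v if k in selected else v for k, v in enumerate(vector)]

# Illustrative DMU: inputs (volume, ratio) and outputs (volume, ratio in percent).
X_o = [6, 0.5]
Y_o = [4, 90]

# Scale only the volume input by phi = 2; the ratio input stays fixed.
print(scale_selected(X_o, {0}, 2))       # [12, 0.5]

# Scale both outputs by psi = 0.5.
print(scale_selected(Y_o, {0, 1}, 0.5))  # [2.0, 45.0]
```

If the selected set contains all indices, the function reduces to the conventional radial scaling of the whole vector.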

The partial cone extension of the R-VRS technology
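In this section we consider the partial cone extension of technology $T^{R}_{VRS}$ defined with respect to the selected input and output sets $I'$ and $O'$. This cone extension is useful in exploring the notions of MPSS and scale efficiency in $T^{R}_{VRS}$ undertaken in subsequent sections.
The partial cone extension $C(I', O')$ of technology $T^{R}_{VRS}$ is defined as the set of all DMUs $(\tilde{X}(\alpha), \tilde{Y}(\alpha))$ obtained from DMUs $(\tilde{X}, \tilde{Y}) \in T^{R}_{VRS}$, where the DMU $(\tilde{X}(\alpha), \tilde{Y}(\alpha))$ is as defined by (4) with the common scaling factor $\alpha$ applied to the inputs in $I'$ and the outputs in $O'$. The next result provides an explicit statement of the closed cone $\bar{C}(I', O')$, which is the closure of $C(I', O')$. It is proved under the following mild assumption about all observed DMUs.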
Assumption 1 For each $j \in J$, there exists an $i \in I'$ (generally different for different $j$) such that $X_{ji} > 0$.
Theorem 2 Let Assumption 1 be true. Then the closed partial cone extension $\bar{C}(I', O')$ is the set of all DMUs that satisfy conditions (8).
To see that the cone extension itself need not be a closed set, consider the VRS technology $T_{VRS}$ generated by the single DMU $A = (1, 1, 5)$ whose first two components are volume inputs and the third component is a volume output. This technology is a special case of technology $T^{R}_{VRS}$. Let $I' = I$ and $O' = O$. Consider DMU $B = (1, 0, 0)$. Because technology $T_{VRS}$ does not include points on the ray $\{(\alpha, 0, 0) \mid \alpha \ge 0\}$, DMU $B$ is not in the (full) cone $C(I, O)$. However, $B$ is in the CRS technology $T_{CRS}$ generated by DMU $A$, which is the closure of $C(I, O)$. (Note that $B$ is dominated by the origin $(0, 0, 0) \in T_{CRS}$ and is included in $T_{CRS}$ by the axiom of free disposability.) Therefore, the cone $C(I, O)$ is not a closed set.
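The example with DMUs $A = (1, 1, 5)$ and $B = (1, 0, 0)$ can be checked numerically by the following Python sketch. The function name `in_cone` is hypothetical, and the sketch assumes that the cone is generated with strictly positive scaling factors; allowing a zero factor would only add the origin and would not change the conclusion for DMU $B$.

```python
def in_cone(point):
    """Membership in the full cone extension of the VRS technology generated by
    A = (1, 1, 5) with free disposability: point = alpha * q for some alpha > 0
    and some q = (q1, q2, qy) with q1 >= 1, q2 >= 1 and 0 <= qy <= 5."""
    x1, x2, y = point
    if min(x1, x2, y) < 0:
        return False
    # Feasible factors satisfy y / 5 <= alpha <= min(x1, x2) and alpha > 0,
    # so it suffices to test the largest admissible factor.
    alpha = min(x1, x2)
    return alpha > 0 and y <= 5 * alpha

B = (1, 0, 0)
print(in_cone(B))  # False: the second input is zero, so no alpha > 0 works

# Points (1, eps, 0) belong to the cone for every eps > 0 and converge to B.
approximations = [(1, 10.0 ** -k, 0) for k in range(1, 8)]
print(all(in_cone(p) for p in approximations))  # True
```

DMU $B$ is therefore a limit of points of the cone without belonging to it, which is why the closure of the cone is used as the reference set.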
It is clear that conditions (8) are closely related to conditions (2). The difference is that, in (8), the scaling factor $\sigma$ is attached to all volume and ratio inputs $i \in I'$ and outputs $r \in O'$ with respect to which we define the cone extension, and the remaining inputs and outputs in the sets $I \setminus I'$ and $O \setminus O'$ are kept fixed. As a result, the closed partial cone extension $\bar{C}(I', O')$ of technology $T^{R}_{VRS}$ includes all partially scaled R-convex combinations of the observed DMUs, where only the inputs and outputs in the sets $I'$ and $O'$ are scaled by $\sigma \ge 0$, and the remaining inputs $i \in I \setminus I'$ and outputs $r \in O \setminus O'$ are not scaled.

The most productive scale size in the R-VRS technology
Recall that Banker (1984) introduces the notion of MPSS evaluated with respect to the entire vectors of inputs and outputs in the VRS technology $T_{VRS}$ (which, according to Remark 1, can be viewed as a special case of the R-VRS technology $T^{R}_{VRS}$). Namely, let DMU $(X_o, Y_o) \in T_{VRS}$, and let $\phi > 0$ and $\psi > 0$ be the scaling factors for the input and output vectors, respectively. Consider the parametric set of all DMUs $(\phi X_o, \psi Y_o) \in T_{VRS}$. All such DMUs have the same structure of the input and output vectors as the original DMU $(X_o, Y_o)$.
Below we follow the same approach and introduce the notion of MPSS in technology $T^{R}_{VRS}$ evaluated with respect to the subsets $I'$ and $O'$. Consider program (9), which maximizes the ratio $\psi/\phi$ over all DMUs $(X_o(\phi), Y_o(\psi)) \in T^{R}_{VRS}$, where the vectors $X_o(\phi)$ and $Y_o(\psi)$ are defined as in (4). We denote the supremum of its objective function by $\theta^*$.
If the sets $I'$ and $O'$ include all inputs and outputs (i.e., if $I' = I$ and $O' = O$), then, similar to the standard definition of Banker (1984), program (9) maximizes the average productivity $\psi/\phi$ among all DMUs stated in the form $(\phi X_o, \psi Y_o) \in T^{R}_{VRS}$, i.e., preserving the input and output structures of the DMU $(X_o, Y_o)$ under consideration.
In the general case, program (9) maximizes the ratio $\psi/\phi$, which is interpretable as the ratio of the quantity $\psi$ of the subvector of selected outputs $Y_{or}$, $r \in O'$, to the quantity $\phi$ of the subvector of selected inputs $X_{oi}$, $i \in I'$, while keeping the remaining inputs and outputs constant.
We say that DMU $(X_o, Y_o)$ is at MPSS evaluated with respect to the subsets $I'$ and $O'$ if $\theta^* = 1$.
Let us show that solving program (9) can be replaced by assessment of the partial output radial efficiency of DMU $(X_o, Y_o)$ (with respect to the outputs in the set $O'$ only) in the closed partial cone extension $\bar{C}(I', O')$ of technology $T^{R}_{VRS}$ whose statement is obtained by Theorem 2. This result is formally stated and proved as Theorem 3 below. Let us first provide its intuitive explanation.
Consider program (9). Note that we can substitute technology $T^{R}_{VRS}$ in its constraints by its cone extension $C(I', O')$. Indeed, for any feasible solution $(\phi, \psi)$ of program (9), changing $T^{R}_{VRS}$ to $C(I', O')$ adds the full ray $(\alpha\phi, \alpha\psi)$, $\alpha > 0$, to its feasible region. However, because $\alpha\psi/\alpha\phi = \psi/\phi$, this change does not affect the supremum of the objective function. Furthermore, because the value of the objective function $\psi/\phi$ of the resulting program (9) with $T^{R}_{VRS}$ replaced by $C(I', O')$ is constant along any ray of feasible solutions $(\alpha\phi, \alpha\psi)$, $\alpha > 0$, we can restrict the feasible set to only one point on each ray, by requiring that $\phi = 1$. After this normalization of variable $\phi$ and renaming variable $\psi$ to $\eta$, we obtain program (10), which maximizes $\eta$ subject to $(X_o, Y_o(\eta)) \in C(I', O')$. Taking into account that the cone $C(I', O')$ is generally not a closed set, we replace it in the constraints of program (10) by the closed cone $\bar{C}(I', O')$. This results in program (11), in which we conventionally change the supremum of the objective function to its maximum.
The next result requires an additional assumption about DMU $(X_o, Y_o)$ which should be true in any meaningful application:
Assumption 2 There exists an $r \in O'$ such that $Y_{or} > 0$.
Theorem 3 Let both Assumptions 1 and 2 be true. Then the supremum $\theta^*$ of program (9) is equal to the maximum $\eta^*$ of program (11), and both are attained.
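The chain of programs leading to Theorem 3 can be summarized by the following sketch. The objective functions and reference sets follow the verbal description above, while the sign restrictions on the variables are assumptions of this sketch.

```latex
\begin{align*}
\theta^* &= \sup \Bigl\{ \tfrac{\psi}{\phi} \;\Bigm|\;
   \bigl( X_o(\phi), Y_o(\psi) \bigr) \in T^{R}_{VRS},\ \phi > 0,\ \psi \ge 0 \Bigr\}
   && \text{(9)} \\
 &= \sup \Bigl\{ \eta \;\Bigm|\;
   \bigl( X_o, Y_o(\eta) \bigr) \in C(I', O'),\ \eta \ge 0 \Bigr\}
   && \text{(10)} \\
 &= \max \Bigl\{ \eta \;\Bigm|\;
   \bigl( X_o, Y_o(\eta) \bigr) \in \bar{C}(I', O'),\ \eta \ge 0 \Bigr\} = \eta^*.
   && \text{(11)}
\end{align*}
```

The first equality uses the invariance of $\psi/\phi$ along rays and the normalization $\phi = 1$; the second is the substantive part of Theorem 3, namely that passing to the closure does not change the optimal value under Assumptions 1 and 2.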
It is clear that program (11), which uses the statement (8) of the cone $\bar{C}(I', O')$, is nonlinear and would be problematic in practical computations. Therefore, Theorem 3 should primarily be of theoretical interest. It shows that the evaluation of MPSS for DMU $(X_o, Y_o)$ in technology $T^{R}_{VRS}$ (with respect to the subsets $I'$ and $O'$) is equivalent to the evaluation of its partial output radial efficiency (with respect to the subset $O'$) in the closed partial cone extension $\bar{C}(I', O')$. In particular, DMU $(X_o, Y_o)$ is at MPSS in technology $T^{R}_{VRS}$ if and only if its partial output radial efficiency in $\bar{C}(I', O')$ evaluated with respect to the subset $O'$ is equal to 1.
Consider the special case in which technology $T^{R}_{VRS}$ is the standard VRS technology $T_{VRS}$ of Banker et al. (1984) and the sets $I'$ and $O'$ include all inputs and outputs. Then the closed cone $\bar{C}(I', O')$ is the standard CRS technology $T_{CRS}$ of Charnes et al. (1978). In this case, Theorem 3 becomes a well-known result that DMU $(X_o, Y_o) \in T_{VRS}$ is at MPSS if and only if its output radial efficiency $\eta^*$ in the benchmark CRS technology $T_{CRS}$ is equal to 1.
In Sect. 5, we consider computational approaches to solving program (9). We transform this program to an equivalent form that, depending on the sets $I$, $O$, $I'$ and $O'$, either becomes a linear program or can be solved as a mixed integer linear program.

Overall and scale efficiency in the R-VRS technology
Following the approach of Banker (1984), we interpret the inverse optimal value $1/\theta^*$ of program (9) [or, equivalently, the inverse value $1/\eta^*$ of program (11)] as the overall efficiency assessed with respect to the subsets $I'$ and $O'$. We can decompose the overall efficiency into the product of its technical and scale efficiency components, so that the scale efficiency $SE(X_o, Y_o)$ is the ratio of the overall efficiency to the technical efficiency.
Following Banker (1984), the scale efficiency $SE(X_o, Y_o)$ is interpretable as a measure of divergence from MPSS with respect to the sets $I'$ and $O'$. Indeed, let, for simplicity, DMU $(X_o, Y_o)$ be technically efficient. Suppose that this DMU is not at MPSS (for the selected sets $I'$ and $O'$) and, in program (9), we have $\theta^* > 1$. By Theorem 3, the supremum $\theta^*$ is attained at some feasible solution $(\phi^*, \psi^*)$ of program (9), and we have $\theta^* = \psi^*/\phi^*$. If DMU $(X_o, Y_o)$ scales its inputs in the set $I'$ by the factor $\phi^*$ and outputs in the set $O'$ by the factor $\psi^*$, while keeping the remaining inputs and outputs fixed, its average productivity (measured only with respect to the selected inputs and outputs) will increase by the factor $\psi^*/\phi^* > 1$. Therefore, the ratio $\psi^*/\phi^* = 1/SE(X_o, Y_o)$ shows by how much the average productivity of this DMU, measured with respect to the selected inputs and outputs in the sets $I'$ and $O'$, could increase if it were to change these inputs and outputs by the factors $\phi^*$ and $\psi^*$, respectively, to match those of its MPSS. This is similar to the standard notion of MPSS in the VRS technology. Following Banker (1984), the optimal ratio $\psi^*/\phi^*$ is interpretable as a measure of divergence of the technically efficient DMU $(X_o, Y_o)$ from its MPSS, if we restrict the scaling of the inputs and outputs to the sets $I'$ and $O'$ only, while keeping the remaining inputs and outputs constant.
This interpretation is illustrated by the following example.
Example 3 Let us refer to technology $T^{R}_{VRS}$ in Example 2. Consider the following three scenarios in which the set $I'$ includes the single input but the sets $O'$ are defined differently. Note that both DMUs A and B are technically efficient in all three scenarios.
(i) Let the set $O'$ include both the volume and ratio outputs. Consider assessing the scale efficiency of DMU A by program (9). Its unique optimal solution is $\phi^* = 2$ and $\psi^* = 3$, which corresponds to DMU P (see Table 3 and Fig. 2). DMU P uses twice the amount of volume input of DMU A but produces three times its vector of outputs. The optimal value of the objective function of program (9) is equal to 3/2, and the scale efficiency of DMU A is 2/3. Note that, in the given scenario, DMU P represents MPSS for DMU A. Similarly, in the case of DMU B, the unique optimal solution to program (9) is $\phi^* = \psi^* = 1$. Therefore, DMU B is at MPSS.
(ii) Let the set $O'$ include only the volume output but not the ratio output. In this case, program (9) assesses the scale efficiency of DMU A by keeping its ratio output constant, i.e., by restricting the evaluation to the section KCADE of the technology. In this case the optimal solution is $\phi^* = 2$ and $\psi^* = 3$. This means that the scale efficiency of DMU A is equal to 2/3 and its MPSS is DMU D. Similarly, the scale efficiency of DMU B is assessed on the section HFBG, and DMU B is at MPSS.
(iii) Let the set $O'$ include only the ratio output but not the volume output. To evaluate the scale efficiency of DMU A, we search among all DMUs on the broken line ALVW. The highest ratio of the ratio output to the volume input is achieved at DMU V, which corresponds to $\phi^* = 2$ and $\psi^* = 6$ in program (9). Therefore, in this scenario, the scale efficiency of DMU A is $\phi^*/\psi^* = 1/3$, and DMU V represents MPSS for DMU A. A similar investigation shows that DMU B is at MPSS.
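The arithmetic of the three scenarios can be reproduced by a short Python sketch. It uses only the optimal solutions $(\phi^*, \psi^*)$ reported in Example 3 and the relation $SE = \phi^*/\psi^*$, which holds for a technically efficient DMU; the function name `scale_efficiency` is hypothetical.

```python
from fractions import Fraction

def scale_efficiency(phi_star, psi_star):
    """Scale efficiency of a technically efficient DMU: the inverse of the
    maximum average productivity ratio psi*/phi* of program (9)."""
    return Fraction(phi_star) / Fraction(psi_star)

# Optimal solutions (phi*, psi*) for DMU A as reported in Example 3.
scenarios_A = {"(i)": (2, 3), "(ii)": (2, 3), "(iii)": (2, 6)}

for name, (phi_star, psi_star) in scenarios_A.items():
    print(name, scale_efficiency(phi_star, psi_star))

# DMU B has phi* = psi* = 1 in scenario (i), so it is at MPSS.
print("B", scale_efficiency(1, 1))
```

The same DMU A is thus assigned the scale efficiency 2/3 or 1/3 depending on which outputs are included in the selected set.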

The cone extension and the CRS extension of the R-VRS technology
The cone and CRS extensions of the standard VRS technology are the same sets, and both can be used as the reference technologies in the evaluation of MPSS and scale efficiency of the DMUs. The purpose of this section is to demonstrate that the same identity does not hold for the R-VRS technology. Namely, its CRS extension is generally different from its cone extension, and it is the latter that we employ in the evaluation of MPSS and other scale characteristics.
In order to demonstrate this difference and avoid excessive technicalities, for the discussion in this section, we assume that $I' = I$ and $O' = O$. We also assume that the bounds (1) are not specified.
As noted in Remark 1, if all inputs and outputs are volume measures, technology $T^{R}_{VRS}$ is the standard VRS technology $T_{VRS}$ of Banker et al. (1984). In this case, program (9) represents the standard approach of Banker (1984) for the assessment of MPSS in the VRS technology. Further, the closed cone extension of technology $T_{VRS}$ coincides with its CRS extension $T_{CRS}$, which is the CRS technology of Charnes et al. (1978). This implies that the evaluation of MPSS for DMU $(X_o, Y_o)$ by solving program (9) can equivalently be performed by evaluating its output radial efficiency in technology $T_{CRS}$.
Now consider the R-CRS technology of Olesen et al. (2015) in which the ratio inputs and outputs are of the proportional type with no bounds. Such inputs and outputs are assumed to be scalable in the same proportion as the volume measures. Following Olesen et al. (2015), we denote such technology $T^{P}_{CRS}$. This technology is defined (in the sense of the minimum extrapolation principle) by Axioms 1-3 and the additional axiom of selective proportionality. By Theorem 2 in Olesen et al. (2015), technology $T^{P}_{CRS}$ is the set of all DMUs $(X, Y)$ that satisfy conditions (12).
The full closed cone extension $\bar{C}(I, O)$ of technology $T^{R}_{VRS}$ and its R-CRS extension $T^{P}_{CRS}$ are in general different sets. Indeed, the former is stated by conditions (8) in which $I' = I$ and $O' = O$. In these conditions, all observed DMUs $(X_j, Y_j)$, $j \in J$, are scaled by the same single factor $\sigma$. In contrast, in (12), all observed DMUs are scaled independently by generally different factors $\sigma_j$, $j \in J$. Because a common factor $\sigma$ is a special case of the individual factors $\sigma_j$, we have $\bar{C}(I, O) \subseteq T^{P}_{CRS}$. Example 4 considered below and a further example in Appendix B show that, generally, this embedding is not an equality.
To facilitate the graphical illustration in the next example, we first establish the following useful result, which is true if the specification of technology $T^{P}_{CRS}$ includes only a single ratio measure. This measure can be a ratio input or a ratio output, but not both. This result is not valid if the technology incorporates more than one ratio measure.

Fig. 3 Technology T R VRS in Example 4
Theorem 4 Let technology T P CRS include a single ratio measure, i.e., let the union of the sets I R ∪O R be a singleton. Also, let this ratio measure be strictly positive for every observed DMU. (E.g., in the case of single ratio input X R , we assume that X R j > 0, for all j = 1, . . . , n.) Then technology T P CRS is the standard CRS technology T CRS generated by the same set of observed DMUs.
As established in Sect. 4.3, the maximum average productivity that DMU (X o , Y o ) achieves at its MPSS in technology T R VRS , as represented by the maximum of the objective function ψ/ϕ of program (9), is equal to the average productivity of its output radial projection in the closed cone extension C(I, O). The following example illustrates the case in which the maximum average productivity for DMU (X o , Y o ) with T R VRS (or C(I, O)) as the benchmark technology falls below the average productivity in technology T P CRS .
Example 4 The example involves four DMUs A, B, C and D with one volume input x V , one proportional ratio input x R and one volume output y V , as shown in Table 4.
Technology T R VRS generated by DMUs A, B and C is shown in Fig. 3. It is not convex and consists of the two shaded parts. Note that the ratio input x R of the DMUs B and C is equal. By Axiom 3 of selective convexity, the line segment BC is included in technology T R VRS . Because the ratio input x R of the DMUs A and B is different, the line segment AB is not included in this technology.
Let us evaluate the scale efficiency of DMU D. This requires that we consider all DMUs in T R VRS stated as (2α, 2α, y V ) and identify the largest ratio y V /α among all such DMUs. Consider the two-dimensional piecewise linear section of technology T R VRS with the input mix described parametrically as (2α, 2α). In other words, we measure input by α ≥ 0. [...]
It is clear that DMU D is scale efficient. It achieves the maximum average productivity equal to y V /α = 1.75/1 among all DMUs stated in the form (2α, 2α, y V ) and plotted on the red line in Fig. 4. Expressed differently, DMU D is also located on the ray OK. This ray represents the upper boundary of the closed cone extension C(I, O) of T R VRS .
The R-CRS technology T P CRS is shown as the shaded area in Fig. 5. In this example, by Theorem 4, it coincides with the standard CRS technology T CRS generated by the observed DMUs A, B and C. The two-dimensional cone spanned by DMUs A and B is a facet of T P CRS . 7 DMU D is located below this facet and is an interior point of technology T P CRS . Consider the two-dimensional piecewise linear section of T P CRS with the input mix fixed to (2, 2) of DMU D. The possible increase in the observed average product of D (equal to 1.75) with T P CRS as the benchmark can now be estimated by the maximal expansion of its current output (equal to 1.75) with the inputs fixed at the level (2, 2).
Straightforward calculations show that the output projection of DMU D on the boundary of technology T P CRS is DMU D′ with the input mix equal to (2, 2) and y V = 2. Using the [...]
7 It is also straightforward to show that the segment AB is in T P CRS independently of Theorem 4. Both DMUs (x V A , x R A , y V A ) = (3, 1, 2) and (x V B , x R B , y V B ) = (1, 2, 1.5) belong to T P CRS . Applying Axiom 4 of proportionality to DMU B with α = 0.5, we obtain DMU B * = (0.5, 1, 0.75) ∈ T P CRS . By Axiom 3 of selective convexity, any convex combination of DMUs A and B * (which have the same ratio input x R = 1) belongs to T P CRS . The facet spanned by these convex combinations is identical to the facet spanned by the line segment AB.
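The output projection of DMU D in Example 4 can be checked numerically. The following is a minimal sketch, not the authors' computation: it uses only the data for DMUs A and B quoted in footnote 7 (DMU C is left out because Table 4 is not reproduced here, and the text indicates that the relevant facet is spanned by A and B), and it relies on Theorem 4 to treat T P CRS as a standard CRS technology. The function name `max_output_crs` and the vertex-enumeration approach are illustrative assumptions; a linear programming solver would normally be used instead.

```python
from itertools import combinations

# Data quoted in Example 4: (volume input, ratio input, volume output).
A = (3.0, 1.0, 2.0)
B = (1.0, 2.0, 1.5)
D_INPUTS = (2.0, 2.0)   # input mix of DMU D
D_OUTPUT = 1.75         # observed output of DMU D


def max_output_crs(peers, inputs):
    """Maximize lam_a*y_a + lam_b*y_b subject to the two input constraints
    lam_a*x_a + lam_b*x_b <= inputs and lam_a, lam_b >= 0.

    The problem has two variables and a bounded feasible set, so the optimum
    is attained at a vertex; we enumerate all pairwise constraint intersections.
    """
    (a1, a2, ya), (b1, b2, yb) = peers
    # Each constraint is (p, q, r), meaning p*lam_a + q*lam_b <= r.
    cons = [(a1, b1, inputs[0]), (a2, b2, inputs[1]),
            (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    best = None
    for (p1, q1, r1), (p2, q2, r2) in combinations(cons, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < 1e-12:
            continue  # parallel constraints have no unique intersection
        lam_a = (r1 * q2 - r2 * q1) / det   # Cramer's rule
        lam_b = (p1 * r2 - p2 * r1) / det
        feasible = all(p * lam_a + q * lam_b <= r + 1e-9 for p, q, r in cons)
        if feasible:
            value = ya * lam_a + yb * lam_b
            if best is None or value > best[0]:
                best = (value, lam_a, lam_b)
    return best


y_star, lam_a, lam_b = max_output_crs([A, B], D_INPUTS)
efficiency_in_crs = D_OUTPUT / y_star   # output radial efficiency of D in the CRS benchmark
print(round(y_star, 6), round(lam_a, 6), round(lam_b, 6), round(efficiency_in_crs, 6))
```

Under these assumptions the maximal output at inputs (2, 2) agrees with the value y V = 2 stated in the text, so DMU D appears inefficient against T P CRS even though it is scale efficient against the cone extension C(I, O).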

Computation of scale efficiency
As highlighted in Sect. 4, the evaluation of MPSS and overall and scale efficiency by nonlinear programs (9) or (11) is generally problematic. In this section we obtain an equivalent statement of these programs whose use in practical computations is straightforward.
Below, when referring to program (9), we assume that its constraint [...]. This program is stated in terms of the variable vector λ and scalars ϕ and ψ. Similarly, we restate program (11) using the statement of the closed partial cone C(I′, O′) by Theorem 2. (The full statement of the resulting program (11) is shown as program (28) in the proof of Theorem 5.) We now use the substitution λ′ = λσ. Suppressing the prime symbol, we state the resulting program as [...]
In program (13), the bounds (13l) and (13n) [...]
The next result establishes a one-to-one correspondence between the optimal solutions of programs (9) and (13). We prove it under the following stronger variant of Assumption 2:
Assumption 3 Either there exists an r ∈ O V ∩ O′ such that Y or > 0, or there exists an r ∈ O R ∩ O′ such that both Y or > 0 and the upper bound Ȳ R r in the corresponding constraint (13k) is finite. (Such a constraint needs to be specified and should not be omitted. 8 )
Clearly, a simple sufficient, but not necessary, condition that guarantees that Assumption 3 is satisfied is that all outputs of DMU (X o , Y o ) are strictly positive and, additionally, all ratio outputs have a specified finite upper bound (1).
To see the importance of Theorem 5, recall that the evaluation of MPSS and the overall efficiency of DMU (X o , Y o ) rely on our ability to solve program (9). Any of its optimal solutions ϕ, ψ would correspond to the MPSS ( [...] VRS , defined with respect to the selected input and output sets I′ and O′. The inverse 1/θ * of the optimal value is the overall efficiency of the DMU (X o , Y o ). Because program (9) is nonlinear, we face obvious computational challenges in its practical use.
Theorem 3 stated in Sect. 4.3 appears to suggest a conventional way to overcome this computational difficulty. Namely, instead of evaluating the maximum average productivity ψ/ϕ by solving the nonlinear program (9), we may equivalently solve program (11), which evaluates the partial (with respect to the set O′) output radial efficiency of DMU (X o , Y o ) in the closed partial cone extension C(I′, O′) of technology T R VRS .
Unfortunately, the described conventional approach based on solving program (11) instead of (9) does not resolve all computational problems. First, if we state program (11) in the full extended form (shown in the proof of Theorem 5), we obtain a nonlinear program. Second, even if we solve this program and identify its optimal solution λ * , η * , σ * , Theorem 3 does not tell us how to convert this solution to an optimal solution of program (9) and identify the MPSS of DMU (X o , Y o ).
The new Theorem 5 resolves the above problems. It shows that the nonlinear program (11) can be equivalently restated as program (13). As discussed in Remark 2 below, the latter program is easy to solve. Furthermore, Theorem 5 establishes a one-to-one correspondence between the optimal solutions of programs (9) and (13), and provides simple formulae that transform an optimal solution of either program to an optimal solution of the other program. This means that, in practice, we can solve only program (13) and subsequently convert its optimal solution to an optimal solution of program (9), thus identifying the MPSS of DMU (X o , Y o ).
Remark 2 In practical applications, solving program (13) should be unproblematic. Indeed, if the sets I′ and O′ do not include ratio inputs and outputs, then the sets I R ∩ I′ and O R ∩ O′ are empty, the constraints (13f), (13h), (13k) and (13m) are omitted and program (13) becomes a linear program. If at least one ratio input or output is included in the sets I′ and O′, then program (13) is nonlinear. In this case, the constraints (13f) and (13h) can be restated as "either-or" conditions and further linearized using the "big M" method as discussed in Olesen et al. (2017). This transforms program (13) to a mixed integer linear program.
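The "either-or" restatement mentioned in Remark 2 can be illustrated on a generic condition of the same kind. The sketch below is an assumption-laden illustration rather than a statement of constraints (13f) and (13h), which are not reproduced here and involve scaled quantities. It takes a single ratio output and the condition "either the intensity lam of a peer is zero, or the peer's ratio output is at least the target value", and checks by enumeration that a big-M pair of linear inequalities with a binary variable delta describes the same set. The names `either_or_holds`, `big_m_holds`, `LAM_UPPER` and the grid values are hypothetical.

```python
from itertools import product


def either_or_holds(lam, y_target, y_peer):
    """Either-or condition: lam = 0, or the peer's ratio output is at least the target."""
    return lam == 0 or y_peer >= y_target


def big_m_holds(lam, y_target, y_peer, big_m, lam_upper):
    """Linearized condition: some binary delta satisfies both linear inequalities."""
    for delta in (0, 1):
        peer_switch = lam <= lam_upper * delta                 # delta = 0 forces lam = 0
        dominance = y_target - y_peer <= big_m * (1 - delta)   # delta = 1 forces y_peer >= y_target
        if peer_switch and dominance:
            return True
    return False


BIG_M = 100.0     # large enough here because the ratio values lie in [0, 100]
LAM_UPPER = 1.0   # hypothetical upper bound on each intensity variable

lam_grid = [0.0, 0.25, 1.0]
ratio_grid = [0.0, 30.0, 60.0, 100.0]
mismatches = [
    (lam, y_t, y_p)
    for lam, y_t, y_p in product(lam_grid, ratio_grid, ratio_grid)
    if either_or_holds(lam, y_t, y_p) != big_m_holds(lam, y_t, y_p, BIG_M, LAM_UPPER)
]
print(mismatches)
```

The equivalence depends on M being at least as large as the biggest possible violation of the inequality it relaxes, which is why the choice of M is tied to the range of the data.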

Special cases
Below we consider several special cases of program (13). Recall that, as discussed in Sect. 5, solving this program is equivalent to the identification of MPSS by program (9).

The model of Banker and Morey (1986)
Suppose that there are no ratio measures, i.e., I = I V and O = O V . In this case, the inequalities (13f)-(13i) and (13k)-(13n) are omitted from program (13). If the set O′ includes all volume outputs but I′ does not include all volume inputs, the resulting program (13) is the model of Banker and Morey (1986) in which the inputs in the complementary subset I V \ I′ are regarded as non-discretionary. If additionally some (non-discretionary) outputs are permitted and are not included in the set O′, program (13) is a generalization of the model of Banker and Morey (1986) provided by Golany and Roll (1993). In both models, the inequalities (13c) and (13e) disallow proportional scaling of the volume measures that are not included in the sets I′ and O′.

The partial cone extension of the model of Ruggiero (1996)
Let I′ = I V and O′ = O V . This corresponds to an important practical situation in which the ratio inputs and outputs, often representing environmental and quality characteristics, are assumed constant in the evaluation of the technical and scale efficiency (Ruggiero 1996). In the described scenario, conditions (13c), (13e), (13f), (13h) and (13k)-(13n) are omitted, and equality (13j) and variable σ become redundant. Then program (13) is stated as follows: [...]
The technology employed by model (14) allows proportional scaling of the volume inputs and outputs while keeping the ratio measures constant. This corresponds to the special case of the R-CRS technology, denoted T F CRS , in which all ratio inputs and outputs are of the fixed type (Olesen et al. 2015). 9 Technology T F CRS can be viewed as the partial cone extension (with respect to the volume inputs and outputs only) of technology T R VRS stated by Theorem 1. It can also be seen as the CRS extension of the model of Ruggiero (1996) which allows proportional scaling of the volume measures, while keeping the exogenous ratio measures fixed.

The full scale efficiency
Let the sets I′ and O′ include all volume and ratio measures, i.e., let I′ = I and O′ = O. This case was illustrated by scenario (i) of Example 3. As a practical example, in an application to schools, volume measures may represent the teaching hours, expenditure and the number of pupils. The ratio inputs and outputs may represent the percentage of pupils with good grades on entry and exit, and also the percentage of pupils from the higher socio-economic background. The policy maker may be interested in the full scale characterization of the production frontier, in which case the sets I′ and O′ include all (volume and ratio) inputs and outputs. Note that, regardless of whether some inputs or outputs are discretionary or non-discretionary, it may still be useful to consider the full scale characterization that takes into account all such measures (see Sect. 4.1).
In the described scenario, program (13) takes on the following simplified form: [...]
9 As a special case of program (13), program (14) is based on Assumption 3. This means that Y V o ≠ 0. As proved by Olesen et al. (2015), under this assumption, the technology employed by model (14) coincides with technology T F CRS . Without the assumption that Y V o ≠ 0, the statement of technology T F CRS is more complex.
All observed DMUs (X j , Y j ), j ∈ J , and the bounds on the ratio measures (1) in program (15) are fully scaled by the scaling factor σ . Note that the factor σ is not explicitly present in constraints (15b) and (15c). However, taking into account (15f), the conical combinations of the volume inputs and outputs of the observed DMUs in (15b) and (15c) can also be viewed as the convex combinations of the scaled vectors σ Y V j and σ X V j taken with the weights λ j /σ that add up to 1.

Returns to scale in the R-VRS technology
If a DMU is scale inefficient, a further question arises: is it too small or too large compared to its MPSS? Answering this question leads to the RTS characterization of the production frontier. For the conventional VRS technology, such characterization is based on the underlying notion of (one-sided) scale elasticity. It was originally developed by Banker (1984) and Banker and Thrall (1992) and further explored in the literature-see, e.g., Førsund and Hjalmarsson (2004), Hadjicostas and Soteriou (2006), Chambers and Färe (2008), Zelenyuk (2013) and Sahoo and Tone (2015).
In this section, we show that the notion of RTS can also be extended to the R-VRS technology. Because the R-VRS technology is generally nonconvex, such extension is not straightforward. In particular, the scale elasticity as a marginal scale characteristic and the RTS characterization based on it are not generally suitable indicators of a direction to MPSS.
In order to extend the notion of RTS to the R-VRS technology, we employ two approaches, depending on the selected sets I′ and O′ of inputs and outputs with respect to which we measure the scale efficiency. Namely, if the selected inputs and outputs are volume measures, we employ a variant of the standard notion of local RTS (Banker and Thrall 1992). If at least one of the selected inputs or outputs is a ratio measure, we identify a direction to MPSS by exploring the range of optimal values of variable σ in program (13). This approach is conceptually related to the approach of Färe et al. (1983b, 1985) and leads to the notion of global RTS (Podinovski 2004a, b).
The returns-to-scale characterization applies to DMUs located on the production frontier. 10 We therefore require that DMU (X o , Y o ) satisfies the following assumption:
Assumption 4 DMU (X o , Y o ) is output radial efficient with respect to the subset of outputs O′, i.e., in program (5), we have η = 1.

Local RTS in the R-VRS technology
Below we define the local RTS characterization of the production frontier of technology T R VRS with respect to the volume inputs and outputs only, while keeping the ratio inputs and outputs fixed.
Let the sets I′ and O′ include all volume measures and exclude all ratio measures: I′ = I V and O′ = O V . [...]
In the case of technology T R VRS , we can similarly define local RTS (with respect to volume inputs and outputs only) by evaluating the one-sided scale elasticities at the DMU (X o , Y o ) in the section T of technology T R VRS obtained for the fixed vectors X R o and Y R o . The section T can be regarded as a technology which represents all volume input and output combinations (X V , Y V ) possible for the fixed vectors X R o and Y R o of the ratio inputs and outputs. Taking into account Theorem 1, technology T is the set of all nonnegative DMUs (X V , Y V ) for which there exists a vector λ ∈ R n such that the following conditions are true (note that the vectors X R o and Y R o are fixed, and there is no need to specify bounds (1) on the ratio measures): [...]
Technology T can be viewed as the standard VRS technology generated by the subset J′ of observed DMUs (X j , Y j ) such that both vector inequalities Y R j ≥ Y R o and X R j ≤ X R o are true. If an observed DMU (X j , Y j ) does not satisfy these inequalities, we have λ j = 0, which excludes this DMU from the subset J′.
[...] the partial right-hand and left-hand scale elasticities evaluated at the DMU (X V o , Y V o ) on the boundary of T . Because T is a VRS technology, their computation is straightforward. Indeed, let ω min and ω max be the minimum and maximum optimal values of the dual variable ω to the normalizing equality 1ᵀλ = 1 of the output-oriented linear program stated for DMU (X o , Y o ). (Identifying ω min and ω max requires solving two linear programs.) Then [...]
In line with Banker and Thrall (1992), we have the following definition.
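The selection of the reference subset for the section T can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not part of the original method statement: the function name `reference_subset` and the dictionary layout are hypothetical, and the only data used are the ratio inputs of DMUs A and B from Example 4 (which has no ratio outputs, so the output tuples are empty). An observed DMU is retained only if its ratio inputs do not exceed, and its ratio outputs are not below, those of the DMU under evaluation.

```python
def reference_subset(ratio_inputs, ratio_outputs, x_ratio_o, y_ratio_o):
    """Return labels of DMUs j with X^R_j <= X^R_o and Y^R_j >= Y^R_o componentwise."""
    kept = []
    for label in ratio_inputs:
        inputs_ok = all(xj <= xo for xj, xo in zip(ratio_inputs[label], x_ratio_o))
        outputs_ok = all(yj >= yo for yj, yo in zip(ratio_outputs[label], y_ratio_o))
        if inputs_ok and outputs_ok:
            kept.append(label)   # this DMU may receive a positive weight lambda_j
    return kept


# Ratio inputs of DMUs A and B as quoted in Example 4; no ratio outputs there.
ratio_inputs = {"A": (1.0,), "B": (2.0,)}
ratio_outputs = {"A": (), "B": ()}

peers_low = reference_subset(ratio_inputs, ratio_outputs, (1.0,), ())
peers_high = reference_subset(ratio_inputs, ratio_outputs, (2.0,), ())
print(peers_low, peers_high)
```

Once the subset is identified, the section can be treated as an ordinary VRS technology generated by the retained DMUs, so standard linear programming tools for scale elasticity apply.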
Definition 3 DMU (X o , Y o ) ∈ T R VRS exhibits the following types of partial RTS evaluated with respect to the volume inputs and outputs only:

Remark 3 The one-sided scale elasticities [...] can also be calculated using the linear program that evaluates the input radial efficiency of the DMU (X V o , Y V o ) in the VRS technology T (Førsund and Hjalmarsson 2004; Hadjicostas and Soteriou 2006; Sahoo and Tone 2015; Zelenyuk 2013). Furthermore, in some applications, only a subset of volume inputs and outputs may be of interest and included in the sets I′ and O′, i.e., we may have [...] The partial one-sided scale elasticities with respect to the selected subsets I′ and O′ can be evaluated in technology T using the linear programming approach of Podinovski et al. (2016) of which formulae (16) are a special case.

Global RTS in the R-VRS technology
The global RTS (GRS) characterization of production frontiers was introduced by Podinovski (2004a, b). The types of GRS are indicative of the direction in which DMU (X o , Y o ) should resize to achieve MPSS. For example, DMU (X o , Y o ) exhibits global increasing RTS if it is smaller than its MPSS and, therefore, needs to increase the scale of its operations to achieve its MPSS. 11 In any convex technology, including the VRS technology, the local and global RTS characterizations are identical (Podinovski 2017). However, in a nonconvex technology such as the R-VRS technology, the local types of RTS are no longer indicative of the direction to MPSS (see the example in Sect. 7.3), and the global characterization needs to be used instead.
In this section, we consider arbitrary nonempty subsets of volume and ratio inputs and outputs I′ ⊆ I and O′ ⊆ O. Let DMU (X o , Y o ) satisfy Assumption 4. In order to evaluate its scale efficiency with respect to the selected sets I′ and O′, we solve program (9) or the equivalent program (13).
By statement (iii) of Theorem 5, the case ϕ * < 1 corresponds to σ > 1 in an optimal solution of program (13), and the case ϕ * > 1 corresponds to σ < 1. We can now extend the characterization of GRS defined in Podinovski (2004a) in the case I′ = I and O′ = O to the case of partial GRS evaluated only with respect to the sets of inputs and outputs I′ and O′.
Let DMU (X o , Y o ) ∈ T R VRS satisfy Assumption 4. In order to determine if DMU (X o , Y o ) is at MPSS and, if not, whether there exists an MPSS that is smaller or larger than DMU (X o , Y o ), we solve two additional programs derived from program (13). The first can be viewed as the non-increasing returns-to-scale (NIRS) analogue of program (13), and the second as its non-decreasing returns-to-scale (NDRS) analogue.
The final logical possibility is the case in which 1 < η 1 = η 2 . In this case, DMU (X o , Y o ) is scale inefficient and has at least two different MPSS, one of which is larger, and the other smaller, than the DMU (X o , Y o ). We class DMU (X o , Y o ) as exhibiting global sub-constant RTS (G-SCRS).
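The comparison of the two optimal values can be sketched as a small classification routine. The case definitions preceding the G-SCRS case are not reproduced here, so the rules below are a reconstruction and should be read as an assumption: they follow the G-SCRS case stated above and the School 11 illustration in the application, where the NIRS value of program (18) exceeds the NDRS value of program (19) and the school is classed as G-IRS. The function name `classify_grs` and the tolerance are hypothetical.

```python
def classify_grs(eta_nirs, eta_ndrs, tol=1e-6):
    """Classify global RTS of a frontier DMU from the optimal values of the
    NIRS analogue (program (18)) and the NDRS analogue (program (19)).

    Both values are output expansion factors, so they are at least 1.
    """
    if max(eta_nirs, eta_ndrs) <= 1.0 + tol:
        return "G-CRS"    # no expansion possible in either analogue: the DMU is at MPSS
    if abs(eta_nirs - eta_ndrs) <= tol:
        return "G-SCRS"   # equal expansion, both above 1: MPSS exist on both sides
    if eta_nirs > eta_ndrs:
        return "G-IRS"    # best benchmark is a scaled-down larger unit: MPSS is larger
    return "G-DRS"        # best benchmark is a scaled-up smaller unit: MPSS is smaller


school_11 = classify_grs(1 / 0.9463, 1 / 1)   # values quoted for School 11
print(school_11)
```

A tolerance is used for the equality tests because the optimal values come from numerical optimization and are rarely exactly equal in floating point.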

Remark 4 Assume that DMU [...] Then the projected DMU (X * , Y * ) satisfies Assumption 4 and we can characterize its type of GRS by solving programs (18) and (19). Let the corresponding optimal values of these two programs be η * 1 and η * 2 , respectively. It is clear that η * 1 = η 1 /η and η * 2 = η 2 /η, where η 1 and η 2 are the optimal values of programs (18) and (19) stated for the evaluation of DMU (X o , Y o ). Then η * 1 ≤ η * 2 if and only if η 1 ≤ η 2 . This implies that the procedure for the characterization of GRS based on the optimal values η 1 and η 2 of programs (18) and (19) stated for DMU (X o , Y o ) (at which the notion of GRS is undefined) in fact characterizes GRS at the projection (X * , Y * ).

Example of evaluation of RTS
Let T R VRS be the R-VRS technology considered in Examples 2 and 3. We first illustrate the notion of local RTS, by referring to scenario (ii) in which the sets I′ and O′ include only the volume input and the volume output. In this case, technology T is the section KCADE shown in Fig. 1 and, separately, in Fig. 6.
Consider DMU A and note that it satisfies Assumption 4. The left-hand scale elasticity at DMU A is undefined because it is not possible to reduce its input in technology T , and we can conventionally take ε − (A) = +∞. The right-hand scale elasticity ε + (A) = 2. It corresponds to the movement along the side AD away from A and can be calculated as the ratio of the marginal productivity (equal to the slope 2 of the line AD) to the average productivity at A (equal to 1/1 = 1). By Definition 3, DMU A exhibits IRS with respect to the volume input and output.
Because the section T is convex, the local type of RTS of DMU A coincides with its global (G-IRS) type, and both are indicative of the direction to its MPSS at DMU D.
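The calculation for DMU A can be written out as a short sketch. The thresholds used in `local_rts` are an assumption: the body of Definition 3 is not reproduced here, so the function follows the usual convention associated with Banker and Thrall (1992), in which a right-hand elasticity above 1 indicates IRS and a left-hand elasticity below 1 indicates DRS. The function names are hypothetical; the numerical inputs are those quoted for DMU A.

```python
import math


def scale_elasticity(marginal_productivity, output, input_level):
    """One-sided scale elasticity as marginal productivity over average productivity."""
    average_productivity = output / input_level
    return marginal_productivity / average_productivity


def local_rts(eps_right, eps_left):
    """Local RTS type from one-sided elasticities (assumed convention, eps_right <= eps_left)."""
    if eps_right > 1:
        return "IRS"
    if eps_left < 1:
        return "DRS"
    return "CRS"


# DMU A: slope of side AD is 2, average productivity is 1/1.
eps_right_A = scale_elasticity(2.0, 1.0, 1.0)
eps_left_A = math.inf   # input of A cannot be reduced within the section T
print(eps_right_A, local_rts(eps_right_A, eps_left_A))
```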
Now consider scenario (i) in which the set I′ includes the single volume input and the set O′ includes both volume and ratio outputs. As shown in Example 3, DMU A is scale inefficient and its MPSS is DMU P. Because DMU P is larger than A, the latter exhibits G-IRS with respect to all inputs and outputs. Note that, in this scenario, the right-hand scale elasticity ε + (A) = 0 and corresponds to the movement along the line AL (see Footnote 5). Therefore, locally, DMU A exhibits DRS to the right, although in the global sense, it should increase the scale of its operations to achieve its MPSS at DMU P.
Finally, consider scenario (iii) in which the set O′ includes only the ratio output. As shown in Example 3, DMU A is scale inefficient and its MPSS is at DMU V , which is larger than A. Therefore, DMU A exhibits G-IRS in this scenario as well. As in scenario (i), the right-hand scale elasticity at A in the section of technology T R VRS defined by its boundary ALVW is equal to zero and corresponds to the marginal movement along the line AL away from A. Therefore, DMU A exhibits DRS, and this local characterization is not indicative of the direction to MPSS.

Illustrative application
In this section, we illustrate the application of the developed methodology using a sample of 39 secondary schools in the West Midlands region of England. The data were collected in 2020-21 and are publicly available from the official website of the Department for Education. Table 5 shows summary statistics of the measures used in the application. (The full data set used in this application is given in Appendix C.) The volume inputs 1 and 2 represent the number of teachers and expenses in thousands of British pounds (excluding teacher salaries) at each school. The ratio input 3 measures the percentage of pupils who are not eligible for free school meals (NFSM). This input is a socio-economic characteristic of the catchment area which is assumed to have a positive effect on the performance of the school. The volume output 1 is the number of all pupils at the school. The ratio output 2 is the proportion of the final year (sixth form) pupils proceeding to higher education (HE) and is included in the model as a measure of quality of education. (In Sect. 8.4, we discuss potential computational issues arising from the use of small ratios in DEA models. 13 )
We consider three scenarios of evaluation of scale characteristics, by choosing the sets I′ and O′ in different ways: first, as the sets of all (volume and ratio) measures; second, as the sets of all volume inputs and outputs; and third, as the sets of all ratio inputs and outputs, respectively.
Preliminary calculations show that 23 schools in the current sample are strongly (Pareto) efficient and, therefore, satisfy Assumption 4. It also turns out that, in each scenario, no additional observed DMUs satisfy this assumption, i.e., the set of observed DMUs that are efficient with respect to the outputs in the set O′ is the set of strongly efficient DMUs of technology T R VRS . 14

Evaluation with respect to both volume and ratio measures
Consider the evaluation of scale efficiency and RTS characterization of the schools with respect to all (volume and ratio) inputs and outputs. In this scenario, we define the sets I′ = I and O′ = O. Table 6 shows the results of computations. Its second column shows the output radial efficiency of each school in the R-VRS technology T R VRS evaluated with respect to the vector of two outputs. The next three columns show the output radial efficiency of the schools in models (13), (18) and (19) which are based on the closed full cone extension C(I, O) of [...] Table 6 are inverse to the optimal values η, η 1 and η 2 of these programs. 15 Following the discussion of Sect. 6.3, instead of solving program (13), we solve the simpler, but equivalent, program (15). We also solve this program with the additional constraints σ ≤ 1 and σ ≥ 1, instead of the NIRS and NDRS programs (18) and (19).
13 We use percentages for input 3 and fractions between 0 and 1 for output 2 in line with the way the data are reported by the Department for Education. An obvious alternative would be to convert percentages to fractions or fractions to percentages. Rescaling data in such a way does not affect radial measures of efficiency (evaluated with respect to the set O′) and scale characteristics based on them.
14 To verify strong efficiency of DMU (X o , Y o ), we solve the additive model that seeks the maximum sum of individual improvements (slacks in the input and output constraints) of all volume and ratio inputs and outputs, under the condition that the resulting DMU is in technology T R VRS . This model is stated as model (12) in Olesen et al. (2017) in which we take (X̂, Ŷ) = (X o , Y o ).
According to Sect. 4.4, the overall efficiency of each school is equal to its efficiency in model (13), as shown in the third column of Table 6. The scale efficiency of each school is shown in the second-to-last column of this table. It is obtained as the ratio of its output efficiency in model (13) to its efficiency in technology T R VRS . Let us now consider the RTS characterization of schools located on the frontier of technology T R VRS . As highlighted in Sect. 7, because this technology is not convex, the standard RTS characterization of its frontier is generally not well-defined and is not indicative of the direction to MPSS. Instead, we use the global RTS characterization to identify the direction to MPSS for each school.
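The ratio used to obtain the scale efficiency column can be illustrated as follows. This is a minimal sketch with partly hypothetical numbers: the technical efficiency 0.9337 is the R-VRS efficiency quoted for School 1 in Remark 5, whereas the overall efficiency 0.85 is an invented value used only to show the arithmetic and is not taken from Table 6. The function name `scale_efficiency` is likewise an assumption.

```python
def scale_efficiency(overall_efficiency, technical_efficiency):
    """Scale efficiency = efficiency in the cone model (13) / efficiency in the R-VRS model."""
    if not 0 < technical_efficiency <= 1:
        raise ValueError("technical efficiency must lie in (0, 1]")
    if not 0 < overall_efficiency <= technical_efficiency:
        raise ValueError("overall efficiency cannot exceed technical efficiency")
    return overall_efficiency / technical_efficiency


technical = 0.9337   # School 1, R-VRS efficiency quoted in Remark 5
overall = 0.85       # hypothetical value, NOT taken from Table 6
scale = scale_efficiency(overall, technical)
print(round(scale, 4))
```

By construction, the overall efficiency is the product of the technical and scale components, which is the decomposition referred to in Sect. 4.4.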
As already established, 23 schools from the sample are output radial efficient and are located on the frontier of technology T R VRS . (The efficiency of these schools in the R-VRS model is equal to 1, as shown in the second column of Table 6.) As discussed in Sect. 7.2, the characterization of GRS for these schools can be obtained by comparing their efficiency in the NIRS and NDRS programs (18) and (19).
The last column of Table 6 shows the resulting GRS characterization. For example, School 2 is at MPSS (or, equivalently, exhibits G-CRS). School 11 exhibits G-IRS and is, therefore, smaller than its MPSS. (Note that, in this case, for the optimal values of programs (18) and (19), we have η 1 = 1/0.9463 > 1/1 = η 2 .) School 24 exhibits G-DRS and is, therefore, larger than its MPSS.

Remark 5 For the inefficient schools, the last column of Table 6 shows the GRS characterization of their radial projections on the boundary of technology T R VRS (see Remark 4 for the discussion of GRS evaluated at the projections of inefficient DMUs). For example, School 1 is inefficient. Improving its efficiency by a proportional increase of both outputs (by the factor 1/0.9337 = 1.071) would result in a school that exhibits G-IRS. Such a projection would be smaller than its MPSS.

Evaluation with respect to volume measures only
Consider the evaluation of scale efficiency and RTS with respect to volume measures only, by keeping the ratio input and output fixed. Define I′ = {input 1, input 2} and O′ = {output 1}. In this scenario, we investigate the relationship between the volume parameters of the schools (teachers, expenses and pupils) by assuming that the socio-economic characteristic of the school catchment area (ratio input 3) and quality of education (ratio output 2) remain unchanged.
In this case, model (13) becomes program (14). Models (18) and (19) are obtained from the latter program by the reinstatement of constraint (13j) and the incorporation of additional constraints σ ≤ 1 and σ ≥ 1, respectively. Table 7 shows the results of computations in a format similar to Table 6. The difference with the previous scenario is that the efficiencies of all schools are now evaluated with respect to the volume output 1 only. 16 Also, as discussed in Sect. 7.1, in this scenario, the local and global characterizations of RTS are identical.
15 Solving the output-oriented R-VRS model as a mixed integer linear program was discussed in Sect. 3.2 of Olesen et al. (2017). Solving the cone model (13) and its NIRS and NDRS variants (18) and (19) (in all three scenarios considered in the current application) is similar to solving the R-CRS models, as also discussed by Olesen et al. (2017). In all these models, we restate the nonlinear inequalities (13f) and (13h) as "either-or" conditions and linearize them using the common "Big M" approach. The big constant M can be assessed using the constraints of program (13). In all our computations, we used the same value M = 100.
Consider, for example, School 3. As shown in Table 7, this school exhibits DRS (or G-DRS in the equivalent terminology of GRS). This means that a marginal proportional increase of the number of teachers and expenses of this school would result in a smaller rate of increase of the number of pupils, assuming that the ratio measures (input 3 and output 2) remain constant. From a policy-maker's perspective, School 3 is scale inefficient and moving to MPSS means making the school smaller, without any change to the socio-economic environment and without affecting the quality of education.

Evaluation with respect to ratio measures only
Suppose that we are interested in the impact of the socio-economic environment on the proportion of pupils going to university, while keeping the volume measures fixed. Define I′ = {input 3} and O′ = {output 2}. Table 8 shows the results of computations using appropriately specified programs (13), (18) and (19).
For example, consider School 11. Because it exhibits G-IRS, it is smaller than its MPSS evaluated with respect to the ratio input and output. This implies that, if we keep the size of the school unchanged (represented by the number of teachers, expenses and pupils), then the socio-economic characteristic of the pupil intake (percentage of pupils not eligible for free school meals) has a more than proportional impact on the quality of education (proportion of pupils going to university).

Notes on computations with small ratios
As mentioned in Footnote 13, in the discussed application, output 2 represents the proportion of final-year pupils proceeding to higher education. We could, of course, convert these proportions to percentages, by rescaling the proportions by a factor of 100. However, we preferred to keep this output unchanged, in line with the way the data were reported by the Department for Education.
Using proportions and any other very small numerical values (compared, for example, with the numerical values for school expenditure, all of which are in excess of 1000) in DEA models may present well-known computational problems, and rescaling the data prior to calculations may be required in order to obtain correct results. (In the reported application, we were aware of potential problems and repeated our computations with the rescaled ratios, but did not detect any discrepancy between the results.) The scaling of linear optimization problems is a technique utilized in linear optimization solvers with the aim of improving the conditioning of the constraint matrix and decreasing the computational effort required for the solution (see, e.g., Bixby 2002). As observed by Elble and Sahinidis (2012), scaling provides a relative point of reference for absolute tolerances. This is especially important when solving the linearized counterparts of models (13), (18) and (19), because the linearization of constraints (13f) and (13h) involves one binary variable used with one "Big M" for each DMU in the sample. (The value of "Big M" should be chosen sufficiently large but not much larger than necessary. To avoid excessively large "Big M" values, it is valid to use two different values, one for constraints (13f) and the other for (13h).) Absolute tolerances are applied at every node of the branch-and-bound tree, whose relaxations are solved by the simplex method, to determine when a binary variable in a relaxed solution is considered to be integer or a reduced cost coefficient is considered to be nonnegative. The use of absolute tolerances is challenging when, for example, rows in the constraint matrix are of different numerical magnitude, as in the constraint matrix for the data set in Table 10. For example, in the discussed application, the numerical values for school expenditure are up to five orders of magnitude larger than the corresponding proportions of pupils proceeding to higher education.
A further computational challenge may arise with the use of the "Big M" approach to the linearization of optimization programs with an unbalanced constraint matrix.
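The interaction between an oversized "Big M" and an absolute tolerance can be illustrated with a minimal Python sketch. It considers a generic constraint of the form a <= M * delta with a binary variable delta, which is an assumed stand-in and not the actual form of constraints (13f) or (13h). The tolerance `INT_TOL`, the relaxed value `delta` and both values of M are hypothetical and are not taken from the reported application or from any particular solver.

```python
# Illustrative sketch: how an oversized Big M interacts with an absolute
# integrality tolerance. All numbers are hypothetical.

INT_TOL = 1e-5  # assumed absolute tolerance for accepting a value as integer


def leaked_slack(big_m: float, delta: float) -> float:
    """Right-hand side M * delta of a generic constraint a <= M * delta."""
    return big_m * delta


def accepted_as_zero(delta: float, tol: float = INT_TOL) -> bool:
    """True if a relaxed binary value would be treated as exactly 0."""
    return abs(delta) <= tol


delta = 5e-6  # relaxed value of a binary variable, within tolerance of 0

for big_m in (1e3, 1e9):
    slack = leaked_slack(big_m, delta)
    print(f"M = {big_m:.0e}: delta accepted as 0: {accepted_as_zero(delta)}, "
          f"constraint still allows a <= {slack:g}")
```

With the moderate M the constraint is switched off almost exactly, whereas with the very large M a value of delta that passes the integrality test still leaves a substantial right-hand side. This is one reason for choosing "Big M" no larger than necessary.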
In the reported application, we could balance the constraint matrix by restating expenditures in £100,000, pupils in hundreds and their proportion going to higher education as percentages. This would result in all data being in the range [0, 100]. We decided to keep the data as is rather than rescale it. This did not prove to be a problem for industry-leading solvers, e.g., CPLEX or Gurobi, because their presolvers include sophisticated rescaling of the constraint matrix by default.
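The effect of such a change of units can be sketched in a few lines of Python. The three data rows below are hypothetical and are not taken from Table 10; the helper names `rescale` and `magnitude_spread` are introduced here for illustration only. The scaling factors correspond to the units mentioned above (expenditure in £100,000, pupils in hundreds, proportions as percentages).

```python
import math

# Hypothetical rows: (expenditure in GBP, number of pupils, proportion to HE)
raw = [
    (4_200_000.0, 850.0, 0.62),
    (6_900_000.0, 1320.0, 0.48),
    (3_100_000.0, 610.0, 0.71),
]

# Unit changes: GBP 100,000; hundreds of pupils; percentages
factors = (1e-5, 1e-2, 1e2)


def rescale(rows, column_factors):
    """Multiply every column by its own scaling factor."""
    return [tuple(v * f for v, f in zip(row, column_factors)) for row in rows]


def magnitude_spread(rows):
    """Orders of magnitude between the largest and smallest nonzero entries."""
    values = [abs(v) for row in rows for v in row if v != 0]
    return math.log10(max(values) / min(values))


scaled = rescale(raw, factors)
print(f"spread before: {magnitude_spread(raw):.1f} orders of magnitude")
print(f"spread after:  {magnitude_spread(scaled):.1f} orders of magnitude")
```

Because each measure is multiplied by its own positive constant, this is only a change of units of measurement; the sketch merely shows how much narrower the numerical range of the data becomes.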

Conclusion
The notions of the most productive scale size, scale and overall efficiency play an important role in efficiency and productivity analysis based on the conventional VRS model. This paper develops an extension of such scale characteristics to the R-VRS technology of Olesen et al. (2015). The latter technology is defined axiomatically and allows both volume and ratio inputs and outputs to be incorporated in the model.
Following the approach of Banker (1984), we start by defining the notion of MPSS in the R-VRS technology. Continuing with the same approach, we define and interpret scale efficiency as a measure of divergence from MPSS and decompose the overall efficiency of a DMU into its technical and scale components.
Similar to the case of conventional VRS technology, the evaluation of MPSS and scale efficiency of a DMU in the R-VRS technology turns out to be equivalent to the assessment of its output radial efficiency in the closed cone extension of the R-VRS technology. Obtaining an explicit statement of such a cone extension suitable for optimization is a nontrivial task accomplished in our paper.
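The decomposition can be sketched in symbols. The notation here is assumed for illustration and does not reproduce the paper's numbered formulas: let $\eta_T \ge 1$ be the output radial improvement factor of a DMU in the R-VRS technology, let $\eta_C \ge \eta_T$ be the corresponding factor in the closed cone extension, and let efficiency be expressed as the inverse of the improvement factor.

```latex
\underbrace{\frac{1}{\eta_C}}_{\text{overall efficiency}}
  \;=\;
\underbrace{\frac{1}{\eta_T}}_{\text{technical efficiency}}
  \times
\underbrace{\frac{\eta_T}{\eta_C}}_{\text{scale efficiency}},
\qquad 1 \le \eta_T \le \eta_C .
```

For instance, with the hypothetical values $\eta_T = 1.25$ and $\eta_C = 1.6$, the technical efficiency is 0.8, the scale efficiency is 0.78125 and the overall efficiency is 0.625. Under this reading, a DMU is at MPSS when both factors are equal to 1.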
Despite the conceptual similarities between the cases of VRS and R-VRS technologies, there is an important distinction to be noted. In the former case, the closed cone extension of the VRS technology is the CRS technology. This allows the scale efficiency in the VRS technology to be alternatively interpreted as the technical efficiency in the benchmark CRS technology. However, as shown in this paper, the closed cone extension of the R-VRS technology is generally different from the R-CRS technology axiomatically defined by Olesen et al. (2015). As demonstrated by examples in this paper, the MPSS and scale efficiency in the R-VRS technology defined according to the approach of Banker (1984) cannot generally be evaluated using the R-CRS technology.
In order to keep the exposition more general and suitable for different applications, we allow the evaluation of MPSS and scale efficiency with respect to any selected subsets of inputs and outputs, while keeping the remaining measures constant. This corresponds to the scenario in which we want to test the response of a selected subset of outputs to changes of a selected subset of inputs while treating the other inputs and outputs as fixed exogenous measures.
We also consider returns-to-scale characterizations of the production frontier of the R-VRS technology. Depending on the choice of inputs and outputs with respect to which we evaluate the MPSS and scale efficiency of a DMU, we employ the local and global variants of the notion of returns to scale. For a scale inefficient DMU, the type of returns to scale (increasing or decreasing) is indicative of the direction to its MPSS, when the resizing is allowed only with respect to the selected inputs and outputs.
To outline further potential research avenues, it is worth noting that the use of ratio data presents similar difficulties in many alternative convex DEA models, for example the models based on assumptions of weak disposability, models incorporating value judgements in the form of weight restrictions and models with complex network structures. It is clear that ratio inputs and outputs are inconsistent with the assumption of convexity and the additional assumption of scalability often made in such models. Exploring the ways to incorporate ratio data and developing approaches to the efficiency and RTS evaluation in such models is a challenging task for future research.

Funding Not applicable.
Code availability Supplementary GAMS code is available online.

Conflict of interest The authors have no conflicting interests.
Ethical approval Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

DMUs, there exists some (not necessarily unique) convergent subsequence $\{\lambda^{k_t}\}$, $t = 1, 2, \ldots$, of the sequence $\{\lambda^k\}$. Let the limit point of $\{\lambda^{k_t}\}$ be $\lambda^*$. Because $[0, 1]^n$ is a closed set, $\lambda^* \in [0, 1]^n$. Therefore, $\{\lambda^{k_t}\} \to \lambda^*$ as $t \to +\infty$. To simplify the proof, and without loss of generality, let us assume that the sequence $\{\lambda^k\}$ itself is converging, i.e., that $\{\lambda^k\} \to \lambda^*$ as $k \to +\infty$.
Let us prove that the sequence $\{\sigma^k\}$ is contained in a compact set. Because $\{X^k\}$ converges to $X^*$, it is bounded and there exists an $M > 0$ such that $X^k_i < M$, for all $i \in I$ and $k = 1, 2, \ldots$ Because $\lambda^*$ satisfies (8i), there exists a $j \in J$ such that $\lambda^*_j > 0$. Because $\{\lambda^k_j\} \to \lambda^*_j$ as $k \to +\infty$, there exist an $\varepsilon > 0$ and $\bar{k} \ge 1$ such that $\lambda^k_j \ge \varepsilon > 0$, $\forall k > \bar{k}$. The inequalities (8c) and (8g) stated for all DMUs $(X^k, Y^k)$, $k > \bar{k}$, imply [...]

By definition, we have $0 < \varepsilon \le \lambda^k_j$ and $M > X^k_i$, $\forall k > \bar{k}$ and $\forall i \in I$. Then, from the inequalities (20), for all $k > \bar{k}$, we have [...]

By Assumption 1, there exists an $i \in I$ such that $X_{ji} > 0$. If $i \in I^V$, then $X^V_{ji} > 0$ and (21a) implies that $\sigma^k \le \sigma_1$, for all $k > \bar{k}$, where $\sigma_1 = M/(\varepsilon X^V_{ji}) > 0$. If $i \in I^R$, then $X^R_{ji} > 0$ and (21b) implies that $\sigma^k \le \sigma_2$, for all $k > \bar{k}$, where $\sigma_2 = M/X^R_{ji} > 0$. Denote $\bar{\sigma} = \min\{\sigma_1, \sigma_2\}$. We have proved that the sequence $\{\sigma^k\}$, $k \ge \bar{k}$, is contained in the compact set $[0, \bar{\sigma}]$. Then there exists its subsequence that converges to some $\sigma^* \in [0, \bar{\sigma}]$. Without loss of generality, assume that the sequence $\{\sigma^k\}$ itself is converging, i.e., that $\{\sigma^k\} \to \sigma^* \in [0, \bar{\sigma}]$ as $k \to +\infty$.
(ii) Let $\hat{\sigma} = 0$. We now use the following extended notation for any DMU $(X, Y)$: [...]

In (22), we use the single prime symbol for the measures in the sets $I'$ and $O'$, and the double prime for the measures in $I \setminus I'$ and $O \setminus O'$. For example, the subvector $X^{V'}$ includes inputs $i \in I^V \cap I'$, and $Y^{R''}$ includes outputs $r \in O^R \setminus O'$. The subvector $X^{R1'}$ includes the inputs from the set $I^R \cap I'$ for which we have a finite bound in (1). The subvector $X^{R2'}$ includes the inputs from $I^R \cap I'$ for which the bound in (1) is either not specified or is infinite. Because $\hat{\sigma} = 0$, inequalities (8a) and (8l) imply that $\hat{Y}^{V'}$ and $\hat{X}^{R1'}$ are zero vectors. By (8i), there exists a $j' \in J$ such that $\hat{\lambda}_{j'} > 0$. Then the inequality (8e) for $j = j'$ implies that $\hat{Y}^{R'}$ is also a zero vector. Using notation as in (22) and replacing the three zero subvectors by $0$, state DMU $(\hat{X}, \hat{Y})$ as follows: [...]

Define the sequence of DMUs $\{(X_{(k)}, Y_{(k)})\}$, $k = 1, 2, \ldots$, as follows. Denote $\hat{J} = \{j \in J : \hat{\lambda}_j > 0\}$. For each $k$, define [...] where max is the operator of component-wise maximum. For example, each component $X^{R1'}_{(k),i}$ is defined as the maximum of the corresponding components $X^{R1'}_{j,i}$, where $j \in \hat{J}$. Let the other components of $(X_{(k)}, Y_{(k)})$ be the same as in $(\hat{X}, \hat{Y})$. Note that component $X^{R1'}_{(k)}$ satisfies (1). It is straightforward to verify that each DMU $(X_{(k)}, Y_{(k)})$ satisfies (2) with the same $\lambda = \hat{\lambda}$ and is therefore in $T^{R}_{\mathrm{VRS}}$. Using notation (4), where $\phi = \psi = 1/k$, define $(\hat{X}_{(k)}, \hat{Y}_{(k)}) = (X_{(k)}(1/k), Y_{(k)}(1/k))$, for any $k = 1, 2, \ldots$ We have: [...]

By definition, $(\hat{X}_{(k)}, \hat{Y}_{(k)}) \in C(I', O')$. DMU $(\hat{X}, \hat{Y})$ stated in (23) is the limit point of $\{(\hat{X}_{(k)}, \hat{Y}_{(k)})\}$. Therefore, $(\hat{X}, \hat{Y}) \in \bar{C}(I', O')$. Cases (i) and (ii) imply that $W \subseteq \bar{C}(I', O')$.
Lemma 2 If Assumptions 1 and 2 are true, the supremum of program (11) is attained.
Proof of Lemma 2 By Theorem 2, the feasible region of program (11) includes all triplets $(\lambda, \sigma, \eta)$ that satisfy conditions (8) in which DMU $(X, Y)$ is substituted by $(X_o, Y_o(\eta))$. It suffices to prove that the set of such triplets is closed and bounded. Conditions (8i) and (8n) imply $\lambda \in [0, 1]^n$. As proved in Lemma 1, Assumption 1 implies that there exists a $\bar{\sigma} > 0$ such that $\sigma \in [0, \bar{\sigma}]$. Then condition (8a) stated as $\sum_{j \in J} \lambda_j \sigma Y^V_{jr} \ge \eta Y^V_{or}$, for all $r \in O^V \cap O'$, and Assumption 2 imply that there exists an upper bound $M > 0$ on $\eta$. Therefore, the set is bounded. Furthermore, it is closed because it is defined by nonstrict inequalities and an equality with continuous functions on both sides. Because the objective function $\eta$ is continuous on this set, its supremum is attained at some triplet $(\lambda^*, \sigma^*, \eta^*)$ in the set.

Proof of Theorem 4
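By definition (Olesen et al. 2015), technology $T^{P}_{\mathrm{CRS}}$ is the intersection of all sets in $\mathbb{R}^m_+ \times \mathbb{R}^s_+$ that satisfy Axioms 1-4. Similarly, technology $T_{\mathrm{CRS}}$ is the intersection of all sets in $\mathbb{R}^m_+ \times \mathbb{R}^s_+$ that satisfy Axioms 1, 2, 4 and the standard axiom of convexity. It suffices to prove that, under the assumptions of Theorem 4, Axiom 3 of selective convexity implies the standard axiom of convexity. (The opposite is always true.) To be specific, let the single ratio measure be a ratio input. (The case of ratio output is similar and is not considered.)

Consider any two DMUs $(\tilde{X}^V, \tilde{X}^R, \tilde{Y}^V) \in T^{P}_{\mathrm{CRS}}$ and $(\hat{X}^V, \hat{X}^R, \hat{Y}^V) \in T^{P}_{\mathrm{CRS}}$, and any scalar $\gamma \in [0, 1]$. Define

$$X^V_\gamma = \gamma \tilde{X}^V + (1 - \gamma)\hat{X}^V, \quad X^R_\gamma = \gamma \tilde{X}^R + (1 - \gamma)\hat{X}^R, \quad Y^V_\gamma = \gamma \tilde{Y}^V + (1 - \gamma)\hat{Y}^V. \qquad (24)$$

We need to prove that $(X^V_\gamma, X^R_\gamma, Y^V_\gamma) \in T^{P}_{\mathrm{CRS}}$. As assumed, $\tilde{X}^R > 0$ and $\hat{X}^R > 0$. Then $X^R_\gamma > 0$, and we can also define the two strictly positive scaling factors

$$\tilde{\alpha} = X^R_\gamma / \tilde{X}^R, \quad \hat{\alpha} = X^R_\gamma / \hat{X}^R. \qquad (25)$$

From (24), we have $\gamma \tilde{X}^R + (1 - \gamma)\hat{X}^R = X^R_\gamma$. Dividing both sides of this equality by $X^R_\gamma > 0$, noting (25) and rearranging, we have

$$\frac{\gamma}{\tilde{\alpha}} + \frac{1 - \gamma}{\hat{\alpha}} = 1. \qquad (26)$$

Restate (24) as follows:

$$X^V_\gamma = \frac{\gamma}{\tilde{\alpha}}\,(\tilde{\alpha}\tilde{X}^V) + \frac{1 - \gamma}{\hat{\alpha}}\,(\hat{\alpha}\hat{X}^V), \quad X^R_\gamma = \frac{\gamma}{\tilde{\alpha}}\,(\tilde{\alpha}\tilde{X}^R) + \frac{1 - \gamma}{\hat{\alpha}}\,(\hat{\alpha}\hat{X}^R), \quad Y^V_\gamma = \frac{\gamma}{\tilde{\alpha}}\,(\tilde{\alpha}\tilde{Y}^V) + \frac{1 - \gamma}{\hat{\alpha}}\,(\hat{\alpha}\hat{Y}^V). \qquad (27)$$

By Axiom 4, the scaled DMUs $(\tilde{\alpha}\tilde{X}^V, \tilde{\alpha}\tilde{X}^R, \tilde{\alpha}\tilde{Y}^V)$ and $(\hat{\alpha}\hat{X}^V, \hat{\alpha}\hat{X}^R, \hat{\alpha}\hat{Y}^V)$ are in $T^{P}_{\mathrm{CRS}}$. Note that the scaling is defined in such a way that the resulting ratio inputs are now equal. Indeed, from (25), $\tilde{\alpha}\tilde{X}^R = \hat{\alpha}\hat{X}^R = X^R_\gamma$. Taking into account (26) and (27), DMU $(X^V_\gamma, X^R_\gamma, Y^V_\gamma)$ is a convex combination of these two scaled DMUs. By Axiom 3, $(X^V_\gamma, X^R_\gamma, Y^V_\gamma) \in T^{P}_{\mathrm{CRS}}$.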
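The construction used in this proof can be checked numerically. The following Python sketch uses exact rational arithmetic and two hypothetical DMUs, each with one volume input, one ratio input and one volume output. It takes each scaling factor to be the ratio of the combined ratio input to the DMU's own ratio input, which is our reading of definition (25); the variable names are introduced here for illustration only.

```python
from fractions import Fraction as F

# Hypothetical DMUs: (volume input, ratio input, volume output)
dmu_a = (F(10), F(2, 5), F(6))   # ratio input 0.4
dmu_b = (F(4), F(4, 5), F(3))    # ratio input 0.8
gamma = F(1, 4)

# Target: ordinary convex combination of the two DMUs
target = tuple(gamma * a + (1 - gamma) * b for a, b in zip(dmu_a, dmu_b))

# Scaling factors that bring both ratio inputs to the target ratio input
alpha_a = target[1] / dmu_a[1]
alpha_b = target[1] / dmu_b[1]
scaled_a = tuple(alpha_a * v for v in dmu_a)
scaled_b = tuple(alpha_b * v for v in dmu_b)

# Weights of the convex combination of the two scaled DMUs
w_a = gamma / alpha_a
w_b = (1 - gamma) / alpha_b
combo = tuple(w_a * a + w_b * b for a, b in zip(scaled_a, scaled_b))

print("weights sum to one:", w_a + w_b == 1)
print("ratio inputs equal after scaling:",
      scaled_a[1] == scaled_b[1] == target[1])
print("combination reproduces target:", combo == target)
```

Since the two scaled DMUs share the same ratio input, combining them requires only selective convexity, yet the result coincides with the ordinary convex combination of the original DMUs.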

Lemma 3
The incorporation of the additional inequality $\eta \ge 1$ in the constraints of programs (13) and (28) (as defined in the proof of Theorem 5) does not affect the value and attainability of their suprema $\hat{\eta}$ and $\eta^*$, and does not affect any of their optimal solutions. If Assumption 3 is satisfied, in any of the feasible solutions $(\lambda, \sigma, \eta)$ of the resulting programs, we have $\sigma > 0$.