1 Introduction

Efficiency and productivity in the public sector are major concerns for most parliaments and governments. The Swedish Budget Act (SFS 2011:203) explicitly states that all state services should be provided with a high level of efficiency. For state-provided production, this means using the least amount of inputs at a given output level or producing the maximum amount of output at a given resource level. The motivation for studying productivity development is therefore, from a policy perspective, straightforward. Swedish applications can, for instance, be found for higher education (Andersson et al. 2017a), employment offices (Andersson et al. 2014), day care (Bjurek et al. 1992), elderly care, and primary and secondary education (Arnek et al. 2016). International evidence suggests that economic development benefits from an efficient judicial system (Feld and Voigt 2003; Messick 1999). Furthermore, there is an ongoing debate in the media, as well as in research, regarding punishments, criminality, and the judicial system in Sweden (Sturup et al. 2018; Tyrefors Hinnerich et al. 2017). For instance, the share of solved suspicions declined from 2004 to 2014, according to Nordén (2015).

The Swedish Government has launched a number of reforms for the district courts during the last 20 years, with the major objective of increasing efficiency and productivity, while maintaining a high degree of law and order (Swedish Agency for Public Management 2007). One such reform has targeted the size of the courts, based on the assumption that scale advantages exist. In 1999, there were 96 district courts in Sweden, but today only 48 exist. Despite the reforms, the Swedish National Court Administration (SNCA) reports that productivity has declined. However, the SNCA study has a weakness: productivity is measured by partial measures (i.e., labour productivity), which ignore substitution between inputs. At the same time, new technologies have been introduced in the courts (e.g., the possibility of conducting hearings through video conferences), and these changes are not captured by the productivity measures used.

The aim of this study is to measure the Swedish district courts’ total factor productivity (TFP) from 2012 to 2015. To compute TFP, distances obtained by the data envelopment analysis (DEA) framework, proposed by Charnes et al. (1978), are used. TFP is measured by the Malmquist productivity index (MPI), which was first proposed by Caves et al. (1982) and first applied in a DEA framework by Färe et al. (1994a).Footnote 1 Following Wheelock and Wilson (1999), the MPI is decomposed into four parts: changes in (1) pure technical efficiency; (2) scale efficiency; (3) pure technology; and (4) the scale of the technology. A commonly known issue with all types of DEA is that it does not, by itself, allow statistical inference (Simar and Wilson 2000). In this study, a bootstrap approach is used to determine the confidence intervals of the different components presented above (Efron 1979). Another issue with DEA is the influence of outliers (Kapelko and Oude Lansink 2015). The analysis of outliers is, to a large extent, omitted in previous studies on court performance. In this study, an outlier detection analysis is performed to investigate whether the results depend on a few extreme observations.Footnote 2

The findings indicate a decline in TFP of 1.7% on average. However, there is substantial variation between courts and between years. Between 36 and 57% of the courts have a significantly negative change in TFP, while the share of courts with a significantly positive TFP change is 16–36%. This is because courts that improve in one year may show a decline in TFP the following year. Looking at the components, the negative impact is driven by a decline in pure technical change (TC) of 4.7% in 2012–2013. Further, the TFP change is significantly negative during 2014–2015. During this period, more courts have a significant decline in TFP, and fewer courts have a positive and significant TFP development, than in the other years. The correlation analysis concludes that the rate of change in the caseload has a significantly positive correlation with TFP, which indicates flexibility problems. The caseload variable is defined as the number of pending cases and matters at the start of the year, plus new cases and matters during the year.

This paper is organised as follows. Section 2 provides a brief summary of the Swedish judicial system. Section 3 presents the previous TFP and efficiency literature regarding courts. Section 4 describes the methodology. Section 5 examines the data, including outlier detection. The results are reported in Sect. 6. Finally, Sect. 7 concludes and discusses the policy implications.

2 The Swedish judicial system: a short description

The Ministry of Justice is responsible for matters that are related to the judicial system, which include legislation in the fields of civil law and criminal law, for example.Footnote 3 However, the Ministry is not allowed to interfere in the day-to-day work of the courts, since the aim of the Swedish legal justice system is to provide fair trials. This requires independence and autonomy, both between courts and in relation to the Parliament, Government, and other authorities. The judicial process differs, depending on whether it is a criminal case, a civil case, or a matter. The different processes are described in Fig. 1. Each stage has the general purpose of dealing with cases and matters in an efficient manner and in compliance with the rule of law.

Fig. 1 Description of the judicial process

Criminal cases, to the left in Fig. 1, are first handled by the police. These cases start with a police report, followed by a preliminary investigation. In the next step, the case can either be closed or sent to the prosecutor, who will decide whether the case will be prosecuted. If the case continues to prosecution, it will end up in a district court. Initially, a dispute is handled by the municipalities; however, if it remains unsolved, it becomes a civil case in a district court. Civil cases are related to a dispute between individuals or business firms. Matters are regulated in the Court Matters Act (SFS 1996:242) and can be separated into four categories: (1) debt clearances; (2) debt enforcements; (3) bankruptcies or company reconstructions; and (4) other matters. Categories 1–3 relate to payment problems, as shown in Fig. 1. Debt clearances and debt enforcements are decided by the Swedish Enforcement Agency and adjudicated, as a matter, by the district courts if the decision of the Swedish Enforcement Agency is appealed (SFS 1981:774). A decision of bankruptcy must be decided by the district court, which is also the case if a business firm applies for bankruptcy. The fourth category, named ‘other matters’ in Fig. 1, includes a variety of matters, for example: estate administrators, parking remarks, heritages, and custodians.Footnote 4

There are three different types of courts that build up the Swedish court system, namely, the general courts, the administrative courts, and special tribunals. The general courts consist of the district courts, the Courts of Appeal, and the Supreme Court. Each of the instances is important, since different instances provide possibilities to appeal to achieve a fair trial, which is a fundamental right in any legal justice system. The Supreme Court, which is the last instance, has the main mission to provide the district court system with legal practice to enhance the uniformity of actions in legal decisions.

This study focuses on the district courts, which have the mission to serve as the first instance in the legal system. Each district court mainly handles cases related to its catchment area, which corresponds to the surrounding geographical area. However, there are five courts that specialise in land and environment cases. These courts deal, for example, with environmental and water issues, property registration, and building matters. Within each court, there are Chief Judges, Senior Judges, and Judges, who are considered permanent judges; the Chief Judge is the head of the court. Each judge is appointed by the government. There are also law clerks who work as non-permanent judges, including both recent law graduates in the training programme to become a permanent judge and regularly employed law clerks who are not included in the judge training. The work tasks of the law clerks normally consist of preparing cases, but can also include deciding simple cases as a non-permanent judge. Finally, Lay Judges have experience from other occupations and politics and are chosen by the Municipal Council, but they are not educated in law. They work as judges for a period of 4 years.

3 Literature review

There is existing literature that focuses on the labour productivity of courts (Blank et al. 2004; SNCA 2015), but there is only a limited amount of literature that considers court TFP. Kittelsen and Førsund (1992) are the first to investigate TFP change over time. The efficiency scores of Farrell (1957) are used to calculate the MPI, which is decomposed into change in efficiency and technology, with the first year as the base (Caves et al. 1982). In terms of the decomposed factors, the catching-up was 4% and the technology shifted 2% from 1983 to 1988. Kittelsen and Førsund (1992) perform an outlier detection analysis, in which the MPI and its components are shown in a histogram, with the labour share on the x-axis. Based on the diagrams, three courts are considered to be outliers, due to a large improvement or decline in TFP.Footnote 5 Fauvrelle and Almeida (2016) calculate the MPI and decompose it into TC and efficiency change (EC).

Following Färe et al. (1994b), EC is further decomposed into a pure EC and a scale component.Footnote 6 The results show, on average, a positive TFP change of 1.5%, which is decomposed into a decline of 1.7% in TC, a pure EC of 3.3%, and a scale EC of 0.7%.Footnote 7 Both Fauvrelle and Almeida (2016) and Kittelsen and Førsund (1992) use the averages of TFP change and decompose them into, at most, three components. However, neither of them investigates whether the changes are statistically significant. Finally, Falavigna et al. (2017) contribute to the literature by applying a bootstrapped MPI in a two-stage analysis, proposed by Simar and Wilson (2007), to investigate the impact of structural changes in Italian district courts during 2009–2011. The MPI is found to be 0.3%, EC − 0.1%, and TC 0.4%. Further, they conclude that the role of judges is correlated with court productivity and efficiency.

While only a few studies exist on TFP change, efficiency is measured more extensively. Such research is important for this study, since it deals with the question of which inputs and outputs are best for measuring performance.Footnote 8 Lewin et al. (1982) are the first to investigate inefficiency in district courts, using DEA.Footnote 9 Lewin et al. (1982), as well as all the other studies, use the number of employees as an input. In some studies, employees are measured as the number of judges (Falavigna et al. 2017; Ferrandino 2014; Finocchiaro Castro and Guccio 2014). In other studies, the personnel are separated into judges and office staff (Major 2015; Santos and Amado 2014). The caseload of a court is another input included in some studies. The caseload consists of pending and new cases; that is, the demand for justice services (Kittelsen and Førsund 1992; Schneider 2005). For instance, Nissi and Rapposelli (2010) and Schneider (2005) argue that the caseload should be included, since productivity will otherwise be underestimated: employees cannot perform their job without incoming or pending cases. However, this argument is somewhat contradictory when analysing TFP, since courts should be able to adjust inputs when justice demands change. This will be discussed, in more detail, in Sect. 5.

Moreover, Beenstock and Haitovsky (2004) argue that individual productivity increases if the work pressure is high. However, the caseload can also, as Kim and Min (2016) argue, correlate negatively with quality. For example, if the caseload is low, more time can be spent on each case, which, on average, generates a more precise judgement. Outputs normally consist of the number of decided cases (Falavigna et al. 2017; Nissi and Rapposelli 2010). In some studies, cases are separated by type; for example, criminal cases and civil cases (Finocchiaro Castro and Guccio 2016). However, due to data limitations, the studies cannot separate outputs within each category based on the resources spent. This aggregation equates, for example, a murder with a car crime. Different types of crimes require different amounts of resources, because they differ in complexity. This becomes a problem in court performance analysis if the mixture of crime types differs between courts.

Quality variables are argued to be important in some studies (Yeung and Azevedo 2011). Some attempts to investigate the impact of quality variables on performance can be found in the literature. Examples are judges’ salaries and education, of which the former have a significantly positive effect on efficiency (Deyneli 2012). Furthermore, Schneider (2005) concludes that having more PhD holders as judges increases efficiency. Falavigna et al. (2015) use court delay, as an undesirable output, in a directional distance function.Footnote 10 Finally, Andersson et al. (2017b) include a quality measure that relates to the number of decisions changed by a superior court, but do not find any significant correlation with efficiency. All of these studies focus on technical efficiency (TE), which is basically a measurement of similarity. For instance, Espasa and Esteller-Moré (2015) argue that the efficiency can be high, even if the courts perform poorly, as long as they are congested. Thus, TE is not a good measurement of performance improvements over time; for example, lower inefficiency over time could occur due to a decline in the performance of the best district courts.

To sum up, there is no research regarding TFP in Sweden and very little literature, internationally. Furthermore, the international studies do not, due to data limitations, investigate the potential heterogeneity in resource spending within the output categories. Moreover, statistical inference is left out, with the exception of Falavigna et al. (2017), and TFP is at most decomposed into three components.

4 Methodology

Different approaches can be applied in productivity and efficiency studies. Stochastic frontier analysis (SFA) is a widely used parametric methodology (Krüger 2012; Kumbhakar and Lovell 2003). SFA has the advantage of directly allowing for statistical noise. However, the disadvantage is that it requires a specific functional form. Another option is the DEA approach, which has the advantage of relying on few assumptions and the capability of handling multiple outputs and inputs. Furthermore, DEA is relevant when analysing the public sector, in which the outputs are not sold on the market (Førsund 2016). However, DEA also has some disadvantages. Firstly, it does not give information about inference. To some extent this can be handled by using resampling methods, such as the bootstrap procedure proposed by Simar and Wilson (1998a). The second disadvantage of DEA is its sensitivity to outliers. This shortcoming is, to a large extent, neglected in previous literature on court performance.

4.1 Outlier detection

There is no optimal procedure to detect outliers, since no generally accepted definition of an outlier can be found (Davies and Gather 1993). However, plenty of methods are applied in different areas. For example, the outlier detection method by Wilson (1993) is useful when data checking is costly (i.e., when the data-set is large). Kapelko and Oude Lansink (2015) use a specific deviation from the median. In DEA, it is important to identify observations that substantially push the frontier, as proposed by Banker and Gifford (1988). This procedure, referred to as the method of super-efficiency, is found to perform well in practical applications in the experiments by Banker and Chang (2006) and by Banker et al. (2017), who conclude that it is robust to different scale assumptions. The focus in this paper is TFP change, and a unit that is super-efficient in one year may change the results. This paper identifies an observation as a potential outlier if the output-based super-efficiency score, assuming constant returns to scale (CRS), is below 0.75. This limit is used in, for example, the robustness investigation by Agrell and Niknazar (2014) and the empirical application by Edvardsen et al. (2017).Footnote 11 Finally, when a potential outlier is identified, a closer look at the specific observation should be taken to produce arguments for why it is an outlier (Simar 2003).
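As a minimal sketch of the screening rule described above, the following Python snippet computes an output-based super-efficiency score under CRS for the special case of one input and one output, where the frontier spanned by the other courts reduces to their best output-input ratio. The single-input, single-output simplification, the function names, and the figures are illustrative assumptions and not the data or the implementation used in this paper; only the 0.75 limit is taken from the text.

```python
def super_efficiency_scores(inputs, outputs):
    """Output-based super-efficiency under CRS, one input and one output.

    Each unit is compared with the frontier spanned by all OTHER units.
    A score below 1 means the unit lies beyond that frontier.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    scores = []
    for i, own in enumerate(ratios):
        best_other = max(r for k, r in enumerate(ratios) if k != i)
        scores.append(best_other / own)
    return scores


def flag_potential_outliers(scores, limit=0.75):
    """Indices of units whose super-efficiency score is below the limit."""
    return [i for i, s in enumerate(scores) if s < limit]


# Hypothetical data: hours worked (input) and weighted decided cases (output).
hours = [100.0, 120.0, 80.0, 90.0]
cases = [50.0, 60.0, 40.0, 90.0]

scores = super_efficiency_scores(hours, cases)
outliers = flag_potential_outliers(scores)
```

In this made-up example only the last unit is flagged, since its output would have to shrink to half its level to reach the frontier formed by the others. A flagged unit is only a candidate and still needs the closer inspection described above.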

4.2 DEA and the Malmquist productivity index

The point of reference can be taken either from an input perspective (i.e., minimise the inputs to produce a given level of output) or from an output perspective (i.e., maximise the output given the level of inputs). As in most studies of district courts, an output-based perspective is assumed. There are, in the scope of courts, two reasons for choosing an output-based perspective. First, inputs are not easily changed in the short-run. Second, the individual court has no incentives to change its inputs, since the budget for employees is given for a specific year. Thus, the maximum output should be carried out using a given level of inputs. The production technology in time period t, for the 48 Swedish district courts, is defined as:

$$S^{t} = \left\{ {\left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)\left| {{\mathbf{x}}_{i}^{t} \;{\text{can produce }}{\mathbf{y}}_{i}^{t} \;{\text{at time }}t} \right.} \right\},$$
(1)

where \(S^{t}\) represents the technology. Each court, i, uses a vector of inputs, \({\mathbf{x}}_{i}^{t}\), to produce a vector of outputs, \({\mathbf{y}}_{i}^{t}\), in period t. Using the output distance function,Footnote 12 the technical efficiency (TE) can, in time period t, be written as:

$${\text{D}}_{\text{O}}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right) = inf\left\{ {\theta :\left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} / \theta } \right) \in S^{t} } \right\},$$
(2)

where \(\theta\) is a scalar and the distance is \({\text{D}}_{\text{O}}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right) \le 1.\) From the distance function, a measure of TE is obtained as \({\text{TE}} = 1/{\text{D}}_{\text{O}}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)\).Footnote 13 If TE is equal to unity, the court is on the frontier, meaning that it is technically efficient. However, if TE is larger than unity, the court is inefficient; for instance, TE equal to 1.1 means that the output can be increased by 10%, given the amount of inputs. To calculate the standard MPI, introduced by Caves et al. (1982), the same calculation needs to be performed for the following period: t + 1. This is shown in Eq. 3.Footnote 14

$${\text{D}}_{C}^{t} \left( {{\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right) = { \inf }\left\{ {\theta :\left( {{\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} /\theta } \right) \in S^{t} } \right\}$$
(3)

In Eq. 3, and hereafter, the C subscript represents CRS. Similarly, Eq. 3 can be written in the variable returns to scale (VRS) case, which is defined as \({\text{D}}_{V}^{t} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)\) where V is the VRS representation. If the technology is not CRS, the MPI does not accurately measure TFP, according to Griffel-Tatjé and Lovell (1995). However, Wheelock and Wilson (1999) state that imposing CRS when the true technology is VRS generates inconsistent distance estimates, which is an argument for not restricting the calculation to one scale assumption.Footnote 15 Using Eqs. 2 and 3, assuming the technology of period t as the reference, Caves et al. (1982) define the MPI as:

$${\text{M}}^{t, t + 1} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} , {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right) = \frac{{{\text{D}}_{C}^{t} \left( {{\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)}}{{{\text{D}}_{\text{C}}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}},$$
(4)

where the MPI is the ratio of the output distance functions in each period, respectively. This paper uses the most common version of the MPI, based on Caves et al. (1982). To avoid an assumption of the benchmark technology, Eq. 4 is often defined as the geometric mean of two indices.Footnote 16
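The arithmetic of Eq. 4 and of the geometric-mean version can be illustrated with a short Python sketch. The distance values below are invented for one hypothetical court, and the function and argument names are ours; the snippet only shows how the index is assembled once the four distances are available.

```python
import math


def malmquist(d_ref_next, d_ref_now):
    """Eq. 4: ratio of output distances measured against one reference technology."""
    return d_ref_next / d_ref_now


def malmquist_geometric_mean(d_t_now, d_t_next, d_t1_now, d_t1_next):
    """Geometric mean of the period-t and period-(t+1) based indices.

    d_t_now   : D^t(x^t, y^t)
    d_t_next  : D^t(x^{t+1}, y^{t+1})
    d_t1_now  : D^{t+1}(x^t, y^t)
    d_t1_next : D^{t+1}(x^{t+1}, y^{t+1})
    """
    m_t = malmquist(d_t_next, d_t_now)
    m_t1 = malmquist(d_t1_next, d_t1_now)
    return math.sqrt(m_t * m_t1)


# Hypothetical CRS distances for one court.
m_t = malmquist(0.90, 0.80)    # period-t technology as reference
m_t1 = malmquist(0.85, 0.78)   # period-(t+1) technology as reference
mpi = malmquist_geometric_mean(0.80, 0.90, 0.78, 0.85)
```

Because the geometric mean always lies between the two single-reference indices, it avoids having to pick one period as the benchmark.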

4.3 Decomposition

Decomposition of the productivity index was first proposed by Nishimizu and Page (1982), who define TFP as the sum of the EC and TC. The geometric mean of the two indices is, following Caves et al. (1982) and Färe et al. (1992, 1994a), obtained by rewriting Eq. 4 as:

$${\text{M}}_{{}}^{t, t + 1} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} , {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right) = \left\{ {\left[ {{\text{M}}_{{}}^{t} \left( { {\mathbf{x}}_{i}^{t} , \varvec{ }{\mathbf{y}}_{i}^{t} , {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right) \times {\text{M}}_{{}}^{t + 1} \left( {{\mathbf{x}}_{i}^{t} , \varvec{ }{\mathbf{y}}_{i}^{t} , {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)} \right]} \right\}^{{\frac{1}{2}}} = \left[ {\frac{{{\text{D}}_{C}^{t} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)}}{{{\text{D}}_{C}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}} \times \frac{{{\text{D}}_{C}^{t + 1} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)}}{{{\text{D}}_{C}^{{{\text{t}} + 1}} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}} \right]^{1/2} ,$$
(5)

where the distance functions are defined, assuming CRS. Based on the geometric mean defined in Eq. 5, the MPI can be decomposed into TC and EC. Following Wheelock and Wilson (1999), the decomposition is, while allowing for VRS, written asFootnote 17:

$${\text{M}}_{{}}^{t, t + 1} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} , {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right) = \left( {\frac{{{\text{D}}_{V}^{t + 1} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)}}{{{\text{D}}_{V}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}} \right) \times \left( {\frac{{{\text{D}}_{V}^{t} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right){\text{D}}_{V}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}{{{\text{D}}_{V}^{t + 1} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right){\text{D}}_{V}^{t + 1} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}} \right)^{1/2} = {\text{EC}} \times {\text{TC}}.$$
(6)

EC is interpreted as changes in the relative efficiency of a court (i.e., movements towards or away from the frontier), while TC measures the shift of the frontier itself.Footnote 18 EC or TC larger (or smaller) than unity indicates an improvement (or decline) between period t and period t + 1.Footnote 19 Allowing both TC and EC to be measured under either VRS or CRS makes the decomposition shown in Eq. 7 possible.

$$\begin{aligned} {\text{M}}^{t, t + 1} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} , {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right) & = \left( {\frac{{{\text{D}}_{C}^{t + 1} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)}}{{{\text{D}}_{C}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}} \right) \times \left( {\frac{{{\text{D}}_{V}^{t + 1} \left( {{\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)/{\text{D}}_{C}^{t + 1} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)}}{{{\text{D}}_{V}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)/{\text{D}}_{C}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}} \right) \\ & \quad \times \;\left( {\frac{{{\text{D}}_{C}^{t} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right){\text{D}}_{C}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}{{{\text{D}}_{C}^{t + 1} \left( { {\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right){\text{D}}_{C}^{{{t} + 1}} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}} \right)^{{\frac{1}{2}}} \\ & \quad \times \left( {\frac{{{\text{D}}_{V}^{t} \left( {{\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)/{\text{D}}_{C}^{t} \left( {{\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)}}{{{\text{D}}_{V}^{t + 1} \left( {{\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)/{\text{D}}_{C}^{t + 1} \left( {{\mathbf{x}}_{i}^{t + 1} , {\mathbf{y}}_{i}^{t + 1} } \right)}} \times \frac{{{\text{D}}_{V}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)/{\text{D}}_{C}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}{{{\text{D}}_{V}^{t + 1} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)/{\text{D}}_{C}^{t + 1} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)}}} \right)^{{\frac{1}{2}}} \\ & = \Delta {\text{PureEff }} \times \Delta 
{\text{ScaleEff }} \times \Delta {\text{PureTech }} \times \Delta {\text{ScaleTech}}.\end{aligned}$$
(7)

\(\Delta {\text{PureTech}}\) and \(\Delta {\text{PureEff}}\) are both defined on the best-practice technologies, according to Ray and Desli (1997) and Färe et al. (1994b), respectively. The scale EC measures the movement towards or away from the technically optimal scale. Finally, the scale of the technology (i.e., \(\Delta {\text{ScaleTech}}\)), proposed by Wheelock and Wilson (1999), represents the scale bias of TC (i.e., the geometric mean of two scale efficiency ratios). This means that any change in \(\Delta {\text{ScaleTech}}\) occurs from a change in the shape of the technology. The first ratio consists of the change in the scale of the technology between t and t + 1. The reasoning of the second ratio is similar, specifically a change in the scale of the technology between t and t + 1, relative to the location of the production unit in period t.Footnote 20 Problems with this decomposition can occur when cross-period distance functions are calculated using the VRS assumption, since it can generate missing values for some components.Footnote 21 Finally, this decomposition is criticised slightly for its confusing interpretation. For example, Wheelock and Wilson (1999) interpret what we call \(\Delta {\text{ScaleTech}}\) as the shape of the technology, while Zofio and Lovell (1998) interpret it as the scale bias of the technology (Balk 2001; Ray 2001).Footnote 22
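To make the bookkeeping of Eq. 7 concrete, the sketch below computes the four components exactly in the order and form in which they are printed in Eq. 7, and the EC and TC terms of Eq. 6, from a set of invented CRS and VRS distances. The dictionary layout and function names are our own assumptions. The snippet is only meant to show that the four printed factors multiply to the EC × TC product of Eq. 6.

```python
import math


def decompose(dc, dv):
    """Four components of Eq. 7 from CRS (dc) and VRS (dv) output distances.

    Keys are (technology period, observation period), with 0 for t and 1 for
    t + 1, so dc[(0, 1)] stands for D_C^t(x^{t+1}, y^{t+1}).
    """
    pure_eff = dc[(1, 1)] / dc[(0, 0)]
    scale_eff = (dv[(1, 1)] / dc[(1, 1)]) / (dv[(0, 0)] / dc[(0, 0)])
    pure_tech = math.sqrt(
        (dc[(0, 1)] * dc[(0, 0)]) / (dc[(1, 1)] * dc[(1, 0)])
    )
    scale_tech = math.sqrt(
        ((dv[(0, 1)] / dc[(0, 1)]) / (dv[(1, 1)] / dc[(1, 1)]))
        * ((dv[(0, 0)] / dc[(0, 0)]) / (dv[(1, 0)] / dc[(1, 0)]))
    )
    return pure_eff, scale_eff, pure_tech, scale_tech


def ec_tc(dv):
    """EC and TC of Eq. 6, written with VRS distances."""
    ec = dv[(1, 1)] / dv[(0, 0)]
    tc = math.sqrt((dv[(0, 1)] * dv[(0, 0)]) / (dv[(1, 1)] * dv[(1, 0)]))
    return ec, tc


# Hypothetical distances for one court (VRS distances are at least as large
# as the CRS ones, since the VRS frontier envelops the data more tightly).
dc = {(0, 0): 0.80, (0, 1): 0.90, (1, 0): 0.78, (1, 1): 0.85}
dv = {(0, 0): 0.88, (0, 1): 0.95, (1, 0): 0.84, (1, 1): 0.92}

components = decompose(dc, dv)
product = math.prod(components)
ec, tc = ec_tc(dv)
```

With real data, a cross-period VRS distance can be infeasible, in which case the corresponding entry would be missing and the components that depend on it cannot be computed.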

To examine TFP and its decomposed factors, the efficiency needs to be calculated. The reciprocal to the output-based Farrell (1957) measure of TE is formulated by Färe et al. (1994b) as:

$$\left[ {{\text{D}}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)} \right]^{ - 1} = {\text{TE}} = \max \theta$$
(8)

Subject to

$$\mathop \sum \limits_{k = 1}^{K} z_{k} x_{k,n} \le x_{i, n} , \quad n = 1, \ldots , N$$
(9)
$$\mathop \sum \limits_{k = 1}^{K} z_{k} y_{k,m} \ge \theta y_{i, m} , \quad m = 1, \ldots , M$$
(10)
$$z_{k} \ge 0, \quad k = 1, \ldots , K \left( {\text{CRS}} \right),$$
(11)
$$\mathop \sum \limits_{k = 1}^{K} z_{k} = 1 \left( {\text{VRS}} \right),$$
(12)

where \({\mathbf{z}}\) is the \(K \times 1\) vector of intensity variables (i.e., weights) with elements \(z_{k}\). The objective is to maximise \(\theta\), which corresponds to minimising the value of the distance function, \({\text{D}}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)\).Footnote 23 For example, if \({\mathbf{y}}_{0}\) is an arbitrarily chosen level of output, the maximum output, given the level of inputs, is calculated as \({\mathbf{y}}_{0} \times {\text{TE}}^{t}\) or similarly as \({\mathbf{y}}_{0} /{\text{D}}^{t} \left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)\).Footnote 24
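The linear programme above needs an LP solver in the general multi-input, multi-output case. As a hedged illustration that runs without one, the following sketch solves the special case of a single input and a single output, in which the optimum can be found by enumeration: under CRS the frontier is the best output-input ratio, and under VRS an optimal solution mixes at most two units. The data and function names are hypothetical, and this is not the routine used for the results in this paper.

```python
def te_crs(inputs, outputs, i):
    """Output-oriented Farrell TE under CRS for unit i (one input, one output)."""
    best_ratio = max(y / x for x, y in zip(inputs, outputs))
    return best_ratio * inputs[i] / outputs[i]


def te_vrs(inputs, outputs, i):
    """Output-oriented Farrell TE under VRS for unit i (one input, one output).

    With one input, an optimal solution uses at most two units, so the largest
    feasible output at input level x_i is found by checking single units and
    pairs of units.
    """
    xi, yi = inputs[i], outputs[i]
    units = list(zip(inputs, outputs))
    # Single units that use no more input than unit i.
    candidates = [y for x, y in units if x <= xi]
    # Convex combinations of two units whose input equals x_i exactly.
    for xk, yk in units:
        for xl, yl in units:
            if xk < xi < xl:
                share = (xi - xk) / (xl - xk)
                candidates.append(yk + share * (yl - yk))
    return max(candidates) / yi


# Hypothetical courts: hours worked (input) and weighted decided cases (output).
hours = [2.0, 4.0, 8.0, 6.0]
cases = [2.0, 6.0, 8.0, 4.0]

crs = [te_crs(hours, cases, i) for i in range(len(hours))]
vrs = [te_vrs(hours, cases, i) for i in range(len(hours))]
potential_output = cases[3] * vrs[3]   # maximum output of the last court under VRS
```

In this example the last court has TE of 2.25 under CRS and 1.75 under VRS, so its output could rise from 4 to 7 under VRS at its current input level. The gap between the two scores reflects scale inefficiency.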

To compute the MPI, four single-period problems are required, assuming CRS and VRS, as well as four mixed-period problems, under CRS and VRS.Footnote 25 These calculations will generate the average MPI. To improve the robustness of the calculated MPIs and draw conclusions based on statistical inference, a bootstrap approach is applied.

4.4 Bootstrapping the Malmquist productivity index

Statistical inference for DEA is most commonly based on bootstrapping (Efron 1979). Bootstrapping, and other resampling techniques, simulate the data-generating process multiple times by resampling from the data and applying the original estimator to each simulated sample. This generates an approximation of the sampling distribution that can be used to create inference that is meaningful in a statistical sense; for example, the confidence intervals of the DEA efficiency scores. These confidence intervals are based on a large number of bootstrap draws (Simar and Wilson 1998a). Further, the efficiency scores can be bias-corrected, as proposed by Simar and Wilson (1999). However, the rule of thumb is not to correct for this bias, unless \({\text{s}}^{2} < \frac{1}{3}\left( {Bias_{B} \left[ {\hat{\theta }\left( {{\mathbf{x}}_{i}^{t} , {\mathbf{y}}_{i}^{t} } \right)} \right]} \right)^{2}\), where \({\text{s}}^{2}\) is the variance of the bootstrapped values (Simar and Wilson 2000). The procedure can be summarised in four steps: (1) calculate the MPIs as previously described; (2) generate an i.i.d. bootstrap sample from the original sample; (3) calculate the MPIs based on the bootstrap sample; and (4) repeat steps 2 and 3 a sufficient number of times (e.g., 2000 repetitions in our study) to generate the standard deviations used to construct the confidence intervals of the MPI and its decomposed factors.Footnote 26
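The four steps and the bias-correction rule of thumb can be sketched generically as follows. This is a naive i.i.d. bootstrap with a percentile interval, applied here to the geometric mean of invented court-level MPI values. It does not re-estimate the DEA frontier and does not implement the smoothing used in the Simar and Wilson procedures, so it illustrates the resampling logic only. All names, the seed, and the data are assumptions.

```python
import random
import statistics


def bootstrap_summary(estimator, sample, reps=2000, alpha=0.05, seed=0):
    """Naive i.i.d. bootstrap: percentile interval and bias-correction check."""
    rng = random.Random(seed)
    original = estimator(sample)                       # step 1
    draws = []
    for _ in range(reps):                              # step 4
        resample = rng.choices(sample, k=len(sample))  # step 2
        draws.append(estimator(resample))              # step 3
    draws.sort()
    lower = draws[int((alpha / 2) * reps)]
    upper = draws[int((1 - alpha / 2) * reps) - 1]
    bias = statistics.fmean(draws) - original
    variance = statistics.variance(draws)
    # Rule of thumb from the text: correct only if s^2 < (1/3) * bias^2.
    correct_bias = variance < (bias ** 2) / 3
    return {"estimate": original, "ci": (lower, upper),
            "bias": bias, "correct_bias": correct_bias}


# Hypothetical MPI values for eight courts in one pair of years.
mpis = [0.95, 1.02, 0.99, 0.97, 1.04, 0.93, 1.00, 0.98]
result = bootstrap_summary(statistics.geometric_mean, mpis)
```

An index would be called significantly different from one at the chosen level when the resulting interval excludes one.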

To summarise the methodology, the MPI, including all the decomposed factors and their confidence intervals, will be computed and bootstrapped. This provides the possibility to evaluate changes based on statistical significance due to the bootstrapping. The decomposition can serve as a good starting point for investigating the sources of the TFP change. This can be achieved without problems of sample noise, making the results statistically robust.Footnote 27

5 Data

The data used were obtained from the SNCA and cover the time period 2012–2015. However, data on hearing times are available for a longer time period (2007–2015) and will be used for the weighting of the outputs. The available data is more detailed than the data used in previous research. For example, different cases and matters are reported in 292 sub-categories, which will be taken into consideration. The choice of input and output variables is based on what representatives of the courts have stated in interviews as reasonable resource and performance measures, as well as economic theory and previous research.Footnote 28 This very detailed information implies that the complexity of cases and matters within a given sub-category will not vary much, since cases and matters within the same sub-category will be similar.

In the first step, the outputs are aggregated into decided civil cases, decided criminal cases, and decided matters, which are described in Fig. 1. Simply adding these three groups together, however, will introduce aggregation errors. There are differences between courts, regarding the type of cases and matters that they handle. Some courts handle more resource-intense cases than others. For example, a murder case would most likely require more resources than a traffic case. This heterogeneity within different categories of cases and matters is not taken into account in previous studies (Kittelsen and Førsund 1992; Santos and Amado 2014; Yeung and Azevedo 2011). To compensate for these facts, the outputs are weighted by the hearing time (i.e., the time in the courtroom).Footnote 29 This means that courts with a large share of cases, from a complicated category, are not negatively affected in terms of TFP. The weights are based on the average hearing time in each sub-category; for example, criminal cases alone consist of almost 40 sub-categories, and each sub-category receives its own weight.Footnote 30
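A small Python sketch may clarify how hearing-time weights change the measured output. The sub-category names, the figures, and the choice of minutes per decided case as the weight are illustrative assumptions; the exact construction of the weights in this study is described in the footnotes and is not reproduced here.

```python
def hearing_time_weights(hearing_minutes, decided_cases):
    """Average hearing time per decided case in each sub-category."""
    return {
        cat: hearing_minutes[cat] / decided_cases[cat]
        for cat in decided_cases
        if decided_cases[cat] > 0
    }


def weighted_output(court_decided, weights):
    """Sum of a court's decided cases, each weighted by its sub-category weight."""
    return sum(n * weights[cat] for cat, n in court_decided.items())


# Hypothetical national totals for two criminal sub-categories.
total_minutes = {"murder": 48000.0, "traffic": 30000.0}
total_decided = {"murder": 100, "traffic": 1000}
weights = hearing_time_weights(total_minutes, total_decided)

# Two hypothetical courts that decide the same number of cases.
court_a = {"murder": 10, "traffic": 90}
court_b = {"murder": 1, "traffic": 99}
output_a = weighted_output(court_a, weights)
output_b = weighted_output(court_b, weights)
```

Both courts decide 100 cases, but the weighted output of the first is more than twice that of the second, which is the case-mix difference an unweighted count would hide.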

On the input side, labour is the largest cost share for Swedish district courts, at 70%, and the rental cost is about 13% for the period 2012–2015 (SNCA 2012, 2013, 2014, 2015, 2016). Labour is divided into three categories, specifically the number of hours worked by: (1) judges; (2) law clerks; and (3) other personnel. The reason for dividing labour into three categories is that the different types of staff conduct different tasks at the Swedish district courts, and the staff composition varies between courts, according to the SNCA. A measure of capital is omitted in previous research (Ferrandino 2014; Kittelsen and Førsund 1992), except for Elbialy and García-Rubio (2011), who incorporate computers. To incorporate capital, the office space of the court is used, following the assumption that the amount of capital (e.g., computers and office equipment) is proportional to the size of the premises. We argue that including a measure of capital is important, since it is, to some extent, possible to substitute labour with capital in court production. An example is the introduction of video conferences, which, according to the SNCA, decreases the travelling time for judges.

Furthermore, the caseload, as described in the previous literature, is an important determinant of performance for several reasons; for example, if there is no caseload, there will not be any output. We argue that the caseload is an important variable to incorporate when calculating performance that focuses on improvements of technology, management, and so on. However, it is not advisable to include the caseload in the main analysis when TFP is investigated, since an important factor of TFP changes is flexibility in inputs, i.e., adjustable inputs depending on changes in justice demand. Thus, the caseload is only included in a second-stage correlation analysis. The caseload in year t is defined as the stock of open cases and matters at the end of year t − 1 plus the incoming cases and matters in the present year. A potential problem with this method is that the incoming cases at the end of 2015 are not included. The correlation is invariant to the addition of a constant, so this is only a problem when the omitted amount differs non-randomly between courts. In our case, however, it is more reasonable to assume randomness between courts, meaning that the omission does not affect the correlations.Footnote 31
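The caseload definition can be written as a short Python sketch. The rate of change is assumed here to be the simple relative change between two consecutive years, which is an assumption rather than a stated formula; the function names and all numbers are hypothetical.

```python
def caseload(open_stock_prev_year_end, incoming_this_year):
    """Caseload in year t: open stock at the end of year t-1 plus incoming in year t."""
    return open_stock_prev_year_end + incoming_this_year

def rate_of_change(current, previous):
    """Relative change between two consecutive years (assumed definition)."""
    return (current - previous) / previous

# Hypothetical court: open stock at year end and incoming cases and matters.
caseload_year_1 = caseload(open_stock_prev_year_end=200, incoming_this_year=1000)  # 1200
caseload_year_2 = caseload(open_stock_prev_year_end=190, incoming_this_year=950)   # 1140
change = rate_of_change(caseload_year_2, caseload_year_1)  # a 5% decline
```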

5.1 Outlier detection

The chosen limit of the super-efficiency scores is 0.75. This means that if an observation has a super-efficiency score below the limit in any of the years, it is considered for elimination from the main analysis. Four district courts are below this limit in at least one of the years. These are Eksjö, Uddevalla, Gotland and Nacka.Footnote 32 Gotland is unlikely to be super-efficient, in general, based on interviews with representatives of the district courts. However, 2014 is an exception, according to the representatives at the SNCA. Furthermore, from 2013 to 2014, Gotland has a TFP growth of 12%. Therefore, Gotland is eliminated from the main analysis. Thus, four courts are eliminated, meaning that there are 44 courts left in the sample. The descriptive statistics of the outputs and inputs, after the elimination of outliers, are reported in Table 1.
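The screening rule can be sketched in Python as follows. The sketch only covers the flagging step (a score below the limit in any year); the super-efficiency scores themselves are taken as given, and the court labels and scores are hypothetical rather than taken from the data.

```python
LIMIT = 0.75  # chosen limit of the super-efficiency scores

def flag_outlier_candidates(scores_by_court, limit=LIMIT):
    """Return courts with a super-efficiency score below the limit in any year."""
    return sorted(
        court
        for court, scores_by_year in scores_by_court.items()
        if any(score < limit for score in scores_by_year.values())
    )

# Hypothetical scores per court and year.
scores = {
    "Court A": {2012: 0.92, 2013: 0.88, 2014: 0.95, 2015: 0.90},
    "Court B": {2012: 0.81, 2013: 0.79, 2014: 0.70, 2015: 0.83},
    "Court C": {2012: 1.00, 2013: 0.97, 2014: 0.99, 2015: 1.00},
}
candidates = flag_outlier_candidates(scores)  # ["Court B"]
```

Flagged courts are candidates only; as in the text, the final decision on elimination can also draw on qualitative information such as interviews.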

Table 1 Descriptive statistics, excluding outliers

Table 1 shows that the differences in output over time are, on average, quite small. However, each of the inputs increased in size over time. For example, the number of full-time equivalent judges increased from 17.28 to 18.81 (9%). The caseload declined over time, but the dramatic drop in the last period arises because not all incoming cases were included, as previously described. Also note the large standard deviations, which are almost as large as the means. For instance, the Stockholm district court is, in terms of hours worked by judges, almost 31 times larger than Lycksele. Finally, the non-weighted caseload declined during the studied time period.

6 Results

The results for the MPI and its decomposed factors, namely EC and TC, are reported first. EC and TC are then each further decomposed into a pure effect and a scale effect. Then a correlation analysis is performed, based on the MPI and its components. Finally, observations identified as outliers are eliminated from the main results.Footnote 33

6.1 Malmquist index and its decomposed factors

The MPI and its components are reported in Table 2.Footnote 34

Table 2 Malmquist index and its decomposed factors after eliminating outliers

In Table 2, the TFP change, measured as the MPI, is negative for both 2012–2013 and 2014–2015. Column 3 reports that TC is significantly negative during the first period and, on average, below zero in the following periods. EC contributes positively to the TFP growth during the period 2012–2014. For the last period, 2014–2015, both TC and EC affect TFP negatively, which generates a statistically significant decline in TFP. We do not aim to identify the causes of the different components of TFP change; however, a discussion of potential sources of the results is provided. A negative TC, which is observed for each period, is originally interpreted, in other sectors, as arising from an absence of reinvestment in capital, so that fewer outputs can be produced. However, this is not likely for district courts. Instead, an inward shift of the frontier will most likely occur for one of two reasons. First, if the turnaround time increases, due to more complicated cases, that will generate a lower output, which appears as a negative TC in the model. This means that all courts are affected in the same way (i.e., efficiency remains constant, but all courts are closer to the origin). However, the turnaround time decreased during this time period, in terms of both criminal cases and civil cases, according to the SNCA (2014, 2016), which means that the source is something else.

As a second attempt to interpret a decline in TC or EC, depending on whether the affected courts operate on the frontier, it is worth studying the descriptive statistics in Table 1. In Table 1, it can be observed that the average number of decided cases and matters fluctuates by between 1 and 2% for 2012–2014; thus, output is fairly stable. However, for 2014–2015, the decided criminal cases and matters are in the same range, as previously described, but the civil cases declined by 4.2%, on average. Thus, the produced output decreases in total, driven by a lower number of civil cases. This only concerns decided cases, but the number of incoming civil cases is also reported to decline by 5% during the period 2014–2015.Footnote 35 Thus, a potential explanation for the negative TFP change, which is driven by declines in both TC and EC, depending on whether the courts operate on the frontier, is the decline in the caseload during this period. This, in itself, should not decrease TFP if the inputs are fully flexible. However, TFP does decrease if the inputs are not flexible enough to compensate for the lower workload level. The MPI and its confidence intervals are graphically reported for each court and each year, excluding the outliers (see Figs. 3, 4, and 5 in the “Appendix”).Footnote 36 Furthermore, the geometric means of the MPI and its components are provided for the individual courts in Table 6.
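As a brief illustration of how a geometric mean of yearly index values for a single court can be computed, consider the following Python sketch. It assumes the index is expressed so that a value of 1 means no change; the function name and the yearly values are hypothetical and are not taken from Table 6.

```python
import math

def geometric_mean(indices):
    """Geometric mean of strictly positive index values."""
    return math.exp(sum(math.log(v) for v in indices) / len(indices))

# Hypothetical yearly MPI values for one court over three periods.
yearly_mpi = [0.97, 1.01, 0.97]
annual_mean = geometric_mean(yearly_mpi)

# Average annual change in per cent; negative values indicate a decline.
annual_change_pct = (annual_mean - 1.0) * 100.0
```

The geometric mean is the natural average here because index values for consecutive periods compound multiplicatively.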

In columns 5 and 6 in Table 2, it can be observed that the number of courts with a significantly negative TFP change is fairly stable during the period 2012–2014; however, the number increases in 2014–2015. Furthermore, the number of courts with significantly positive TFP growth decreases in each year. Both the fact that the production frontier moves towards the origin and the fact that fewer courts have a significant and positive TFP change indicate that this result is not driven by a few observations. In addition to the caseload, other differences may generate differences in TFP change. For example, the organisation within the courts may be an issue. Thus, adjusting the organisation to that of the best performing court can generate a better development. To gain more information, TC is decomposed into pure TC and scale TC.

6.1.1 Decomposition of TC

The decomposition of TC into pure TC and scale TC is performed according to Eq. 7 in Sect. 4. The results are reported in Table 3.

Table 3 Decomposition of technical change into pure technical change and scale technical change

TC is defined as the product of pure TC and scale TC.Footnote 37 Pure TC, which captures the change for the best firms assuming CRS, shows a significant decline in 2012–2013 and a smaller negative change in the following periods. The movement of the technology from the optimal scale (i.e., scale TC) generates a regress of 2.2% for the same time period. This indicates that the largest decline has its source in pure TC, meaning that the frontier moves inwards; however, scale TC also contributes negatively. In November 2011, there was a reform so that a type of matter, handled only by a few courts, was moved to the Swedish mapping, cadastral, and land registration authority. In particular, these courts have a large decline in pure TC; for example, Ångermanland had a negative pure TC of 34%. The source of this decline is that cases were moved at the end of 2011, which generated a smaller stock and fewer incoming cases during 2012–2013. Therefore, fewer outputs are produced; meanwhile, the inputs are not changed accordingly, even though the mentioned change was known by the courts at least 1 year in advance, indicating a flexibility problem. In contrast to the previous interpretation over time (i.e., that the result is not driven by a few courts), it can be concluded that the significant decline in pure TC during 2012–2013 is driven by district courts where certain types of matters were moved to another authority. To investigate the components of EC, its decomposition is now reported.

6.1.2 Decomposition of EC

EC is decomposed into pure EC and scale EC, according to Färe et al. (1994b). The results are reported in Table 4.

Table 4 Decomposition of efficiency change into pure efficiency change and scale efficiency change

EC is positive for both 2012–2013 and 2013–2014. The positive effect has its source in the positive pure EC and scale EC for both periods. This indicates that district courts, on average, become more homogeneous, since efficiency measures the distance to the frontier. However, based on previous arguments regarding TC, it is not necessarily the case that a positive EC of 5.7% has its source in better performance of inefficient courts. Instead, using the decomposition of the MPI, inefficiency is reduced when courts on the frontier move towards the origin, meaning that such courts are closer in distance to the previously inefficient courts. In other words, the positive and significant EC during 2012–2013 is, most likely, due to an inward movement of the frontier.

During the period 2014–2015, EC is negative, indicating greater heterogeneity between courts; that is, the average court is further away from the production frontier. This effect comes almost equally from both components of EC. Thus, it can be concluded that most changes in TFP during the last time period stem from the different components of EC. As previously described, a decline in EC can also stem from a lower justice demand for courts that do not operate on the frontier, ex-ante. However, it can also stem from organisational issues; for example, if high-skilled employees leave the court and there are difficulties finding replacement staff.
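The multiplicative structure of the decomposition, with TC as the product of pure TC and scale TC, EC as the product of pure EC and scale EC, and the MPI as the product of EC and TC, can be sketched in Python. The sketch assumes the common convention that index values above 1 indicate progress and values below 1 indicate regress; the function names and component values are hypothetical and are not results from Tables 2–4.

```python
def malmquist_from_components(pure_ec, scale_ec, pure_tc, scale_tc):
    """Combine the four components into EC, TC and the MPI (all multiplicative)."""
    ec = pure_ec * scale_ec
    tc = pure_tc * scale_tc
    return {"EC": ec, "TC": tc, "MPI": ec * tc}

def pct_change(index):
    """Express an index value as a percentage change (1.0 means no change)."""
    return (index - 1.0) * 100.0

# Hypothetical components for one court and one period.
result = malmquist_from_components(
    pure_ec=1.02, scale_ec=1.01, pure_tc=0.95, scale_tc=0.98
)
summary = {name: round(pct_change(value), 2) for name, value in result.items()}
```

In this hypothetical case, a positive EC is more than offset by a negative TC, so the MPI indicates a decline in TFP even though the court moves closer to the frontier.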

6.2 Correlation analysis

A few of the previous studies argue for the importance of incorporating the justice demand to avoid underestimating a court’s TFP. However, as stated, the justice demand should not affect TFP if the inputs are fully flexible; that is, the correlation should be zero if this condition is fulfilled. The interpretation of the previously presented results indicates that the MPI and its components are not independent of the changes in workload. In Table 5, the MPI and its decomposed factors are correlated with the rate of change in the caseload.

Table 5 Spearman correlations between MPI, TC and EC and the rate of change in the caseload

From Table 5, it can be observed that the correlations between the rate of change in the caseload and the MPI and its components are positive in all cases. To a large extent, these correlations are also statistically significant. A positive correlation can mean either that the inputs do not decrease enough when the demand for justice services declines or that the employees work harder when the demand increases, generating increased output for the given inputs. Each of these reasons indicates slack in the courts; that is, more can be produced without increasing the inputs. Schneider (2005) argues that the exclusion of the caseload generates an underestimation of TFP.

However, despite the positive correlation found in this section, we argue that the correct measure of TFP is the one reported in the main analysis. Nevertheless, the caseload can, at least partly, explain the results, indicating that the inputs are not flexible enough. The positive relationship between the MPI and the rate of change in the caseload is also in line with Beenstock and Haitovsky (2004), who argue that individual productivity increases when the work pressure is high. These results strengthen the previous argument of low flexibility in inputs. However, they should be interpreted carefully, since no causality can be concluded, and other factors that are not included here are likely to affect the TFP change.
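A Spearman correlation of the kind used in this section can be computed as the Pearson correlation of the ranks. The following Python sketch is self-contained and assigns average ranks to ties; it does not compute significance levels, and the per-court values are hypothetical rather than taken from the data.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical MPI values and caseload growth rates for four courts.
mpi = [0.95, 1.02, 0.99, 1.05]
caseload_change = [-0.06, 0.04, -0.02, 0.01]
rho = spearman(mpi, caseload_change)  # 0.8
```

Because only the ranks enter the calculation, the coefficient is unaffected by adding the same constant to every court's caseload.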

7 Conclusion and policy recommendations

This paper aimed to investigate the development of TFP from 2012 to 2015. The differences in comparison with previous research are: (1) more detailed data are used, which allow the outputs to be weighted based on the hearing time; and (2) TFP is decomposed into four components, in contrast to a maximum of three in the earlier literature.

The findings indicate a 1.7% decline in TFP, which is measured as an annual geometric mean. However, a substantial variation between courts is found; for example, 36–57% of the courts have a negative change in TFP, while 16–36% of the courts have a positive TFP change, depending on the year. The negative TFP change is mainly driven by a decline in TC during the first period. Looking at the components of TC, it can be observed that most of the decline has its source in pure TC, which is attributed to a decline in the caseload. However, the period 2014–2015 has a negative TFP change, stemming from a decline in pure TC, pure EC, and scale EC. This decline is also likely due to a smaller demand for justice services. However, the different components are differently affected, depending on where the court operates in relation to the frontier. Furthermore, the correlation between TFP and the rate of change of the caseload is found to be positive and significant, which strengthens the previous argument and, therefore, indicates an insufficient level of flexibility in inputs.

The policy conclusion is that there is room for improvement. A recommendation is that district courts with negative TFP could learn from those with positive TFP in aspects of organisation and internal development of working methods. Furthermore, since the smallest courts have the largest volatility in TFP change, smoother changes can be achieved by merging courts, which would improve TFP. However, merging is, to some extent, constrained by the social and geographical issues that need to be taken into consideration. To avoid this issue, a less controversial policy implication that achieves more flexibility in the Swedish district courts is to develop the back-up labour force, introduced in 2012, to include personnel other than judges. This will allow the inputs to be adjusted when the demand fluctuates, which generates a higher degree of flexibility on the regional level.

In particular, this will enhance the flexibility of the small courts. The smallest courts have close to the minimum number of employees. Small courts are, by construction, more sensitive to changes in the workload, since a small absolute change in the justice demand corresponds to a large percentage change. Therefore, the issue of large volatility in the justice demand could, at least partly, be solved by an expansion of the back-up labour force to enhance flexibility. Furthermore, more flexible inputs across Sweden could potentially make it possible to eliminate the requirement of a minimum number of employees in each court. Instead, the volatility in the smallest courts can be served by flexible personnel (e.g., the back-up labour force).

Finally, peer comparisons of courts could be used in many potential aspects of the work for improving efficiency and productivity. For example, differences can be present that are not directly possible to determine in this study, such as organisational problems. This is, however, an aspect that can be taken into consideration in future research.