1 Introduction

In recent years, considerable attention has been paid to improving the efficiency of judicial systems from different perspectives. Among others, Chen et al. (2024) performed an empirical analysis of productivity growth following mergers of Swedish district courts, and Gupta and Bolia (2024) developed a framework to identify the factors that influence the judicial performance of Indian courts. More generally, the European Commission (2022a) highlighted that the long average duration of legal proceedings remains a key challenge, because timely decisions are a prerequisite for the effective protection of each individual’s rights. This inefficiency increases uncertainty among economic actors and deters foreign investment, since it undermines confidence in the legal system (Gupta and Bolia 2023; Giacalone et al. 2020; Ippoliti and Tria 2020). In some states, such as Italy, judicial efficiency is relatively low compared with other European countries, and Italy has undertaken policy actions in recent years to reorganise its judicial districts (Comi et al. 2021). Nevertheless, strengthening the efficiency of justice is a thorny issue for several reasons, for instance, the need to harmonise European law while allowing for variations in national legal frameworks (CEPEJ 2021a). Although the dimensions used to assess judicial efficiency should be shared, scientifically reliable, and internationally comparable, there is still no consensus on which judicial indicators are appropriate for evaluating efficiency (European Commission 2022b).

Recent literature has suggested various dimensions for this investigation, as well as different benchmarking methodologies, depending on the targets set and the stakeholders involved in tackling the issues of the judicial system. Azaria et al. (2023) and Gupta and Bolia (2023) highlighted that the performance evaluation of a judicial system usually involves the average numbers of resolved and pending cases, and that the clearance rate is among the most widely used dimensions for monitoring performance. Yeung et al. (2022) underlined that the availability of the clearance rate and of disposition time measurements explains their extensive use in judicial system evaluations. If improved timeliness is the desired result, it is important to avoid trading quicker output for lower-quality judgements. This topic has been debated from different perspectives by several authors, such as Stachowiak-Kudła and Kudła (2023), Yassine et al. (2023), Mocan et al. (2020), and Marciano et al. (2019). CEPEJ (2022; 2016) highlighted that the appeal rate of first-instance decisions is a reasonable indirect measure of the quality of the justice system. The right to fair compensation when proceedings are not concluded within a reasonable time was discussed by, among others, Filomeno and Rocchetti (2019). In Italy, financial compensation is provided for those who experience such delays (Ministry of Justice 2021). The ultra-triennial pending cases (those pending for more than three years), which may give rise to such compensation, cannot be ignored when evaluating the proceeding features of judicial efficiency. In addition to the broad dimensions mentioned above, context indicators should be included in assessing the judicial system (Melcarne et al. 2021). Falavigna and Ippoliti (2023) investigated the key role of judicial efficiency in enforcing credit rights and its effect on firms’ behaviour in accessing external financial resources.
These authors noted that, in Italy, judicial delays led firms to resort to alternative strategies (such as tax arrears) to deal with financial constraints, which is a relevant issue given that Italy has an extremely high percentage of micro firms whose size limits their access to the capital market. Furthermore, CEPEJ (2022) and Filomeno and Rocchetti (2019) suggested including in the evaluation process the degree of litigiousness, i.e. the differing inclination of people (or companies) to resort to the judiciary, and the incidence of organised crime. An efficiency evaluation model cannot ignore the characteristics of labour input, and Bełdowski et al. (2020), Banasik et al. (2022), and Gupta and Bolia (2024) examined several features of judges and court support staff that determine their productivity. As for the Italian scenario, Falavigna and Ippoliti (2023) and Giacalone et al. (2020) emphasised the significant heterogeneity of courts across the territory. Further proposals have been made in the literature on court efficiency. The complexity of each case (e.g. full legal process versus easily resolved proceedings) was investigated by Bogetoft and Wittrup (2021) and Gupta and Bolia (2024). Banasik et al. (2022) proposed considering the number of lawyers involved and the degree of specialisation of, and incentives for, judges, while Viapiana (2021) and Yeung et al. (2022) investigated proxies for the financial resources of each court. Castelliano et al. (2023) investigated the use of information and communication technologies (ICT) in justice systems, which increased sharply, mainly due to the COVID-19 pandemic.

The current research investigates the efficiency of Italian courts, proposing a methodology that fits the Italian scenario but could be extended to other judicial systems. The authors seek to answer two research questions, verifying whether (1) there is a significant relationship between Italian judicial efficiency and specific constructs (performance, quality, proceeding and context features) and (2) heterogeneity significantly affects the courts’ efficiency. The dimensions of the efficiency model were selected on the basis of the latent constructs that the authors identified as more significant than others among the many determinants of judicial efficiency.

Compared with the current literature, the authors believe that the proposed model is a step forward and a candidate for becoming a sustainable tool for specific policy assessments. Identifying the causes of inefficiency that underpin any reorganisation process is a key step for policy reforms, and policymakers should refer to unambiguous benchmark measures of efficiency towards this goal. Furthermore, evaluating the (potential) impact of regulatory measures aimed at reducing each court’s backlog can benefit from identifying clusters of courts with similar behaviour patterns.

The two-step procedure, discussed in detail below, represents a methodological contribution to the previous literature in this context. In more detail, following a consolidated stream of scientific literature on estimating courts’ technical efficiency, the authors use data envelopment analysis (DEA), with the slacks-based measure (SBM) as the first-step technique. Variable selection techniques for DEA have recently been addressed in many research papers. In the current paper, the DEA-SBM results feed into a second-step structural model together with further dimensions. In the second step, the authors use partial least squares structural equation modelling (PLS-SEM) to evaluate the efficiency of Italian courts. Both DEA-SBM and PLS-SEM include several indicators, some of which have already been used to underpin recent regulatory actions, while others may have implications in the near future.

The paper is organised as follows. After the introduction, Sect. 2 presents the literature review and hypothesis development, also considering preliminary specifications on the efficiency of the Italian judicial system. Section 3 presents the methodology. Section 4 describes the results. Section 5 refers to the discussion, and Sect. 6 concludes.

2 Literature review and hypothesis development

2.1 Italian judicial system

The Italian judicial system consists of ordinary courts with jurisdiction over civil and criminal matters. It also includes administrative, accounting, military and tax jurisdictions, but the structure and efficiency of these latter tribunals are not considered in the current analysis. Courts are organised in three degrees of justice: the first-instance courts, comprising justice of the peace offices, ordinary courts, penal tribunals, and juvenile courts; the second-instance courts, which include courts of appeal and penal tribunals in the second instance; and the third-instance court, the Supreme Court (also known as the High Court of Cassation). The territorial organisation of the Italian judicial system encompasses 140 first-instance courts, each with its own jurisdiction area and administering both civil and criminal law. Since the territorial organisation of Italian courts does not systematically match Italian administrative boundaries, data on Italian municipalities (about 8,000) were used in the current research to build homogeneous and comparable indicators. One example is the resident population under the jurisdiction of each court. The NUTS3 territorial codes for Italy refer to 107 provinces, which differ from court jurisdictions; consequently, each court can include people from different NUTS3 areas. In general, each civil first-instance court has territorial competence over proceedings according to the defendant’s residence, which means that the place of residence determines the jurisdiction. Nevertheless, this general rule allows many exceptions, and there may also be situations in which the defendant can choose a different jurisdiction. The picture is further complicated in criminal trials.
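Because court jurisdictions do not coincide with provinces, per-inhabitant indicators have to be built bottom-up from municipal data. The Python sketch below shows one way to do this, assuming that each municipality belongs to exactly one first-instance court; the municipality codes, court names and population figures are hypothetical and only illustrate the aggregation step.

```python
from collections import defaultdict


def population_by_court(municipal_population: dict[str, int],
                        municipality_to_court: dict[str, str]) -> dict[str, int]:
    """Sum the resident population of the municipalities under each court.

    Assumes every municipality is mapped to exactly one court; an unmapped
    municipality raises an error instead of being silently dropped.
    """
    totals: dict[str, int] = defaultdict(int)
    for code, residents in municipal_population.items():
        court = municipality_to_court.get(code)
        if court is None:
            raise KeyError(f"municipality {code} has no court assigned")
        totals[court] += residents
    return dict(totals)


# Hypothetical municipality codes, courts and residents.
population = {"M001": 120_000, "M002": 35_000, "M003": 8_500, "M004": 60_000}
jurisdiction = {"M001": "Court A", "M002": "Court A", "M003": "Court B", "M004": "Court B"}
print(population_by_court(population, jurisdiction))
# {'Court A': 155000, 'Court B': 68500}
```

The resulting court-level population can then serve as the denominator for any per-inhabitant ratio.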

Both the civil and criminal sectors are included in the National Recovery and Resilience Plan (NRRP), which assigns them several shared objectives and financial resources. The NRRP justice reform is the most comprehensive and structured reform in recent decades. It is an ambitious reform package aiming to strengthen complementary justice, for instance via ‘alternative dispute resolution’ tools. Before this, the most significant reform of the Italian justice system dated back to 2012. That reform closed 31 courts, on the grounds that courts with performance indicators below a specific threshold might benefit from economies of scale if merged with bigger and more productive courts.Footnote 1 Under the NRRP, several Italian reforms, such as Laws No. 134 and No. 206 of 2021, were passed to achieve common European goals of reducing the excessive duration of civil and criminal trials at all levels of justice and improving the system’s efficiency. Improving telematic innovations and strengthening the office for trial (OFT, also known as the office for the process and discussed further in this paper) are additional NRRP goals (CEPEJ 2021b; European Commission 2022a). A relevant NRRP objective is a significant reduction in Pinto Law compensation payments. The Pinto Law (Law No. 89/2001) provides monetary compensation for violations of the right to a judicial proceeding of reasonable length. The underlying assumption is that the risk of paying for excessive duration should encourage efficiency and reduce backlogs at the first, second and third degrees of judgment.

Italy is a centralised country with a high degree of legislative homogeneity across its territory (Carlucci et al. 2017). Nevertheless, the efficiency of its judicial system varies significantly across courts (Cusatelli and Giacalone 2018). Therefore, the analysis of judicial inefficiency requires a broad investigation of heterogeneity at the level of each court. Although the civil and criminal sectors cannot be considered separate entities, there are significant differences between handling civil and criminal cases. These differences prevent a joint analysis, at both the preliminary and advanced stages, and require separate investigations (Peyrache and Zago 2016; Falavigna et al. 2018). Accordingly, the current empirical analysis focuses on first-instance courts and is limited to the civil area.

2.2 The relationship between performance indicators and judicial efficiency

As an increasing body of recent studies has emphasised, well-functioning judiciaries are crucial determinants of economic performance. This positive impact has long been highlighted by several international organisations, such as the European Union (EU), the Organisation for Economic Co-operation and Development (OECD), the International Monetary Fund (IMF), and the World Bank (CEPEJ 2021a; OECD 2013; World Bank Group 2020).

To evaluate courts’ performance in coping with their inflow of cases, and to compare different systems regardless of their characteristics, CEPEJ (2022) uses the clearance rate, i.e. the ratio between the number of resolved cases and the number of incoming cases in one year. The number of resolved cases is a recurring dimension in judicial performance evaluation, as noted by Agrell et al. (2020), Finocchiaro Castro and Guccio (2018) and Mattsson and Tidanå (2019). Following the methodology proposed by the Supreme Court (2014) and the Ministry of Justice (2021), a related indicator is the ‘disposal index’, i.e. the ratio between resolved cases and the sum of ‘incoming cases’ and ‘pending cases from the previous year’. Since these dimensions seem sufficient to describe each court’s outputs, they can be used to evaluate a construct that includes performance features. Therefore, based on the discussion above, the first hypothesis is as follows:

H 1

There is a significant relationship between efficiency and the judicial performance features in the Italian judicial system.
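The two performance ratios described above can be sketched as follows. The function names and the court-year figures are hypothetical. A clearance rate below 1 (below 100% in CEPEJ’s percentage form) means that a court resolved fewer cases than it received, so its backlog grew; the disposal index is always lower because its denominator also includes the backlog carried over.

```python
def clearance_rate(resolved: int, incoming: int) -> float:
    """CEPEJ clearance rate: resolved cases / incoming cases in the same year."""
    return resolved / incoming


def disposal_index(resolved: int, incoming: int, pending_previous_year: int) -> float:
    """Disposal index: resolved / (incoming + cases pending from the previous year)."""
    return resolved / (incoming + pending_previous_year)


# Hypothetical figures for one court in one year.
resolved, incoming, pending_previous_year = 9_500, 10_000, 12_000
print(round(clearance_rate(resolved, incoming), 3))  # 0.95
print(round(disposal_index(resolved, incoming, pending_previous_year), 3))  # 0.432
```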

2.3 The relationship between quality indicators and judicial efficiency

The growing attention to the functioning of the judicial system has also stimulated several studies on different aspects of judgment quality, which should be monitored when court delays are reduced so that greater speed is not traded off against lower quality. Stachowiak-Kudła and Kudła (2023) noted that judicial quality includes the independence, accountability and effectiveness of each court, and that citation counts have sometimes been used in the literature as proxies for judicial quality. Mocan et al. (2020) suggested several indicators of judicial quality for ninety-five European countries to evaluate their impact on the propensity for dishonest behaviour. In the current paper, the authors address quality on the reasonable assumption that poor-quality judgments are more likely to be appealed, as suggested by CEPEJ (2016) and Bartolomeo and Bianco (2017). Therefore, the ‘appeal rate’ of first-instance decisions may be considered an indirect measure of the quality of the justice system. In Italy, the second instance comprises twenty-six courts of appeal, within which the first-instance courts are nested. Just as the courts do not match the NUTS3 Italian territorial districts, the courts of appeal do not systematically correspond to the NUTS2 Italian regions.Footnote 2 Accordingly, the appeal rate can refer to the courts falling within the district of each court of appeal. Nevertheless, it must be emphasised that an appeal may well depend on strategic behaviour rather than on the low quality of the first-instance decision. If such opportunistic behaviour became prevalent, appeals would be lodged independently of the quality of the verdict, weakening the appeal rate as a quality proxy.

In the current work, the authors also include disposition time in the quality construct, in addition to the appeal rate. Disposition time is a commonly used indicator of the time a judicial system needs to resolve a case (CEPEJ 2021a), expressed as the number of days necessary for a pending case to be resolved by the court. It is calculated by dividing the number of cases pending at the end of the observed period by the number of cases resolved by each court in a specific year t, and multiplying the result by 365. As discussed further in the paper, disposition time is an output that needs to be decreased rather than increased. Since the cases that remain unresolved at a given point in time (pending cases) affect disposition time, Bielen et al. (2015) and Bielen and Marneffe (2017) discussed at length the different values of disposition time in complex (or full) trials. In fact, factors such as the case category and the conduct of plaintiffs and defendants may significantly increase duration. These authors also noted that the presence of many lawyers might cause further delays since, for instance, lawyers might encourage legal proceedings even for small claims. Based on the discussion above, the second hypothesis is as follows:

H 2

There is a significant relationship between efficiency and the judicial quality features in the Italian judicial system.
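The two quality indicators can be sketched as below. Disposition time follows the CEPEJ convention (cases pending at the end of the year divided by cases resolved in that year, times 365). The appeal-rate denominator used here (first-instance decisions that could be appealed) is an assumption for illustration, since the paper’s exact variable definitions are given in Table 1; all figures are hypothetical.

```python
def disposition_time(pending_end_of_year: int, resolved_in_year: int, days: int = 365) -> float:
    """CEPEJ disposition time in days: (pending at year end / resolved in the year) * 365.

    Lower values are better, which is why it is an undesirable output.
    """
    return pending_end_of_year / resolved_in_year * days


def appeal_rate(appealed: int, appealable_decisions: int) -> float:
    """Share of first-instance decisions that were appealed (indirect quality proxy).

    The denominator is an illustrative assumption, not the paper's definition.
    """
    return appealed / appealable_decisions


print(round(disposition_time(12_500, 9_500)))  # 480
print(appeal_rate(1_200, 8_000))  # 0.15
```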

2.4 The relationship between proceeding features and judicial efficiency

The judicial system is essential to maintaining order in society, settling private citizens’ disputes, supporting economic actors in the market, and fostering their growth. Reducing the time required to settle a case and deliver a verdict matters from the citizens’ perspective, and devising effective reforms to improve each proceeding feature matters from the policymakers’ point of view. The Italian telematic civil process (PCT, also known as the online civil trial) is a technological framework for the remote, online execution of operations (such as filing documents, transmitting communications and notifications, and consulting the status of proceedings) that in the past required a physical visit to the court registry. Its use was regulated, among others, by Law Decree No. 193/2009, which extended the PCT legislation for civil proceedings to criminal trials. The PCT data for the civil area come from two databases: SICID, the ‘district information system on civil litigation’ (civil litigation, voluntary jurisdiction, labour disputes, etc.), and SIECIC, the ‘information system of civil executions’ (related to bankruptcy proceedings). Recent studies have used several proxies for SICID and SIECIC average durations (Filomeno and Rocchetti 2019; Istat 2021). The current paper considers the dimensions related to the length of proceedings in these two databases, using the average duration (in days) of proceedings defined in ordinary courts for the SICID and SIECIC areas, respectively.

The high compensation paid by the Italian state under the Pinto Law has stimulated various research insights into the importance of monitoring proceedings lasting three years or more (Calanca et al. 2022; Filomeno and Rocchetti 2019; Sabbi 2018; UPB 2016). In more detail, to ensure that proceedings conclude within a reasonable time, the Pinto Law recognises the right to fair redress if a proceeding exceeds a reasonable threshold: the entire judgment across all three instances should not exceed six years (three years in the first-instance court, two years in the appeal court, and one year in the Supreme Court). In line with these limits, the authors consider two indicators of pending cases potentially subject to the Pinto Law. These indicators refer to ultra-triennial pending proceedings in the SICID and SIECIC areas that could become ‘Pinto Law cases’ but are not yet classified as such because they do not meet all the requirements. Based on the discussion above, the third hypothesis is as follows:

H 3

There is a significant relationship between efficiency and the judicial proceeding features in the Italian judicial system.
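The ultra-triennial indicators rest on a simple age threshold. The sketch below counts pending cases whose age at a reference date exceeds three years, the first-instance limit mentioned above. It is a simplification under stated assumptions: it uses only the (hypothetical) filing dates and ignores the other legal requirements that determine whether a case actually qualifies under the Pinto Law.

```python
from datetime import date


def three_years_after(filed: date) -> date:
    """Date on which a case filed on `filed` reaches three years of age."""
    try:
        return filed.replace(year=filed.year + 3)
    except ValueError:  # filed on 29 February, target year is not a leap year
        return filed.replace(year=filed.year + 3, day=28)


def ultra_triennial(pending_filing_dates: list[date], reference: date) -> tuple[int, float]:
    """Count and share of pending cases older than three years at `reference`.

    Only the age threshold is modelled; other Pinto Law requirements are ignored.
    """
    n_old = sum(reference > three_years_after(d) for d in pending_filing_dates)
    return n_old, n_old / len(pending_filing_dates)


# Hypothetical filing dates of cases still pending on 31 December 2022.
pending = [date(2018, 5, 10), date(2019, 12, 30), date(2020, 1, 15), date(2021, 6, 1)]
print(ultra_triennial(pending, date(2022, 12, 31)))  # (2, 0.5)
```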

2.5 The relationship between context features and judicial efficiency

The degree of litigiousness characterising a specific area is a dimension traditionally considered in the judicial literature. It is measured by the number of incoming cases per inhabitant and contributes significantly to the functioning of the local judicial system (CEPEJ 2022). Finocchiaro Castro and Guccio (2018) and Dimitrova-Grajzl et al. (2016) discussed at length how the degree of litigiousness of a specific area can affect court performance, and the current work considers this attribute. In addition to the degree of litigiousness, Filomeno and Rocchetti (2018) highlighted that the Italian Higher Judiciary Council suggested considering the following dimensions: [a] the number of firms operating in the territory and their concentration in each district, [b] the incidence of organised crime, and [c] the number of inhabitants in a specific area.

Regarding dimension [a], the key role of judicial efficiency in market dynamics requires in-depth investigation from multiple perspectives involving economic performance and financial dynamics. These perspectives include access to financial resources, firms’ growth and investment, and diverse corporate strategies. One of them concerns the impact of court inefficiency in enforcing credit rights on firms’ behaviour, especially that of small and medium enterprises (SMEs). SMEs represent most Italian firms, and Falavigna and Ippoliti (2022b) noted that, in a poor legal environment, (1) firms usually adopt specific payout strategies to cope with constraints in the local capital market and (2) this behaviour is typical of SMEs rather than larger companies. In a different paper, Falavigna and Ippoliti (2023) specifically analysed alternative strategies (such as tax arrears) adopted by private limited SMEs to deal with financial constraints, considering the inefficiency of Italian courts. Evidence of how an efficient legal system affects economic efficiency through the firm size channel has also been discussed at length by Dougherty (2014) and Laeven and Woodruff (2007); the latter noted that the quality of the legal system in Mexico had a greater impact on firms’ growth in sectors where proprietorships predominate over corporations. Furthermore, Peyrache and Zago (2016) discussed firm concentration in a specific territory and its impact on the workload of the courts. In the present work, the ratio between the number of companies in the district and the population of the area is used to capture these features.
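The two context ratios (incoming cases per inhabitant for litigiousness, and firms per inhabitant for dimension [a]) can be computed for each court once its jurisdiction population is known. In this sketch, the scaling to 1,000 inhabitants, the field names and all figures are hypothetical choices made for readability, not the paper’s exact definitions.

```python
def per_inhabitants(count: float, population: float, per: int = 1_000) -> float:
    """Express `count` per `per` inhabitants of a court's jurisdiction."""
    return count / population * per


def context_ratios(courts: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Litigiousness and firm density for each court (hypothetical field names)."""
    return {
        name: {
            "litigiousness": per_inhabitants(c["incoming"], c["population"]),
            "firm_density": per_inhabitants(c["firms"], c["population"]),
        }
        for name, c in courts.items()
    }


# Hypothetical court-level data.
courts = {
    "Court A": {"incoming": 6_200, "firms": 14_000, "population": 155_000},
    "Court B": {"incoming": 2_100, "firms": 5_100, "population": 68_500},
}
for name, r in context_ratios(courts).items():
    print(f"{name}: litigiousness {r['litigiousness']:.1f}, firms {r['firm_density']:.1f} per 1,000")
```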

Concerning [b], Troisi and Alfano (2023) investigated the impact of organised crime on Italian criminal courts. EURISPES (2022) proposed an index of permeability to organised crime that combines nineteen composite indexes following the methodology of Mazziotta and Pareto (2018), and the current paper uses this index as a proxy for dimension [b].
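For readers unfamiliar with the Mazziotta–Pareto approach, the sketch below implements the classic Mazziotta–Pareto Index (MPI): each indicator is standardised across units to mean 100 and standard deviation 10, the standardised values are averaged for each unit, and the average is penalised by the unit’s imbalance across indicators (its standard deviation times its coefficient of variation). This is not EURISPES’s computation: their nineteen indicators and polarities are not reproduced, and the 2018 methodology cited may be an adjusted variant with a different normalisation. The data are hypothetical.

```python
from statistics import mean, pstdev


def mazziotta_pareto(data: list[list[float]], polarity: list[int], penalty: int = 1) -> list[float]:
    """Classic Mazziotta-Pareto Index for each row (unit) of `data`.

    polarity[j] is +1 if indicator j moves with the phenomenon, -1 otherwise.
    penalty=+1 gives MPI+ (phenomena where higher means worse, e.g. crime);
    penalty=-1 gives MPI- (phenomena where higher means better).
    """
    columns = list(zip(*data))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # population standard deviation
    scores = []
    for row in data:
        # Standardise each indicator to mean 100 and standard deviation 10.
        z = [100 + p * 10 * (x - mu) / sd for x, p, mu, sd in zip(row, polarity, means, sds)]
        m_z, s_z = mean(z), pstdev(z)
        cv_z = s_z / m_z
        # Units with unbalanced indicator values are penalised by s_z * cv_z.
        scores.append(m_z + penalty * s_z * cv_z)
    return scores


# Hypothetical example: three areas, two indicators with the same polarity.
print([round(v, 2) for v in mazziotta_pareto([[1, 3], [2, 2], [3, 1]], [1, 1])])
# [101.5, 100.0, 101.5]
```

All three areas have the same mean standardised value (100), but the first and third receive a higher MPI+ because their indicators are unbalanced; this non-compensatory penalty is what distinguishes the method from a simple average.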

Dimension [c], cited by the Italian Higher Judiciary Council, refers to the number of inhabitants who ‘persist’ in a specific area. According to the Istat (2020) definition, the population that ‘persists’ in a given area comprises the subpopulations of residents, workers, students, and city users. People who move to the areas where services or production activities are located change the physiognomy of both the place of origin and the destination (also generating competition between resident and non-resident populations for resources and services; Eurostat 2019). Therefore, the persistent population should be preferred to the number of ‘registered’ residents. In the current work, the authors tested the persistent population and other proxies for dimension [c], but these indicators were removed from the model because they showed no significant loadings. Based on the discussion above, the fourth hypothesis is as follows:

H 4

There is a significant relationship between efficiency and the context features in the Italian judicial system.

2.6 Additional efficiency characteristics

Investigating judicial efficiency, Agrell et al. (2020), Dimitrova-Grajzl et al. (2016) and Mattsson and Tidanå (2019), among others, stressed the relevance of the number of judges, staff and other personnel employed by each court. Voigt and El-Bialy (2016) emphasised that the rankings for the endowment of human resources change significantly depending on the ratios used in the analysis (for instance, the number of registered proceedings, pending proceedings or inhabitants per judge). Therefore, several methodological proposals have been made in judicial research. Following the CEPEJ (2022) definition, the number of judges per 100,000 inhabitants can be considered, and the authors adopt this measure in the current paper. Italy has one of the lowest rates among the Council of Europe member states. Ferro et al. (2018) noted that several judge characteristics, such as age and seniority, also affect efficiency. In addition, Guerra and Tagliapietra (2017) investigated the negative effect of high judge turnover on efficiency, since judges’ mobility may significantly slow down case disposition: when a judge moves from one office to another, their cases remain pending while awaiting reassignment. Nevertheless, Fabri (2019) noted that these additional features are difficult to assess owing to limited data availability.

Bełdowski et al. (2020) and Gomes et al. (2016) investigated the positive impact of staff reinforcement on court output. The OFT was first mentioned in Italy in 2012 by Decree-Law No. 179 and later introduced into the judicial system by Law 114/2014. It is part of the extensive legislation connected to the PCT, a project initiated by the Italian Ministry of Justice to improve civil courts through information technology. Specifically, the OFT is intended to support judicial offices in reducing the long duration of judicial proceedings. This practice has long existed in several countries (for example, the UK, the US, France, and Spain). The NRRP mentioned above is part of the Next Generation EU (NGEU) programme and includes an ambitious reform programme encompassing the public administration and justice sectors. Because the OFT is a key part of this programme, special measures (e.g. ministerial decrees) and recruitment calls have been set out in the NRRP to hire temporary administrative staff for each court. In the current paper, the UPP indicator (from the Italian Ufficio per il processo) considers the number of OFT employees per 100,000 inhabitants. A potential weakness of this indicator is the difficulty of assessing the UPP’s contribution to date, since these employees started their activity only recently. On the one hand, since this dimension is not yet available, it can be proxied by similar positions that, in recent years, were held in each court by staff already temporarily assigned to the OFT.Footnote 3 On the other hand, this indicator takes a forward-looking perspective, aimed at informing the subsequent (ministerial) decree on staff allocation policies. Because this standpoint looks ahead rather than analysing existing entities, several years will pass before this structure becomes embedded.Footnote 4

2.7 Heterogeneity of judicial efficiency

Investigating the impact of different types of heterogeneity on judicial efficiency, Falavigna and Ippoliti (2023) and Giacalone et al. (2020) highlighted a marked heterogeneity among Italian courts. Comi et al. (2021) noted that judicial efficiency had a positive impact in the Italian regions attracting most foreign investment (such as Lombardy) and investigated how this impact was affected by geographic and sectoral heterogeneity, as well as by the origin of foreign investors. Accordingly, this paper explores heterogeneity, as explained in detail in the following sections. Based on the discussion above, the fifth hypothesis is as follows:

H 5

Heterogeneity significantly affects the courts’ efficiency.

3 Methodology

3.1 Research design, data collection procedure and variable descriptions

As mentioned above, this paper uses several indicators to capture the multi-dimensional nature of efficiency in the Italian scenario. The data used in this study come from different sources, namely the Italian Ministry of Justice, the Italian National Institute of Statistics (Istat), the Institute for Political, Economic and Social Studies (EURISPES), and Sole24Ore. These databases include indicators usually linked to each Italian court, such as incoming, resolved, and pending cases, but also dimensions available at different aggregation levels. For instance, as discussed above, the per-inhabitant ratios required a preliminary calculation of the residents under the jurisdiction of each court, based on Italian municipalities rather than NUTS3 areas. Because the numbers of incoming, resolved, and pending cases depend on the size of the local jurisdiction and may fluctuate significantly over the years, these indicators refer to a five-year period (2018–2022), which also includes the outbreak of the COVID-19 pandemic and the subsequent temporary closure of the courts.

Figure 1 presents the theoretical framework, describing step by step the methods used in the current research. Results are obtained from both the SBM and the PLS-SEM procedures; the figure shows the input and output indicators involved in the SBM and the additional manifest variables (MVs) considered in the PLS-SEM (see, among others, Rehman 2023; Rehman and Prokop 2023). Figure 1 also shows the constructs considered in the PLS-SEM: four exogenous determinants (performance, quality, proceeding and context features) and one endogenous construct, efficiency.

Fig. 1 Conceptual model: two-step procedure (Source: Figure by authors)

Concerning the SBM, a relevant feature of the current analysis is the presence of indicators that run against the usual direction, i.e. inputs that need to be increased, or outputs that need to be decreased, to improve performance. For instance, disposition time is an output that should be minimised as much as possible. Among the approaches proposed in the literature, each with its own benefits and drawbacks, the current research avoids data transformation by treating this output as an input. Dealing with undesirable outputs is a common issue in DEA, and a critical review of recent perspectives on this theme has been proposed by, among others, Halkos and Petrou (2019). Furthermore, several dimensions initially included in the PLS-SEM (such as EFF_UPP and EFF_INC_CAS; see Table 1) were removed from the subsequent analysis because they showed weaknesses in the model validation. Table 1 presents the dimensions involved in the two-step procedure, their descriptions, and their use in the SBM and PLS-SEM techniques, while Table 2 provides their descriptive statistics.
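Treating an undesirable output as an input amounts to a simple reorganisation of the data before the SBM is solved. The sketch below assembles the input matrix X and the output matrix Y (rows are indicators, columns are courts) with disposition time moved to the input side. The indicator names and values are hypothetical and do not reproduce the variables listed in Table 1.

```python
def build_sbm_matrices(courts, inputs, desirable_outputs, undesirable_outputs):
    """Assemble SBM data matrices with undesirable outputs treated as inputs.

    courts: {court_name: {indicator_name: value}} (hypothetical layout).
    Returns (court_names, input_names, X, Y), where X is m x n and Y is s x n.
    """
    names = sorted(courts)
    # Undesirable outputs (to be minimised) join the input side, so the SBM
    # seeks to reduce them in the same way as excess inputs.
    input_names = list(inputs) + list(undesirable_outputs)
    X = [[courts[c][v] for c in names] for v in input_names]
    Y = [[courts[c][v] for c in names] for v in desirable_outputs]
    return names, input_names, X, Y


courts = {
    "Court B": {"judges": 20, "disposition_time": 520, "resolved": 6_000},
    "Court A": {"judges": 35, "disposition_time": 410, "resolved": 11_000},
}
names, input_names, X, Y = build_sbm_matrices(
    courts, inputs=["judges"], desirable_outputs=["resolved"],
    undesirable_outputs=["disposition_time"],
)
print(names)  # ['Court A', 'Court B']
print(X)      # [[35, 20], [410, 520]]
print(Y)      # [[11000, 6000]]
```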

Table 1 Variables, their descriptions and usage in SBM and PLS-SEM models
Table 2 Summary statistics of dimensions involved

3.2 Data analysis techniques

3.2.1 DEA-SBM

As mentioned above, the authors propose a two-step procedure combining DEA-SBM and PLS-SEM to evaluate the Italian judicial system. In economic theory, efficiency generally consists of managing resources to obtain the maximum output from the available production factors, or of using the minimum inputs to produce predetermined outputs. Therefore, in the first step, the authors use DEA (introduced by Charnes et al. 1978) to assess the decision-making units (DMUs; the courts in the current research) and to identify the best practices relative to the entire set of observations. DEA includes radial and non-radial efficiency measures, with both input and output orientations. The non-radial SBM was proposed by Tone (2001, 2002) and Tone and Sahoo (2003). In this model, the slack variables enter the objective function directly, which accommodates multiple inputs and outputs in performance evaluation (Song et al. 2013). For the court under evaluation, o, the SBM score is obtained by solving the following minimisation problem:

$$\begin{gathered} \rho = \min_{\lambda,\,s^{-},\,s^{+}} \frac{1 - \frac{1}{m}\sum_{i=1}^{m} s_{i}^{-}/x_{io}}{1 + \frac{1}{s}\sum_{r=1}^{s} s_{r}^{+}/y_{ro}} \\ \text{subject to} \\ x_{o} = X\lambda + s^{-} \\ y_{o} = Y\lambda - s^{+} \\ \lambda \ge 0,\; s^{-} \ge 0,\; s^{+} \ge 0 \end{gathered} \tag{1}$$

In the model, \(X=(x_{ij})\in\mathbb{R}^{m\times n}\) and \(Y=(y_{rj})\in\mathbb{R}^{s\times n}\) represent the input and output matrices (m inputs, s outputs, n courts), and \(x_{o}\) and \(y_{o}\) are the input and output vectors of the court under evaluation. \(\lambda\in\mathbb{R}^{n}\) is a non-negative intensity vector (\(\lambda \ge 0\)). \(s^{-}\in\mathbb{R}^{m}\) and \(s^{+}\in\mathbb{R}^{s}\) indicate the input excess and output shortfall (slacks). \(\rho\) denotes the efficiency score of each court.
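Eq. (1) is a fractional program. Following Tone (2001), a common way to solve it is to apply the Charnes–Cooper transformation and obtain a linear program. The sketch below illustrates this under stated assumptions and is not the authors' implementation. It uses SciPy's `linprog` with the HiGHS solver, the hypothetical function name `sbm_score`, and strictly positive data. It implements the non-oriented model of Eq. (1) and offers an optional convexity constraint for variable returns to scale. The output-oriented score used for TE_EFF would require a different objective.

```python
import numpy as np
from scipy.optimize import linprog


def sbm_score(X, Y, o, vrs=False):
    """Non-oriented SBM efficiency (Eq. 1) of DMU `o` via Charnes-Cooper.

    X: (m, n) inputs, Y: (s, n) outputs; all values assumed strictly positive.
    Returns (rho, lambda, input slacks s-, output slacks s+).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m, n = X.shape
    s = Y.shape[0]
    x_o, y_o = X[:, o], Y[:, o]

    # Decision vector: [t, Lambda_1..n, S-_1..m, S+_1..s], all >= 0.
    nvar = 1 + n + m + s
    lam = slice(1, 1 + n)
    sm = slice(1 + n, 1 + n + m)
    sp = slice(1 + n + m, nvar)

    # Objective: t - (1/m) * sum_i S-_i / x_io (transformed numerator).
    c = np.zeros(nvar)
    c[0] = 1.0
    c[sm] = -1.0 / (m * x_o)

    rows, rhs = [], []
    # Transformed denominator fixed to 1: t + (1/s) * sum_r S+_r / y_ro = 1.
    row = np.zeros(nvar)
    row[0] = 1.0
    row[sp] = 1.0 / (s * y_o)
    rows.append(row)
    rhs.append(1.0)
    # Input constraints: X Lambda + S- - t * x_o = 0.
    for i in range(m):
        row = np.zeros(nvar)
        row[0] = -x_o[i]
        row[lam] = X[i]
        row[1 + n + i] = 1.0
        rows.append(row)
        rhs.append(0.0)
    # Output constraints: Y Lambda - S+ - t * y_o = 0.
    for r in range(s):
        row = np.zeros(nvar)
        row[0] = -y_o[r]
        row[lam] = Y[r]
        row[1 + n + m + r] = -1.0
        rows.append(row)
        rhs.append(0.0)
    if vrs:
        # Convexity sum(lambda) = 1 becomes sum(Lambda) = t after scaling.
        row = np.zeros(nvar)
        row[0] = -1.0
        row[lam] = 1.0
        rows.append(row)
        rhs.append(0.0)

    res = linprog(c, A_eq=np.array(rows), b_eq=np.array(rhs),
                  bounds=[(0, None)] * nvar, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    t = res.x[0]  # strictly positive whenever the data are positive
    return res.fun, res.x[lam] / t, res.x[sm] / t, res.x[sp] / t
```

Consider a toy case with one input and one output, X = [[2, 4, 4]] and Y = [[2, 2, 4]]. The second court uses twice as much input per unit of output as its peers and obtains ρ = 0.5. The other two courts obtain ρ = 1.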

Various approaches are commonly used for benchmark analysis. Several authors have applied DEA to judicial systems, often in conjunction with other statistical techniques. Giacalone et al. (2020) and Falavigna et al. (2018) used DEA based on the Malmquist productivity index, while Fusco et al. (2021) combined DEA with principal component analysis (PCA). Earlier, Schneider (2005) and Deyneli (2012) analysed court efficiency via two-stage DEA. Further DEA investigations of the Italian judicial system have been offered by Nissi et al. (2019), Ippoliti and Tria (2020) and Falavigna and Ippoliti (2022a, b). DEA has also recently been used to analyse the Swedish system (Agrell et al. 2020; Chen et al. 2024; Mattsson and Tidanå 2019). Voigt and El-Bialy (2016) presented a comprehensive overview of selected court efficiency studies. The authors performed a preliminary bibliometric analysis using the Scopus database and the bibliometrix tool developed by Aria and Cuccurullo (2017). This analysis corroborates the wide use of radial and non-radial DEA techniques in previous studies of court performance. A broad discussion of this analysis is beyond the scope of this research, but details on the search query are available upon request.Footnote 5

3.2.2 PLS-SEM

In the second step, the authors use PLS-SEM to include different latent constructs representing diverse dimensions of the phenomenon to be measured (Lauro et al. 2018; Tenenhaus et al. 2005). Unlike the covariance-based SEM-LISREL (linear structural relations) algorithm, PLS-SEM estimates the measurement (or outer) and the structural (or inner) models through a system of ordinary least squares regressions (Hair et al. 2021). The outer model describes the relationships between the MVs and their respective latent variables (LVs), while the inner model describes the relationships among the LVs. PLS-SEM can be described via a system of matrix equations based on a path diagram illustrating the relationships among the dimensions. Maggino (2017), among others, discussed in detail how latent variables reflect the nature of the phenomenon within the conceptual model. Such research generally aims to synthesise directly measurable elementary indicators into a description of a multi-dimensional phenomenon (Crocetta et al. 2021).
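The inner-model regression can be sketched as follows. The sketch assumes that latent variable scores have already been obtained from the outer PLS iterations, which it does not implement. The standardised path coefficients of an endogenous LV are then the OLS coefficients of its standardised scores regressed on the standardised scores of its predecessor LVs. The function names and toy scores are hypothetical.

```python
import numpy as np


def standardise(a):
    """Centre each column and scale it to unit (population) variance."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def path_coefficients(exogenous_scores, endogenous_score):
    """OLS estimate of standardised path coefficients and R^2 for one
    endogenous LV, given LV scores (rows = courts, columns = LVs)."""
    Z = standardise(exogenous_scores)
    y = standardise(np.asarray(endogenous_score, dtype=float).reshape(-1, 1)).ravel()
    # No intercept is needed because all scores are centred.
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ beta
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum(y ** 2)
    return beta, r2
```

In the full PLS algorithm, this regression is run once for each endogenous LV after the outer weights have converged. With only EFFICIENCY as the endogenous LV, it reduces to a single regression on the four exogenous constructs.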

Bhatia and Kumar (2022), Rasool et al. (2023) and Rehman et al. (2023) emphasised that PLS-SEM is an appropriate technique for testing hypotheses with non-normally distributed data. This holds particularly when a study focuses on theory development in formative and exploratory analyses. Rehman et al. (2021) and Rehman and Zeb (2023) highlighted that PLS-SEM offers a systematic mechanism for validating relationships among constructs, including in complex models, and for predicting relationships between latent variables. PLS-SEM was chosen over other methods because of the nature of the research problem and hypotheses and because the study has a predictive aim. These features make it a suitable technique for investigating the Italian judicial system once the SBM has been estimated in the first step of the procedure.

3.2.2.1 Two-step procedure and heterogeneity

Many methodologies can synthesise multi-dimensional phenomena, ranging from simple to complex aggregation techniques. As mentioned above, among all possible statistical methods, the current research combines the DEA-SBM and PLS-SEM algorithms. The combined model is also justified because the SBM recommends actions to improve inefficient courts but cannot suggest improvement measures for efficient ones. SEM analysis, by contrast, can suggest developmental measures for both efficient and inefficient courts (Seth et al. 2021). Kalapouti et al. (2020) and Zhu et al. (2019) discussed the benefits of using DEA in the first stage to obtain coherent efficiency estimates, rather than including inputs and outputs directly in the second-step model. In the DEA-SBM, the slacks indicate how far each court is from the efficient frontier. Several new dimensions are therefore defined for the second-step analysis on the basis of the slack variables, as proposed by, among others, Jiang et al. (2020), Jiang et al. (2016), Lundgren and Zhou (2017), Stępień et al. (2021) and Quintano et al. (2020). From this perspective, the authors associate, for instance, the ‘number of judges’ dimension (JUD) with a new dimension, EFF_JUD, defined as follows:

$$EFF\_JUD = \frac{JUD - JUD_{slack}}{JUD} = \frac{JUD_{target}}{JUD} \tag{2}$$

where the slack of JUD denotes the input excess in the JUD measurement, so that JUD minus its slack equals the target number of judges.
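A minimal sketch of Eq. (2) follows. It assumes that, for each court, the observed values and the corresponding SBM input slacks are available as arrays. The function name `eff_from_slack` and the numbers are hypothetical. Because the numerator equals the SBM target, the ratio equals one for a court with no input excess and falls below one as the excess grows.

```python
import numpy as np


def eff_from_slack(observed, slack):
    """Eq. (2): (observed - slack) / observed, i.e. target / observed.

    `observed` and `slack` are arrays over courts (e.g. JUD and its SBM
    input excess); observed values are assumed strictly positive.
    """
    observed = np.asarray(observed, dtype=float)
    slack = np.asarray(slack, dtype=float)
    if np.any(observed <= 0):
        raise ValueError("observed values must be strictly positive")
    target = observed - slack
    return target / observed
```

For two hypothetical courts with 40 and 25 judges and slacks of 0 and 5, the function returns 1.0 and 0.8.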

The appropriate selection of inputs and outputs for judicial systems is also debated because it affects the estimated benchmarks. Several indicators have been proposed in the judicial literature, and some are used more frequently than others. As mentioned above, however, there is no standard model for evaluating judicial efficiency. Among others, Ippoliti and Tria (2020) reviewed the literature on the matter, considering the inputs and outputs proposed, the judicial systems investigated, and the mathematical programming techniques used.

Concerning the SBM, the efficiency estimates are invariant to positive rescaling of the data. They therefore do not depend on the measurement units used for the different inputs and outputs (Bogetoft and Otto 2011). In addition, the output orientation appears more fitting for the present empirical investigation because it is consistent with the aim of maximising the selected outputs of each court. Unlike the SBM, the second-step procedure (PLS-SEM) requires normalising the indicators into dimensionless values, since the dimensions have different units of measurement and scales. Among the various normalisation methods, this paper uses min–max normalisation. This is a linear rescaling that preserves the relationships among the original data values, as follows (Khalid et al. 2020; Mazziotta and Pareto 2018):

$$Z=\frac{X-min(X)}{max\left(X\right) -min(X)}$$
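The formula can be applied column by column to the indicator matrix, where rows are courts and columns are indicators. A sketch follows. Mapping constant columns to zero is an added assumption that avoids division by zero, and the function name `min_max` is hypothetical.

```python
import numpy as np


def min_max(X):
    """Column-wise min-max normalisation Z = (X - min X) / (max X - min X).

    Constant columns (max == min) are mapped to 0 to avoid division by zero;
    this convention is an assumption, not part of the original formula.
    """
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    rng = X.max(axis=0) - lo
    safe = np.where(rng > 0, rng, 1.0)
    return np.where(rng > 0, (X - lo) / safe, 0.0)
```

Because the transformation is linear within each column, correlations between indicators are unchanged. This is why it preserves the relationships among the original data values.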

Figure 2 shows the path diagram considered in the PLS-SEM. It has five LVs, fifteen MVs, and refers to a reflective-formative structural model.

Fig. 2 Path diagram: PLS-SEM Path Modeling (PM). Source: figure by authors

The path diagram shows that QUALITY, PERFORMANCE, and CONTEXT FEATURES each have three MVs, PROCEEDING FEATURES has four MVs, and EFFICIENCY has only two. All constructs except CONTEXT FEATURES include target-based efficiency measures calculated in the first-step SBM.

The proposed model also requires several clarifications regarding heterogeneity. PLS-SEM implicitly assumes that the data stem from a single homogeneous population. This assumption is often unrealistic and can lead to incorrect conclusions (Hair et al. 2021). In this research, multigroup analysis (MGA, also known as the multi-sample approach) is used to check whether the model estimates (path coefficients and loadings) differ between groups of courts. A typical situation involves a categorical moderator variable with K categories or groups. This ‘observed’ heterogeneity can be investigated through variables reflecting socio-economic (and/or demographic) features known a priori, such as possible differences between females and males. Beyond observed heterogeneity, model-based segmentation can detect ‘unobserved’ heterogeneity through a group-specific analysis. Unobserved heterogeneity in PLS-SEM has been widely debated, and PLS-SEM research offers a series of latent class techniques to identify and treat it (Sarstedt et al. 2017, 2022). Finite mixture (FIMIX; Hahn et al. 2002) and response-based unit segmentation (REBUS; Esposito Vinzi et al. 2008) are two such methodologies. The current research uses the latter, which estimates the global model in the first step. In the second step, REBUS uses the residuals from the inner and outer models to (a) perform a hierarchical clustering algorithm (CA) and (b) identify specific PLS-SEM groups. Among others, Quintano et al. (2020) and Sarstedt et al. (2011) have broadly discussed this theme. Similarly, in frontier analysis, heterogeneous factors can be addressed using meta-frontier analysis (for a more extensive discussion, see Battese et al. (2004) and Beltrán-Esteve et al. (2014)).

4 Results

This study uses a two-step procedure to analyse the collected data and validate the proposed research model. In the first-step SBM, 58 courts are efficient according to the output-oriented, variable-returns-to-scale technical efficiency score (TE_EFF). Figure 3 lists these courts according to the number of times they appear in the reference sets. Each reference set consists of the efficient courts that dominate (act as peers for) an inefficient court. Savona appears in the reference sets 45 times and Trieste only once, while the remaining efficient courts (Asti-Verona) do not appear as peers. Figure 4 shows the remaining 82 courts, whose efficiency is lower than one.
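In principle, the counts in Fig. 3 can be derived from the intensity vectors λ of the SBM solutions. A minimal sketch follows, assuming a matrix `lambdas` in which row k holds λ for court k. A court counts as a peer whenever its weight is positive, and only the reference sets of inefficient courts are counted. The function name and toy values are hypothetical.

```python
import numpy as np


def peer_counts(lambdas, scores, tol=1e-9):
    """Count how often each efficient DMU appears in the reference sets of
    inefficient DMUs.

    lambdas[k, j] is the intensity weight of DMU j in the projection of DMU k;
    scores[k] is DMU k's efficiency score (1 = efficient).
    """
    lambdas = np.asarray(lambdas, dtype=float)
    scores = np.asarray(scores, dtype=float)
    inefficient = scores < 1.0 - tol
    efficient = ~inefficient
    # DMU j is a peer of DMU k when its lambda weight is strictly positive.
    is_peer = lambdas[inefficient] > tol
    counts = is_peer.sum(axis=0)
    return {int(j): int(counts[j]) for j in np.flatnonzero(efficient)}
```

Efficient courts that never receive a positive weight get a count of zero. In the results above, this applies to all efficient courts except Savona and Trieste.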

Fig. 3 Efficient courts: times appearing in reference sets

Fig. 4 The 82 inefficient courts

Figures 5a and 5b map all 140 Italian courts and the 58 efficient courts, respectively. The analysis reveals a prevalence of efficient courts in the north of the country.

Fig. 5 (a) All 140 Italian first-instance civil courts; (b) first-step SBM: the 58 efficient Italian first-instance civil courts

Concerning the second-step procedure, each SEM is composed of two sub-models (Kaplan 2009). The measurement (outer) model concerns how the manifest variables are linked to the corresponding latent variable. The structural (inner) model concerns the relationships among the latent variables. Identifying the systematic structures in complex systems with many variables requires specific validations. Since there is no global fitting function to assess the goodness of a PLS-SEM, the model must be evaluated through different fit indices.Footnote 6 Table 3 presents the checks for the homogeneity and unidimensionality of the constructs, using three main indices: Cronbach’s α, Dillon-Goldstein’s ρ (or Jöreskog’s ρ, better known as composite reliability ρ), and PCA eigenvalues. These measures indicate that the model assumptions seem appropriate: the values of Cronbach’s α and Dillon-Goldstein’s ρ are greater than 0.7, and the first eigenvalues are greater than one for all the LVs. Table 3 also reports the average variance extracted (AVE) to assess convergent validity. The AVE of each construct is greater than 0.5, meaning that the construct explains at least half of the variance of its observed variables. Cross-loadings are the loadings of each indicator on all the constructs. Here, they confirm that each construct shares more variance with its own indicators than other constructs do, with satisfactory results (Fig. 8 in the Appendix shows the corresponding results). Among others, Hair et al. (2021) discussed some weaknesses of cross-loadings.
Discriminant validity measures the extent to which a construct is empirically distinct from the other constructs in the structural model, that is, not too highly correlated with them. Fornell and Larcker (1981) proposed comparing the correlations between a construct and the other constructs with the square root of that construct’s AVE; the former should not be larger than the latter.Footnote 7 In the current paper, the discriminant validity results are satisfactory, since the correlations are smaller than the square roots of the AVE. Radomir and Moisescu (2019) discussed the Fornell–Larcker method at length. As an additional or alternative index of discriminant validity, Henseler et al. (2015) proposed the heterotrait–monotrait ratio (HTMT). They suggested a threshold of approximately 0.85, or 0.90 for structural models with conceptually very similar constructs. An HTMT value above this threshold indicates a lack of discriminant validity (see, among others, Rehman et al. 2023 and Rehman and Zeb 2023). In the current paper, the HTMT results confirm discriminant validity, and the variance inflation factor (VIF) was used to assess multicollinearity among constructs.
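The reliability and validity checks described above can be sketched as follows. The sketch assumes that block-level indicator data, standardised outer loadings, LV correlations and indicator correlations are already available. All function names are hypothetical. The thresholds applied in the text are 0.7 for Cronbach’s α and Dillon-Goldstein’s ρ, 1 for the first eigenvalue, and 0.5 for the AVE. The HTMT sketch uses raw rather than absolute indicator correlations, so it assumes positively correlated indicators.

```python
import numpy as np


def cronbach_alpha(block):
    """Cronbach's alpha for a block of indicators (rows = units, cols = MVs)."""
    block = np.asarray(block, dtype=float)
    p = block.shape[1]
    item_var = block.var(axis=0, ddof=1).sum()
    total_var = block.sum(axis=1).var(ddof=1)
    return p / (p - 1) * (1.0 - item_var / total_var)


def dillon_goldstein_rho(loadings):
    """Composite reliability from standardised outer loadings."""
    lam = np.asarray(loadings, dtype=float)
    num = lam.sum() ** 2
    return num / (num + np.sum(1.0 - lam ** 2))


def ave(loadings):
    """Average variance extracted: mean squared standardised loading."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))


def first_eigenvalue(block):
    """Largest eigenvalue of the indicators' correlation matrix."""
    corr = np.corrcoef(np.asarray(block, dtype=float), rowvar=False)
    return float(np.linalg.eigvalsh(corr)[-1])  # eigvalsh sorts ascending


def fornell_larcker_ok(lv_corr, aves):
    """True if each LV's sqrt(AVE) exceeds its correlations with the others."""
    lv_corr = np.abs(np.asarray(lv_corr, dtype=float))
    root_ave = np.sqrt(np.asarray(aves, dtype=float))
    off = lv_corr - np.diag(np.diag(lv_corr))  # zero the diagonal
    return bool(np.all(off.max(axis=1) < root_ave))


def htmt(mv_corr, block_a, block_b):
    """Heterotrait-monotrait ratio for two blocks of indicator indices
    (each block needs at least two indicators)."""
    R = np.asarray(mv_corr, dtype=float)
    hetero = R[np.ix_(block_a, block_b)].mean()

    def mono(block):
        sub = R[np.ix_(block, block)]
        iu = np.triu_indices(len(block), k=1)  # distinct indicator pairs
        return sub[iu].mean()

    return hetero / np.sqrt(mono(block_a) * mono(block_b))
```

For example, three indicators with standardised loadings of 0.8 give an AVE of 0.64 and a composite reliability of about 0.84, so both thresholds are met.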

Table 3 Block unidimensionality and overall model quality. Source: authors’ calculations

Table 3 also reports the main indices of overall model quality: the R² coefficient, the communality, the redundancy index, and the goodness-of-fit index (GoF; Tenenhaus et al. 2005). The R² coefficient shows that the explanatory LVs predict the endogenous LV well. The communality and redundancy indices are appreciably high for all blocks; a value of 0.50 indicates a sufficient degree of construct validity. The absolute GoF is 0.768, whereas the relative GoF is only 0.533.
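A minimal sketch of the absolute GoF follows, computed as the geometric mean of the average communality and the average R² (Tenenhaus et al. 2005). Weighting block communalities by the number of MVs is one common convention and is an assumption here. The relative GoF is not computed, and the inputs are hypothetical.

```python
import numpy as np


def absolute_gof(communalities, block_sizes, r2_values):
    """Absolute GoF: sqrt(mean communality x mean R^2).

    Block communalities are averaged with weights equal to the number of MVs
    per block; R^2 is averaged over the endogenous LVs.
    """
    comm = np.asarray(communalities, dtype=float)
    w = np.asarray(block_sizes, dtype=float)
    mean_comm = np.sum(comm * w) / np.sum(w)
    mean_r2 = np.mean(np.asarray(r2_values, dtype=float))
    return float(np.sqrt(mean_comm * mean_r2))
```

Because the GoF is a geometric mean, a high value requires both a good measurement model (communality) and a good structural model (R²).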

Table 4 reports the outer estimations. The loadings are all positive and statistically significant. Several variables initially included in the model had to be removed from the analysis because their loadings were not significant.

Table 4 Outer estimations. Source: authors’ calculations

Based on the assumptions above, the present research considers the following general equation:

$$EFFICIENCY=f(PERFORMANCE,\, QUALITY,\, PROCEEDING\,FEATURES,\, CONTEXT\,FEATURES) \tag{3}$$

Table 5 reports the values and the significance of the structural (path) coefficients.

Table 5 Inner estimations. Source: authors’ calculations

Column 2 reports the standardised global path coefficients. Three of them are positive (PROCEEDING FEATURES, QUALITY, and PERFORMANCE), but only two are statistically significant (p-values in parentheses). Improvements in the indicators of the QUALITY and PERFORMANCE latent blocks are therefore associated with higher efficiency, and this positive relationship supports the model’s validity for monitoring courts’ efficiency. In the global model, the CONTEXT FEATURES construct has a significant but negative impact on PLS-SEM EFFICIENCY. This negative relationship was expected, since the construct includes proxies of context variables that ‘complicate’ the work of the courts in their respective areas. PROCEEDING FEATURES, whose coefficient is positive but not significant, involves the Pinto Law risk proceedings (EFF_POT_U_PENSIC and EFF_POT_U_PENSIE). This finding seems to lessen the relevance of the MVs involved, so it is examined further in the discussion section. The endogenous variable EFFICIENCY includes two control variables. The first, TE_EFF, is the efficiency score estimated in the first stage. Following the approach specified by Simar and Wilson (2007), it becomes a dependent variable of the PLS-SEM in the second stage. The second MV is EFF_JUD. Both positively affect efficiency and, in line with the model’s assumptions, their positive correlations confirm the consistency of the analysis.

Indirect paths are not significantly different from zero. These paths arise from possible causal relationships between the 15 MVs and LVs that are not directly connected but can affect PLS-SEM EFFICIENCY. Additional features specific to the PLS-SEM (and the DEA-SBM), such as interaction effects, were addressed in the preliminary analysis but are not discussed in detail here. They go beyond the scope of this paper and have been broadly discussed in the authors’ previous research (see Quintano and Mazzocchi 2020).

To investigate observed and unobserved heterogeneity in the courts’ data, Table 5 reports group comparisons based on (a) court size, (b) efficiency, and (c) REBUS-PLS. For observed heterogeneity, PLS-SEM offers two approaches for comparing model estimates across groups: permutation and bootstrapping. This paper uses bootstrapping.Footnote 8 The authors consider a significance level (α) of 0.1 sufficient to identify significant path coefficients. The first multigroup comparison uses ‘court size’ as a dummy-coded grouping variable.Footnote 9 Table 5 shows the path coefficients for the courts in ‘Group 1’ and ‘Group 2’ and for the whole sample (‘Global’), together with the bootstrapped t-test based on these estimates. The path coefficients do not differ significantly between the two groups. This result holds even when the court-size categories are coded differently across several simulations. Although the analysis could be extended, court size does not seem to affect the path coefficients at this stage. This non-significant outcome is discussed further in the next section.
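One common form of the bootstrap-based multigroup test is the parametric approach. It compares two group-specific path coefficients using their bootstrap standard errors and group sizes. Whether this is exactly the test behind Table 5 is an assumption. The function name and inputs are hypothetical, and the resulting p-value would be compared with the α = 0.1 level adopted above.

```python
import math

from scipy import stats


def mga_parametric_t(p1, se1, n1, p2, se2, n2):
    """Two-sided parametric test of H0: equal path coefficient in two groups.

    p1, p2: group-specific path coefficients; se1, se2: their bootstrap
    standard errors; n1, n2: group sizes. Returns (t, df, p_value).
    """
    df = n1 + n2 - 2
    # Pooled standard error weighted by group size.
    pooled = math.sqrt(((n1 - 1) ** 2 / df) * se1 ** 2
                       + ((n2 - 1) ** 2 / df) * se2 ** 2)
    t = (p1 - p2) / (pooled * math.sqrt(1.0 / n1 + 1.0 / n2))
    p_value = 2.0 * stats.t.sf(abs(t), df)
    return t, df, p_value
```

The permutation approach mentioned above avoids the distributional assumption of this t-test but requires re-estimating the model for each permutation of group labels.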

The scenario changes when the grouping dummy relates to efficiency rather than court size. With a dummy equal to one for courts with an efficiency greater than 0.7, the CONTEXT FEATURES coefficients of the two groups differ significantly (at the 10% level). In Group 1, the coefficient (0.080) is almost null, indicating that the context indicators do not affect EFFICIENCY. Conversely, in Group 2, their negative effect on PLS-SEM EFFICIENCY becomes more pronounced (–0.153).

Concerning the unobserved heterogeneity, the REBUS-PLS algorithm generates the dendrogram shown in Fig. 6. The preliminary CA suggests considering two court clusters to evaluate the model parameter variations.
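The preliminary clustering step of REBUS-PLS can be sketched with SciPy’s hierarchical clustering. The sketch assumes a matrix of residuals from the global model, with one row per court and columns for the outer (communality) and inner (structural) residuals. Ward linkage and the cut into two clusters are assumptions chosen to mirror the dendrogram described above. The subsequent iterative reassignment of courts in REBUS is not implemented, and the toy residuals are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def preliminary_clusters(residuals, n_clusters=2):
    """Ward hierarchical clustering of global-model residuals (rows = courts).

    Returns cluster labels in 1..n_clusters and the linkage matrix, which
    scipy.cluster.hierarchy.dendrogram can plot.
    """
    R = np.asarray(residuals, dtype=float)
    Z = linkage(R, method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return labels, Z
```

In REBUS, these labels are only a starting partition. Local models are then estimated for each group, and courts are reassigned until the partition stabilises.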

Fig. 6 Dendrogram: preliminary clustering algorithm (CA) for the latent variables’ score

At the end of the REBUS-PLS procedure, the first cluster includes 33 courts and the second 107 courts.Footnote 10 Table 5 also reports the path coefficients resulting from the REBUS algorithm. For the impact on PLS-SEM EFFICIENCY, the multigroup comparison reveals significant differences at the 10% level for two coefficients (those of QUALITY and PERFORMANCE). Furthermore, Table 5 shows that the group quality index (GQI) is higher than the GoF of the global model. This confirms that the two-cluster model performs better than the global model and that unobserved heterogeneity is present in the court data.Footnote 11

In general, the courts in Group 1 are mainly located in the north of the country. Figure 7a maps the 33 courts in Group 1. As mentioned above, the courts of first instance and the courts of appeal do not correspond precisely to the Italian NUTS3 and NUTS2 territorial units. Nevertheless, Fig. 7b displays the average PLS-SEM EFFICIENCY for each region.

Fig. 7 (a) The 33 courts included in Group 1; (b) the average PLS-SEM EFFICIENCY for each region included in Group 1

5 Discussion

The judicial system is a relevant part of the public sector. There is a great need to improve the performance of Italian courts, since the Italian system is among the worst-performing in the EU. A well-functioning judicial system is a crucial determinant of economic performance, and the current research investigates the efficiency of Italian courts. The results offer helpful considerations for specific policy assessments and contribute to the literature on the topic. The preliminary result is the ranking of courts according to their technical efficiency, obtained from the first-step SBM. This ranking is not very different from those of similar recent studies, which find the more efficient courts mainly in the north of Italy. Beyond this ranking, however, the research questions are answered by investigating the relationships among the constructs defined in the second step, which also draw on the first-step SBM results. The PLS-SEM investigates the relationship between judicial efficiency and performance, quality, proceeding and context features. It then analyses whether heterogeneity significantly affects the courts in the Italian judicial system. Significant relationships support hypotheses H1, H2, and H4 (concerning performance, quality and context features, respectively). For the performance construct, the findings are in line with the prevailing literature. This construct includes indicators traditionally used as output measures in judicial performance evaluations, and policymakers have already established these pillars, ensuring national and EU-level comparisons (CEPEJ 2022). In addition to established indicators, the quality construct includes a dimension based on an assumption of CEPEJ (2016) and Bartolomeo and Bianco (2017): poor-quality judgments are more likely to be appealed unless opportunistic behaviour prevails.
Since the quality features affect efficiency, the findings suggest that policymakers should pay greater attention, among other things, to the accuracy of first-instance sentences to avoid increasing appeal rates.

The results do not support H3. The corresponding construct, proceeding features, involves dimensions connected to the Pinto Law risk proceedings. The authors paid particular attention to these indicators, which are defined as pending cases potentially subject to the Pinto Law. Contrary to expectations, however, their coefficients are not significant. The MVs associated with this construct may therefore need to be modified to better characterise backlog features. One possibility would be to include a proxy for the reimbursements paid by the Italian state under the Pinto Law. An indicator of this cost does not appear in the model because the data are unavailable. As for H4, context features negatively affect efficiency. The corresponding construct includes an indicator of firm concentration. This dimension does not capture the relationship between judicial efficiency and market dynamics (Falavigna and Ippoliti 2022b). It reflects only the consequences of firm concentration for the courts’ workload, as already debated by Peyrache and Zago (2016). Therefore, this indicator has a negative effect similar to those of the degree of litigiousness and the index of permeability to organised crime. Nevertheless, further analyses should include different proxies.

The findings support the hypothesis that heterogeneity affects courts’ efficiency (H5), which offers relevant leverage for evaluation. Some clarifications are necessary. Accounting for observed heterogeneity by distinguishing courts according to their size does not seem relevant, even though size is the feature that inspired past reforms. In more detail, the 2012 reform of the Italian justice system closed many courts. The reform justified this choice by the economies of scale expected from merging them into larger and more productive courts.

When the efficiency level of each court is considered, less efficient courts appear to suffer from the context indicators, whereas these dimensions no longer affect efficient courts. The unobserved heterogeneity analysis confirmed territorial differences and also showed that more efficient courts are mainly located in the north. This finding therefore suggests differentiated policy interventions targeting courts operating in different areas.

5.1 Practical implications

Identifying the main drivers of inefficiency and clear benchmarks can effectively support policymakers in implementing specific interventions in the national justice system. The findings of this study imply that policymakers should implement strategies that consider the indicators in the performance, quality and context constructs. They should also bear in mind that heterogeneity calls for differentiated policy interventions. In addition to leveraging indicators traditionally used for monitoring courts, it is helpful to consider the peculiarities of each indicator. One example concerns the conditions that can improve the quality of first-instance sentences and thereby reduce the appeal rate in each court. Here, one can consider the number of employees (UPP) in the OFT and the recent ministerial decree on staff allocation policies. As mentioned above, the findings confirm that the number of judges increases courts’ efficiency. The evidence does not support similar conclusions for the OFT staff who directly support judges, because the corresponding variable (EFF_UPP) has no significant loadings in the PLS-SEM. Despite this validation problem, it should be emphasised that the first-step model (SBM) indicates an optimal value for the OFT personnel. These indications are helpful for policymakers involved in organising court personnel.

Another example concerns the indicators connected to the Pinto Law risk proceedings. No clear conclusions can be drawn about the indicator of pending cases potentially subject to the Pinto Law. Nevertheless, monitoring the ultra-triennial pending proceedings that could become ‘Pinto Law cases’ is relevant for policymakers seeking to implement specific interventions to control public expenditure.

5.2 Theoretical implications

In the authors’ opinion, the two-step procedure combining DEA-SBM and PLS-SEM is a methodological novelty in the judicial literature. In addition, combining traditional efficiency indicators with the new dimensions mentioned above, which could have implications in the future, might provide a relevant tool to improve the efficiency of Italian courts. This approach could also be extended to other judicial systems.

6 Conclusions, future directions and limitations

This work examines Italian courts, investigating judicial efficiency and the specific constructs identified as more relevant than others. It also examines the heterogeneity that significantly affects the Italian civil justice system. The results provide empirical evidence that significant relationships exist and that heterogeneity affects them. Identifying (1) the causes of inefficiency and (2) clusters of courts with similar behaviour patterns are relevant challenges, especially since the NRRP is a valuable tool for tackling the challenges of justice reform. The evidence presented in this work aims to contribute to the debate on the levers for intervention.

The current empirical work is a data-driven statistical exercise using data from several sources. Nevertheless, the paper should be read with caution because of several limitations. Many weaknesses have already been described in the previous sections and require further reflection in the future, for example concerning the OFT staff and data quality. An additional limitation is that the analysis does not consider the degree of digitalisation and the ICT level across the Italian judicial system. New technologies, ICT equipment, and personnel skills are relevant dimensions, so suitable proxies for them will need to be included. The current evaluation uses data from the period just before the introduction of the “Cartabia Reform” (28 February 2023). The insights of this paper concern only civil courts; criminal courts are not considered. Future research should therefore incorporate these features and include different pillars, such as artificial intelligence tools and big data analysis.