1 Introduction

The improvement of the quality of justice systems is embedded in European Union policy (European Commission 2015, the 2015 EU Justice Scoreboard, COM (2015) 116 final, foreword), but reforms aimed at introducing the possibility of evaluating the work of courts or judges interfere with judicial independence, which makes them highly controversial (Coman 2014; Mak and Taekema 2016). The ‘judge is independent when she can take decisions based on her own preferences and interpretation of law’ (Aydın 2013, 108). Judicial independence refers to the ‘impartiality’ of a judge and means that he or she makes decisions based on law and fact (Shapiro 1981). Judicial independence also refers to protection against political interference that could affect the impartiality of a judge (Fiss 1993). When evaluating the work of a judge, the evaluator may be tempted to use this power to achieve current political goals, which would make the position of the courts dependent on the executive. On the other hand, ‘judicial independence without accountability can result - in practice - in a purely self-regulated judiciary, insulated from society and unresponsive in terms of performance and good practice’ (Garoupa and Magalhães 2021, 695). A lack of judicial accountability makes it difficult to identify the poor functioning of courts, thus preventing the improvement of court processes.

Contemporary research has focused mainly on measuring the different dimensions of the quality of courts’ performance. The dimensions of court quality discussed in scholarly debates comprise independence (Ferejohn and Kramer 2002; Haggard et al. 2008; Clark 2009), accountability (Garoupa and Ginsburg 2015; Garoupa and Magalhães 2021), effectiveness (Helfer and Slaughter 1997; Larsson et al. 2017), efficiency (Ippoliti and Tria 2020) and productivity (Cammnitiello et al. 2017).

The perceived prestige of courts remains separate from the above. Prestige encompasses the vast number of assessments used to determine the esteem and respect that a court has (e.g. Corley et al. 2011). When we speak of a court as being prestigious, we mean that judges from other courts think highly of it (see e.g. Klein and Morrisroe 1999, 372). In the existing literature, this idea is sometimes referred to as ‘influence’ or ‘reputation’. ‘Influence’ is defined as ‘the extent to which the actions of one person have an effect on the views or behaviour of others’ (Klein and Morrisroe 1999, 372). Hence, studies using databases of citations to evaluate the ‘judicial influence’ refer to the work of judges, not courts (see Landes et al. 1998; Anderson 2011), while ‘reputation’ includes an esteem component and is perceived as ‘judicial quality’ (Garoupa and Ginsburg 2010). The concepts of prestige and reputation are closely related, while influence - although similar - is more closely linked to the concept of authority. Prestige and reputation also play similar roles to each other: they provide information to the public about the perceived quality of the judiciary and promote respect for the profession of a judge. Moreover, some courts exhibit a ‘persuasive authority’ (Slaughter 1994; McCrudden 2000; Wind 2016), which denotes the impact that an individual court (e.g. the Supreme Court) has on other courts.

Academics have used citation counts as proxies for judicial quality (see, e.g. Landes et al. 1998). Choi et al. (2009) have since developed an alternative approach that considers not one but three important elements of judicial performance, namely citations, independence and productivity. However, these studies are specific to individual judges, not courts, and are not adaptable to non-precedential court systems.

In this paper, we address the problem of assessing a court’s prestige without interfering with the judicial independence of courts and judges. We conclude that this is possible through an analysis of the patterns of court citations in a selected group of controversial cases. The analysis focuses on two related issues. The first concerns the characteristics of the courts, of the selected group of similar cases and of the circuits that explain why a court cites other courts. The second concerns the characteristics of courts and circuits that explain why a given court is cited. In the first approach, we answer the question of what exactly determines the citation of another court. This helps us to find the courts which are cited when a citation should be taken from a high-prestige court (that is, when the citation appears together with a citation of the Supreme Administrative Court (SAC), or when the value of the claim is high). The method applied here is logistic regression. This approach is based on the assumption that judges have the best understanding of the factors leading to the improvement of courts and take into account all the relevant considerations.

In the second approach, we measure the prestige of courts directly from the number of citations of a given court and subsequently explain it with other variables. In this approach, we cannot use the characteristics of the cited cases to explain the choice of a court because cited cases are not of the same type and they do not possess common characteristics. The method applied to this approach involves negative binomial regression. It is worth emphasizing that we measure the subjective or perceived prestige of courts, not their objective quality. We assume that the more a given court is cited (under the condition that citations are positive—that is, when a court agrees with the opinion of another court), the greater its prestige. We believe that better courts are cited more frequently. We did not investigate the influence of courts, because this would require an assessment of whether the opinion of a given court positively affected the actions and decisions of another court. Our analysis focused on lower administrative courts; these courts do not exhibit persuasive authority (because they are equal to one another in terms of their formal power) but can be perceived as prestigious by other courts at the same level. The proposed method can be extended to different types of courts, judges and decisions according to the availability of data. We focused on the courts, because the names of judges are not given in citations, and only the case number and the name of the court identify verdicts.

For which kind of justice system is this analysis particularly useful? The measurement of prestige is especially important when judges are free to cite the decisions of other courts which support their arguments. This situation is typical in the doctrine of jurisprudence constante, where different lines of authority can emerge, and is less common in the doctrine of stare decisis (typical for common law), where precedents are crucial for reaching a verdict. In the case of jurisprudence constante, judges may choose the lines of authority they wish to follow. With stare decisis, however, the free will of judges in the citations is limited.

We assessed prestige in the sample of all 16 provincial administrative courts in Poland; these operate under the jurisprudence constante doctrine and are able to follow different lines of authority.

What are the political implications of our proposal for the improvement of the justice system? Assessing a court’s prestige could help to identify the factors causing some courts to function better than others, and may enable politicians to create a favourable environment for improving the quality of those courts and judges perceived as being inferior.

The paper is structured as follows. First, we discuss the reasons for the citation of other courts’ verdicts; then we describe the system of administrative courts in Poland and the characteristics of the cases used. The next section describes the data and variables used in the regressions. We then present the results of logistic regression for the whole sample and the significance of the results obtained for individual courts, followed by the results of zero-inflated negative binomial regression. The paper ends with a brief discussion of the implications of the study for judicial policy.

2 Literature review and hypotheses

2.1 The reasons for citations

Judges cite other cases because they are expected to resolve similar cases with consistency. This is aimed at promoting efficiency, fairness and predictability (Schauer 1987; Lindquist and Cross 2005). The analysis and citing of previous decisions ‘serve to economize on the costs of decision-making’ (Harnay and Marciano 2003, 408). Berlemann and Christmann (2020) found that citing previous verdicts reduces the time a court needs to consider a case. By citing other cases, judges demonstrate to the litigants that their decision is not isolated and is strongly based on existing judgments (Fronk 2010; Ződi 2015). Judges also take this action in order to minimize the chance of reversal of their judgments by a higher court (Smith and Tiller 2002). Reversal decreases the impact of the judge’s verdict in subsequent cases and tarnishes the judge’s reputation. Berger and George (2005) found that a judge may also prefer to cite the verdict of a judge she knows and trusts. Schmid et al. (2021) found that courts adopt the relatively new ‘split-the-difference’ jurisprudential approach, which requires citations of both ideologically close and ideologically distant cases. Citation of cases from the same court (internal citations) makes a court more influential in the future (Landes et al. 1998).

‘Judges do make citation decisions based on the information that is communicated in majority opinions and are less likely to cite cases that appear unimportant’ (Hume 2009, 143). Our observations based on the cited cases suggest that judges tend to cite those verdicts that corroborate their opinion and those with which they wholeheartedly disagree. The latter are cited especially when one of the litigants has relied on them in the case. Both kinds of citations are important. For example, research on the jurisprudence of circuit courts has shown that negative citations relating to Supreme Court judgments in the United States can affect the ability of the Supreme Court to reconsider resolved cases (McMillion and Vance 2017).

The analysis of citations makes it possible for us to assess the prestige of a court. Judicial prestige can be measured at the court circuit level (Hume 2009) or by examining individual judges (Posner 1990; McCormick 1996a; Landes et al. 1998; Klein and Morrisroe 1999; Smyth 2000; Smyth and Bhattacharya 2003; Fronk 2010; Anderson 2011; Bowie and Savchak 2022). Curry and Miller (2016) focused on the specific features of a case affecting the likelihood of citation regardless of the authoring judge.

2.2 The hypotheses

We have focused on the verification of the following hypotheses:

Hypothesis 1

While all provincial administrative courts are formally equal, some courts are perceived as being more prestigious by other courts.

Prestigious courts should be frequently cited (by other provincial courts) and cited together with SAC judgments (as the SAC, being the superior court, has the highest prestige) or cited more frequently when the value of the claim is high (to discourage an appeal).

Judges would prefer to cite verdicts from a higher-instance court such as the Supreme Administrative Court, because the SAC is the highest administrative court with the most persuasive authority in Poland. The citation of another provincial administrative court together with the verdict of the SAC elevates the rank of the lower court since it is juxtaposed with the judgments of the most prestigious court. Courts perceived as being prestigious should also be cited more frequently if the value of the claim is high because such a citation makes the appeal of the losing litigant to a higher court less likely. The high value of the claim is a proxy for the litigants’ willingness to appeal and challenge the lower court’s judgment. For a low-value claim, it is not necessary to cite prestigious courts because in this case, the probability of an appeal is low as it imposes a cost on an appellant.

The frequency of citation was considered when assessing the relative validity of cases while the logistic regression on the interaction was used to check the fulfilment of the two other conditions.

Hypothesis 2

The courts located in provinces with greater populations are cited more often.

It has been hypothesized that the courts in highly populated provinces should possess higher prestige. Caldeira (1983) found that courts with greater experience in resolving cases, as well as those in areas with higher GDPs and with greater populations, are more likely to be cited. Simply put, courts located in larger provinces process higher numbers of cases; therefore, a greater number of cases is available for citation. Larger provinces tend to contain universities with a long tradition of legal education.

Judges from such provinces with a long tradition of legal education will be likely to issue more nuanced legal reasoning. Consequently, the opinions of these judges will be attractive to their peers in other courts. This expectation is in line with works by Landes et al. (1998) and Berger and George (2005).

This hypothesis is verified through zero-inflated negative binomial regression.

Hypotheses 3 and 4 are included because the need for use of external citations differs in importance among courts and circuits. These differences affect the measure of prestige approximated by citations, so they should be taken into account.

Hypothesis 3

The judges in smaller courts cite other courts more willingly.

If judges are uncertain of their verdicts and cannot find support from their colleagues from the same court, they will be more willing to cite cases from other courts to support their decisions. This situation is most likely in small courts with a low number of processed cases, low number of judges and where judges have less experience in adjudication.

Hypothesis 4

If the quality of tax collection agency work in a province is low then the citations of other courts should be more frequent.

If a tax collection agency processes administrative cases ineffectively then the assessment of their work raises doubts amongst judges. In this situation, judges rely less on the argumentation of the tax collection agency. External citations become very useful in justifying courts’ verdicts, so the frequency of citations should be higher.

The last two hypotheses are verified with logistic regression and logistic regression on interactions.

3 Administrative justice in Poland

Administrative courts in Poland provide one basic service typical for European courts, namely dispute resolution (for the ‘services’ of courts, see Landes and Posner 1979, 236–242). It should be emphasized that these courts do not provide the service of judicial law-making, as is provided by American courts. The decisions of the courts in Poland are not legally binding precedents so the principle of stare decisis does not apply to their verdicts.

Administrative courts are special courts whose system differs from that of common courts. The judicial power is divided between 16 provincial administrative courts and one Supreme Administrative Court (SAC), which is an appeal court. The judges of administrative courts are appointed by the President of the Republic of Poland upon the request of the National Council of the Judiciary.

The judicial panel of the administrative court comprises three judges in an open session. In matters concerning procedures, only one judge is involved. A judicial decision is written by the judge-rapporteur. In an adjudication practice, the judge-rapporteur reports the case to the other members of the panel. Every judge is obliged to investigate the case. The panel of three judges takes a joint position; if the judgment is unanimous, then the justification of the judgment is prepared by the judge-rapporteur.

In the next step, the judge-rapporteur familiarizes the entire adjudication panel with the justification of the judgment. This is also the point at which the other judges comment on the text of the judgment. If the verdict is not unanimous, the final report is also written by the judge-rapporteur. In the event of a dissenting opinion, the dissenting judge writes an addendum to the justification explaining her stance. Dissenting opinions very rarely arise (see Kowalski 2020).

The justification of the judgment must include a brief description of the case, the charges of the complainant, the positions taken by the litigants, and the legal basis and explanation of the final verdict. The judgments of the courts in Poland are not as extensive as those of courts in the United States and Canada. In terms of length, they are rather similar to the judgments of courts in Germany (see Hadfield 2008 for information on the judgments of individual countries). The citations of other courts are not used for ‘window dressing’ but to support the arguments of the court.

Several empirical studies have been conducted on the courts in Poland; most of them have concerned civil courts (Jonski and Mankowski 2014; Bełdowski et al. 2020; Staszkiewicz et al. 2020; Banasik et al. 2022) and the constitutional court (Kantorowicz and Garoupa 2016; Fałkowski and Lewkowicz 2021). In terms of administrative courts, Stachowiak-Kudła and Kudła (2022) showed that an important factor affecting the decisions of courts in Poland is the legal tradition of the particular territory in which the administrative court is operating. Courts based in the former Prussian partition of Poland are more likely to be influenced by the German tradition of law (that is, they are more likely to make judgments in favour of the government), while those courts operating in the former Russian and Austrian partitions are more likely to refer to the principle of justice (with judgments tending to be made in favour of the taxpayer). These institutional factors can still be identified almost one hundred years after the end of partitioning and despite the unification of formal and material law.

4 Characteristics of the cases

We examined the citation patterns of erroneous documentation cases involving mineral gas sales. These disputes concern the excise tax imposed on the taxpayer (complainant) by the tax collection agency. If the taxpayer accepts the decision of the tax authority, it is not brought before the court and the decision is upheld. A litigant dissatisfied with the decision can sue the tax authority before a provincial administrative court and this verdict can be appealed before the SAC. If the outcome of the appeal is in favour of the complainant, the case is once again referred to the administrative court, which passes a new verdict in accordance with the guidelines from the SAC.

The excise duty varies depending on the purpose of the purchased oil; the duty for heating oil is relatively low and the duty on motor oil relatively high (the former being about eight times lower than the latter). This induces a moral hazard and encourages tax fraudsters. In order to prevent fraud, buyers must confirm the purpose of the purchased oil and provide personal identification data to the oil sellers. The sellers are responsible for the accuracy of the sales documentation and the timely delivery of the documentation to the tax authorities. Errors in the identification or signatures of the buyers often result in the rejection of the tax declaration and the imposition of excise duty on the oil provider with a value appropriate to the purpose of the fuel. This practice has resulted in many lawsuits being resolved in courts.

In practice, two different lines of authority have emerged. The first line of authority aims to distinguish between significant defects and insignificant defects in material statements. The court verified whether the condition for the correct taxation of heating oil was met. Such control should enable the identification of the buyer and the verification of the buyer’s statement on the intended use of the purchased product. The second line of authority indicates that the legislator did not introduce any classification of defects in the statements of energy product buyers or their evaluation. It also did not allow for later corrections to or augmentations of the statements. It should be emphasized that we have selected cases which concentrate on the legal assessment of the committed act but not on the assessment of facts.

We have chosen these types of cases because they are homogeneous (all cases are very similar), the number of cases is relatively large (over 500), the value of the claims is clearly defined and the cases were resolved over a period of many years but with clear time constraints. The data comprises verdicts from the period 2009–2016. Before 2008, the identification rule was not properly established within the executive act issued on the basis of the Act concerning excise duty. According to the Constitutional Court, this latter act exceeded statutory authority. In 2016, the rules were altered in favour of taxpayers according to the preliminary ruling of the Court of Justice of the European Union (CJEU) (C-418/14). The CJEU decided that the tax authorities and courts should first of all examine whether or not the oil was actually used for heating purposes. This new line of authority limited the filing of lawsuits by taxpayers and undoubtedly reduced the number of cases settled in courts.

Although the chosen cases are homogeneous, the verdicts are not the same (since courts follow different lines of authority), and verdicts in favour of the complainant have become more common over time. Prestigious courts should be cited regardless of the lines of authority and type of decision (in favour of or against the complainant). By contrast, the citations of courts with a lesser level of prestige should be given based on the type of verdict.

These cases involving excise tax on mineral gas oil are suitable for measuring the prestige of provincial administrative courts because, due to the low incidence of such crimes in Poland, they did not attract any media attention. There are two reasons for this. Firstly, the media are more interested in VAT fraud than in excise tax fraud, since the former is far more widespread than the latter. Secondly, the number of fuel oil suppliers is relatively small, which is another reason that the reported crimes do not attract the attention of the media. The research on the functioning and development of case law for courts and on the role of the wider public (Baum 2006; Hansford and Spriggs 2006; Lupu and Fowler 2013) indicates that judges are concerned with the reactions of the wider public, which can affect their verdicts. Therefore, we decided to select cases in which external pressure exerted by the public would not be a factor, and which would thus not complicate the process of examining the prestige of courts.

5 Data and variables

The data presented in this study was compiled by recording citations on erroneous heating oil declarations in 1242 decisions resolved between 2009 and 2016 across all provincial administrative courts (the courts are number-coded and their locations are included in Appendix Table A1). All citations in the examined cases were coded manually. We used an intercoder check to assess the reliability of the data collection process. The qualitative component of the study was led by one principal investigator. After an initial coding of the key variables in the database, a second coding of those variables was performed for a random sample of the data (10% of observations) by a second researcher, yielding acceptable intercoder reliability levels for the tested variables. The reliability of our coding was tested using Cohen’s kappa (κ), a measure of interrater agreement, for the variable fraud, which is the only qualitative variable that different researchers can assess differently. The kappa statistic ranges from −1 to +1 and should be close to or equal to one if assessments are concordant. The calculated value of kappa for the fraud variable was 0.92, representing very good interrater agreement (see Appendix Table A2).
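As a minimal sketch of how the kappa statistic described above is computed, the following Python function compares the labels assigned by two coders. The function name and the ten coded values are hypothetical illustrations and are not the study's data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical codings of the fraud dummy for ten cases (not the study's data).
coder_1 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
coder_2 = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

In this illustration the coders agree on nine of ten cases, yet kappa is well below 0.9 because part of that agreement would be expected by chance when most cases are coded as zero.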

Having conducted a review of these cases, we identified citations of the SAC in 1096 judgments (hierarchical citations), citations of the same court in 317 judgments (internal citations), and citations of other provincial administrative courts in 504 judgments (external citations). According to Szmer et al. (2020), the courts should be bound by their previous decisions and should cite these. A contrasting point of view is that external citations are intended to convince the litigants that the court’s point of view is justified. In the next stage, we limited the analysis to the last group of 504 judgments.

It should be noted that in the Polish administrative courts, the court does not refer to the judge by name, as is the case in the United States (Klein and Morrisroe 1999). In Polish courts, a citation contains only the name of the court and the reference number of the case (for example: ‘confer verdict WSA in Białystok, the number I SA/Bk 69/10’). We were unable to use the characteristics of the judges because the only publicly available information in Poland is the name and surname of the judge and the court in which she adjudicates; as a result, we could not investigate aspects including their ideology, professional title, position/experience, or the quality of the law school from which the judge graduated. These characteristics were used in the research of Landes et al. (1998), Berger and George (2005), Choi and Gulati (2008) and Szmer et al. (2020). Therefore, our focus was limited to the courts and their characteristics.

Recent research has focused on positive or negative citations separately (Clark et al. 2018); with this in mind, we decided to split the citations into two groups: positive citations and negative citations. It was not difficult to determine which citations were negative, because in each of these cases the court clearly indicated that it did not agree with the decision of the cited court. In total, there were 1330 positive citations and 49 negative ones (comprising 3.6% of the total number of citations). Of the 504 judgments with external citations, 499 were issued unanimously, while in the remaining 5 a dissenting opinion was formulated.

In the regressions, we used only the positive external citations (478 cases), omitting the negative ones. All negative citations reflect the court’s disagreement with the opinion presented by one of the litigants in the dispute and, therefore, cannot be treated as an indicator of the high prestige of the cited court. The dependent variable in our research is a dummy variable describing the citation of a given court (citation). This variable takes the value 1 when at least one positive citation of the court is present in the case and zero otherwise. Each of the 478 cases therefore enters the data 16 times, once for every court that could be cited (a total of 7648 observations). The citation is explained by a set of independent variables describing the case, the court and the characteristics of the province. For the estimation of parameters, logistic regression was applied with a fixed effect for the courts and time effects for the years of the verdicts. This regression allowed us to examine the impact of the variables describing the citation (Table 1) and to calculate marginal effects for courts (Appendix Table A4). To calculate these marginal effects, a logit model was applied.
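The structure of the estimation sample can be sketched as follows. This is only an illustration of how one row per case and potentially cited court produces 16 × 478 = 7648 observations; the field names and the two example judgments are hypothetical.

```python
N_COURTS = 16  # provincial administrative courts, number-coded 1..16

def expand_case_court(cases, n_courts=N_COURTS):
    """Return one row per (case, potentially cited court) pair."""
    rows = []
    for case in cases:
        cited = set(case["cited_courts"])
        # All 16 courts are kept as potential targets, which matches the
        # 16 x 478 observation count reported in the text.
        for court in range(1, n_courts + 1):
            rows.append({
                "case_id": case["case_id"],
                "year": case["year"],
                "cited_court": court,
                # 1 if the judgment positively cites this court at least once.
                "citation": 1 if court in cited else 0,
            })
    return rows

# Two hypothetical judgments (not taken from the study's data).
cases = [
    {"case_id": "A", "year": 2010, "cited_courts": [3, 7, 7]},
    {"case_id": "B", "year": 2012, "cited_courts": [7]},
]
rows = expand_case_court(cases)
print(len(rows))       # 2 cases x 16 courts = 32 rows
print(478 * N_COURTS)  # 7648 observations in the full sample
```

Repeated citations of the same court within one judgment collapse to a single 1, in line with the definition of the dummy as "at least one positive citation".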

Table 1 The information about the citations in the sample

The dependent variable in the logistic regression (citation) takes only two values. Therefore, to investigate a count variable taking several values (i.e. the number of citations from a court), we had to introduce a different definition of the dependent variable. In the zero-inflated negative binomial regression, the dependent variable (citations) is the number of citations of cases from a given court in a given year (Table 2). The zero-inflated negative binomial regression measures the court’s prestige directly, thereby explaining the differences in the number of citations.

Table 2 The results of zero-inflated negative binomial regression

Several factors may influence the citation practices of provincial administrative courts. To investigate this, we decided to apply 14 independent variables to the econometric model. These comprised institution-level measures (circuit attributes) (3 variables), court performance measures (4 variables), case measures (5 variables) and tax collection agency efficiency measures in their respective provinces (2 variables). Of the 14 variables, the value of the claim and the SAC sentence refer directly to the prestige of the court, while all other variables are control variables capturing the specificity of the circuit, cases and courts.

We checked for collinearity using the variance inflation factor (VIF) for the ordinary least squares regression. The VIF is greater than 10 for imputed tax per taxpayer and oil sales, indicating that collinearity can affect the results. Correlation analysis confirmed these two variables to be highly correlated but this is in line with expectations because it stems from a causal relationship. Higher sales of fuel increase the tax imposed and collected, but at the same time, these two variables provided some unique information - the imputed tax refers only to unpaid taxes while the sales refer to the taxation as a whole.
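To make the VIF threshold concrete, the sketch below uses the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all other predictors. The function names are illustrative, and the numeric inputs are worked examples rather than values estimated from the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def vif_from_r_squared(r_squared):
    """VIF_j = 1 / (1 - R_j^2) for the auxiliary regression of predictor j."""
    return 1.0 / (1.0 - r_squared)

# A VIF of 10 corresponds to an auxiliary R^2 of 0.9.
print(round(vif_from_r_squared(0.9), 2))

# With only two predictors, R_j^2 equals the squared pairwise correlation,
# so a correlation of 0.81 on its own implies a VIF of roughly 2.9.
print(round(vif_from_r_squared(0.81 ** 2), 2))
```

A VIF above 10 therefore signals that more than 90% of a predictor's variance is explained by the other predictors jointly, which is a stronger condition than a single high pairwise correlation.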

A similar situation applies to the variables case_inflow and GDP_per_capita (in more affluent provinces the number of cases is higher), but it is important for the study to capture both the processing of work in the court and the wealth of the province’s inhabitants. This led us to use all these variables in the regression despite the relatively high correlation (the highest correlation is 0.81). The value of the Pearson Chi2 test (0.49) did not exclude the logistic distribution, as the null hypothesis is not rejected.

In the second step, the logistic regressions on interactions between the courts’ dummy and the variables used in the analysis were performed. Especially important is the analysis of two variables: citation together with the SAC (SAC_sentence) and the value of claim (value_of_claim). It is possible that the citation of a provincial court alone is too weak to support the opinion of the court, so the additional citation of the SAC is used to strengthen the verdict, but this is less likely. The court does not necessarily need to use the ‘weak’ citations and may use only the citation of the SAC case. Therefore, if the citation of a provincial administrative court appears, it means that it is appreciated by the court’s panel. Similarly, the citations used when the value of the claim is high can indicate a higher level of prestige. If the value of the claim is high, then the probability of an appeal by the losing litigant is also high, and the justification of the sentence is especially important to minimize the likelihood of an appeal. Therefore, we expect that verdicts of the most prestigious courts should be chosen to convince litigants that an appeal is pointless. These two variables can be insignificant when courts with high and low prestige are simultaneously present in the sample because these effects cancel each other out (for some courts the effect is positive and for others it is negative).
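The cancelling-out argument can be illustrated with a small logit sketch in which the effect of SAC_sentence for a given court is the sum of a pooled coefficient and a court-specific interaction term. All coefficient values and court labels below are hypothetical and chosen only to show the mechanism; they are not estimates from the paper.

```python
import math

def logistic(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def citation_probability(intercept, beta_sac, court_shift, court_sac_interaction, sac_sentence):
    """P(citation = 1) for one court, given whether the SAC is cited alongside (0/1)."""
    z = intercept + court_shift + (beta_sac + court_sac_interaction) * sac_sentence
    return logistic(z)

# Hypothetical coefficients: no pooled SAC effect, opposite court-specific effects.
INTERCEPT, BETA_SAC = -3.0, 0.0
courts = {
    "court_A": {"shift": 0.5, "interaction": 0.9},   # cited more often alongside the SAC
    "court_B": {"shift": 0.5, "interaction": -0.9},  # cited less often alongside the SAC
}

effects = {}
for name, c in courts.items():
    p_without = citation_probability(INTERCEPT, BETA_SAC, c["shift"], c["interaction"], 0)
    p_with = citation_probability(INTERCEPT, BETA_SAC, c["shift"], c["interaction"], 1)
    effects[name] = p_with - p_without  # change in citation probability

for name, effect in effects.items():
    print(name, round(effect, 3))
```

In this illustration the two court-specific effects have opposite signs, so a regression without interactions could report an insignificant pooled coefficient even though the variable matters for each court separately.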

A higher number of citations of a court in a year indicates the high prestige of that court. Therefore, in the last step, we applied a zero-inflated negative binomial regression to the count variable given by the number of citations of each court in a year. This allows us to measure the prestige of courts directly. This approach has at least one weakness: it rules out using case characteristics together with court characteristics. For cases resolved in a particular year, the characteristics of the court and the court circuit are the same, so these characteristics cannot account for different levels of citations among those cases. The prestige should be related to the characteristics of the court’s circuit and the court’s performance rather than to the characteristics of the cases (e.g., the value of the claim, or whether explicit cheating was revealed). This prompted us to use the former group of variables and omit the latter. Eventually, the dependent variable (given here in plural form as citations) was determined as the number of citations of cases resolved by the court in a given year, and explanatory variables included performance measures of a court, local tax collection agency measures and circuit characteristics.

We applied the zero-inflated negative binomial regression model instead of the Poisson regression because the mean and variance of the dependent variable were not equal, whereas the Poisson model assumes that they are. Also, the likelihood ratio test of alpha was significantly in favour of the zero-inflated negative binomial model. There were 84 observations of the dependent variable in total, 28 of which were zeros, raising the question of whether the zero-inflated model is effective here. The zeros occurred when none of the verdicts issued by a court in a given year were later cited in the sample.
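The overdispersion argument above can be illustrated with a short check. This is a minimal sketch, not the authors' code: the helper `dispersion_ratio` and the list `counts` are hypothetical and serve only to show the comparison of variance and mean; the only figures taken from the text are the 28 zeros among 84 observations.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean.

    Under a Poisson model the ratio should be close to 1;
    values well above 1 point to overdispersion.
    """
    return variance(counts) / mean(counts)


# Hypothetical yearly citation counts, for illustration only (not the study data).
counts = [0, 0, 0, 1, 2, 3, 5, 8, 13, 40]
ratio = dispersion_ratio(counts)

# Share of zero observations reported in the text: 28 out of 84.
study_zero_share = 28 / 84
```

A ratio far above 1, combined with a zero share of about one third, is the kind of pattern that motivates a negative binomial specification with a separate component for zeros.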

Unfortunately, the Vuong test (1989) is not appropriate for deciding the specification of the model used (cf. Wilson 2015), but the zero-inflated model is a potentially better fit for the data. The zero-inflated model was also preferred on the basis of the Akaike and Bayesian information criteria. They were lower for the final zero-inflated model (507.4 and 531.8) than for the ordinary final negative binomial model (521.6 and 538.6). The zero-inflated model applies a logistic function to model the excessive number of zeros. The effect of the inflated number of zeros was captured by two variables: cases (the number of oil documentation cases resolved in a year) and year (the year of the verdicts issued in a court). A low number of verdicts in a year should negatively affect citation numbers by increasing the number of zeros. The variable year can increase the number of citations (if the verdict starts a new line of authority) or decrease it (when the verdict is published at the end of the investigated period and there is no time to cite it), producing a higher number of zeros for later verdicts.
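The structure of the zero-inflated model described above can be sketched as a probability mass function. This is an illustration under stated assumptions rather than the estimated model: we assume the common parameterization of the negative binomial with mean `mu` and dispersion `alpha` (variance `mu + alpha * mu**2`), and the names `mu`, `alpha`, `pi_zero` and all numerical values are hypothetical.

```python
import math


def logistic(x):
    """Logistic function used for the probability of an excess zero."""
    return 1.0 / (1.0 + math.exp(-x))


def nb_pmf(y, mu, alpha):
    """Negative binomial probability of count y (mean mu, dispersion alpha)."""
    r = 1.0 / alpha  # size parameter
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, pi_zero):
    """Zero-inflated negative binomial probability of count y.

    With probability pi_zero the observation is an excess zero;
    otherwise it is drawn from the negative binomial distribution.
    """
    base = (1.0 - pi_zero) * nb_pmf(y, mu, alpha)
    return pi_zero + base if y == 0 else base


# Hypothetical values: the linear predictor -0.5 stands in for the
# inflation equation (in the text, a function of cases and year).
pi_zero = logistic(-0.5)
p_zero_zinb = zinb_pmf(0, mu=5.0, alpha=1.5, pi_zero=pi_zero)
p_zero_nb = nb_pmf(0, mu=5.0, alpha=1.5)
```

The zero-inflated probability of a zero count is always at least as large as the plain negative binomial one, which is how the model absorbs years in which no verdict was later cited.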

5.1 The circuit-level measures

The first variable, population, was added because the court’s circuit size can affect the frequency of citations according to hypothesis 2. The population of the provinces (which are also the courts’ circuits) differs significantly (from 1 million to more than 5 million inhabitants), directly affecting the courts’ size but also the perceived importance of their verdicts. From this, we can infer that highly populated circuits are cited more frequently. This can be partially explained by their greater experience in resolving cases and by the better preparation of judges in larger provinces (universities with distinguished law faculties are generally located in the centre of more populous provinces and become an established source of legal education and doctrine).

The second variable represents the difference between the provinces in terms of wealth as represented by Gross Domestic Product per capita (gdp_per_capita). It is likely that courts in more affluent provinces are cited more frequently because such courts are perceived as being more influential.

The last variable in this group is the value of heating oil sales (oil_sales). The use of heating oil varies by province, so the results of the cases have different meanings for the public. We expect that the courts in provinces with higher oil sales cite other courts’ verdicts more frequently. It is reasonable to assume that the justification of cases in provinces with higher oil sales would be more elaborate because the judgments are more important for future jurisprudence than in provinces with lower sales of oil.

5.2 Court performance measures

The first variable in this group, case_inflow, represents the number of cases brought to the provincial administrative court in a year. This includes all cases, not just those referring to the documentation of heating oil sales. This gives an indication not only of the size of the circuit but also of the experience of the court in resolving cases and the tendency of people in a given area to sue. Furthermore, it can indirectly reveal the quality of administration in the province (if the quality is poor, we can expect more cases to be brought to the court).

The variable judges’_load measures the workload of the judges in a court (cf. Epstein, Landes and Posner 2011). This variable is calculated by dividing the yearly number of new cases in a court by the number of judges. It ranges from 4.47 to 45.31 new cases per judge. However, the number of cases adjudicated by one judge is three times higher because a panel of three judges resolves each case. A higher workload means that judges are able to spend less time reading citations; in this instance, we would expect a lower number of citations and a focus on citations from larger court circuits (since larger courts have a large number of cases, they are easier to find).

The number of judges employed (judges) represents the diversity of opinions in a court and potential support and advice from other judges. The expected impact of this variable is negative according to hypothesis 3. The need for the use of external opinions is lower when internal opinions are easily available. The institutional efficiency of the circuit as a whole may impact judicial performance (e.g. Melcarne and Ramello 2015) and citation patterns. Therefore, we applied the clearance rate variable, this being the ratio of finished cases to the total number of new cases in a given year. This measures the degree of a court’s efficiency in processing the cases. To determine the clearance rate of a court, we used annual reports from courts containing data on the number of incoming and resolved cases in each court per year. High clearance rates can be attributed to the high prestige of courts, but their choice of citations cannot be determined a priori.
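The two ratios defined in this subsection, the workload of judges and the clearance rate, can be written out as a short sketch. The function names and the input figures below are hypothetical and chosen only for illustration; the definitions and the panel size of three follow the text.

```python
def judge_load(new_cases, judges):
    """Yearly number of new cases in a court divided by the number of judges."""
    return new_cases / judges


def adjudicated_per_judge(new_cases, judges, panel_size=3):
    """Cases adjudicated by one judge when each case is heard by a panel."""
    return panel_size * new_cases / judges


def clearance_rate(finished_cases, new_cases):
    """Ratio of finished cases to new cases in a given year."""
    return finished_cases / new_cases


# Hypothetical court: 900 new cases, 945 finished cases, 20 judges.
load = judge_load(900, 20)                  # new cases per judge
panel_load = adjudicated_per_judge(900, 20)  # three times the load
rate = clearance_rate(945, 900)              # above 1: backlog is shrinking
```

A clearance rate above 1 means that the court finished more cases than it received in that year.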

For the zero-inflated negative binomial model, the number of oil documentation cases resolved in a year was added (cases) to determine the probability of citation of cases issued in a given year (the greater the number of cases resolved, the greater the number of cases which can be cited).

5.3 Case measures

The first variable in this group, verdict, is a dummy variable describing a win (1 - when the court decides that the contested administrative decision should be revoked) or loss (0 - when the court decision is to dismiss the complaint). This variable describes the type of decision and impacts the choice of citation since courts strive for consistency in their adjudications and tend to make rulings in the same manner as long as the line of authority and the law remain unchanged.

Citation is a time-dependent process. Current cases do not cite future cases, which reduces the legal significance of recent cases and increases the significance of earlier ones. This feature is represented by the variable case_age. It allows for controlling the changes in citations over time. The database of administrative court rulings is structured in such a way as to show the most recently issued judgments first. The judge may not have enough time to investigate further, which will result in the dispersion of citations from different courts. Therefore, we expect to see a dispersion of citations over time with new judgments emerging in similar cases. The choice of newer verdicts for citations is broader; as they are easier to find and more up-to-date, they should be preferred by the courts. In the zero-inflated negative binomial model, a similar variable - year - is used, but it represents only the year of the verdicts issued by a court and not the precise date of a case verdict.

A similar variable has been proposed by Choi and Gulati (2006) and Fowler et al. (2007). It was also applied in a network analysis of the US Supreme Court precedents by Cross et al. (2010). These studies conducted in the US courts revealed that the age of the precedent and the importance of the case influenced the likelihood of the courts citing the case in future verdicts (Black and Spriggs 2013).

The third variable is value_of_claim. The higher the value of the claim, the higher the probability of an appeal if the verdict is in favour of the government, so we expect that a claim with a higher value encourages a stronger justification of verdicts in order to avoid an appeal. It seems reasonable to conclude that this gives rise to the use of citations from courts with higher prestige.

The fourth variable is fraud. This helps to identify those cases which include evidence of intentional deception. For cases involving deliberate cheating, it is easier to make judgments and to issue a negative verdict for the complainant. Consequently, they do not require citations of persuasive authority from courts.

Finally, we added a dummy variable SAC_sentence. It takes the value of 1 if the justification includes the citation of SAC; otherwise the value is 0. Previous studies of appellate courts in Australia (Fausten et al. 2007; Smyth 2018) and Canada (McCormick 1996b) have found that hierarchical citations comprise a larger proportion of the total citations than citations of other administrative courts. We assume that where external citations (citations of other courts at the same level) occur in conjunction with hierarchical citations, the cited administrative court enjoys a level of prestige similar to that of higher courts. In these cases, external citations play a similar role to that of SAC citations. The SAC’s authority is used to reduce the tendency of taxpayers to appeal against negative sentences (based on the reading of verdicts in the sample) and it partially exempts the court from responsibility for the negative judgment as the SAC has persuasive authority.

5.4 Tax collection agency measures

The possibility that the behaviour of litigants and judges is affected by the actions of the tax collection agency in a province cannot be ruled out. The tax collection agency can be very effective in collecting tax arrears, but quick action can stimulate further conflicts and ultimately lead to disputes. Moreover, the poor quality of tax administration can confuse judges and encourage them to make greater use of external citations (hypothesis 4). We did not have the data for each year, but we were able to add some measures describing the behaviour and quality of work in tax collection agencies. The tax collection agencies differ in terms of their effectiveness in enforcing taxes; therefore, we included the variables debt_collection_time - calculated as the average number of days taken to collect tax arrears in a given territory - and imputed_tax, i.e. the new tax imposed on a taxpayer as a result of an audit. These factors are an indicator of the effectiveness of tax collection agencies in different provinces. These figures are only available for a certain number of years, so their values are taken from the end of our data period and cover data on all taxes (not only those related to the taxation of oil). Fortunately, these characteristics remain relatively constant over time (the descriptive statistics of continuous variables are presented in Appendix Table A3).

6 The results

6.1 Some general observations

Before describing the results of regressions, it is worth noting some general observations (Table 1). Court 16 (195 cases) was cited most frequently, while courts 4 (19 cases) and 5 (20 cases) were cited the least often. Circuit 5 had the smallest number of cited cases (9 cases) and circuit 16 had the largest number (102 cases).

Some indicators were found to be useful for determining the relative popularity of a court. The first of these is the relative validity of cases, calculated as the number of citations of a given court when divided by the total number of cases cited from that court. For instance, court 1 was cited in 110 cases but there were 15 cases in total cited from this court, so on average, each case was cited more than 7 times. The high number of citations of court 1 stems from the fact that this court had turned to the Constitutional Court with a preliminary question; consequently, the citation of this court was very popular among other courts. The calculated values of the relative validity of judgments were highest for courts 1, 13 and 8. The lowest values were observed for courts 4, 2 and 12. In court 12, there were 38 citations but the court was mentioned in only 40 cases.
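The relative validity indicator defined above is a simple ratio, and the court 1 example can be reproduced as follows. The function name `relative_validity` is our own label for illustration; the figures 110 and 15 are those quoted in the text.

```python
def relative_validity(citations, cited_cases):
    """Number of citations of a court divided by the number of distinct cases cited from it."""
    return citations / cited_cases


# Court 1: cited in 110 cases, with 15 distinct cases cited from this court.
court_1 = relative_validity(110, 15)
```

The result is slightly above 7, which matches the statement that each case from court 1 was cited more than 7 times on average.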

Sometimes, a court is cited multiple times in one case. The highest values of this indicator were found for courts 2, 6 and 1. This hints that some courts are preferred by other courts, although this ‘popularity’ is limited. However, this practice of multiple citations is not useful in the modelling of courts’ prestige because it could result from the particular order of cases in the database of verdicts. If one citation from a given court is used, then it is easier to use another citation from this court because it is more easily accessible than the citations of other courts in the browser. Similarly, the date of a particular case has been observed to influence the result. Older cases are cited more frequently in the text of judgments since judges are more familiar with them from previous verdicts. A reference to the judgments of the SAC occurred in 1096 judgments (out of 1241). The courts cited the judgments of the superior court most often when they decided in favour of the tax collection agency (the state). In the case of judgments in favour of the complainant, the courts preferred to cite judgments of their own court.

In contrast, consistent citations were used in judgments both in favour of and against the complainant. Therefore, we expect to observe more citations in negative judgments (decided in favour of the state).

6.2 The regression results

The first two models explain the occurrence of a citation of another court in the verdict. The model of logistic regression helps to distinguish variables associated with the use of citation in general and the model of logistic regression on interactions of a court and variables used allows interpreting the citation patterns of individual courts.

The results of logistic regression are presented in Table 3. We reported the parameters for non-standardized data in the second column and odds ratios for standardized data in the third column. The odds ratios facilitate the comparison of the relative strength of the variables used. Unfortunately, the model explains only a small fraction of the total variance, as the pseudo R2 is equal to 0.09. Among all estimated regressions, the highest pseudo R2 equals 0.3 for the model of interactions (Table 4). It is much higher than the pseudo R2 in the basic logistic regression, but most of the variance remains unexplained. Therefore, the results and conclusions reported should be treated with caution because factors outside the model, which account for the prevailing fraction of the variance, can make the estimated parameters unstable.

Table 3 The results of logistic regressions with fixed effects for courts and time effects

The results of logistic regression with fixed effects and time effects (Table 3) confirmed that the citations of a court were associated with the variables from all four groups of variables. The weakest relationships were evidenced for case-specific measures, so the citations are not case-driven. Therefore, it is less likely that the choice of cases involving heating oil documentation introduced bias to the results. The time effects turned out to be jointly insignificant, while the court-fixed effects were found to be jointly significant.

The courts in more populated provinces were found to be less willing to cite other courts. Perhaps the judges in such courts perceive their own institutions to be important centres of legal authority and consider other courts to be less experienced in resolving cases. Similarly, it is conceivable that judges in less populated provinces feel less confident in their judgments and thus more willing to use citations from other courts. This effect is confirmed by the negative sign of variable judges. In larger courts, the advice of other judges is more available, so the need for external citations is reduced. The results are in line with hypothesis 3 but the size of these effects is moderate as the odds ratios are 0.79 for population and 0.96 for judges.
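To give a sense of the magnitude of these odds ratios, the sketch below converts them into a change in the probability of citing another court. The baseline probability of 0.5 is a hypothetical value chosen for illustration, and the helper `apply_odds_ratio` is not part of the original analysis; the odds ratios 0.79 and 0.96 are those reported above for standardized variables, so each applies to an increase of one standard deviation.

```python
def apply_odds_ratio(p0, odds_ratio, sd_change=1.0):
    """Probability after the odds implied by p0 are multiplied by odds_ratio ** sd_change."""
    odds = p0 / (1.0 - p0)
    new_odds = odds * odds_ratio ** sd_change
    return new_odds / (1.0 + new_odds)


baseline = 0.5  # hypothetical baseline probability of an external citation

# One standard deviation increase in each standardized variable.
p_population = apply_odds_ratio(baseline, 0.79)
p_judges = apply_odds_ratio(baseline, 0.96)
```

Under this assumed baseline, a one standard deviation increase in population lowers the probability from 0.50 to roughly 0.44, while the same increase in the number of judges lowers it only to about 0.49, which illustrates why the effects are described as moderate.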

For the two measures of tax collection agencies’ effectiveness in enforcing taxes, a negative relationship emerges. The citations of other courts were used rarely if a court was located in a province characterized by a short debt collection time. Similarly, a higher value of a tax imposed on the taxpayer after an audit was related to a lower probability of citation. It seems plausible that the more efficient the tax collection agency was, the smaller the need for the citation of other courts. This suggests that courts follow the line of reasoning presented by the tax collection agency if they perceive it as being right. The value of imputed tax had the lowest odds ratio, indicating that this effect was the strongest among the negative factors affecting citations. It confirms hypothesis 4. Higher sales of heating oil in a province increase the probability of citing other courts.

The importance of the verdict in provinces with high oil consumption is higher than in low consumption ones. Therefore, the verdicts in the former require better justification and incline judges to make greater use of external citations. According to the value of the odds ratio, this positive effect is the highest among all investigated variables.

A positive correlation was observed between the number of new cases (case_inflow, which also includes cases not related to excise matters) and the probability of other courts’ citations. If the inflow of cases is high, judges are more willing to cite other courts as it saves them having to make an extra effort. At the same time, if the number of cases resolved by a judge (judge_load) increases, the chance for citation of another court decreases. It suggests that judges with high workloads have less time to choose between citations and thus minimize their use. The relative strength of this effect is the second-largest among variables negatively associated with citations.

The case_age revealed a small positive effect, which can be interpreted as an increasing diversification of citations over time. It indicates that the age of the cited ruling was a significant factor when explaining the frequency of citation, with new verdicts competing for the attention of judges. The value of the odds ratio evaluated this effect as the second-largest among variables positively associated with citations.

The variables that turned out to be insignificant were those related to the specifics of a case (value_of_claim, fraud, SAC_sentence and verdict), together with one variable related to the efficiency of case processing (clearance_rate) and one variable related to the characteristics of the circuit (gdp_per_capita).

The analysis of marginal effects (cf. Appendix Table A4) indicates the differences between the courts in terms of their propensity to use external citations. Courts 16, 3, 14 and 9 were inclined the most, while courts 4 and 5 were less inclined to cite other courts. The relative size of those effects was significant and differed by more than a factor of eight. However, these differences stemmed from different factors, so to shed some light on this problem, the analysis was augmented with the logistic regression on the interactions between variables and individual courts. The interactions indicate which variables are associated with fewer or more external citations in individual courts (Table 4).

Table 4 The signs and significance of interactions between a given court and variables in the logistic regression explaining citation

We start the presentation of results with the variables related to hypothesis 1. The SAC citation together with an external citation increases the probability of citation of court 13. We also know that the relative importance of cases is relatively high for this court (Table 1, 4.04). Therefore, we can posit a high level of authority and high prestige of this court. For four courts (1, 8, 9 and 11), the probability of citation decreased when they were cited together with the SAC. This can lead to the conclusion that they are not perceived as being prestigious.

The value of the claim variable was negatively correlated with citations of court 1, so the citation of verdicts from this court is less frequently used when the value of the claim is higher. This variable can be an indicator of perceived prestige but for most courts, it was found to be insignificant. Perhaps verdicts with a higher value of claim do not require the special citations of high-prestige courts to justify the decision, so the citations from different courts can be used. Thus, the results of this variable did not confirm the prestige of any court.

As we know from the regression on the whole sample (Table 3), the courts from highly populated provinces were reluctant to cite other courts (negative population coefficient) but on the individual level (Table 4), some courts exhibited the opposite pattern (2, 10, 14 and 16). Therefore, the population size does not act in a similar way for all courts. It is worth noting that the pattern of citation for population differed from the pattern of citation for gdp_per_capita. Affluence and population size are not similar factors.

For several variables, we observed different results depending on the court. Specifically, some courts increase citation of other courts and some decrease it in response to changing variables. This applies to such variables as case_inflow, clearance_rate, judge_load, fraud and debt_collection_time. The courts with the highest number of judges prefer court 16. This is interesting because judges from larger courts use citations less often. This can hint that these verdicts provide some additional arguments which are perceived as valuable.

The effectiveness of tax collection agencies negatively affects the probability of citation of other courts and it is especially evident for the variable imputed_tax. It seems likely that the more effective a tax agency, the greater the conviction of the judges that the tax authorities are correct and, therefore, the judgment does not require additional arguments and a deeper justification. Only two courts, courts 13 and 16, revealed a reverse citation pattern for debt_collection_time. These two courts are ready to challenge the position of the tax authorities, which corroborates a high self-assessment of their competence.

In the zero-inflated negative binomial regression, the total number of citations for a given court is explained by variables describing this court and its circuit. The results of this model are shown in Table 2. In contrast to the previous logistic regressions, the dependent variable in this instance strictly refers to the popularity of courts and can be a proxy for a court’s prestige.

The courts in highly populated provinces were cited more frequently, which is in line with hypothesis 2. It is interesting to note that there is an asymmetry: courts from highly populated provinces were more frequently cited and were also less willing to cite other courts (following the results of logistic regression). It may indicate that they perceive themselves as special. It also augments the behaviour of small courts (postulated by hypothesis 3). Small courts cited other courts more intensively and they were also cited more by other courts. Maybe there is some kind of reciprocity in citations. The coefficient of clearance_rate is negative, so the courts processing cases efficiently were cited less frequently. However, this result is significant only at the 10% level; if true, it may hint that a high number of resolved cases in a court is not an indicator of good quality of verdicts. Of the circuit characteristics, only two were found to be significant and positive (population and oil_sales). Higher sales of heating oil in a province fostered the citation of the corresponding court. This variable positively and symmetrically affected both citing and being cited. The first effect refers to the relative importance of provinces and the second to the specific experience in resolving oil cases. The variables referring to the tax collection agency turned out to be insignificant; in contrast to the analysis of the reasons for citing other courts, there is no clear relationship between being cited and the effectiveness of tax administration.

7 Concluding remarks

By examining the determinants of case disposition and citation practice, this study offers empirical insight into the functioning of courts in countries where the courts adjudicate according to the jurisprudence constante doctrine.

Our study indirectly contributes to the efficient design of legal institutions. In this study, we proposed a universal set of factors relating to (1) court circuit characteristics, (2) the performance of courts, (3) case features and (4) the effectiveness of the tax administration agencies in a given province. Moreover, this set of variables can be easily adjusted and applied to other courts and countries with a legal system based on statute law. It should be especially useful for judicial systems which have no mechanism for evaluating judges or courts, or for countries where such a system is underdeveloped. The results presented in this article may be of crucial importance for policymakers who wish to improve the quality of courts and their judgments. This examination of courts’ prestige constitutes an inexpensive means of augmenting the evaluation systems of courts and has the potential to create a competitive climate in which the quality of courts’ outcomes is increased. The results indicate that courts with a high number of judges and a record of quickly resolving cases were less frequently cited, as they were not perceived as being prestigious. Perhaps these two variables are indicators of the low quality of verdicts. It seems reasonable to encourage such courts to improve their performance.

Our study also contributes towards evaluating the performance of administrative courts. The performance of courts has important implications for the proper functioning of the judicial system and the rule of law. However, scholars have still not managed to reach a consensus on the factors determining which courts are performing well and which courts are performing poorly. If we assume that judges represent the best source of knowledge regarding the operations of other courts (because they are familiar with the judgments of other courts and indeed make assessments of them in the course of their work), then we can utilize this knowledge to evaluate a court’s performance. Leaving aside other factors (e.g. heavy workloads of judges, the characteristics of cases), judges will more frequently cite the cases which they consider to be better argued than others. For example, by studying verdicts from courts, we can find that the most frequently cited cases are well-written and provide well-argued justification, but these elements are hard to assess quantitatively. The most frequently cited verdicts were from courts located in provinces with a higher population and higher sales of heating oil.

The conducted analysis provides several remarkable conclusions. Firstly, distinguishing courts with high prestige is difficult. Only one court (13) was frequently cited, often together with the citation of the SAC, so it could be qualified as prestigious. Some other courts could be most aptly described as being popular (16, 3, 14 and 9). In cases where the value of the claim was high, we observed that only one court was less likely to be cited, while there were no courts more likely to be cited. This lack of significant diversification can hint that the administrative courts were not very diversified in their perceived prestige. Therefore, hypothesis 1 can be confirmed only in part.

Secondly, the population affects the willingness to use external citations and the chance of being cited. Highly populated provinces typically cite less but are cited more by their less populated counterparts. The second part of this result confirms hypothesis 2.

Thirdly, smaller courts are more willing to cite other courts but they are also more cited. The first part of this result confirms hypothesis 3. It is interesting to note that the population of courts can be divided into two groups and their mutual citations are less frequent.

Fourth, the quality of tax administration processing tax cases is an important factor affecting the need for citation of verdicts from other courts. The verdicts in provinces where indicators of tax collection agencies are high (which means high additional imputed tax and effective enforcement of tax arrears) cite other courts less frequently, in line with hypothesis 4. This is likely due to the cases processed by a court being better prepared by the tax administration. The effect is asymmetric, as the number of citations a court receives is not affected by the effectiveness of the tax collection agency in its own province.

Fifth, the citation of other courts was linked to the features of the court’s circuit, the effectiveness of tax collection agencies and some measures of processing efficiency. However, this was not related to the detailed characteristics of the case. The latter observation may be due to the homogeneity of the analysed cases.

Sixth, we observed an increasing dispersion of citations with time. This was a very interesting aspect of our findings since it confirms that the age of the cited ruling is an important factor in explaining the frequency of citation. There are many factors that diminish the value of a precedent, but few studies have been conducted on this matter. There were 14 courts whose citations increased over time and there was no court in which citations decreased. We can only speculate as to the possible reasons. For example, it can be theorized that the dispersion may be an artefact of the case law database, which is set up in a manner favouring the newest judgments. Therefore, judges looking for similar verdicts most likely refer to cases which appear at the top of the search results.

In conclusion, we can observe directly from the data that SAC sentences were used mainly to justify negative verdicts, while in contrast, other citations were used for the justification of negative as well as positive verdicts. This finding showed that SAC sentences were perceived as a means of limiting the probability of an appeal by the complainant. It acknowledged the high prestige (persuasive authority) of the SAC.