1 Introduction

In criminal proceedings, pretrial detention is debated and controversial: it is an exception to the fundamental principle of the presumption of innocence, since it deprives defendants of their liberty at the initial stages of proceedings, before their guilt is proven. The conditions under which such a measure is legitimate include, for instance, a reasonable suspicion that the person has committed the offence, the need to prevent defendants from absconding or committing further offences, and the risk of interference with the course of justice during pending proceedings. Whether these conditions are met is evaluated case by case, at the judge’s discretion. Moreover, the remand measure shall last no longer than necessary to achieve the objectives pursued by the law [7]. However, while there have been numerous studies on the legal framework governing pretrial detention, limited research has been carried out to date into the practice of pretrial detention decision-making. In this regard, Italy and Brazil are interesting fields of investigation: according to the latest World Prison Brief rates, pretrial detainees make up approximately 30% of the prison population in both countries. In this context, our research aims to identify the relevant factors on the basis of which the Italian and Brazilian Supreme Courts impose pretrial detention (more precisely, maintain rather than reform lower-court decisions on this matter), as well as how these factors relate to each other. To this end, we built two corpora of Italian and Brazilian judicial decisions, as detailed in Sect. 2. Section 3 describes the unsupervised learning approaches, in particular association and clustering methods, used to analyse the documents in the corpora and extract the relevant predictive features.
Section 4 reports the experimental setup and the results, and delineates commonalities and differences between the two legal systems. Section 5 concludes and outlines possible future research lines. This project follows recent attempts at explaining decision-making systems through factor-based reasoning, i.e., justifying decisions on the basis of the legal features of a case [9, 10]. To identify the legally relevant factors, described by [2] as case decision predictors, we followed the approach of recent experiments such as [5].

2 Datasets

We built two datasets of Brazilian and Italian judicial decisions, as we could not find any existing data collections with which to augment our own. The Brazilian corpus consists of 2,018 documents collected from the official website of the Brazilian Supreme Court. Documents are structured in the following sections: (a) heading (lawsuit metadata), (b) summary of the judgment, (c) case report (including the grounds of appeal), (d) reasons and decision of the judge rapporteur, (e) votes of the other judges (when they differ from that of the judge rapporteur), and (f) final decision. The Italian corpus consists of 718 judicial decisions by the Italian Supreme Court, downloaded from the DeJure database. Documents are structured in the following sections: (a) heading (lawsuit metadata), (b) summary of the judgment, (c) case report (including the grounds of appeal), (d) reasons and (e) final decision. Hence, the main structural difference between the two corpora is the absence of dissenting statements in Italian rulings.

3 Methodology

In this section, we briefly describe the general methodology and the unsupervised learning techniques we employed. We break the research problem down into two goals: (i) identification, aimed at extracting the relevant factors, and (ii) correlation, aimed at finding relationships between the extracted factors and judicial outcomes, i.e., whether the Italian and Brazilian Supreme Courts maintain rather than reform decisions on pretrial detention. To this end, we adopted a four-step process for both the Brazilian and the Italian corpus. First, we manually extracted from the judgments some factors that we call objective factors, since they are explicitly stated in the text. Second, we addressed the association task to find possible relationships between these objective factors and the decision outcomes. Third, to automatically extract further relevant features, we split each dataset into two subsets according to the outcome of the decisions. Finally, we applied clustering methods to each subset to detect what we call subjective factors, i.e., those that are more difficult to identify. Note that we did not apply association methods to the subjective factors, since each corpus had already been split by outcome. To perform our experiments, we relied on existing implementations and standard methods, including the open-source software Orange 3 [6] and Carrot2 [15], as detailed in Sect. 4. In Sects. 3.1 and 3.2, we briefly explain the association and clustering methods.

3.1 Association

To identify relationships between factors and outcomes, we extracted association rules of the form x \(\rightarrow \) y, where x is a set of factors and y is one of the two outcomes. For each rule, we computed its support and confidence, namely: (a) the support s, i.e., the proportion of cases in which both the antecedent x and the outcome y hold, as a fraction of all N cases in the dataset (an estimate of the likelihood of finding x and y together); and (b) the confidence c, i.e., the proportion of cases with outcome y among all cases satisfying the factors x (an estimate of the likelihood that x cases have outcome y).

$$s(x \rightarrow y)=\frac{\mathrm{Frequency}(x,y)}{N}, \qquad c(x \rightarrow y)=\frac{\mathrm{Frequency}(x,y)}{\mathrm{Frequency}(x)}$$
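As a minimal illustration of the two measures, the sketch below computes s and c for a single rule by direct counting. It assumes, hypothetically, that each case is encoded as a Python set holding its factors together with its outcome label; the function name and the five toy cases are invented for illustration and are not drawn from the corpora.

```python
def support_confidence(cases, x, y):
    """Return (support, confidence) of the rule x -> y.

    cases: list of sets, each holding a case's factors plus its outcome label.
    x: set of factors (antecedent); y: outcome label (consequent).
    """
    n = len(cases)
    freq_x = sum(1 for case in cases if x <= case)  # Frequency(x)
    freq_xy = sum(1 for case in cases if x <= case and y in case)  # Frequency(x, y)
    support = freq_xy / n
    confidence = freq_xy / freq_x if freq_x else 0.0
    return support, confidence


# Hypothetical toy cases, not taken from the corpora.
CASES = [
    {"drug-related", "Naples", "not released"},
    {"drug-related", "Naples", "not released"},
    {"drug-related", "2013", "released"},
    {"property", "criminal organization", "not released"},
    {"drug-related", "not released"},
]

print(support_confidence(CASES, {"drug-related"}, "not released"))  # (0.6, 0.75)
```

Here three of the five cases contain both factor and outcome (s = 3/5), and three of the four drug-related cases are not released (c = 3/4).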

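In particular, we applied the FP-Growth algorithm, which counts item frequencies in a first scan of the data and, in a second scan, compresses the cases into a prefix tree (the FP-tree); frequent itemsets are then mined recursively from conditional trees, which avoids costly repeated dataset scans in the subsequent mining steps [8]. Finally, the rules satisfying the given support and confidence thresholds are derived from the frequent itemsets.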
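The following is a compact, illustrative FP-Growth sketch in Python, not the Orange 3 implementation used in our experiments. It builds the FP-tree, mines frequent itemsets recursively from conditional pattern bases, and then keeps only rules whose consequent is an outcome label, mirroring the rule shape x \(\rightarrow \) y described above. The helper names (`build_tree`, `fp_growth`, `outcome_rules`), the toy cases and the thresholds are all hypothetical.

```python
import math
from collections import defaultdict


class FPNode:
    """One node of the FP-tree: an item on a prefix shared by several cases."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_count):
    """Compress (items, count) transactions into an FP-tree.

    Returns the header table (item -> tree nodes holding it) and the
    frequencies of the items that reach min_count.
    """
    freq = defaultdict(int)
    for items, count in transactions:  # first scan: count items
        for item in items:
            freq[item] += count
    freq = {item: c for item, c in freq.items() if c >= min_count}
    root, header = FPNode(None, None), defaultdict(list)
    for items, count in transactions:  # second scan: insert ordered paths
        # Frequent items only, most frequent first (ties broken by name),
        # so that cases share prefixes as much as possible.
        path = sorted((i for i in items if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, freq


def fp_growth(transactions, min_count, suffix=frozenset()):
    """Yield (frequent itemset, absolute count) pairs by mining conditional trees."""
    header, freq = build_tree(transactions, min_count)
    for item in sorted(freq, key=lambda i: (freq[i], i)):
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, itemset)


def outcome_rules(cases, outcomes, min_support, min_confidence):
    """Derive rules x -> y whose consequent y is one of the outcome labels."""
    n = len(cases)
    min_count = math.ceil(min_support * n - 1e-9)  # guard against float round-off
    counts = dict(fp_growth([(case, 1) for case in cases], min_count))
    rules = []
    for itemset, count in counts.items():
        for y in outcomes & itemset:
            x = itemset - {y}
            if not x or x & outcomes:
                continue  # the antecedent must be a non-empty set of factors
            confidence = count / counts[x]  # subsets of frequent sets are frequent
            if confidence >= min_confidence:
                rules.append((x, y, count / n, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], sorted(r[0])))


# Hypothetical toy cases, not taken from the corpora.
CASES = [
    {"drug-related", "Naples", "not released"},
    {"drug-related", "Naples", "not released"},
    {"drug-related", "2013", "released"},
    {"property", "criminal organization", "not released"},
    {"drug-related", "not released"},
]
OUTCOMES = {"released", "not released"}

for x, y, s, c in outcome_rules(CASES, OUTCOMES, min_support=0.4, min_confidence=0.7):
    print(sorted(x), "->", y, f"support={s:.2f}", f"confidence={c:.2f}")
```

On this toy data, thresholds of 0.4 and 0.7 yield only rules with not released as the consequent; lowering them (e.g., to 0.2 and 0.5) also surfaces rules such as {2013} \(\rightarrow \) released, which parallels the threshold adjustment needed for the released outcome.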

3.2 Clustering

Clustering is an unsupervised learning task used to uncover hidden patterns in unlabeled data [12]. Since documents may share common factors, we adopted the so-called soft clustering approach, whereby a document can be assigned to one or more clusters. We also applied Hierarchical Clustering, which builds a tree structure by iteratively merging documents, and clusters of documents, according to their similarity [1]; to assess similarity, we used the cosine measure [4]. Once clusters had been generated, we ran the Latent Semantic Indexing (LSI) algorithm, which captures the underlying semantics of textual documents and computes how words relate to each other, so as to reveal the topics occurring within the corpora [16]. We also used the Lingo algorithm, which extracts frequent phrases from documents, under the assumption that such phrases provide informative, human-readable descriptions of topics. Among the techniques on which Lingo relies is LSI, used to discover the latent structure of diverse topics. Lingo then matches the candidate cluster descriptions with the extracted topics and assigns each document to one or more clusters. To select the best label for each cluster, it uses a score based on cosine similarity [14].
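The agglomerative procedure can be sketched as follows, assuming sparse document vectors represented as `{term: weight}` dictionaries and average linkage (the linkage criterion is our assumption; it is not stated above). Merging stops once no pair of clusters reaches a hypothetical `min_similarity` threshold, which is equivalent to cutting the tree at that level. Note that in this sketch each document ends up in exactly one cluster.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(vectors, min_similarity):
    """Average-linkage agglomerative clustering under cosine similarity.

    Starts from singleton clusters and repeatedly merges the most similar pair
    until no pair reaches min_similarity. Returns the clusters (lists of
    document indices) and the merge history, i.e., the levels of the tree.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise similarity between the two clusters.
            sims = [cosine(vectors[i], vectors[j]) for i in clusters[a] for j in clusters[b]]
            score = sum(sims) / len(sims)
            if best is None or score > best[0]:
                best = (score, a, b)
        score, a, b = best
        if score < min_similarity:
            break  # equivalent to cutting the tree at this level
        merges.append((clusters[a], clusters[b], score))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters, merges


# Hypothetical term-count vectors for four short documents.
DOCS = [
    {"drug": 2, "cocaine": 1},
    {"drug": 1, "cocaine": 1},
    {"term": 1, "expiration": 2},
    {"term": 2, "expiration": 1},
]
clusters, merges = agglomerative(DOCS, min_similarity=0.5)
print(clusters)  # [[0, 1], [2, 3]]
```

The merge history records the tree from the bottom up: documents 0 and 1 (cosine about 0.95) merge first, then documents 2 and 3 (cosine 0.8), after which the two groups share no terms and merging stops.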

4 Experiments and Results

As explained in Sect. 3, we addressed our research questions as identification and correlation goals. In the following, we detail the experimental setup, report the results and discuss them.

4.1 Manually Extracted Information

Following the first step, we manually extracted 5 objective factors: the prisoner status, the name of the judge rapporteur, the crime category, the crime location and the judgment date. In the following, we detail each factor and the values it may assume depending on the data.

  1. Prisoner Status, i.e., the situation of the accused after the appeal ruling. This factor can take two values: released and not released. Cases in which the Court replaced pretrial detention with house arrest were treated as released.

  2. Judge Rapporteur, i.e., the judge who reports on the case at hand. The Italian data show greater variability than the Brazilian data, owing to the different number of seats in the two Supreme Criminal Courts: at least 35 in the Italian Supreme Court, whose members are regularly replaced, versus 11 seats in the Brazilian one, where judges hold permanent positions.

  3. Crime, i.e., the general category to which the crime belongs under Brazilian and Italian criminal law. In particular, we identified four main categories: (i) “crimes against the person”, (ii) “crimes against property”, (iii) “drug-related crimes”, and (iv) “criminal organization”.

  4. Location, i.e., the place where the crime took place. While in Brazil it corresponds to a state, in Italy it is represented by a regional capital.

  5. Date, i.e., when the judgment was issued. It corresponds to the ruling year.
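The five objective factors can be captured in a small record type and turned into the item sets consumed by association mining. The sketch below is one possible encoding under stated assumptions: the class, field and function names are hypothetical, crime categories are stored as a set because a case may involve more than one category, and any measure other than continued pretrial detention is mapped to released (the text above only specifies this for house arrest).

```python
from dataclasses import dataclass

# The four crime categories listed above.
CRIME_CATEGORIES = {
    "crimes against the person",
    "crimes against property",
    "drug-related crimes",
    "criminal organization",
}


@dataclass(frozen=True)
class Case:
    """Objective factors of one judgment (field names are hypothetical)."""

    prisoner_status: str  # "released" or "not released"
    judge_rapporteur: str
    crimes: frozenset  # one or more of CRIME_CATEGORIES
    location: str  # Brazilian state or Italian regional capital
    year: int  # year in which the ruling was issued

    def __post_init__(self):
        if self.prisoner_status not in {"released", "not released"}:
            raise ValueError(f"unknown prisoner status: {self.prisoner_status!r}")
        if not self.crimes or not self.crimes <= CRIME_CATEGORIES:
            raise ValueError(f"unknown crime categories: {set(self.crimes)!r}")

    def to_items(self):
        """Encode the case as attribute=value items for association mining."""
        items = {
            f"status={self.prisoner_status}",
            f"judge={self.judge_rapporteur}",
            f"location={self.location}",
            f"year={self.year}",
        }
        items.update(f"crime={crime}" for crime in self.crimes)
        return items


def status_from_measure(measure):
    """Hypothetical mapping from the measure in force after the appeal ruling.

    Assumption: only continued pretrial detention counts as "not released";
    house arrest (or any other replacement measure) counts as "released".
    """
    return "not released" if measure == "pretrial detention" else "released"


case = Case(
    prisoner_status=status_from_measure("house arrest"),
    judge_rapporteur="Judge A",  # placeholder name
    crimes=frozenset({"drug-related crimes", "criminal organization"}),
    location="Naples",
    year=2013,
)
print(sorted(case.to_items()))
```

Prefixing each value with its attribute keeps values from different factors distinct when they are pooled into a single item set.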

Following the second step, we ran experiments with the FP-Growth association algorithm (see Sect. 3). Table 1 lists the parameters we adopted. To generate a set of reliable rules with Released as the consequent, we had to lower the required support and confidence thresholds, given the smaller number of released cases in each dataset.

Table 1. Association setup parameters.

Tables 2 and 3 show some selected results; in particular, we report the rules that are similar across the two corpora.

Table 2. Association rules in Italian dataset.
Table 3. Association rules in Brazilian dataset.

As rules no. 2 and no. 5 show for both the Italian and the Brazilian dataset, drug-related crimes, as well as the combination of criminal organization and crimes against property, are usually associated with the not released outcome. The same holds for the date 2019 and for the locations São Paulo and Naples, as shown by rules no. 3 and no. 4 in the two tables. Conversely, rule no. 6 in both datasets relates the released outcome to the combination of the date 2013, drug-related crimes and the location (Naples and São Paulo, respectively). Note, however, that the confidence of this rule is lower in the Brazilian dataset than in the Italian one. Overall, the results show highly reliable association rules for the not released outcome in both datasets, whereas we found no high-confidence rules for the released outcome, even when the confidence threshold was lowered.

4.2 Automatically Extracted Information

Following the third step, we split each corpus into two subsets, containing the judgments for the defendant (Released) and for the prosecution (Not released), respectively: in the Italian corpus, the first subset contains 104 judgments and the second 614; in the Brazilian corpus, 515 and 1,503, respectively. Before clustering, we applied the following pre-processing techniques: normalization, tokenization combined with regular expressions, stemming, stop-word filtering and extraction of n-grams with \(n=2\) [12]. To encode the texts, in an effort to make our method as general as possible, we opted for well-established approaches. For the Lingo algorithm, we used the Bag of Words (BOW) model [11, 17], in which one feature is associated with each word in the vocabulary. The value of each feature is usually computed as the TF-IDF score and measures the importance of the corresponding word. For the Hierarchical algorithm, we used Word Embeddings, a popular technique for language models and deep learning applications [3, 13]. Table 4 reports the clustering parameters, which depend on the outcome and on the number of documents in each subset.
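A minimal, pure-Python sketch of the pre-processing and bag-of-words TF-IDF encoding is given below. It is only illustrative: the stop-word list is a tiny hypothetical English sample, stemming is omitted, bigrams are built after stop-word removal, and the weighting uses the common tf × log(N/df) variant, which may differ from the one used by the tools in our experiments.

```python
import math
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real pipeline would use a full
# Italian or Portuguese list and would also apply a stemmer.
STOP_WORDS = {"the", "of", "and", "a", "an", "to", "in", "for", "is"}


def preprocess(text):
    """Normalize, tokenize with a regular expression, drop stop words, add bigrams."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters, accents included
    tokens = [t for t in tokens if t not in STOP_WORDS]
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]  # n-grams with n = 2
    return tokens + bigrams


def tf_idf(docs):
    """Bag of words: one sparse {term: tf * log(N / df)} vector per document."""
    bags = [Counter(preprocess(doc)) for doc in docs]
    n = len(bags)
    df = Counter(term for bag in bags for term in bag)  # document frequency
    return [{t: tf * math.log(n / df[t]) for t, tf in bag.items()} for bag in bags]


# Hypothetical snippets, not taken from the corpora.
DOCS = [
    "Expiration of the pretrial detention term",
    "Pretrial detention for drug trafficking",
    "Drug trafficking and criminal organization",
]
vectors = tf_idf(DOCS)
print(round(vectors[0]["expiration"], 3))
```

With this variant, a term occurring in every document receives zero weight, while terms specific to one document (such as "expiration" above) receive the highest weight.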

Table 4. Clustering setup parameters.

Following the last step, we applied the Lingo algorithm as well as Hierarchical clustering combined with LSI. Tables 5, 6, 7 and 8 report some results obtained with Lingo, sorted by highest score.

Table 5. Lingo clusters and labels in Italian Not released subset.
Table 6. Lingo clusters and labels in Brazilian Not released subset.

We classified the obtained labels as follows: (a) grounds of appeal (i.e., elements alleged by the defendant); (b) reasons for the decision (elements indicated by the judges); (c) the type of crime committed; (d) the location of the lower court; (e) the date of the Supreme Court judgment; and (f) the name of the judge rapporteur. Analysing the results was not straightforward, since multiple labels had similar meanings and certain documents were included in more than one cluster. From the Not released subset of the Italian corpus, we extracted grounds of appeal such as the nullity of the defendant’s interrogation (label no. 2), the expiration of the pretrial detention term (label no. 3), and the violation of the presumption of innocence principle (label no. 7). Lingo also extracted labels referring to manually identified objective factors, e.g., the location (Naples, label no. 6), the date (May 2013, label no. 1) and the crime type (criminal organization, label no. 8). Among the requirements for applying pretrial detention, the seriousness of the risks (label no. 4) is also related to maintaining the detention order. From the Brazilian Not released subset, we extracted similar grounds of appeal, such as the expiration of the pretrial detention term (label no. 2) and procedural nullity (label no. 5). As reasons for the judgment, we identified the victim’s appearance in court (label no. 1) and the impossibility of converting imprisonment into an alternative measure in cases of drug-related crimes (label no. 6), which also depends on the nature of the drug seized (cocaine, label no. 8). Here, too, we identified labels corresponding to manually extracted factors, such as the date (December 2014, label no. 7), the crime (drug law crime) and the judge rapporteur (C. L., label no. 3).

Table 7. Lingo clusters and labels in Italian Released subset.
Table 8. Lingo clusters and labels in Brazilian Released subset.

As regards the Released outcome, in the Italian subset the related reasons include procedural nullity involving the defendant’s hearing (label no. 1), as well as the suspension of the prison term limit and its expiration (labels no. 2 and no. 6). These reasons can also be framed as grounds of appeal, since they were alleged by the defendant. We can further identify the following reasons: issues concerning the defence counsel (label no. 4), cases referred back to the lower court (label no. 7), and the replacement of imprisonment with less restrictive measures (house arrest, label no. 3). Once again, we find factors regarding the date (February 2009, label no. 6) and the location (Catanzaro Court, label no. 8). The Brazilian Released subset shows similar judgment reasons and grounds of appeal, such as the expiration of the prison term and unlawful constraint (label no. 2), less restrictive measures (label no. 3) and appeals lodged by the public defender (label no. 4). Cases related to investigated companies are also a factor that we classified as a reason (label no. 5). Other labels concern cases involving an insignificant burglary (crime, label no. 7) and the judge rapporteur (G. M., label no. 8).

Tables 9, 10, 11 and 12 show some selected results from Hierarchical and LSI.

Table 9. Hierarchical clusters and LSI topics in Italian Not released subset.
Table 10. Hierarchical clusters and LSI topics in Brazilian Not released subset.

For each topic, LSI returns words with positive and negative weights (shown in green and red, respectively). A positive weight indicates that a word is highly representative of the topic, while a negative weight indicates that it is highly unrepresentative of that topic [6]. Lowering or increasing the number of topics had no real impact on the overall intelligibility of the results. Since LSI topics consist of single words, the main disadvantage of combining Hierarchical clustering and LSI is that we had to interpret single words rather than phrases.
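The sketch below illustrates how such signed topic weights arise: LSI applies a singular value decomposition to a term-document matrix, and each left singular vector assigns a positive or negative weight to every term. It uses NumPy's `np.linalg.svd` rather than the Orange implementation used in our experiments, and the function name, the toy matrix and the sign convention are assumptions: singular vectors are only defined up to sign, so the sketch flips each vector so that its largest-magnitude weight is positive.

```python
import numpy as np


def lsi_topics(term_doc, terms, k, top_n=3, eps=1e-9):
    """Latent Semantic Indexing via SVD of a (terms x documents) matrix.

    For each of the first k topics (left singular vectors, in decreasing
    order of singular value) return the top_n most positively and most
    negatively weighted terms.
    """
    u, s, _ = np.linalg.svd(np.asarray(term_doc, dtype=float), full_matrices=False)
    topics = []
    for j in range(k):
        weights = u[:, j]
        # Singular vectors are defined only up to sign: fix a convention so that
        # the largest-magnitude weight of each topic is positive.
        if weights[np.argmax(np.abs(weights))] < 0:
            weights = -weights
        order = np.argsort(weights)
        positive = [(terms[i], round(float(weights[i]), 3))
                    for i in order[::-1][:top_n] if weights[i] > eps]
        negative = [(terms[i], round(float(weights[i]), 3))
                    for i in order[:top_n] if weights[i] < -eps]
        topics.append({"singular_value": float(s[j]),
                       "positive": positive, "negative": negative})
    return topics


# Hypothetical term counts: rows are terms, columns are four documents.
TERMS = ["drug", "cocaine", "term", "expiration"]
MATRIX = [
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]
for topic in lsi_topics(MATRIX, TERMS, k=3):
    print(topic)
```

In this toy example the first two topics contain only positive weights, while the third contrasts "term" with "expiration", one positive and one negative, which shows why individual words, rather than phrases, have to be interpreted.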

In the Italian Not released subset, we found factors already detected by Lingo: on the one hand, the suspension of the prison term and its expiration as grounds of appeal (C16 topics); on the other hand, the seriousness of precautionary requirements and the connection between criminal organizations and drug-related crimes as reasons for applying pretrial detention (C19 topics). The latter factor can also be observed in the Brazilian Not released subset (C12 and C21 topics).

Table 11. Hierarchical clusters and LSI topics in Italian Released subset.
Table 12. Hierarchical clusters and LSI topics in Brazilian Released subset.

In the Italian Released subset, we observe results similar to those obtained with Lingo. In particular, we identified a few words referring to the hearing of the defendant, the general requirements for applying a precautionary measure (C7 topics), and the expiration of the prison term (C4 topics). In the Brazilian subset, we identified a set of words referring to the prison time limit (C16 topics) and to house arrest as an alternative measure (C5 topics). Moreover, the algorithm extracted the names of two judges related to the released outcome (C16 and C5 topics).

5 Conclusion and Future Work

It is well known that the Brazilian and Italian Supreme Courts usually maintain, rather than reform, lower-court decisions on pretrial detention. In our experiments, we aimed to go beyond this observation and analyse the reasons behind such decisions. This may help determine whether this practice is legally correct or rather reflects a reluctance to overturn lower-court decisions. While our analysis does not provide a definitive answer, it shows a certain consistency in high court decisions. In both legal systems, clustering labels and topics point to common factors: the excessive length of time spent in prison and the time limits established by law support release. On the other hand, crimes against property, drug-related crimes and involvement in criminal organizations are strongly associated with the maintenance of pretrial detention. The same is true of the locations of Naples and São Paulo, possibly suggesting that serious crimes are more frequent there. In the Brazilian dataset, we found relationships between the judicial outcome and the judge rapporteur; no such relationship emerges in the Italian dataset, possibly because of the higher variability of judges in the Italian Supreme Court. As for the methods tested, Lingo performs better than Hierarchical clustering combined with LSI: its labels are immediately intelligible and contain meaningful information from both a computer science and a legal perspective, whereas the topics resulting from LSI could not be as easily linked to any relevant legal circumstance.

Future research includes building a structured dataset based on the highlighted factors and performing classification experiments with deep and classical machine learning to predict the outcome. In this respect, we also aim to explain the predictions through the extracted factors.