Introduction

The creation of a new law in Chile is a complex process. Most laws start with the submission of an initiative or bill to Congress, either in the Senate or the Chamber of Deputies (by their members) or by the President (as co-legislator). Bills are then reviewed by both chambers, and in most cases also by parliamentary committees, which offers multiple opportunities for text modification, merging with other projects, revisions, and plenary debates, all of which allow the incorporation of the range of ideological perspectives represented in Congress. This complexity often leads to long processing times for each law. The phenomenon is so prevalent that the expression “laws sleeping in Congress” has become a commonplace metaphor. A more efficient legislative process would facilitate the formulation and enactment of policies that address the needs and demands of the population in a timely and effective manner, helping to strengthen the legitimacy of, and confidence in, government institutions.

Hence, it is essential to help expedite the legislative process and reduce the average time to pass a law. To determine which characteristics can shorten bill processing, we explored data that Chile’s Library of Congress (BCN)Footnote 1 has gathered and preserved about the legislative process and published in several public datasets (legal norms, parliamentary biographies, national budget, etc. [1,2,3]) as Linked Open Data (LOD) [4], following the FAIR principles [5] (Findable, Accessible, Interoperable, Reusable). The Chamber of Deputies and the Senate also host an open data portal with legislative process data,Footnote 2 which records voting on law proposals.

We analyzed roll-call votes with a novel bottom-up data-driven method, and were able to identify four quadrants, each one describing the behavior of a significant number of Congress members; we labeled them ideological stance, personal interests, thematic/local interest, and technical consensus. These quadrants arise from combining two measures, political alignment and polarization, and are related to the concepts of consensus and conflict, borrowed from political science.

We utilized this analysis to identify law projects with low polarization and high political alignment, which could undergo smoother processing, perhaps even being handled with a simplified processing path to reduce their processing time.

Earlier work on data analysis of roll-call votes and related topics [6,7,8,9,10] focuses on sociological and political aspects rather than on process improvement. Although the concepts of political alignment and polarization have been widely studied as such in political science,Footnote 3 their combination with semantic web technologies, and particularly with open data, breaks new ground in transparency, analytical possibilities, and enhanced reproducibility of results.Footnote 4

The article continues as follows: “Semantic Web Data Definition” presents the technological layer of Semantic Web and open data; “Data Acquisition” describes the datasets and their acquisition mechanism; “Polarization and Political Alignment Data Analysis” introduces the concepts associated with polarization and political alignment, and their calculation methods; “Data Analysis” details the data analysis; “Discussion” discusses the analysis results; “Related Work” surveys and compares previous work; and “Conclusions” summarizes and concludes.

Semantic Web Data Definition

The entire data model has been developed according to Linked Open Data best practices [11], with the aim of enabling a natural interoperability mechanism for sharing public information. In this context, the main semantic web components of the dataset are: the definition of OWL ontologies, the formalization of shapes (ShEx/SHACL) to describe the data model, the description of the datasetFootnote 5 by means of the DCAT version 2 vocabulary [12], and the use of URIs as identifiers in conjunction with a Linked Data frontend to access RDF resources.

Ontologies

An ontology is a formal specification of a representational vocabulary for a shared domain, encompassing classes, relations, and other objects. In short, it is an explicit specification of a conceptualization [13]. In the field of the Semantic Web, ontologies enable the description of the semantic aspects of a data model through technologies such as RDF,Footnote 6 RDF Schema,Footnote 7 and OWL,Footnote 8 facilitating the specification for sharing and reusing data.

The dataset model is expressed by means of two ontologies (the Biographies ontologyFootnote 9 and the Legislative resources ontologyFootnote 10)Footnote 11 as well as by reusing concepts from other ontologies and vocabularies like Dublin Core, Time Ontology, SKOS, BIO and Wikidata.

The two base ontologies of the data model are composed of more than one hundred classes and a similar number of properties. The Biographies ontology describes political figures in terms of their public personal information (name, picture, web page, URI, birth/death date/place, Wikidata URI, etc.), political party affiliation by period, public offices by period, and related persons, among others. The Legislative Resources ontology describes a partial data model of legislative processes, which includes document types and structures, stages in the process of law discussion, voting procedures, semantic annotation of debates, and others. Both ontologies are written in Spanish (using the CamelCase convention) but are also documented in English by means of RDF language tag annotations.

RDF Shapes

To improve the usage specification of the dataset, an RDF Shapes model has been published. RDF Shapes are an emerging component of the Semantic Web stack that provides a method to describe and validate RDF data: they describe the shape, or topology, of a group of nodes in the context of a specific RDF graph, extending the expressivity of data specification and filling a validation gap not covered by ontologies and vocabularies. Shape Expressions (ShEx) [14] and the Shapes Constraint Language (SHACL) [15] are the most widely accepted proposals to define and validate the topology of RDF graphs. Although SHACL has become a W3C Recommendation, ShEx is used in many different scenarios [3, 16,17,18,19] owing to its concise and human-readable syntax and a growing set of open source and community tools.Footnote 12 The diagram in Fig. 1 shows the Shape Expressions model,Footnote 13 which was created with the ShEx author toolFootnote 14 and is available to validate the described dataset. The SHACL specification was then derived from the ShEx model using the shEx2Shacl converter tool from RDFShape.Footnote 15

Fig. 1 Shape expressions diagram model of voting dataset (translated from Spanish)

URI Patterns

A Uniform Resource Identifier (URI)Footnote 16 is a string of characters that identifies a particular resource, typically on the internet. A “URI pattern” refers to a structured format or template used to define URIs for a particular category or group of resources.

To follow the linked data principles, Cool hierarchical URIsFootnote 17 were used, as they partially reflect the conceptual scheme behind the data they represent. This facilitates debugging and quality control, which is important because the entire data model is used in other systems that are in production within BCN. All dataset URIs are defined under the following base URL:

http://datos.bcn.cl/recurso/

A complete description of the URI patterns implemented for this dataset is presented in Table 1.

Table 1 URI patterns of dataset

Regex patterns and URI prefixes:

  • {number}: ([0-9])+

  • {pattern1}: ([0-9]+\-[0-9]{2})

  • {pattern2}: ([a-z]+\-?)+

  • bcn: http://datos.bcn.cl/recurso/

  • bcnp: http://datos.bcn.cl/recurso/cl/proyecto-de-ley/

Linked Data Frontend

We have implemented a linked data front-end following the preceding URI patterns using WESO-DESH,Footnote 18 a Java linked data front-end (LDF) tool. This application serves RDF data in both human-readable and machine-readable forms, and was already employed in the first version of the BCN data portal [1]. Its main features are:

  • Native HTML+RDFa output,

  • Content negotiation using the HTTP 303 status code,

  • Independence from SPARQL endpoint technology, which allows federated results to be shown as a single resource,

  • URIs defined through regular expressions, which allows complex URI patterns to be designed (hierarchical or REST queries),

  • Execution of multiple types of SPARQL queries (CONSTRUCT, ASK and DESCRIBE).

WESO-DESH enables the use of dereferenceable URIs and provides an optimized view of RDF data, in a similar way to applications such as Pubby.Footnote 19
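To illustrate how regex-based URI patterns of this kind can be resolved, the following minimal Python sketch matches a bill URI against the bcnp prefix and the {pattern1} expression listed for Table 1. The assumption that a bill URI consists of the bcnp prefix followed directly by {pattern1}, as well as the names BILL_URI and bill_number, are ours for illustration; this is not the WESO-DESH implementation.

```python
import re
from typing import Optional

# Prefix and regex fragment as listed for Table 1.
BCNP = "http://datos.bcn.cl/recurso/cl/proyecto-de-ley/"
PATTERN1 = r"([0-9]+\-[0-9]{2})"  # {pattern1}, e.g. "9404-12"

# ASSUMPTION (illustrative): a bill URI is the bcnp prefix followed
# directly by {pattern1}. The real URI templates are defined in Table 1.
BILL_URI = re.compile(re.escape(BCNP) + PATTERN1)


def bill_number(uri: str) -> Optional[str]:
    """Return the bill number encoded in a URI, or None if it does not match."""
    match = BILL_URI.fullmatch(uri)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Bill 9404-12 is the example bill queried later in the article.
    print(bill_number(BCNP + "9404-12"))
    print(bill_number(BCNP + "not-a-bill"))
```

A front-end built on this idea would dispatch each matching URI to a query template, while non-matching URIs would fall through to other patterns.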

Data Acquisition

To curate the political and legislative Knowledge Graph, data were obtained from multiple sources and therefore processed and transformed separately for each source. The main sources of data are the two chambers of the Chilean National Congress, which provide an open data portal with XML Web Services and data about the legislative process, as well as their own web pages.

Another important source of data is the BCN Archive, in particular the Political History portal and repository,Footnote 20 where, among several resource types, the parliamentary biographies are published and maintained. Although these three sources of data are all bodies of the National Congress, they do not in practice implement a common standard or web service schema, which hinders a clear and consistent integration of the data published separately by the Chamber of Deputies and the Senate. Indeed, the Web services published by each chamber are described by different XML schemas and details. For example, the lists of active senators and deputies have distinct and disjoint identifiers (even name descriptors and dates follow different standards or formats).

The same problem occurs with other resource types, such as party militancies and information about bills and voting, none of which are integrated either (with the exception of the bill number, which is a functional code), and there are even limits on the amount of data that may be harvested. This scenario has hampered data processing and curation. Nevertheless, thanks to an early strategic decision to use Semantic Web technologies in the BCN, this work has been carried out incrementally and progressively over the years, and several processes that automate the data integration remain in operation to date.

Data acquisition has been mixed: part of the data was harvested from various XML Web services of the Congress open data page, and part was obtained by web scraping the web pages of the Congress chambers. Once captured, the data has been curated, integrated and modelled in RDF using the Legislative Resources ontology (which includes bill voting), and finally published as Linked Open Data.

LOD publishing has been a policy at BCN since 2011, when the first legal norms ontology was published [1]. Since then, a variety of datasets and vocabularies have been published in RDF at the LOD portal through its public SPARQL endpoint,Footnote 21 among them the bill voting and biographies datasets, which are the data sources of this work.

To date, the dataset of bills comprises more than 8 million RDF triples, while the dataset of Congress members has close to 476,000 RDF triples.Footnote 22 Regarding the stability of the dataset, a relevant part of the data is used in strategic products of the BCN, and all the data are part of the institutional core business of archiving the political history of Chile. In these terms, the published KG is mature and relatively stable, although it grows steadily and minor errors are repaired as they are detected.

The following sections describe more details about the data and their sources.

Congress Members and Political Parties Dataset

This dataset is composed of information from all Congress members and political parties that have been part of the National Congress (which is bicameral), having information from the 1990 period onwards. The data, published as Linked Open Data in RDF, provides basic information about each person, their periods of membership to political parties and parliamentary positions held from the aforementioned date, which is open to the public under the principle of transparency.

The data was collected from an institutional wiki (based on MediaWiki) where biographical reviews of the main political actors in the history of the country are stored, archived and maintained. This institutional wiki, developed in 2010, contains RDFa markup that has been extracted and transformed into RDF triples in accordance with the URI convention described in Table 1. Although a large amount of data was normalized during this process, the wiki had no input validation mechanism, so minor format errors and some inconsistencies remained, such as duplicate periods of militancy or dates in different formats (descriptive text or formats other than ISO 8601); these have progressively been corrected manually.

Although the database contains 5,275 people related to the political history of the country, the total number of different Congress members who have participated in bill voting during the period analyzed is 504.Footnote 23 This is because many Congress members have been reelected in the same chamber or have changed chambers between elections (usually from the Chamber of Deputies to the Senate), and because voting records are incomplete for the entire period. This low turnover in the previous 30 years changed in 2020, when re-election [20] was limited to 2 terms of 8 years (16 years in total) in the Senate and 3 terms of 4 years (12 years in total) in the Chamber of Deputies.

Bills Dataset

A bill is a document presented in the National Congress to propose a legal text, to be discussed by the Congress and to create a new law. The presentation of a bill in Chile can be carried out at the initiative of the executive branch (called Presidential Message), or by a Congress member (Parliamentary Motion). Generally speaking, each bill is entered into legislative proceedings, and joins a workflow where both chambers participate, in which the proposed legal text is evaluated in full (in general) and at the level of its basic normative units (in particular) by the Congress members.

During this evaluation, votes are held to reach a consensus on the views of the legislators and define the final version of the law. The processing of a law is highly complex according to its regulations, which are not described in this article. However, the Legislative Resources ontology provides an overview of the process in its main stages (Constitutional and Regulatory Procedures, defined by bcnres:TramiteConstitucional and bcnres:TramiteReglamentario), as well as various aspects that are currently processed, recorded and published as open data, including various types of entities, documents and link properties, among others.

Figure 2 shows a graph with the distribution by type and year of the bills published in RDF on the open data portal, differentiating the initiatives of the Executive Power from those carried out by Congress members.

Within the data there are 21 bills prior to year 1990, which have been created in the database to digitize relevant historical norms or that remain in force, such as political constitutions and other norms created during the 1973–1989 dictatorship period (Other).

The graph shows data from 1978 onwards, although there is a bill that was created to build the history of the 1925 Constitution. These data have been obtained mainly from three different sources: (1) the BCN project processing database, (2) a database created in 1990 that was replaced in 2010 by the Web services that provide the open data portal of the Congress (with which there is currently an automatic update service), and (3) by manual creation [2] from the History of Law website.

Fig. 2 Bills by type and year in the Chilean Congress

The query in Fig. 3 can be executed in the SPARQL endpoint to obtain the RDF representation of roll call voting data of bill 9404-12, which returns a set of datatype values and dereferenceable URIs.

Fig. 3 SPARQL query to get votes of bill 9404-12

Polarization and Political Alignment Data Analysis

The key idea of the analysis is to characterize bills with two measures: political alignment and polarization. Alignment is an internal consistency variable (intra-group), and polarization is an external consistency variable (inter-group).

From the published linked open datasets, we used SPARQL queries to retrieve the voting events and votes of each bill, as well as the voting Congress members and their political parties. With these data, the coefficients of each vote were calculated as follows:

  1. Political alignment coefficient: the degree of cohesion in the vote that Congress members have with respect to their party (intra-group - only in the context of voting).

  2. Polarization coefficient: the degree to which the vote divides the group of voters into opposite poles (among groups).

Then, the average value of each index is calculated for each bill, to characterize it with a single value for each measure.

The bill-level values are depicted in a scatterplot of political alignment versus polarization. The diagram shows four quadrants associated with the values of the indices (polarization: \(\ge 50\%\) high, \(< 50\%\) low; alignment: \(\ge 70\%\) high, \(< 70\%\) low). Each quadrant has been assigned a category, built inductively from the types of bills voted on that fall within it. The four categories that appear correspond to a gradient between consensus and conflict in political theory:

  1. Technical consensus: bills with low polarization and high alignment in voting; i.e. bills where technical consensus was established, and with no political antagonisms in voting.

  2. Thematic/local interest: bills with low polarization and low alignment in voting; i.e. bills of thematic or local interest, so a parliamentarian represents these interests, and the antagonism is against the disinterest of other Congress members.

  3. Personal interest: bills with high polarization and low alignment in voting; i.e. showing a divergence between a parliamentarian and their political party, suggesting prevalence of personal interests over party principles.

  4. Ideological stance: bills with high polarization and high alignment in voting; i.e. showing a divergence along the political axis between left and right, so bill votes are ideologically sorted.

In the absence of additional indicators or other complementary techniques (such as the application of clustering or classification algorithms), we defined the cutoff points on each axis that delimit the quadrants as follows: for the polarization index, the midpoint is 0.5, since its values vary within the interval (0, 1); for the alignment index, the midpoint is 0.7, because its minimum value is around 0.4.
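The quadrant assignment described above can be sketched as a small Python function. This is a minimal illustration under the stated cutoffs (0.5 for polarization, 0.7 for alignment, with values at the cutoff counted as high); the function and constant names are hypothetical and not part of the published tooling.

```python
POLARIZATION_CUTOFF = 0.5  # midpoint of the polarization range
ALIGNMENT_CUTOFF = 0.7     # midpoint of the observed alignment range


def quadrant(polarization: float, alignment: float) -> str:
    """Map a bill's average polarization and alignment to its category."""
    high_polarization = polarization >= POLARIZATION_CUTOFF
    high_alignment = alignment >= ALIGNMENT_CUTOFF
    if not high_polarization and high_alignment:
        return "Technical consensus"
    if not high_polarization and not high_alignment:
        return "Thematic/local interest"
    if high_polarization and not high_alignment:
        return "Personal interest"
    return "Ideological stance"


if __name__ == "__main__":
    # Illustrative values only, not taken from the dataset.
    for pol, ali in [(0.1, 0.95), (0.2, 0.5), (0.8, 0.5), (0.8, 0.9)]:
        print(pol, ali, quadrant(pol, ali))
```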

The algorithms and formulas used to calculate the polarization and political alignment indexes are as follows.

Political Alignment

Political alignment is defined as a characteristic that describes the degree of convergence or coincidence that occurs within a group of individuals with respect to a certain opinion (intragroup consistency). Other terms that, for the purposes of this article, are considered synonymous with political alignment (or just alignment) are cohesion and party discipline [21].

This measure can be used both at the group level (political party or coalition), personal (Congress member depending on the group), by bill, or by voting event. In particular, when Congress members vote on bills, political alignment describes the degree of similarity in the votes of a group of parliamentarians from the same political party.

Formally, group alignment is:

$$A_{g}= \frac{\sum _{i=1}^{n} A_{i} N_{i}}{N} = \frac{\sum _{i=1}^{n} N_{i}^2}{N^2}$$
(1)

where:

  • \(A_{g}\): group alignment;

  • \(A_{i}\): alignment of the subgroup of individuals who voted for the option i;

  • \(N_{i}\): total number of individuals who voted for the option i;

  • N: total number of individuals in the group.

where \(A_i\) is defined as follows:

$$\begin{aligned} A_i=\frac{N_i}{N} \end{aligned}$$
(2)

where:

  • \(A_{i}\): the alignment within the group of those who voted for option i;

  • \(N_{i}\): the total number of individuals who voted for option i;

  • N: the total number of individuals in the group.

To illustrate, if in a specific vote all individuals of the same group vote against, the alignment of the group is 100%, since they all vote the same way. In another hypothetical scenario, if half of the individuals from the same group (for example the same party) vote in favor and the other half against, the group alignment is 50%: the group as a whole had a divided opinion, although each half was internally aligned.
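The two scenarios above can be checked numerically with a short Python sketch of Eq. (1), computing \(A_g\) as the sum of squared option counts divided by the squared group size. The function name and the vote labels are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Iterable


def group_alignment(votes: Iterable[Hashable]) -> float:
    """Group alignment A_g = sum(N_i^2) / N^2 over the vote options i."""
    counts = Counter(votes)          # N_i for each option i
    total = sum(counts.values())     # N
    if total == 0:
        raise ValueError("at least one vote is required")
    return sum(n * n for n in counts.values()) / (total * total)


if __name__ == "__main__":
    # Whole group votes against: fully aligned.
    print(group_alignment(["no"] * 10))
    # Half in favor, half against: alignment of one half.
    print(group_alignment(["yes"] * 5 + ["no"] * 5))
```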

The social science literature mentions the Rice Index [22] (and variations [23]) to calculate the cohesion or degree of agreement within a voting event. However, this indicator yields only a single measure for the complete group under analysis (e.g. a political party), penalizing the entire group for the differences within it. The political alignment coefficient we use associates an independent value with each person and vote, as well as with the entire bill, yielding more representative values. Each Congress member can thus be characterized by measures associated with their alignment and the value of their votes. This offers a wider application range than the Rice Index, without requiring complex calculations.

Analyzing the two cases above using the Rice Index, the maximum alignment would likewise be 100%, but if the vote were divided exactly 50% within the group, the alignment value would be 0%. Figure 4 shows the behavior of the Rice Index, the Cos-Rice Index (a variant) and the Alignment measure seen as functions.
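The contrast between the two measures can be reproduced with a minimal Python sketch. It assumes the usual two-option form of the Rice Index, \(|N_f - N_c| / (N_f + N_c)\), and restricts the alignment of Eq. (1) to the same two options; the function names are illustrative.

```python
def rice_index(n_for: int, n_against: int) -> float:
    """Rice Index for a two-option vote: |for - against| / (for + against)."""
    total = n_for + n_against
    if total == 0:
        raise ValueError("at least one vote is required")
    return abs(n_for - n_against) / total


def alignment(n_for: int, n_against: int) -> float:
    """Group alignment of Eq. (1) restricted to two options."""
    total = n_for + n_against
    if total == 0:
        raise ValueError("at least one vote is required")
    return (n_for ** 2 + n_against ** 2) / total ** 2


if __name__ == "__main__":
    for n_for, n_against in [(10, 0), (7, 3), (5, 5)]:
        print(n_for, n_against,
              rice_index(n_for, n_against), alignment(n_for, n_against))
```

Under these definitions, a two-option vote satisfies \(A_g = (1 + R^2)/2\), where R is the Rice Index, so the alignment ranges over [0.5, 1] where the Rice Index ranges over [0, 1].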

Fig. 4 Political alignment measures behavior

Polarization

In the context of legislative votes, polarization will be defined as the lack of agreement on an issue, which leads to a universe of voters grouping into two politically opposed positions (difference among groups). The level of polarization is maximum when there are two groups with an equivalent number of voters facing each other, while it is minimum when the voting universe votes for the same option. Figure 5 shows polarization for several percentages of yes/no votes.

Fig. 5 Polarization measure behaviour

To calculate polarization, only extreme values (i.e. “yes” and “no”) are considered; other types of votes are omitted or normalized to one of these two options. This is because the meaning of other voting is always relative to the political context, e.g. absence and abstention may have different grounds. In practice, the approval of the vote is achieved by obtaining a certain quorum, which translates into having enough votes in favor.

Thus, the formula to calculate the polarization index is:

$$\begin{aligned} C_f = \frac{N_f}{N_f+N_c} \wedge C_c = \frac{N_c}{N_f+N_c} \end{aligned}$$
(3)

where:

  • \(C_f\) corresponds to the polarization coefficient for votes in favor

  • \(C_c\) corresponds to the polarization coefficient for the votes against

  • \(N_f\) corresponds to the total votes in favour

  • \(N_c\) corresponds to the total votes against

$$\begin{aligned} P_g = 1 - \sigma _p * \sqrt{2} \end{aligned}$$
(4)

where:

  • \(P_{g}\) corresponds to the degree of polarization within the group in voting

  • \(\sigma _p\) corresponds to the standard deviation of the set \(\{C_f, C_c\}\)
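Equations (3) and (4) can be sketched in Python as follows. The sketch assumes that \(\sigma _p\) is the sample standard deviation (denominator n − 1), since this is the choice under which \(P_g\) reaches 0 for a unanimous vote and 1 for an evenly split one; with the population standard deviation the minimum would instead be about 0.29. The function name is illustrative.

```python
import math
import statistics


def polarization(n_for: int, n_against: int) -> float:
    """Polarization P_g = 1 - sigma_p * sqrt(2), following Eqs. (3) and (4)."""
    total = n_for + n_against
    if total == 0:
        raise ValueError("at least one yes/no vote is required")
    c_for = n_for / total          # C_f
    c_against = n_against / total  # C_c
    # ASSUMPTION: sample standard deviation (n - 1 in the denominator).
    sigma = statistics.stdev([c_for, c_against])
    return 1 - sigma * math.sqrt(2)


if __name__ == "__main__":
    for n_for, n_against in [(50, 50), (75, 25), (100, 0)]:
        print(n_for, n_against, round(polarization(n_for, n_against), 6))
```

Under this assumption the index reduces to \(P_g = 1 - |C_f - C_c|\), which makes the behavior easy to verify by hand.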

Data Analysis

We analyzed 20,731 roll call voting events of the Chilean Congress, related to 3,249 bills, which represents a 23.4% increase in voting events compared to a preliminary study. Table 2 shows descriptive statistics on the composition of the data corpus.Footnote 24

Additionally, for the analysis, we used the Congress members and political parties dataset available in the data portal. We note that:

  • some voting events present a number of votes smaller than the total members of the chamber; this is an artifact of the incomplete register of older bill votes (before 1990);

  • the bills with the maximum number of voting events are mainly budget laws, whose discussion involves a high number of voting events;

  • the varying number of Congress members through the period also affects the register of votes; indeed, in 1990 there were 120 deputies and 38 senators, but in 2024 there are 155 deputies and 50 senators.Footnote 25

Table 2 Descriptive statistics of Roll call voting events by bill in RDF

We recall two key design decisions for the study:

  • Only votes Yes (+) and No (–) were analyzed; although there are other rarely used types, these were considered irrelevant in this study.

  • It is possible to carry out this analysis considering general and particular votes separately; however, to simplify the experiment, both are analyzed together without distinction.

Figure 6 shows, in aggregate, how the polarization and political alignment values are distributed for each chamber, according to the analyzed data. A comparison of the alignment and polarization distribution graphs of each chamber for the entire period shows that senators vote in a much more aligned but less polarized way than deputies.

Fig. 6 Distribution of polarity in voting on bills for the Chilean Congress

Figure 7 shows two scatter diagrams where each point represents a bill positioned in one of the four defined quadrants (similar to a Cartesian plane), according to its average polarization and alignment values. In both cases, a regression line is added, showing a statistically significant negative correlation (Senate \(-0.55\), Chamber of Deputies \(-0.32\), both with p-value less than 0.05) between the two indices (alignment and polarization).

The quadrant with the highest number of bills is the one with low polarization and high alignment, i.e. the one previously defined as Technical Consensus.

Fig. 7 Bills located in each defined quadrant by chamber

Figure 8 shows how Congress members are grouped in these bills, with force graphs calculated with a distance function among Congress members given their voting record; i.e. if they vote the same, the distance is 0, and if they vote differently, the distance is 1. This calculation is performed for each voting event of the bill and for all Congress members, obtaining the average distance values for all pairs in each bill. The red and blue colors identify the Congress members associated with parties of the right or left.
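The distance function described above can be sketched in Python as follows. The input layout (one dictionary per voting event, mapping a member identifier to a vote) and the decision to compare a pair only in events where both members have a recorded vote are assumptions made for illustration; the names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def average_distances(events: List[Dict[str, str]]) -> Dict[Tuple[str, str], float]:
    """Average pairwise vote distance over the voting events of one bill.

    Distance is 0 when two members vote the same and 1 when they differ.
    ASSUMPTION: a pair is only compared in events where both members voted.
    """
    totals: Dict[Tuple[str, str], int] = {}
    counts: Dict[Tuple[str, str], int] = {}
    for event in events:
        for a, b in combinations(sorted(event), 2):
            pair = (a, b)
            totals[pair] = totals.get(pair, 0) + (0 if event[a] == event[b] else 1)
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


if __name__ == "__main__":
    # Illustrative data: member C has no recorded vote in the second event.
    events = [
        {"A": "yes", "B": "yes", "C": "no"},
        {"A": "yes", "B": "no"},
    ]
    for pair, distance in sorted(average_distances(events).items()):
        print(pair, distance)
```

The resulting pairwise averages can then serve as edge lengths for a force-directed layout, so that members who vote alike are drawn close together.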

Some remarks:

  • In Quadrant I (Technical consensus, low polarization and high alignment), the force graphs are gathered in only one group per chamber, and there is no equivalent distance difference in votes among Congress members.

  • In Quadrant II (Thematic/local interest, low polarization and low alignment), voting has a diffuse ordering, and in fact some Congress members have missing votes due to absences, which may explain their lower number; an example is the bill “Simplify municipal referendum.”Footnote 26

  • In Quadrant III (Personal interests, high polarization and low alignment), nodes are not grouped by color, but show proportionally polarized groups; an example is the bill “Prohibit and penalize driving while smoking.”Footnote 27

  • In Quadrant IV (Ideological stance, high polarization and high alignment), graphs are presented (one for each chamber) where nodes of similar color (same political tendency) are closely grouped and polarized with respect to the other group; examples are the bill to decriminalize abortionFootnote 28 and “Establish benefits for health workers.”Footnote 29

Fig. 8 Various graphs of forces of bills belonging to each quadrant

For this analysis, some data that did not fit the designed tools were excluded, including abstention-type votes, matches (abstentions by opposing pairs), non-voters due to absence, and others. However, these data represent less than 2% of the total votes, and we consider their impact on this study to be minimal.

Discussion

Using our model, the alignment graph in Fig. 6 shows that the Chamber of Deputies has a less disciplined behavior in voting compared to the Senate, since the trend in the distribution of the latter chamber shows a much larger bias towards 1 (fully aligned). This could be explained by various variables, such as the average age of the Congress members (in the Senate, the average age is higher than in the Chamber of Deputies), political experience, etc.

Regarding polarization, the data distribution graph shows that although the behavior is similar in both chambers, the Senate is slightly less polarized than the Chamber of Deputies: although the Senate has fewer voting events in the analyzed data, it shows a stronger bias towards zero polarization. These observations are supported by the alignment and polarization correlation analyses applied to both sets of votes, which show that the Senate tends to be more aligned and less polarized than the Chamber of Deputies, with a regression line exhibiting a steeper slope.

Regarding the analysis of bills in the context of the quadrants, the tool parsimoniously fulfills the function of characterizing each bill according to how it has been voted on. Although a similar number of projects were randomly selected and manually analyzed (without the use of automatic text analysis) to identify a profile and conceptualize each of the four categories, the analysis in this respect is qualitative and based on inductive reasoning. Nevertheless, we consider the tool potentially useful for political actors who try to predict the scenario that certain bills will face: it helps establish intrinsic characteristics of projects that allow their legislative processing to be anticipated ex ante, so that strategies for reaching the required quorums can be sought in advance.

In the same vein, it can also be useful for the development of artificial intelligence systems associated with making political decisions, where it is necessary to incorporate weighting factors for decision-making based on historical data or associated with specific issues, or be applied to make optimizations to the legislative process, where those initiatives that will be approved more easily are identified to conduct their processing in a simplified way, and giving priority in discussion to those projects that generate greater polarization.

In any case, transparency in legislative votes affects the behavior of voters, allowing greater citizen auditing, while parties suffer fewer deviations than when voting data are not public [24].

Other analyses, such as identifying the specific parts of a norm that show greater differences based on their votes (in a project there may be a few polarizing or aligned votes associated with specific articles), can be difficult in the current scenario, due to the absence of detailed descriptors in the data associated with each vote in open data format. Such information is available for download as PDF documents on both chambers’ websites, but obtaining, processing and publishing it is future work. We consider that the potential for analysis provided by this tool and dataset is high, considering that the dataset maintains a relatively constant growth. In addition, the datasets that coexist and interrelate are varied (and expanding), and they belong to a reliable and persistent source over time.

The publication and analysis of linked open data on votes can be used for several kinds of deeper analyses of the collective or individual behavior of political actors (legislators, parliamentary groups, political parties). Additionally, it can enable assessment of congruence between the voting behavior of legislators and the expectations of their constituents, thereby facilitating ongoing monitoring of legislative representatives’ alignment with public sentiments. Finally, both measures can also be used as input for building predictors of legislative outcomes, which can be a useful tool for various political actors and parliamentary work.

Related Work

Other studies have also explored how legislators vote on bills.

Butler et al. [25] argue that when legislators in the U.S.A. vote on issues for which they do not have information, their decision is affected by the opinion of their voters; however, in other cases, the opinion may be influenced by interest groups, party leaders, and their own preferences. This description seems similar to the categorization described in this article.

Kau and Rubin [26] suggest that U.S. congressmen vote according to one of three motivational axes: self-interest, exchange of favors, and ideology. However, a vote indicates a direction or preference but not its intensity.

Hug [21] offers an alternative perspective on analyzing votes when the data used lack context: characteristics of the legislative work may be erroneously inferred, with selection biases, because the roll call votes are retrieved but not transmitted. He distinguishes cases where all votes are registered, such as the U.S. Congress, from others where registration is on request, such as the European Parliament. Poole and Rosenthal [27] and Roberts [28] express similar views.

On the technology side, few studies have explored the use of Semantic Web technologies to analyze legislative voting.

Carrubba et al. [29] published a theoretical game model based on requests made by party leaders for enforcement of party discipline (in our case, the alignment measure). The model exhibits good predictive behavior, explaining how the concept of party cohesion affects legislative voting.

Loyola et al. [30] proposed an estimator of the ideological tendency of a bill, developed from bill data and from documents available on the web associated with groups in favor of and against legislative initiatives. These data were processed into vectors using NLP techniques such as Latent Dirichlet Allocation, Word Embeddings, and Term Frequency, generating proximity indices between the bills and the document vectors of the groups in favor and against, and finally combining them to determine proximity to either ideological group.

Sánchez-Nielsen and Chávez-Gutiérrez [31] described the use of semantic annotations based on RDF ontologies on parliamentary debate videos associated with legislative initiatives, focusing on improving informative channels for citizen participation and collaboration in the legislative process, while also emphasizing transparency and making the information understandable.

Mou et al. [32] show how voting behavior in the chamber can be predicted from historical votes and from public statements made on the social network Twitter using hashtags, generating graphs that provide political context associated with legislative voting.

Hyvonen et al. [33] describe experiences using Semantic Web technologies to publish legislative data in Finland; they present a knowledge graph created from speeches in plenary sessions of the Finnish parliament between 1907 and 2021, a linked open dataset, the data infrastructure, ontologies, and a semantic portal for studying Finnish political culture, language, and networks of Parliament members.

Finally, Chalkidis et al. [34] explain a practical case of publishing legislative open data in Greece using semantic web technologies, employing OWL ontologies and RDF, and making available a SPARQL endpoint.

Conclusions

The bottom-up, data-driven analysis of Chile’s Congress voting data has allowed us to establish a categorization of bills using two dimensions (alignment and political polarization) and four quadrants, which are also related to political science categories.

From an algorithmic perspective, this approach exhibits explainability, since the derivation of its analytical categories (dimensions and quadrants) can be traced step by step, without biases or hidden layers of data processing. It allows a repeatable evaluation of a bill’s processing, considering factors that are always present in politics but usually implicit (i.e., alignment and polarization). These elements have usually had limited utility for improving the legislative process.

From a data perspective, the use of Semantic Web technologies to publish open legislative data provides high standards for public data in political science and research, which can improve the development and impact of integrative and multidisciplinary studies. Having high-quality data that are persistent over time and come from a reliable and available source allows experiments to be replicated or repeated, a cornerstone of science and of accountability. This article aims to contribute along these lines by making available a set of legislative voting data with everything necessary to be reused and combined. It can also serve as an example for developing new datasets that are exposed to external data quality checks.

The main motivation of this work has been to make the legislative process more efficient, perhaps by allowing separate, swift processing of highly aligned, weakly polarized bills. It also allows legislative efforts to be focused on bills with higher polarization, which carry a greater risk of rejection.

Finally, this type of practice provides citizens with more transparent and reliable public services. Since the legislative branch consistently has a poor image in the eyes of citizens [35],Footnote 30 the adoption of initiatives like this one may help improve society's trust in it. Thus, something as simple and technical as publishing voting records as open datasets may help to increase trust and reduce corruption via increased accountability [36].