Corruption risk in contracting markets: a network science perspective

Wachs, Johannes; Fazekas, Mihály; Kertész, János

doi:10.1007/s41060-019-00204-1

Corruption risk in contracting markets: a network science perspective

Regular Paper
Open access
Published: 18 January 2020

Volume 12, pages 45–60, (2021)
Cite this article

Download PDF

You have full access to this open access article

International Journal of Data Science and Analytics Aims and scope Submit manuscript

Corruption risk in contracting markets: a network science perspective

Download PDF

8587 Accesses
34 Citations
19 Altmetric
Explore all metrics

Abstract

We use methods from network science to analyze corruption risk in a large administrative dataset of over 4 million public procurement contracts from European Union member states covering the years 2008–2016. By mapping procurement markets as bipartite networks of issuers and winners of contracts, we can visualize and describe the distribution of corruption risk. We study the structure of these networks in each member state, identify their cores, and find that highly centralized markets tend to have higher corruption risk. In all EU countries we analyze, corruption risk is significantly clustered. However, these risks are sometimes more prevalent in the core and sometimes in the periphery of the market, depending on the country. This suggests that the same level of corruption risk may have entirely different distributions. Our framework is both diagnostic and prescriptive: It roots out where corruption is likely to be prevalent in different markets and suggests that different anti-corruption policies are needed in different countries.

Networked Corruption Risks in European Defense Procurement

Collusion risk in corporate networks

Article Open access 07 February 2024

Market and network corruption: Theory and evidence

Article 11 July 2023

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Ever since states have existed, they have sought to count and track their citizens in order to tax and control them [1]. Modern states record their own activities in great detail. Just as the digital traces of individuals contain valuable insights about everything from their health to their socioeconomic status [2], the administrative big data maintained by the state can be used to analyze its successes and failures [3]. One such failure that is surprisingly persistent even among economically advanced and democratic states is corruption, which we frame as the “deliberate restriction of open and fair access to public resources for the benefit of connected actors” [4]. Corruption is indeed a significant failure of governance, as it has been shown to slow growth [5] and innovation [6] and to subvert democracy [7], while compounding inequality [8].

Despite its importance, corruption is difficult to measure because individuals engaging in corruption obviously want to keep it a secret. Ground truth examples of convicted corrupt actors are hard to come by and anyway form a biased sample: large-scale corruption often goes unpunished because the state, including law enforcement and the judiciary, has been captured by corrupt interests [9]. For these and other reasons, corruption is traditionally measured via surveys which suffer from well-documented shortcomings [10, 11].

The significant amount of administrative data collected by governments presents an opportunity to study corruption risk from a new perspective using new tools of data science [12]. In this paper, we mine a big administrative dataset with over 4 million public procurement contracts in the EU for insights on the organization of corruption. Public procurement, the process by which governments purchase goods, services and construction works from the private sector, accounts for up to 20% of GDP [13] and is known to be vulnerable to corruption [14]. We proxy for corruption risk at the contract level with a binary indicator that is 1 if there was no competition for the contract. Such single-bidder contracts have been shown to predict significant risk of corruption [15, 16]. Aggregated to the country level, the rate of single bidding correlates significantly with commonly used survey-based measures of corruption and predicts overpricing on the auction level.

With micro-level data in hand, we apply the tools of network science to map and analyze the distribution and structure of corruption risks in different European countries. We represent markets as bipartite networks of the issuers (public institutions, sometimes referred to as buyers) and winners (firms, sometimes referred to as suppliers) of procurement contracts. These bipartite networks have many qualitative characteristics in common with other empirical networks [17] that distinguish them from random networks. For instance, they have heterogeneous degree distributions and significant local correlations.

These networks are more than just maps of markets. We argue that they provide an ideal framework for studying the organization of corruption. There are many reasons to consider corruption as a fundamentally networked phenomenon. The most significant disclosed examples of corruption, for instance the recent scandal involving the Brazilian state-owned oil company Petrobras [18, 19], involve hundreds of individuals, firms, and institutions. The Petrobras scheme entailed the exchange of billions of dollars in bribes and kickbacks for the award of public contracts. The size of the conspiracy, of which nearly 100 conspirators—including the former president of Brazil—have been convicted, suggests that it involved a sophisticated and collective effort. Likewise, network studies of organized crime [20] and terrorism [21] demonstrate that illegal activity tends to leave complex traces that networks are especially suited to describe and explain.

In recent years, network analyses of big relational datasets have delivered important insights into social phenomena. Networks can highlight important signals of incoming financial crises [22], predict how economies will develop [23], and suggest how social organization can facilitate cooperation [24]. As macro-level phenomena such as inequality tend to emerge from many micro-level interactions [25], network representations are especially useful to analyze macro-outcomes when micro-level data are available.

In this paper, we use a network approach to describe the distribution of corruption risk in different procurement markets from two perspectives: the extent to which it is centralized, that is, how prevalent it is between actors in the core of the market versus its periphery, and how clustered it is, describing the extent to which corruption risk is bunched in different parts of the network. We find that in some EU countries corruption risk is more common in the core of the market, while in others it is more common in its periphery. In every country in our database, corruption risk is clustered, though the magnitude of this effect varies significantly. Our results are compared to realistic null models, which take into account the overall magnitude of corruption risk as well the propensity of governments in different countries to purchase different goods and services. Our findings have significant implications for both the academic study of corruption and practical anti-corruption efforts. Observing that corruption seems to be organized in different ways suggests that different anti-corruption strategies may be effective in different contexts.

The paper is organized as follows. Section 2 surveys related work, including recent work on the measurement of corruption risk via public procurement contracts. Section 3 describes the data and corruption risk indicator. Section 4 introduces the network measures of the distribution of corruption risk in markets. Section 5 concludes and suggests ideas for future research.

2 Related work

Big data has been applied to a variety of social and economic problems in domains including health [26], scientific research [27], urban mobility [28, 29], and development [30]. The digitization of our social and professional lives has facilitated most of this work, providing researchers access to high-resolution trace data about human behavior and activities. Often the data analyzed are “found data”—data which are not originally collected for the purpose of the research. Data collected from public administrative databases, for example public procurement, fall into this category [31].

The study of corruption using big administrative data is a new and evolving area of research with its own challenges. It represents a major departure from previous empirical studies of corruption which leverage perception-based surveys for macro-studies and experiments at the micro-level. Below, we briefly survey these approaches to the research of corruption.

The most prominent and longest running survey-based measures of corruption are Transparency International’s Corruption Perceptions Index (CPI) [32] and the World Bank’s Worldwide Governance Indicators (WGI) [33], both available since the 1990s. Both measures, quantifying perceptions of corruption and quality of government at the national level, are composite indicators, mixing general population and expert surveys. The measures are highly correlated ($\rho > $ .9) and are designed to be consistent over time.

The significant complexity of weighing components and correlating observations year to year has lead some researchers to criticize the validity and broad application of these measures [10, 34]. More recent attempts to resolve some of these issues or to create more consistent measures include the Varieties of Democracy (V-DEM) indicator of political corruption [35] and the Quality of Government Institute’s (QoG) European Quality of Government Index (EQI) [36].

Experimental studies of corruption have the advantage of allowing researchers to directly measure causes or levels of corruption in specific contexts by precise specification. For example, to study the influence of culture on corruption Cameron et al. [37] observed individuals from different cultures playing games with both an economic incentive to cheat and a mechanism to punish cheaters. Interestingly, they found greater variation in the propensity to punish corruption than to engage in it across cultures. Weisel and Shalvi [38] observe pairs of individuals playing a dice-rolling game in which pairs can increase their payout by cheating as a team, and find that pairs tend to cheat more often than individuals playing a similar game alone.

These studies have the clear limitation that they happen in artificial environments. Some experiments are carried out in the field, though coming with significant higher costs and risks. Perhaps the most famous of these relating to corruption are Olken’s field experiments in Indonesia [11, 39]. Olken designed a series of interventions to test the effects of both audits and grassroots organization on corruption outcomes in 600 Indonesian villages during a national road construction project. Crucially, Olken had the resources to independently assess the expected cost of the roads in order to estimate actual observed corruption in each village. He found that pre-announced audits reduce corruption, but that community organizing induced no change. Such studies are a gold-standard in causal inference, but suffer from significant costs and limitations in scope.

Some observational studies study corruption using conviction data. Most of these studies focus on the USA, exploiting the federal structure of government and making the assumption that independent federal corruption prosecutions give an accurate measurement of corruption at the state level [40]. The correlation between state-level corruption perception and federal conviction data is low (roughly 0.3) [41], suggesting that these two measures of corruption are quantifying different things. Conviction data can be biased by selective enforcement or geographic factors—for example if more competent enforcement is clustered in certain regions. The value of conviction data as a proxy for corruption is even more limited when legal interventions happen at the same level as the corrupt behavior. For example, in a country with high levels of corruption, the courts themselves may be corrupt—a fact that would skew conviction rates.

Observing the shortcomings of other approaches, researchers have increasingly turned to big data methods to quantify and study corruption in the public sector. These studies tend to leverage found data collected and curated by governments for other purposes. They rely heavily on new trends for open government and e-government [42]. As citizens increasingly demand transparency and public-sector use of information and communication technologies proliferates, it is reasonable to expect that more data will become available in the future [43]. Data in the USA on lobbying [44] and campaign contributions [45, 46] have been used to measure the influence of money in politics. Internationally, large-scale firm ownership data reveal how firms avoid taxes [47]. Crowdsourced data on convicted public officials, for instance extracted from Wikipedia, have been used to study the emergence of systemic corruption [19]. Big data has also been leveraged to study topics related to corruption like nepotism [48].

As mentioned in Introduction, our study leverages big data on public procurement. The scale and scope of procurement and its status as a key interface between the public and private sectors have made it a popular topic of research. How do researchers quantify corruption using public procurement data? Fieldwork suggests that corrupt officials steer contracts to favored firms by restricting competition, insuring high profits [49]. In general, traces of the strategies used to restrict competition can be extracted from administrative data on the contracting process, enabling researchers to score individual contracts for corruption risk. The most general indicator tracks the competitive outcome directly: Whether the contract attracted a single bidder. Single-bid contracts represent a clear corruption risk.

Single-bid contracting rates have been used as an effective proxy for corruption in various contexts, including studies on the relationship between corruption and political incumbency [15], the importance of meritocracy in bureaucratic outcomes [16], the impact of social networks on local corruption [50, 51], and the effect of campaign contributions on corruption [52]. Procurement-based corruption risk indicators have the advantage that they apply to micro-level transactions, enabling the study of corruption at multiple scales. Several previous studies study the distribution of corruption risk in procurement markets represented as networks. One such study finds that repeated interactions are significantly related to corruption risk [53], while another demonstrates how procurement markets change when corruption risk becomes endemic [14].

Despite the significant interest in corruption as a problem and the proliferation of public procurement-based big data indicators, we are not aware of research that explores the distributional and structural properties of corruption in different countries based on the patterns of interactions between public and private actors. This gap in the literature is surprising given that social scientists have been categorizing countries by the organization of corruption in their public sectors for decades [54, 55]. We fill this gap by leveraging the tools of network science, providing an analytic framework to both study the distribution of corruption risks in countries and to provide tailor suggestions on how to combat it.

3 Data and framework

3.1 Procurement data

Our analytic framework measures corruption risk at a transaction level using data from public procurement contracts. We collected data on all public procurement contracts published in Tenders Electronic Daily^{Footnote 1} (TED), the official journal of public procurement contracts of the European Union, from 2008 to 2016. Both calls for tenders and the announcement of their award are published in TED. EU law requires that any public procurement contract exceeding an estimated value of 5.2 million Euro for works and 135 thousand Euro for services and supplies must be published on TED. TED estimates that over 400 billion Euro of total contract value is published in its pages every year, accounting for nearly 3% of EU GDP. Though the high thresholds exclude a significant number of contracts from our data, using only data from TED maximizes the international comparability of our analyses.

Our final dataset consists of 4,098,771 contracts awarded in 2008–2016. We consider 26 member states of the EU, excluding Luxembourg because of the relatively small number of contracts awarded there and Croatia because it only joined the EU in 2013. We also exclude contracts awarded by EU institutions (the Commission, Parliament, Council) as we aim to compare countries. The data are available for download at https://zenodo.org/record/3537986#.XcmMPkX0mgk.

We processed the dataset to deduplicate the identities of each contract’s issuing buyer (i.e., public institution such as a ministry or city hall) and supplying winner (i.e., a private firm). Given that our analytic approach requires an accurate map of the interactions between issuers and winners, we developed a pipeline to maximize the accuracy of our deduplication, following the approach of Christen [56]. For each country, we preprocessed the text data for each entity, used machine learning to select both optimal string similarity measures and blocking methods, and selected a clustering threshold maximizing accuracy on a manually labeled subsample using the Dedupe computer software [57]. To give an example, the number of unique suppliers of contracts in France decreased from 364,125 to 200,584. For a more detailed summary of the procedure, see [58].

Besides the issuing buyer and winning supplier of each contract, we were able to extract several additional pieces of information, including the year the contract was awarded, the Common Procurement Vocabulary (CPV) code—a EU-wide taxonomy of procurement contracting goods and services [59], and the number of bidders competing for the contract. We use the latter information to score each contract for corruption risk.

3.2 Corruption risk

We quantify the corruption risk of procurement contracts using a binary indicator tracking whether the contract attracted only a single bid in its competition. The use of single bidding as a signal for corruption risk or “undetected fraud” has recently been suggested by the European Court of Auditors [60]. Nevertheless, it is certainly not the case that any single instance of single bidding can be used as evidence for corrupt behavior. For instance, it may be the case that the government is purchasing a niche good or service with few suppliers on the market or under emergency circumstances—certainly it is more likely for such contracts to be awarded to a single bidder.

Two aspects of our data mitigate these limitations to the validity of single bidding as a signal of corruption risk. The first is that the high threshold of contract values insures both visibility and interest from the private sector. Secondly, the null models we create to benchmark our measures consider the CPV code of the contract, distinguishing between contracts for different categories of goods and services, such as medicine and furniture. In the case when data on the number of bidders are missing from a contract, we impute the country-level average single-bidding rate. This imputation strategy likely leads to an underestimate of corruption risk.

We plot the single-bidding rates with of each country over the 2008–2016 period in Fig. 1. We see significant variation, with single-bidding rates below 10% in some countries and over 30% in others. How do these measures of corruption risk correlate with the survey-based perception indicators mentioned in the previous section? In Fig. 2, we correlate the average single-bidding rates from 2008 to 2016 with Transparency International’s Corruption Perception Index (TI CPI) [32], the World Bank’s Control of Corruption Index (WB CoC) [33], Varieties of Democracy’s Corruption Index (V-Dem Corruption Index) [35], and the Quality of Government (QoG) Institute’s European Quality of Governance Index (European Qual. Gov. Index) [36] each measured in 2013 (this was the most recent year for which all five indicators were available). In each case, we find a significant correlation between country-level single-bidding rates and worse corruption or quality of government outcomes (Pearson correlation between .66 and .72, Spearman correlation between .74 and .76 in magnitude, all correlations significant at $p< 0.05$). We also find a significant correlation ( around − 0.7) between single-bidding rates and the recently developed Index of Public Integrity [61], which quantifies the extent to which the institutional framework of a country fosters integrity and blocks corruption.

Our data offer several advantages over the survey-based indicators. Surveys encode subjective perceptions about corruption that may be biased. For instance, past research by Olken found that perceived corruption exceeds actual corruption when ethnic diversity is high [11]. This is a significant obstacle when we would like to compare the rate of corruption across countries. Moreover, the way such indicators are calculated is often modified from year to year, again making comparison difficult. From our perspective, however, the most significant advantage of the single-bidder measure of corruption risk is that it is observed at a granular level; surveys are expensive and, with few exceptions [36], are not carried at the sub-national, e.g., regional level. We exploit that our indicator is available at the transaction level in the following section by introducing our network science-based analytic framework.

4 Network analysis

Public procurement markets can be described as bipartite networks because of the clear division of their participants into issuers and winners. Bipartite networks have been applied to the study of systems including two kinds of actors including flowers and their pollinators [62], cities and industries present in them [63], and buyers and sellers in markets [64]. We apply the same paradigm to our data on procurement markets: For each country and year in our dataset, we create bipartite networks in which issuers and winners are connected by a weighted edge, with weight counting the number of contracts between the two entities.

The standard analysis of a complex network is based on null models, which are randomized versions of the empirical system. The statistical observations made on the empirical network are compared to those obtained on the randomized versions. Significant differences can be interpreted as results of correlations, which are eliminated by the randomization process.

In Table 1, we report summary of the raw statistics of each country’s procurement market network, averaged over 2008–2016. In all cases, the average weighted degrees of nodes (counting the number of contracts the node is involved in) are smaller than their standard deviations, indicating heterogeneous degree distributions.

Table 1 Summary statistics of procurement market networks, averaged over 2008–2016

Full size table

We inspect the degree heterogeneity of each country graphically in Fig. 3. Plotted on a log–log scale, the degree distributions of both issuers and winners are heterogeneous in all countries: Some rare issuers and winners are involved in hundreds, even thousands of contracts across the time span of our data, while the majority are involved in only a few contracts. This heterogeneity cannot be considered as a signature of any corruption; it is more related, e.g., to the broad size distribution of firms and institutions [66].

We also report the Robins–Alexander clustering of each network in Table 1. Robins–Alexander clustering is defined as the number of cycles of length four in the network divided by the number of paths of length three [65]. Given that a firm wins contracts from two issuers, and that another firm wins a contract from one of these two issuers, Robins–Alexander clustering can be interpreted as the probability that this second firm also wins a contract from the other issuer. This measures the tendency for local clustering in the market analogous to the clustering coefficient in monopartite networks. The expected Robins–Alexander clustering of random bipartite networks tends to their density as they get large [65]. As seen in Table 1, the observed clustering is typically an order of magnitude greater than the density of the observed networks. This indicates the presence of significant local correlations in the markets.

These descriptive statistics indicate that our networks have rich structure which deviates significantly from random behavior. In the next subsections, we introduce two measures of the structure of the market: its centralization and the extent too which it is clustered. We then exploit these measures to describe the distribution of corruption risk in each market.

4.1 Core-periphery analysis

A common task of network analysis is to highlight the most central and active nodes. Some empirical networks have a group of densely connected actors at their centers, sometimes called cores [68]. Besides highlighting important actors, methods to detect network cores can be used to compare the degree of centralization in a network. In this subsection, we adapt a method known as k-shell decomposition to weighted bipartite networks in order to rank issuers and winners by their centrality in the network. We say that the most central actors form the core of the market. We measure and compare the relative sizes of the cores of each procurement market, interpreting a larger core as a signal of procurement market centralization. We find a strong correlation between market centralization and corruption risk.

In generic unweighted graphs, the concept of a coreness is defined iteratively [69]. To start, all nodes of degree 1 are assigned a core number of 1 and then removed from the graph. In the next step, all remaining nodes with degree 1 in the trimmed graph are also assigned core number 1 and then removed. This process repeats until all nodes have at least degree 2. Nodes with degree 2 are assigned core number 2, and they are the first nodes to be removed in the next iteration of removals. This process yields a hierarchical decomposition of the nodes in the network by their core number. The k-core of the graph refers to the subgraph consisting of nodes with core number at least k [70].

Our networks have two features that distinguish them from the graphs considered in this example. The first is that the edges include weights, encoding the volume of contracts between the focal issuer and winner. The second aspect is that the different node sets have different degree distributions. We tweak the concept of the core number and k-core in order to apply it to our networks.

Our method modifies the iterative procedure described above to consider the weighted degrees, defined as the count of contracts they are involved in as either issuers or winners, respectively, of nodes. Otherwise the method is identical. As the edge weights in our networks are integers (describing contracting volume), no further modification is necessary. This is a simplified version of the weighted k-shell decomposition of Garas et al. [71].

This procedure assigns each node in our network a weighted core number, measuring its centrality in the network. As we would like to partition the network into a core and periphery, we need to determine a cutoff for a node to be considered a member of the core. This cutoff should be a function of the size of the network as it would not be appropriate to use the same cutoff for networks of vastly different sizes. We also address the concern that issuers and winners have different degree distributions by setting two different cutoffs for core membership: one for issuers and one for winners. In the end, we choose to consider issuers as core issuers if their weighted core number exceeds the average degree of issuers. As winners tend to have lower average contract win counts, we only consider those winning twice the average degree as core winners.

We visualize the Hungarian core in Fig. 4, marking edges with above-market average single-bidding rates in red. The visualization highlights the importance of considering the volume of contracting between issuers and winners in our measure. Issuers and winners can have only a single neighbor and still be considered for core membership because they may have a high volume of contracts with that neighbor.

We observe that the core has a consistent size over time and remains densely connected in all years. We confirm that in general the weighted core subgraph has a much higher density in a summary statistics table included in Appendix. In general, between 20 and 60% of all contracts awarded in a given country are between core issuers and winners.

We now pose two questions about the relationship between network centralization and corruption risk measured by single bidding. First: What is the relationship between the degree of centralization of a market and its corruption risk. In Fig. 5, we plot the relationship between centralization, quantified by the share of all contracts between core issuers and winners, against corruption, measured by the overall single-bidding rate and the survey-based EQI measure. We find a significant positive relationship between procurement market centralization and corruption. In the political economy literature, there is an ongoing debate about the relationship between the centralization of government power and corruption. Our finding supports the side of the debate claiming that centralization facilitates corruption [72].

We are also interested in the distributional nature of corruption risk. Does centralization induce corruption by fostering corruption among core issuers and winners? We probe this question by comparing the rate of single bidding in the core against a null model that randomizes the distribution of single bidding. Specifically, we preserve network structure and the label of nodes as core or periphery, but randomly shuffle the single-bidding labels. We restrict the randomization by permuting single-bidder labels only across contracts with the same two-digit CPV code. This takes into account market-specific effects. For each market, we divide the observed rate of single bidding in its core $SB_{\mathrm{core}}$ by the average rate of single bidding in its core over 1000 CPV-preserving randomizations $\mu (SB_{\mathrm{core}}^{rand})$. We plot the results averaged over all years for each country in Fig. 6.

Unlike in the previous figure which indicated a strong relationship between market centralization and corruption risk, no clear pattern emerges. In some countries, corruption risk in the form of single bidding is overrepresented in the core. In other countries, it is underrepresented. For example, Czech Republic and Hungary have similar overall corruption risk scores, but in the former single bidding is more common between core issuers and winners, while in the latter it is more common between actors on the periphery. In the second panel of Fig. 6, we show that the size of the core does not have any straightforward relationship with the overrepresentation of corruption risk in the core.

We draw several conclusions from this analysis. The first is that, as mentioned, there is a strong correlation between centralization in procurement markets and their overall procurement risk. What is less clear is by what mechanism this effect might emerge. Our latter findings show that centralization is not related to higher corruption risk in either the core or periphery in general. The observed heterogeneity in the tendency of corruption risk to accumulate in the core or periphery of different countries has important policy implications because the strategies used by corrupt actors to extract rents are likely very different.

4.2 Edge clustering

Another dimension of potential heterogeneity in the distribution of corruption risk is its clustering. Previous research has found that corruption is significantly correlated in the network, with high corruption risk edges typically bunched together [14]. To demonstrate this idea, we plot the Hungarian procurement market in 2014 in Fig. 7. Coloring edges with above-market average rates of single bidding, a clear pattern emerges: The top left cluster of nodes has significantly higher rates of single bidding than other parts of the network. We propose to quantify this tendency, as well as the overall tendency of a procurement market to exhibit topological clustering using community detection. We first group the edges of the network into communities and measure the quality of this partition. We then calculate the variation of single-bidding rates across communities as a measure.

Community detection is a distinguished area of research interest in network science [73]. There are many methods to partition the nodes of networks into communities. As our object of interest, corruption risk, is an attribute of network edges, we should rather partition the edges of the network into communities. There are several well-studied methods used to cluster the edges of networks into so-called link communities. We adopt one approach based on line graphs [74, 75]. The line graph of a network is the network obtained if the edges if the original graph are considered as nodes and then connected if they share a node in the original graph. We transform each of our networks into the corresponding line graph, obtaining a new network for each one in which the nodes correspond to the original network’s edges or contract relationships. We can then apply any standard clustering algorithm on the new network to obtain an assignment of each issuer–winner edge to a community. We apply the Louvain method, a computationally fast and accurate method commonly used to detect communities in networks [76].

We can also use the partition of the line graphs into communities to describe the extent to which the networks are topologically clustered. We measure the tendency of edges to be within rather than between communities detected by the Louvain method using modularity, a quality function of network partitions. Modularity varies between − 1 and 1, with negative scores indicating that edges are rather present between “communities” of the given partition than within them, scores around 0 indicating that there is no difference in the frequency of inter- vs intra-community edges, and higher values indicating that edges are much more likely to be between nodes of the same community rather than across communities. In other words, higher modularity scores indicate that the network has more distinct and separated groups of nodes.

There is significant topological clustering in all countries and years, with modularity ranging from .55 to .80. For reference, the 2014 Hungarian market plotted in Fig. 7 has a modularity of .71. Given the partition of the edges of each network into communities, we can now quantify the extent to which single bidding varies across the different communities of a network.

The partition of edges in a network, denoting contracting relationships between issuers and winners naturally gives us a partition of all contracts awarded in a given market. We then calculate the coefficient of variation of single bidding across the clusters, defined as the standard deviation of the single-bidding rates across the contract clusters over the average. As clusters can have significantly different numbers, we actually calculate the weighted standard deviation $\sigma ^{W}_{SB}$ and mean $\mu ^{W}_{SB}$ of single bidding across clusters, defined as follows:

$$\begin{aligned} \sigma ^{W}_{SB} = \sqrt{ \frac{ \sum _{c\in C} |c| \left( sb_{c} - \mu ^{W}_{SB}\right) ^2 }{ \frac{(|C|-1)}{|C|} \sum _{c\in C} |c| } }, \end{aligned}$$

and

$$\begin{aligned} \mu ^{W}_{SB} = \frac{\sum _{c \in C} |c| sb_{c}}{\sum _{c \ in C} |c|}, \end{aligned}$$

where C denotes the set of contract clusters, c is a specific cluster, and $sb_{c}$ is the rate of single bidding in the cluster c. The weighted coefficient of variation, which in our context we refer to as the clustering of single bidding, is simply the ratio $\sigma ^{W}_{SB}/\mu ^{W}_{SB}$.

As with our measure of centralization, we compare our observed clustering of single-bidding measure against a suitable null model. We again opt to randomize the single-bidder label on contracts within the 2-digit CPV classes to create a realistic yet randomized distribution to compare against the empirical data. Thousand times we shuffle the single-bidder label on all contracts and recalculate the clustering of single bidding for these randomized markets. We divide the observed clustering of single bidding by the average of the 1000 randomized clustering of single-bidding scores. We plot the result by country, averaged over 2008–2016, in Fig. 8. In every case, single bidding is non-trivially clustered within communities in the network. The magnitude of observed clustering ranges from 2 to over 8 times higher than expected in the sector-preserving randomization. We interpret this as significant evidence that corruption is not randomly distributed in the public sector.

The extent to which corruption risk clusters in the market has important policy implications. The greater the clustering of single bidding in a market, the more likely it is that investigations of the network neighbors of known corrupt actors will be successful. For example, returning to our visualization of the Hungarian market in 2014 (see Fig. 7), we do not only suggest that corruption risk is significantly higher in the northwestern community, but also that investigators of corruption in Hungary should consider this pattern of clustering in the future to shape their strategy.

We now compare our two measures of the distribution of corruption risk in markets. For simplicity, we consider only those countries with above average single-bidding rates in our data. We plot the centralization of corruption risk against its tendency to cluster for these countries in Fig. 9. We observe that although corruption risk is high in all countries, corruption risk within each country is distributed differently. Countries in the bottom right corner of the plot, such as Portugal and Italy, have higher corruption risk in the core of their markets and a weaker tendency for risk to cluster. Countries in the top and center of the plot such as Poland and Latvia have a neutral distribution of risk across the core and periphery of their markets, but overall risk has a strong tendency to cluster. Finally, countries such as Hungary, Slovakia, and Estonia have higher corruption risk in their peripheries and a moderate tendency for risk to cluster.

This emphasis on the distribution of corruption risk presents a novel way to think about corruption in a comparative manner. Though it may not be accurate to say that each country with high corruption risk is corrupt in its own way, these deviations suggest that corruption may be organized in different ways, likely reflecting political and economic constraints. The differences in topological structure (i.e., the degree of centralization, measured by the share of contracts in the core of the market, and the clustering of edges into communities measured by modularity) of the markets also indicate that interesting structures emerge in the mesoscopic level of these data: between the micro-level of individual transactions and the global picture of the entire market.

5 Discussion and future work

In this paper, we applied network science methods to mine a big administrative dataset on public procurement contracts for insights on the distribution of corruption risk. We found that countries may have similar levels of corruption risk but significantly different distributions of that risk. In some countries, corruption risk is more prevalent in the center of the network, while in others among peripheral actors. We also found that the degree of centralization in procurement markets overall is a strong predictor of their global corruption risk and low quality of government. Another dimension of heterogeneity is the degree of clustering of corruption risk: Though corruption risk is significantly clustered in all countries in our data, the extent of the clustering varies between 2 to 8 times what is expected under a sector-preserving randomization.

These heterogeneities are not merely curious artifacts in the data. They have significant implications for anti-corruption policy. For example, when corruption risk is centralized, it is possible that the central government itself is a hotbed of corruption and should not be trusted to address the problem. If corruption risk is highly clustered, successful investigations of corruption should snowball by following up with “nearby” actors. Overall, our findings serve as a complement rather than a substitute to traditional research on the study of corruption risks. They represent an opportunity to examine corruption risk from a new perspective that effectively leverages a new emerging data source. We also note that our measure of corruption risk is only a proxy—there are other potential reasons that competition is lacking in a market. This is another reason why our approach should not replace but rather enhance previous work.

We suggest several directions to extend our findings. For instance, considering that corruption is known to vary considerably at the regional level [36, 51], geographic information about the locations of issuers and winners could enhance our analysis [53, 77]. Before we can adjust the granularity of our study, more work is needed to define markets is a rigorous way. For example, markets for different goods and services likely have different characteristic geographic sizes: IT consulting services markets are likely more geographically diverse than markets for road repair, where transportation and fuel costs are a significant part of the budget. Defining market boundaries from the data can be misleading.

More work is needed to understand the evolution of corruption within countries. Our approach aggregates both corruption risk and network structure over time—certainly it is the case that corruption can change over time. Past work on the response of corrupt networks to political turnover suggests that procurement markets are significantly rewired following changes of government [78]. Methods to analyze temporal networks are becoming increasingly sophisticated [79, 80]. Comparative analyses and case studies using these data and these new methods promise to enhance our understanding of the different ways corruption works, and potentially how it can be limited.

Policymakers are of course not only interested in measuring and understanding corruption, but mitigating it. Future work should describe and test possible interventions that leverage network maps of corruption risk. Given the significant clustering of corruption risk in all the countries we studied, it would be of interest to move authorities responsible for anti-corruption across geographic and market boundaries, observing whether certain individuals are more effective anti-corruption actors. The major challenge in this area is not that we lack ideas for experimentation, but that obtaining buy-in from officials is difficult, especially in countries where corruption is a significant problem.

More broadly, our work demonstrates the potential of the application of a data science perspective on newly available administrative data. We agree with Rademacher’s claim that open data are fundamental for open societies [12]. As access to data becomes more abundant, the value of statistically sound, ethically responsible data science as an input to policymaking will only increase. The value of high quality public administrative data can also be increased by integrating it into broader research frameworks [81], enabling scientists to solve real-world problems by combining data in novel ways. For example, previous attempts to model the relationship between corruption and economic growth [82, 83] and complexity [84, 85] could be revisited using our fine-grained indicators of corruption risk.

Notes

https://ted.europa.eu.

References

Scott, J.C.: Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. Yale University Press, New Haven (1998)
Google Scholar
Pappalardo, L., Vanhoof, M., Gabrielli, L., Smoreda, Z., Pedreschi, D., Giannotti, F.: An analytical framework to nowcast well-being using mobile phone data. Int. J. Data Sci. Anal. 2(1–2), 75 (2016)
Google Scholar
Kim, G.H., Trimi, S., Chung, J.H.: Big-data applications in the government sector. Commun. ACM 57(3), 78 (2014)
Google Scholar
Mungiu-Pippidi, A.: The Quest for Good Governance: How Societies Develop Control of Corruption. Cambridge University Press, Cambridge (2015)
Google Scholar
Mauro, P.: Corruption and growth. Q. J. Econ. 110(3), 681 (1995)
Google Scholar
Rodríguez-Pose, A., Di Cataldo, M.: Quality of government and innovative performance in the regions of europe. J. Econ. Geogr. 15(4), 673 (2014)
Google Scholar
Stockemer, D., LaMontagne, B., Scruggs, L.: Bribes and ballots: the impact of corruption on voter turnout in democracies. Int. Polit. Sci. Rev. 34(1), 74 (2013)
Google Scholar
Gupta, S., Davoodi, H., Alonso-Terme, R.: Does corruption affect income inequality and poverty? Econ. Gov. 3(1), 23 (2002)
Google Scholar
Mungiu, A.: Corruption: diagnosis and treatment. J. Democr. 17(3), 86 (2006)
Google Scholar
Hawken, A., Munck, G.L.: Do you know your data? Measurement validity in corruption research. Technical report, Working paper - School of Public Policy, Pepperdine University (2009)
Olken, B.A.: Corruption perceptions vs. corruption reality. J. Public Econ. 93(7–8), 950 (2009)
Google Scholar
Radermacher, W.J.: Official statistics in the era of big data opportunities and threats. Int. J. Data Sci. Anal. 6(3), 225 (2018)
MathSciNet Google Scholar
OECD.Stat.: Government at a glance—2017 edition: public procurement. https://stats.oecd.org/Index.aspx?QueryId=78413. Accessed 08 Sept 2018 (2017)
Fazekas, M., Tóth, I.J.: From corruption to state capture: a new analytical framework with empirical applications from Hungary. Polit. Res. Q. 69(2), 320 (2016)
Google Scholar
Klašnja, M.: Corruption and the incumbency disadvantage: theory and evidence. J. Polit. 77(4), 928 (2015)
Google Scholar
Charron, N., Dahlström, C., Fazekas, M., Lapuente, V.: Careers, connections, and corruption risks: investigating the impact of bureaucratic meritocracy on public procurement processes. J. Polit. 79(1), 89 (2017)
Google Scholar
Newman, M.E.: The structure and function of complex networks. SIAM Rev. 45(2), 167 (2003)
MathSciNet MATH Google Scholar
Watts, J.: Operation car wash: is this the biggest corruption scandal in history. The Guardian 1(06), 2017 (2017)
Google Scholar
Ribeiro, H.V., Alves, L.G., Martins, A.F., Lenzi, E.K., Perc, M.: The dynamical structure of political corruption networks. J. Complex Netw. 6, 989–1003 (2018)
MathSciNet Google Scholar
Calderoni, F.: In: Third Annual Illicit Networks Workshop. (Équipe de recherche sur la délinquance en réseau, 2011), pp. 1–21
Krebs, V.E.: Mapping networks of terrorist cells. Connections 24(3), 43 (2002)
Google Scholar
Saracco, F., Di Clemente, R., Gabrielli, A., Squartini, T.: Detecting early signs of the 2007–2008 crisis in the world trade. Sci. Rep. 6, 30286 (2016)
Google Scholar
Hidalgo, C.A., Klinger, B., Barabási, A.L., Hausmann, R.: The product space conditions the development of nations. Science 317(5837), 482 (2007)
Google Scholar
Mamei, M., Pancotto, F., De Nadai, M., Lepri, B., Vescovi, M., Zambonelli, F., Pentland, A.: Is social capital associated with synchronization in human communication? An analysis of italian call records and measures of civic engagement. EPJ Data Sci. 7(1), 25 (2018)
Google Scholar
Stadtfeld, C.: The Micro–Macro Link in Social Networks. Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource, pp. 1–15 (2015)
Murdoch, T.B., Detsky, A.S.: The inevitable application of big data to health care. JAMA 309(13), 1351 (2013)
Google Scholar
Sinatra, R., Wang, D., Deville, P., Song, C., Barabási, A.L.: Quantifying the evolution of individual scientific impact. Science 354(aaf6312), 5239 (2016)
Google Scholar
Pappalardo, L., Pedreschi, D., Smoreda, Z., Giannotti, F.: In: 2015 IEEE International Conference on Big Data (Big Data). IEEE, pp. 871–878 (2015)
Szell, M.: Crowdsourced quantification and visualization of urban mobility space inequality. Urb. Plan. 3(1), 1 (2018)
Google Scholar
Hilbert, M.: Big data for development: a review of promises and challenges. Dev. Policy Rev. 34(1), 135 (2016)
Google Scholar
Connelly, R., Playford, C.J., Gayle, V., Dibben, C.: The role of administrative data in the big data revolution in social science research. Soc. Sci. Res. 59, 1 (2016)
Google Scholar
Transparency International, Transparency international corruption perceptions index. Technical report. Data retrieved from https://www.transparency.org/research/cpi/overview
The World Bank. World bank worldwide governance indicators. Data retrieved from https://info.worldbank.org/governance/wgi/index.aspx#home
Heywood, P.M., Rose, J.: “close but no cigar”: the measurement of corruption. J. Public Policy 34(3), 507 (2014)
Google Scholar
Coppedge, M., Gerring, J., Lindberg, S.I., Skaaning, S.E., Teorell, J., Altman, D., Andersson, F., Bernhard, M., Fish, M.S., Glynn, A. et al.: V-dem codebook v8 . Data retrieved from https://www.v-dem.net/en/reference/version-8-apr-2018/ (2017). Accessed 1 May 2019
Charron, N., Dijkstra, L., Lapuente, V.: Regional governance matters: quality of government within European Union member states. Reg. Stud. 48(1), 68 (2014)
Google Scholar
Cameron, L., et al.: Propensities to engage in and punish corrupt behavior: experimental evidence from Australia, India, Indonesia and Singapore. J. Public Econ. 93(7–8), 843 (2009). https://doi.org/10.1016/j.jpubeco.2009.03.004
Article Google Scholar
Weisel, O., Shalvi, S.: The collaborative roots of corruption. Proc. Natl. Acad. Sci. 112(34), 10651 (2015). https://doi.org/10.1073/pnas.1423035112
Article Google Scholar
Olken, B.A.: Monitoring corruption: evidence from a field experiment in indonesia. J. Polit. Econ. 115(2), 200 (2007)
Google Scholar
Glaeser, E.L., Saks, R.E.: Corruption in America. J. Public Econ. 90(6–7), 1053 (2006)
Google Scholar
Goel, R.K., Nelson, M.A.: Measures of corruption and determinants of us corruption. Econ. Gov. 12(2), 155 (2011)
Google Scholar
Kornberger, M., Meyer, R.E., Brandtner, C., Höllerer, M.A.: When bureaucracy meets the crowd: studying “open government” in the Vienna City Administration. Organ. Stud. 38(2), 179 (2017)
Google Scholar
Bertot, J.C., Jaeger, P.T., Grimes, J.M.: Using icts to create a culture of transparency: E-government and social media as openness and anti-corruption tools for societies. Gov. Inf. Q. 27(3), 264 (2010)
Google Scholar
Borisov, A., Goldman, E., Gupta, N.: The corporate value of (corrupt) lobbying. Rev. Financ. Stud. 29(4), 1039 (2015)
Google Scholar
Bonica, A.: Mapping the ideological marketplace. Am. J. Polit. Sci. 58(2), 367 (2014)
Google Scholar
Traag, V.A.: Complex contagion of campaign donations. PLoS ONE 11(4), e0153539 (2016)
Google Scholar
Garcia-Bernardo, J., Fichtner, J., Takes, F.W., Heemskerk, E.M.: Uncovering offshore financial centers: conduits and sinks in the global corporate ownership network. Sci. Rep. 7(1), 6246 (2017)
Google Scholar
Prosperi, M., Buchan, I., Fanti, I., Meloni, S., Palladino, P., Torvik, V.I.: Kin of coauthorship in five decades of health science literature. Proc. Natl. Acad. Sci. 113(32), 8957 (2016)
Google Scholar
Fazekas, M., Tóth, I.J., King, L.P.: An objective corruption risk index using public procurement data. Eur. J. Crim. Policy Res. 22(3), 369 (2016)
Google Scholar
Bergh, A., Erlingsson, G., Gustafsson, A., Wittberg, E.: Municipally owned enterprises as danger zones for corruption? How politicians having feet in two camps may undermine conditions for accountability. Public Integr. 21(3), 320 (2019)
Google Scholar
Wachs, J., Yasseri, T., Lengyel, B., Kertész, J.: Social capital predicts corruption risk in towns. R. Soc. Open Sci. 6(4), 182103 (2019)
Google Scholar
Fazekas, M., Ferrali, R., Wachs, J.: Institutional quality, campaign contributions, and favouritism in us federal government contracting. GTI working paper series (1) (2018)
Popa, M.: Uncovering the structure of public procurement transactions. Bus. Polit. 21(3), 1–34 (2019)
Google Scholar
Klitgaard, R.: Controlling Corruption. University of California Press, Berkeley (1988)
Google Scholar
Johnston, M.: Syndromes of Corruption: Wealth, Power, and Democracy. Cambridge University Press, Cambridge (2005)
Google Scholar
Christen, P.: Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer, Berlin (2012)
Google Scholar
Gregg, F., Eder, D.: Dedupe. https://github.com/datamade/dedupe (2015). Accessed 3 Dec 2018
Wachs, J.: Network approaches to the study of corruption. Ph.D. thesis, Central European University (2019). http://www.etd.ceu.edu/2019/wachs_johannes.pdf
Attström, K., Kröber, R., Junclaus, M.: Review of the Function of the CPV Codes/System. Technical report, European Commission (2012)
European Court of Auditors, Fighting fraud in EU spending: action needed. Technical report (2019)
Mungiu-Pippidi, A., Dadašov, R.: Measuring control of corruption by a new index of public integrity. Eur. J. Crim. Policy Res. 22(3), 415 (2016)
Google Scholar
Jordano, P., Bascompte, J., Olesen, J.M.: Invariant properties in coevolutionary networks of plant–animal interactions. Ecol. Lett. 6(1), 69 (2003)
Google Scholar
Bustos, S., Gomez, C., Hausmann, R., Hidalgo, C.A.: The dynamics of nestedness predicts the evolution of industrial ecosystems. PLoS ONE 7(11), e49393 (2012)
Google Scholar
Hernández, L., Vignes, A., Saba, S.: Trust or robustness? An ecological approach to the study of auction and bilateral markets. PLoS ONE 13(5), e0196206 (2018)
Google Scholar
Robins, G., Alexander, M.: Small worlds among interlocking directors: network structure and distance in bipartite graphs. Comput. Math. Org. Theory 10(1), 69 (2004)
MATH Google Scholar
Axtell, R.: Firm sizes: facts, formulae, fables and fantasies. SSRN Electron. J. (2006). https://doi.org/10.2139/ssrn.1024813
Article Google Scholar
Alstott, J., Bullmore, E., Plenz, D.: powerlaw: a python package for analysis of heavy-tailed distributions. PLoS ONE 9(1), e85777 (2014)
Google Scholar
Csermely, P., London, A., Wu, L.Y., Uzzi, B.: Structure and dynamics of core/periphery networks. J. Complex Netw. 1(2), 93 (2013)
Google Scholar
Batagelj, V., Zaversnik, M.: An o (m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003)
Dorogovtsev, S.N., Goltsev, A.V., Mendes, J.F.F.: K-core organization of complex networks. Phys. Rev. Lett. 96(4), 040601 (2006)
Google Scholar
Garas, A., Schweitzer, F., Havlin, S.: A k-shell decomposition method for weighted networks. New J. Phys. 14(8), 083030 (2012)
MATH Google Scholar
Persson, T., Tabellini, G.E.: Political Economics: Explaining Economic Policy. MIT Press, Cambridge (2002)
Google Scholar
Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75 (2010)
MathSciNet Google Scholar
Evans, T.S., Lambiotte, R.: Line graphs of weighted networks for overlapping communities. Eur. Phys. J. B 77(2), 265 (2010)
Google Scholar
Ahn, Y.Y., Bagrow, J.P., Lehmann, S.: Link communities reveal multiscale complexity in networks. Nature 466(7307), 761 (2010)
Google Scholar
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008(10), P10008 (2008)
MATH Google Scholar
Monteiro, J., Martins, B., Pires, J.M.: A hybrid approach for the spatial disaggregation of socio-economic indicators. Int. J. Data Sci. Anal. 5(2–3), 189 (2018)
Google Scholar
Fazekas, M., Skuhrovec, J., Wachs, J.: Corruption, government turnover, and public contracting market structure. GTI working paper series (2) (2017)
Sikdar, S., Ganguly, N., Mukherjee, A.: Time series analysis of temporal networks. Eur. Phys. J. B 89(1), 11 (2016)
Google Scholar
Tsalouchidou, I., Baeza-Yates, R., Bonchi, F., Liao, K., Sellis, T.: Temporal betweenness centrality in dynamic graphs. Int. J. Data Sci. Anal. (2019). https://doi.org/10.1007/s41060-019-00189-x
Article Google Scholar
Grossi, V., Rapisarda, B., Giannotti, F., Pedreschi, D.: Data science at sobigdata: the European research infrastructure for social mining and big data analytics. Int. J. Data Sci. Anal. 6(3), 205 (2018)
Google Scholar
Podobnik, B., Horvatić, D., Kenett, D.Y., Stanley, H.E.: The competitiveness versus the wealth of a country. Sci. Rep. 2, 678 (2012)
Google Scholar
Correa, J.C., Jaffe, K.: Corruption and wealth: unveiling a national prosperity syndrome in Europe. arXiv preprint arXiv:1604.00283 (2015)
Paulus, M., Kristoufek, L.: Worldwide clustering of the corruption perception. Physica A 428, 351 (2015)
Google Scholar
Albeaik, S., Kaltenberg, M., Alsaleh, M., Hidalgo, C.A.: Improving the economic complexity index. arXiv preprint arXiv:1707.05826 (2017)

Download references

Acknowledgements

Open Access funding provided by Projekt DEAL. The authors would like to thank Ágnes Batory and Eelke Heemskerk for useful feedback on a preliminary version of this paper. JK acknowledges support from the Hungarian Scientific Research Fund (OTKA K129124—“Uncovering patterns of social inequalities and imbalances in large-scale networks”).

Author information

Authors and Affiliations

Chair for Computational Social Sciences and Humanities, RWTH Aachen, Aachen, Germany
Johannes Wachs
School of Public Policy, Central European University, Budapest, Hungary
Mihály Fazekas
Department of Network and Data Science, Central European University, Budapest, Hungary
János Kertész

Authors

Johannes Wachs
View author publications
You can also search for this author in PubMed Google Scholar
Mihály Fazekas
View author publications
You can also search for this author in PubMed Google Scholar
János Kertész
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Johannes Wachs.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Here we report additional information about the cores of the procurement markets in each country (Table 2).

Table 2 The core statistics of each national market, averaged over 2008–2016

Full size table

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wachs, J., Fazekas, M. & Kertész, J. Corruption risk in contracting markets: a network science perspective. Int J Data Sci Anal 12, 45–60 (2021). https://doi.org/10.1007/s41060-019-00204-1

Download citation

Received: 09 July 2019
Accepted: 24 December 2019
Published: 18 January 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s41060-019-00204-1

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Corruption risk in contracting markets: a network science perspective

Abstract

Similar content being viewed by others