Corruption in Public Procurement: Finding the Right Indicators

Red flags are widely used to minimize the risk of various forms of economic misconduct, including corruption in public procurement. Drawing on criminal investigations, the literature has developed several indicators of corruption in public procurement and has put them forward as viable risk indicators. But are they genuinely viable if only corrupt procurements are analysed? Using a dataset of 192 public procurements (96 cases where corruption was detected and 96 cases where it was not), this paper identifies significant risk indicators of corruption. We find that only some indicators are significantly related to corruption and that eight of them (e.g. large tenders, lack of transparency and collusion of bidders) best predict the occurrence of corruption in public procurement. This paper addresses one of the most vulnerable areas of criminological research, selecting the right sample, and our results can consequently help increase the detection of corruption, improve investigation effectiveness and minimize corruption opportunities.

Introduction
In 1975, in a seminal paper, Rose-Ackerman proposed a new framework for studying and combating corruption 1 in public procurement. She started with the case in which a government knows perfectly what it wants to buy and many sellers compete to deliver this product. In this case corruption is unlikely to occur, as sellers have little financial incentive to bribe and corrupt deals can easily be detected. Moving away from the scenario where taxpayers get the best value for money, Rose-Ackerman (1975) discussed the more realistic cases in which the government does not have a clear preference on the product it wants to buy and, finally, in which a single seller can provide the good. In these cases, incentives exist for sellers to bribe government officials, either to eliminate their competitors or to make extraordinary profits. These latter situations would lead to the taxpayer receiving lower value for money. Considering that, according to the European Commission (2015), "every year, over 250,000 public authorities in the EU spend around 18 % of GDP on the purchase of services, works and supplies", studying the settings in which corruption is likely to occur and the ways in which incentives for corrupt behaviour may be controlled is certainly no trivial exercise.
In 2002, the American Institute of Certified Public Accountants [AICPA] published a list of fraud indicators that it felt are often present when fraud occurs (AICPA 2002). In 2006, the Financial Action Task Force started recommending the use of red flags to minimize the risk for financial institutions of handling criminal money (Financial Action Task Force [FATF] 2006a, b, 2008, 2010); and in 2007, the World Bank adopted a new Governance and Anticorruption Strategy, whereby it recommended the wide usage of 'red flags' to spot fraud and corruption in its financed projects. These red flags are "drawing on lessons from […] investigations regarding effective anticorruption safeguards and due diligence" (World Bank 2007, p. vi). Finally, the method of red flags has since also been applied extensively in the criminal justice literature to minimize the risk of other forms of misconduct. For instance, in 2009, Kane and White noted the lack of reliable indicators for police misconduct. They therefore constructed a large dataset composed, in equal parts, of police officers who were discharged due to misconduct and of police officers who served honorably. Based on a review of prior research, they generated and tested several potential indicators of police misconduct. Kane and White (2009) and, later, White and Kane (2013) showed, in two seminal papers, that several individual officer characteristics and institutional responses can distinguish deviant officers from their colleagues, as well as the timing of misconduct in police officers' careers. Importantly, their samples and methodologies could support inferences about the individual-level patterns of misconduct and could be used as evidence for policymaking.
Nevertheless, a large fraction of the literature using risk indicators to predict some form of criminal misconduct is plagued by econometric shortcomings, which ultimately limit its ability to predict risk correctly (cf. Bushway, Johnson and Slocum 2007). For instance, Soudjin (2015) used criminal records to distil red flags that Hawaladars can use to distinguish criminal money. Likewise, Choo (2009) analysed police records to distinguish red flags of prepaid cards being used by organized criminals and terrorists to launder their illicit proceeds of crime. Similarly, lacking public criminal records, Pluchinsky (2008) used anecdotal evidence to support the claim that imprisonment for involvement in terrorist activities is a red flag for future jihadist recidivism. While these studies are informative in the context of a novel crime or criminal technique and appeal to policymakers who need to implement fast solutions to pressing issues, they fall into what is known as the selection bias 2 trap (cf. Bushway et al. 2007).
Likewise, a substantial part of the literature on corruption and public procurement focuses on the indicators that point to corruption, which may eventually be used to develop an indicator-based risk assessment (Di Nicola and McCallister 2007). However, most of these indicators are weak in their ability to support inferences about the individual characteristics of a corrupt public procurement case, as they were developed using only criminal records (e.g. Association of Certified Fraud Examiners [ACFE] 2008; OLAF 2011). In effect, as a result of the selection bias, these studies fail to identify those indicators that can effectively distinguish corrupt public procurements from the rest. Consequently, our attempt mirrors those of Kane and White (2009), White and Kane (2013) and Unger and Ferwerda (2011) in terms of method and scope.
In this paper we investigate which red flags can effectively distinguish corrupt public procurements from non-corrupt ones. For this purpose we scan the criminological literature and select the micro-level indicators identified therein (Literature overview). We then operationalize the 28 indicators found in the Literature overview (Tested indicators of corruption in public procurement) and apply them to a sample of 192 public procurements, collected across eight EU Member States and five different public procurement sectors, with 96 public procurements related to corruption and 96 non-corrupt public procurements as a control group (The dataset). Using pair-wise correlations and a multivariate probit model we investigate the extent to which these indicators correctly identify corrupt public procurements, individually and jointly (Analysis and results). The last section concludes and places the results in a policy making perspective.

Literature overview
Since 2000 the literature on corruption has grown rapidly. Good reviews of the economic literature on corruption in public procurement are provided by Søreide (2002), Rose-Ackerman (2004) and Rose-Ackerman and Palifka (2016). This literature has mainly focused on the institutional environments in which corruption flourishes (e.g. Nelson 2010, 2011; Mungiu-Pippidi et al. 2011; Acemoglu and Robinson 2012), on the incentives of officials to demand and take bribes (e.g. Campos and Pradhans 2007; Rose-Ackerman and Søreide 2011; Søreide and Williams 2014) and on the negative welfare consequences of corruption (e.g. Mauro 1997; Wei 2000; Aidt 2009). This literature reflects the broader efforts of the international community to raise awareness of the detrimental effects of corruption and to understand the corruption phenomenon. To this end, the recommendations converge. 3 One of these recommendations is the use of indicators of corruption, or red flags 4 , to separate corrupt from non-corrupt public procurements within the same sector and country. The logic behind red flags is that corrupt activities require certain forms of economic behaviour (e.g. low bid participation rates, inexplicably rich public officials, poorly negotiated public procurement contracts) and that this behaviour leaves traces (Kenny and Musatova 2011). Red flags are thus accumulations of traces that may point to the presence of corrupt activities. 5 They are primarily aimed at helping practitioners, investigators and policy makers estimate the probability of corruption in a given procurement case, and they lay the foundation for a new evidence-based approach to fighting corruption.
Nevertheless, the literature on red flags has systematically focused on analysing atypical samples 6 and can consequently be seen to suffer from a systematic selection bias problem. For instance, in 2009, the European Commission composed a note on the red flags for fraud (including corruption) in public procurements, on the basis of known or suspected cases discussed and presented in its annual reports on the fight against fraud (European Commission [EC] 2009, p. 3). This list of red flags supported the list of red flags for corruption in public procurement composed by the Association of Certified Fraud Examiners. This perhaps comes as no surprise, as the latter list is also composed on the basis of known fraud case investigations and fraud schemes (Association of Certified Fraud Examiners [ACFE] 2008). Similarly, in 2011, the European Anti-Fraud Office released a list of structural indicators of fraud in the EU that was based on the investigative experience the office had accumulated over the years. It made use of anonymized cases that had been investigated by the Office and in which elements of fraud had been detected. Knowing the particularities of the cases under investigation as well as what had gone wrong in each specific case, OLAF was able to conduct a qualitative analysis and reveal some of the most important fraud indicators (European Anti-Fraud Office [OLAF] 2011).
Moreover, the OECD (2007) and Ware, Moss and Campos (2007) presented some of the most frequently observed forms of corruption (e.g. kickbacks, bid rigging and use of shell companies) and then suggested a palette of 'red flags' that can be used to identify corruption. Furthermore, in 2010, the World Bank issued a guide on the top ten most common red flags of fraud and corruption in procurement for bank financed projects. This list was, once again, based on anecdotal evidence provided by investigated cases of fraud and corruption in the public and the private sector. Additional sources of information have been theoretical models and expert opinions on the likely symptoms of corruption (cf. Kenny and Musatova 2011, p. 500). Finally, a study by Transparency International on corruption in the sphere of public procurement in Indonesia, Malaysia and Pakistan identified and clustered the indicators of corruption along the public procurement cycle (Transparency International [TI] 2006). Similarly, Ware, Moss and Campos (2007) distinguished the following procurement stages along which corruption indicators can be organized: (1) project identification and design; (2) advertising, prequalification, bid preparation and submission; (3) bid evaluation, post qualification and award of contract; and (4) contract performance, administration and supervision. The red flags we use and describe in the following section are broadly organized along these stages of the public procurement process to capture the risks of corruption in each stage more accurately.

Tested indicators of corruption in public procurement
Importantly, the indicators we tested in this paper 7 relate not only to the procurement process itself, but also to the decision to contract and the contract monitoring and implementation stages that tend to receive less attention but which are prone to corruption as well (cf. Kenny and Musatova 2011, pp. 504-505). Consequently, we structured them according to the different stages of the procurement process: (1) the decision to contract; (2) the definition of contract characteristics; (3) the contracting process; (4) the contract award; and (5) the contract implementation and monitoring. Finally, we operationalized the indicators by stating them as questions, such that they could directly be used in the data collection process. All questions were generally posed in such a way that an answer 'yes' indicated an increased chance of corruption.

The decision to contract
Public authorities decide to purchase goods, works and services. It is possible at this point that the decision does not follow a policy rationale or an existing need but rather the desire to channel benefits to an individual and/or organization (OECD 2007, pp. 19-20). Consequently, the red flags are:
• Is there strong inertia in the composition of the evaluation team of the tender supplier?
• Is there any evidence for conflict of interest for members of the evaluation committee (for instance because the public official holds shares in any of the bidding companies)?
Svensson (2005) finds mixed evidence for the hypothesis that adding resources to governmental institutions helps deter corruption. One notable supporting case is that of Singapore, where "public officials were routinely rotated to make it harder for corrupt official to develop strong ties to certain clients" (Svensson 2005, p. 35). If funds are corruptly channeled to individuals or organizations, this is likely to show up as an unexplained rise in the wealth of officials involved in the tendering procedure just before the tender and shortly after the award (OECD 2007, p. 57). Such benefits are also likely to explain why officials do not seek a promotion or another job, as the present one offers extra benefits that are not legally accounted for.
Conflict of interest is a clear red flag for corruption. This can be due to family, business or political ties. Another red flag is the possible partiality of the tender provider towards certain suppliers because of past or present affiliation (OLAF 2011, pp. 68-69). This affiliation, be it direct or mediated via family members, reduces the uncertainty that exists between the tender provider and the specific supplier and could therefore create the proper environment for illegal channeling of funds. Persily and Lammie (2004) examine the relationship between public perception of corruption and campaign finance in the US. They argue that while reforming the law of campaign finance is not going to reduce widespread perceptions of corrupt officials, the US Courts prefer promoting the reforms, as disproving corruption in campaign finance is very difficult.

Definition of contract characteristics
Public authorities determine what they need and how they will go about it. The risk here is that the tender is designed in such a way that it favours a particular bidder instead of addressing a specific need. Consequently, the red flags are:
• Are there multiple contact offices/persons?
• Is the contact office not directly subordinated to the tender provider?
• Is the contact person not employed by the tender provider?
• Are there any elements in the terms of reference that point at a preferred supplier?
Coolidge and Rose-Ackerman (1997) argue from a theoretical point of view that a corrupt government is larger than optimal, and that kleptocrats who wish to maximize the size of their rents are likely to select mixes of governmental services that are sub-optimal from a social welfare maximization point of view. Middlemen are often used by tender suppliers to intermediate the flows of money (Transparency International 2006). The existence of multiple contact offices that are not directly subordinated to or employed by the tender provider and that provide consultation to the bidding companies could point to their position in the tender process as middlemen.
Furthermore, the tender can be constructed in such a way that it discourages the participation of non-corrupt competitive bidders (European Anti-Fraud Office [OLAF] 2011). This ranges from low attention of the tender provider, when it practically nominates the favoured supplier in the text of the call, to high attention, when the tender provider uses multiple evaluation criteria and small weights to stump out criteria that favour a certain supplier. As such, Søreide (2002) argues that among the known corrupt techniques in public procurement is the preferred supplier indication, i.e. the public officials "decide which enterprises to invite to the tender" (Søreide 2002, p. 14).

Contracting process
When a contracting process opens, it should follow the method that the law prescribes for receiving proposals (e.g. open bidding system) or evaluating contractors (e.g. single source). The risk is that the tender process does not follow the legal design, in order to restrict the entrance of competitive bidders. Consequently, the red flags are:
Della Porta and Vannucci (2002, p. 63) present examples of artificially shortened bidding processes favouring certain companies involved in a public health system tender in Parma, known under the pseudonym the 'summer call for bid'. Furthermore, Moody-Stuart (1997) argues that among the indicators for corruption one can count the size of the procurement contract and the speed of contracting. The first is motivated by the fact that bribes are usually calculated as a percentage of the total contract value, whereas the second is motivated by the risk that the officials being bribed lose office. Furthermore, Della Porta and Vannucci (2002) argue that sometimes the competition appears real since a large number of firms enroll for a tender, while in reality most of these companies are not real competitors. Finally, Kenny and Musatova (2011, p. 506) also test for a shorter timespan in the bidding process, for too few bids, and for artificial bids and bidders.
Once a tender process is open, the tender provider can still dissuade competitive bidders by keeping the contracting process non-transparent and by circulating private information to favour a particular clientele. From the bidder's side, the chance of bidder collusion increases when tender procedures are not transparent and predictable (Organization for Economic Cooperation and Development [OECD] 2007a, b). Therefore the unusual composition and distribution of bids put forward in a call should be analysed and matched with known patterns of collusive behaviour. In this sense, high prices and similar bids are expected to strongly signal collusion; bidders would be expected to also adopt more sophisticated strategies, for example in subcontracting one another so as to avoid competition.

Contract award
The contract process ends and a decision is made to select the winning bidder. The risk is that evaluation criteria are not clearly stated in tender documents, leaving no grounds to justify the decision of awarding the tender to a corrupt supplier. Consequently, the red flags are:
• Are not all bidders informed of the contract award and of the reasons for this choice?
• Are the contract award and the selection justification documents not all publicly available?
At this point the tender provider has already made a decision over the winning supplier, and this decision has to be justified and made public (Organization for Economic Cooperation and Development [OECD] 2007a, b). Kunicova and Rose-Ackerman (2005) analyse the effect of different electoral rules and constitutional structures on constraining corruption. For this, they look at, among other things, the capacity of voters and political opponents to monitor public officials, organize for oversight and expose corrupt deeds of public officials, given the different electoral rules and constitutional structures in place. While they are particularly interested in the incentives and ability of voters and political opponents to directly monitor public officials, they acknowledge the role of the media and of the judiciary in indicating corruption of public officials (Kunicova and Rose-Ackerman 2005, p. 583). Bertot, Jager and Grimes (2010) argue that, if supported by politics, social media can effectively stimulate anti-corruption measures. Finally, based on a cross-sectional analysis, Brunetti and Weder (2003) show that free press and low corruption are significantly correlated.
One should look at whether the tender formulates strict requirements for justification of the award and at whether these reasons are presented in due time to all other bidders. One should also investigate whether the contract award and the justification documents are publicly available, because, here too, a lack of transparency could indicate corruption.

Contract implementation and monitoring
When the contract is signed with the selected bidder or contractor, the risk is that contract changes and renegotiations after the award are of a nature that changes the substance of the contract itself. Another risk is that monitoring agencies are unduly influenced to alter the contents of their reports so that changes in quality, performance, equipment and characteristics go unnoticed. Moreover, contractors' claims can be false or inaccurate and can be protected by those in charge of reviewing them. Finally, fictitious companies can be used to relieve the procurement authorities of any accountability or to unlawfully channel funds. Consequently, the red flags are:
At the contract implementation stage, the risks of corruption are threefold. First, the procuring entity can fail to keep records of its procurement process, thereby allowing changes to the awarded contract to be made and even to go unnoticed. 8 This would give public authorities the freedom to ask for additional services on top of what was requested in the tender, but it would also allow the winning bidder to reduce the proposed workload, the scope of the project, etc. It is therefore important to investigate any changes in the scope of the project compared to the original design, as well as changes in quoted prices compared to the original quotations (cf. Kenny and Musatova 2011, p. 506). Second, the monitoring entity can be corrupt or negligent, such that the poor performance of the contractor is not recorded or is diluted. It is therefore important to look at audit assessments 9 and compare these with relevant media coverage of the tender. Third, cases in which audit companies reveal irregularities due to poor performance of the supplier should be considered to have a higher risk.
At this point, the risks of corruption lie with the supplier, who has to fake some of the costs it has incurred in order to recuperate the bribe and make a profit (European Anti-Fraud Office [OLAF] 2011). Finally, phantom companies can provide the best cover for fake invoices, and therefore the real existence of the subcontracting firms and of the other team members, as well as their persistence in the market, is important.
Table 1 presents the overall list of red flags assembled. As can be seen, these are generally the red flags that are discussed above, with four exceptions. First of all, we added two red flags on the funding authority of the public procurement, since these can have different policies and processes, which might influence the probability of corruption. 10 Finally, we added two red flags on a lack of transparency (red flags 25 and 28), which measure, respectively, whether all information on the tender is filled in a Centralized European Tender Database and how many of the 27 other red flags we were unable to get a definite answer on. 11 The assumption underlying this last red flag is that being unable to get information on the 27 other red flags might indicate an effort of a corrupt official or entity to hide this information.
The red flags in Table 1 are phrased as questions such that, according to the literature, they are expected to be answered positively more often for corrupt public procurements. Hence, a positive relationship is expected between the number of red flags and the status of a public procurement being corrupt. Exceptions are, by definition, red flags 12, 23, 24 and 28, which are answered not with "yes" or "no" but numerically. For those red flags the expected relationship is either negative (red flag 12: fewer bids indicate corruption), unclear (red flags 23 and 24: % of funding from the EU/the MS) or positive (red flag 28: more missing information indicates less transparency and therefore corruption). In the absence of consistent findings in the literature, no a priori assumptions have been made with regard to the relation between the corrupt status of public procurements and cases which have benefitted from EU funding.
Table note: TED stands for the European public procurement journal Tenders Electronic Daily, which contains all active notices published in the supplement of the Official Journal of the European Union dedicated to European public procurement. CAN stands for Contract Award Notice. The Contract Award Notice is a public announcement of the outcome of a public procurement exercise in the Official Journal of the European Union.
8 Della Porta and Vannucci (2002) explain that modifications of contracts and price alterations post contract allocation are indicators of corruption. Laffont and Tirole (1990) argue that while renegotiation of contracts can be socially beneficial, malevolent public officials and companies can abuse this provision for the purpose of rent collection.
9 Bowles and Garoupa (1997) model optimal crime repression when police can be bribed. They show that when police can be corrupted, optimal fines should be lowered and resources should be allocated to auditing the work of policemen. Extrapolating, auditors' work is especially important in uncovering corruption in public procurement.
10 Whether EU funds are more prone to corruption or not is unclear. Alesina and Weder (2002) investigate whether foreign aid rewards less corrupt governments and policies made in the spirit of good governance. Supporting Wei (2000), they argue that multi-lateral aid does not favour less corrupt governments while foreign direct investment does.

The dataset
To test to what extent the 28 red flags correctly indicate corruption, we built a unique dataset on 192 public procurements in the EU, of which 96 are related to (suspected) corruption and 96 are not. 12 For each of these 192 public procurements we note the presence of the 28 previously identified red flags. Our dataset consists of three types of public procurement cases, ordered along an ordinal scale, which we denote as corrupt, grey and clean. Corrupt cases comprise cases where a judicial ruling of corruption exists, or where a validated confession of corruption of one of the parties involved is presented. Additionally, grey cases comprise cases where significant indications of corruption exist but where no evidence of corruption (judicial ruling or validated confession) is present. Finally, clean cases are cases where there is no reason (evidence and indication) to assume that corruption has taken place. Additionally, OLAF (the European Anti-Fraud Office) has double-checked the cases against their information and was able, for a number of cases, to confirm the corrupt-grey-clean classification.
Tables 2 and 3 provide some descriptive statistics. All our cases are selected at random within the above mentioned sectors and countries. However, due to differences in frequency and availability, the number of cases varies between sectors (see Table 2). Moreover, the sector breakdown has been adjusted in light of the number of cases that could be identified in the sectors and Member States studied. 13 Furthermore, as we could not find more than 13 corrupt and grey cases in the selected sectors in Poland, two corrupt and grey cases from other sectors were added. 14 There is also considerable heterogeneity among the analysed cases (see Table 3). The cases differ in budget from several thousand to several hundred million Euros, and the overall public budget involved in these cases amounts to more than € 5.5 billion.
11 Rose-Ackerman (1999) argues that procurements where post-delivery inspections are difficult to conduct are foremost subject to corruption. As such, she gives the example of consumption goods where post-delivery inspections reveal missing information, i.e. Malawi's government acquired millions' worth of stationery that could later not be found by the auditors (see also e.g. Kunicova and Rose-Ackerman (2005), Bertot, Jager and Grimes (2010) and Brunetti and Weder (2003) on the relation between transparency and corruption).
12 These 192 cases were collected by country teams within the OLAF-financed project by Wensink et al. (2013), including national experts from PwC and Ecorys. The operationalization of the red flags was done by these teams based on a uniform country team instructions document. The main instruction was to gather 15 random corrupt and grey public procurements and 15 random clean public procurements in each country, spread as evenly as possible over the five selected sectors. This database has not been made public due to confidentiality agreements.
13 Sector grouping has taken place on the basis of the characteristics of the suppliers (their CPV codes).
14 A robustness analysis (available upon request) shows that the inclusion or exclusion of these two cases does not alter the results in any way.
Furthermore, Table 4 shows the answers to the red-flag questions for the 192 public procurement cases. For four of the red flag questions, it proved impossible to find a definite yes/no answer in more than 50 % of the cases. As mentioned earlier, we included a red flag which tracks this lack of transparency. Table 5 reveals descriptive statistics of our sample, organized by case type, sector of public procurements and country. The average number of red flags differs between the corrupt, grey and clean cases and also differs between sectors and Member States.
The overall number of red flags (corrupt, grey and clean cases combined) is highest in waste water treatment (3.8 red flags), followed by Urban and utility construction (3.5) and Road and rail construction (3.1). The number of red flags (again, corrupt, grey and clean cases combined) appears to be relatively high in Romania, followed by France, Lithuania, Italy and Hungary. The numbers of red flags scored are lowest in Poland and the Netherlands.
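The group averages reported in Table 5 amount to counting, per case, the red-flag questions answered "yes" and then averaging that count within each group. The following is a minimal sketch of that bookkeeping, not the procedure actually used for Table 5: the five toy cases, the question numbers and the function names are hypothetical, and it assumes that only yes/no questions enter the count while unanswered questions are tallied separately as a missing-information measure.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy cases (NOT the paper's data). Each answer is "yes"
# (red flag present), "no" (absent) or None (no definite answer found).
cases = [
    {"status": "corrupt", "answers": {1: "yes", 3: "yes", 4: None, 25: "yes"}},
    {"status": "corrupt", "answers": {1: "yes", 3: "no", 4: "yes", 25: "yes"}},
    {"status": "grey", "answers": {1: "yes", 3: None, 4: "no", 25: "yes"}},
    {"status": "clean", "answers": {1: "no", 3: "no", 4: "no", 25: "yes"}},
    {"status": "clean", "answers": {1: "no", 3: "no", 4: None, 25: "no"}},
]


def count_flags(answers):
    """Number of red-flag questions answered 'yes' for one case."""
    return sum(1 for a in answers.values() if a == "yes")


def count_unanswered(answers):
    """Number of questions without a definite answer (assumed analogue of red flag 28)."""
    return sum(1 for a in answers.values() if a is None)


def mean_by_status(cases, score):
    """Average a per-case score within each status group."""
    groups = defaultdict(list)
    for case in cases:
        groups[case["status"]].append(score(case["answers"]))
    return {status: mean(values) for status, values in groups.items()}


if __name__ == "__main__":
    print(mean_by_status(cases, count_flags))
    print(mean_by_status(cases, count_unanswered))
```

The same helper can be grouped by sector or country by swapping the grouping key, which is how the sector and Member State averages could be derived under the same assumptions.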
Importantly, Table 5 shows that the average number of red flags scored is 4.6 for corrupt cases, 4.5 for grey cases and 1.8 for clean cases. The above differences between corrupt/grey and clean cases are statistically significant and have three implications, namely that (1) corrupt cases are indeed characterized by a higher number of red flags than clean cases; (2) that grey cases resemble corrupt cases in terms of the presence of red-flags, much more than clean cases; and (3) that the amount of information that could be collected (the level of transparency) is for

Analysis and results
The null hypothesis of this paper is that all the identified red flags which we could apply on this dataset are able to distinguish corrupt public procurements from non-corrupt ones.
Table note: Unanswered is a rest category, which includes mainly "don't know", "N/A" and not answered. For the econometric analysis we interpret solely whether the red flag is present or not. *Questions 12, 23 and 24 are not answered with yes/no but with a number.
Consequently, the predictive power of an individual red flag is determined by relating the status of a case (corrupt, grey or clean) to the occurrence of that particular red flag. We therefore use statistical and econometric analyses to determine which red flags (individually and jointly) increase the probability of corruption in a public procurement and to what extent. Table 6 reveals the degree of association 15 between the status of a case and the presence of a red flag. Overall, 18 red flags are significantly correlated with corrupt/grey public procurements. All the significant red flags have the expected sign. Note that this directly implies that for ten of the 28 indicators, no statistical support could be found. 16 The latter characteristics of corrupt public procurements also belong to non-corrupt public procurements, and as such, these indicators cannot discriminate between the two groups.
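The pair-wise association between a single red flag and the case status can be illustrated with a short sketch. It assumes a Pearson correlation between two binary variables (red flag present = 1, otherwise 0; corrupt or grey = 1, clean = 0); this coding, the function name and the toy vectors are assumptions made for illustration, and the significance test behind Table 6 is not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: 1 = corrupt or grey case, 0 = clean case.
status = [1, 1, 1, 1, 0, 0, 0, 0]
# A flag that is present more often in corrupt/grey cases ...
informative_flag = [1, 1, 1, 0, 0, 1, 0, 0]
# ... and one that occurs equally often in both groups.
uninformative_flag = [1, 0, 1, 0, 1, 0, 1, 0]

if __name__ == "__main__":
    print(pearson(informative_flag, status))
    print(pearson(uninformative_flag, status))
```

The second flag illustrates the point made above: an indicator that is equally common among clean cases has zero association with the status and cannot discriminate between the two groups.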
Appendix 1 provides an analysis of how the indicators are related to the different types of corruption. From this analysis we conclude that the indicators may have a stronger predictive power for bid rigging than for kickbacks (footnote 17). This is especially important when types of corruption vary between sectors and countries.
We further employ a multivariate ordered probit model to assess whether the probability of a case being corrupt, grey or clean can be explained on the basis of its characteristic red flags. Unlike the previous correlation exercise, this allows us to observe how well multiple red flags jointly explain the corrupt, grey or clean nature of a public procurement case. However, country dummies and other macro-economic indicators are uninformative in this estimation, as the selected treatment groups (corrupt and grey cases) and control groups (clean cases) are equally large per country. Therefore, variations of the dependent variable would not be picked up by macro variables; such variables would merely indicate the extent to which a balance was achieved in the data collection process.
15 Using Spearman and Kendall rank correlations for the binary variables reveals very similar results.
16 Kenny and Musatova (2011) find little statistical support for the 13 red flags they tested on a sample of 60 World Bank financed water and sanitation contracts.
17 There are many types of behaviour that lead to a corrupt procurement. Some red flags only capture a specific behaviour that leads to corruption. Consequently, for efficiency reasons, they should be employed in the sectors or countries which are theoretically more vulnerable to that particular type of corruption. The use of otherwise inefficient indicators will generate unnecessary delays and will erode trust in the method (cf. Kenny and Musatova, 2010).
Additionally, we assessed how independent the red flags found present were of the status ('corrupt', 'grey' or 'clean') of a case. Having corroborated the econometric evidence with the factual evidence (discussions with the country teams that had collected and selected the 192 cases in Europe), we decided to drop red flags 2, 6 and 27, as they were too strongly related to the selection procedure and could therefore better be seen as dependent variables (part of corruption) than as independent variables (indicators of corruption). Subsequently, our econometric model dropped a number of red flags (red flags 10, 11, 13, 19, 21, 22 and 26) that overlapped with the already included indicators and therefore added no substantial information to them. Finally, red flags were not included in the final estimation because they had too many missing observations. The ordered probit model showed that red flags 12, 23 and 24 did not add sufficient explanatory value, hence they were dropped. Table 7 shows the results of our ordered probit estimation model, where the status of the case is our ordered dependent variable: clean (0), grey (1) or corrupt (2). Overall, the explanatory power of the model, measured with the pseudo R-squared and using a total of 15 red flags, is 0.4. This implies that the model explains about 40 % of whether a case is corrupt, grey or clean. This percentage can be considered high given the hidden nature of corruption and the variety in patterns of corruption between countries and sectors. In a study employing a similar approach to explain money laundering in the real estate sector, Unger and Ferwerda (2011) derived a model with an explanatory power of about 10 % (footnote 18). Knowing this, we argue that our present model performs well. Additionally, the model is robust to the inclusion of the control variables.
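To make the ordered probit set-up more concrete, the following minimal sketch shows how such a model maps a case's red flags to probabilities of being clean (0), grey (1) or corrupt (2) through two thresholds. The coefficients and thresholds are hypothetical values chosen for illustration; they are not the estimates of Table 7, and the function name is an assumption.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def ordered_probit_probs(flags, coefs, t1, t2):
    """Return (P(clean), P(grey), P(corrupt)) for one case.

    flags: red flag values, coefs: matching coefficients,
    t1 < t2: the two thresholds of the ordered probit model.
    """
    index = sum(b * x for b, x in zip(coefs, flags))  # linear index x'beta
    p_clean = PHI(t1 - index)
    p_grey = PHI(t2 - index) - p_clean
    p_corrupt = 1.0 - PHI(t2 - index)
    return p_clean, p_grey, p_corrupt

# Hypothetical coefficients and thresholds, NOT the estimates of Table 7.
coefs = [0.8, 0.8]
t1, t2 = 1.0, 2.0

none = ordered_probit_probs([0, 0], coefs, t1, t2)
one = ordered_probit_probs([1, 0], coefs, t1, t2)
both = ordered_probit_probs([1, 1], coefs, t1, t2)

# The same coefficient produces a different change in P(corrupt)
# depending on which other flags are already present.
first_step = one[2] - none[2]
second_step = both[2] - one[2]
print(f"P(corrupt): {none[2]:.3f} -> {one[2]:.3f} -> {both[2]:.3f}")
```

Because the change in probability depends on the starting point, a coefficient cannot be read as a fixed effect on the probability of corruption; only its sign and significance are directly informative.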
When estimating the same ordered probit model without control variables, the exact same indicators are significant, with the same sign and to a similar degree, but with a lower overall explanatory power of the model (pseudo R-squared of 0.291). Excluding only the sector dummies or only the country dummies also gives similar results. Table 7 shows that five red flags significantly explain whether a project is corrupt, grey or clean. Many of the other red flags are not significant in this specification. Therefore, we use Akaike's Information Criterion (AIC) (footnote 19) to search for the best-fit model specification (footnote 20). Table 8 shows the model with the lowest AIC. The AIC suggests dropping red flags 1, 3, 5, 8, 15, 18 and 20. This leaner model shows that with only eight indicators we can explain corruption with a pseudo R-squared of 0.38. All indicators are significant (footnote 21) and again have the expected positive sign. Although we cannot interpret the coefficients directly, we do see that the importance of the different red flags is rather similar, except for the amount of missing information. Unlike the other red flags, the amount of missing information is not a binary variable but a frequency, with an average value of 7.9. Our study therefore corroborates the findings of the literature (e.g. Kunicova and Rose-Ackerman 2005; Bertot et al. 2010) in that the lack of transparency is a good indicator of corruption.
18 Unger and Ferwerda (2011) use a regular probit model, while we use an ordered probit model. However, we also estimated a regular probit model to compare the explanatory power of our model. A probit model comparable to the one used in Unger and Ferwerda (2011), estimating the probability of corruption with our dataset, has an explanatory power (measured by its pseudo R-squared) of 43 %, considerably higher than the 10 % in Unger and Ferwerda (2011) for money laundering in the real estate sector.
19 Using the Bayesian Information Criterion (BIC) instead of AIC gives very similar results, with the only difference that BIC suggests dropping one more red flag (number 4, contact office not subordinated to tender provider). This is in line with the general observation in the literature that BIC "tends to favour more parsimonious models than AIC" (Verbeek 2008, p. 61).
20 An analysis of the correlations between explanatory variables shows that the explanatory variables are not highly correlated with each other, with two notable exceptions. The correlation between red flag 4 (contact office not subordinated to tender provider) and red flag 5 (contact person not employed by tender provider) is 0.754, and the correlation between red flag 7 (shortened time span for bidding process) and red flag 8 (accelerated tender) is 0.507. The full correlation table is available upon request. Furthermore, in the search for a parsimonious model to estimate the probability of corruption, we prefer to include only red flag 4 or red flag 5, and only red flag 7 or red flag 8.
21 Red flag 18 is only significant at the 90 % confidence level.
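The model-selection quantities used above can all be derived from log-likelihoods, as the sketch below illustrates. It assumes McFadden's definition of the pseudo R-squared (the text does not state which variant is reported) and counts parameters as red flags plus eleven control dummies plus two thresholds, which is also an assumption. The log-likelihood values are invented toy numbers, not results from this paper.

```python
from math import log

def mcfadden_pseudo_r2(loglik_model, loglik_null):
    """1 - lnL(model) / lnL(null); assumed definition of the pseudo R-squared."""
    return 1.0 - loglik_model / loglik_null

def aic(loglik, k):
    """Akaike's Information Criterion: 2k - 2 lnL."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 lnL."""
    return k * log(n) - 2 * loglik

n = 192               # number of cases in the dataset
loglik_null = -200.0  # hypothetical log-likelihood of a thresholds-only model

# Hypothetical (log-likelihood, number of parameters) per specification.
# Parameter count assumed as: red flags + 11 control dummies + 2 thresholds.
candidates = {
    "15 red flags": (-130.0, 15 + 11 + 2),
    "8 red flags": (-133.0, 8 + 11 + 2),
}

for name, (ll, k) in candidates.items():
    print(name,
          f"pseudo R2 = {mcfadden_pseudo_r2(ll, loglik_null):.3f}",
          f"AIC = {aic(ll, k):.1f}",
          f"BIC = {bic(ll, k, n):.1f}")

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n))
```

In this toy example the leaner specification loses a little fit but has the lower AIC. With n = 192 each extra parameter costs ln(192), roughly 5.3, under BIC against 2 under AIC, which is why BIC tends to select the more parsimonious model.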
Additionally, the efficient estimation model based on AIC is robust to the inclusion of the control variables. When estimating the same ordered probit model without control variables, the exact same indicators are significant, with the same sign and to a similar degree, but with a lower overall explanatory power of the model (pseudo R-squared of 0.281). Excluding only the sector dummies or only the country dummies also gives very similar results.
Table note: Ordered probit regressions are non-linear and therefore the estimated coefficients cannot be interpreted directly. We therefore focus on the significance and sign of the estimated coefficients. The control variables (four sector dummies and seven country dummies) are not displayed since they cannot be interpreted. *** p < 0.01, ** p < 0.05, * p < 0.1. R-squared (pseudo): 0.383.
In Appendix 2, we test the robustness of the results by excluding cases from each of the eight countries and each of the five sectors, one by one. The robustness analysis shows that all the coefficients in all 13 different estimations have the same sign as in our benchmark model (Table 8). The significant coefficients of red flags 7, 16, 17, 25 and 28 are more robust than those of red flags 9 and 14. The reduced significance of these coefficients when dropping certain countries or sectors from the database does not necessarily mean that signals for corruption differ greatly between countries or sectors; it could also be related to some extent to the reduced number of observations. We believe that an econometric analysis such as ours, with mostly binary data, would perform best with at least 200 observations. This is unfortunate for a research field in which data availability is generally a challenge, due to the secret nature of the behaviour studied.

Conclusion
In this paper we test which indicators of corruption in public procurement can effectively distinguish corrupt public procurement cases from non-corrupt ones. We build on the criminological literature and on the corruption and procurement literature to compile a list of micro-level red flags, which we then apply to a relevant sample of European public procurements. Methodologically, we follow the works of Kane and White (2009) and Unger and Ferwerda (2011) and introduce control groups in our analyses to avoid falling into a selection-bias trap and, consequently, to be able to generalize our results to the entire population of European public procurements.
Factually, we identify a list of 28 red flags for corruption along five phases of the procurement cycle: (1) project identification and design; (2) advertising, prequalification, bid preparation and submission; (3) bid evaluation, post qualification and award of contract; and (4) contract performance, administration and supervision. We then construct a unique database of 24 corrupt, 72 grey and 96 clean public procurements selected across five sectors (Road & Rail Construction, Training, Urban & Utility Construction, Waste Water Treatment and Research & Development) and eight European Union Member States (France, Hungary, Italy, Lithuania, Netherlands, Poland, Romania and Spain). Finally, we test, by means of pair-wise correlations and a multivariate ordered probit model, whether the 28 indicators are indeed significantly related to corruption when tested against a control group of clean cases.
The null hypothesis is that all of these indicators occur significantly more often in corrupt procurements. Our statistical analysis points to significant correlations between the occurrence of red flags and the (corrupt) status of a case: 18 out of the 28 red flags are statistically significant. Consequently, the remaining ten indicators lack empirical support. Supporting the earlier analysis of Kenny and Musatova (2011), we find that even though these characteristics are often found in corrupt public procurements, they are also found often enough in non-corrupt public procurements to make the discriminating power of the indicator insignificant. Importantly, the amount of information that could be collected appears to be the strongest indicator of corruption, as the coverage rate amounts to 80 % for clean cases, much higher than for grey (64 % coverage) and corrupt (54 % coverage) cases. This supports the original claims of the literature on corruption, namely that more transparency in public procurements is key to combatting corruption. Finally, our best-fit econometric model needs only eight indicators to explain corruption relatively well (pseudo R-squared of 0.38).
The results put forward by our analysis have multiple implications. First of all, this paper builds on the earlier efforts of Kenny and Musatova (2011) and shows that by collecting a sufficiently large and balanced dataset of corrupt and clean procurement cases it is possible to test the effectiveness of some of the most prominent red flags put forward by the experts on corruption and public procurement. Consequently, we encourage a more systematic collection of data on public procurements. Furthermore, since the list of indicators of corruption tested in this paper is not exhaustive, we encourage more research to replicate our analysis and to test other indicators as well, in order to draw more accurate lessons and to support much needed evidence-based policy making.
Second, we argue that our analysis and results can be used to build a prediction model for corruption in public procurements. By using the characteristics of public procurements during the procurement process, one can indicate which projects are expected to have an increased chance of corruption and develop an early-warning system accordingly. Additionally, knowing which characteristics create more opportunities for corruption means that public procurement procedures can be adapted to minimize the opportunities for corruption in the future. Finally, these vulnerability insights can also help to develop a better system of incentives (such as suggested by Rose-Ackerman 1986, p. 132), for example through codes of conduct (Savona 1995). All in all, these improvements are in line with the goal the World Bank intended to reach by means of the red-flag analysis, namely showing zero tolerance on corruption (World Bank 2007).
Third, law enforcement agencies can use the results from our econometric model to measure which public procurements have an increased chance of corruption and to thereby focus on conducting targeted investigations. From a strategic point of view, our results may give law enforcement an incentive and a tool to switch from reactive investigation to proactive, information-based investigation. Additionally, it may help law enforcement utilize their resources in a more effective way, thereby increasing their overall effectiveness in combatting corruption in public procurement.
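A possible way to turn such a model into a triage tool is sketched below: each tender is scored by its predicted probability of not being clean, and tenders are ranked so that the highest-risk ones are examined first. This is only an illustration under stated assumptions. The coefficient values, the threshold, the variable names and the three tenders are all hypothetical and are not the estimates of Table 8.

```python
from statistics import NormalDist

def risk_of_not_clean(flags, coefs, t1):
    """P(grey or corrupt) = 1 - Phi(t1 - x'beta) in an ordered probit model."""
    index = sum(coefs[name] * value for name, value in flags.items())
    return 1.0 - NormalDist().cdf(t1 - index)

# Hypothetical coefficients and first threshold, for illustration only.
coefs = {"shortened_time_span": 0.7, "missing_information": 0.1}
t1 = 1.0

# Hypothetical tenders: a binary red flag and a count of missing information items.
tenders = {
    "tender A": {"shortened_time_span": 0, "missing_information": 2},
    "tender B": {"shortened_time_span": 1, "missing_information": 12},
    "tender C": {"shortened_time_span": 1, "missing_information": 3},
}

scores = {name: risk_of_not_clean(f, coefs, t1) for name, f in tenders.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # highest risk first

for name in ranking:
    print(f"{name}: {scores[name]:.2f}")
```

Such a ranking only prioritizes cases for closer inspection; a high score is an indication of elevated risk, not evidence of corruption.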
(see Table 9). Due to the limited number of cases on deliberate mismanagement, Table 9 does not report the correlations for this type of corruption. Table 9 shows that the relation between types of corruption and red flag patterns is complex and nuanced (these findings are in line with the earlier works of OECD (2007) and Ware et al. (2007)). Nevertheless, the following patterns can be observed with regard to individual red flags. Complaints from non-winning bidders are related to conflict of interest only, which may be explained by the fact that (suspicions of) conflict of interest are visible to parties economically involved in the tender procedure. Furthermore, bid rigging correlates strongly with the red flags identified, suggesting that these red flags are quite capable of detecting this type of corruption. Typical and powerful indicators associated with bid rigging are a low number of bids and all bids being more expensive than the expected overall costs. Other indicators of bid rigging (with consent of the public official) are a shortened time span and an accelerated tender. Finally, kickbacks correlate less with the red flags identified. Typical and powerful indicators associated with kickbacks include conflicts of interest within the evaluation team and a large amount of missing information. Consequently, the indicators selected may have a stronger predictive power for bid rigging than for kickbacks. This is especially important when types of corruption vary between sectors and countries.

Appendix 2: Robustness analysis
We further analyse how results change when cases from a certain country or sector are dropped. This robustness check should reveal whether our results are spuriously driven by a single country or sector. The benchmark model for our robustness analysis is the estimation of the efficient model based on AIC, as shown in Table 8. Table 10 reveals the results for the same specification when leaving out cases from one country at a time. Similarly, Table 11 reveals the results when dropping cases from one sector at a time. Table 10 and Table 11 show that the eight relevant indicators retain the same coefficient sign as in the original model. The significant coefficients for red flags 7, 16, 17, 25 and 28 are more robust than those of red flags 9 and 14. The reduced significance of these coefficients when dropping certain countries or sectors from the database does not necessarily mean that signals for corruption differ greatly between countries or sectors; it could also be related to some extent to the reduced number of observations. In our view, such an econometric analysis with mostly binary data needs at least 200 observations.
Notes to Table 10: In each column, the observations from one country are left out. Countries are indicated with their two-letter ISO code. Obs stands for number of observations and p-R2 is the pseudo R-squared. T1 and T2 stand for threshold 1 and threshold 2 respectively. # marks the different red flags described in Table 8. The control variables (four sector dummies and six country dummies) are not displayed since they cannot be interpreted. *** p < 0.01, ** p < 0.05, * p < 0.1
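The leave-one-group-out logic of this robustness check can be sketched as follows. To keep the example self-contained, it does not re-estimate an ordered probit model. Instead it uses a much simpler stand-in statistic, the difference in red flag prevalence between non-clean and clean cases, and checks whether its sign survives when each country is dropped in turn. The function names, the field names and the six toy cases are hypothetical.

```python
def prevalence_gap(cases, flag):
    """Share of grey/corrupt cases with the flag minus the share of clean cases with it."""
    dirty = [c[flag] for c in cases if c["status"] > 0]
    clean = [c[flag] for c in cases if c["status"] == 0]
    if not dirty or not clean:
        raise ValueError("need both clean and non-clean cases")
    return sum(dirty) / len(dirty) - sum(clean) / len(clean)

def leave_one_group_out(cases, flag, group_key):
    """Recompute the statistic with each group (country or sector) dropped in turn."""
    groups = sorted({c[group_key] for c in cases})
    return {
        g: prevalence_gap([c for c in cases if c[group_key] != g], flag)
        for g in groups
    }

# Hypothetical toy cases: status 0 = clean, 1 = grey, 2 = corrupt.
cases = [
    {"country": "NL", "status": 0, "rf7": 0},
    {"country": "NL", "status": 2, "rf7": 1},
    {"country": "PL", "status": 0, "rf7": 0},
    {"country": "PL", "status": 1, "rf7": 1},
    {"country": "RO", "status": 0, "rf7": 1},
    {"country": "RO", "status": 2, "rf7": 1},
]

benchmark = prevalence_gap(cases, "rf7")
by_country = leave_one_group_out(cases, "rf7", "country")

# The result counts as sign-robust if no single country flips the sign.
sign_stable = all((gap > 0) == (benchmark > 0) for gap in by_country.values())
print(benchmark, by_country, sign_stable)
```

The same loop applies to sectors by passing a different group key. With a real estimator in place of the stand-in statistic, each entry would hold the re-estimated coefficients and their significance levels, as in Tables 10 and 11.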
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.