1 Introduction

When firms deviate from competitive behavior and instigate a cartel, they secretly conspire to raise prices or lower the quality of goods or services. Such conspiracies directly harm taxpayers, buyers, or sellers. Cartel formation remains a pervasive problem and has been considered in a range of studies: see, for instance, the Swedish asphalt cartel described in Bergman et al. (2020), collusion among seafood processors in the US (Abrantes-Metz et al. 2006), bid rigging in public procurement auctions for construction works in Japan (Ishii 2014), in Poland (Foremny et al. 2018), in Canada (Clark et al. 2018) and in the US (Porter and Zona 1993; Feinstein et al. 1985), and bid rigging for school milk contracts in Ohio (Porter and Zona 1999), Florida and Texas (Pesendorfer 2000). To enhance the fight against cartels, the OECD recommends that competition agencies promote proactive methods for uncovering conspiracies, as such methods may help to discover cartels where leniency is unlikely to be sought (OECD 2013). Answering the need for statistical tools in this context, Porter and Zona (1993), Bajari and Ye (2003), Harrington (2008), Jimenez and Perdiguero (2012), Imhof et al. (2018), Crede (2019) and Bergman et al. (2020), among others, have proposed different methods for uncovering cartels.

However, the detection of cartels might be more challenging when competitive bidders participate in markets in which a cartel is active (McAfee and McMillan 1992; Hendricks et al. 2008; Asker 2010; Bos and Harrington 2010; Conley and Decarolis 2016; Decarolis et al. 2020). When a cartel is incomplete due to competitive bidders, the statistical pattern that bid rigging produces in the distribution of bids is weakened, making the cartel more difficult to detect. Moreover, a cartel might temporarily collapse because of deserters, i.e., it is not always stable. This instability might affect the screens, rendering the statistical signals of bid rigging more challenging to detect. Finally, a cartel aware of detection methods might try to weaken the statistical pattern due to bid rigging in order to decrease the ability of such methods to predict the cartel's presence.

Thus, this paper offers an original application of a detection method based on screens to detect both incomplete and complete bid-rigging cartels. Screens are statistics derived from the distribution of bids in a tender that capture the distributional changes produced by bid rigging (see Abrantes-Metz et al. 2006; Hueschelrath and Veith 2014; Abrantes-Metz et al. 2012; Jimenez and Perdiguero 2012; Imhof et al. 2018; Imhof 2019). Our novel approach consists of calculating screens not only for all bids in a tender but also for all possible subgroups of three or four bids. We then use the screens calculated for all the subgroups in a particular tender to compute descriptive statistics of each screen, which synthesize the properties of the distribution of bids in that tender. These descriptive statistics of screens, henceforth called ’summary screens’, circumvent the distortion that competitive bidders or deserters generate in the statistical signals produced by bid rigging, rendering our suggested method robust to the presence of competitive bidders.

In our study, we combine the summary screens with machine learning as in a prediction policy problem (see Kleinberg et al. 2015). Machine learning, which has been applied in a rapidly increasing number of studies (Rabuzin and Modrusan 2019; Imhof and Wallimann 2021; García Rodríguez et al. 2020; Rodríguez et al. 2022; Silveira et al. 2022; Huber et al. 2022), aims at finding the combination of covariates that best predicts the presence or absence of bid rigging in a tender. Also related to our paper is the recent study of Uslu et al. (2021), applying machine learning to investigate trade-based manipulations of capital market instruments. Moreover, our paper is related to studies analyzing bidding strategies (see, e.g., Liu et al. 2020; Cai et al. 2019) and applying predictive models (see, e.g., Mir et al. 2020; Mirzapour et al. 2019) in other research fields. As we focus on predictive performance, we do not have to construct explicit structural models of collusion. To train and evaluate models, we focus on the random forest (see Breiman 2001) as machine learner because it provides a flexible prediction method that does not impose any parametric (e.g., linearity) assumptions when considering our large set of screens. In contrast to many other machine learners, random forests do not require tuning specific penalty terms (see the discussion in Athey and Imbens 2019) and are therefore easier to implement. This appears desirable if a competition agency applies our detection method for screening procurement markets.

Calculating screens for subgroups as in our approach is also considered in Conley and Decarolis (2016) and Chassang et al. (2022). First, Conley and Decarolis (2016) investigate subgroups to detect cartels in collusive auctions in Italy, but in contrast to our method (which considers all possible subgroups in a tender), they exploit firm-specific covariates (such as, e.g., common owner, municipality, or country) to form subgroups. Relying on firm-specific covariates could impede a broad screening activity if firm-specific data are unavailable or if the time needed to collect them without attracting the attention of potential cartel participants is lacking. Chassang et al. (2022) show that winning bids tend to be isolated in terms of value when bidders collude. To analyze the missing density of bids between the first and the second-lowest bids, they calculate a normalized margin: first, bids are normalized by the reserve price (which would be impossible in our data); second, they calculate for each normalized bid i the difference to the minimal normalized bid (other than i) in each tender. A missing density around zero for the normalized margin is incompatible with competition, especially if repeatedly observed: a competitive bidder rationally maximizing profits would be tempted to increase her or his bids upon regularly observing that other bidders submit substantially higher bids.

Two important arguments favor our approach based on machine learning and synthesized screens. First, it exclusively relies on information about bids rather than the firm-specific characteristics or cost-related variables required for econometric tests (see for instance Bajari and Ye 2003; Aryal and Gabrielli 2013). Our suggested method requires only bid summaries, which are either public or readily accessible for competition agencies and thus not as costly to acquire as firm- or cost-specific information. The necessity to gather firm-level information can, in some cases, attract the attention of the cartel, decreasing the chance of success in acting against it. Second, machine learning relies on the hypothesis that bid rigging affects the distribution of bids in a tender (an assumption also common to other methods for flagging bid-rigging cartels, such as the econometric tests suggested by Bajari and Ye (2003)) but remains agnostic about how the distribution is affected. In our case, it is sufficient to assume that bid rigging modifies the distribution of bids and that screens can capture these changes.

Our study investigates the correct classification rates of different methods in the context of incomplete cartels. We first apply a benchmark method, suggested by Imhof et al. (2018), which implements two screens with benchmarks, i.e., a rule of thumb, for classifying a tender as collusive or competitive. The second method applies machine learning using a set of screens, calculated based on all bids in a tender, so-called ’tender-based screens’, to predict collusion. Finally, the third method is the novel approach suggested in this paper, which includes summary statistics of the screens (median, mean, maximum and minimum) calculated for all possible subgroups of bids in a tender as predictors in the random forest.

We use data from Switzerland, where the incidence of collusive and competitive tenders is known. We apply our approach to two investigations of the Swiss competition commission (hereafter COMCO): See-Gaster and Strassenbau Graubünden. Both cases were characterized by well-organized bid-rigging cartels, which sometimes faced competition from outsiders. On the one hand, these competitive bidders might have tried to benefit from the umbrella effect of the cartel by bidding higher than they would have done in a competitive situation (Bos and Harrington 2010). On the other hand, too many competitive bidders could have destabilized the formation of cartels.

We find that the benchmarking approach exhibits low correct classification rates for incomplete cartels. Using tender-based screens in predictive models, we obtain correct classification rates ranging from 61 to 77% when competitive bidders are present. Applying our novel approach based on summary screens increases the performance to correct classification rates ranging between 67 and 84%. Further, we note that the performance of machine learning decreases with the proportion of competitive bids. This result confirms the finding from the investigations that cartel participants partially endogenize the presence of competitive bidders by adopting a more competitive behavior, at least in some cases.

The remainder of this study is organized as follows. Section 2 presents the bid-rigging cartels uncovered in Switzerland from which our data are drawn. Section 3 outlines the detection methods for flagging both complete and incomplete bid-rigging cartels. Section 4 presents our original application to incomplete cartels based on empirical data from the cases of See-Gaster and Strassenbau Graubünden. Section 5 concludes.

2 Bid-Rigging Cartels and Data

The Swiss Parliament revised the federal Cartel Act and introduced a sanction regime in April 2004, with an adaptation period of one year, alongside a compliance program. This legislative modification helped initiate a change in practice towards economically harmful bid-rigging cartels. At the end of 2004, COMCO began investigating the Ticino cartel, releasing its decision in 2007. The Ticino cartel dissolved without sanctions since it had ended its illegal conduct just before April 2005, making full use of the adaptation period. However, the case underscored the damage caused by a bid-rigging cartel, with a price increase of over 30% (see Imhof 2019). In 2008, COMCO decided to prioritize fighting bid rigging.

Following its decision in the Ticino case, the authority prosecuted many bid-rigging cases. Initially, COMCO rendered an important decision against bid rigging every other year. From 2015 onwards, however, COMCO rendered more decisions, emphasizing its determination to prosecute bid-rigging conspiracies. Table 1 lists COMCO’s most important decisions in bid-rigging cases and the sanctions it imposed in each case.

Table 1 Decisions of COMCO in bid-rigging cases

Overall, COMCO opens an investigation if there are reasonable grounds to assume the existence of a bid-rigging cartel. Compliance programs, whistleblowers, and procurement agencies can provide insightful information leading to the opening of an investigation. However, COMCO decided to reduce its dependence on such sources and started to develop statistical methods for detecting bid rigging based on screens (see also Imhof et al. 2018). Based on the latter method, COMCO opened an investigation of bid rigging in the region of See-Gaster in 2013.

Considering the evolution of the cases investigated by COMCO in recent years, incomplete bid-rigging cartels occur more often than well-organized complete cartels. Therefore, if COMCO wants to reduce its dependence on external sources for opening investigations, it must continue to improve its detection methods. The approach for flagging both incomplete and complete bid-rigging cartels proposed in this paper responds to that need and is likely to be of interest to competition agencies around the world.Footnote 1

In the empirical analyses, we use data from two of Switzerland’s most important cases: the See-Gaster cartel and the Graubünden asphalt cartel. After discussing procurement in Switzerland, we summarize the main aspects of the Swiss procurement data in both cases.

2.1 Procurement Data

Procurement agencies of cantons and cities in Switzerland follow the Agreement on Public Markets, which states that a procurement agency can choose among four procedures: the open, the invitation, the selection, and the discretionary procedure.Footnote 2 In the construction sector, a procurement agency generally uses either the open procedure or the procedure on invitation. The open procedure does not restrict the participation of submitting firms, in contrast to the procedure on invitation, where the procurement agency invites only a small number of firms, in general three to five, to submit a bid. This changes the nature of the competition, as the participating firms are aware of the restricted number of potential competitors.

A procurement agency announces future contracts and the deadline for submitting bids (varying according to the procedure) in an official journal. If a firm is interested in submitting, the procurement agency provides the firm with all the relevant documents or information for the contract. Firms prepare their bids for submission between the time of the announcement and the deadline. Collusive agreements, if any, between firms are typically concluded during this period.

At a pre-announced date, the procurement agency gathers the incoming bids for the contract and opens them. It officially records all the bids received on time in a bid summary or so-called official record of the bid opening and registers the firms’ names, addresses, and bids. Having registered the official record of the bid opening, the procurement agency proceeds with a detailed examination of the bids. In awarding the contract, the agency considers not only the price of the bids, but also other criteria such as quality, references and environmental or social aspects. However, as contracts are relatively homogeneous in the construction sector, especially in road construction and associated civil engineering, the price in practice remains the most important criterion for awarding the contract. Furthermore, the differences in firms’ criteria other than price are typically small. We, therefore, consider the procurement process as an almost first-price sealed-bid auction.

2.2 The Cartel in See-Gaster

COMCO opened its investigation in the region of See-Gaster mainly because of a statistical analysis based on procurement data from 2004 to 2010 provided by the canton of St. Gallen (see Imhof et al. 2018).Footnote 3 In total, eight firms participated in bid-rigging conspiracies in the region of See-Gaster, including the district of See-Gaster in the canton of St. Gallen and the districts of March and Höfe in the canton of Schwyz.Footnote 4 Cartel participants regularly met once or twice a month. In their meetings, they discussed future contracts being put out to tender and exchanged their interest in them. The contracts included road construction, asphalting and civil engineering. Before each meeting, one cartel participant sent an updated table to all the others, listing all future contracts in the region of See-Gaster. Each cartel participant had a column in which to mark a contract with one star if it was interested in obtaining the contract, or with two stars if it wished to register a very high interest.Footnote 5 When the tender procedure for a contract started, the cartel typically designated the cartel participant that should win it. The allocation mechanism was based on the interests that had been announced and on fairness in allocating contracts to participants so as to maintain cartel stability.Footnote 6 In addition, if two cartel participants had both put two stars for a specific contract, they might have formed a consortium to share the contract, while the other participants covered the consortium.Footnote 7

The cartel took decisions on contract allocation during the meetings in which it discussed the list, but it organized separate meetings to discuss the price of the bids.Footnote 8 One reason for separate meetings was that not all cartel participants were interested in fixing the price, since not all of them necessarily participated in the tender. Second, discussions about price might have taken up too much time, such that the cartel preferred the designated winner to invite the other bidders to a separate meeting to discuss the price. COMCO found some evidence that, from time to time, the cartel used the mechanism of the mean in determining the bid to be made by the designated winner,Footnote 9 which implies that the latter had to submit either its own bid or the mean of all the bids exchanged in the separate meetings. Under this mechanism, the designated winner had some incentive to announce a relatively high bid to influence the calculated mean in the separate meeting. All the other cartel participants whose announced bids were below the mean or below the winner’s bid increased their bids to cover the designated winner. As a result, they generally ensured a minimal price difference of 2–3% between the bid of the designated winner and their own bids.Footnote 10

Finally, the cartel also made decisions about contracts that were left free for competitive bidding.Footnote 11 Such decisions were partly determined by the presence of external bidders: as the number of external bidders increased, the chances of the cartel’s success decreased, and thus the incentive to collude declined. This was the case for some high-value contracts, for which more non-cartel firms were interested in bidding. Sometimes, the cartel also tried to bring external firms into the agreement.

In June 2009, the cartel ended its illegal conduct after COMCO launched house searches in the canton of Aargau, which to a certain extent explains the breakdown of the cartel. In its decision, COMCO established that the cartel had discussed more than 400 contracts in the region of See-Gaster from 2004 to 2009 with a value of 198 million CHF. COMCO also proved that the cartel had attempted to rig at least 200 contracts with a value of 67.5 million CHF.Footnote 12 In its decision, COMCO sanctioned the firms involved in bid-rigging conspiracies with more than five million CHF. Two firms applied to the leniency program, and two other firms concluded a settlement to close the case. Four firms appealed against the decision.

2.3 The Strassenbau Cartel in Graubünden

Members of the local trade association organized the cartel for road construction in the canton of Graubünden. In its decision, COMCO proved that the cartel participants met regularly in the period under investigation, from 2004 to the end of May 2010. The meetings, called “allocation meetings” or “calculation meetings”, were mainly held at the beginning of the year, since the canton and the local municipalities put most of their contracts out to tender in the spring of each year.Footnote 13 The cartel discussed contracts for road construction and asphalting tendered by the canton of Graubünden and the local municipalities. Since mountains and valleys profoundly mark the geography of Graubünden, the cartel was divided into firms operating in the north and firms operating in the south.

In the north of Graubünden, the cartel mostly organized its meetings in the office of the most important mixing plant in the canton and, to a lesser extent, in the offices of the cartel participants. The meetings included either all of the twelve to thirteen cartel participantsFootnote 14 or two different subgroups.Footnote 15 In the south, the total of six cartel participantsFootnote 16 also organized such meetings, though changing their locations.

COMCO stated in its press release that the cartel decided upon the allocation of contracts based on a contingent determination for all the cartel participants in the canton of Graubünden.Footnote 17 The cartel allocated contracts according to the interests of each firm and fixed the price of the designated winner following a specific calculation method.Footnote 18 The price of the designated winner was usually above the minimal bid announced in the respective meeting. The calculation method, therefore, contributed to raising the price.

During the period investigated, from 2004 to the end of May 2010, the cartel distributed 70% to 80% of the total value of the cantonal and communal road construction contracts. The cartel rigged approximately 650 road construction contracts with a total market volume of 190 million CHF.Footnote 19 The cartel ceased its illegal conduct in the summer of 2010, since some firms decided to stop, mainly because of increasing concerns regarding the Cartel Act.Footnote 20

2.4 Data from the Cases See-Gaster and Graubünden

We requested data on all bid summaries from the investigations of See-Gaster and Graubünden based on the Federal Act on Freedom of Information in the Administration (Freedom of Information Act, FoIA).Footnote 21 COMCO approved the request and sent us the data, referred to hereafter as the Swiss data. They contain the bids, a running number for each contract, a dummy variable for each of the anonymized cartel participants, and a dummy variable indicating whether the tender took place in the cartel period (taking the value of 1 for a cartel and 0 otherwise). Moreover, they include a categorical variable for the contract type (taking the value of 1 for contracts in road construction and asphalting, 2 for mixed contracts including road construction and civil engineering, and 3 for civil engineering contracts), as well as the anonymized date and year. The first year in our sample begins with a value of 1 and the last year ends with a value of 14. The first anonymized date equals 42, and the last 4,886. To ensure anonymization of the bids, COMCO multiplied them by a factor between 1 and 1.2. This transformation does not affect the calculation of the screens.

Table 2 Overview for the Swiss data

Table 2 provides key information on the Swiss data. In order to calculate the predictors of our empirical analysis, we consider tenders with four bids or more. In total, there are 310 tenders with complete cartels with a total value of more than 110 million CHF and 2,031 bids submitted by the cartel participants. Furthermore, there are 287 tenders with incomplete cartels with a total value of more than 114 million CHF. Cartel participants submitted 1,414 bids in these tenders and external firms 650 bids. Finally, we observe 2,398 competitive tenders with a value of roughly 1,700 million CHF and 13,925 submitted bids. In Appendix D, we present additional descriptive statistics of the Swiss data.

3 Detection Methods

This section outlines our novel approach to detecting bid rigging. We first describe the concept of a random forest, the machine learning algorithm used for training and testing predictive models for collusion (see Ho 1995; Breiman 2001). Second, we present in detail the screens that enter the algorithm as potential predictors. Third, we discuss five different predictive models applied to our data that differ in the included screens. Finally, we provide descriptive statistics for two important screens in each dataset.

3.1 Random Forest

We use the random forest as a machine learning algorithm for predicting collusive and competitive tenders. In our data, the outcome is given a value of 1 for collusive tenders, including both incomplete and complete bid-rigging cartels, and 0 for competitive tenders. Note that we intentionally do not distinguish between incomplete and complete cartels, as we aim to construct a reliable method for detecting any form of bid rigging. Tenders are therefore either collusive or competitive.

Machine learning requires the data to be randomly split into so-called training data, used to develop the predictive model, and test data, used to evaluate the model’s performance. We randomly split the data such that the training and test data consist of 75 and 25% of the observations, respectively. The random forest is a so-called ensemble method that averages over multiple decision trees to predict the outcome. Tree-based methods recursively split the predictor space of the training data (according to the values the screens might take) into a number of non-overlapping regions. Each split aims to maximize the homogeneity of the dependent variable within the newly created regions according to a goodness-of-fit criterion like the Gini coefficient. The latter measures the average gain in purity (or homogeneity) of outcome values when splitting and is popular for binary variables like our collusion dummy. Splitting continues until the decision tree reaches a specific stopping rule, e.g., a minimum number of observations in a region or a maximum number of splits. Tree-based predictions of bid rigging (1) or competition (0) are based on whether collusive or competitive tenders dominate in the region that contains the values of the screens for which the outcome is to be predicted.
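To fix ideas, the following minimal sketch in R (the statistical software used in this study) computes the Gini impurity of a candidate region for our binary collusion outcome; the function name and inputs are illustrative and not part of the original analysis.

```r
# Gini impurity of a region containing n1 collusive and n0 competitive
# tenders; for a binary outcome it equals 2p(1-p), where p is the share
# of collusive tenders. A split is chosen to maximize the impurity reduction.
gini <- function(n1, n0) {
  p <- n1 / (n1 + n0)
  2 * p * (1 - p)
}
gini(8, 2)  # 0.32: a fairly homogeneous region
gini(5, 5)  # 0.50: a maximally impure region
```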

Importantly, there exists a bias-variance trade-off in out-of-(training-)sample prediction when using such tree-based (and other machine learning) methods. More splits reduce the bias and increase the flexibility of the model specification, though at the cost of a greater variance in unseen data (such as the test sample, which is not used for training), because the regions become smaller. The issue of excessive variance can be mitigated by repeatedly drawing many subsamples from the initial training data and estimating the predictive model, i.e., the tree (or splitting) structure, in each of the newly generated samples. For this reason, we apply a random forest algorithm, which predicts the collusion outcome by a majority rule based on the individual trees. This means that the outcome is classified as collusion or competition depending on whether the majority of the trees estimated in the various subsamples predicts collusion or competition, respectively, for particular values of the screens. A further feature of the random forest is that at each splitting step in a specific subsample, only a random subset of the possible predictors (i.e., screens) is considered, reducing the correlation of tree structures across the subsamples and thus further reducing the prediction variance. In our application, we use the randomForest package by Breiman and Cutler (2018) for the statistical software R, growing 1,000 trees, to estimate the predictive models in the training data and assess their performance in the test data based on the correct classification rate.

Note that we repeat the random sample splitting into 75% training and 25% test data and assess the predictive performance in the latter 100 times. Our reported correct classification rate corresponds to the average of the correct classification rates across the 100 repetitions. This procedure is likely to entail a smaller variance in estimating the correct classification rate than relying on a single random data split.
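The following R sketch illustrates this procedure under the assumption of a data frame tenders with a binary factor collusion and one column per screen; all object names are illustrative rather than the original study’s code.

```r
library(randomForest)

set.seed(42)
rates <- replicate(100, {
  # randomly split into 75% training and 25% test data
  train_idx <- sample(nrow(tenders), size = floor(0.75 * nrow(tenders)))
  train <- tenders[train_idx, ]
  test  <- tenders[-train_idx, ]
  # grow 1,000 trees on the training data
  rf <- randomForest(collusion ~ ., data = train, ntree = 1000)
  # correct classification rate in the test data
  mean(predict(rf, newdata = test) == test$collusion)
})
mean(rates)  # average correct classification rate over the 100 repetitions
```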

3.2 Screens

Screens are statistics applied to data in order to flag anomalous outcomes indicating potential anticompetitive issues. The literature on cartel detection usually differentiates structural from behavioral screens (see Harrington 2008; OECD 2013; Froeb et al. 2014). Structural screens focus on the factors facilitating the emergence of collusive agreements and help to identify markets in which collusion is more likely. Among these factors, distinctions are made between market structure, demand-related factors, and supply-related factors (OECD 2013). In contrast, behavioral screens empirically measure the behavior of market participants and assess whether the observed behavior departs significantly from competitive behavior, flagging it as a potential issue worth scrutinizing further. Following Huber and Imhof (2019), we propose using various descriptive statistics as screens and combining them with machine learning, however with the aim of uncovering not only complete but also incomplete bid-rigging cartels.Footnote 22 We consider three classes of screens: variance, asymmetry, and uniformity.

As variance screens, we implement the coefficient of variation (CV) and the kurtosis statistic (KURTO), as suggested by Huber and Imhof (2019) and Imhof (2019). In addition, we implement the spread (SPD) of the distribution of the bids as a screen.

The coefficient of variation is widely discussed in the literature (see Abrantes-Metz et al. 2006; Esposito and Ferrero 2006; Jimenez and Perdiguero 2012; Abrantes-Metz et al. 2012; Imhof 2019) and is defined as the standard deviation divided by the arithmetic mean of all bids submitted in a tender:

$$\begin{aligned} CV_{t}=\frac{s_{t}}{\bar{b}_{t}}, \end{aligned}$$
(1)

where \(s_{t}\) is the standard deviation and \(\bar{b}_{t}\) is the mean of the bids in some tender t. The coordination and manipulation of bids by cartel participants might affect the convergence in the distribution of the bids. More precisely, we suspect that bids converge when firms in an auction form a cartel. This is the case because cover bids are somewhat higher than the bid of the designated winner and concentrate around similar values, which are considered large enough to ensure that the designated winner submits the lowest bid. For this reason, the following kurtosis statistic appears appropriate for capturing such convergence effects in cover bids:

$$\begin{aligned} KURTO_{t}=\frac{n_{t}(n_{t}+1)}{(n_{t}-1)(n_{t}-2)(n_{t}-3)}\sum _{i=1}^{n_{t}}\left( \frac{b_{it}-{\bar{b}_{t}}}{s_{t}}\right) ^{4} - \frac{3(n_{t}-1)^{2}}{(n_{t}-2)(n_{t}-3)}, \end{aligned}$$
(2)

where \(b_{it}\) denotes bid i in tender t, \(n_{t}\) the number of bids in tender t, \(s_{t}\) the standard deviation of bids, and \(\bar{b}_{t}\) the mean of bids in that tender. Put simply, the smaller the differences between the bids, the higher the kurtosis statistic, and thus the stronger the indication of a collusive situation. Furthermore, we calculate the spread using the following formula:

$$\begin{aligned} SPD_{t}=\frac{b_{max,t}-b_{min,t}}{b_{min,t}}, \end{aligned}$$
(3)

where \(b_{max,t}\) denotes the maximum bid and \(b_{min,t}\) the minimum bid in some tender t.
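As an illustration, the three variance screens can be computed per tender as in the following R sketch, where bids is the numeric vector of all bids in one tender (the function names are ours, for exposition only).

```r
cv_screen  <- function(bids) sd(bids) / mean(bids)                 # Eq. (1)

kurto_screen <- function(bids) {                                   # Eq. (2)
  n <- length(bids)                                                # requires n >= 4
  z <- (bids - mean(bids)) / sd(bids)
  n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
    3 * (n - 1)^2 / ((n - 2) * (n - 3))
}

spd_screen <- function(bids) (max(bids) - min(bids)) / min(bids)   # Eq. (3)
```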

As bid rigging may produce asymmetries in the distribution of bids, we implement the following cover-bidding screens as in Huber and Imhof (2019): the percentage difference (DIFFP), the skewness (SKEW), the relative distance (RD), and the normalized distance (RDNOR). In addition, we add an alternative measure for calculating the relative distance, namely the alternative relative distance (RDALT).

It seems plausible that cartel participants manipulate the difference between the lowest and second-lowest bids to ensure that the contract is awarded to the cartel’s designated winner. To analyze the difference between the two lowest bids, we use the following formula to calculate the percentage difference:

$$\begin{aligned} DIFFP_{t}=\frac{b_{2t}-b_{1t}}{b_{1t}}, \end{aligned}$$
(4)

where \(b_{1t}\) is the lowest bid and \(b_{2t}\) the second-lowest bid in some tender t. We also consider the absolute difference between the first and second-lowest bids \( D_{t}=b_{2t}-b_{1t}\) in the empirical analysis.

The manipulation of bids by cartel participants can simultaneously affect both the difference between the first and the second-lowest bid and the differences across the losing bids. Therefore, following Imhof et al. (2018), we calculate a relative distance (relative to a measure of dispersion) in a tender by dividing the difference between the first and the second-lowest bid by the standard deviation of the losing bids:

$$\begin{aligned} RD_{t}=\frac{b_{2t}-b_{1t}}{s_{losing bids,t}}, \end{aligned}$$
(5)

where \(b_{1t}\) denotes the lowest bid, \(b_{2t}\) the second-lowest bid, and \(s_{losing bids,t}\) the standard deviation calculated among the losing bids in some tender t. In terms of its predictive power, the RD was outperformed by the difference between the first and the second-lowest bid divided (or normalized) by the average of the differences between all adjacent bids (see Huber and Imhof 2019). We also consider this normalized distance in our study:

$$\begin{aligned} RDNOR_{t}=\frac{b_{2t}-b_{1t}}{\frac{(\sum _{i=1,j=i+1}^{n_{t}-1}b_{jt}-b_{it})}{n_{t}-1}}, \end{aligned}$$
(6)

where \(b_{1t}\) is the lowest bid, \(b_{2t}\) the second-lowest bid, \(n_{t}\) is the number of bids and \(b_{it}\), \(b_{jt}\) are adjacent bids (in terms of price) in tender t, with bids being arranged in increasing order.

We consider a further alternative measure for the relative distance, initially suggested by Imhof et al. (2018):

$$\begin{aligned} RDALT_{t}=\frac{b_{2t}-b_{1t}}{\frac{(\sum _{i=2,j=i+1}^{n_{t}-1}b_{jt}-b_{it})}{n_{t}-2}}, \end{aligned}$$
(7)

where \(b_{1t}\) is the lowest bid, \(b_{2t}\) the second-lowest bid, \(n_{t}\) is the number of bids and \(b_{it}\), \(b_{jt}\) are adjacent losing bids in a tender t, with bids being arranged in increasing order. In contrast to the normalized distance, the mean of the differences in the denominator is calculated using only the losing bids. Furthermore, bid manipulation might affect the symmetry of the distribution of bids. For example, because of a greater difference between the first and the second-lowest bid, we expect bid rigging to cause a more asymmetric distribution of bids. We therefore include the skewness as a screen:

$$\begin{aligned} SKEW_{t}=\frac{n_{t}}{(n_{t}-1)(n_{t}-2)}\sum _{i=1}^{n_{t}}(\frac{b_{it}-{\bar{b}_{t}}}{s_{t}})^{3}, \end{aligned}$$
(8)

where \(n_{t}\) denotes the number of the bids, \(b_{it}\) the \(i^{\text {th}}\) bid, \(s_{t}\) the standard deviation of the bids, and \(\bar{b}_{t}\) the mean of the bids in tender t.
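For completeness, the asymmetry screens of Eqs. (4) to (8) can be sketched in R as follows (again, bids denotes all bids in one tender and the function names are illustrative; sorting arranges the bids in increasing order).

```r
diffp_screen <- function(bids) {            # Eq. (4)
  b <- sort(bids)
  (b[2] - b[1]) / b[1]
}

rd_screen <- function(bids) {               # Eq. (5)
  b <- sort(bids)
  (b[2] - b[1]) / sd(b[-1])                 # sd of the losing bids
}

rdnor_screen <- function(bids) {            # Eq. (6)
  b <- sort(bids)
  (b[2] - b[1]) / mean(diff(b))             # mean gap between adjacent bids
}

rdalt_screen <- function(bids) {            # Eq. (7)
  b <- sort(bids)
  (b[2] - b[1]) / mean(diff(b[-1]))         # mean gap among losing bids only
}

skew_screen <- function(bids) {             # Eq. (8)
  n <- length(bids)
  z <- (bids - mean(bids)) / sd(bids)
  n / ((n - 1) * (n - 2)) * sum(z^3)
}
```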

Finally, we verify whether bid rigging (or competition) transforms the distribution of the bids into a less uniform distribution. More precisely, we again suspect that the higher difference between the first and the second-lowest bid influences the asymmetry, such that a cartel leads to a less uniform distribution. Therefore, we consider the nonparametric Kolmogorov–Smirnov statistic (KS):

$$\begin{aligned} D_{t}^{+}=\max _{i}\left( x_{it}-\frac{i_{t}}{n_{t}+1}\right) ,\quad D_{t}^{-}=\max _{i}\left( \frac{i_{t}}{n_{t}+1}-x_{it}\right) ,\quad KS_{t}=\max (D_{t}^{+},D_{t}^{-}), \end{aligned}$$
(9)

where \(n_t\) is the number of bids in a tender, \(i_t\) the rank of a bid and \(x_{it}\) the standardized bid for the \(i^{\text {th}}\) rank in tender t. The standardized bids \(x_{it}\) are the bids \(b_{it}\) divided by the standard deviation of bids in tender t to facilitate the comparison of tenders with different contract values. We suspect that the KS statistic generally differs between cartel and competitive periods.
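A literal R sketch of Eq. (9), under the same conventions as in the sketches above:

```r
ks_screen <- function(bids) {               # Eq. (9)
  n <- length(bids)
  x <- sort(bids) / sd(bids)                # standardized bids, increasing order
  u <- seq_len(n) / (n + 1)                 # reference ranks i/(n+1)
  max(x - u, u - x)                         # KS = max(D+, D-)
}
```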

3.3 Summary Screens

In incomplete cartels, competitive bidders distort the statistical signals produced by bid rigging in the distribution of bids in a tender. We demonstrate this in Fig. 1. Suppose we have four colluding firms. We would expect the bids to converge when firms form a cartel, which could, e.g., be detected through a reduced coefficient of variation, as exemplified in the top-left panel of Fig. 1. However, a competitive bidder might distort the statistical signal produced by the cartel by bidding (significantly) lower or higher than the cartel members, as exemplified by bidder 5 in the bottom-left and top-right panels of Fig. 1. Only if the competitor submits a bid close to the collusive bids will the signal remain (almost) unaffected, as is the case in the bottom-right panel. Such a situation can result from a competitor trying to enjoy the umbrella effect and bidding closer to the collusive bids, or from pure coincidence.

Fig. 1 The potential effect of a competitive bidder

Therefore, tender-based screens can fail to recognize bid rigging when they are calculated for all bids. We circumvent that distortion by calculating the screens not (only) for all the bids in a tender but also for all possible subgroups of three and four bids. Table 3 gives the number of possible subgroups of three or four bids, respectively, when the total number of bids in a tender varies between four and ten.

Table 3 Example of possible subgroups for three and four bids in a tender

For instance, in a tender with a total of six bids, we calculate the same screen for 15 different subgroups containing four bids and for 20 different subgroups containing three bids. In each tender, we then compute summary statistics for each screen: the mean, the median, the minimum, and the maximum of the respective screen across the various subgroups of three or four bids. We use these summary statistics, the so-called ’summary screens’, as predictors for flagging collusive and competitive tenders; they also permit comparing tenders with different numbers of bids. We subsequently exemplify the computation of such summary screens by means of the coefficient of variation for subgroups of four bids.

The mean of all coefficients of variation calculated for subgroups of four bids in each tender is:

$$\begin{aligned} MEAN4CV_{t}=\frac{1}{N_{t}}\sum _{s=1}^{N_{t}}\frac{s_{st}}{\bar{b}_{st}}, \end{aligned}$$
(10)

where s and t denote the indices for subgroup s and tender t, respectively, \(N_{t}\) is the number of all possible subgroups of four bids in tender t, and \(s_{st}\) and \(\bar{b}_{st}\) are the standard deviation and the mean of the bids in subgroup s, respectively. Likewise, the minimum and maximum of the coefficients of variation across the subgroups in a tender correspond respectively to:

$$\begin{aligned}&MIN4CV_{t}=\min _{s}\frac{s_{st}}{\bar{b}_{st}}, \end{aligned}$$
(11)
$$\begin{aligned}&MAX4CV_{t}=\max _{s}\frac{s_{st}}{\bar{b}_{st}}. \end{aligned}$$
(12)

In order to calculate the median for subgroups of four bids in each tender, define the coefficient of variation in subgroup s and tender t as \(CV_{st}=\frac{s_{st}}{\bar{b}_{st}}\) and order the coefficients so that

$$\begin{aligned} CV_{1t} \le CV_{2t}\le ... \le CV_{st} \le ... \le CV_{N_{t}t} . \end{aligned}$$

If the number of subgroups \(N_{t}\) in a tender is odd, the median of the coefficient of variation in tender t is calculated as follows:

$$\begin{aligned} MEDIAN4CV_{t}=CV_{(N_{t}+1)/2,t}. \end{aligned}$$
(13)

If the number of subgroups is even, the median corresponds to:

$$\begin{aligned} MEDIAN4CV_{t}=\frac{CV_{N_{t}/2,t}+CV_{N_{t}/2+1,t}}{2}. \end{aligned}$$
(14)

We apply these approaches to all the screens discussed above across the different tenders. Note that we do not calculate summary screens for subgroups of two bids, because some screens, such as the relative distance (RD), the alternative relative distance (RDALT), the normalized distance (RDNOR), the kurtosis statistic (KURTO), or the skewness (SKEW), cannot be calculated for them. Moreover, cartel participants usually numbered more than two in tenders characterized by incomplete cartels. We also refrain from calculating screens for subgroups of five or more bids: such summary screens only make sense for tenders with six or more bids, and restricting the sample accordingly would have reduced it too much and limited the application of our suggested method in other cases. Finally, our application of summary screens does not require the identity of bidders; we only need the bids in each tender, such that the method can be applied in many different contexts.
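A minimal R sketch of the summary screens, reusing a per-tender screen function such as cv_screen from the sketches above: combn() enumerates all choose(n, k) subgroups of k bids in a tender (e.g., the 15 subgroups of four bids and 20 subgroups of three bids in a six-bid tender from Table 3) and applies the screen to each. The bid values in the example are hypothetical.

```r
summary_screens <- function(bids, screen, k) {
  vals <- combn(bids, k, FUN = screen)          # screen value for each subgroup
  c(mean = mean(vals), median = median(vals),   # Eqs. (10) and (13)/(14)
    min  = min(vals),  max    = max(vals))      # Eqs. (11) and (12)
}

# e.g., MEAN4CV, MEDIAN4CV, MIN4CV and MAX4CV for a tender with six bids
summary_screens(c(100, 104, 105, 107, 112, 120), cv_screen, k = 4)
```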

Appendix E presents the descriptive statistics for the samples used in the empirical analyses of the Swiss data.

3.4 Model Specification

In the empirical analyses, we consider a benchmarking method and five different predictive models that vary in terms of the screens considered. For the benchmarking method, we use the benchmarks suggested by Imhof et al. (2018), developed for and applied to the Swiss construction market.Footnote 23 Model 1 only includes screens calculated for all bids in a tender (rather than summary screens). This approach relates to the one discussed by Huber and Imhof (2019). Still, it extends that study’s set of predictors by including the alternative relative distance (RDALT), the spread (SPD), and the Kolmogorov-Smirnov statistic (KS). In total, we use nine predictors and exclude all screens based on absolute bid values so as to consider only scale-invariant screens in model 1.

In contrast, model 2 exclusively includes the summary screens calculated for all possible subgroups of three bids in a tender. In total, we consider 32 of these summary screens, using all screens of model 1 except the kurtosis (KURTO), which requires at least four bids. Model 3 uses the summary screens of all screens presented above for all possible subgroups of four bids in a tender, making a total of 36 predictors (now including the kurtosis). Model 4 considers all predictors included in models 1, 2, and 3, resulting in 77 screens in total and mixing the summary screens with the tender-based screens. Finally, model 5 also includes three screens based on absolute bid values (and thus not scale-invariant) and the number of bids in a tender (NBRBIDS), producing 81 predictors in total. The motivation for including the number of bids is that it might be easier to settle an agreement with fewer rather than more bidders. Moreover, we can account for behavioral responses of bidders to fiercer competition due to an increased number of bidders (see, e.g., Vickrey 1961). The three value-based screens are the mean bid in a tender, included as a proxy for the contract value (MEANBIDS), the standard deviation of the bids in a tender (STDBIDS), and the absolute difference between the first and the second-lowest bid (D).

4 Flagging Incomplete Bid-Rigging Cartels

4.1 Application

We apply our detection method to data drawn from the cases of See-Gaster and Strassenbau Graubünden, characterized by well-organized bid-rigging cartels which, however, faced competitive outsiders from time to time. In these real cases, competitive and collusive bidders were aware of each other’s existence. Evidence from COMCO’s investigations has shown that cartel participants adopted a more competitive behavior in the presence of competitive bidders by deciding not to collude in some tenders. An agreement’s poor chance of success due to several (potential) competitive bidders motivated such decisions to bid independently for some contracts. However, in other tenders, the cartel faced only one competitive bidder and tried to include her or him in the agreement. Moreover, competitive bidders aware of the existence of the cartel might have tried, if not enrolled in the agreement, to benefit from the umbrella effect of a cartel by bidding higher than they would have in a fully competitive situation. As a consequence of the umbrella effect, the bids of competitive bidders fall nearer to the bids of collusive bidders, such that competitive bids distort the statistical pattern produced by bid rigging less (as illustrated in the bottom-right panel of Fig. 1).

We construct different samples of collusive tenders. Sample 1 includes all tenders with incomplete bid-rigging cartels and at least two cartel participants. As shown in Table 4, the average percentage of cartel participants in sample 1 amounts to 71%. Sample 2 includes tenders with incomplete bid-rigging cartels formed by at least three cartel participants. Since sample 2 excludes tenders with only two cartel participants, its average rate of cartel participants, 75%, is higher than that in sample 1. The logic is the same for samples 3 to 5. Consequently, sample 5 has the highest average percentage of cartel participants and contains the fewest competitive bidders, but at least one per tender. In addition, we construct a sample including all tenders with complete cartels.

We first investigate the performance of the predictive models for complete cartels. As shown in Table 4, the correct classification rates do not differ notably across the machine learning-based models 1 to 5, ranging from 81.3% to 83.3%. However, the correct classification rate of the benchmarking method, 61.7%, is clearly below that of models 1 to 5. In addition, it differs strongly between competitive and collusive tenders, amounting to only 33.4% in the latter case. Possible explanations for this poor performance are the reliance on only two screens, which are not necessarily the optimal predictors, and the use of benchmark values for these two screens drawn from two previous investigations, which are not necessarily optimal in the dataset under consideration. In contrast, the machine learning approaches use a more extensive set of screens and weight their importance in a data-driven way.

However, if we adjust the benchmarks of our benchmarking approach, we can achieve better prediction rates for complete cartels. In Appendix B, we depict a decision tree in Fig. 2 corresponding to the minimal cross-validation error. Our pruned tree, using as predictors only the relative distance (RD) and the coefficient of variation (CV) as in Imhof et al. (2018), shows a correct classification rate of 81.6% for complete cartels. This discrepancy illustrates the fundamental difference between a benchmark method and machine learning: benchmarks are exogenous, whereas machine learning outperforms benchmarks since it chooses the best predictors in each case. While a benchmark can still be adapted to different cases, machine learning algorithms are far more precise. Nonetheless, a benchmark method requires less information to be implemented and therefore remains a simple (first) step in flagging cartels.
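For illustration, such a pruned tree can be obtained with the rpart package, assuming training data with columns RD and CV; this is a sketch under these naming assumptions, not the exact code behind Fig. 2.

```r
library(rpart)

# grow a classification tree on the two screens of Imhof et al. (2018)
tree <- rpart(collusion ~ RD + CV, data = train, method = "class")

# prune at the complexity parameter minimizing the cross-validated error
best_cp <- tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(tree, cp = best_cp)
```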

Considering models 1 to 5, the correct classification rates vary between 61.2% and 84.1%, depending on the sample and the model. When the proportion of competitive bidders increases, the correct predictions generally decrease, as depicted in Table 4. This result suggests that cartel participants anticipated competitive bids and decided not to collude in particular tenders, as, for example, in the case of See-Gaster. The models with summary screens calculated for subgroups outperform model 1. Among them, models 3 and 4 slightly outperform model 2, indicating that in our case, summary screens calculated for subgroups of four bids exhibit a higher predictive power than those calculated for subgroups of three bids. The fact that we observe four cartel participants per tender in most cases likely explains this result. In contrast, summary screens calculated for subgroups of three bids may work better if we mainly observe three cartel participants per tender.

Model 5, the only one including the number of bids and value-based screens (as proxies for the contract value) as predictors, outperforms the other models, with correct classification rates 5 to 10 percentage points higher than model 1. The advantage of models 3 or 4 over model 1 varies from 3 to 5.7 percentage points. This points to a decrease in the error rate by roughly 20% or more in some cases, even in the presence of potentially strategic interactions, i.e., outsiders aware of the existence of bid-rigging cartels and trying to benefit from the umbrella effect (Bos and Harrington 2010). Therefore, competition agencies should consider summary screens for subgroups to detect both complete and incomplete bid-rigging cartels.

Moreover, note that the benchmarking method performs poorly when flagging incomplete bid-rigging cartels and does no better than tossing a coin. Specifically, for truly collusive tenders, the correct classification rates vary only between 8.7% and 14.7%.

When looking at the variable importance as reported in Table 5, we find for all models and samples that the Kolmogorov-Smirnov statistic (KS) is an important predictor. In many cases, it is among the three most important variables. This suggests that even if collusive and competitive tenders generally do not follow a uniform distribution, collusive bids are usually far less uniform than competitive bids. Therefore, the Kolmogorov-Smirnov statistic for deviations from the uniform distribution tends to exhibit notably higher values in rigged tenders than in competitive tenders.

The random forest generally picks a balanced set of screens for variance and asymmetry along with the Kolmogorov-Smirnov statistic for model 1 in all samples. Specifically, for the sample with complete cartels, we observe for models 2 to 5 that the random forest selects screens for the variance, mainly the coefficient of variation (CV) and the spread (SPD), along with the Kolmogorov-Smirnov statistic (KS). Screens for asymmetry in the distribution of bids remain unselected for models 2 and 5 when the cartel is complete. However, when cartels are incomplete, the random forest selects for models 2 to 5 screens for asymmetry in the distribution of bids, mostly the skewness (SKEW), the relative distance (RD), the percentage difference (DIFFP), and the alternative relative distance (RDALT), even though the results suggest that the screens for asymmetry are less important than the screens for variance and the Kolmogorov-Smirnov statistic (KS).

For all samples with incomplete cartels, the minima and maxima of the summary screens are the most important predictors, while the mean and median are most important for complete cartels. These results suggest that a few competitive bids disturb the statistical pattern produced by bid rigging enough to make it difficult to detect collusion using tender-based screens. In contrast, using the minimum or maximum of summary screens mitigates the distortion of competitive bids in the statistical patterns produced by bid rigging and allows us to detect both incomplete and complete bid-rigging cartels in the Swiss data with high probability.

Table 4 Correct classification rate in the Swiss data
Table 5 Important predictors for the Swiss data

4.2 Robustness Analysis

We investigate the robustness of our results by discarding the most important predictors and applying the random forest to the remaining predictors. Since model 1 uses fewer predictors than the other models, we leave out the three most important variables, while for models 2 to 5, we drop the five best predictors. Table 6 reports the difference in percentage points in the correct classification rates when keeping vs. dropping the respective predictors.
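A sketch of this robustness check in R, assuming a fitted forest rf and the train/test split from the sketch in Sect. 3.1 (variable names are again illustrative):

```r
# rank predictors by their mean decrease in Gini impurity
imp <- importance(rf)
top <- names(sort(imp[, "MeanDecreaseGini"], decreasing = TRUE))[1:5]

# re-fit the random forest without the five most important predictors
rf_reduced <- randomForest(collusion ~ ., ntree = 1000,
                           data = train[, !(names(train) %in% top)])
mean(predict(rf_reduced, newdata = test) == test$collusion)
```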

Table 6 Differences between original random forest and random forest with discarded variables

The overall correct classification rate of model 1 in samples 1, 3, and 4 when keeping all variables exceeds that obtained when dropping the three best predictors by 3.4 to 4.8 percentage points. Considering the other models and samples, we observe more or less the same predictive power when discarding the most important variables. Therefore, the remaining predictors seem to be suitable substitutes for the discarded ones: other variables become more important when the most important predictors are omitted, and the correct classification rate is hardly affected.

Furthermore, we investigate the robustness with respect to the type of contract. We subsequently consider only contracts for road construction and asphalting for both the cartel and post-cartel periods, excluding contracts for civil engineering and mixed contracts that combine civil engineering with road construction or asphalting. The reason is that certain specific characteristics of civil engineering contracts might affect the screens and, therefore, the correct classification rate. Dropping mixed contracts and contracts for civil engineering permits us to verify whether this materially affects the correct classification rate among the remaining contracts for road construction and asphalting. Table 7 reports the difference in percentage points in the correct classification rates when using all contracts vs. using contracts for road construction and asphalting only.

Table 7 Differences between original random forest and random forest using only contracts for road construction and asphalting

In samples 1 and 2, we find the correct classification rates of the random forest for road construction and asphalting contracts to be superior to those of the random forest with all types of contracts. For example, the difference in the (overall) classification rate of model 1 in samples 1 and 2 amounts to 6.2 and 2.8 percentage points, respectively. A possible explanation could be that we implicitly suppress some competitors when we keep only the road construction and asphalting contracts. For example, in sample 1, the average percentage of collusive bidders is 81%, considerably higher than with all types of contracts (71%, see Table 4). The cartel percentage is thus higher for this restricted sample of road construction and asphalting contracts alone, which explains the higher performance in samples 1 and 2. In sample 3, the situation begins to change, the correct classification rates being almost identical for both contract types. Noticeably, the differences increase again for all models in the subsequent samples, although not as strongly as before and in the opposite direction. Therefore, for an almost identical average percentage of cartel participants, the correct classification rates of the random forest for all types of contracts are slightly superior to those for road construction and asphalting.

To investigate the robustness of the correct classification rate across different machine learning algorithms, we also assess the performance of lasso regression and an ensemble method (including bagged trees, random forest, and neural networks) for all models and samples. We explain these algorithms, also outlined by Huber and Imhof (2019), in more detail in Appendix C. Table 8 reports the difference in percentage points in the correct classification rates of the random forest minus the correct classification rates of the lasso and ensemble method.

Table 8 Differences between the original random forest and the lasso and ensemble methods

Considering samples 1 and 2 in Table 8, we find that the lasso and the ensemble method slightly outperform the random forest. The maximum difference in (overall) correct classification rates across models and samples is 2.9 percentage points. While the somewhat lower rates speak against the random forest, its performance is more uniform: there is less divergence between the competitive and collusive tenders, which may be important to practitioners. For samples 3, 4, and 5, the lasso and the ensemble method in general also slightly outperform the random forest, in two cases even more markedly, with correct classification rates 4.3 to 6.7 percentage points higher for model 1 in samples 4 and 5. This implies that in samples 4 and 5 (with a high share of collusive bidders), considering summary screens does not significantly improve the predictive power of the lasso and the ensemble method, in contrast to the random forest. On the other hand, and as for samples 1 and 2, the random forest shows a more uniform performance (e.g., correct classification rates do not differ much between competitive and collusive tenders). For complete cartels, we find a similar performance in terms of (overall) correct classification rates between the random forest and the ensemble method, while the random forest slightly dominates the lasso regression. Considering imbalances in the predictive performance across competitive and collusive tenders, the random forest and the ensemble method perform more homogeneously than the lasso regression.

To conclude, in Table 8, the random forest shows a somewhat lower correct classification rate than the lasso and the ensemble method. Still, it exhibits a more homogeneous correct classification rate across both the competitive and collusive tenders. All in all, this robustness check shows the stability of our results.

5 Conclusion

In this paper, we have suggested a robust method for flagging bid rigging in tenders that is likely to be more powerful for detecting incomplete cartels than previously suggested methods. Our approach combined screens, i.e., statistics derived from the distribution of bids in a tender, with machine learning to predict the probability of collusion. As a methodological innovation, we calculated the screens for all possible subgroups of three or four bids within a tender and considered summary statistics such as the mean, median, maximum, and minimum of each screen as predictors in the machine learning algorithm. By doing so, we mitigated the issue that competitive bids may distort the statistical signals produced by bid rigging.

We applied our method to data from the investigations involving incomplete cartels in the regions See-Gaster and Graubünden in Switzerland. In terms of out-of-sample performance, machine learning using summary screens (calculated for all possible subgroups of three and four bids) as predictors outperformed the other screening methods. However, the performance of all machine learning-based methods in all models still decreased with the relative number of competitive bids in the data of the investigations involving incomplete cartels. This decrease indicates that cartel participants anticipated competition from non-cartel bidders.

Compared to tender-based screens, summary screens increased the correct classification rate by 3 to 5.7 percentage points for incomplete cartels. This implies a substantial decrease in the error rate (one minus the correct classification rate) of 22.2%, despite the threat that predictive performance might be partially compromised by competitive bidders trying to benefit from the umbrella effect, i.e., bidding closer to collusive bids (Bos and Harrington 2010). As screening by competition agencies can trigger investigations with legal consequences for potential cartel members, such a decrease in the error rate appears highly desirable. Thus, our results demonstrate the usefulness of combining machine learning with an improved set of statistical screens to reduce the distortions caused by competitive bids in incomplete cartels. Moreover, the method appears promising for detecting collusion in other industries or countries.

A limitation of our study is that we restricted ourselves to summary screens calculated per (within a) tender. On the one hand, this makes the method simple to implement on a large scale, i.e., for many tenders, in order to flag those appearing suspicious. On the other hand, competition authorities are required in a second step to identify the bidders worth investigating further, e.g., by verifying which bidders participated in multiple tenders among those flagged as suspicious. The burden of this second step might be overcome by screening methods capable of directly flagging suspicious bidders (rather than tenders) (see, e.g., Imhof and Wallimann 2021). Therefore, combining our proposed summary screens with firm-specific predictors of collusion appears to be a promising agenda for future research.