False alarm rates of liquid explosives detection systems

Restrictions on the carriage of liquids, aerosols, and gels (LAGs) by airline passengers have been in place since November 2006, following the discovery of a terrorist plot involving homemade liquid explosives to be used on transatlantic flights (Wikipedia, 2006). These restrictions remain in place today, and the operational impact of introducing further screening of liquids is the subject of ongoing debate. This paper addresses one of the concerns raised, namely that the false alarm rates of liquid explosives detection systems (LEDS) are adversely affected by the filling level of LAGs containers. Our study contributes to a better understanding of the operational impact of screening partially filled containers, based on a large number of screening repetitions under laboratory conditions and a robust statistical analysis. False alarm rates were observed for 39 LAGs screened with 5 different LEDS. For each combination, four container filling levels (100%, 75%, 50% and 25%) were studied. These observations were used to model the impact of partial filling for sequential combinations of equipment. Three possible scenarios were considered, namely passengers being allowed to carry (1) only water, (2) water and soft drinks, and (3) all LAGs. The results show that, for a sequential combination of two equipment types, the impact of partially filled containers on the overall false alarm rate is negligible. Nevertheless, partially filled containers do result in an approximately two-fold increase in the number of items requiring level-two screening, which may be significant for airports when managing their screening processes.


Introduction
Passengers in the European Union (EU) have faced restrictions on the carriage of liquids, aerosols, and gels (LAGs) since November 2006, following the discovery of a terrorist plot involving homemade liquid explosives to be used on transatlantic flights departing the United Kingdom. Today, LAGs may be freely carried by air passengers only in individual containers no larger than 100 ml, with all containers fitting in a transparent, one-litre, re-sealable bag. The only exceptions are LAGs to be used during the flight for medical purposes or special dietary requirements (e.g. baby food), and LAGs obtained at EU airports or on board an aircraft, provided they are packed in a security tamper-evident bag (STEB) with proof of purchase inside. These items have to be screened by technological means. The 'liquids ban' was envisaged as a temporary restriction, to be lifted when suitable technology to screen liquids for the presence of explosives became available. In close cooperation with international partners, the European Commission developed a roadmap for a phased replacement of the restriction on the carriage of LAGs with technological screening. On 19 March 2013, the Commission adopted the legal changes (European Commission 2013) needed to implement the first step of this phased approach. Phase 1 entered into force on 31 January 2014 and made it mandatory for airports to screen, with special liquid explosives detection systems (LEDS), at least LAGs in STEBs purchased at airports, as well as any liquid medicines and products for special dietary needs.
In May 2014, the Commission received a study (LeighFisher Ltd, 2014) assessing the impact of Phase 1 of the implementation of LAGs screening (European Commission 2015). After careful analysis of operational data, such as the number of screened passengers and the number of liquids carried per passenger, the study concluded that the first phase of lifting the liquids restrictions had no operational impact and that the legislation had been successfully implemented. In November 2014, another study, conducted by ICF International (ICF International, 2014), assessed the possibility of permitting passengers to carry bottled water through a series of airport trials. It concluded that permitting passengers to carry bottled water may have a significant impact on airport throughput and costs. This finding led the Commission, together with the Member States and the relevant stakeholders, to postpone the implementation of Phase 2.
The Commission recently awarded a contract to evaluate possible solutions for relaxing the restrictions on the carriage of LAGs through a series of operational trials at several airports across Europe (o&i Consulting, 2016). Airports participating in this pilot project include Amsterdam Schiphol Airport, Budapest International Airport, Alicante International Airport, Dublin International Airport and Malta Airport (Malta International Airport, 2016). Passengers will be allowed to carry LAGs of more than 100 ml, but these will be subject to appropriate further screening. To complement this ongoing study, the Commission, jointly with O&I Consulting, initiated a survey (European Commission, 2016) during July and August 2016 to better understand passengers' preferences for the carriage of liquids through EU airports.
Screening equipment for the detection of liquid explosives in hand luggage obviously plays a key role in enabling a further removal of restrictions. Several companies have developed screening equipment based on different technologies, such as Raman spectroscopy, x-ray transmission and electromagnetic sensing. To be approved for use at European airports, they must comply with minimum EU performance standards. In a study from 2013 (Wetter and Fuhrer, 2013), several LEDS were evaluated and significant differences in performance were observed, depending on which liquids were screened. The impact of LAGs screening in checkpoint operations depends on a number of factors, including the design of the screening process, the performance of the operators, the number and type of LAGs carried by passengers, the LEDS equipment used and its false alarm behaviour.
False alarm rate (FAR) is a key parameter for screening processes. As there is no a priori knowledge about the cause of an alarm, each alarm needs to be resolved. This requires additional inspection, staff effort and time, and is likely to affect the passenger's perception of the screening process negatively. One specific concern raised by airports' representatives to the Commission in 2015 is that detection equipment will produce significantly more false alarms during real operations than during laboratory testing, particularly when passengers carry partially filled containers. Comprehensive false alarm data on LEDS under operational circumstances, including partially filled containers, is not available because it was not collected before the restrictions came into effect. The European Civil Aviation Conference (ECAC), through its Common Evaluation Process for LEDS (ECAC, 2014), has collected FAR data on commercial LEDS equipment against a standardised set of benign liquids. However, FAR data for partially filled containers is quite limited and not readily available. The novel contribution of our study is the large number of screening repetitions under laboratory conditions, combined with a robust statistical analysis. We analysed our results in two steps. First, the hypothesis that the fill ratio of LAGs containers has no influence on the FAR of LEDS, compared with the situation of only full containers, was tested to a high degree of statistical confidence on the basis of a large set of measurements. Second, the impact of partially filled containers on operational false alarm behaviour was modelled for typical combinations of LEDS, with realistic assumptions of carriage levels, for three policy scenarios, namely passengers being allowed to carry i) water only, ii) water and soft drinks, and iii) all LAGs. Consequently, our results provide clear conclusions to decision makers on the likely impact during real operations.

LEDS
Liquid explosives can be detected by various technological means and with various concepts of operations. For the purpose of testing equipment performance, ECAC has adopted a classification scheme consisting of five categories. Type A LEDS screen individual containers that must be removed from cabin baggage and opened for sampling. Type B LEDS screen individual containers that must be removed from cabin baggage but need not be opened (the original seal remains intact). Type C LEDS screen multiple containers that must be removed from cabin baggage and need not be opened. Type D LEDS screen containers inside cabin baggage without requiring them to be opened, and finally Type D+ LEDS screen containers inside cabin baggage that also contains complex electronics (e.g. laptops). For this study, five different models of LEDS were employed, all of which are already widely installed in European airports. Three of the LEDS are categorised as Type B and two as Type C. Testing on one of the Type C models was carried out at the premises of the UK Department for Transport, and testing on the remaining four was carried out at the facilities of the European Commission's Joint Research Centre (JRC) in Geel, Belgium.
The names of the manufacturers and devices are not disclosed in this article. Instead, the three Type B LEDS are referred to as B1, B2 and B3, and the two Type C LEDS as C1 and C2. The selected Type B equipment employs different technological approaches for detection (Raman spectroscopy, x-ray, wideband radio frequency, infrared, magnetic inductive and gravimetric analysis), while both Type C models are based on x-ray transmission, implemented in different ways. The LEDS are binary detectors which produce either an 'alarm' or a 'clear' result after each screening, and the possible outcomes are summarised in the so-called 'confusion matrix' in Table 1. In this study, the experimental part involved direct observations of false alarm rates when the equipment was challenged with a variety of LAGs, without any threat material present. The false alarm rate is calculated by dividing the number of false alarms by the total number of screenings.
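As a minimal illustration of this definition, the Python sketch below computes the FAR of a benign test item from a list of screening outcomes; since no threat material is present, every alarm is a false alarm. The outcome labels `"alarm"`, `"clear"` and `"fail"` and the function name are hypothetical; following the ECAC practice used in this study, a failure to measure (`"fail"`) is counted as an alarm.

```python
from collections import Counter


def false_alarm_rate(outcomes):
    """Return the FAR for screenings of a benign item.

    Every alarm on a benign item is a false alarm; a failure to measure
    ("fail") is also counted as an alarm, as in ECAC practice.
    """
    counts = Counter(outcomes)
    unknown = set(counts) - {"alarm", "clear", "fail"}
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no screenings recorded")
    return (counts["alarm"] + counts["fail"]) / total


# Hypothetical series of 100 screenings of one benign test item.
runs = ["clear"] * 90 + ["alarm"] * 7 + ["fail"] * 3
print(f"FAR = {false_alarm_rate(runs):.2f}")  # 10 alarms out of 100 screenings
```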

Test items
To measure the FAR of the five LEDS, a subset of 39 benign test items (see Annex A Table 8 for a detailed description) was selected from the so-called Ventress test set, a set of 500 LAGs used by ECAC test centres during type testing. The 39 items were selected on the basis of their likelihood of being carried through a checkpoint in a partially filled container. For example, it was considered unlikely that passengers would carry partially filled bottles of alcohol; hence no alcoholic beverages were selected. Six main LAGs categories were chosen to cover a range of container types and clusters: water (7 items), soft drinks (9 items), medicine (6 items), toiletries (10 items), household (2 items), and food (5 items). Most containers were plastic (29), followed by glass (5), metal (3), Tetra Pak (1), and aluminium pouch (1).

Statistical considerations
The false alarm rates of the LEDS were determined by performing repeated trials. Each trial had only two possible outcomes: alarm (including failure to measure) or clear. It was assumed that each trial has a constant probability of a given outcome, and that the trials are independent (the result of one trial does not affect another). Under these assumptions, the number of false alarms in a series of trials follows a binomial distribution. The probability of getting exactly k successes (in this study, k is the number of false alarms) in n trials, each with success probability p, is given by the probability mass function:

$$P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$$

where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ is the binomial coefficient.

Various approaches exist for estimating the binomial confidence interval (Brown and Li, 2005). In this study, the Wilson score interval (Wallis, 2013; Wilson, 1927) was chosen for calculating the confidence intervals. This approach gives good results, particularly for extreme probabilities, which is the case in this study (i.e. many false alarm rates close to 0% or close to 100%). According to the Wilson score interval, the lower and upper bounds of the confidence interval for a binomial proportion may be written as:

$$w^{\pm} = \frac{\hat{p} + \dfrac{z^{2}}{2n} \pm z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^{2}}{4n^{2}}}}{1 + \dfrac{z^{2}}{n}}$$

where $\hat{p} = k/n$ is the observed proportion of a particular outcome (in our case, of a false alarm), n is the number of trials, and $z = z_{1-\alpha/2}$ is the critical value of the standard normal distribution (e.g. 1.96 for a two-sided 95% confidence interval).

Part of the experimental design was determining how many trials to perform for each test item. The number of trials is a compromise between several variables, including the desired coverage of the test items (i.e. the sample of the population), the available staff time, and the magnitude of the differences one wishes to discern with a given degree of statistical certainty. Using the method for calculating confidence intervals described with Eqs. 1 and 3, the minimum difference between two probabilities that can be observed with statistical significance, also called the significant delta, was determined for a range of n (number of trials).
The results showed that, to be able to observe (with statistical significance) an absolute difference of 20% between two probabilities, at least 100 trials must be performed for both samples (although in the best case, 100 trials may allow differences as small as 8% to be observed). The focus of this study was on detecting statistically significant differences in behaviour between full and partially filled bottles. It was more important to detect relative differences in FAR with higher precision than to determine absolute values of FAR that are more representative (of a larger population) but less precise. Based on these considerations, it was decided to perform 100 trials for each of the four filling levels on each of the 39 test items.
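The sketch below implements the binomial probability mass function and the Wilson score interval described above in Python, together with one way of computing the significant delta. The text does not state the exact criterion, so this sketch assumes that two proportions are distinguishable when their 95% Wilson intervals do not overlap; under that assumption, scanning all possible counts for n = 100 gives a worst case of 0.20 and a best case of 0.08, in line with the 20% and 8% quoted above. The function names are illustrative.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k false alarms in n independent screenings,
    each with false alarm probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the observed proportion k/n (z = 1.96 -> 95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = p_hat + z**2 / (2 * n)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    # Clamp to guard against rounding just outside [0, 1].
    return max(0.0, (centre - half_width) / denom), min(1.0, (centre + half_width) / denom)


def significant_delta(n: int, z: float = 1.96) -> tuple[float, float]:
    """Worst-case and best-case difference in proportion needed before the Wilson
    intervals of two samples of size n stop overlapping (assumed criterion)."""
    deltas = []
    for k1 in range(n + 1):
        upper1 = wilson_interval(k1, n, z)[1]
        # Smallest k2 > k1 whose interval lies entirely above that of k1.
        for k2 in range(k1 + 1, n + 1):
            if wilson_interval(k2, n, z)[0] > upper1:
                deltas.append((k2 - k1) / n)
                break
    if not deltas:
        raise ValueError("n is too small to separate any two proportions")
    return max(deltas), min(deltas)


print(wilson_interval(0, 100))   # no false alarms in 100 screenings
print(wilson_interval(50, 100))  # half of the screenings alarmed
worst, best = significant_delta(100)
print(f"n = 100: worst case {worst:.2f}, best case {best:.2f}")
```

The worst case arises for proportions near 50%, where the intervals are widest; the best case arises at the extremes (0% or 100%), which is also where the Wilson interval is preferred over the simple normal approximation.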
To determine whether there is a difference in FAR between a full container and a partially filled container, statistical hypothesis testing was adopted with the following null and alternative hypotheses:
Null hypothesis, H0: the container filling level has no influence on the FAR of LEDS.
Alternative hypothesis, H1: partially filled containers have a higher FAR than fully filled containers.
Since the samples are large, the normal approximation to the binomial distribution can be used to derive a z-score as the test statistic for deciding whether to reject the null hypothesis (NIST 2012). If sample 1 has x1 false alarms out of n1 trials and sample 2 has x2 false alarms out of n2 trials, the proportions of false alarms in each sample, $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$, and the pooled proportion

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

are used to calculate the z-score:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

For a two-sided test (H0 against a difference in either direction), the absolute value |z| is compared with the critical value $z_{1-\alpha/2}$. For the one-sided test of H0 against H1, where sample 1 is the fully filled and sample 2 the partially filled containers (alternative p1 < p2), z is compared with the lower critical value $z_{\alpha}$. In both cases, H0 is rejected if the z-score lies beyond the critical value (|z| > $z_{1-\alpha/2}$, or z < $z_{\alpha}$, respectively). For the purpose of this work, α = 0.05, corresponding to a 95% confidence level.
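A minimal Python sketch of the one-sided test described above (sample 1 = fully filled, sample 2 = partially filled). It assumes the pooled form of the two-proportion z-score, which is the usual form of this large-sample test; the function names and the example counts are hypothetical and not taken from the study. `statistics.NormalDist` from the Python standard library supplies the critical value.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z-score, (p1_hat - p2_hat) / SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Pooled proportion is 0 or 1, so both samples have the same rate.
        return 0.0
    return (p1_hat - p2_hat) / se


def partial_filling_raises_far(x_full, n_full, x_partial, n_partial, alpha=0.05):
    """One-sided test of H0 (no effect) against H1: FAR(full) < FAR(partial).

    H0 is rejected when z falls below the lower critical value z_alpha.
    """
    z = two_proportion_z(x_full, n_full, x_partial, n_partial)
    z_alpha = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    return z, z < z_alpha


# Hypothetical counts: 5/100 false alarms when full, 15/100 when partially filled.
z, reject_h0 = partial_filling_raises_far(5, 100, 15, 100)
print(f"z = {z:.2f}, reject H0: {reject_h0}")
```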

Experimental procedure
The 39 selected test items were each screened 100 times (fully filled) on the four LEDS located at the JRC (B1, B2, B3 and C1). The number of runs for the additional Type C equipment (C2), tested at the premises of the UK Department for Transport, was 50, due to the reduced time available for measurements with this equipment. The procedure was repeated for three additional filling levels (75% full, 50% full, 25% full). To reduce the quantity of the LAGs, a graduated cylinder was used to measure the amount that needed to be decanted. In some cases (more viscous liquids), the quantity to decant was determined using an analytical balance. A total of 67,850 screenings were recorded for this study. In line with ECAC practices, a failed analysis was recorded as an alarm. Figure 1 and Table 2 summarise the laboratory FAR per test item, per filling level, and per equipment. With the exception of B2, all the equipment demonstrated a statistically significantly higher FAR for partially filled items than for fully filled containers. There were no cases where the FAR for a partially filled item was significantly lower than that for the fully filled item. Some specific cases of isolated high FAR were registered. Based on our experimental observations and the manufacturers' comments (manufacturers were invited to comment on the results), we note that (1) certain items could not be analysed with one of the Type B devices due to their metallic packaging, (2) one particular item had a higher FAR at the 25% filling level, possibly due to the large difference between the weight of the container and that of the remaining liquid (the container weighed twice as much as the liquid), and (3) the shape of the bottom part of certain containers affects the FAR in some cases. For Type C equipment, the FAR behaviour across the four filling levels followed one of three patterns for many of the LAGs tested. These three patterns are generalised in Fig. 2.

Results and discussion
The first pattern (Fig. 2a) shows a peak in the FAR at the 75% filling level, often significantly higher than the FAR at the 100% filling level, and was observed for 12 test items. This behaviour was possibly due to the movement of liquid on the belt of the Type C equipment during the screening process. The second pattern (Fig. 2b) […] containers, dropping significantly, sometimes to zero, at the lowest filling level (25%). Often this pattern correlates with items where the volume of liquid is less than 100 ml. The third pattern (Fig. 2c) is characterised by a low or zero FAR at the 100% filling level, increasing significantly at the other three filling levels. This was observed for 11 items, and no explanation for it has been found. Figure 3 shows the same data as Fig. 1, but this time excluding any items which result in a volume of less than 100 ml when partially filled (75%, 50%, and 25% filling levels). The reason for looking at the impact only on items above 100 ml is that European legislation (European Commission, 2008) currently requires screening only above this volume; hence it is reasonable to assume that equipment performance has been optimised for operations above this threshold. Considering only screened volumes above 100 ml, 27 of the 156 'LAG-filling level' combinations were excluded. The excluded items were mainly from the toiletries and medicines categories, which generally exhibit higher FARs than water and soft drinks. For this reason, the overall FARs in Fig. 1 are generally higher than those in Fig. 3. Excluding items with a volume of less than 100 ml has almost no impact on the instances of significantly higher FAR for partially filled items (the only difference is at the 50% filling level for equipment B1).
Fig. 2 Observed patterns of FAR vs. filling level exhibited by Type C equipment in this study: (a) pattern observed for 12 items with equipment C1, (b) pattern observed for 6 items with equipment C2, and (c) pattern observed for 11 items with equipment C2.
Fig. 3 Average FAR of all test items, excluding those which would result in a volume of less than 100 ml at lower filling levels. Excluding items resulting in volumes of less than 100 ml has almost no impact on the instances of statistically significant increases.
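The exclusion rule can be expressed as a simple filter over LAG-filling level combinations. The Python sketch below assumes that the remaining volume is the nominal volume multiplied by the filling level and, as stated above, checks only the partial filling levels (75%, 50%, 25%) against the 100 ml threshold. The item names and volumes are hypothetical and are not taken from Annex A.

```python
FILL_LEVELS = (1.00, 0.75, 0.50, 0.25)
THRESHOLD_ML = 100.0

# Hypothetical nominal volumes (ml) of a few test items.
items = {"water A": 500, "soft drink B": 330, "medicine C": 150,
         "toiletry D": 250, "toiletry E": 100}


def excluded_combinations(nominal_volumes, fill_levels=FILL_LEVELS, threshold=THRESHOLD_ML):
    """Return the (item, fill level) pairs whose remaining volume falls below the
    threshold at a partial filling level; full containers are never excluded."""
    return [
        (name, fill)
        for name, volume in nominal_volumes.items()
        for fill in fill_levels
        if fill < 1.0 and volume * fill < threshold
    ]


excluded = excluded_combinations(items)
total = len(items) * len(FILL_LEVELS)
print(f"{len(excluded)} of {total} combinations excluded")
for name, fill in excluded:
    print(f"  {name} at {fill:.0%}: {items[name] * fill:.1f} ml")
```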

Laboratory obtained FAR per categories of test items
The observed FARs were grouped into clusters and are shown in Figs. 4, 5, 6, and 7. Figure 4 shows the results for the seven items in the water cluster; the FAR behaviour differs from that across all items (shown in Fig. 1). For the Type B equipment, the only incidences of higher FAR are at the 25% filling level for B1 and B3, and here the increase is caused by 2 of the 7 test items. These points can be considered anomalies, because one of the items had a large weight difference between the container and the remaining liquid and the other had a bottom of a specific shape, resulting in an erroneous screening analysis. Based on these observations, it can be stated that for the cluster of 7 water items there are almost no incidences of significantly higher FAR at lower filling levels for the Type B equipment. For the Type C equipment, on the other hand, it is quite surprising to see such a significant FAR increase when moving from fully filled to partially filled bottles of water. One plausible explanation for this behaviour could be erroneous X-ray transmission readings caused by the motion of liquid in a partially filled container during transport on the conveyor belt of the Type C equipment. This motion would not be present in a full container, nor in a Type B machine, as the concept of operation is different. Another reason could be that data for lower filling levels are of poorer quality because the X-ray beam travels through less liquid relative to the thickness of the container material itself. The situation for the cluster of nine soft drinks (carbonated, still, and concentrated) is shown in Fig. 5 and is somewhat similar to that of the water cluster. By cross-referencing with […], it is seen that one data point causes a slight increase at the 25% filling level of B1; otherwise there is no significant increase in FAR at lower filling levels for Type B equipment.
For the Type C equipment, significant increases at lower filling levels remain. Finally, Figs. 6 and 7 show the averages for the medicine and toiletries clusters, respectively. In these cases, Type C equipment exhibits a noticeably lower FAR at the 25% filling level, which reflects the probable cut-off behaviour of the equipment at low volumes (<100 ml). The toiletries category (Fig. 7) […] rates mirror those of the entire dataset, shown in Fig. 1. The household and food clusters contain two and five items, respectively. Although these data are included in the overall analysis in Fig. 1, we do not present graphs for these individual clusters, because we consider the limited number of items not representative of the wide variety of possible LAGs in these clusters. The observed FAR data were used to test the null and alternative hypotheses. The objective was to determine, under controlled laboratory conditions and to a high degree of statistical confidence, whether or not the false alarm rate of partially filled containers is higher than that of fully filled containers. This was done by combining the experimental data in Annex A Table 8 with Eqs. (4) and (5). The results are shown in Table 3, where the average FAR of all fully filled containers is compared with the FAR of all partially filled containers (averaged over the 75%, 50%, and 25% filling levels). Table 3 shows that partially filled containers result in a statistically significantly higher FAR for equipment B1, B3, C1, and C2; for these, the null hypothesis H0 that the container filling level does not influence the FAR was rejected. However, the same hypothesis was not rejected for equipment B2, leading to the conclusion that, for this particular machine, the difference between the FAR of partially filled and full containers is not statistically significant. These conclusions are valid for all three policy scenarios considered.

Simulated operational FAR results per categories of test items and combination of equipment
The aim of this part of the work was to simulate the impact of partial filling on the FAR of sequential combinations of equipment during real operations, using reasonable assumptions of the relative prevalence of filling levels and proportions of […] elements was 100%. The operational FAR of a single LEDS (Type B or Type C) was determined by multiplying each element of the equipment's FAR matrix by the corresponding element of the weighting matrix and summing the products. For a sequential combination of Type C and Type B LEDS, the operational FAR was determined by multiplying each element of the Type C FAR matrix by the corresponding elements of the Type B FAR matrix and of the weighting matrix, and summing the products. Combined with the experimentally obtained dataset, this approach allows the overall operational FAR to be simulated, where only LAGs that cause an alarm on the first equipment are sent to the second (and only those which cause an alarm on the second equipment are sent to the third). This yields more realistic overall FAR values than simply multiplying the average FAR of each equipment, and the outcomes reflect any correlation between occurrences of false alarms on different screening technologies. Limited data exists on the filling levels of liquids in air passengers' hand luggage, as this data was not recorded before the sudden introduction of the liquids ban in 2006. To perform the simulation and demonstrate results for FAR at an operational level, data on filling levels and types of LAGs were extracted from two reports, by LeighFisher Ltd and ICF International. The first (LeighFisher Ltd, 2014) was conducted to assess the impact of the first phase of the implementation of liquids screening.
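The weighted-sum calculation described above can be sketched in Python as follows. Each matrix is stored as a dictionary keyed by (cluster, filling level), and all values below are hypothetical. Multiplying the elements cell by cell treats the alarms of successive devices as independent within a cell, so correlation between devices is captured only across clusters and filling levels. The example also shows that the result differs from the product of the averaged single-device FARs; the function name `operational_far` is illustrative.

```python
from math import prod

Cell = tuple[str, float]  # (LAGs cluster, filling level)


def operational_far(weights: dict[Cell, float], *far_matrices: dict[Cell, float]) -> float:
    """Sum over all cells of weight x FAR of each device in the sequence.

    One matrix gives the single-device FAR; two or three matrices give the
    overall FAR of a C:B or C:B:B sequence, in which only items that alarm
    on one device are passed on to the next.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 (100%)")
    return sum(w * prod(m[cell] for m in far_matrices) for cell, w in weights.items())


# Hypothetical two-cell example: water, full and 25% full, equally likely.
weights = {("water", 1.00): 0.5, ("water", 0.25): 0.5}
far_c = {("water", 1.00): 0.0, ("water", 0.25): 0.4}  # Type C device
far_b = {("water", 1.00): 0.1, ("water", 0.25): 0.5}  # Type B device

single_c = operational_far(weights, far_c)  # share of items sent to the second device
single_b = operational_far(weights, far_b)
sequence_cb = operational_far(weights, far_c, far_b)
print(f"C alone {single_c:.2f}, B alone {single_b:.2f}, C:B {sequence_cb:.2f}, "
      f"product of averages {single_c * single_b:.2f}")
```

Here the C:B sequence gives 0.10 while the product of the averaged FARs gives 0.06, because both devices tend to alarm on the same (partially filled) cell.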
The study analysed the type and number of LAGs over 100 ml carried by passengers before and after completion of the first phase roll-out. The data was collected from six airports between November 2013 and April 2014 and comprised a total sample of 10,332 items. Of all items carried at the airports during this period, water accounted for 37%, followed by cosmetics (27%), soft drinks (15%), food (13%), STEB (3%), alcohol (2%), perfume (2%), and medicines (1%).
To assess a possible second phase for lifting the liquids restrictions, ICF International carried out a separate study (ICF International, 2014), which investigated the possibility of permitting passengers to carry bottled water at airports. For this purpose, passenger surveys of carriage levels of bottled water were conducted at three Eurostar terminals (London, Paris, and Brussels) in September 2014. It was reported that 13% of containers were 100% full, 46% were 75% full, 18% were 50% full and 23% were 25% full. Carriage of liquids on Eurostar services was considered reasonably similar to what would be expected from air passengers in the absence of a ban. The mapping of the relevant data in these reports to this work is shown in Table 4. The LAGs cluster prevalence was combined with the filling levels to produce a weighting matrix, reproduced in Annex B Table 9. Results from the modelling are presented in Tables 5, 6, and 7. Table 5 reports the overall FAR for the water-only scenario.
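The Python sketch below builds weighting matrices from the shares quoted above under stated assumptions: the weighting matrix is taken to be the outer product of cluster prevalence and filling-level distribution, the Eurostar filling-level distribution for bottled water is applied to every cluster, and the mixed matrix includes the 100% column. Rows (clusters) and columns (filling levels) outside a scenario are dropped and the remainder is re-normalised to 100%, as described in Annex B. Only the water-only and water-and-soft-drinks scenarios are shown, because the mapping of the remaining LAGs categories to the test-item clusters (Table 4) is not reproduced here.

```python
from itertools import product

# Shares quoted in the text, in percent.
CLUSTER_SHARE = {"water": 37.0, "soft drinks": 15.0}            # LeighFisher Ltd (2014)
FILL_SHARE = {1.00: 13.0, 0.75: 46.0, 0.50: 18.0, 0.25: 23.0}  # ICF International (2014)


def weighting_matrix(clusters, fills):
    """Outer product of cluster and filling-level shares over the kept rows and
    columns, re-normalised so that all elements sum to 1 (100%)."""
    raw = {(c, f): CLUSTER_SHARE[c] * FILL_SHARE[f] for c, f in product(clusters, fills)}
    total = sum(raw.values())
    return {cell: value / total for cell, value in raw.items()}


SCENARIOS = {"water only": ["water"], "water and soft drinks": ["water", "soft drinks"]}
matrices = {}
for name, clusters in SCENARIOS.items():
    # Full containers only vs. a mixture over all filling levels (assumed).
    matrices[(name, "full only")] = weighting_matrix(clusters, [1.00])
    matrices[(name, "all filling levels")] = weighting_matrix(clusters, list(FILL_SHARE))

for key, matrix in matrices.items():
    print(key, {cell: round(w, 3) for cell, w in matrix.items()})
```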
Looking first at the Type C equipment, the FAR of both C1 and C2 increases significantly when moving from the situation with only full containers to one that includes partially filled containers: C1 increases from 6.3% to 26.0%, and C2 from 0.1% to 15.0%. In contrast, none of the three Type B devices exhibits any significant increase from full to partially filled containers for the water-only policy. Additionally, for any 2-step combination of Type C and Type B equipment, the overall FAR is at most 0.7%, and for the 3-step combination (C:B:B) the FAR is effectively zero. The modelled FAR for the second policy option (passengers allowed to carry both water and soft drinks) is given in Table 6. The results are similar to those for the water-only policy: the FAR of the Type B equipment exhibits negligible increases, while that of the Type C equipment increases significantly (although not quite as dramatically as for water only). C1 increases from 5.8% to 20.0%, while C2 increases from 3.7% to 16.6%. For any 2-step combination of Type C and Type B equipment, the overall FAR for water and soft drinks is at most 1.6%. For a 3-step combination, the FAR is effectively zero. Finally, Table 7 shows the simulated overall FAR for the third policy scenario, where passengers can carry all kinds of LAGs. Although all equipment models exhibit an increased FAR with partial filling, a notable difference in magnitude between Type B and Type C is again observed. Equipment C1 increases from 9.6% to 21.1%, and C2 from 13.0% to 24.3%. Here, B3 also exhibits a more marked increase, from 8.1% to 13.2%. The highest overall FAR for a 2-step combination under the all-LAGs policy is 6.6%. For a 3-step combination, the highest overall FAR is 2.0%.
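In a C:B sequence, the Type C FAR is also the share of items forwarded to the second device, so its relative growth with partial filling can be checked directly from the values quoted above. This Python sketch assumes that level-two screening means the items forwarded after a Type C alarm. For the all-LAGs scenario the ratios are roughly 2.2 (C1) and 1.9 (C2), consistent with the approximately two-fold increase in items requiring level-two screening stated in this paper, while the water-only and water-and-soft-drinks scenarios show larger ratios.

```python
# Type C FAR (%) quoted from Tables 5-7: (full containers only, with partial filling).
TYPE_C_FAR = {
    "water only": {"C1": (6.3, 26.0), "C2": (0.1, 15.0)},
    "water and soft drinks": {"C1": (5.8, 20.0), "C2": (3.7, 16.6)},
    "all LAGs": {"C1": (9.6, 21.1), "C2": (13.0, 24.3)},
}


def forwarding_ratio(full: float, partial: float) -> float:
    """Factor by which the share of items forwarded to the second device grows."""
    return partial / full


ratios = {
    (scenario, device): forwarding_ratio(*values)
    for scenario, devices in TYPE_C_FAR.items()
    for device, values in devices.items()
}
for (scenario, device), ratio in ratios.items():
    print(f"{scenario:>22} {device}: x{ratio:.1f}")
```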
The results from Tables 5, 6, and 7 can be summarised as follows: across all three policy scenarios, Type B equipment in isolation exhibits a negligible increase in FAR with partial filling, while Type C equipment in isolation exhibits a significant increase in FAR with partial filling (up to a maximum of 26%). For a sequential combination of two devices (Type C:B), there is no significant increase in overall FAR with partial filling compared with fully filled containers, since the Type B equipment is used to resolve alarms generated by the Type C equipment. However, partial filling causes a greater number of items to require level-two screening (around a two-fold increase). The overall FAR for equipment in combination is not simply the product of the individual averaged FAR values. The lowest overall FAR occurs when Type C and

Conclusions
This study was undertaken to provide further insight into the operational impact of screening liquids, aerosols and gels (LAGs) in air passengers' hand luggage resulting from equipment false alarm rates. In particular, it addressed the lack of systematic data on the potential influence of partially filled containers on the false alarm rates of commercial LEDS. Five different LEDS (three Type B and two Type C) were systematically challenged, in a laboratory environment, with 39 benign LAGs likely to be carried through checkpoints, in full and partially filled (75%, 50% and 25%) containers. Conclusions could be drawn per LAGs cluster (water, soft drinks, medicines, toiletries, and food) regarding the impact of partial filling on the FAR of individual equipment, as well as on the overall FAR of typical sequential combinations of two or three LEDS, as foreseen in EU legislation.
The results show that partially filled LAGs containers have little to no effect on the FARs of Type B equipment compared with fully filled containers. However, a significant increase was found for Type C equipment. Under the 'all LAGs' scenario, the overall FAR for Type C equipment increased from 10-13% for full containers to 21-24% for partially filled containers. We modelled the influence of partial filling on the overall FAR of sequential combinations of equipment under operational conditions, using reasonable assumptions of carriage levels of LAGs. The FARs for double (Type C:B) and triple (Type C:B:B) combinations of equipment were calculated for three possible policy scenarios. In a typical sequential combination of two devices (Type C followed by Type B), the impact of partially filled containers on the overall false alarm rate is negligible, since Type C alarms are resolved by the Type B equipment. In a scenario where passengers can carry all LAGs, including partially filled containers, the overall FAR for a two-step combination (Type C:B) is expected to lie between 0 and 7%. For a combination of three cascading LEDS (Type C:B:B), the highest overall FAR for the 'all LAGs' scenario was about 2%. Nevertheless, partially filled containers do result in an approximately two-fold increase in the number of items requiring level-two screening, which may be significant for airports when managing their screening processes. Based on this study, it is recommended that type testing be modified to include more partially filled containers for Type C equipment, so that laboratory testing better represents real operational performance.

Annex B
Table 9 is the matrix that was used as the basis for deriving the additional weighting matrices, simply by excluding certain columns or rows and re-normalising the sum of the remaining values to 100%. For each of the three policy scenarios, two matrices were generated (fully filled, and a weighted mixture of partially filled), yielding a total of six matrices.