Estimated w/c ratios
The results are summarised in Table 4. Note that Labs 01, 08, 09 and 14 reported some of their results as ranges. There are various statistical options for dealing with such values, but it was decided, based on both the nature of the tests and the main objective of the study, to treat each range as its mid-point and each inequality as the stated value. Therefore, if a lab returned a w/c estimate as ranging from x to y, then the average [= (x + y)/2] was plotted. If a lab estimated w/c as < x (or as > x), then x was taken as the returned value.
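The conversion rule described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used in the study; the function name `reported_to_value` and the string formats it accepts (for example "0.40-0.45" or "<0.40") are assumptions made for the example.

```python
import re


def reported_to_value(reported: str) -> float:
    """Convert a reported w/c entry to the single value used for plotting.

    Hypothetical helper: the accepted input formats are assumptions.
    """
    # Normalise en dashes so "0.40–0.45" and "0.40-0.45" are handled alike.
    text = reported.strip().replace("–", "-")

    # Inequality such as "<0.40" or "> 0.55": take the stated bound x itself.
    if text[0] in "<>":
        bound = text[1:].strip().lstrip("=").strip()
        return float(bound)

    # Range such as "0.40-0.45": take the mid-point (x + y) / 2.
    match = re.fullmatch(r"(\d*\.?\d+)\s*-\s*(\d*\.?\d+)", text)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return (low + high) / 2

    # Otherwise the entry is assumed to be a plain single value.
    return float(text)
```

For instance, under these assumptions `reported_to_value("0.40-0.50")` gives the mid-point of the range, while `reported_to_value("<0.40")` returns the bound 0.40 unchanged.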
It is also worth noting that accuracy measures were claimed for many of the results, i.e. in the form of w/c ± error. Examining the detailed submissions indicated that some measures were based on statistical analysis. For example, the estimated errors provided by Lab 05 were based on the standard deviation of two replicates per series (30 images each) while those of Lab 11 were based on the 95% confidence interval (two replicates, 30 images each). However, the claimed errors reported by other labs were non-quantifiable and based on ‘previous work experience’. It was therefore decided not to use this information. Additionally, it has little bearing on the main objective of the study.
Of the 20 sets of w/c ratio estimates, 14 sets (70%) from 7 laboratories (64% of participating labs) gave the exact correct order of mix w/c ratios from low to high (A, C, E, B, D), regardless of the error. These were Labs 02b, 04a–e, 05, 07, 09a–d, 11 and 13.
In the detailed submissions, many labs provided micrographs showing typical microstructural features of their sub-specimens. An example is shown in Fig. 2. The submitted micrographs clearly show that the specimens have distinct fluorescence intensity and capillary porosity. Specimen A had the lowest fluorescence intensity and capillary porosity, followed by C, E, B and D, in line with the increase in w/c ratio. Specimens with higher w/c ratio also had a lower amount of unreacted cement, but a larger amount and size of portlandite, again consistent with expectation.
Comparison between laboratories and methods
Figure 3 shows the estimated w/c ratios plotted against actual mix values from all participants. Results from laboratories that did not use reference standards (Labs 01, 02, 08, 13) are plotted separately to those from laboratories that did, the latter divided into FM-V (Labs 07, 09, 12, 14b) and FM-Q (Labs 04, 05, 14a). Data from the BSE method are treated as a separate category. Figure 4 presents the errors in the estimated w/c ratios for each participant, grouped according to the method used.
The data show that errors ranged from −0.058 to +0.23, or from −14 to +43% of the actual w/c ratios. The magnitude of error appears to be independent of the w/c ratio in some labs, but increased with increasing w/c for Labs 04, 05, 09 and 11. Of the 100 individual determinations, 61% over-estimated w/c ratio (positive errors), 29% were exact, and only 10% were under-estimated (negative errors). This suggests that there is a tendency for w/c ratio to be over-estimated, particularly for VA, FM-V and FM-Q methods. Implications of this will be discussed later.
The mean error for each lab ranged from 0.01 to 0.15 (Table 4). It should be noted that the mean error is calculated using absolute values so that positive and negative errors do not cancel. There is a clear variation in performance between labs, even when using the same method, or between operators from the same lab applying a particular method to the same sub-specimen set and reference standards (Labs 04 and 09). This suggests that some amount of subjectivity is inevitable when interpreting fluorescence intensity. The largest errors occurred in Labs 01 and 13. Lab 01 over-estimated w/c by 0.15 in the majority of their results. Lab 13 consistently over-estimated w/c by 0.15, but gave the correct order. Neither lab used reference standards.
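To make the distinction between signed error and mean absolute error concrete, a short Python sketch is given here. The function names and the numerical values are hypothetical and chosen only for illustration; they are not data from Table 4.

```python
def signed_errors(estimated, actual):
    """Signed error per specimen: positive means w/c was over-estimated."""
    return [e - a for e, a in zip(estimated, actual)]


def percent_errors(estimated, actual):
    """Signed error expressed as a percentage of the actual w/c ratio."""
    return [100.0 * (e - a) / a for e, a in zip(estimated, actual)]


def mean_absolute_error(estimated, actual):
    """Mean of |error|, so positive and negative errors do not cancel."""
    errors = signed_errors(estimated, actual)
    return sum(abs(err) for err in errors) / len(errors)


# Hypothetical values for one lab (NOT taken from the study).
actual = [0.35, 0.40, 0.45, 0.50, 0.55]
estimated = [0.40, 0.40, 0.50, 0.45, 0.60]

errors = signed_errors(estimated, actual)
mean_signed = sum(errors) / len(errors)          # partly cancels out
mean_abs = mean_absolute_error(estimated, actual)  # does not cancel
```

In this made-up example the mean signed error is smaller than the mean absolute error because one under-estimate offsets part of the over-estimates, which is why the absolute form is the more conservative measure of lab performance.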
It is also worth noting that several labs performed consistently well across the range with low errors for all specimens. Those that returned the most accurate estimates were Labs 07 (VA + FM-V), 14b (VA), 05 (FM-Q), 14a (FM-Q) and 11 (BSE), with errors no greater than 0.05.
Figure 5 shows the maximum, minimum and average absolute error in the estimated w/c ratio, grouped by test method. Data from the UK Concrete Society inter-laboratory precision trial using the BS 1881-124 physicochemical method are also included for comparison (discussed later). Overall, the microscopy-based methods gave much lower errors than the BS 1881-124 method. Within the optical microscopy methods (VA, FM-V, FM-Q), labs that used reference standards performed better than those that did not. The BSE method gave the lowest range and average error, and the magnitudes of these are similar to those reported in an earlier study.
Figure 6 presents the frequency distribution and cumulative histogram of absolute error from all w/c ratio determinations (100) in this study. The data show that 37% of the estimated w/c ratios are within 0.025 of the target mix values, 58% are within 0.05 and 81% are within 0.1. In contrast, only 68% of the estimated w/c ratios using BS 1881-124 are within 0.1 of the target mix values.
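The cumulative fractions quoted above amount to counting how many absolute errors fall within a given tolerance. A minimal Python sketch is shown below; the function name, the inclusive treatment of the boundary and the sample errors are all assumptions for illustration only.

```python
def fraction_within(abs_errors, tolerance):
    """Fraction of determinations whose absolute error is <= tolerance.

    The boundary is treated as inclusive (an assumption); a small epsilon
    guards against floating-point round-off at the boundary.
    """
    epsilon = 1e-12
    hits = sum(1 for err in abs_errors if err <= tolerance + epsilon)
    return hits / len(abs_errors)


# Hypothetical absolute errors (NOT data from the study).
sample_errors = [0.0, 0.02, 0.05, 0.08, 0.15]

for tol in (0.025, 0.05, 0.1):
    share = 100 * fraction_within(sample_errors, tol)
    print(f"within {tol}: {share:.0f}% of determinations")
```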
Comparison to BS 1881-124
The Concrete Society inter-laboratory precision trial was conducted in 2012–13 to investigate the accuracy of BS 1881-124 for determining the contents of cement, chloride, sulfate and w/c ratio. Four contemporary concrete mixes were prepared: Mixes 1 and 2 were blended with fly ash and slag respectively, while Mixes 3 and 4 had CEM I only, with target free w/c ratios of 0.44 and 0.59. 100 mm cube specimens (56-day cured) were then distributed to 11 UKAS accredited construction materials testing labs for a round-robin assessment, of which 7 labs estimated w/c ratio. Full details are published elsewhere [3, 5, 6]. In this comparison we will ignore the blended mixes and use only data from Mixes 3 and 4. In total, 29 separate w/c determinations were obtained. The estimated w/c ratios had errors ranging from −0.24 to +0.24, or from −38 to +45% of the mix values. The average error for each lab ranged from 0.03 to 0.18. The average error for all 29 determinations was 0.08. When these are compared to our data (Table 4, Figs. 4, 5, 6), it is clear that microscopy methods are more accurate and reliable than BS 1881-124.
The application of microscopy techniques for determining w/c ratio is based on the principle that capillary porosity of cement paste increases as w/c ratio increases. Using microscopy, it is possible to directly establish the microporosity of the cement paste rather than the concrete as a whole. This is a significant advantage over other porosity-based test methods such as that given in BS 1881-124, which cannot distinguish capillary porosity from porosity due to aggregate particles, air voids and cracks. Furthermore, the BS 1881-124 method requires a separate determination of cement content by chemical analysis of soluble silica and calcium oxide content, and this is also prone to various errors. In contrast, microscopy methods do not require a priori knowledge of the aggregates and cement content, or presence of voids and cracks.
A recurring observation from several labs was the non-uniform distribution of microporosity in their sub-specimens. For example, Lab 02 observed that “all five samples has a heterogeneous texture” and reported signs of “segregation and bleeding”. Lab 04 noted that the paste was “inhomogeneous and with many plastic defects”. According to Lab 12, the “porosity distribution suggests that mix water was not uniformly distributed”. These observations suggest inadequate mixing or possibly artefacts from compaction, and this may be a factor in some of the outlying results. However, it is not unusual for concrete to show heterogeneous pore structure when viewed microscopically, even for laboratory prepared specimens, due to the random distribution and relative movement of water and cement that vary on a local scale. The presence of aggregate particles further increases heterogeneity by causing well-known microstructural gradients [28, 29].
To illustrate the above, Fig. 7 presents data from Lab 11 showing the spread in w/c ratio estimated from BSE images of Mix A and D. Substantial variability in the ‘local’ w/c ratio can be seen, ranging from 0.23 to 0.45 for Mix A and 0.40–0.60 for Mix D, but this is consistent with data reported previously using the BSE technique [21,22,23]. The variability in local w/c ratio should not be surprising given that the microstructure of concrete is inherently heterogeneous and that each w/c estimate is based on the analysis of a single image of 228 × 171 μm field of view captured at high spatial resolution. Nevertheless, when a sufficiently large number of images have been measured, the cumulative average will stabilise indicating that a representative volume has been analysed.
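The idea of a cumulative average that stabilises can be sketched as follows. This is an illustrative Python example only: the stability criterion (the running mean staying within a chosen tolerance of the final mean), the function names and the per-image values are assumptions, not the procedure or data of Lab 11.

```python
from itertools import accumulate


def cumulative_average(values):
    """Running mean after 1, 2, ..., n images."""
    totals = accumulate(values)
    return [total / n for n, total in enumerate(totals, start=1)]


def images_until_stable(values, tolerance=0.01):
    """Smallest image count from which the running mean stays within
    `tolerance` of the final mean (a hypothetical stability criterion)."""
    running = cumulative_average(values)
    final = running[-1]
    n_stable = len(running)
    # Walk backwards until the running mean first leaves the tolerance band.
    for n in range(len(running), 0, -1):
        if abs(running[n - 1] - final) > tolerance:
            break
        n_stable = n
    return n_stable


# Hypothetical 'local' w/c estimates from successive images (illustrative).
local_wc = [0.30, 0.40, 0.35, 0.35, 0.35, 0.35]
n_needed = images_until_stable(local_wc, tolerance=0.01)
```

With scattered local values, the running mean typically fluctuates over the first few images and then settles; the image count at which it settles gives a rough indication of whether enough fields of view have been analysed.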
The participating labs were expected to ensure representative sampling. However, there are differences in the magnification, resolution and number of images analysed between labs, according to their routine in-house methodology (Table 2). The total area analysed per sub-specimen ranged from as low as 1 mm² (Lab 11) to 400 mm² (Lab 02). This could potentially be a source of error, but Fig. 8 shows no clear correlation between lab performance and total area analysed. This is perhaps not surprising given that the size of capillary pores is in the micron range and the representative elementary volume for cement paste is ~100³ μm³. Therefore, a sampling area of 1 mm² is not unduly small and the variation in area analysed is not a major source of error in the estimated w/c ratio.
Starting with summary statistics, distribution-free (non-parametric) statistical tests were carried out. The most basic is a rank test to establish whether the labs correctly ordered the samples. This showed that only 7 out of 80 pairs were out of order, which implies that microscopy provides a good relative test for w/c ratio determination, whether or not it is a good absolute test. Furthermore, arithmetic averages (over all labs) were in strict order and showed remarkable consistency: all five averages of measured w/c ratios were over-estimates and the errors (of each mix) were well correlated with the results, with a correlation coefficient of 0.997 (confidence > 99.99%). This implies that a simple linear calibration could provide very good absolute accuracy (discussed below), in addition to the extant relative accuracy, given a sufficient number of replications.
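One way to read the rank test is as a count of adjacent specimen pairs whose estimates are in the wrong order. The Python sketch below follows that reading, which is an assumption (it is consistent with 20 sets × 4 adjacent pairs = 80, but the exact counting rule is not stated here). The function name, the treatment of ties as correctly ordered, and the sample values are likewise hypothetical.

```python
# Expected order of mixes from low to high w/c ratio, as stated in the text.
EXPECTED_ORDER = ["A", "C", "E", "B", "D"]


def out_of_order_pairs(estimates):
    """Count adjacent pairs (A-C, C-E, E-B, B-D) ranked the wrong way round.

    `estimates` maps mix label -> estimated w/c ratio for one result set.
    Ties are counted as correctly ordered (an assumption).
    """
    count = 0
    for lower, higher in zip(EXPECTED_ORDER, EXPECTED_ORDER[1:]):
        if estimates[lower] > estimates[higher]:
            count += 1
    return count


# Hypothetical result sets (NOT data from Table 4).
correct_set = {"A": 0.35, "C": 0.40, "E": 0.45, "B": 0.50, "D": 0.55}
swapped_set = {"A": 0.35, "C": 0.40, "E": 0.50, "B": 0.45, "D": 0.55}

total_wrong = sum(out_of_order_pairs(s) for s in (correct_set, swapped_set))
total_pairs = 2 * (len(EXPECTED_ORDER) - 1)
```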
Splitting the methods at the top level into ‘V’ and ‘F’, where ‘V’ is all results using VA or FM-V and ‘F’ is just FM-Q (and temporarily disregarding BSE because there is only one set of results), it is observed that ‘V’ has a rank order of 5/16 whereas ‘F’ has a rank order of 2/36. This is a sufficient difference to justify undertaking tests to establish whether there is a statistical basis for declaring one method better. However, the arithmetic mean error (Fig. 5) shows no clear distinction between the accuracy of the methods. There is also no significant difference between ‘V’ and ‘F’ at the 80% level on a Student’s t-test for means (unknown variance).
However, there is a noticeable difference between methods when the results at each w/c ratio are analysed (Table 5). ‘V’ shows relatively consistent positive error while ‘F’ shows increasing error with increase in w/c ratio. This warrants further investigation which, beyond statistical analysis, could include examination of the standards used for visual comparison, the ability of the method to determine w/c correctly at high w/c ratios, and whether the physical basis for the method scales linearly with w/c ratio. Further subdividing ‘V’ into VA and FM-V did not introduce any further significant differences in these statistical tests.
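For readers unfamiliar with the comparison of two group means, the test statistic can be sketched in Python as below. The unequal-variance (Welch) form is used here as an assumption, since the text states only that the variance is unknown; the function name and the sample errors are hypothetical. The resulting statistic would then be compared against a t-distribution table at the chosen level.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples.

    Uses the unequal-variance (Welch) form; this choice is an assumption.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variances divided by sample size (squared standard errors).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical absolute errors for two method groups (NOT study data).
errors_v = [0.02, 0.05, 0.04, 0.07, 0.03]
errors_f = [0.03, 0.06, 0.02, 0.08, 0.05]
t_stat, dof = welch_t(errors_v, errors_f)
```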
Having noted that the errors appear to be linearly related to data values, we investigated the possibility that each could be corrected using a straight-line calibration based on a least-squares fit within its own subdivided data set. Whilst this is the same underlying principle (least squared errors) as underpins bivariate regression, which is a parametric test, we are not making any assumptions about distributions if we simply fit a straight line by minimising the errors of the points from the line. The correction reduces the errors to those shown in Table 5. This indicates that calibration and formalisation of standards may have the potential to improve the accuracy of these methods.
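A straight-line calibration of this kind can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the line is fitted with the actual w/c ratio as the dependent variable and the estimate as the independent variable (the direction of the fit is not specified in the text), and the function names and numerical values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(estimated, slope, intercept):
    """Apply the fitted line to raw estimates to obtain corrected values."""
    return [slope * e + intercept for e in estimated]


# Hypothetical data: estimates with a linear bias (NOT data from the study).
actual = [0.35, 0.40, 0.45, 0.50, 0.55]
estimated = [1.2 * a - 0.02 for a in actual]

# Fit actual against estimated, then correct the estimates with that line.
slope, intercept = fit_line(estimated, actual)
corrected = calibrate(estimated, slope, intercept)
residuals = [c - a for c, a in zip(corrected, actual)]
```

Because the bias in this made-up example is exactly linear, the corrected values coincide with the actual ones; with real data the residuals after correction would be non-zero, and fitting and evaluating on the same data set would tend to understate them.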