1 Introduction

Total suspended matter (TSM) concentration maps from satellites are used in ecosystem modelling (Huret et al. 2007; Lacroix et al. 2007) to determine the light available for photosynthesis and hence the primary production. In various estuarine and coastal marine ecosystems, TSM was shown to govern almost entirely the underwater light attenuation (Devlin et al. 2008; Christian and Sheng 2003; Tian et al. 2009). However, remote sensing (RS) products are highly biased towards “good weather” conditions, mainly because of cloud cover (Fettweis and Nechad 2011), and frequent gaps may occur in the time series of an RS product. To fill the gaps in TSM data retrieved from the MODerate-resolution Imaging Spectroradiometer (MODIS) over the southern North Sea (SNS), the Data Interpolating Empirical Orthogonal Functions (DINEOF) technique, previously used to reconstruct sea surface temperature (SST) fields from satellites (Beckers et al. 2006), is applied here to MODIS daily TSM maps as a follow-up to the work by Sirjacobs et al. (2011). Alvera-Azcárate et al. (2005) showed for the reconstruction of SST that DINEOF outperforms Optimal Interpolation, the most commonly used technique for data reconstruction, in terms of both the quality of the results and the computational speed.

Measurements of the optical backscattering (OBS), which give a measure of water turbidity and are a good proxy for TSM concentrations (Boss et al. 2009), were collected by surface SmartBuoys from the Centre for Environment, Fisheries and Aquaculture Science (Cefas; Mills et al. 2005). They represent the sea truth and are used to validate both MODIS and DINEOF TSM products. The study site is the SNS coastal waters (Fig. 1). In this region, the water column is well mixed because of the hydrodynamic forcing (semi-diurnal tides, strong currents and dominant south-westerly winds) and the relatively shallow bathymetry (<20 m). The main sources of TSM in these waters are river discharges and coastal erosion (Fettweis et al. 2007).

Fig. 1

Top of atmosphere image taken by MODIS on 17 April 2010 at 13:15 UTC over the southern North Sea. The filled white circles show the locations of Warp (WA, 15 m depth) and West Gabbard (WG, 25 m depth)

The Cefas OBS and TSM data and the MODIS and DINEOF TSM products obtained over the SNS are described in Section 2. The comparison of the MODIS and DINEOF TSM products with the Cefas data is presented in Section 3, and the results are discussed in Section 4 of this paper.

2 Data and methods

2.1 Cefas time series

Optical backscatter measurements in formazine turbidity units (FTU) are collected continuously by the Seapoint turbidity meter on the surface SmartBuoys (Mills et al. 2003) at a time step of 30 min, between 1 and 2 m below the water surface (Mills et al. 2005). The turbidity meter measures the light scattered by particles at scattering angles of 15° to 150° from a light source emitting at 880 nm. The time series of OBS measurements at Warp Anchorage (WA; 51.53° N, 1.03° E at 15 m depth) and West Gabbard (WG; 51.98° N, 2.08° E at 25 m depth) are shown in Fig. 2 and exhibit high-frequency variability superimposed on the seasonal variations. This represents a large amount of high-frequency data (more than 55,000 OBS values at each location). Measurements at WA are generally higher than at WG by a factor that may range from 3 to 8 (e.g. March 2005 in Fig. 2).

Fig. 2

The time distribution of Cefas OBS data in Warp Anchorage (red) and West Gabbard (dark blue) locations

TSM concentrations are estimated from the OBS data by calibration against gravimetric measurements of TSM collected by several AquaMonitor water samplers and also by a rosette sampler during mooring servicing. The OBS-estimated TSM will be referred to as “Cefas-TSM” and denoted hereafter by \( {\text{TSM}}_{\text{OBS}} \). The calibration factors α and β are used in a linear relationship: \( {\text{TSM}}_{\text{OBS}} = \alpha\,{\text{OBS}} + \beta \). The time and space variation of these factors indicates that the specific backscatter of suspended matter changes with the size and composition of the particles, which vary because different particles are resuspended or transported to these locations.

Thirty-seven (respectively 36) gravimetric TSM and OBS datasets, with data size ranging from 7 to 42, were available at WA (respectively WG) throughout the period 2003–2006 and were used to calibrate the OBS sensor and set up a predictive model for TSM estimation from OBS. Average, minimum and maximum values computed from these 37 (and 36) coefficients α and β are reported in Table 1. The average values of the slopes are quite similar for WA and WG, but average intercepts are two times larger at WA (more turbid waters) than at WG (clearer waters). In the range of TSM > 5 mg l−1, the ratio \( {\text{TSM}}_{\text{OBS}}/{\text{OBS}} \) may be 30% higher than the average ratio, which means the calibration coefficients are biased towards the higher values (because of the linear regression used). The average values \( \bar{\alpha} \) and \( \bar{\beta} \) and the median values \( \alpha_{\text{m}} \) and \( \beta_{\text{m}} \) were computed without consideration of the number of data in each OBS calibration dataset. If a linear regression analysis is performed on the total datasets of Cefas-TSM and OBS at each location, then a relationship taking into account the distribution of the data is established, yielding \( {\text{TSM}}_{\text{OBS}} = \alpha_{\text{distr}}\,{\text{OBS}} + \beta_{\text{distr}} \), where \( \alpha_{\text{distr}} \) lies between the median and average values of α at WA and WG, and \( \beta_{\text{distr}} \) lies between the average and median values of β at WG and is higher than both at WA (as reported in Table 1).
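The difference between the unweighted average (or median) of the per-dataset coefficients and the pooled regression coefficients can be illustrated with a minimal Python sketch. The function names and the input layout (a list of OBS/TSM array pairs, one per calibration dataset) are hypothetical and are not taken from the Cefas processing chain; an ordinary least-squares fit is assumed.

```python
import numpy as np

def fit_calibration(obs, tsm):
    """Ordinary least-squares fit of tsm = alpha * obs + beta (assumed fit type)."""
    alpha, beta = np.polyfit(np.asarray(obs, float), np.asarray(tsm, float), 1)
    return float(alpha), float(beta)

def summarize_calibrations(datasets):
    """datasets: list of (obs, tsm) pairs, one pair per calibration dataset.

    Returns the unweighted mean and median of the per-dataset coefficients
    (which ignore the number of points in each dataset) and the pooled
    coefficients obtained from a single regression on all data.
    """
    coeffs = np.array([fit_calibration(o, t) for o, t in datasets])
    all_obs = np.concatenate([np.asarray(o, float) for o, _ in datasets])
    all_tsm = np.concatenate([np.asarray(t, float) for _, t in datasets])
    return {
        "mean": tuple(coeffs.mean(axis=0)),          # (alpha_bar, beta_bar)
        "median": tuple(np.median(coeffs, axis=0)),  # (alpha_m, beta_m)
        "distr": fit_calibration(all_obs, all_tsm),  # (alpha_distr, beta_distr)
    }
```

Because the pooled fit weights every observation equally, large calibration datasets influence it more than small ones, which is why its coefficients need not coincide with the average or median values.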

Table 1 The total number of OBS and \( {\text{TSM}}_{\text{OBS}} \) data at WA and WG

2.2 MODIS TSM products

MODIS TSM maps considered in this study cover the SNS area and the 4-year period 2003–2006. The percentage of cloud-free pixels in TSM maps varies daily, with an average of 60 partially cloud-free images each year. The number of images where at least 50% of the SNS area is cloud-free is reported in Fig. 3. There are 1,903 maps covering the SNS, of which 927 come from days with a single MODIS overpass and 976 from days with two MODIS overpasses. The latter were merged pairwise (by averaging the two valid values in each pixel) into 488 images, which leads to a total of 927 + 488 = 1,415 daily maps. If two images are available on the same day, the merged map is fed to the DINEOF processing scheme (described in Section 2.3) in order to obtain the highest number of valid pixels from a TSM map on a given day. Without merging, both images of a given day would be rejected if each of them contained less than 2% valid pixels (see Section 2.3). However, it is the original TSM dataset that is used in the comparison with Cefas time series (method described hereafter).
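The merging of two same-day overpasses can be sketched as follows. This is only an illustration: invalid pixels are assumed to be stored as NaN, and a pixel that is valid in only one of the two images is assumed to keep that single value, which is consistent with the stated aim of maximizing the number of valid pixels but is not spelled out in the text. The function name is hypothetical.

```python
import numpy as np

def merge_overpasses(map_a, map_b):
    """Merge two same-day TSM maps of identical shape (NaN = invalid pixel)."""
    a = np.asarray(map_a, float)
    b = np.asarray(map_b, float)
    valid_a = ~np.isnan(a)
    valid_b = ~np.isnan(b)
    merged = np.full(a.shape, np.nan)
    both = valid_a & valid_b
    merged[both] = 0.5 * (a[both] + b[both])  # average where both are valid
    only_a = valid_a & ~valid_b
    merged[only_a] = a[only_a]                # assumption: keep the single valid value
    only_b = valid_b & ~valid_a
    merged[only_b] = b[only_b]
    return merged
```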

Fig. 3

Number of MODIS images with less than 50% cloud coverage over the SNS from 2003 to 2009

TSM is retrieved from the water-leaving reflectance at band 667 nm (\( \rho_{\text{w}} \)) after atmospheric correction of MODIS top of atmosphere radiances, using the SeaDAS software version 5.2 (NASA Ocean Color Biology Processing Group). Conversion of \( \rho_{\text{w}} \) to TSM is achieved using the TSM algorithm of Nechad et al. (2010), which has the form:

$$ {\text{TSM}} = \frac{{A\,{\rho_{\text{w}}}}}{{1 - {\rho_{\text{w}}}/C}}\,\left[ {{\text{mg}}\,{{\text{l}}^{{ - {1}}}}} \right] $$
(1)

where A is the ratio of the absorption by non-particles, \( a_{\text{np}} \), to the specific backscatter of particles, \( b_{\text{bp}}^{*} \). A was calibrated from in situ measurements of TSM and \( \rho_{\text{w}} \) by Nechad et al. (2010), yielding 362.09 mg l−1 at 667 nm. C is the ratio of the specific backscatter to the specific absorption of particles, with C = 17.36 × 10−2 at 667 nm (Nechad et al. 2010). Note that A was found to be well approximated (Nechad et al. 2010) by:

$$ A\,\sim \,a_{\text{w}}^{\text{667nm}}\,/\,b_{\text{bp}}^{{*{\text{667nm}}}}\,\left[ {{\text{mg}}\,{{\text{l}}^{{ - {1}}}}} \right] $$
(2)
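As an illustration, Eq. 1 with the 667 nm coefficients quoted above can be evaluated as in the following Python sketch. The function name is hypothetical, and the masking of negative reflectances and of reflectances at or above C (where Eq. 1 diverges or changes sign) is an assumption of this sketch rather than part of the published algorithm.

```python
import numpy as np

A_667 = 362.09   # mg/l, calibration coefficient A at 667 nm (from the text)
C_667 = 0.1736   # dimensionless coefficient C at 667 nm (17.36e-2, from the text)

def tsm_from_reflectance(rho_w, a=A_667, c=C_667):
    """Evaluate Eq. 1: TSM = A * rho_w / (1 - rho_w / C), in mg/l."""
    rho_w = np.asarray(rho_w, float)
    tsm = a * rho_w / (1.0 - rho_w / c)
    # Assumption: reflectances outside [0, C) are treated as invalid (NaN).
    return np.where((rho_w >= 0.0) & (rho_w < c), tsm, np.nan)
```

For a reflectance of 0.01, this gives about 3.84 mg l−1; the denominator becomes increasingly important as \( \rho_{\text{w}} \) approaches C, i.e. in very turbid waters.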

During the atmospheric correction of MODIS, level 2 flags are assigned to ρ w pixels, describing the performance of the atmospheric correction and quality of the level 2 product (Patt et al. 2003). The TSM product is hence also flagged where the quality of reflectance is suspicious because of atmospheric correction failure, cloud, stray light, high aerosol path concentration, negative reflectance, sun glint, high sun or sensor zenith angle.

To allow for comparison between in situ data from Cefas and MODIS data, the quasi-simultaneous MODIS matchups (within 15 min of Cefas time) are averaged over 5 × 5 pixel boxes around the locations of Warp Anchorage and West Gabbard, discarding the flagged pixels. These TSM spatial averages are further filtered by retaining only those computed from more than 12 unflagged pixels in the 25-pixel box, leading to what will be referred to as the “good quality dataset”. The remaining averages, for which only a few pixels could be used, are less reliable since their averaging boxes lie in areas of flagged pixels (e.g. at cloud edges); they form the “questionable quality dataset”. Finally, points where no average value could be determined (all pixels flagged) are the “missing data”. These three categories of good quality, questionable quality and missing data will be referred to as the G, Q and M datasets, respectively.
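The classification of a matchup into the G, Q or M dataset can be sketched as below. The function name and the convention that flagged pixels are stored as NaN are assumptions of this sketch; the threshold of more than 12 unflagged pixels is taken from the text.

```python
import numpy as np

def classify_matchup(box, min_good=13):
    """Classify a 5 x 5 pixel box of TSM values (NaN = flagged pixel).

    Returns (category, mean) where category is "G" (more than 12 unflagged
    pixels), "Q" (1 to 12 unflagged pixels) or "M" (all pixels flagged).
    """
    box = np.asarray(box, float)
    n_valid = int(np.count_nonzero(~np.isnan(box)))
    if n_valid == 0:
        return "M", float("nan")
    mean = float(np.nanmean(box))  # average over unflagged pixels only
    return ("G" if n_valid >= min_good else "Q"), mean
```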

The total number of MODIS TSM data gathered at WA and WG is 183 and 184, respectively. One hundred good quality MODIS data were obtained at WA, which is significantly lower than at WG (141) due to contamination of the pixels by land (stray light flagged pixels).

2.3 DINEOF data

DINEOF is a technique to infer missing data in satellite datasets (Beckers and Rixen 2003; Alvera-Azcárate et al. 2005). DINEOF uses, through an iterative procedure, a truncated empirical orthogonal function (EOF) basis to calculate the value of the missing data. The procedure starts by removing the temporal and spatial mean from the original data and initializing the missing values to zero (i.e. to the mean). Then, at each iteration, the EOF basis is used to infer the missing data, and a new EOF basis is calculated using the improved dataset.
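The iterative procedure described above can be sketched in a few lines of Python. This is a simplified illustration and not the DINEOF code itself: it removes a single overall mean rather than separate temporal and spatial means, uses a fixed number of modes (the operational method increases the number of modes progressively), and obtains the EOFs from a full singular value decomposition. The function name, the NaN convention for missing data and the stopping tolerance are assumptions.

```python
import numpy as np

def dineof_fill(data, n_modes, n_iter=200, tol=1e-8):
    """Fill NaN gaps in a 2-D (space x time) matrix with a truncated EOF basis."""
    x = np.asarray(data, float)
    missing = np.isnan(x)
    mean = x[~missing].mean()
    # Remove the mean and initialize missing values to zero (i.e. to the mean).
    anomalies = np.where(missing, 0.0, x - mean)
    if not missing.any():
        return anomalies + mean
    for _ in range(n_iter):
        # EOF basis of the current (gap-filled) anomaly matrix.
        u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
        change = np.sqrt(np.mean((recon[missing] - anomalies[missing]) ** 2))
        # Only the missing entries are updated; valid data are kept as observed.
        anomalies[missing] = recon[missing]
        if change < tol:
            break
    return anomalies + mean
```

In this sketch the valid observations are returned unchanged and only the gaps are filled; a fully reconstructed (smoothed) field would instead be given by the truncated reconstruction itself.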

Images with less than 2% of valid data were removed from the complete TSM dataset of the southern North Sea, as were individual pixels that were present in less than 2% of the time series length. These data are too sparse to be reconstructed accurately, and they would even degrade the quality of the overall reconstruction. The final dataset has 71% of missing data. The probability density distribution of a dataset can be completely described by its mean and the eigenvectors of the covariance matrix (the EOFs) if the data are normally distributed. Variables such as TSM, however, do not have a Gaussian distribution since TSM is never smaller than zero, and DINEOF typically does not take this into account. To overcome this, a logarithmic transformation is applied to the TSM dataset before applying DINEOF.
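The preparation steps above (removal of sparse images and sparse pixels, followed by the logarithmic transformation) might be sketched as follows. The function name, the (time, y, x) array layout, the use of the natural logarithm, the treatment of non-positive values as invalid, and the order of the two filtering steps are all assumptions of this sketch.

```python
import numpy as np

def prepare_for_dineof(tsm, min_fraction=0.02):
    """Filter a (time, y, x) TSM stack (NaN = invalid) and log-transform it.

    Returns (log_tsm, keep_time, keep_pixel): the transformed stack of the
    retained images, the boolean mask of retained images and the boolean
    (y, x) mask of retained pixels.
    """
    tsm = np.asarray(tsm, float)
    valid = np.isfinite(tsm) & (tsm > 0)  # assumption: non-positive TSM is invalid
    # Drop images with less than 2% of valid pixels.
    keep_time = valid.reshape(len(tsm), -1).mean(axis=1) >= min_fraction
    kept = np.where(valid, tsm, np.nan)[keep_time]
    # Drop pixels present in less than 2% of the remaining time series.
    keep_pixel = valid[keep_time].mean(axis=0) >= min_fraction
    kept[:, ~keep_pixel] = np.nan
    return np.log(kept), keep_time, keep_pixel
```

After the reconstruction, the TSM field would be recovered by applying the exponential to the reconstructed log-transformed data, which guarantees positive values.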

A first DINEOF reconstruction is performed in order to remove the outlier data, following Alvera-Azcárate et al. (2011). The truncated EOF basis is used, together with a cloud proximity test and a local median test, to identify suspicious data. An equal weight of 1/3 is given to each of these three sub-tests, and data with an outlier index, as defined by Eq. 1 in Alvera-Azcárate et al. (2011), higher than 3.5 were identified as outliers and removed (0.4% of the total data are removed in this step). The result of this outlier treatment is illustrated in Fig. 4 with the MODIS image taken on 30 May 2005 and the derived DINEOF maps. They show an overall smoothing of the TSM field in the final reconstructed data: while recurrent features such as the high TSM in zones 1 and 2 (Fig. 4) may pass the outlier test, less frequent patterns at smaller scales, e.g. in area 3 in the English Channel, are smoothed.

Fig. 4

DINEOF reconstruction of MODIS TSM image taken on 30 May 2005 at 13:00 UTC. a MODIS image, b the first guess reconstruction and c reconstruction after removal of outlying data

Five EOF modes are retained by DINEOF as the optimal number for the reconstruction. The optimality is defined by a cross-validation approach, in which 3% of valid data are set apart at the beginning of the reconstruction. These data are taken in the form of clouds to ensure that the error of the cross-validation is representative of the whole dataset (see Beckers et al. 2006). At each new EOF mode calculation, the root mean square (RMS) error between these initial data and the reconstruction proposed by DINEOF is calculated, and the number of EOFs that minimizes this RMS error is considered the optimal number of EOFs for the reconstruction of the missing TSM data. The RMS error for the reconstruction using these five EOFs is 1.7 mg l−1 (the standard deviation of the dataset is 3.1 mg l−1). The five retained modes explain 94% of the total variance.
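The cross-validation used to select the number of EOF modes can be illustrated with the following sketch. It is a simplification: the withheld points are drawn at random rather than in the form of clouds, the input is assumed to be a mean-removed 2-D matrix with NaN at missing points, and a fixed number of iterations is used for each candidate number of modes. All function and parameter names are hypothetical.

```python
import numpy as np

def _fill(x, hidden, n_modes, n_iter=100):
    """Iteratively fill the hidden entries of x with a truncated EOF basis."""
    filled = np.where(hidden, 0.0, x)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
        filled[hidden] = recon[hidden]
    return filled

def choose_n_modes(anomalies, max_modes, cv_fraction=0.03, seed=0):
    """Return (best number of modes, list of cross-validation RMS errors)."""
    x = np.asarray(anomalies, float)
    missing = np.isnan(x)
    rng = np.random.default_rng(seed)
    valid_idx = np.flatnonzero(~missing)
    n_cv = max(1, int(round(cv_fraction * valid_idx.size)))
    # Assumption: random points are withheld (the text withholds cloud-shaped patches).
    cv_idx = rng.choice(valid_idx, size=n_cv, replace=False)
    cv_mask = np.zeros(x.shape, dtype=bool)
    cv_mask.flat[cv_idx] = True
    hidden = missing | cv_mask
    rms = []
    for k in range(1, max_modes + 1):
        filled = _fill(x, hidden, k)
        rms.append(float(np.sqrt(np.mean((filled[cv_mask] - x[cv_mask]) ** 2))))
    return int(np.argmin(rms)) + 1, rms
```

The number of modes with the smallest RMS error at the withheld points is retained, and that RMS error also serves as an estimate of the reconstruction error.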

2.4 Methodology

DINEOF and MODIS TSM datasets are validated against Cefas-TSM data at Warp Anchorage and West Gabbard locations, for the four groups:

  • G: Cefas-TSM, MODIS and DINEOF TSM products at good quality MODIS pixels

  • Q: similar to G, but at questionable MODIS pixels

  • M: Cefas-TSM and DINEOF TSM products at pixels where MODIS is missing

  • D: Cefas-TSM and DINEOF TSM products (G + Q + M).

Validation of TSM products conducted on the separate groups G and Q will indicate how DINEOF is performing when the input data are of good or bad quality. In group M, the validation will give information on how well DINEOF predicts TSM data when these are missing in the satellite dataset, and in group D, it will show the global performance of DINEOF.

As a first step in this validation, MODIS TSM and Cefas-TSM taken from group G are used to compute the correlation coefficients, root mean square errors and the relative errors defined by:

$$ \varepsilon = \sum\limits_{{i = 1}}^n {\frac{{\left| {{\text{TS}}{{\text{M}}_i}\,{ - }\,{\text{TS}}{{\text{M}}_{{{\text{OBS}}i}}}} \right|}}{{n\,{\text{TS}}{{\text{M}}_{{{\text{OBS}}i}}}}}} $$

where n is the size of the G dataset. This first step aims to inspect the validity of a long TSM time series retrieval from MODIS imagery at two fixed locations. A validation study was already carried out by Nechad et al. (2010) on the basis of a small dataset consisting of in situ reflectances and TSM in the SNS and of MODIS-derived TSM and concurrent in situ TSM (21 matchups). However, in that study, there was no information on the particulate specific backscatter. The Cefas data offer a larger dataset of in situ OBS and \( {\text{TSM}}_{\text{OBS}} \), which may be affected by the local variations of the optical properties of particulates. The MODIS-derived TSM does not take these variations into account because the algorithm (Eq. 1) assumes an average specific particulate backscatter coefficient, \( b_{\text{bp}}^{{*{\text{667nm}}}} \). Hence, the impact of this main limitation of the TSM algorithm will be examined.

In a second step, DINEOF TSM products from datasets G, Q, M and D are compared to Cefas-TSM and TSM retrieval by DINEOF in datasets G and Q is compared to that by MODIS.
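The statistics used in both validation steps (correlation coefficient, RMS error, the relative error ε defined above and the regression slope) can be computed as in this sketch. The function name and the exclusion of non-finite or non-positive in situ values are assumptions.

```python
import numpy as np

def validation_stats(tsm_product, tsm_obs):
    """Compare a TSM product (MODIS or DINEOF) with Cefas-TSM matchups."""
    p = np.asarray(tsm_product, float)
    o = np.asarray(tsm_obs, float)
    ok = np.isfinite(p) & np.isfinite(o) & (o > 0)  # assumption: drop unusable pairs
    p, o = p[ok], o[ok]
    slope, intercept = np.polyfit(o, p, 1)  # regression of the product on in situ TSM
    return {
        "n": int(p.size),
        "r": float(np.corrcoef(o, p)[0, 1]),
        "rmse": float(np.sqrt(np.mean((p - o) ** 2))),
        "relative_error": float(np.mean(np.abs(p - o) / o)),  # epsilon
        "slope": float(slope),
        "intercept": float(intercept),
    }
```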

3 Results

3.1 MODIS TSM

Figure 5 shows a very good agreement between MODIS TSM products and Cefas-TSM at WA (correlation coefficient of 85.4% and 29% relative error). There is only a slight general underestimation of TSM by MODIS, as indicated by the slope of 0.96 of the regression line. At WG, the correlation is slightly higher (87.2%), although a much higher scatter in MODIS TSM is noticed for TSM < 10 mg l−1, giving a higher relative error of 34%.

Fig. 5

The MODIS TSM vs Cefas-TSM in Warp Anchorage (red, left figure) and West Gabbard (blue, right figure) locations, from dataset G. The solid line is the regression line between in situ and MODIS TSM products, the dotted line is the 1:1

3.2 DINEOF TSM

DINEOF TSM products derived for Warp Anchorage in group Q have a relative error quite similar to that in group G (about 4% lower; Fig. 6). Whereas the MODIS products show a larger scatter of points in group Q than in G, this is less noticeable in the DINEOF data. The global performance of DINEOF is about 11% lower than that of MODIS in dataset G and only 6% lower in Q. This means that DINEOF products are not degraded by the bad quality products from MODIS. Note that while MODIS overestimates TSM in group Q, DINEOF provides underestimated TSM products. Nevertheless, high correlation coefficients are found for G and Q, about 70% and 82%, respectively.

Fig. 6

The MODIS and DINEOF TSM versus \( {\text{TSM}}_{\text{OBS}} \) in WA, from groups G (upper figures) and Q (bottom). The solid line is the regression line between \( {\text{TSM}}_{\text{OBS}} \) and MODIS (and DINEOF) TSM

Figure 7 displays DINEOF and MODIS TSM products obtained at West Gabbard, from datasets G (140 points) and Q (42 points). It is remarkable how DINEOF could correct the MODIS input TSM in the group Q, by treating outliers, and generate TSM data with a better accuracy and 37% relative errors, which is significantly lower than 82% relative errors in MODIS TSM. Again, DINEOF showed an overall underestimation of TSM, with slope values of 0.89 and 0.82 respectively for datasets G and Q.

Fig. 7

The MODIS and DINEOF TSM versus \( {\text{TSM}}_{\text{OBS}} \) in WG, from groups G (upper figures) and Q (bottom). The solid line is the regression line between \( {\text{TSM}}_{\text{OBS}} \) and MODIS (and DINEOF) TSM

Note that the scatter of DINEOF TSM products exhibited in Figs. 6 and 7 is rather uniform along the TSM values, which is not the case with MODIS TSM products which generally have higher scatter in the lower TSM range.

Figure 8 shows the DINEOF TSM products obtained in group M, where MODIS TSM data are missing. Significant correlation coefficients were found between in situ and DINEOF TSM at WA (68%) and WG (64%). The relative errors in the prediction of TSM are 5% lower at WA than at WG, although the mean absolute errors are lower at WG (5.64 mg l−1) than at WA (11.85 mg l−1). Hence, considering the range of the data (higher at WA than at WG), a comparable performance could be expected if both datasets covered the same range of TSM values.

Fig. 8

The DINEOF TSM vs \( {\text{TSM}}_{\text{OBS}} \) in WA (left) and WG (right) from dataset M. The solid line is the regression line between \( {\text{TSM}}_{\text{OBS}} \) and DINEOF TSM. The dotted line is the 1:1 line

The global assessment of the performance of DINEOF in reconstructing TSM time series is conducted by comparing its TSM data with Cefas-TSM in group D (not shown here). This gives high correlation coefficients at WA (69%) and WG (65%), with relative errors of 39% and 43%, respectively. This is only about 10% less accurate than the satellite-derived TSM products.

4 Discussion

Due to meteorological and hydrodynamic effects, TSM is resuspended in the water column with varying composition and particle size. These changing features modify the specific inherent optical properties (specific absorption and backscattering) of water. In particular, the particle size distribution affects the specific scattering of particles (Doxaran et al. 2009), which consequently varies seasonally, daily or at shorter time scales (e.g. with phytoplankton blooms, tides, wind and current effects). The remotely sensed TSM concentrations estimated from marine reflectances using Eq. 1, which assumes an average TSM-specific backscatter, may therefore largely over- or underestimate the TSM in changing sea conditions. To assess the quality of the MODIS-derived TSM products, the Cefas-TSM datasets, which capture such variations, are considered as the sea truth data to which the satellite-derived TSM are compared.

The results of MODIS TSM validation show a good capability of the global TSM algorithm to retrieve TSM concentrations in turbid waters, with an error in TSM estimates less than 30%, which includes atmospheric correction errors. Higher relative errors for MODIS estimates of low TSM at WG may indicate a higher variability in particle composition and sizes in clearer waters, at WG, or a less accurate atmospheric correction.

A linear regression analysis was applied to MODIS TSM and Cefas OBS data to derive time-averaged calibration factors α MOD and β MOD (respectively 0.82 FTU−1 mg l−1 and 8.11 mg l−1 at WA and 1.04 FTU−1 mg l−1 and 3.44 mg l−1 at WG) and retrieve a new set of in situ-like TSM data from OBS: \( {\text{TS}}{{\text{M}}_{{{\text{MOD}} - {\text{OBS}}}}} = {\alpha_{\text{MOD}}}{\text{OBS}} + {\beta_{\text{MOD}}} \). Note that the new α MOD and β MOD are within the range of α and β reported in Table 1, except β MOD for WA. This dataset shows a better agreement with MODIS TSM giving high correlation coefficients of 88.5% and 88.9%, respectively, at WA and WG, quite comparable to the correlations found between MODIS TSM and Cefas-TSM datasets. However, about 4% lower relative errors are obtained: 25% and 30%, respectively, at WA and WG.

This means that the MODIS TSM products are more closely related to the OBS data than to the Cefas-TSM data. The use of the MODIS-derived calibration coefficients α MOD and β MOD provides a dataset of \( {\text{TSM}}_{{\text{MOD}} - {\text{OBS}}} \) in which the information about the specific particle backscattering, contained in the Cefas-TSM data, has simply disappeared. If the specific particle backscattering coefficient were allowed to vary in Eq. 1, through the parameterization of the coefficient A (e.g. with the seasons, to account for the varying particle size distribution, especially during the spring blooms), then the MODIS TSM would be more correlated to Cefas-TSM.

In clearer waters, the TSM estimation from MODIS is less accurate: relative errors are higher (although absolute errors are low) because of the atmospheric correction and because the TSM algorithm was designed for turbid waters, as also stressed in Nechad et al. (2010).

The DINEOF technique was able to fill in the gaps in TSM time series (as shown in Figs. 9 and 10 for WG and WA, respectively) with a quite satisfactory accuracy in turbid waters (errors less than 40%). This technique succeeded in removing the noise from the input data, which was mainly due to bad quality MODIS TSM products at high TSM levels. However, in the lower TSM ranges, DINEOF could not cope with the general noise affecting low TSM products. The outlier treatment may then be tailored to give less confidence to the daily variability of MODIS TSM products in the low TSM range.

Fig. 9

TSM time series retrieved from MODIS and DINEOF from 2003 to 2006 at WG (respectively blue-filled and empty circles) and Cefas-TSM at WG (grey lines)

Fig. 10

TSM time series retrieved from MODIS and DINEOF from 2003 to 2006 at WA (red-filled and empty circles) and Cefas-TSM at WA (grey lines)

One strong point of the DINEOF technique is that it provides a better quality TSM time series at the pixels with questionable quality. The outlier treatment proved to efficiently remove suspicious data, and DINEOF reconstruction gives reasonable TSM products. Note that only about 10% less accuracy is obtained from these DINEOF TSM products with regard to the accuracy of input datasets. Gaining a better performance in TSM retrieval from satellites by improving the atmospheric correction in both turbid and clear waters, and calibrating the TSM algorithm seasonally, will translate into a better performance in DINEOF reconstruction of TSM maps.

Finally, this global error level of about 40% in DINEOF products may be the limit reached by this method of reconstruction, which takes into account only the MODIS TSM maps. A multivariate DINEOF (Alvera-Azcárate et al. 2007), fed by hydrodynamic fields which strongly affect TSM (e.g. bathymetry, tide cycle, bottom stress), is expected to significantly improve the results.