1 Introduction

Due to its robustness, the Most Frequent Value (MFV) method (Steiner 1991) is well suited to processing noisy datasets in geophysical (Dobróka et al. 1991; Szabó and Balogh 2016; Szabó and Balogh 2018) and hydrogeological research (Szűcs and Zákányi 2007) and in various other fields (Zhang 2017). Such data systems, especially those with extreme noise, are also present in the field of spatial informatics, including digital elevation modelling and satellite transmission. A similarly widely used, but less robust and sophisticated technique is median filtering (Stone 1995; Huang et al. 1979).

In this paper, a new median filtering method, improved by histogram operations and weighted averaging, is presented and compared to the original median filter and the MFV-based procedure. A modified version of the method, which aims to eliminate zero-mean noise, is also presented; it contains Steiner's MFV filter as a core component.

Both versions of the proposed method aim to eliminate scattered noise from digital elevation models in a moving-window manner (i.e. the procedure corrects the central element of the current window). The study was conducted with noise applied to four different percentages of the data points.

2 Input dataset

The analysed data consisted of three digital elevation models of different areas with 25 m spatial resolution, created using Topo to Raster interpolation in ArcGIS by digitizing the contour lines, elevation points and water network of 1:10,000 scale EOTR map sheets. The histograms of the three datasets are shown in Fig. 1. The first dataset has a mean of 211.99 and a standard deviation of 8.440; the second has a mean of 191.31 and a standard deviation of 8.752; the third has a mean of 96.39 and a standard deviation of 0.457.

Fig. 1 Histograms of input data matrices

For each of the resulting digital elevation models, normally distributed noise was first added to the data matrices, with the standard deviation chosen so that the average noise amplitude is around 1% of the mean of the data matrix. After this, additional outlier, non-zero mean impulse noise was added randomly to 10, 15, 20 and 25 percent of the points. To achieve this, a normally distributed noise vector is generated for every row, with its mean equal to the mean of the given data row and its standard deviation chosen so that the average noise amplitude is around 100% of the mean of the data row.

Then the elements of the noise vector generated for a given row were randomly scattered across the row with a multiplier between 0.1 and 0.7, giving an additional ~ 10–70% noise to the data in the different test cases (referred to as 0.1–0.7 noise amplitude below).
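
As an illustration, the following minimal NumPy sketch reproduces this noise-generation scheme under our reading of the description; the function name, the seed and the exact standard deviation scaling are our assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(42)  # illustrative seed

def add_outlier_noise(dem, outlier_fraction=0.15, amplitude=0.3):
    """Background ~1% Gaussian noise plus scattered non-zero mean impulse noise."""
    noisy = dem.astype(float)
    # Background noise: zero mean, std set so the typical amplitude is around
    # 1% of the matrix mean (an approximation of the paper's wording).
    noisy += rng.normal(0.0, 0.01 * dem.mean(), size=dem.shape)
    for i in range(dem.shape[0]):
        mu = dem[i].mean()
        n_out = int(round(outlier_fraction * dem.shape[1]))
        cols = rng.choice(dem.shape[1], size=n_out, replace=False)
        # Impulse noise: mean and std tied to the row mean, scaled by the
        # 0.1-0.7 amplitude multiplier and scattered randomly across the row.
        noisy[i, cols] += amplitude * rng.normal(mu, mu, size=n_out)
    return noisy
```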

3 Introduction of the weighted median (WM) method

The method produces the corrected value of the central element at each position of a moving window passing through the image matrices, using a weighted mean. The version presented below is intended mainly for eliminating non-zero mean noise (caused, for example, by measurement device problems or long-distance data transfer).

The weighted mean is calculated for each data point with two weights (\(w_{1}\), \(w_{2}\)) defined below. As an initial step, two independent window-narrowing processes are carried out before calculating the weights. These narrowed windows are created from the current (5 × 5) data window at every window position. Window sizes of 9 (3 × 3) and 49 (7 × 7) elements were also tested, but they did not prove optimal for the problem.

The first window narrowing proceeds as follows.

The range of the values of the elements in the moving window is divided into two and then into three bins of equal width, and two ratios are generated:

  • \(\lambda_{1}\): the ratio of the element counts of the larger and the smaller bin of the two,

  • \(\lambda_{2}\): the ratio of the element counts of the largest and second largest bin of the three.

If \(\lambda_{1}>\lambda_{2}\), the new set (\(D\)) is defined as the bin with the largest element count of the two bins, otherwise as the bin with the largest element count of the three bins. Then the value of \(m_{s}\) is defined as

\(m_{s} = {\text{median}}\left( {\text{D}} \right)\).

Thus, a higher \(\lambda\) value is used as an indication of a sharper cut. The numbers 2 and 3 were chosen as bin counts because with a split into 4 (or more) bins, a bin might not contain a sufficient number of elements from the initial 5 × 5 window for the further steps described below.
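
A minimal NumPy sketch of this first narrowing step, under our reading of the description; the function name is ours, values on shared bin edges are handled approximately, and the guard against empty bins is our addition.

```python
import numpy as np

def narrow_ms(window):
    """Histogram-based narrowing of a flattened 5x5 window; returns m_s."""
    v = np.asarray(window, dtype=float).ravel()
    counts2, edges2 = np.histogram(v, bins=2)
    counts3, edges3 = np.histogram(v, bins=3)
    c2, c3 = np.sort(counts2)[::-1], np.sort(counts3)[::-1]
    lam1 = c2[0] / max(c2[1], 1)   # larger / smaller count of the two bins
    lam2 = c3[0] / max(c3[1], 1)   # largest / second largest count of the three
    counts, edges = (counts2, edges2) if lam1 > lam2 else (counts3, edges3)
    k = np.argmax(counts)          # bin with the largest element count
    D = v[(v >= edges[k]) & (v <= edges[k + 1])]
    return np.median(D)
```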

Then another, independent window-narrowing process is carried out to determine the value of parameter \(m_{e1}\). For this, the elements of the original moving window are sorted by value and divided into two and then into three ranges of equal width (based on the set of values).

For example, in the case of splitting into three, if the ordered vector is \({\varvec{v}}\), \(\mathrm{max}\left({\varvec{v}}\right)\) is its highest element and \(\mathrm{min}\left({\varvec{v}}\right)\) the lowest, then the third with the lowest values will contain the values lower than \(\mathrm{min}\left({\varvec{v}}\right)+(\mathrm{max}\left({\varvec{v}}\right)-\mathrm{min}\left({\varvec{v}}\right))/3\).

The ratio \(\lambda_{3}\) is calculated from the two-bin split: its value is \(1/n\), where \(n\) is the element count of the two-bin split excluding the highest-valued bin (i.e. the count of the lower-valued bin). The ratio \(\lambda_{4}\) is then calculated from the three-bin split: its value is \(1/m\), where \(m\) is the total element count of the three-bin split excluding the highest-valued bin.

If \(\lambda_{3}>\lambda_{4}\), the highest-valued bin of the two-bin split is taken as the chosen set (\(E\)); otherwise the highest-valued bin of the three-bin split is taken. Finally, \(m_{e1}\) is the average of the elements of the chosen set.

A similar value, \(m_{e2}\), is determined with the same method, but by splitting the original window into 3 and 5 parts (instead of 2 and 3). With this step, a higher division number is reflected in the result if the current window's value set allows it (i.e. if the element counts of the new intervals are not zero).
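
The second narrowing can be sketched in the same spirit; one function covers both \(m_{e1}\) (2/3 splits) and \(m_{e2}\) (3/5 splits). The function name and the zero-count guard are ours.

```python
import numpy as np

def narrow_me(window, splits=(2, 3)):
    """Returns the mean of the highest-valued bin of the sharper split."""
    v = np.asarray(window, dtype=float).ravel()
    tops, lams = [], []
    for b in splits:
        counts, edges = np.histogram(v, bins=b)
        below_top = counts[:-1].sum()            # bins other than the highest-valued one
        lams.append(1.0 / max(below_top, 1))     # lambda_3 resp. lambda_4
        tops.append(v[v >= edges[-2]])           # elements of the highest-valued bin
    E = tops[0] if lams[0] > lams[1] else tops[1]
    return E.mean()

# m_e1 = narrow_me(w, (2, 3));  m_e2 = narrow_me(w, (3, 5))
```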

As \(m_{e1}\) and \(m_{e2}\) are both calculated on subsets chosen from the largest values of the window's value set, both are related to maximums. An important difference between \(m_{e1}\), \(m_{e2}\) and \(m_{s}\) is that for \(m_{e1}\) and \(m_{e2}\) the mean is calculated, while for \(m_{s}\) the median is calculated, with different bin numbers. In addition, for \(m_{e1}\) and \(m_{e2}\) the narrowed set contains the largest-valued elements, while for \(m_{s}\) it contains the largest number of elements (which are not necessarily the largest values).

Now we can calculate the first weight (\({w}_{1}\)) of the current point’s weight vector as

$$w_{1} = (m_{s} /m_{e1} )*\alpha,$$
(1)

where \(\alpha\) is a scaling factor ensuring that the value of \(w_{1}\) falls within the same range as \(w_{2}\), described below. Its value (\(\alpha = 1/3\)) was determined experimentally for this purpose.

We can also define \(m_{w }\) as the median of the values in the original moving window.

Using the values described above, three sub-weights are produced as follows (all of which play a role in determining the weight \(w_{2}\)).

$$w_{a} = \frac{1}{{max_{1} }}*\left( {m_{e1} - max_{1} } \right) + 1,$$
(2)

where \(max_{1}\): \({\text{max}}\left( {m_{w } ,m_{e1} } \right)\),

$$m_{as} = \left| {m_{e1} - m_{e2} } \right|,$$
(3)
$$w_{p} = \frac{\beta }{{max_{2} }}*\left( {m_{as} - max_{2} } \right) + 1,$$
(4)

where \(max_{2} :\max \left( {m_{w } ,m_{e1} ,m_{e2} } \right),\)

$$w_{p2} = \frac{\gamma }{{max_{3} }}*\left( {m_{as} - max_{3} } \right) + 1,$$
(5)

where \(max_{3} :\max \left( {m_{w } ,mean\left( {window} \right)} \right).\)

The max value applied in a given sub-weight (\(w_{a}\), \(w_{p}\), \(w_{p2}\)) always includes the median of the window, as well as the mean (\(m_{e1}\), \(m_{e2}\)) or median (\(m_{s}\)) value of the narrowed window.

In Eqs. 4 and 5, fewer elements are omitted from the original window (because they contain \(m_{e2}\) via \(m_{as}\)), so the sub-weights \(w_{p}\) and \(w_{p2}\) are both calculated with a smaller multiplier than in Eq. 2. The values of the constants \(\beta\) and \(\gamma\) are chosen as 0.5 (the adjustment procedure's results can be seen in Sect. 6).

Both \(w_{p}\) and \(w_{p2}\) have a corrective role. Their values can be high if there is a large difference between the averages of the subsets obtained by splitting the window into 3 and 5 parts (\(m_{e1}\) and \(m_{e2}\)). As can be seen in Fig. 2, a large difference between the values of \(m_{e1}\) and \(m_{e2}\) results in a large \(L_{1}\) norm error. Thus, a large difference indicates that the histogram operations at the current window position may distort the result, so an increasing difference increases the weight \(w_{2}\) (i.e. that of the conventional median without histogram operations).

Fig. 2 \(L_{1}\) norm and \(m_{as}\) value relation

The sub-weight \(w_{a}\) takes a value different from 1 if the median of the original, unnarrowed window (\(m_{w}\)) is greater than the average of the elements of the narrowed window (\(m_{e1}\)). Because the narrowed window contains the largest values of the subsets, an outstanding difference between the median of the original window and the mean of this narrowed window indicates that the histogram operations at the current window position are distorting the result. As before, a large difference between \(m_{e1}\) and \(max_{1}\) results in a large \(L_{1}\) norm error.

Therefore, this should be reflected in the final weight vector \(\mathbf{w}\) either as a reduction of \(w_{1}\) or as an increase of \(w_{2}\) (i.e. an increase in the weight of the traditional median result). The latter is achieved by using \(w_{a}\) in the weight \(w_{2}\). Since \(w_{a}\) is the most important of the three correction factors (\(w_{a}\), \(w_{b}\), \(w_{c}\)), its square is included in the formula of \(w_{2}\). Squaring achieves the stronger effect safely: the maximum value of the weight before adding one is 1, so squaring will not produce an extreme weight value even at the maximum.

The + 1 in the formulas of the partial weights \(w_{p}\) and \(w_{p2}\) is included because both contain a subtraction of a maximum value, which in most cases results in a negative value, so the constant shifts them into the positive range. In the formula of \(w_{a}\), the role of adding 1 is to shift its minimum into the positive range (so that its squared value can scale the weight \(w_{2}\)).

Finally, the following two weights (\(w_{b}\) and \(w_{c}\)) are produced using \(w_{p}\) and \(w_{p2}\), respectively.

$$w_{b} = 1 + \frac{{w_{p} }}{2},$$
(6)

Since the maximum value of the partial weight \(w_{c}\) is not a function of the different narrowed windows, but of the median or average value of the original window, this partial weight is taken with a smaller constant:

$$w_{c} = 0.5 - \frac{{w_{p2} }}{2}.$$
(7)

With the components defined above, the weight \(w_{2}\) takes the following form

$$w_{2} = w_{a}^{2} *w_{b} *w_{c}.$$
(8)

At this point, we know the weight vector \({\mathbf{w}}\) of the current data point

$${\mathbf{w}} = \left[ { w_{1} w_{2} } \right].$$
(9)

In the weight vector \({\mathbf{w}}\) (Eq. 9), the weights act on the median of the current data window (\(m_{w}\)) and on the median of the reduced set of the same window (\(m_{s}\)): \(w_{1}\) weights the latter and \(w_{2}\) the former, as follows (for example, for the k-th element of the data matrix):

$$res_{WMk} = (w_{1} *m_{s} + w_{2} *m_{w} )/(w_{1} + w_{2} ).$$
(10)

As described above, the median of the narrowed window (\(m_{s}\)) and the original window's median (\(m_{w}\)) are weighted at every window position to produce the final result for the current point. The weight of \(m_{s}\) is \((m_{s}/m_{e1})*\alpha\) (i.e. the median divided by the average of the maximal values of the narrowed window). If this ratio is low for a given window position due to noise (a high average of maximums \(m_{e1}\)), then the weight of \(m_{s}\) should be proportionally low, because otherwise the high values of the outlier maximums would degrade the final result. In such cases the weight of \(m_{w}\) will be proportionally high, not only because of the low weight of \(m_{s}\), but because \(m_{w}\) is weighted by \(w_{a}\), \(w_{b}\) and \(w_{c}\), all of which contain the \(m_{e1}\) or \(m_{e2}\) values.
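
The full weighting of Eqs. (1)–(10) can then be assembled schematically as follows, reusing the narrow_ms and narrow_me helpers sketched above. The constant values follow Sects. 3 and 6; everything else is our interpretation, not the authors' code.

```python
import numpy as np

ALPHA, BETA, GAMMA = 1.0 / 3.0, 0.5, 0.5

def wm_correct(window):
    """Corrected value of the central element for one 5x5 window position."""
    v = np.asarray(window, dtype=float).ravel()
    m_w = np.median(v)
    m_s = narrow_ms(v)                # first narrowing (median)
    m_e1 = narrow_me(v, (2, 3))       # second narrowing (mean of maxima)
    m_e2 = narrow_me(v, (3, 5))
    w1 = (m_s / m_e1) * ALPHA                          # Eq. (1)
    max1 = max(m_w, m_e1)
    w_a = (m_e1 - max1) / max1 + 1.0                   # Eq. (2)
    m_as = abs(m_e1 - m_e2)                            # Eq. (3)
    max2 = max(m_w, m_e1, m_e2)
    w_p = BETA / max2 * (m_as - max2) + 1.0            # Eq. (4)
    max3 = max(m_w, v.mean())
    w_p2 = GAMMA / max3 * (m_as - max3) + 1.0          # Eq. (5)
    w_b = 1.0 + w_p / 2.0                              # Eq. (6)
    w_c = 0.5 - w_p2 / 2.0                             # Eq. (7)
    w2 = w_a**2 * w_b * w_c                            # Eq. (8)
    return (w1 * m_s + w2 * m_w) / (w1 + w2)           # Eq. (10)
```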

4 Most Frequent Value Method

A much more reliable statistical characteristic than the arithmetic mean is the weighted mean, obtained by assigning a small weight (\(w_{k}\)) to data points (\(X_{k}\)) far from the majority of the data and a larger weight to points in the region of highest data density (Eq. 11).

$$M = \mathop \sum \limits_{k = 1}^{N} X_{k} w_{k} \left[ {\mathop \sum \limits_{k = 1}^{N} w_{k} } \right]^{ - 1} (k = 1,2, \ldots ,N ).$$
(11)

The k-th weight is chosen by Steiner (1991) as follows:

$$w_{k} = \varepsilon^{2} /\left[ {\varepsilon^{2} + \left( {X_{k} - M} \right)^{2} } \right].$$
(12)

In the above, \(N\) is the number of data and \(\varepsilon\) is the dihesion, a scalar parameter. If \(\varepsilon\) is large, all data are given nearly equal weights and outliers spoil the estimate; if \(\varepsilon\) is too small, some of the data may effectively be ignored.

The weighted mean defined by Eq. (11), called the most frequent value (\(M\)), should be known in advance in order to assign weights with maximum values at its location and smaller and smaller weights away from it. Therefore, the procedure requires an iterative algorithm in which \(M\) and \(\varepsilon\) are determined jointly. In the first iteration step, the dihesion can be estimated from the sample using the following formula:

$$\varepsilon_{1} = \sqrt{\frac{3}{2}} \left[ {\max \left( {X_{k} } \right) - \min \left( {X_{k} } \right)} \right],$$
(13)

while the initial value \(M_{1}\) is preferably chosen as the sample mean or median. In this study, the median was used.

In subsequent iteration steps, \(M\) and ε can be derived from each other according to the following procedure:

$$\varepsilon_{j + 1}^{2} = \frac{{3\mathop \sum \nolimits_{k = 1}^{N} \frac{{\left( {X_{k} - M_{j} } \right)^{2} }}{{\left[ {\varepsilon_{j}^{2} + (X_{k} - M_{j} )^{2} } \right]^{2} }}}}{{\mathop \sum \nolimits_{k = 1}^{N} \frac{1}{{\left[ {\varepsilon_{j}^{2} + (X_{k} - M_{j} )^{2} } \right]^{2} }}}} \leftrightarrow M_{j + 1} = \frac{{\mathop \sum \nolimits_{k = 1}^{N} \frac{{\varepsilon_{j + 1}^{2} }}{{\varepsilon_{j + 1}^{2} + (X_{k} - M_{j} )^{2} }}X_{k} }}{{\mathop \sum \nolimits_{k = 1}^{N} \frac{{\varepsilon_{j + 1}^{2} }}{{\varepsilon_{j + 1}^{2} + (X_{k} - M_{j} )^{2} }}}}.$$
(14)
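
A minimal sketch of the iteration of Eqs. (11)–(14); the fixed iteration count and the median start follow Sects. 4 and 7, while the function name and the stopping rule are our choices.

```python
import numpy as np

def mfv(x, n_iter=20):
    """Most Frequent Value of a sample by joint iteration of M and epsilon."""
    x = np.asarray(x, dtype=float).ravel()
    eps = np.sqrt(3.0 / 2.0) * (x.max() - x.min())   # Eq. (13), initial dihesion
    M = np.median(x)                                 # initial M_1 (median, per the paper)
    for _ in range(n_iter):
        r2 = (x - M) ** 2
        d = (eps**2 + r2) ** 2
        eps = np.sqrt(3.0 * np.sum(r2 / d) / np.sum(1.0 / d))  # Eq. (14), left
        w = eps**2 / (eps**2 + r2)                   # Eq. (12)
        M = np.sum(w * x) / np.sum(w)                # Eqs. (11)/(14), right
    return M
```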

5 Description of the resulting numerical values

The following metrics are used in the paper to compare the results of the different filtering methods. The RMSE (Root Mean Square Error) values are calculated for the three procedures as follows (where \(inp\) is the noise-free data matrix):

$$RMSE_{St} = \sqrt {\frac{{\sum\limits_{i = 1}^{N} {(res_{St{i}} - inp_{i} )^{2} } }}{N}},$$
(15)

where \(res_{St}\): matrix corrected by Steiner’s MFV,

$$RMSE_{Med} = \sqrt {\frac{{\sum\limits_{i = 1}^{N} {(res_{Med{i}} - inp_{i} )^{2} } }}{N}},$$
(16)

where \(res_{Med}\): matrix corrected by median method,

$$RMSE_{WM} = \sqrt {\frac{{\sum\limits_{i = 1}^{N} {(res_{WM{i}} - inp_{i} )^{2} } }}{N}},$$
(17)

where \(res_{WM}\): matrix corrected by weighted median.

Standard deviation of the error for the three procedures:

$$Std_{St} = \overline{{std\left( {res_{St} - inp{ }} \right)}},$$
(18)
$$Std_{Med} = \overline{{std\left( {res_{Med} - inp{ }} \right)}},$$
(19)
$$Std_{WM} = \overline{{std\left( {res_{WM} - inp{ }} \right)}},$$
(20)

where \(res_{St}\), \(res_{Med}\), \(res_{WM}\), \(inp\): as above.

\(L_{1}\) norm:

$$L_{1St} = \parallel res_{St} - inp\parallel _{1},$$
(21)
$$L_{1Med} = \parallel res_{Med} - inp \parallel _{1},$$
(22)
$$L_{1WM} =\parallel res_{WM} - inp \parallel _{1},$$
(23)

where \(res_{St}\), \(res_{Med}\), \(res_{WM} ,\) inp: as above.
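
For completeness, the three metrics in NumPy form; interpreting the overline in Eqs. (18)–(20) as the mean of column-wise standard deviations (MATLAB-style std of a matrix) and the \(L_{1}\) norm as an entrywise sum are our assumptions.

```python
import numpy as np

def rmse(res, inp):
    return np.sqrt(np.mean((res - inp) ** 2))     # Eqs. (15)-(17)

def std_metric(res, inp):
    return np.std(res - inp, axis=0).mean()       # Eqs. (18)-(20)

def l1_dist(res, inp):
    return np.abs(res - inp).sum()                # Eqs. (21)-(23)
```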

6 Adjusting constants

Regarding the 0.5 constant in \(w_{p}\), it was tested in some randomly chosen test cases how the \(L_{1}\) norm distance between the noise-free matrix and the weighted-median-corrected matrix changes for different values of the constant. The value of the norm decreased monotonically in every such case, as shown in the example of Table 1 (10% noise rate, 0.3 noise amplitude):

Table 1 Example of adjusting constant value used for calculating wp

Similarly, the 0.5 constant of \(w_{p2}\) was also tested and produced similar results, as shown in the example of Table 2 (with the same parameters as in the previous case):

Table 2 Example of adjusting constant used for calculating \(w_{p2}\)

All constants were examined with a few chosen values; however, their global optimisation is beyond the scope of this paper.

7 Comparative results

In all cases, a 5 × 5 window was moved through the matrices, and both the Steiner MFV method (used as a filter; Dobróka 2021) and the weighted median method described here corrected the value of the window's central element, with all elements of the window as input.

In all cases, the number of iterations of the Steiner filter was 20 and the initial value (\(M_{1}\)) was the median of the window elements.

In all cases, the results were also compared with those obtained using the classical median filter in MATLAB. The median filter was run on the same noisy data matrices with the same window size as the Steiner and weighted median methods.

Table 3 shows the values of the result metrics on the first data series, with noise on 15% of the data points and a noise amplitude multiplier of 0.3.

Table 3 Example of comparative results (10% additional outlier noise on 15% of the data points)

Another example can be seen in Fig. 3, again with noise on 15% of the data points, now with a noise amplitude multiplier of 0.5. Figure 3a shows the original data, (b) the noisy data, (c) the result of the Steiner method, (d) the result of the weighted median method and (e) the result of the classical median method.

Fig. 3 Visual example of results on the first dataset

Table 4 shows the \(L_{1}\) norm distances from the noise-free input matrix and their ratios for the weighted median method (\(L_{1WM}\)) and the Steiner method (\(L_{1St}\)), with 25% of the points contaminated with noise, as a function of the different noise amplitudes (0.1, …, 0.7) on the first data set. The values show that in two cases the Steiner method gives better results, by about 6%, while in the other cases the weighted median method proved better, by 6.3% on average (since the average of the \(L_{1WM}/L_{1St}\) ratios is 0.937).

Table 4 L1 norm values at 25% noise ratio

Table 5 shows the results of the weighted median procedure and the standard median filtering at the same noise level. In this case, the weighted median procedure proved to be better on the data set by 26.4% on average.

Table 5 L1 norm values at 25% noise ratio

Tables 6 and 7 show the same comparison for the \(L_{1}\) norm with noise affecting 20% of the points. In this case, there are noise amplitude values where the Steiner method gives a smaller distance to the noise-free matrix than the weighted median procedure, but in none of the cases could the standard median procedure achieve this. The weighted method is on average 23% better than the latter.

Table 6 L1 norm values at 20% noise ratio
Table 7 L1 norm values at 20% noise ratio

Tables 8 and 9 show the case with 15% noisy points.

Table 8 L1 norm values at 15% noise ratio
Table 9 L1 norm values at 15% noise ratio

The results of Table 8, comparing the Steiner method with the weighted median procedure, show that for a noise amplitude multiplier of 0.1 the Steiner method is superior.

The weighted median procedure outperforms the unweighted median procedure by an average of 21.2% on the first data set for 15% noisy points (Table 9). Tables 10 and 11 show the data distances of the three procedures for the case where 10% of data is contaminated by noise according to the \({L}_{1}\) norm.

Table 10 L1 norm values at 10% noise ratio
Table 11 L1 norm values at 10% noise ratio

In this case (with noise on 10% of the points), the Steiner method outperforms the weighted median method at most of the noise amplitudes (in 4 of the 7 cases, by 5.6% on average), while in the remaining three cases the WM method is better, by 13%.

Table 12 shows the RMSE values obtained by the three procedures and their ratios, for a noise exposure of 25% on the left and 20% on the right, for different noise amplitude multipliers (0.1,…,0.7) in both cases.

Table 12 RMSE values at 25% and 20% noise ratio

Table 13 shows the RMSE values as before, now for noise at 15% and 10% of the data points.

Table 13 RMSE values at 15% and 10% noise ratio

It can be seen that the weighted median method performs worse at the highest noise amplitude multiplier (0.7), but this also holds for the other methods, so the ratio to them does not deteriorate. The RMSE of the MFV method is closest to that of the weighted median method at the smallest noise amplitude multiplier (0.1). The RMSE of the weighted median method is on average 79.5% of that of the conventional median method.

As the weighted median procedure was found to be least efficient when 10% of the data points were contaminated with noise in the above tests, the standard deviations were also examined for this case. An example, compared with the standard median method, is shown in Table 14.

Table 14 Standard deviation values at 10% noise regarding the two median methods

Table 15 shows the averaged results for the first and second data sets: the minimum, maximum and average of the \(L_{1}\) norm data distance ratios as a function of the different noise levels, for the Steiner method and the weighted median method. The mean of the minima (i.e. of the cases where the largest difference favours the weighted median method) is 0.82, i.e. in these cases the method is 18% better. The average of the maxima is 1.032, i.e. in the opposite cases the Steiner method is on average 3.2% better for the two data sets combined.

Table 15 L1 norm ratios for two data sets combined

Table 16 shows the \(L_{1}\) norm results grouped in the same way, here for the two median procedures. The average of the minima is 0.692, so in the cases where the weighted median procedure is best, it is better by more than 30%. The average of the maxima is 0.86, so even in the worst cases the weighted median procedure is on average 14% better than standard median filtering for the two data sets combined.

Table 16 L1 norm ratios for two data sets combined

A less detailed comparison was carried out on the third data set (examining only the \(L_{1}\) norm ratios). This study showed characteristics similar to the previous ones; however, the Steiner method was found to be the best of the three in more cases than before. For 10% and 15% noisy data points (both with 7 different noise amplitudes, as before), the Steiner method was superior to the weighted median method in 11 of 14 cases, by 14.6% on average (regarding the \(L_{1}\) norm ratios). In the 20% and 25% cases, the weighted median method gave better results in 14 of 14 cases, by 19.04% on average.

Comparing the two median methods on this data set (again with \(L_{1}\) norm ratios), the weighted median method proved better in 27 of 28 cases, by 14.32% on average.

8 Handling zero mean noises

The previously presented version of the proposed method was developed mainly for non-zero mean noise. In the following, a second, modified version of the method is introduced, whose purpose is mainly the handling of zero-mean, normally distributed noise. This version calls upon Steiner's MFV values to correct the central element of the given data window.

8.1 Noise generation

In the first step of the noise generation process, general zero-mean noise was added to the data matrix. To achieve this, normally distributed noise was generated for every row of data, with zero mean and standard deviation 1.

To add outlier noise to a given percentage of points for the examination, additional zero-mean, normally distributed noise was added randomly to 20, 15, 10 and 5 percent of the points, with a 0.1–0.7 amplitude multiplier in all such cases (as in the previously introduced version of the method). The noise's standard deviation was always the mean of the current data row.
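
A minimal sketch of this zero-mean noise generation, under the same assumptions as before (names and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed

def add_zero_mean_noise(dem, outlier_fraction=0.10, amplitude=0.3):
    noisy = dem.astype(float) + rng.normal(0.0, 1.0, size=dem.shape)  # N(0, 1) background
    for i in range(dem.shape[0]):
        n_out = int(round(outlier_fraction * dem.shape[1]))
        cols = rng.choice(dem.shape[1], size=n_out, replace=False)
        # Outlier noise: zero mean, std equal to the row mean, scaled by the multiplier.
        noisy[i, cols] += amplitude * rng.normal(0.0, dem[i].mean(), size=n_out)
    return noisy
```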

8.2 Modified version of the weighted median method

As with the method’s previously introduced version, the first steps are histogram filterings.

First a median and then a mean value is generated from filtered windows, here with a two-step filtering for both. For producing the median value \(m_{s}\), the histogram-based filtering is the following.

Based on the set of values, the elements of the current data window are divided into two and three ranges (bins) with equal range widths, and then two ratios are generated:

  • \(\lambda_{1}\): the ratio of the largest and second largest domains out of 2,

  • \(\lambda_{2}\): ratio of the largest and second largest domains out of 3 (in both cases regarding the element count).

If \(\lambda_{1} > \lambda_{2}\), the new set (\({\text{D}}\)) will be the bin with the largest element count of the two bins, otherwise the bin with the largest element count of the three bins. Finally, the value of \(m_{s}\) is \(median\left( D \right)\).

To determine the value of \(m_{e3}\), a second narrowing process is performed. The window elements are first sorted by value and then divided into three equal-width ranges based on the set of values. Here the ratio \(\lambda_{3}\) is calculated as the ratio between the element count of the whole window and the total element count of the thirds other than the highest-valued third. Then \(\lambda_{4}\) is calculated as the ratio between the element counts of the bins with the largest and second largest element counts. Thus, \(\lambda_{3}\) and \(\lambda_{4}\) are calculated differently than in the previous version of the method.

If \(\lambda_{3} > \lambda_{4}\), the highest-valued third (the bin with the highest values) is taken as the chosen set (E); otherwise the bin with the largest element count is taken. Again, a higher \(\lambda\) value is used as an indication of a sharper cut.
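
As we read it, the modified selection of E can be sketched as follows (the function name is ours; values on shared bin edges are handled approximately, as before):

```python
import numpy as np

def choose_E(window):
    """Modified lambda_3 / lambda_4 selection of the truncated set E."""
    v = np.asarray(window, dtype=float).ravel()
    counts, edges = np.histogram(v, bins=3)
    lam3 = v.size / max(counts[:-1].sum(), 1)  # window count / count below the top third
    c = np.sort(counts)[::-1]
    lam4 = c[0] / max(c[1], 1)                 # largest / second largest bin count
    if lam3 > lam4:
        return v[v >= edges[-2]]               # the highest-valued third
    k = np.argmax(counts)                      # otherwise: the most populated bin
    return v[(v >= edges[k]) & (v <= edges[k + 1])]
```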

We take this truncated set E and distribute its values into bins. The bin width is determined with Scott's rule (Scott 1979, 1992):

$$3.5*std\left( E \right)/numel\left( E \right)^{1/3}.$$
(24)

We must also determine the number of bins in order to distribute all the values into them (a trivial step, since the bin width and the data values are known). We take the bin with the largest element count, and \(m_{e3}\) is the average of the elements of that bin.
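
The Scott's-rule binning that produces \(m_{e3}\) might then look like this (the guard for a degenerate E with zero spread is our addition):

```python
import numpy as np

def me3(E):
    """Mean of the fullest Scott's-rule bin of the truncated set E (Eq. 24)."""
    E = np.asarray(E, dtype=float).ravel()
    h = 3.5 * E.std() / E.size ** (1.0 / 3.0)   # Scott's rule bin width
    if h == 0.0:                                # all values equal: nothing to bin
        return E.mean()
    n_bins = max(int(np.ceil((E.max() - E.min()) / h)), 1)
    counts, edges = np.histogram(E, bins=n_bins)
    k = np.argmax(counts)                       # bin with the largest element count
    return E[(E >= edges[k]) & (E <= edges[k + 1])].mean()
```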

Having the values of \(m_{s}\) and \(m_{e3}\), we can replace the current window's middle element with \(m_{s}\) (forming \(w_{r\_ms}\)) and, similarly, with \(m_{e3}\) (forming \(w_{r\_me}\)). Finally, let \(w_{r\_St}\) be the window with the MFV method's result at its centre.

In the next step, we concatenate \(w_{r\_ms}\), \(w_{r\_me}\) and \(w_{r\_St}\) one by one with the original (noisy) current window, forming \(w_{u1}\), \(w_{u2}\) and \(w_{u3}\).

Now we can calculate three gradient measures in the following way:

$$\mathbf{G}\left( {x,y} \right) = \sqrt {\left(\frac{\partial w_{u}}{\partial x}\right)^{2} + \left(\frac{\partial w_{u}}{\partial y}\right)^{2}},$$
(25)
$$g = \frac{1}{\left| X \right|*\left| Y \right|}\mathop \sum \limits_{x \in X} \mathop \sum \limits_{y \in Y} G\left( {x,y} \right).$$
(26)

In the formula, \(g_{1}\) is the value of \(g\) when using \(w_{u3}\) (the window containing the MFV result), \(g_{2}\) when using \(w_{u1}\) (containing \(m_{s}\)), and \(g_{3}\) when using \(w_{u2}\) (containing \(m_{e3}\)).

If \(\min(g_{1}, g_{2}, g_{3})\) is \(g_{1}\), then in the current weight vector \({\mathbf{w}} = \left[ { w_{1} w_{2} } \right]\) the value of \(w_{1}\) is 0 and that of \(w_{2}\) is 1; in this case only the MFV method's result counts in the correction of the given data window. If the minimum is \(g_{2}\), then \(w_{1}\) is 0.15 and \(w_{2}\) is 0.85. If the minimum is \(g_{3}\), then \(w_{1}\) is 0.4 and \(w_{2}\) is 0.6, i.e. the weight of the MFV method's result is 0.6 for the given window.

As can be seen, in all of these cases we weight the Steiner MFV method's result, increasing or decreasing its weight in the correction of the current data window's central element.

As in the previous version of the method, we get poor results if the value of \(m_{as}\), i.e. \(\left| {m_{e1} - m_{e2} } \right|\), is large. To handle this, both \(m_{e1}\) and \(m_{e2}\) are calculated here as well (in the same way as in the previous version), and if the difference is greater than 2% of the mean of the raw data, then \(w_{1}\) is set to 0 and \(w_{2}\) to 1.
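
Putting the pieces of Sect. 8.2 together, the gradient-based weight selection of Eqs. (25)–(26), including the \(m_{as}\) override, can be sketched as below. np.gradient stands in for the partial derivatives, and the concatenation axis and function names are our assumptions.

```python
import numpy as np

def mean_gradient(w_u):
    gy, gx = np.gradient(w_u)                # Eq. (25): partial derivatives
    return np.sqrt(gx**2 + gy**2).mean()     # Eq. (26): averaged gradient magnitude

def select_weights(window, m_st, m_s, m_e3, m_e1, m_e2, raw_mean):
    """Returns (w1, w2); w2 weights the MFV result, per Sect. 8.2."""
    if abs(m_e1 - m_e2) > 0.02 * raw_mean:   # m_as override described above
        return 0.0, 1.0
    g = []
    for val in (m_st, m_s, m_e3):            # candidates giving g1, g2, g3
        w_r = window.copy()
        w_r[2, 2] = val                      # replace the centre of the 5x5 window
        g.append(mean_gradient(np.concatenate([w_r, window], axis=0)))
    return [(0.0, 1.0), (0.15, 0.85), (0.4, 0.6)][int(np.argmin(g))]
```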

8.3 Results of the modified version of WM filtering procedure

In Table 17, \(L_{1}\) norm ratios are shown for the first data set, comparing the WM method's results with both the MFV method's and the original median method's. In the former comparison, the WM method performed better in 26 of the 28 test cases. In those cases, the average \(L_{1}\) norm ratio was 0.985, i.e. the method gave 1.4% lower \(L_{1}\) norm values on average. In the remaining two cases, the mean ratio was 1.005, i.e. the WM method performed 0.5% worse. The best \(L_{1}\) norm ratio was 0.97, i.e. the WM method gave a 3% lower \(L_{1}\) norm value in that particular case.

Table 17 L1 norm ratios on first data set

In the comparison with the original median method, the WM method performed better in all 28 cases, by 16.4% on average (0.836 average \(L_{1}\) norm ratio). Here the best result was a 29.5% improvement (0.705 \(L_{1}\) norm ratio).

Table 18 shows results in the same structure as the previous one, here for the second data set. Compared with Steiner's MFV, the WM method gave better results according to the \(L_{1}\) norm in 23 cases (by 1.25% on average), and performed worse in the remaining 5 cases (by 0.21% on average). At its best, the WM method gave a 3.12% lower \(L_{1}\) norm value.

Table 18 L1 norm ratios on second data set

In comparison with the other median method, WM performed better in all of the cases (16% avg., 29.4% max.).

Table 19 shows the third data set's \(L_{1}\) norm ratios. Comparing the WM results with the MFV method's, the former performed better in 23 of the 28 cases (1% avg., 2.4% max.), while the MFV method was superior in 5 cases (1.4% avg.).

Table 19 L1 norm ratios on third data set

In the comparison with the conventional median method, the WM method performed better in all cases, by 10.7% on average (20.2% max.).

9 Conclusions

The effectiveness of the histogram-based weighted median procedure described above has been demonstrated for noise elimination in digital elevation model data. The method's main purpose is the elimination of outlier noise in data matrices, especially when a high percentage of the matrix points is contaminated with outlier noise.

Averaged over the investigated noise amplitudes and noise exposure percentages, the WM method outperformed the standard median filtering procedure on the different data sets by 14–23% in terms of the \(L_{1}\) norm data distance when eliminating non-zero mean noise. The version of the method developed for filtering zero-mean noise performed better than the conventional median filter by 14.3% on average.

Beyond general refinement and optimisation of the method, there is room for improvement particularly in the more effective handling of low noise exposure cases.