Statement of Novelty

The novelty of our study lies in its unique approach to sewage sludge conditioning and in the statistical evaluation of the efficiency of this process. It addresses the important problem of whether bio-preparations can improve sewage sludge quality and, consequently, its suitability for agricultural and reclamation purposes. This issue is closely related to waste disposal and the circular economy. Literature data on the use of such bio-preparations for sewage sludge conditioning are scarce. For this purpose, advanced statistical methods were applied in the study. Analysing functional data gives information about the whole process, not only about some specific time points. Our original variable selection method allows not only for data reduction but also improves the recognition of groups.

Introduction

Sewage sludge is the residue produced by the wastewater treatment process, during which liquids and solids are separated. Sewage sludge poses one of the urgent environmental problems, as it can easily cause secondary pollution, so its proper management is a very important issue. Currently, the most attractive disposal method for sewage sludge in Europe is incineration. Incineration has many advantages, but it is not a complete disposal method, as a significant volume of solids remains as ash. On the other hand, sludge is rich in nutrients such as nitrogen, phosphorus, sulphur and magnesium and contains valuable organic matter, so its environmental management is more cost-effective and helps conserve natural resources [1]. However, before sewage sludge is applied for agricultural or reclamation purposes, it must be properly treated to avoid potential deleterious effects on soil properties and to reduce the environmental and health hazards associated with raw wastes. A critical issue, especially for agricultural re-use, is sludge hygienisation. Obligatory hygienisation is achieved through stabilisation processes, which also improve the quality of the produced sludge [2]. As a result, the content of organic matter is reduced and, simultaneously, nutrients become more readily available to plants. Many strategies are applied at wastewater treatment plants to achieve this goal, and stabilisation can be provided by thermal, chemical and biological processes. Recent studies [3,4,5,6] have focused on applying combined biological and chemical methods to stabilised sewage sludge, as well as on improving the wastewater purification process. Wong et al. [7] indicated sludge stabilisation with bio-acidification as a promising solution. Thus, one of the methods gaining in popularity is the use of bio-preparations. 
Among various biological products, the EM™ preparation is widely discussed and evaluated [8]. EM™ is assumed to accelerate the mineralisation-humification transformations of organic matter, which is followed by more effective nutrient liberation. So far, this technology has been tested as a biofertiliser mainly under soil conditions [9,10,11,12]. The use of bio-preparations in waste management, especially for sewage sludge conditioning or wastewater purification, is not a popular practice, and few experiments have been reported [2, 13,14,15,16]. Unfortunately, the cited studies did not provide a definitive quantitative evaluation of the applicability of various bio-preparations. Sewage sludge is a highly heterogeneous and complex mixture, so it is not easy to explain or properly assess the effects of bio-preparations, particularly as these are very sensitive to many factors. Moreover, the analytical difficulties connected with sewage sludge examination must also be taken into consideration. As mentioned above, sewage sludge is very abundant in macro- and micronutrients, and its chemical composition must be assessed before utilisation, especially for agricultural purposes. In chemical analyses of sewage sludge, we relatively often face both a large number of nutrients and a correspondingly large number of procedures needed to determine them. These methods are time-consuming and expensive, especially when sewage sludge must be evaluated at various stages of its conditioning. In this work, we propose a way to simplify the chemical evaluation of sewage sludge. The distance covariance provides an opportunity to identify the nutrients that are mainly responsible for the chemical composition of sewage sludge, irrespective of how advanced its conditioning is.

To describe this problem precisely, a dynamic experimental system was applied to follow the transformations of sewage sludge incubated with and without EM™ preparation. The aim of the experiment was to evaluate the effect of EM™ preparation on sewage sludge conditioning. Additionally, distance covariance analysis may make it possible to point out the nutrients or groups of nutrients reflecting the fundamental changes during sewage sludge transformations. This is highly practical and valuable information, because it allows sewage sludge quality to be assessed with the same high accuracy but with fewer analytical methods, thereby reducing time, labour intensity and costs. One may also expect that only the selected, most important nutrients or groups of nutrients will need to be determined to verify the actual quality of sewage sludge.

Functional data analysis (FDA) is a relatively new branch of statistics that analyses data providing information about curves or surfaces (in particular, time series); it has started to receive attention in the statistical community, particularly for public health and biomedical applications. Commonly, time series data are treated as multivariate data because they are given as a finite discrete time series. The basic idea behind FDA [17], on the other hand, is to express discrete observations in the form of a function (curve), so that the entire measured function is treated as a single observation. The FDA approach is highly flexible in the sense that the observation times do not have to be equally spaced for all cases. FDA methods are not necessarily based on the assumption that the values observed at different times for a single subject are independent. Because the FDA approach treats the whole curve as a single entity, there is also no concern about correlations between repeated measurements [18].

For such data, we propose variable (element) selection based on the distance covariance [19, 20]. Distance covariance measures both linear and nonlinear association between random variables. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables. Variable selection is the process of choosing a subset of relevant variables for further analysis. Its central premise is that the data contain many variables that are either redundant or irrelevant and can thus be removed without much loss of information. In this way, only the few crucial elements responsible for the chemical composition of sewage sludge are retained.
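To illustrate the difference, the following Python sketch (our own illustration, not the R code used in the study) computes the sample Pearson correlation and the squared sample distance covariance of Székely et al. for a purely nonlinear relation y = x² on synthetic points symmetric about zero. Pearson's correlation is exactly zero here, whereas the distance covariance is positive; for variables with finite first moments, the population distance covariance is zero only under independence.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def double_centre(d):
    """d_kl minus row mean minus column mean plus grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[k][l] for k in range(n)) / n for l in range(n)]
    grand = sum(row) / n
    return [[d[k][l] - row[k] - col[l] + grand for l in range(n)] for k in range(n)]

def dcov2(x, y):
    """Squared sample distance covariance of two scalar samples."""
    n = len(x)
    a = double_centre([[abs(xi - xj) for xj in x] for xi in x])
    b = double_centre([[abs(yi - yj) for yj in y] for yi in y])
    return sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n ** 2

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v ** 2 for v in x]  # purely nonlinear dependence (synthetic)
print(round(pearson(x, y), 10))  # 0.0
print(dcov2(x, y) > 0)           # True
```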

Materials and Methods

Material and Incubation Conditions

Sewage sludge in this experiment was collected from a municipal wastewater treatment plant located in the Wielkopolska region (western part of Poland).

The experiment was conducted in PVC boxes (volume 5 L) and employed a randomised factorial design with one sewage sludge and one amendment, the EM™ preparation, applied at two doses: dissolved (water:EM™ preparation ratio = 1:5) and concentrated. As a result, the experiment included the following treatments: T0, sewage sludge control; T1, sewage sludge with 100 cm³ of dissolved EM™ preparation; and T2, sewage sludge with 100 cm³ of concentrated EM™ preparation. Samples of 1 kg of sewage sludge were weighed in duplicate and mixed with the doses of EM™ preparation. Each mixture was wetted to 60% moisture. The samples were incubated at 20 °C for 126 days in a separate room with humidity and temperature control. Moisture losses were corrected during incubation, and the contents of each plastic box were mixed regularly. The experiment was repeated twice following the same scheme and conditions. The chemical composition of sewage sludge was recorded at 12 incubation dates corresponding to the following days from the beginning of the experiment: 1, 7, 14, 21, 28, 42, 56, 70, 84, 98, 112 and 126.

Chemical Analyses

The chemical analyses of sewage sludge were conducted on dried samples. Total organic carbon (Ctot), total nitrogen (Ntot) and total sulphur (Stot) were determined using a Vario Max CNS analyser. Total contents of the chemical elements (variables) P, K, Mg, Ca and Na in sewage sludge were determined after incineration at 550 °C for 5 h followed by digestion with 6 mol dm⁻³ HCl. The P content of sewage sludge was determined colorimetrically using the vanadate-molybdate reagent. Concentrations of K, Mg, Ca and Na in the extracts were determined by flame atomic absorption spectrometry (FAAS) using a Varian Spectra AA 220 FS apparatus.

Statistical Analysis

The classification problem involves determining a procedure by which a given object can be assigned to one of the q groups. The object being classified is described by a random pair \(\left( {\varvec{X}\left( t \right), \varvec{Y}} \right)\), where \(\varvec{X}\left( t \right) = \left( {X_{1} \left( t \right), \ldots ,X_{p} \left( t \right)} \right)^{\prime }\) is a vector of p continuous functions and \(\varvec{Y} = \left( {Y_{1} , \ldots ,Y_{q} } \right)^{\prime }\) is a vector of q labels. The functions \(X_{1} \left( t \right), \ldots ,X_{p} \left( t \right)\) are called functional data.

Define q labels \(Y_{1} , \ldots ,Y_{q}\) as follows:

$$Y_{i} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {{\text{if an observation }}\varvec{X}\left( t \right) {\text{comes from the }}i{\text{th group}}} \hfill \\ {0,} \hfill & {\text{otherwise}} \hfill \\ \end{array} ,} \right.$$
$$i = 1, \ldots ,q.$$
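Because the labels are 0/1 indicator vectors, the Euclidean distance between the label vectors of two objects is 0 for objects from the same group and √2 for objects from different groups. A minimal Python sketch, assuming groups are numbered 1, ..., q (the helper names are ours, for illustration only):

```python
import math

def one_hot(group, q):
    """Label vector (Y_1, ..., Y_q): Y_i = 1 if the object comes from group i, else 0."""
    return [1 if i == group else 0 for i in range(1, q + 1)]

def euclid(u, v):
    """Euclidean distance between two label vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

q = 3  # e.g. the three treatments T0, T1 and T2
labels = [one_hot(g, q) for g in (1, 2, 3, 1)]
print(labels)                                               # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(euclid(labels[0], labels[3]), euclid(labels[0], labels[1]))  # 0.0 1.4142135623730951
```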

The process X can be represented as [19]:

$$\varvec{X}\left( t \right) = {\varvec{\Phi}}\left( t \right)\varvec{\alpha},$$

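where \(\varvec{\Phi}\left( t \right)\) is a matrix of basis functions and \(\varvec{\alpha}\) is a random vector of expansion coefficients with \({\mathbb{E}}\left(\varvec{\alpha}\right) = 0.\)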

The vector \(\varvec{\alpha}\) can be estimated on the basis of n independent realizations \(x_{1} ,x_{2} , \ldots ,x_{n}\) of the random process X (functional data). The estimate obtained from the kth realization is denoted by \(\widehat{\varvec{\alpha}}_{k}\), \(k = 1, \ldots ,n.\)
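The estimator itself is not specified here; one common choice is ordinary least squares of the observed values on the basis functions evaluated at the observation times. The Python sketch below assumes that choice, a hypothetical three-function monomial basis in rescaled time s = t/126 (the study used B-splines) and synthetic values. It also shows that the unequally spaced incubation days pose no difficulty.

```python
def design_matrix(times, basis):
    """Rows: observation times; columns: basis functions evaluated there."""
    return [[phi(t) for phi in basis] for t in times]

def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return x

def estimate_alpha(times, values, basis):
    """Least-squares coefficients from the normal equations (Phi' Phi) alpha = Phi' x."""
    phi = design_matrix(times, basis)
    k = len(basis)
    ptp = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    ptx = [sum(row[i] * x for row, x in zip(phi, values)) for i in range(k)]
    return solve(ptp, ptx)

days = [1, 7, 14, 21, 28, 42, 56, 70, 84, 98, 112, 126]  # incubation days from the text
basis = [lambda t: 1.0, lambda t: t / 126, lambda t: (t / 126) ** 2]  # hypothetical basis
values = [2.0 + 0.5 * (t / 126) - 1.0 * (t / 126) ** 2 for t in days]  # synthetic curve
alpha_hat = estimate_alpha(days, values, basis)
print([round(a, 6) for a in alpha_hat])  # [2.0, 0.5, -1.0]
```

Because the synthetic curve lies exactly in the span of the basis, the fit recovers its coefficients; with real, noisy measurements the coefficients would instead be the best approximation in the least-squares sense.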

Let

$$a_{kl} = \left\| \widehat{\varvec{\alpha}}_{k} - \widehat{\varvec{\alpha}}_{l} \right\|_{K + p} , \quad b_{kl} = \left\| \varvec{y}_{k} - \varvec{y}_{l} \right\|_{q} , \quad k,l = 1,2, \ldots ,n,$$

where \(\varvec{y}_{k}\) is the label vector of the kth object, and let

$$A_{kl} = a_{kl} - \bar{a}_{k \cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot \cdot} , \quad B_{kl} = b_{kl} - \bar{b}_{k \cdot} - \bar{b}_{\cdot l} + \bar{b}_{\cdot \cdot} ,$$

where

$$\bar{a}_{k \cdot} = \frac{1}{n}\sum_{l = 1}^{n} a_{kl} , \quad \bar{a}_{\cdot l} = \frac{1}{n}\sum_{k = 1}^{n} a_{kl} , \quad \bar{a}_{\cdot \cdot} = \frac{1}{n^{2}}\sum_{k,l = 1}^{n} a_{kl} ,$$

and \(\bar{b}_{k \cdot}\), \(\bar{b}_{\cdot l}\) and \(\bar{b}_{\cdot \cdot}\) are defined analogously. Denote \(A = \left( {a_{kl} } \right)\), \(B = \left( {b_{kl} } \right)\), \(\widetilde{A} = \left( {A_{kl} } \right)\) and \(\widetilde{B} = \left( {B_{kl} } \right)\).

Then, on the basis of the results of Székely et al. [20] and Górecki et al. [21], the estimate of the functional distance covariance between the random process \(\varvec{X}\left( t \right)\) and the vector \(\varvec{Y}\) of q labels is:

$$\widehat{\nu }_{{\varvec{X},\varvec{Y}}}^{2} = \widehat{\nu }_{{\varvec{\alpha},\varvec{Y}}}^{2} = \frac{1}{{n^{2} }}\mathop \sum \limits_{k,l = 1}^{n} A_{kl} B_{kl} .$$
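The estimator can be sketched in Python as follows. Each object k is represented by its estimated coefficient vector and its label vector; Euclidean distances are assumed for both, and the distance matrices are double-centred as in the sample distance covariance of Székely et al. [20]. This is an illustrative sketch on toy data (Python 3.8+ for `math.dist`), not the implementation in the R package energy used in the study.

```python
import math

def dist_matrix(vectors):
    """Pairwise Euclidean distances between equal-length vectors."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]

def double_centre(d):
    """d_kl minus row mean minus column mean plus grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[k][l] for k in range(n)) / n for l in range(n)]
    grand = sum(row) / n
    return [[d[k][l] - row[k] - col[l] + grand for l in range(n)] for k in range(n)]

def dcov2(alphas, labels):
    """Squared sample distance covariance: (1/n^2) * sum over k, l of A_kl * B_kl."""
    n = len(alphas)
    A = double_centre(dist_matrix(alphas))
    B = double_centre(dist_matrix(labels))
    return sum(A[k][l] * B[k][l] for k in range(n) for l in range(n)) / n ** 2

alphas = [[0.0], [1.0]]     # toy coefficient vectors
labels = [[1, 0], [0, 1]]   # label vectors of two different groups
print(dcov2(alphas, labels), math.sqrt(2) / 4)  # both approximately 0.353553
```

For two objects from different groups, the double-centred entries are ±1/2 and ±√2/2, so the sum of the four products is √2 and the estimate is √2/4.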

For the sample distance covariance, if n is large enough, the following holds [22]:

$$\widehat{\nu }_{{\left( {\varvec{X},\varvec{Z}} \right),\varvec{Y}}}^{2} \le \widehat{\nu }_{{\varvec{X},\varvec{Y}}}^{2} ,$$

under the assumption that \(\varvec{Z} \in \mathbb{R}^{p}\) is independent of (X, Y).

This fact is used as a stopping rule in the selection of variables. The procedure consists of the following steps:

  1. Calculate the marginal distance covariances of \(X_{k}\), \(k = 1, \ldots ,p\), with the response Y.
  2. Rank the variables in decreasing order of the distance covariances. Denote the ordered predictors as \(X_{\left( 1 \right)} ,X_{\left( 2 \right)} , \ldots ,X_{\left( p \right)}\). Start with \(X_{S} = \left\{ {X_{\left( 1 \right)} } \right\}\).
  3. For k from 2 to p, keep adding \(X_{\left( k \right)}\) to \(X_{S}\) as long as \(\widehat{\nu }_{{\varvec{X}_{S} ,\varvec{Y}}}^{2}\) does not decrease; stop otherwise.
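A Python sketch of steps 1 to 3 is given below. It assumes that each of the p variables of object k is summarised by its estimated coefficient vector, that a candidate subset is represented by concatenating the coefficient vectors of its variables, and that the squared distance covariance is computed with double-centred Euclidean distances; the data are synthetic and the function names are ours. Variable 0 separates the two groups, whereas variable 1 is, by construction, empirically independent of the labels, so adding it lowers the distance covariance and only variable 0 is retained.

```python
import math

def dcov2(xs, ys):
    """Squared sample distance covariance with double-centred Euclidean distances."""
    n = len(xs)
    def centred(vs):
        d = [[math.dist(u, v) for v in vs] for u in vs]
        row = [sum(r) / n for r in d]
        grand = sum(row) / n
        # d is symmetric, so column means equal row means
        return [[d[k][l] - row[k] - row[l] + grand for l in range(n)] for k in range(n)]
    A, B = centred(xs), centred(ys)
    return sum(A[k][l] * B[k][l] for k in range(n) for l in range(n)) / n ** 2

def subset(coefs, chosen):
    """Concatenate the coefficient vectors of the chosen variables for every object."""
    return [[c for j in chosen for c in obj[j]] for obj in coefs]

def select_variables(coefs, labels):
    p = len(coefs[0])
    # Step 1: marginal distance covariance of each variable with the labels.
    marginal = [dcov2(subset(coefs, [j]), labels) for j in range(p)]
    # Step 2: rank the variables in decreasing order and start with the best one.
    order = sorted(range(p), key=lambda j: marginal[j], reverse=True)
    chosen, current = [order[0]], marginal[order[0]]
    # Step 3: keep adding variables while the distance covariance does not decrease.
    for j in order[1:]:
        value = dcov2(subset(coefs, chosen + [j]), labels)
        if value < current:
            break
        chosen.append(j)
        current = value
    return chosen

# coefs[k][j]: coefficient vector of variable j for object k (synthetic, one coefficient each)
coefs = [[[0.0], [0.0]], [[0.0], [5.0]], [[10.0], [0.0]], [[10.0], [5.0]]]
labels = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(select_variables(coefs, labels))  # [0]
```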

Results and Discussion

The use of sewage sludge as a fertiliser in soils is promoted by the European Union due to its nutrient content, and it is also in line with the concepts of the bio-economy and the circular economy, as it provides added value to the residue [12]. However, sewage sludge requires special conditioning, i.e. a process which improves its quality and is usually conducted using chemical techniques. As an alternative to chemical methods, biological ones have been gaining great interest in recent years due to their safe and non-toxic nature. Wong et al. [7] found that a biogenic flocculant derived from an iron-oxidising bacterium is a potential sludge conditioning agent that improves the quality of sewage sludge. At the same time, Tomei et al. [2] highlighted inconsistencies in the use of bio-preparations in sewage sludge management. The EM™ preparation used in the present study is described as a combination of over 80 genetically unmodified, coexisting beneficial microorganisms [23]. EM™ is reported to include a population of lactic acid bacteria and yeasts as well as smaller numbers of phototrophic bacteria, fungi and Actinomyces. Such a selection of microorganisms is claimed to exert a strong, beneficial impact on the environment by maximising the conversion of organic matter into soil humus, increasing beneficial native microbial populations and limiting the development of pathogenic organisms. The literature overview by Jakubus [8] indicates that some research projects confirmed a positive effect of EM™, manifested by increased organic carbon content and improved soil fertility, while others did not. The results of the present study lead to similar conclusions. The application of EM™ preparation only slightly influenced the liberation of elements. Regardless of the bio-preparation dose and its dissolution, the rates and patterns of transformation of individual elements during sewage sludge incubation were similar (Fig. 1). 
After the experiment, sewage sludge showed lower Mg, K, Na and Ca contents than before incubation. Higher contents were found only for C, N and P, i.e. elements whose release is closely connected with the mineralisation of sewage sludge organic matter. The contents of these elements increased owing to the concentration effect caused by the strong degradation of labile organic carbon compounds, which reduced the sewage sludge mass [1]. The contents of S, P and K gradually decreased from the beginning of the experiment until day 28 and then increased, reaching the highest values at the end of incubation. A similar tendency was observed for Mg, Na and Ca, but in these cases the lowest contents were found on day 42 of incubation. The quantitative changes of nitrogen and carbon in sewage sludge showed a different pattern. The data presented in Fig. 1 show that the dose of concentrated EM™ preparation influenced the N and C contents in sewage sludge, with values higher than those in sewage sludge from T0 and T1. It should be underlined that, irrespective of treatment, nitrogen contents were the lowest at the beginning of incubation and the highest at its end. A similar tendency was noticed for carbon in sewage sludge from T0 and T2. The dose of dissolved EM™ preparation caused limited changes in carbon content during incubation, and at the end of the experiment the carbon content was lower than at the beginning.

Fig. 1

Quantitative changes of elements during sewage sludge incubation with and without EM™ preparation

In the first step, the data are transformed into functional data using a B-spline basis. It is worth mentioning that other basis functions, e.g. Fourier series or wavelets, can also be used in the smoothing process. The functional means of the examined data are presented in Fig. 1. Nitrogen, carbon, potassium, sulphur and phosphorus appear to be the most useful for recognising the groups. On the other hand, sodium, calcium and magnesium appear to be unimportant, as they remain nearly the same for all groups over the whole time period.
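For readers unfamiliar with B-splines, the Python sketch below evaluates a cubic B-spline basis by the Cox-de Boor recursion (in the study, smoothing was done with the R package fda). The knot placement, with interior knots at days 42 and 84 on [1, 126], is a hypothetical choice made only for illustration. The basis functions are non-negative and sum to one at every time point, which is why a smooth curve can be written as a weighted sum of them.

```python
def bspline_basis(t, knots, degree):
    """Values of all B-spline basis functions of the given degree at t (Cox-de Boor recursion)."""
    # Degree 0: indicator of the knot span containing t; the right end point
    # is assigned to the last non-empty span so that the basis covers [first, last].
    vals = []
    for i in range(len(knots) - 1):
        inside = knots[i] <= t < knots[i + 1]
        at_end = t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]
        vals.append(1.0 if inside or at_end else 0.0)
    # Raise the degree one step at a time; terms with a zero denominator are dropped.
    for d in range(1, degree + 1):
        new = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (t - knots[i]) / left_den * vals[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - t) / right_den * vals[i + 1] if right_den > 0 else 0.0
            new.append(left + right)
        vals = new
    return vals

degree = 3
# Clamped knot vector on [1, 126]; interior knots at days 42 and 84 are hypothetical.
knots = [1, 1, 1, 1, 42, 84, 126, 126, 126, 126]
days = [1, 7, 14, 21, 28, 42, 56, 70, 84, 98, 112, 126]
for t in days:
    b = bspline_basis(t, knots, degree)
    print(t, len(b), round(sum(b), 12))  # 6 basis functions summing to 1.0
```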

Although sewage sludge is rich in various nutrients, biogenic ones play an essential role in modifying its chemical composition and quality. The variable selection based on distance covariance confirmed this relationship (Fig. 2).

Fig. 2

Selected variables (elements) for functional data and indicated groups

Phosphorus was the most important element over the whole incubation period: it was selected for all group comparisons, and for distinguishing T0 from T1 it was the only element selected. Carbon (3 selections), sulphur (3 selections) and, unexpectedly, calcium (2 selections) were also important. In this respect, the comparison between the chemical composition of sewage sludge from T1 and from T2 is particularly informative, as C, S and P were selected for its assessment. On the other hand, magnesium, sodium and potassium were never selected. The data reduction is therefore considerable: in one case the data were reduced to only one variable, in two cases to three variables and in one case to five variables. Hence, on average, the data were reduced to 37.5% of the original data.
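The 37.5% figure follows from averaging the number of retained elements over the four group comparisons and dividing by the eight determined elements: (1 + 3 + 3 + 5)/(4 × 8) = 12/32 = 0.375. A one-function Python check (the helper name is ours):

```python
def mean_retained_percentage(selected_counts, n_variables=8):
    """Average share (%) of the original variables kept across the compared group settings."""
    return 100 * sum(selected_counts) / (len(selected_counts) * n_variables)

# Whole incubation period: 1, 3, 3 and 5 elements kept in the four comparisons.
print(mean_retained_percentage([1, 3, 3, 5]))  # 37.5
```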

To confirm these results, the samples were classified with five popular classifiers: linear discriminant analysis (LDA), Gaussian naïve Bayes (NB normal), kernel naïve Bayes (NB kernel), multinomial naïve Bayes (NB multinomial) and k-nearest neighbours (kNN). The analysis was performed separately for the reduced functional data (selected variables only) and for the full functional data. Leave-one-out cross-validation was used to estimate the classification error rate. The results are presented in Table 1. The reduced data gave better classification results in most cases (80%). The LDA classifier is particularly interesting, as it always gave perfect classification for the reduced data. Thus, besides reducing the data, the proposed variable selection method may also substantially help in sample recognition.
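Leave-one-out cross-validation can be sketched as below for a 1-nearest-neighbour classifier on coefficient vectors. This is a simplified stand-in written in Python (the study used the R package caret and five classifiers); the toy data and function names are hypothetical. Each object is classified by a model trained on all the remaining objects, and the percentage of correct classifications is reported.

```python
import math

def nearest_neighbour(train, x):
    """Label of the training object closest (Euclidean) to x."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]

def loocv_accuracy(data):
    """Percentage of objects classified correctly when each one is left out in turn."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # all objects except the i-th
        correct += int(nearest_neighbour(train, x) == label)
    return 100 * correct / len(data)

# (coefficient vector, group) pairs; synthetic values for illustration only
data = [([0.0, 0.1], "T0"), ([0.2, 0.0], "T0"), ([5.0, 5.1], "T1"), ([5.2, 4.9], "T1")]
print(loocv_accuracy(data))  # 100.0
```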

Table 1 Classification results (the percentage of correct classification) for functional data

The dynamic scheme of the experiment provides a good opportunity to observe the transformations of elements during sewage sludge conditioning. Accordingly, functional analysis was performed separately for sewage sludge representing the beginning and the end of the incubation process. The results of variable selection are shown in Figs. 3 and 4. At the beginning of incubation (Fig. 3), phosphorus (3 times), carbon (3 times), calcium (3 times) and sulphur (twice) were selected. On average, the data were reduced to 34.375% of the original data. At the end of incubation, carbon (4 times), nitrogen (3 times), phosphorus (twice) and sulphur (once) were selected (Fig. 4), so the data were reduced to 31.25%. In both cases, a slightly greater reduction of data was obtained than for the whole incubation period (37.5%). The classification analysis was then performed; the results are given in Tables 2 and 3. Table 2 shows that recognition of samples was quite simple at the beginning of the experiment, especially when only two groups were compared, and that variable (element) selection helps considerably in the classification process. For classification into one of the three groups T0, T1 and T2, three classifiers gave better results for the four selected elements than for all eight elements, and two classifiers gave identical results for both. This means that the four unselected elements interfere with the classification process. This observation is also confirmed for classification into one of two groups. Classification based on data from the end of incubation is more difficult (Table 3). Even in this case, however, 65% of the results obtained for the selected elements are better than or identical to those obtained for all elements.
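The same percentages can be obtained from how often each element was selected, since the total number of selections divided by the 32 possible selections (four comparisons × eight elements) gives the retained share: 11/32 = 34.375% at the beginning and 10/32 = 31.25% at the end of incubation. A short Python check, with a helper name of our own:

```python
def retained_percentage(selection_counts, n_comparisons=4, n_variables=8):
    """Share (%) of the original data kept, from how often each element was selected."""
    return 100 * sum(selection_counts.values()) / (n_comparisons * n_variables)

beginning = {"P": 3, "C": 3, "Ca": 3, "S": 2}
end = {"C": 4, "N": 3, "P": 2, "S": 1}
print(retained_percentage(beginning))  # 34.375
print(retained_percentage(end))        # 31.25
```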

Fig. 3

Selected variables (elements) for nutrients in sewage sludge at the beginning of incubation and indicated groups

Fig. 4

Selected variables (elements) for nutrients in sewage sludge at the end of incubation and indicated groups

Table 2 Classification results (the percentage of correct classification) for nutrients in sewage sludge at the beginning of incubation
Table 3 Classification results (the percentage of correct classification) for nutrients in sewage sludge at the end of incubation

The results obtained for sewage sludge, indicating the high importance of P, C, Ca and S at the beginning of the conditioning process and of P, C, N and S at its end, confirm the earlier findings. At the same time, they underline once again, and more precisely, the significance of biogenic nutrients such as C, N, P and S. The presence of calcium in this group could be linked to the chemical stabilisation of sewage sludge, for which CaO is commonly used.

In summary, analysing functional data provides information about the whole process, not only about specific points such as the start and end points. Our original variable selection method allows not only for data reduction but also improves the group recognition process. The distance covariance coefficient proposed by Székely et al. [20] is a very useful measure of dependence between a vector of labels and a vector of variables (elements) describing objects in multi-class classification. In the experiment, the variables (elements) were observed at many time points, and the obtained data were transformed into continuous functions, called functional data. Thus, a distance covariance coefficient for functional data is needed here; such a coefficient was defined by Górecki et al. [21]. In this paper, this coefficient was used to select the variables (elements) that best distinguish among the groups, and it proved to be very useful. First, it allowed the eight elements to be reduced to at most five, depending on the groups subject to classification. Second, the five popular classifiers used to distinguish among groups gave, in most cases, better or identical results for the reduced number of elements. It seems (although more similar experiments would be needed to be certain) that three elements (sodium, calcium and magnesium) need not be analysed in similar future experiments, because they do not bring useful information and may even disturb the classification process. Unfortunately, the conclusions of this experiment cannot be compared with others, because the proposed method of variable selection for functional data is relatively new and we are not aware of other applications of it.

All computations were performed in R [24] using the packages caret [25], energy [26] and fda [27].

Conclusions

The significance of EM™ preparation in sewage sludge conditioning should be evaluated as minor. A positive effect of the applied bio-preparation was observed only for carbon and nitrogen, whose contents increased over the incubation time. Owing to the lack of a significant influence of the bio-preparation on nutrient liberation from sewage sludge, the statistical analysis yields more general results. These findings are not restricted to the performed experiment, so the applied distance covariance may be considered a valuable tool that facilitates nutrient selection. The selected elements, independently of the experimental factors (incubation time and applied EM™ preparation), provided common information useful for identifying the most characteristic nutrients determining sewage sludge quality. The detailed analyses, both chemical and statistical, indicated the high significance of phosphorus, carbon and sulphur in sewage sludge assessment. Such knowledge makes it possible to simplify sewage sludge evaluation by using only the necessary methods. Using only a few selected methods reduces the time and costs of laboratory work while ensuring high accuracy and precision of sewage sludge assessment.

These conclusions were obtained thanks to the authors' method of selecting variables (elements) based on the distance covariance coefficient defined for multidimensional functional data. The experimental data, being two-way data (many variables observed at many time points), were transformed into multidimensional functional data. Modern statistical methods based on multidimensional functional data can thus lead to in-depth conclusions from the conducted experiment.