
Wavelet-Based Estimation of Generalized Discriminant Functions

Published in Sankhya B.

Abstract

In this work we propose a wavelet-based method for binary classification. Based on a training data set, we construct a classification rule with minimum mean squared error. Under mild assumptions, we present asymptotic results that provide the rates of convergence of our method relative to the Bayes classifier, ensuring universal consistency and strong universal consistency. Furthermore, in order to evaluate the performance of the proposed methodology for finite samples, we illustrate the approach using Monte Carlo simulations and real-data applications. The performance of the proposed methodology is compared with that of two classification methods widely used in the literature: the support vector machine and the logistic regression model. Numerical results show a very competitive performance of the new wavelet-based classifier.
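The appendix below makes the construction concrete: the discriminant function \(f(x) = E(2Y - 1 \mid X = x)\) is estimated by least squares on a design matrix of wavelet scaling functions, and classification follows the sign of the estimate. As a rough illustration only (not the authors' code), here is a minimal sketch assuming Haar scaling functions on [0, 1], a sign classification rule, and hypothetical toy data:

```python
import numpy as np

def haar_design(x, J):
    """Design matrix Z with Z[i, k] = phi_{Jk}(x_i), where phi_{Jk} is the
    Haar scaling function phi_{Jk}(x) = 2^{J/2} * 1[k <= 2^J x < k + 1]."""
    n, K = len(x), 2 ** J
    Z = np.zeros((n, K))
    Z[np.arange(n), np.clip((x * K).astype(int), 0, K - 1)] = np.sqrt(K)
    return Z

def fit_coefficients(x, y, J):
    """Least-squares wavelet coefficients for f(x) = E(2Y - 1 | X = x)."""
    ystar = 2 * y - 1                      # recode labels {0, 1} -> {-1, +1}
    c, *_ = np.linalg.lstsq(haar_design(x, J), ystar, rcond=None)
    return c

def classify(x, c, J):
    """Assign class 1 where the estimated discriminant function is nonnegative."""
    return (haar_design(x, J) @ c >= 0).astype(int)

# hypothetical toy data: class 1 is much more likely on the right half of [0, 1]
rng = np.random.default_rng(0)
x = rng.uniform(size=2000)
y = (rng.uniform(size=2000) < np.where(x > 0.5, 0.9, 0.1)).astype(int)
c = fit_coefficients(x, y, J=3)
acc = np.mean(classify(x, c, J=3) == y)
```

On this toy design the empirical accuracy lands near the Bayes accuracy of 0.9, which is the behavior the consistency results describe.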


References

  • Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes: Estimation and Prediction, vol. 10 of Lecture Notes in Statistics, 2nd edn. Springer, New York.

  • Cai, T.T. and Brown, L.D. (1998). Wavelet shrinkage for nonequispaced samples. Ann. Statist. 26, 1783–1799.

  • Cai, T.T. and Brown, L.D. (1999). Wavelet estimation for samples with random uniform design. Statist. Probab. Lett. 42, 313–321.

  • Chang, W., Kim, S.-H. and Vidakovic, B. (2003). Wavelet-based estimation of a discriminant function. Appl. Stoch. Models Bus. Ind. 19, 185–198.

  • Daubechies, I. (1992). Ten Lectures on Wavelets. Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia.

  • Daubechies, I. and Lagarias, J.C. (1991). Two-scale difference equations I: Existence and global regularity of solutions. SIAM J. Math. Anal. 22, 1388–1410.

  • Daubechies, I. and Lagarias, J.C. (1992). Two-scale difference equations II: Local regularity, infinite products of matrices and fractals. SIAM J. Math. Anal. 23, 1031–1079.

  • Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Stochastic Modelling and Applied Probability. Springer, New York.

  • Donoho, D.L. (1993). Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data. In Proceedings of Symposia in Applied Mathematics, vol. 47, pp. 173–205. AMS.

  • Greblicki, W. (1981). Asymptotic efficiency of classifying procedures using the Hermite series estimate of multivariate probability densities. IEEE Trans. Inform. Theory 27, 364–366.

  • Greblicki, W. and Pawlak, M. (1982). A classification procedure using the multiple Fourier series. Inform. Sci. 26, 115–126.

  • Greblicki, W. and Pawlak, M. (1983). Almost sure convergence of classification procedures using Hermite series density estimates. Pattern Recognition Letters 2, 13–17.

  • Greblicki, W. and Rutkowski, L. (1981). Density-free Bayes risk consistency of nonparametric pattern recognition procedures. Proceedings of the IEEE 69, 482–483.

  • Hastie, T., Tibshirani, R. and Friedman, J.H. (2017). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York.

  • Huang, J.Z. and Shen, H. (2004). Functional coefficient regression models for non-linear time series: a polynomial spline approach. Scand. J. Stat. 31, 515–534.

  • Kulik, R. and Raimondo, M. (2009). Wavelet regression in random design with heteroscedastic dependent errors. Ann. Statist. 37, 3396–3430.

  • Lichman, M. (2013). UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine. http://archive.ics.uci.edu/ml.

  • Mallat, S. (2008). A Wavelet Tour of Signal Processing: The Sparse Way, 3rd edn. Academic Press, New York.

  • Mangasarian, O.L., Street, W.N. and Wolberg, W.H. (1995). Breast cancer diagnosis and prognosis via linear programming. Oper. Res. 43, 570–577.

  • Montoril, M.H., Morettin, P.A. and Chiann, C. (2014). Spline estimation of functional coefficient regression models for time series with correlated errors. Statist. Probab. Lett. 92, 226–231.

  • Ogden, R.T. (1997). Essential Wavelets for Statistical Applications and Data Analysis. Birkhäuser, Boston.

  • Pandit, S.M. and Wu, S.-M. (1993). Time Series and System Analysis with Applications. Krieger Publishing Company, Malabar.

  • Ramírez, P. and Vidakovic, B. (2010). Wavelet density estimation for stratified size-biased sample. J. Statist. Plann. Inference 140, 419–432.

  • Restrepo, J.M. and Leaf, G.K. (1997). Inner product computations using periodized Daubechies wavelets. Internat. J. Numer. Methods Engrg. 40, 3557–3578.

  • Van Ryzin, J. (1966). Bayes risk consistency of classification procedures using density estimation. Sankhyā, Ser. A 28, 261–270.

  • Vidakovic, B. (1999). Statistical Modeling by Wavelets. Wiley Series in Probability and Statistics. Wiley-Interscience, New York.

  • Zhao, Y., Ogden, R.T. and Reiss, P.T. (2012). Wavelet-based LASSO in functional linear regression. J. Comput. Graph. Statist. 21, 600–617.


Acknowledgments

The authors thank the Editor for the insightful suggestions and comments that led to a considerable improvement of the paper. The first author acknowledges FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) for the visit to the Georgia Institute of Technology (2013/21273-5) and for his post-doc at the University of Campinas (2013/09035-1).

Author information

Correspondence to Michel H. Montoril.

Appendix: Proofs

In this appendix we provide the proofs of the theorems presented in Section 3. The proof of Theorem 1 is partially based on the proof of Theorem 1 in Montoril, Morettin and Chiann (2014) and on the proof of Theorem 1 of Huang and Shen (2004). We use two lemmas and one proposition, which are given below. For the sake of simplicity, we will hereafter use the symbol ≲ to represent the magnitude order O, i.e., we will write \(a_{n} \lesssim b_{n}\) to represent \(a_{n} = O(b_{n})\), for two positive sequences \(a_{n}\) and \(b_{n}\). We will also use the abbreviation a.s. to denote “almost surely”.

Lemma 1.

Under the assumptions of the model in Section 3,

[display equation not recovered]

Lemma 2.

Under the assumptions of the model in Section 3, there are \(0 < A \leq B < \infty\) such that all the eigenvalues of \( \frac {1}{n} \boldsymbol {Z}^{\top } \boldsymbol {Z}\) fall in [A, B] a.s., as \(n \to \infty\), where Z is an \(n \times 2^{J}\) matrix whose i-th row corresponds to \(\phi_{Jk}(X_{i})\), k = 0, 1, …, \(2^{J} - 1\).
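Lemma 2 can be checked numerically in a simple special case: for Haar scaling functions and uniformly distributed design points, distinct atoms have disjoint supports and \(E[\phi_{Jk}(X)^{2}] = 1\), so \(\frac{1}{n}\boldsymbol{Z}^{\top}\boldsymbol{Z}\) approaches the identity and its eigenvalues concentrate near 1. A sketch under these illustrative assumptions (the simulation setup is not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 20000, 3
K = 2 ** J

# Haar design: Z[i, k] = phi_{Jk}(X_i) = 2^{J/2} * 1[k <= 2^J X_i < k + 1]
x = rng.uniform(size=n)
Z = np.zeros((n, K))
Z[np.arange(n), np.clip((x * K).astype(int), 0, K - 1)] = np.sqrt(K)

# distinct Haar atoms have disjoint supports, so (1/n) Z'Z is diagonal here
# and every diagonal entry estimates E[phi_{Jk}(X)^2] = 1
eig = np.linalg.eigvalsh(Z.T @ Z / n)
```

With n large, all eigenvalues fall in a fixed interval around 1, as the lemma asserts for the general design.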

Proposition 1.

Following the notation in the proof of Lemma 1, for any \(k, l = 0, 1, \ldots, 2^{J} - 1\), \(\eta > 0\), \(s \geq 3\), \(2r < \gamma < 1\) and \(0 < \delta < \gamma - 2r\),

[display equation not recovered]

Proof of Theorem 1.

Observe that

[display equation not recovered]

where \(f_{J}(x) = {\sum }_{k = 0}^{2^{J}-1} c_{Jk} \phi _{Jk}(x)\) is the orthogonal projection of f on VJ. Since \(\| f_{J} - f \|^{2} = {{\rho }_{J}^{2}}\), it will be enough to verify the corresponding rate for \(\| \hat{f}_{J} - f_{J} \|^{2}\).

Note that the least squares estimator of the wavelet coefficients cJ can be written as

$$\hat{\boldsymbol{c}}_{J} = (\boldsymbol{Z}^{\top} \boldsymbol{Z})^{-1} \boldsymbol{Z}^{\top} \boldsymbol{Y}^{*}, $$

where Z is an \(n \times 2^{J}\) matrix whose i-th row corresponds to ϕJk(Xi), k = 0, 1,…, 2J − 1, and \(\boldsymbol {Y}^{*} = ({Y}_{1}^{*}, \ldots , {Y}_{n}^{*})^{\top }\), with \({Y}_{i}^{*} = 2Y_{i} - 1\), as in model (3.1). Denote \(\bar {\boldsymbol {Y}}^{*} = ({\bar {Y}}_{1}^{*}, \ldots , {\bar {Y}}_{n}^{*})^{\top }\), where \({\bar {Y}}_{i}^{*} = f(X_{i})\), and define \(\bar {\boldsymbol {c}}_{J} = (\boldsymbol {Z}^{\top } \boldsymbol {Z})^{-1} \boldsymbol {Z}^{\top } \bar {\boldsymbol {Y}}^{*}\). Thus,

$$\hat{\boldsymbol{c}}_{J} - \bar{\boldsymbol{c}}_{J} = (\boldsymbol{Z}^{\top} \boldsymbol{Z})^{-1} \boldsymbol{Z}^{\top} \boldsymbol{\epsilon}, $$

where \(\boldsymbol{\epsilon} = (\epsilon_{1}, \ldots, \epsilon_{n})^{\top}\), with \(\epsilon_{i}\) defined as in model (3.1).
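The decomposition above is plain linearity of the normal equations: fitting \(\boldsymbol{Y}^{*} = \bar{\boldsymbol{Y}}^{*} + \boldsymbol{\epsilon}\) splits the estimator into a noiseless part and a pure-noise part. A quick numerical check, with a generic random matrix standing in for the wavelet design Z (an illustrative assumption, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 500, 8
Z = rng.normal(size=(n, K))        # generic full-rank stand-in for the wavelet design
c_true = rng.normal(size=K)
ybar = Z @ c_true                  # "noiseless" responses f(X_i)
eps = rng.normal(size=n)           # iid errors, independent of the design
ystar = ybar + eps                 # observed responses Y*

G = np.linalg.inv(Z.T @ Z)
c_hat = G @ Z.T @ ystar            # least squares on the observed responses
c_bar = G @ Z.T @ ybar             # least squares on the noiseless part
# c_hat - c_bar should equal G @ Z.T @ eps, the identity used in the proof
```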

Since the errors are assumed to be iid and independent of the covariates, it follows that

[display equation not recovered]

Then, by Lemma 2,

[display equation not recovered]

Hence, by Parseval’s identity,

[display equation not recovered]

Again, applying Parseval’s identity and Lemma 2,

[display equation not recovered]

Since \(\boldsymbol {Z} \bar {\boldsymbol {c}}_{J} = \boldsymbol {Z}(\boldsymbol {Z}^{\top } \boldsymbol {Z})^{-1} \boldsymbol {Z}^{\top } \bar {\boldsymbol {Y}}^{*} \) is an orthogonal projection of \(\bar {\boldsymbol {Y}}^{*}\), by (i) and (ii),

[display equation not recovered]

which implies that

[display equation not recovered]

The desired result then follows from (A.1) and (A.2).

Proof of Corollary 1.

By Corollary 6.2 in Devroye et al. (1996) and by assumption (ii),

[display equation not recovered]

Thus, taking expectations on both sides, the result follows from Theorem 1:

[display equation not recovered]

Proof of Theorem 2.

By Corollary 6.2 in Devroye et al. (1996), we see that

$$L(g_{J}) - L^{*} \leq \sqrt{ \int (\hat{f}_{J}(x) - f(x))^{2} \mu(dx) }. $$

In order to verify the convergence of the right-hand side of the inequality above, observe that the integral can be written as

[display equation not recovered]

Combining the last two terms above and applying assumption (ii),

[display equation not recovered]

Now observe that

[display equation not recovered]

where the class of functions \(\mathcal {T}\) is defined by

$$\mathcal{T} = \left\{ h(x,y^{*}) = (f(x)-y^{*})^{2} \colon \: f \in V_{J} \right\}. $$

Since \(y^{*} = -1\) or \(y^{*} = 1\), \(|y^{*}| = 1\). Furthermore, since \(|\phi(x)| \leq W\), for some positive constant W, we have that

$$\begin{array}{@{}rcl@{}} 0 &\leq& h(x, y^{*}) = (f(x) - y^{*})^{2} = \left( \sum\limits_{k = 0}^{2^{J} - 1} c_{Jk} \phi_{Jk}(x) - y^{*} \right)^{2}\\ & \leq& \left( \sum\limits_{k = 0}^{2^{J} - 1} |c_{Jk}| |\phi_{Jk}(x)| + |y^{*}| \right)^{2} \\ & \leq& \left( W 2^{J/2}\sum\limits_{k = 0}^{2^{J} - 1} |c_{Jk}| + 1 \right)^{2} \\ & \leq& (W_{1} 2^{J} + 1 )^{2} \leq C 2^{2J}, \end{array} $$

for some positive constant W1. Observe that we used the fact that there exists a positive constant C such that \({\sum }_{k = 0}^{2^{J} - 1} |c_{Jk}| \leq C 2^{J/2}\).
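The constant in the last step comes from the Cauchy–Schwarz inequality applied to the \(2^{J}\) coefficients: \({\sum}_{k} |c_{Jk}| \leq 2^{J/2} \{{\sum}_{k} c_{Jk}^{2}\}^{1/2}\), and the squared norm is bounded by Parseval's identity. A quick numerical check of the Cauchy–Schwarz step (illustrative only, with an arbitrary coefficient vector):

```python
import numpy as np

rng = np.random.default_rng(3)
J = 6
c = rng.normal(size=2 ** J)        # arbitrary coefficient vector at level J

l1 = np.abs(c).sum()
# Cauchy-Schwarz over the 2^J coefficients: sum |c_k| <= 2^{J/2} (sum c_k^2)^{1/2}
bound = np.sqrt(2 ** J) * np.sqrt((c ** 2).sum())
```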

By Theorem 29.1 in Devroye et al. (1996) and (A.4),

[display equation not recovered]

where \({{Z}_{1}^{n}} = (X_{1}, Y_{1}), \ldots , (X_{n},Y_{n})\), and \(\mathcal {N} (\epsilon , \mathcal {T}({{z}_{1}^{n}}))\) is the 1-covering number of \(\mathcal {T}({{z}_{1}^{n}})\), as in Definition 29.1 in Devroye et al. (1996).

For fixed \({{z}_{1}^{n}}\), one can estimate \(\mathcal {N} \left (\frac {\epsilon }{16}, \mathcal {T}({{z}_{1}^{n}})\right )\). For arbitrary functions \(f_{1}, f_{2} \in V_{J}\), denote \(h_{1}(x, y^{*}) = (f_{1}(x) - y^{*})^{2}\) and \(h_{2}(x, y^{*}) = (f_{2}(x) - y^{*})^{2}\). Then, for any probability measure ν,

[display equation not recovered]

for some positive constant C1. Thus, for any \({{z}_{1}^{n}} = (x_{1}, {y}_{1}^{*}), \ldots , (x_{n}, {y}_{n}^{*})\) and 𝜖,

$$\mathcal{N} (\epsilon, \mathcal{T}({{z}_{1}^{n}})) \leq \mathcal{N} \left( \frac{\epsilon}{C_{1} 2^{J}},V_{J}({{x}_{1}^{n}})\right), $$

where \(V_{J}({{x}_{1}^{n}}) = \{ (f(x_{1}), \ldots , f(x_{n})): f \in V_{J} \}\). Thus, it suffices to estimate the covering number corresponding to \(V_{J}\), which is a subspace of a linear space of functions. Then, following Definitions 12.1 and 12.3, and Theorem 13.9 in Devroye et al. (1996), \( V_{V_{J}^{+}} \leq 2^{J} \). Hence, by Corollary 29.2 in the same reference,

$$\begin{array}{@{}rcl@{}} \mathcal{N} \left( \frac{\epsilon/16}{C_{1} 2^{J}}, V_{J}({x_{1}^{n}}) \right) \!\!\!\!&\leq&\!\!\!\! \left[ \frac{ 4 e W 2^{J} }{ \epsilon/(16 C_{1} 2^{J}) } \log \left( \frac{ 2 e W 2^{J} }{ \epsilon/(16 C_{1} 2^{J}) } \right) \right]^{2^{J}}\\ &=&\!\!\!\! [C_{2} 2^{2J} \log (C_{2} 2^{2J} )]^{2^{J}}\\ &\leq&\!\!\!\! (C_{3} 2^{4J})^{2^{J}}, \end{array} $$
(A.6)

for some positive constant C3.

By (A.5) and (A.6),

[display equation not recovered]

where C3 and C4 are positive constants, and the convergence follows from assumption (iii). The fact that ρJ = o(1), together with (A.3) and (A.7), ensures that gJ is universally consistent, as already stated in Corollary 1.

In order to verify that \(\hat {g}_{J}\) is strongly universally consistent, it suffices, by the Borel–Cantelli lemma and (A.7), to verify the convergence of the series

$$\underset{n}{\sum} (C_{3} 2^{4J} )^{2^{J}} e^{-n\epsilon^{2}/(512(C 2^{2J})^{2})} \equiv \underset{n}{\sum} a_{n}. $$

Based on assumption (iii), there are positive constants B, C and D such that

$$a_{n} = \left( \frac{C n^{B}}{e^{D n^{1-5r}}} \right)^{n^{r}}. $$

It is easy to see that there exists a natural \(n_{0}\) such that, for all \(n \geq n_{0}\),

$$\frac{C n^{B}}{e^{D n^{1-5r}}} \leq \frac{1}{n}. $$

Furthermore, for every \(n \geq 2^{1/r}\),

$$\left( \frac{1}{n} \right)^{n^{r}} \leq \left( \frac{1}{n} \right)^{2}. $$

Then, there exists a natural number m such that \(a_{n} \leq 1/n^{2}\) for all n > m, and hence

$$\underset{n \geq 1}{\sum} a_{n} \leq \sum\limits_{n = 1}^{m} a_{n} + \sum\limits_{n > m} \frac{1}{n^{2}} < \infty, $$

which ensures the desired result.

Proof of Theorem 3.

By Parseval’s identity,

[display equation not recovered]

Since \(0 \leq \lambda_{jk} \leq 1\), we have \(0 \leq \lambda_{jk}^{2} \leq 1\) and \(0 \leq (1 - \lambda_{jk})^{2} \leq 1\). Thus, the second term on the right-hand side of the inequality above can be bounded by

[display equation not recovered]

because \({\sum }_{j = J_{0}}^{J - 1} {\sum }_{k} d_{jk}^{2} = {\rho }_{J_{0}}^{2} - {{\rho }_{J}^{2}} \).
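The bound above uses only that each factor \((1 - \lambda_{jk})^{2}\) lies in [0, 1] termwise, so the shrunken sum of squares can never exceed the raw one. A one-line numerical check with arbitrary hypothetical weights (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
d = rng.normal(size=50)            # detail coefficients d_{jk}
lam = rng.uniform(size=50)         # shrinkage weights lambda_{jk} in [0, 1]

shrunk = (((1 - lam) * d) ** 2).sum()   # sum of (1 - lambda_{jk})^2 d_{jk}^2
total = (d ** 2).sum()                  # the unshrunk sum of squares
```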

Furthermore, we have observed in the proof of Theorem 1 that

[display equation not recovered]

Hence,

[display equation not recovered]

which yields the desired result, because \(\rho _{J_{0}} \asymp \rho _{J}\) (due to the fact that J0 and J have the same order of convergence).

The results of universal consistency and strong universal consistency can be derived analogously to the proofs of Corollary 1 and Theorem 2, respectively.

Proof of Lemma 1.

Denote \({E}_n(Z.) = \frac {1}{n} {\sum }_{i = 1}^{n} Z_{i}\), where \(Z_{1}, Z_{2}, \ldots\) is a stationary series. For any \(f \in V_{J}\) there is a vector \( \boldsymbol {c} = (c_{J0}, \ldots , c_{J,2^{J}-1})^{\top } \), \(|\boldsymbol{c}_{J}|^{2} < \infty\), such that \( f(x) = {\sum }_{k = 0}^{2^{J}-1} c_{Jk} \phi _{Jk}(x) \). Fix η > 0. If \(|({E}_n - {E}) \phi_{Jk}(X.) \phi_{Jl}(X.)| \leq \eta\), then

$$\begin{array}{@{}rcl@{}} |({E}_n - {E}) [f(X.)]^{2}| \!\!\!\!&=&\!\!\!\! \left| \underset{k,l}{\sum} c_{Jk} c_{Jl} ({E}_n - {E}) \phi_{Jk}(X.) \phi_{Jl}(X.) \right|\\ &\leq&\!\!\!\! \eta \underset{k,l}{\sum} |c_{Jk}| |c_{Jl}| I_{k,l}, \end{array} $$

where \(I_{k,l}\) is one if the supports of \(\phi_{Jk}\) and \(\phi_{Jl}\) overlap, and zero otherwise.

It is easy to see that \({\sum }_{k} I_{k,l} \leq C\) and \({\sum }_{l} I_{k,l} \leq C\), for some positive constant C. Thus, applying the Cauchy–Schwarz inequality in the first and second inequalities below, and by Parseval’s identity,

$$\begin{array}{@{}rcl@{}} \sum\limits_{k,l} |c_{Jk}| |c_{Jl}| I_{k,l} &\leq& \underset{k}{\sum} |c_{Jk}| \left\{ \underset{l}{\sum} {c}_{Jl}^{2} I_{k,l} \right\}^{1/2} C^{1/2}\\ &\leq& \left\{ \underset{k}{\sum} {c}_{Jk}^{2} \right\}^{1/2} \left\{ \underset{k}{\sum} \underset{l}{\sum} {c}_{Jl}^{2} I_{k,l} \right\}^{1/2} C^{1/2}\\ &\leq& C \left\{ \underset{k}{\sum} {c}_{Jk}^{2} \right\}^{1/2} \left\{ \underset{l}{\sum} {c}_{Jl}^{2} \right\}^{1/2}\\ &=& C |\boldsymbol{c}_{J}|^{2} = C \| f \|^{2}. \end{array} $$

This implies

$$|({E}_n - {E}) [ f(X.)]^{2}| \leq \eta C \| f \|^{2}. $$
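The constant C above reflects the fact that a compactly supported scaling function at level J overlaps only a fixed number of its translates, uniformly in J. Assuming Daubechies-N atoms, whose support has length 2N − 1 in units of \(2^{-J}\) (an illustrative choice of family, not fixed by the paper), the overlap counts \({\sum}_{l} I_{k,l}\) can be verified directly:

```python
def max_overlaps(J, N=4):
    """For Daubechies-N atoms at level J, with supports 2^{-J} [k, k + 2N - 1],
    count the largest number of translates l whose support meets that of k."""
    L = 2 * N - 1                  # support length on the integer grid
    K = 2 ** J
    return max(sum(1 for l in range(K) if abs(k - l) < L) for k in range(K))

# the bound 2(2N - 1) - 1 = 13 holds at every level once 2^J exceeds it
bounds = [max_overlaps(J) for J in range(4, 8)]
```

The count stabilizes at the same constant for every level, which is exactly the uniform-in-J bound the lemma relies on.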

Then, since \({\sum }_{k} {\sum }_{l} I_{k,l} \lesssim 2^{J} \asymp n^{r} \),

[display equation not recovered]

By Proposition 1 and taking \(\delta = \frac {\gamma - 2r}{2}\), we have that, for every 2r < γ < 1 and s ≥ 3,

[display equation not recovered]

Observe that, whenever 2r < γ < 1 and \(r + 1 + \frac {sr}{2s + 1} - \frac {2s \alpha (1 - \gamma )}{2s + 1} < -1\),

[display equation not recovered]

Hence, since η > 0 is arbitrary, we have by the Borel–Cantelli lemma that

$$ \sup_{f \in V_{J}} \frac{|({E}_n - {E}) [ f(X.)]^{2}|}{\| f \|^{2}} \overset{a.s.}{\longrightarrow} 0, \quad \text{as} \quad n \to \infty. $$
(A.10)

The fact that

$$\underset{s \to \infty}{\underset{\gamma \to 2r}{\lim}} r + 1 + \frac{sr}{2s + 1} - \frac{2s \alpha (1 - \gamma)}{2s + 1} = \frac{3}{2}r + 1 - \alpha (1 - 2r) $$

ensures that it is always possible to find 0 < γ < 1 and s ≥ 3 satisfying

$$2r < \gamma < 1 \quad \text{and} \quad r + 1 + \frac{sr}{2s + 1} - \frac{2s \alpha (1 - \gamma)}{2s + 1} < -1. $$

Thus the desired result follows from (A.10) and assumption (ii).

Proof of Lemma 2.

Let \(\boldsymbol {c}_{J} = (c_{J0}, \ldots , c_{J,2^{J}-1})^{\top }\) be a vector such that \(|\boldsymbol{c}_{J}|^{2} < \infty\), and denote \(f(x; \boldsymbol{c}_{J}) = {\sum }_{k = 0}^{2^{J}-1} c_{Jk} \phi _{Jk}(x)\). Thus, by Lemma 1,

[display equation not recovered]

Hence, by assumption (ii) and Parseval’s identity,

[display equation not recovered]

which ensures the desired result.

Proof of Proposition 1.

From Theorem 1.4 of Bosq (1998),

[display equation not recovered]

where c is a positive constant, q ∈ [1,n/2],

[display equation not recovered]

Observe that, since any compactly supported orthonormal wavelet atom satisfies \(\phi_{Jk}(x) \lesssim 2^{J/2}\), by assumption (iii),

$$m_{s} \lesssim 2^{J} \lesssim n^{r}, $$

which implies that

$${{m}_{s}^{v}} \lesssim n^{vr}, \quad v = 2, 3, \ldots. $$

Let q = nγ, γ ∈ (0, 1). Thus, it is easy to see that

$$ a_{1} \lesssim n^{1-\gamma}, $$
(A.11)
$$ \exp \left( - \frac{n^{\gamma} \eta^{2}}{25 {{m}_{2}^{2}} + 5 c \eta} \right) = o (\exp(- n^{\gamma - 2r - \delta}) ), \quad 2r < \gamma < 1, \quad 0 < \delta < \gamma - 2r, $$
(A.12)

and

$$ a_{2}(s) \lesssim n^{1 + \frac{sr}{2s + 1}}. $$
(A.13)

By assumption (iv),

$$ \alpha\left( \left[ \frac{n}{q + 1} \right] \right)^{\frac{2s}{2s + 1}} \lesssim n^{-\frac{2s \alpha (1-\gamma)}{2s + 1}}. $$
(A.14)

Then, the desired result follows from (A.11)–(A.14).


Cite this article

Montoril, M.H., Chang, W. & Vidakovic, B. Wavelet-Based Estimation of Generalized Discriminant Functions. Sankhya B 81, 318–349 (2019). https://doi.org/10.1007/s13571-018-0158-1
