Computational Statistics, Volume 29, Issue 5, pp 1129–1152

A sliced inverse regression approach for data stream

  • Marie Chavent
  • Stéphane Girard
  • Vanessa Kuentz-Simonet
  • Benoit Liquet
  • Thi Mong Ngoc Nguyen
  • Jérôme Saracco
Original Paper

Abstract

In this article, we focus on data arriving sequentially by blocks in a stream. A semiparametric regression model involving a common effective dimension reduction (EDR) direction \(\beta \) is assumed in each block. Our goal is to estimate this direction each time a new block arrives. A simple direct approach consists of pooling all the observed blocks and estimating the EDR direction by the sliced inverse regression (SIR) method. In practice, however, this approach has drawbacks, such as the need to store all the blocks and the long running time for large sample sizes. To overcome these drawbacks, we propose an adaptive SIR estimator of \(\beta \) based on the optimization of a quality measure. The corresponding approach is faster, both in terms of computational complexity and running time, and reduces data storage requirements. The consistency of our estimator is established and its asymptotic distribution is given. An extension to multiple-index models is proposed. A graphical tool is also provided to detect changes in the underlying model, i.e., drift in the EDR direction or aberrant blocks in the data stream. A simulation study illustrates the numerical behavior of our estimator. Finally, an application to real data concerning the estimation of physical properties of the Mars surface is presented.
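
As a rough illustration of the "simple direct approach" mentioned in the abstract (pooling all blocks, then applying SIR), here is a minimal sketch of the classical single-index SIR estimator. It is not the adaptive estimator proposed in the article. The function name `sir_direction`, the number of slices, the cubic link function, and the block sizes are all assumptions chosen for the example.

```python
import numpy as np

def sir_direction(X, y, n_slices=10):
    """Classical SIR estimate of one EDR direction (illustrative sketch)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # center the covariates
    sigma = Xc.T @ Xc / n                # empirical covariance of X

    # Slice the observations by the order of y (roughly equal slice sizes).
    slices = np.array_split(np.argsort(y), n_slices)

    # Weighted covariance of the slice means of X.
    M = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Solve M b = lambda * sigma * b through a Cholesky whitening step:
    # sigma = L L^T, A = L^{-1} M L^{-T}, and b = L^{-T} v.
    L = np.linalg.cholesky(sigma)
    A = np.linalg.solve(L, np.linalg.solve(L, M).T)
    _, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    b = np.linalg.solve(L.T, vecs[:, -1])
    return b / np.linalg.norm(b)

# Hypothetical stream: 10 blocks of 200 observations sharing one direction.
rng = np.random.default_rng(0)
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
blocks = []
for _ in range(10):
    Xt = rng.standard_normal((200, 5))
    yt = (Xt @ beta) ** 3 + 0.5 * rng.standard_normal(200)
    blocks.append((Xt, yt))

# Direct approach: pool every block seen so far, then run SIR once.
X_pooled = np.vstack([Xt for Xt, _ in blocks])
y_pooled = np.concatenate([yt for _, yt in blocks])
b_hat = sir_direction(X_pooled, y_pooled)

# Squared cosine between the estimate and the true direction (sign-free).
quality = float(np.dot(b_hat, beta) ** 2)
```

Because the pooled arrays grow with every new block, both the memory use and the cost of each run increase over time, which is the drawback the abstract attributes to this direct approach.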

Keywords

  • Effective dimension reduction (EDR)
  • Sliced inverse regression (SIR)
  • Data stream

Notes

Acknowledgments

The authors thank Sylvain Douté for his contribution to the data. They are grateful to the anonymous referees for contributing to the improvement of this paper through their useful remarks and detailed comments.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Marie Chavent (1, 2)
  • Stéphane Girard (3)
  • Vanessa Kuentz-Simonet (4)
  • Benoit Liquet (5, 6)
  • Thi Mong Ngoc Nguyen (7)
  • Jérôme Saracco (1, 2)

  1. Institut de Mathématiques de Bordeaux, UMR CNRS 5251, Université de Bordeaux, Talence Cedex, France
  2. CQFD Team, Inria Bordeaux Sud-Ouest, Talence Cedex, France
  3. LJK, MISTIS Team, Inria Grenoble Rhône-Alpes, Saint-Ismier Cedex, France
  4. Unité ADBX “Aménités et Dynamiques des Espaces Ruraux”, IRSTEA, Cestas Cedex, France
  5. ISPED, Centre INSERM U-897-Epidémiologie-Biostatistique, Université de Bordeaux, Bordeaux, France
  6. ISPED, Centre INSERM U-897-Epidémiologie-Biostatistique, INSERM, Bordeaux, France
  7. IRMA, UMR 7501, Université de Strasbourg, 67084 France
