Statistical Methods for Data Mining and Knowledge Discovery

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5986)

Abstract

This survey paper aims to give computer scientists a rapid bird’s-eye view, from a mathematician’s perspective, of the main statistical methods used to extract knowledge from databases comprising various types of observations. After briefly touching upon supervision and data regularization and reviewing the main models, it examines the key issues of model assessment, selection and inference. Finally, it explores specific statistical problems arising from applications in data mining and warehousing. Examples and applications are drawn mainly from the vast collection of image and video retrieval, indexing and classification challenges facing us today.


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vaillancourt, J. (2010). Statistical Methods for Data Mining and Knowledge Discovery. In: Kwuida, L., Sertkaya, B. (eds) Formal Concept Analysis. ICFCA 2010. Lecture Notes in Computer Science, vol 5986. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11928-6_4

  • DOI: https://doi.org/10.1007/978-3-642-11928-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11927-9

  • Online ISBN: 978-3-642-11928-6

  • eBook Packages: Computer Science (R0)
