Abstract
This survey gives computer scientists a rapid bird's-eye view, from a mathematician's perspective, of the main statistical methods used to extract knowledge from databases comprising various types of observations. After briefly touching on supervision, data regularization, and the main models, it examines the key issues of model assessment, selection, and inference. Finally, it explores specific statistical problems arising from applications in data mining and warehousing. Examples and applications are drawn mainly from the vast collection of image and video retrieval, indexing, and classification challenges facing us today.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vaillancourt, J. (2010). Statistical Methods for Data Mining and Knowledge Discovery. In: Kwuida, L., Sertkaya, B. (eds) Formal Concept Analysis. ICFCA 2010. Lecture Notes in Computer Science, vol 5986. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11928-6_4
DOI: https://doi.org/10.1007/978-3-642-11928-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11927-9
Online ISBN: 978-3-642-11928-6
eBook Packages: Computer Science (R0)