Abstract
Including a large number of binary variables in the smoothed location model creates a very large number of multinomial cells and, more worryingly, leaves most of them empty. We refer to this situation as the large sparsity problem. When the multinomial cells are highly sparse, the smoothed estimators of the location model become heavily biased, which degrades classification performance. At worst, the classification rules cannot be constructed at all. This paper investigates the issue and proposes a new approach to the smoothed location model for the large sparsity problem.
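The scale of the sparsity problem can be illustrated with a minimal sketch. It assumes, as in the standard location model, that b binary variables define 2^b multinomial cells. The helper names `count_cells` and `empty_cell_fraction`, the sample size of 100, and the simulated independent fair bits are illustrative assumptions and are not taken from the paper.

```python
import random


def count_cells(n_binary):
    """Number of multinomial cells defined by n_binary binary variables."""
    return 2 ** n_binary


def empty_cell_fraction(n_samples, n_binary, seed=0):
    """Fraction of cells left empty by n_samples simulated observations.

    Assumption for illustration only: each binary variable is an
    independent fair coin, so every cell is equally likely.
    """
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_samples):
        # An observation's binary vector is encoded as an integer cell index.
        occupied.add(rng.getrandbits(n_binary))
    return 1 - len(occupied) / count_cells(n_binary)


if __name__ == "__main__":
    for b in (5, 10, 20):
        frac = empty_cell_fraction(100, b)
        print(f"b={b:2d}  cells={count_cells(b):8d}  empty fraction={frac:.4f}")
```

Because 100 observations can occupy at most 100 cells, the empty fraction is forced towards one as b grows, whatever the data look like. In the location model the cells are also estimated separately for each group, so the effective sparsity is at least as severe as this sketch suggests.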
Acknowledgment
The author would like to thank Universiti Utara Malaysia for financial support.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hamid, H.b. (2017). A New Framework of Smoothed Location Model with Multiple Correspondence Analysis. In: Ahmad, AR., Kor, L., Ahmad, I., Idrus, Z. (eds) Proceedings of the International Conference on Computing, Mathematics and Statistics (iCMS 2015). Springer, Singapore. https://doi.org/10.1007/978-981-10-2772-7_12
DOI: https://doi.org/10.1007/978-981-10-2772-7_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2770-3
Online ISBN: 978-981-10-2772-7
eBook Packages: Education, Education (R0)