Classification, Clustering, and Visualisation Based on Dual Scaling

Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

In practice, the statistician is often faced with data that have already been collected, and these data are often of mixed type. The statistician must then try to draw the best possible statistical conclusions using the most sophisticated methods. But are the variables scaled optimally? And what about missing data? Without loss of generality, we restrict ourselves here to binary classification/clustering. We outline a very simple but general approach that is applicable to such data for both classification and clustering. It is based on data preparation (i.e., a down-grading step such as binning each quantitative variable) followed by dual scaling (the up-grading step: scoring). As a byproduct, the resulting quantitative scores can be used for multivariate visualisation of both the data and the classes/clusters. For illustration, a real-data application to optical character recognition (OCR) is considered throughout the paper. Moreover, the proposed approach is compared with other multivariate methods such as the simple Bayesian classifier.
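
The two steps named in the abstract, down-grading by binning and up-grading by scoring, can be pictured with the following minimal sketch. It is not the authors' implementation. It assumes quantile binning, computes the scores in the correspondence-analysis form of dual scaling (an SVD of the standardized category-by-class table), averages category scores into one score per object, and uses a midpoint threshold for the binary decision. The function names, the number of bins, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def bin_columns(X, n_bins=4, edges=None):
    """Down-grading: map each quantitative column to bin codes 0..n_bins-1."""
    if edges is None:
        # Interior quantile cut points per column (an assumed binning rule).
        qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
        edges = [np.quantile(col, qs) for col in X.T]
    codes = np.column_stack(
        [np.searchsorted(e, col, side="right") for e, col in zip(edges, X.T)]
    )
    return codes, edges

def category_scores(codes, y, n_bins=4):
    """Up-grading: first-axis scores of the (variable, bin) x class table."""
    n_vars = codes.shape[1]
    # Contingency table: one row per category (variable j, bin b), two class columns.
    N = np.zeros((n_vars * n_bins, 2))
    for j in range(n_vars):
        np.add.at(N, (j * n_bins + codes[:, j], y), 1.0)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    r_safe = np.where(r > 0, r, 1.0)  # empty categories end up with score 0
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r_safe, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    f = s[0] * U[:, 0] / np.sqrt(r_safe)
    if Vt[0, 1] < 0:  # orient the axis so that class 1 gets the higher scores
        f = -f
    return f.reshape(n_vars, n_bins)

def object_scores(codes, f):
    """One quantitative score per object: mean of its category scores."""
    return f[np.arange(codes.shape[1]), codes].mean(axis=1)

# Synthetic two-class data (illustrative only, not the OCR data of the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(2.0, 1.0, (200, 3))])
y = np.repeat([0, 1], 200)

codes, edges = bin_columns(X)
f = category_scores(codes, y)
z = object_scores(codes, f)

# Assumed decision rule: threshold halfway between the two class means.
threshold = 0.5 * (z[y == 0].mean() + z[y == 1].mean())
pred = (z > threshold).astype(int)
print("training accuracy:", (pred == y).mean())
```

The per-object scores `z` are ordinary quantitative values, so they could also be plotted directly, which is the sense in which scoring yields a visualisation as a byproduct. For clustering, where no class labels are available, the class columns of the table would have to come from a current partition instead; that variant is not shown here.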

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Berlin, Germany