For a classification problem that is implicitly represented by a training data set, analysis of data complexity provides a linkage between context and solution. Instead of directly optimizing classification accuracy by tuning the learning algorithms, one may seek changes in the data sources and feature transformations to simplify the data geometry. Simplified class geometry benefits learning in a way common to many methods. We review some early results in data complexity analysis, compare these to recent advances in manifold learning, and suggest directions for further research.


Keywords: Data Complexity, Locally Linear Embedding, Manifold Learning, Class Geometry, Data Geometry



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Tin Kam Ho (1)
  1. Bell Labs, Alcatel-Lucent, USA
