
Object Representation, Sample Size, and Data Set Complexity

  • Chapter

Part of the book series: Advanced Information and Knowledge Processing ((AI&KP))

Summary

The complexity of a pattern recognition problem is determined by its representation. It is argued, and illustrated by examples, that the sampling density of a given data set and the resulting complexity of a learning problem are inherently connected. A number of criteria are constructed to judge this complexity for the chosen dissimilarity representation. Some nonlinear transformations of the original representation are also investigated to illustrate that such changes may affect the resulting complexity. If the sampling density is originally insufficient, a transformation may result in a data set of lower complexity with satisfactory sampling. On the other hand, if the number of samples is originally abundant, the transformed representation may become more complex.
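
The terms used in the summary can be made concrete with a small sketch. It is an illustration under stated assumptions, not the chapter's method: the criteria the authors construct are not given on this page, so the leave-one-out 1-NN error below is only a stand-in proxy for how well a sample covers a problem. The data generator, the function names, and the power transformation are all hypothetical choices made for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(objects, prototypes, dist=euclidean):
    """Dissimilarity representation: row i holds the dissimilarities
    of object i to every prototype."""
    return [[dist(o, p) for p in prototypes] for o in objects]


def power_transform(D, p):
    """Element-wise nonlinear transformation d -> d**p (assumed p > 0).
    It changes the dissimilarity values but preserves their order."""
    return [[d ** p for d in row] for row in D]


def loo_1nn_error(D, labels):
    """Leave-one-out 1-NN error on a square dissimilarity matrix.
    Stand-in proxy only; NOT one of the chapter's criteria."""
    n = len(D)
    errors = 0
    for i in range(n):
        nearest = min((k for k in range(n) if k != i), key=lambda k: D[i][k])
        errors += labels[nearest] != labels[i]
    return errors / n


def make_two_classes(n_per_class, rng, gap=10.0):
    """Hypothetical toy data: two 2-D classes with bounded uniform noise,
    returned interleaved (class 0, class 1, class 0, ...)."""
    points, labels = [], []
    for _ in range(n_per_class):
        for label in (0, 1):
            centre = gap * label
            points.append((centre + rng.uniform(-1, 1),
                           centre + rng.uniform(-1, 1)))
            labels.append(label)
    return points, labels


def learning_curve(points, labels, sizes):
    """Proxy error computed on the first n objects, for each n in sizes."""
    curve = []
    for n in sizes:
        D = dissimilarity_matrix(points[:n], points[:n])
        curve.append((n, loo_1nn_error(D, labels[:n])))
    return curve


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_two_classes(20, rng)
    for n, err in learning_curve(X, y, [4, 10, 20, 40]):
        print(n, err)
```

Plotting such a proxy against the sample size gives a rough picture of whether the sampling is sufficient for a given representation. A 1-NN rule applied directly to the dissimilarities depends only on their order, so an order-preserving transformation such as `power_transform` would need a different classifier or criterion to show an effect.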




Copyright information

© 2006 Springer Verlag London Limited

About this chapter

Cite this chapter

Duin, R.P.W., Pękalska, E. (2006). Object Representation, Sample Size, and Data Set Complexity. In: Basu, M., Ho, T.K. (eds) Data Complexity in Pattern Recognition. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84628-172-3_2


  • DOI: https://doi.org/10.1007/978-1-84628-172-3_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-171-6

  • Online ISBN: 978-1-84628-172-3

  • eBook Packages: Computer Science, Computer Science (R0)
