Summary
The complexity of a pattern recognition problem is determined by its representation. It is argued and illustrated by examples that the sampling density of a given data set and the resulting complexity of a learning problem are inherently connected. A number of criteria are constructed to judge this complexity for the chosen dissimilarity representation. Some nonlinear transformations of the original representation are also investigated to illustrate that such changes may affect the resulting complexity. If the sampling density is originally insufficient, a transformation may result in a data set of lower complexity with satisfactory sampling. On the other hand, if the number of samples is originally abundant, the transformed representation may become more complex.
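To make the terms in the summary concrete, the following is a minimal Python sketch. It assumes a Euclidean dissimilarity matrix, an element-wise power transformation, and a hypothetical indicator `nn_ratio` (mean nearest-neighbour dissimilarity divided by mean pairwise dissimilarity). None of these names or choices come from the chapter; they only illustrate how sample size and a nonlinear transformation can each change a sampling-related quantity computed from a dissimilarity representation.

```python
import math
import random


def dissimilarity_matrix(points):
    """Pairwise Euclidean distances between all objects (an n x n matrix)."""
    return [[math.dist(p, q) for q in points] for p in points]


def power_transform(D, p):
    """Element-wise nonlinear transformation d -> d**p (assumed for illustration)."""
    return [[d ** p for d in row] for row in D]


def nn_ratio(D):
    """Mean nearest-neighbour dissimilarity over mean pairwise dissimilarity.

    Hypothetical indicator, not one of the chapter's criteria: values close
    to 1 mean nearest neighbours are about as far away as arbitrary objects,
    which hints at sparse sampling.
    """
    n = len(D)
    nearest = [min(D[i][j] for j in range(n) if j != i) for i in range(n)]
    total = sum(D[i][j] for i in range(n) for j in range(n) if j != i)
    return (sum(nearest) / n) / (total / (n * (n - 1)))


random.seed(0)  # fixed seed so the sketch is reproducible
small = [(random.random(), random.random()) for _ in range(10)]
large = [(random.random(), random.random()) for _ in range(200)]

D_small = dissimilarity_matrix(small)
D_large = dissimilarity_matrix(large)

ratio_small = nn_ratio(D_small)   # few samples from the unit square
ratio_large = nn_ratio(D_large)   # many samples from the same distribution
ratio_large_sqrt = nn_ratio(power_transform(D_large, 0.5))  # after d -> sqrt(d)

print(f"small sample:       {ratio_small:.3f}")
print(f"large sample:       {ratio_large:.3f}")
print(f"large sample, sqrt: {ratio_large_sqrt:.3f}")
```

In this toy setting the indicator drops as more samples are drawn, and the concave transformation raises it again for the same objects. That mirrors only the qualitative point of the summary, namely that a change of representation can alter how well sampled a data set appears.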
Copyright information
© 2006 Springer Verlag London Limited
About this chapter
Cite this chapter
Duin, R.P.W., Pękalska, E. (2006). Object Representation, Sample Size, and Data Set Complexity. In: Basu, M., Ho, T.K. (eds) Data Complexity in Pattern Recognition. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84628-172-3_2
DOI: https://doi.org/10.1007/978-1-84628-172-3_2
Publisher Name: Springer, London
Print ISBN: 978-1-84628-171-6
Online ISBN: 978-1-84628-172-3
eBook Packages: Computer Science (R0)