Object Representation, Sample Size, and Data Set Complexity

Duin, Robert P. W.; Pękalska, Elżbieta

doi:10.1007/978-1-84628-172-3_2


Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Summary

The complexity of a pattern recognition problem is determined by its representation. It is argued and illustrated by examples that the sampling density of a given data set and the resulting complexity of a learning problem are inherently connected. A number of criteria are constructed to judge this complexity for the chosen dissimilarity representation. Some nonlinear transformations of the original representation are also investigated to illustrate that such changes may affect the resulting complexity. If the sampling density is originally insufficient, such a transformation may result in a data set of lower complexity with a satisfactory sampling. On the other hand, if the number of samples is originally abundant, the transformed representation may become more complex.
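
As a rough illustration of the ideas in the summary, the following minimal sketch builds a dissimilarity representation, applies an elementwise nonlinear transformation, and estimates a leave-one-out nearest-neighbour error at several sample sizes. It is not the chapter's method: the Euclidean distance, the power transform, the two-Gaussian toy data, and all function names are assumptions chosen for illustration.

```python
import math
import random


def dissimilarity_matrix(objects, prototypes):
    """One row per object; each column is the Euclidean distance to a prototype."""
    return [[math.dist(x, p) for p in prototypes] for x in objects]


def power_transform(D, p):
    """Elementwise nonlinear transform d -> d**p (assumed here, with p > 0)."""
    return [[d ** p for d in row] for row in D]


def loo_1nn_error(D, labels):
    """Leave-one-out 1-NN error computed directly on a square dissimilarity matrix."""
    n = len(labels)
    errors = 0
    for i in range(n):
        # Nearest other object according to the given dissimilarities.
        j = min((k for k in range(n) if k != i), key=lambda k: D[i][k])
        errors += labels[j] != labels[i]
    return errors / n


def sample_two_classes(n_per_class, rng):
    """Toy data: two overlapping 2-D Gaussian classes (hypothetical example)."""
    a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_per_class)]
    b = [(rng.gauss(2.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_per_class)]
    return a + b, [0] * n_per_class + [1] * n_per_class


rng = random.Random(0)
for n in (5, 20, 80):
    X, y = sample_two_classes(n, rng)
    D = dissimilarity_matrix(X, X)      # objects represented by distances to all others
    D_sqrt = power_transform(D, 0.5)    # transformed representation
    print(n, loo_1nn_error(D, y), len(D_sqrt), len(D_sqrt[0]))
```

A strictly increasing transform such as the square root preserves the ordering of each row, so it leaves the nearest neighbour unchanged. It would matter only for classifiers that use the dissimilarity values themselves, for example a linear classifier trained on the rows of `D` as feature vectors.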



Author information

Authors and Affiliations

Information & Communication Theory Group, Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
Robert P. W. Duin & Elżbieta Pękalska


Editor information

Editors and Affiliations

Electrical Engineering Department, City College, City University of New York, USA
Mitra Basu PhD
Bell Laboratories, Lucent Technologies, New Jersey, USA
Tin Kam Ho BBA, MS, PhD


About this chapter

Cite this chapter

Duin, R.P.W., Pękalska, E. (2006). Object Representation, Sample Size, and Data Set Complexity. In: Basu, M., Ho, T.K. (eds) Data Complexity in Pattern Recognition. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84628-172-3_2


DOI: https://doi.org/10.1007/978-1-84628-172-3_2
Publisher Name: Springer, London
Print ISBN: 978-1-84628-171-6
Online ISBN: 978-1-84628-172-3
eBook Packages: Computer Science, Computer Science (R0)
