
Pairwise feature evaluation for constructing reduced representations

  • Theoretical Advances
  • Published in Pattern Analysis and Applications

Abstract

Feature selection methods are often used to determine a small set of informative features that guarantees good classification results. Such procedures usually consist of two components: a separability criterion and a selection strategy. The most basic choices for the latter are individual ranking, forward search and backward search; many intermediate methods, such as floating search, are also available. Both forward and backward selection may lead to a lossy evaluation of the criterion and/or to overtraining of the final classifier in high-dimensional spaces and small-sample-size problems; backward selection may, in addition, become computationally prohibitive. Individual ranking, on the other hand, suffers because it neglects dependencies between features. A new strategy based on pairwise evaluation has recently been proposed by Bo and Jonassen (Genome Biol 3, 2002) and Pękalska et al. (International Conference on Computer Recognition Systems, Poland, pp 271–278, 2005). Since it considers interactions between features, yet is always restricted to two-dimensional spaces, it may circumvent the small-sample-size problem. In this paper, we evaluate this idea in a more general framework for the selection of features as well as prototypes. Our finding is that such a pairwise selection may improve over traditional procedures, and we present artificial and real-world examples to support this claim. We have also found, however, that the set of problems for which pairwise selection is effective is small.
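
The pairwise strategy is easy to sketch. The fragment below is a minimal illustration of the general idea rather than the exact procedure of the paper: it assumes cross-validated accuracy of a linear discriminant as the two-dimensional separability criterion and harvests features greedily from the best-scoring pairs; the function name and parameter settings are hypothetical.

    import itertools

    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score


    def pairwise_feature_selection(X, y, n_features, cv=5):
        """Score every feature pair with a separability criterion and
        greedily collect features from the best-scoring pairs."""
        scores = {}
        for i, j in itertools.combinations(range(X.shape[1]), 2):
            # Criterion (illustrative choice): mean cross-validated accuracy
            # of a linear discriminant on the 2D subspace spanned by (i, j).
            clf = LinearDiscriminantAnalysis()
            scores[(i, j)] = cross_val_score(clf, X[:, [i, j]], y, cv=cv).mean()

        selected = []
        # Visit pairs from best to worst; add unseen features until enough
        # have been collected.
        for pair in sorted(scores, key=scores.get, reverse=True):
            for f in pair:
                if f not in selected:
                    selected.append(f)
                if len(selected) == n_features:
                    return selected
        return selected


    if __name__ == "__main__":
        from sklearn.datasets import make_classification

        # Small-sample, many-feature toy problem (hypothetical settings).
        X, y = make_classification(n_samples=60, n_features=20,
                                   n_informative=4, random_state=0)
        print(pairwise_feature_selection(X, y, n_features=6))

Because every criterion evaluation is confined to a two-dimensional subspace, it can be estimated reliably even when the number of features far exceeds the number of samples, which is exactly the small-sample-size setting the strategy targets.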


References

  1. Alon U, Barkai N, Notterman D, Gish K, Ybarra S, Mack D, Levine A (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96(12):6745–6750

  2. Bennett CH, Gacs P, Li M, Vitányi PMB, Zurek W (1998) Information distance. IEEE Trans Inf Theory IT-44(4):1407–1423

  3. Bo T, Jonassen I (2002) New feature subset selection procedures for classification of expression profiles. Genome Biol 3

  4. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, California

  5. Brodatz P (1996) Textures: a photographic album for artists and designers. Dover, New York

  6. Bunke H, Sanfeliu A (1990) Syntactic and structural pattern recognition: theory and applications. World Scientific, Singapore

  7. Cover TM, van Campenhout JM (1977) On the possible orderings in the measurement selection problem. IEEE Trans Syst Man Cybern SMC-7(9):657–661

  8. Das S (2001) Filters, wrappers and a boosting-based hybrid for feature selection. In: International Conference on Machine Learning, pp 74–81

  9. Dubuisson MP, Jain AK (1994) A modified Hausdorff distance for object matching. In: International Conference on Pattern Recognition, vol 1, pp 566–568

  10. Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York

  11. Duin RPW, Juszczak P, de Ridder D, Paclík P, Pękalska E, Tax DMJ (2004) PR-Tools, Pattern Recognition Tools. http://www.prtools.org

  12. Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Academic Press, San Diego

  13. Hall M (2000) Correlation-based feature selection for machine learning. Ph.D. thesis, University of Waikato

  14. Jain AK, Zongker D (1997) Feature selection: evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell 19(2):153–158

  15. Jain AK, Zongker D (1997) Representation and recognition of handwritten digits using deformable templates. IEEE Trans Pattern Anal Mach Intell 19(12):1386–1391

  16. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37

  17. John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. In: Machine learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann

  18. Kohavi R (1995) The power of decision tables. In: Proceedings of the Eighth European Conference on Machine Learning (ECML-95), Lecture Notes in Artificial Intelligence, vol 914. Springer, Berlin Heidelberg New York, pp 174–189

  19. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97:273–324

  20. Li L, Weinberg CR, Darden TA, Pedersen LG (2001) Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 17:1131–1142

  21. Lozano M, Sotoca JM, Sanchez JS, Pla F, Pękalska E, Duin RPW (2006) Experimental study on prototype optimisation algorithms for dissimilarity-based classifiers. Pattern Recognit 39(10):1827–1838

  22. Paclík P, Novovičová J, Somol P, Pudil P (2000) Road sign classification using Laplace kernel classifier. Pattern Recognit Lett 21(13–14):1165–1173

  23. Pękalska E, Duin RPW (2005) The dissimilarity representation for pattern recognition: foundations and applications. World Scientific, Singapore

  24. Pękalska E, Harol A, Lai C, Duin RPW (2005) Pairwise selection of features and prototypes. In: International Conference on Computer Recognition Systems, Poland, pp 271–278

  25. Pękalska E, Duin RPW, Paclík P (2002) A generalized kernel approach to dissimilarity-based classification. J Mach Learn Res 2(2):175–211

  26. Pękalska E, Duin RPW, Paclík P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recognit 39(2):189–208

  27. Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15:1119–1125

  28. Vapnik V (1998) Statistical learning theory. Wiley, New York

  29. Veltkamp RC, Hagedoorn M (2000) Shape similarity measures, properties and constructions. In: Advances in Visual Information Systems, pp 467–476

  30. Wilson CL, Garris MD (1992) Handprinted character database 3. Technical report, National Institute of Standards and Technology

  31. Xing E, Jordan M, Karp R (2001) Feature selection for high-dimensional genomic microarray data. In: International Conference on Machine Learning, pp 601–608

  32. Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: International Conference on Machine Learning, Washington


Acknowledgments

This work is supported by the Dutch Organization for Scientific Research (NWO) and the Dutch Cancer Institute (NKI). The authors thank Prof. Anil Jain and Dr. Douglas Zongker for providing the Digit dissimilarity data and Dr. Pavel Paclík for providing the RoadSign dissimilarity data.

Author information

Corresponding author

Correspondence to Artsiom Harol.


About this article

Cite this article

Harol, A., Lai, C., Pękalska, E. et al. Pairwise feature evaluation for constructing reduced representations. Pattern Anal Applic 10, 55–68 (2007). https://doi.org/10.1007/s10044-006-0050-x

