Pattern Analysis and Applications, Volume 18, Issue 2, pp 247–261

Generalized multi-scale stacked sequential learning for multi-class classification

Theoretical Advances

Abstract

In many classification problems, neighboring data labels have inherent sequential relationships. Sequential learning algorithms take advantage of these relationships to improve generalization. In this paper, we revise the multi-scale stacked sequential learning (MSSL) approach to apply it to the multi-class case (MMSSL). We introduce the error-correcting output codes (ECOC) framework into the MSSL classifiers and propose a formulation for computing confidence maps from the margins of the base classifiers. In addition, we propose an MMSSL compression approach that reduces the number of features in the extended data set without a loss in performance. The proposed methods are tested on several databases, showing significant performance improvement compared to classical approaches.
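To make the pipeline concrete, the following is a minimal sketch of the stacked sequential learning idea summarized above: a base classifier produces per-class confidence maps, those maps are smoothed at several scales and sampled around each position, and the resulting neighborhood features extend the original ones for a second-stage classifier. The 1-D sequence setting, the scikit-learn/scipy helpers, and every function name below are illustrative assumptions, not the authors' exact MMSSL implementation.

```python
# Illustrative sketch of multi-scale stacked sequential learning (MSSL).
# Assumptions: 1-D sequences, scikit-learn-style classifiers; not the
# authors' exact formulation (which uses ECOC margins for the multi-class
# confidence maps and stacked, out-of-sample first-stage predictions).
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.linear_model import LogisticRegression

def extended_features(X, conf_maps, sigmas=(1.0, 2.0, 4.0), w=1):
    """Build the extended set: original features plus multi-scale
    neighborhood features sampled from smoothed confidence maps."""
    feats = [X]
    for sigma in sigmas:
        # Smooth each per-class confidence map along the sequence axis.
        smoothed = gaussian_filter1d(conf_maps, sigma=sigma, axis=0)
        # Sample a window of 2*w + 1 neighboring positions at this scale.
        for offset in range(-w, w + 1):
            feats.append(np.roll(smoothed, offset, axis=0))
    return np.hstack(feats)

# Toy 1-D sequence: three classes laid out in contiguous, noisy blocks.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 100)
X = rng.normal(loc=y[:, None], scale=1.5, size=(300, 2))

# Stage 1: base classifier and its per-class confidence maps.
h1 = LogisticRegression(max_iter=1000).fit(X, y)
conf_maps = h1.predict_proba(X)  # rough analogue of the confidence maps

# Stage 2: classifier trained on the extended set.
X_ext = extended_features(X, conf_maps)
h2 = LogisticRegression(max_iter=1000).fit(X_ext, y)
print("stage-1 accuracy:", h1.score(X, y))
print("stage-2 accuracy:", h2.score(X_ext, y))
```

In the paper's formulation, the first-stage predictions come from a stacking scheme rather than from the training data itself, the per-class confidence maps are derived from ECOC margins, and the extended set can be compressed; the sketch omits those refinements for brevity.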

Keywords

Stacked sequential learning · Multi-scale · Error-correcting output codes (ECOC) · Contextual classification

Abbreviation List

X: Set of samples
Y: Set of labels
x: A sample
y: A label
h(x): A classifier
\(y^{\prime}\): A prediction from a classifier
\(y^{\prime\prime}\): A final prediction from a chain of classifiers
\(\mathbf{x}^{ext}\): Extended data set
J: Neighborhood relationship function
z: Neighborhood model features
ρ: Neighborhood
θ: Neighborhood parameterization
w: Number of elements in the neighborhood window
s: Number of scales
c: Set of different classes in a multi-class problem
\(\hat{F}(\mathbf{x}, c)\): A prediction confidence map
N: Number of classes in a multi-class problem
n: Number of dichotomizers
σ: Parameter of a Gaussian filter
Set of scales defined by σ parameters
b: A dichotomizer
M: ECOC coding matrix
\({\mathcal{Y}}\): A class codeword in the ECOC framework
\({\mathcal{X}}\): A sample prediction codeword in the ECOC framework
\(m_{\mathbf{x}}\): Margin for a prediction of sample x
β: Constant that governs the transition in a sigmoid function (see the illustrative relation after this list)
t: Number of iterations in an AdaBoost classifier
δ: A soft distance
α: Normalization parameter for the soft distance δ
\(g_{\sigma}\): A multidimensional isotropic Gaussian filter with zero mean and standard deviation σ
\({\mathcal{P}}\): A set of partitions of classes
P: A partition of groups of classes
γ: A symbol in a partition codeword
\(\Upgamma\): A partition codeword
R: The mean ranking for each system configuration
E: The total number of experiments
k: The total number of system configurations
\(\chi_{F}^{2}\): Friedman statistic value
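As a hedged illustration of how the symbols \(m_{\mathbf{x}}\), β, and \(\hat{F}(\mathbf{x}, c)\) defined above fit together, a dichotomizer margin can be mapped to a confidence value with a sigmoid whose steepness is controlled by β; this is one plausible instantiation, not necessarily the exact formulation used in the paper:

\[
\hat{F}(\mathbf{x}, c) \approx \frac{1}{1 + e^{-\beta\, m_{\mathbf{x}}}}
\]

Under this reading, a large positive margin pushes the confidence for class c toward 1, a large negative margin pushes it toward 0, and β sets how sharply the transition occurs around \(m_{\mathbf{x}} = 0\).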


Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. Dept. Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Barcelona, Spain
  2. Computer Vision Center, Bellaterra, Spain
