The Foundational Theory of Optimal Bayesian Pairwise Linear Classifiers

  • Luis Rueda
  • B. John Oommen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1876)


When dealing with normally distributed classes, it is well known that the optimal discriminant function for two classes is linear when the covariance matrices are equal. In this paper, we determine conditions under which the optimal classifier is linear even when the covariance matrices are unequal. In all the cases discussed here, the classifier is given by a pair of straight lines, which is a particular case of the general equation of the second degree. One of these cases arises when two overlapping classes have equal means; this is a generalization of Minsky's paradox for the perceptron. Our results, which to our knowledge are the pioneering results for pairwise linear classifiers, yield a general linear classifier for this case that can be obtained directly from the parameters of the distributions. Numerous other analytic results for two- and d-dimensional normal random vectors have also been derived. Finally, we provide empirical results for all the cases, and demonstrate that these linear classifiers achieve very good performance.
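The pair-of-lines special case mentioned above can be illustrated with a small numerical sketch (this example and its parameter values are our own illustration, not taken from the paper): for two zero-mean Gaussian classes whose diagonal covariance matrices have their variances swapped, the Bayes quadratic discriminant degenerates exactly to the pair of straight lines x2 = ±x1.

```python
import numpy as np

# Illustrative (hypothetical) parameters: two zero-mean Gaussian classes with
# equal priors and diagonal covariances whose variances are swapped, a
# Minsky/XOR-like overlapping configuration.
a, b = 4.0, 1.0
S1 = np.diag([a, b])  # covariance of class 1
S2 = np.diag([b, a])  # covariance of class 2 (variances swapped)

def bayes_discriminant(x, S1, S2):
    """Log-likelihood-ratio discriminant for equal means (0) and equal priors.

    g(x) > 0 assigns x to class 1; g(x) = 0 is the decision boundary.
    """
    A = np.linalg.inv(S2) - np.linalg.inv(S1)
    c = np.log(np.linalg.det(S1) / np.linalg.det(S2))  # vanishes here: |S1| = |S2|
    return x @ A @ x - c

# The quadratic term reduces to (1/b - 1/a) * (x1^2 - x2^2), so the boundary
# is the pair of lines x2 = +x1 and x2 = -x1.
for x in [np.array([2.0, 2.0]), np.array([-3.0, 3.0])]:
    print(bayes_discriminant(x, S1, S2))  # 0.0 on both lines
```

Points off these lines are classified by which axis carries more of their energy; for instance, (2, 0) lies in the high-x1-variance class 1 region, consistent with the paper's claim that the classifier is obtained directly from the distribution parameters.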




References

  1. M. Aladjem. Linear Discriminant Analysis for Two Classes via Removal of Classification Structure. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(2):187–192, 1997.
  2. P. Devijver and J. Kittler. Pattern Recognition: A Statistical Approach. Prentice-Hall, 1982.
  3. R. Duda and P. Hart. Pattern Classification and Scene Analysis. John Wiley and Sons, Inc., 1973.
  4. K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.
  5. K. Fukunaga. Statistical Pattern Recognition. Handbook of Pattern Recognition and Computer Vision, pages 33–60, 1993.
  6. T. M. Ha. Optimum Decision Rules in Pattern Recognition. Advances in Pattern Recognition, SSPR’98-SPR’98, pages 726–735, 1998.
  7. M. Minsky. Perceptrons. MIT Press, 2nd edition, 1988.
  8. A. Rao, D. Miller, K. Rose, and A. Gersho. A Deterministic Annealing Approach for Parsimonious Design of Piecewise Regression Models. IEEE Trans. Pattern Analysis and Machine Intelligence, 21(2):159–173, 1999.
  9. S. Raudys. Linear Classifiers in Perceptron Design. Proc. 13th ICPR, Track D, Wien, 1996.
  10. S. Raudys. On Dimensionality, Sample Size, and Classification Error of Nonparametric Linear Classification. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(6):667–671, 1997.
  11. B. Ripley. Pattern Recognition and Neural Networks. Cambridge Univ. Press, 1996.
  12. L. Rueda and B. J. Oommen. On Optimal Pairwise Linear Classifiers. Unabridged version of this paper. Submitted for publication.
  13. R. Schalkoff. Pattern Recognition: Statistical, Structural and Neural Approaches. John Wiley and Sons, Inc., 1992.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Luis Rueda
    1. School of Computer Science, Carleton University, Ottawa, Canada
  • B. John Oommen
    2. IEEE, USA
