An Introduction to Pattern Classification

  • Elad Yom-Tov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3176)

Abstract

Pattern classification is the field devoted to the study of methods for categorizing data into distinct classes. This categorization can take the form of assigning each data point a label from a known set of classes (supervised learning), dividing the data into previously unknown classes (unsupervised learning), selecting the most informative features of the data (feature selection), or a combination of these tasks.
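
To make these task families concrete, the following minimal sketch (an illustration added for this summary, not code from the chapter) implements a simple stand-in for each one in Python with NumPy: a nearest-centroid rule for supervised learning, a short k-means loop for unsupervised learning, and a variance ranking for feature selection. The toy data and all method choices are assumptions made for the example.

    # Illustrative only: toy data and simple stand-ins for each task family.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: two Gaussian classes in three dimensions; the third
    # feature is near-constant noise and carries little class information.
    X0 = rng.normal(loc=[0.0, 0.0, 0.0], scale=[1.0, 1.0, 0.1], size=(50, 3))
    X1 = rng.normal(loc=[3.0, 3.0, 0.0], scale=[1.0, 1.0, 0.1], size=(50, 3))
    X = np.vstack([X0, X1])
    y = np.array([0] * 50 + [1] * 50)

    # Supervised learning: fit class centroids from the labeled data and
    # assign a new point to the class of the nearest centroid.
    centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

    def classify(x):
        return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

    # Unsupervised learning: divide the unlabeled data into k classes
    # with a plain k-means loop.
    def kmeans(data, k=2, iters=20):
        centers = data[rng.choice(len(data), size=k, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
            labels = np.argmin(dists, axis=1)
            for j in range(k):  # keep the old center if a cluster empties
                if np.any(labels == j):
                    centers[j] = data[labels == j].mean(axis=0)
        return labels

    # Feature selection: score each feature by a simple relevance measure
    # (variance here) and rank; the noisy third feature ranks last.
    ranking = np.argsort(X.var(axis=0))[::-1]

    print(classify(np.array([2.8, 3.1, 0.0])))  # expected: 1
    print(kmeans(X))                            # two clusters, up to label swap
    print(ranking)                              # third feature (index 2) last

In practice each stand-in would be replaced by one of the more powerful methods of the kind the chapter discusses, but the division of labor between the three tasks is the same.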

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Elad Yom-Tov
  1. IBM Haifa Research Labs, Haifa, Israel
