
Knowledge and Information Systems, Volume 13, Issue 1, pp 1–42

Supervised tensor learning

  • Dacheng Tao (Email author)
  • Xuelong Li
  • Xindong Wu
  • Weiming Hu
  • Stephen J. Maybank
Regular Paper

Abstract

Tensor representation helps reduce the small sample size problem in discriminative subspace selection. As this paper points out, this is mainly because the structure information of objects in computer vision research is a reasonable constraint that reduces the number of unknown parameters used to represent a learning model. We therefore apply this information to vector-based learning and generalize vector-based learning to tensor-based learning as the supervised tensor learning (STL) framework, which accepts tensors as input. To obtain the solution of STL, an alternating projection optimization procedure is developed. The STL framework combines convex optimization with operations in multilinear algebra. The tensor representation helps reduce the overfitting problem in vector-based learning. Based on STL and its alternating projection optimization procedure, we generalize support vector machines, the minimax probability machine, Fisher discriminant analysis, and distance metric learning to support tensor machines, the tensor minimax probability machine, tensor Fisher discriminant analysis, and multiple distance metrics learning, respectively. We also study the iterative procedure for feature extraction within STL. To examine the effectiveness of STL, we implement the tensor minimax probability machine for image classification. Compared with the minimax probability machine, the tensor version reduces the overfitting problem.
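
To illustrate the alternating projection idea, here is a minimal NumPy sketch for second-order tensors (matrices). It is not the paper's algorithm: it assumes a rank-one decision function f(X) = uᵀXv + b and swaps in a regularized least-squares loss as the convex subproblem, where the paper uses SVM, minimax probability machine, or Fisher criteria. The names `fit_ridge`, `objective`, and `fit_rank_one` are hypothetical. With one projection vector held fixed, each sample is projected to an ordinary vector and the other projection vector is found by a standard vector-based solver; the two steps then alternate. Under this rank-one assumption, an m × k input needs m + k weights rather than the m · k weights of a vectorized model.

```python
import numpy as np


def fit_ridge(Z, y, lam):
    """Minimise ||Z w + b - y||^2 + lam * ||w||^2 over (w, b); the bias is unpenalised."""
    n, d = Z.shape
    A = np.hstack([Z, np.ones((n, 1))])
    # Express the penalty as extra least-squares rows so lstsq handles the whole problem.
    P = np.sqrt(lam) * np.hstack([np.eye(d), np.zeros((d, 1))])
    wb, *_ = np.linalg.lstsq(
        np.vstack([A, P]), np.concatenate([y, np.zeros(d)]), rcond=None
    )
    return wb[:d], wb[d]


def objective(X, y, u, v, b, lam):
    """Squared loss of f(X) = u^T X v + b plus the rank-one penalty lam * ||u||^2 * ||v||^2."""
    resid = np.einsum("imk,m,k->i", X, u, v) + b - y
    return resid @ resid + lam * (u @ u) * (v @ v)


def fit_rank_one(X, y, lam=1e-2, n_iter=20, seed=0):
    """Alternating projection for matrix inputs X of shape (n, m, k) and labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    _, m, k = X.shape
    v = rng.standard_normal(k)  # random start for the second-mode projection
    history = []
    for _ in range(n_iter):
        # Fix v: each sample becomes the m-vector X_i v; solve the convex problem for u.
        u, b = fit_ridge(X @ v, y, lam * (v @ v))
        # Fix u: each sample becomes the k-vector X_i^T u; solve the convex problem for v.
        v, b = fit_ridge(np.einsum("imk,m->ik", X, u), y, lam * (u @ u))
        history.append(objective(X, y, u, v, b, lam))
    return u, v, b, history
```

A new matrix `X_new` would be classified by the sign of `u @ X_new @ v + b`. Because each half-step exactly minimizes the shared objective over one block of variables, the recorded objective values do not increase from one sweep to the next; this sketch makes no claim about reaching a global optimum.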

Keywords

Convex optimization, Supervised learning, Tensor, Alternating projection

Copyright information

© Springer-Verlag London Limited 2006

Authors and Affiliations

  • Dacheng Tao (1), Email author
  • Xuelong Li (1)
  • Xindong Wu (2)
  • Weiming Hu (3)
  • Stephen J. Maybank (1)
  1. School of Computer Science and Information Systems, Birkbeck, University of London, London, UK
  2. Department of Computer Science, University of Vermont, Burlington, USA
  3. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China
