# Supervised tensor learning

## Abstract

Tensor representation helps alleviate the small sample size problem in discriminative subspace selection. As this paper points out, this is mainly because the structure information of objects in computer vision research is a reasonable constraint that reduces the number of unknown parameters used to represent a learning model. We therefore apply this information to vector-based learning and generalize it to tensor-based learning as the supervised tensor learning (STL) framework, which accepts tensors as input. To obtain the solution of STL, we develop an alternating projection optimization procedure. The STL framework combines convex optimization with operations from multilinear algebra. The tensor representation helps reduce the overfitting problem in vector-based learning. Based on STL and its alternating projection optimization procedure, we generalize support vector machines, the minimax probability machine, Fisher discriminant analysis, and distance metric learning to support tensor machines, the tensor minimax probability machine, tensor Fisher discriminant analysis, and multiple distance metrics learning, respectively. We also study the iterative procedure for feature extraction within STL. To examine the effectiveness of STL, we implement the tensor minimax probability machine for image classification. Compared with the minimax probability machine, the tensor version reduces the overfitting problem.
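
The alternating projection idea can be illustrated with a minimal sketch for second-order tensors (matrices). It makes several assumptions that go beyond the abstract: the classifier is taken to be `sign(u^T X v + b)` with one projection vector per mode, and each convex sub-problem is replaced by a ridge-regularized least-squares fit, which is simpler than the support vector or minimax probability machine objectives named above. The function names `fit_rank1_classifier`, `predict`, and `_ridge` are hypothetical. Under this assumption a 32 × 32 input needs 32 + 32 weights plus a bias, rather than the 1,024 weights plus a bias of a vector-based linear classifier, which is the parameter reduction the abstract refers to.

```python
import numpy as np


def _ridge(Z, y, reg):
    """Solve a ridge-regularized least-squares problem with an unpenalized bias.

    This stands in for the convex vector-based learner; it is an assumption of
    this sketch, not the objective used in the paper.
    """
    n, d = Z.shape
    A = np.hstack([Z, np.ones((n, 1))])   # append a bias column
    R = reg * np.eye(d + 1)
    R[-1, -1] = 0.0                       # do not regularize the bias
    w = np.linalg.solve(A.T @ A + R, A.T @ y)
    return w[:d], w[d]


def fit_rank1_classifier(X, y, n_iters=20, reg=1e-3, seed=0):
    """Alternating projection for f(X) = u^T X v + b.

    X: array of shape (n, d1, d2); y: labels in {-1, +1}, shape (n,).
    """
    n, d1, d2 = X.shape
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(d1)
    u /= np.linalg.norm(u)
    v, b = np.zeros(d2), 0.0
    for _ in range(n_iters):
        # Fix u: projecting every sample onto u leaves a d2-dimensional
        # vector-based problem in v.
        Zv = np.einsum("i,nij->nj", u, X)
        v, b = _ridge(Zv, y, reg)
        # Fix v: projecting every sample onto v leaves a d1-dimensional
        # vector-based problem in u.
        Zu = np.einsum("nij,j->ni", X, v)
        u, b = _ridge(Zu, y, reg)
    return u, v, b


def predict(X, u, v, b):
    """Return the predicted sign for each matrix in X."""
    return np.sign(np.einsum("i,nij,j->n", u, X, v) + b)


if __name__ == "__main__":
    # Synthetic data whose labels come from a true rank-one weight tensor.
    rng = np.random.default_rng(1)
    u_true, v_true = rng.standard_normal(8), rng.standard_normal(6)
    X = rng.standard_normal((400, 8, 6))
    y = np.sign(np.einsum("i,nij,j->n", u_true, X, v_true))
    u, v, b = fit_rank1_classifier(X, y)
    print("training accuracy:", np.mean(predict(X, u, v, b) == y))
```

Each half-step is an ordinary vector-based problem, so any convex learner could in principle be substituted for `_ridge`; the joint problem in `u` and `v` is not convex, and the result can depend on the initial `u`.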

## Keywords

Convex optimization, Supervised learning, Tensor, Alternating projection
