Multimedia Tools and Applications, Volume 76, Issue 5, pp 7341–7363

Multimedia content analysis on gesture event detection for a SMART TV Keyboard application



We propose an effective machine learning method for analyzing multimedia content, addressing gesture event detection and recognition. Our method builds on well-studied techniques such as Procrustes Analysis, the Combination of Local and Global Representations, and the Linear Shape Model, and we apply it to a SMART TV Virtual Keyboard. In this paper, we address gesture event detection, especially fingertip gesture detection, to enable smarter and more advanced use of technology. Our vision-based keyboard could serve as a next-generation replacement for the SMART TV remote control. It can also be more economical, since it requires no physical device such as a traditional keyboard or remote control, nor the energy sources, such as batteries, that those devices need. More information and demonstrations of the proposed keyboard can be accessed at
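The abstract names Procrustes Analysis as one building block but does not give its formulation. The following is a minimal sketch, not the authors' implementation, of generalized Procrustes analysis (GPA) for 2-D landmark shapes, for example points sampled along a hand or fingertip contour (an assumption for illustration). Each landmark is stored as a complex number x + iy, an illustrative choice that turns a 2-D rotation into multiplication by a unit complex number. The names `normalize`, `rotate_onto` and `generalized_procrustes` are hypothetical.

```python
import math
from typing import List, Tuple

# A 2-D landmark shape: one complex number x + iy per landmark (assumed representation).
Shape = List[complex]


def normalize(shape: Shape) -> Shape:
    """Remove translation and scale: move the centroid to 0 and rescale to unit norm."""
    c = sum(shape) / len(shape)
    centered = [p - c for p in shape]
    norm = math.sqrt(sum(abs(p) ** 2 for p in centered))
    return [p / norm for p in centered]


def rotate_onto(shape: Shape, target: Shape) -> Shape:
    """Orthogonal Procrustes in 2-D: the rotation u (|u| = 1) minimizing sum |u*z - w|^2.

    For centered shapes the optimum is u = <z, w> / |<z, w>|, where
    <z, w> = sum(conj(z_k) * w_k). Multiplying by a unit complex number is a
    proper rotation, so reflections are excluded by construction.
    """
    inner = sum(z.conjugate() * w for z, w in zip(shape, target))
    u = inner / abs(inner) if abs(inner) > 0 else 1 + 0j
    return [u * z for z in shape]


def generalized_procrustes(shapes: List[Shape], max_iter: int = 50,
                           tol: float = 1e-12) -> Tuple[List[Shape], Shape]:
    """Iteratively rotate every normalized shape onto the current mean shape."""
    aligned = [normalize(s) for s in shapes]
    mean = aligned[0]  # the first shape fixes the reference orientation
    for _ in range(max_iter):
        aligned = [rotate_onto(s, mean) for s in aligned]
        # Landmark-wise average of the aligned shapes, renormalized to unit size.
        new_mean = normalize([sum(pts) / len(aligned) for pts in zip(*aligned)])
        change = sum(abs(a - b) ** 2 for a, b in zip(mean, new_mean))
        mean = new_mean
        if change < tol:
            break
    return aligned, mean
```

For example, passing three rotated, scaled and translated copies of one landmark set to `generalized_procrustes` should return three aligned shapes that coincide with the mean up to floating-point error. A Linear Shape Model is typically built on top of such aligned shapes by applying principal component analysis, so that a shape is approximated as the mean plus a weighted sum of a few principal modes; that step is omitted from this sketch.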


Keywords: Gesture event detection; Gesture event recognition; Computer vision; Machine learning for gesture event detection; SMART TV Keyboard



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. MINE Lab, Department of Computer Science and Information Engineering, National Central University (NCU), Jhongli City, Taoyuan County, China
