Guest Editorial: Advanced Understanding and Modelling of Human Motion in Multidimensional Spaces


Motion is intimately tied to our behavior. For example, we communicate through facial expressions and gestures when interacting with each other in our daily lives. The understanding and modeling of human motion has been a subject of interest in the scientific community for more than a century. This long history stems from the wide range of applications of such measurement, found in medicine, biomechanics, sport, ergonomics, and so forth, and its importance keeps growing with the development of computing technologies and applications. More recently, these technologies have also been widely exploited for human-machine interaction, surveillance, medical diagnosis, computer animation, social interaction, and nonverbal communication.

There has been extensive research on human motion, especially facial behavior and human body actions, in recent years. Great efforts have been made by researchers from various areas, including computer science and physiology. Sophisticated methods have been proposed for human motion understanding based on low-level machine-understandable features. Approaches to human motion understanding typically rely on motion primitives derived from the statistical evaluation of motion clusters, dynamic models, or discriminative methods. Moreover, the volume of multimedia data has been growing rapidly in recent years. Data related to human motion spans several sensory modalities, such as visual (2D, 3D, and RGB-D), auditory, electroencephalography (EEG), and touch data. Such multimodal data provides rich channels for advanced human motion understanding.

Human beings are able to respond effectively to emotional states in social interaction by perceiving subtle human motions, yet it remains challenging for machines to describe and represent human motions effectively and efficiently. We dedicate this special issue to these challenges as a venue to advance and disseminate the most recent research on this theme. The aim of this special issue is to survey state-of-the-art methodologies, algorithms, and concepts in advanced human motion understanding across all kinds of data types. It intends to bridge the gap between low-level features and high-level semantics of human motions.

We received 33 manuscripts, each blindly reviewed by at least three reviewers drawn from the guest editors and anonymous reviewers. After the review process, 21 manuscripts were selected for inclusion in this special issue. The accepted papers represent a wide spectrum of research under the theme of the special issue, ranging from human action and facial expression to gaze estimation. In Section 2, we present a brief summary of the selected papers.

Summary of accepted papers

This special issue has accepted 21 carefully selected papers. This section provides a brief summary of them.

Falls at home are among the major risks for the elderly, and immediate alerting and assistance are essential to reduce morbidity and mortality. Human fall detection is therefore necessary and useful in smart home and health care systems. The paper “Human Fall Detection in Surveillance Video based on PCANet” by Shengke Wang et al. [15] highlights the importance of human motion understanding in the healthcare environment. It proposes a new framework for fall detection based on automatic feature learning. The authors train on the samples with PCANet, a deep learning approach, to predict the label of each frame, which is further used to train an action model with an SVM. They report that the proposed method achieves reliable results compared with state-of-the-art methods.
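The two-stage shape of this pipeline (unsupervised feature learning per frame, then a supervised action classifier) can be sketched as follows. This is a minimal illustration, not the paper's implementation: plain PCA stands in for PCANet's cascaded PCA filter bank, and the data is random.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Toy stand-ins for video frames: each row is a flattened frame patch.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 64))        # 200 frames, 64-dim raw pixels
labels = rng.integers(0, 2, size=200)      # 1 = "fall" frame, 0 = "normal"

# Stage 1: unsupervised filter learning. PCANet learns its convolution
# filters via PCA on image patches; plain PCA is used here as a stand-in.
pca = PCA(n_components=16).fit(frames)
features = pca.transform(frames)

# Stage 2: an SVM predicts a per-frame label; in the paper these
# per-frame predictions then feed an action-level model.
clf = SVC(kernel="linear").fit(features, labels)
frame_preds = clf.predict(features)
print(frame_preds.shape)  # one fall/normal prediction per frame
```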

Recognition in a video stream of air-drawn gestures and characters is an important technology for realizing verbal and nonverbal communication in human-computer interaction. The paper “Time-Segmentation and Position-Free Recognition of Air-Drawn Gestures and Characters in Videos” is authored by Yuki Niitsuma et al. [13]. The paper presents an algorithm called time-space continuous dynamic programming (TSCDP) to realize both time- and location-free (spotting) recognition in video streams of isolated alphabetic characters and connected cursive textual characters. The authors test the TSCDP algorithm in recognizing 26 isolated alphabetic characters and 23 Japanese hiragana and kanji air-drawn characters. Moreover, the authors report that the TSCDP algorithm also performs well in gesture recognition.

There are seven papers related to human actions or activities in this issue. The paper “Exploiting Stereoscopic Disparity for Augmenting Human Activity Recognition Performance” by Ioannis Mademlis et al. [11] investigates several ways to exploit scene depth information, implicitly available through stereoscopic disparity in 3D videos, to improve performance in recognizing complex human activities in natural settings. The investigated approaches are flexible and can cooperate with any monocular low-level feature descriptor. Qi Jia et al. [5] propose the temporal characteristic number (TCN) for individual joint points of a human body in temporal series. Yidong Li et al. [9] develop a learning-based comprehensive evaluation model for traffic data quality (TDQ) in intelligent transportation systems (ITS). Andrés Adolfo Navarro-Newball et al. [12] present a novel approach for recreating life-like experiences through easy and natural gesture-based interaction. In [14], Xin Wang et al. present a method for the efficient retrieval and browsing of immense amounts of realistic 3D human motion capture data. Kang Wang et al. [18] propose a base-points-driven shape correspondence (BSC) approach to extract skeletons of articulated objects from 3D mesh shapes.

The eyes and their movements, among the most significant facial features, play an important role in reflecting a person’s mental and emotional state. The paper titled “Comparison of Random Forest, Random Ferns and Support Vector Machine for Eye State Classification” by Yanchao Dong et al. [3] presents an eye state estimation framework with various feature sets using random forest/ferns, chosen for their superior time efficiency. The comparison of classifiers indicates that random forest/ferns outperform the SVM in terms of time consumption. The authors incorporate the HOG feature into the classifier to reduce the influence of noise. The correct recognition rate reported in the paper is above 93%.
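The kind of comparison made in that paper can be illustrated with a small timing harness. This sketch uses synthetic HOG-like features and scikit-learn's stock classifiers, not the paper's data or tuned models, so the numbers are only illustrative.

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for HOG descriptors of eye patches (open vs. closed).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 36))           # 36-dim HOG-like descriptors
y = (X[:, 0] > 0).astype(int)            # 1 = open, 0 = closed (toy rule)

# Train each classifier and record training time and training accuracy.
for name, clf in [("random forest", RandomForestClassifier(n_estimators=50, random_state=0)),
                  ("SVM", SVC())]:
    t0 = time.perf_counter()
    clf.fit(X, y)
    acc = clf.score(X, y)
    print(f"{name}: train {time.perf_counter() - t0:.3f}s, accuracy {acc:.2f}")
```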

Remote gaze estimation under natural light with free head movements is still a challenging problem. The paper “Remote Gaze Estimation Based on 3D Face Structure and Iris Centers Under Natural Light” by Chunshui Xiong et al. [20] proposes a novel feature-based gaze estimation method that does not use corneal reflections. The authors utilize 3D active shape models to represent the 3D face structure and present a 3D Iris-Eye-Contours descriptor to represent human gaze information. They improve tolerance to head movements using rectified 3D iris centers and eye contours obtained from the head poses. The proposed gaze estimation system is tested on several subjects, and the results demonstrate that it achieves low estimation error while allowing natural head movements under natural light.

Face tracking often encounters drifting problems, especially when a significant face appearance variation occurs. The paper “A Fusion Method for Robust Face Tracking” by Xiaodong Jiang et al. [6] proposes a novel and efficient fusion strategy for robust face tracking. A supervised descent method (SDM) and a compressive tracking method (CT) are employed at the same time. The novel online fusion method remarkably alleviates the drifting problem in robust face tracking. Another paper related to face features is “Positioning corners of human mouth based on local gradient operator” by Yulin Wang et al. [16]. This paper presents a method to accurately detect and locate facial feature points based on a local gradient operator.

The paper titled “Ensemble based Extreme Learning Machine for Cross-modality Face Matching” by Yi Jin et al. [7] proposes a new ensemble-based extreme learning machine (ELM) approach for cross-modality face matching, which integrates the voting ELM with a discriminant feature descriptor. ELM is one of the most efficient machine-learning algorithms for pattern classification owing to its fast learning speed. The authors demonstrate the effectiveness of the proposed approach in two heterogeneous face recognition scenarios.
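The core of the ELM and the voting ensemble can be sketched in a few lines: each ELM fixes random input weights, and only the output weights are solved in closed form by least squares. This is a minimal illustration on synthetic data, not the authors' cross-modality system.

```python
import numpy as np

def elm_train(X, y_onehot, n_hidden, rng):
    """Single ELM: random input weights, output weights via least squares."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                         # random hidden activations
    beta, *_ = np.linalg.lstsq(H, y_onehot, rcond=None)
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))
y = (X[:, :2].sum(axis=1) > 0).astype(int)         # toy binary labels
Y = np.eye(2)[y]                                   # one-hot targets

# Voting ensemble: several ELMs with independent random hidden layers.
models = [elm_train(X, Y, n_hidden=64, rng=rng) for _ in range(7)]
votes = np.stack([elm_predict(X, m) for m in models])
pred = (votes.mean(axis=0) > 0.5).astype(int)      # majority vote
print((pred == y).mean())                          # training accuracy
```

The fast learning speed mentioned above comes from the fact that training is a single least-squares solve, with no iterative back-propagation.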

View-invariant human action recognition is a challenging research topic in computer vision. The paper “Multi-view Transition HMMs based View-invariant Human Action Recognition Method” by Xiaofei Ji et al. [4] presents a novel graphical structure based on hidden Markov models with multi-view transitions to model human actions under viewpoint changes. The proposed model not only simplifies the training process by decomposing the parameter space into multiple subspaces but also improves performance by constraining the possible viewpoint changes.

The paper “Cultural-based Visual Expression: Emotional Analysis of Human Face via Peking Opera Painted Faces (POPF)” by Ding Wang et al. [17] takes the study of Peking Opera Painted Faces (POPF) as an example of how information and meaning can be effectively expressed through changes of facial expression, based on facial motion in its natural and emotional aspects. The paper titled “Automatic Evaluation of The Degree of Facial Nerve Paralysis” by Ting Wang et al. [19] presents a novel method for evaluating the degree of facial paralysis that considers both static facial asymmetry and dynamic transformation factors. A quantitative measure of static facial asymmetry based on local mirror asymmetry is proposed.

Hand postures provide an attractive means of interfacing for human-computer interaction. In the paper “Hand posture recognition based on heterogeneous features fusion of multiple kernels learning” by Jiangtao Cao et al. [2], a novel hand posture recognition method is proposed by integrating multiple image features with a multiple-kernel-learning support vector machine (SVM). The paper “A Novel Approach to Extract Hand Gesture Feature in Depth Images” by Zhaojie Ju et al. [8] proposes a novel approach to extract human hand gesture features in real time from RGB-D images based on the earth mover’s distance and Lasso algorithms.

The paper “Small Scale Crowd Behavior Classification by Euclidean Distance Variation-Weighted Network” by Xuguang Zhang et al. [21] reviews microscopic and macroscopic methods in crowd behavior analysis. By exploring the connection between the microscopic and macroscopic properties of a crowd, the paper proposes a method that uses a Euclidean distance variation-weighted network to recognize crowd behavior.

The paper “Video Parsing via Spatiotemporally Analysis with Images” by Xuelong Li et al. [10] proposes to transfer, or propagate, category labels from images to videos. The proposed approach consists of three main stages: (I) the posterior category probability density function (PDF) is learned by an algorithm that combines frame relevance with label propagation from images; (II) the prior contextual-constraint PDF on the map of pixel categories through the whole video is learned by Markov random fields (MRF); and (III) based on both learned PDFs, the final parsing results are obtained by a maximum a posteriori (MAP) process, computed via a very efficient graph-cut-based integer optimization algorithm.
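The way the two learned PDFs combine in stage (III) can be sketched as follows. This simplified illustration drops the pairwise smoothness term that the graph cut actually optimizes and takes a per-pixel argmax of the posterior (likelihood times prior, summed in log space) on toy scores.

```python
import numpy as np

# Toy per-pixel category scores for a 4x4 "frame" with 3 categories.
rng = np.random.default_rng(4)
log_likelihood = rng.normal(size=(4, 4, 3))  # stage I: image-transferred PDF
log_prior = rng.normal(size=(4, 4, 3))       # stage II: MRF contextual PDF

# Stage III (simplified): MAP without the pairwise graph-cut term, i.e.
# per-pixel argmax of posterior = likelihood x prior (a sum of logs).
labels = np.argmax(log_likelihood + log_prior, axis=-1)
print(labels.shape)  # a category label per pixel
```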

Finally, the paper “Age Estimation Based on Improved Discriminative Gaussian Process Latent Variable Model” by Lijun Cai et al. [1] proposes a novel age estimation method based on an improved discriminative Gaussian process latent variable model (DGPLVM) to discover the underlying trend of aging patterns. Treating age estimation as a complex and nonlinear problem, the authors employ the improved DGPLVM to obtain low-dimensional representations and use a Gaussian process regression model to find the age regressor that maps the low-dimensional representations to ages. They conduct experiments on two widely used databases, FG-NET and MORPH, and the results show that the proposed method is effective and comparable to state-of-the-art methods.
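The embed-then-regress structure of that method can be sketched with scikit-learn on synthetic data. Plain PCA stands in here for the discriminative DGPLVM embedding, and the features and ages are fabricated, so this only shows the shape of the pipeline, not the paper's results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic stand-in for face features with an age-dependent component.
rng = np.random.default_rng(3)
ages = rng.uniform(5, 70, size=150)
faces = np.outer(ages, rng.normal(size=50)) / 70 + rng.normal(size=(150, 50)) * 0.1

# DGPLVM learns a discriminative low-dimensional latent space; plain PCA
# is used here as a simple stand-in for that embedding step.
latent = PCA(n_components=5).fit_transform(faces)

# GP regression maps the low-dimensional representation to an age.
gpr = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(latent, ages)
mae = np.abs(gpr.predict(latent) - ages).mean()
print(f"training MAE: {mae:.2f} years")
```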


The papers included in this special issue are representative of the current research challenges in advanced understanding and modeling of human motion. We expect that they will provide researchers with valuable resources and motivation to work on the challenging issues in this research theme.

We would like to thank the Editor-in-Chief of the journal, Professor Borko Furht, for his great support of this issue. Our special thanks go to the editorial staff, especially Mr. Jay-y Banua and Ms. Courtney Clark, for their valuable and prompt support throughout the preparation and publication of this special issue. We thank all authors for their contributions, and we also extend our thanks to all reviewers for their hard work in ensuring the high quality of the accepted papers.


1. Cai L, Huang L, Liu C (2015) Age estimation based on improved discriminative Gaussian process latent variable model. Multimed Tools Appl. doi:10.1007/s11042-015-2668-4

2. Cao J, Yu S, Liu H, Li P (2015) Hand posture recognition based on heterogeneous features fusion of multiple kernels learning. Multimed Tools Appl. doi:10.1007/s11042-015-2628-z

3. Dong Y, Zhang Y, Yue J, Hu Z (2015) Comparison of random forest, random ferns and support vector machine for eye state classification. Multimed Tools Appl. doi:10.1007/s11042-015-2635-0

4. Ji X, Ju Z, Wang C, Wang C (2015) Multi-view transition HMMs based view-invariant human action recognition method. Multimed Tools Appl. doi:10.1007/s11042-015-2661-y

5. Jia Q, Fan X, Luo Z, Li H, Huyan K, Li Z (2015) Cross-view action matching using a novel projective invariant on non-coplanar space-time points. Multimed Tools Appl. doi:10.1007/s11042-015-2704-4

6. Jiang X, Yu H, Lu Y, Liu H (2015) A fusion method for robust face tracking. Multimed Tools Appl. doi:10.1007/s11042-015-2659-5

7. Jin Y, Cao J, Wang Y, Zhi R (2015) Ensemble based extreme learning machine for cross-modality face matching. Multimed Tools Appl. doi:10.1007/s11042-015-2650-1

8. Ju Z, Gao D, Cao J, Liu H (2015) A novel approach to extract hand gesture feature in depth images. Multimed Tools Appl. doi:10.1007/s11042-015-2609-2

9. Li Y, Chen D (2015) A learning-based comprehensive evaluation model for traffic data quality in intelligent transportation systems. Multimed Tools Appl. doi:10.1007/s11042-015-2676-4

10. Li X, Mou L, Lu X (2015) Video parsing via spatiotemporally analysis with images. Multimed Tools Appl. doi:10.1007/s11042-015-2735-x

11. Mademlis I, Iosifidis A, Tefas A, Nikolaidis N, Pitas I (2015) Exploiting stereoscopic disparity for augmenting human activity recognition performance. Multimed Tools Appl. doi:10.1007/s11042-015-2719-x

12. Navarro-Newball AA, Moreno I, Prakash E, Arya A, Contreras VE, Quiceno VA, Lozano S, Mejìa JD, Loaiza DF (2015) Gesture based human motion and game principles to aid understanding of science and cultural practices. Multimed Tools Appl. doi:10.1007/s11042-015-2667-5

13. Niitsuma Y, Torii S, Yaguchi Y, Oka R (2015) Time-segmentation and position-free recognition of air-drawn gestures and characters in videos. Multimed Tools Appl. doi:10.1007/s11042-015-2669-3

14. Wang X, Chen L, Jing J, Zheng H (2015) Human motion capture data retrieval based on semantic thumbnail. Multimed Tools Appl. doi:10.1007/s11042-015-2705-3

15. Wang S, Chen L, Zhou Z, Dong J (2015) Human fall detection in surveillance video based on PCANet. Multimed Tools Appl. doi:10.1007/s11042-015-2698-y

16. Wang Y, Ding W, Chen Y (2015) Positioning corners of human mouth based on local gradient operator. Multimed Tools Appl. doi:10.1007/s11042-015-2627-0

17. Wang D, Kang J, Qin S-F, Birringer J (2015) Cultural-based visual expression: emotional analysis of human face via Peking Opera Painted Faces (POPF). Multimed Tools Appl. doi:10.1007/s11042-015-2665-7

18. Wang K, Razzaq A, Wu Z, Tian F, Ali S, Jia T, Wang X, Zhou M (2015) Novel correspondence-based approach for consistent human skeleton extraction. Multimed Tools Appl. doi:10.1007/s11042-015-2629-y

19. Wang T, Zhang S, Dong J, Liu L, Yu H (2015) Automatic evaluation of the degree of facial nerve paralysis. Multimed Tools Appl. doi:10.1007/s11042-015-2696-0

20. Xiong C, Huang L, Liu C (2015) Remote gaze estimation based on 3D face structure and iris centers under natural light. Multimed Tools Appl. doi:10.1007/s11042-015-2600-y

21. Zhang X, Ouyang M, Zhang X (2015) Small scale crowd behavior classification by Euclidean distance variation-weighted network. Multimed Tools Appl. doi:10.1007/s11042-015-2670-x


Author information



Corresponding author

Correspondence to Hui Yu.


Cite this article

Yu, H., Dong, J., Pham, T.D. et al. Guest Editorial: Advanced Understanding and Modelling of Human Motion in Multidimensional Spaces. Multimed Tools Appl 75, 11595–11602 (2016).