Multimedia Tools and Applications, Volume 76, Issue 8, pp 10721–10739

Healthy human sitting posture estimation in RGB-D scenes using object context

  • Baolong Liu
  • Yi Li
  • Sanyuan Zhang
  • Xiuzi Ye


Unhealthy sitting posture leads to cervical spondylosis and other related cumulative trauma disorders (CTDs). Unfortunately, research on healthy sitting posture is rare. This work estimates healthy sitting posture from a computer workstation ergonomics perspective. A novel framework for healthy human sitting posture estimation in RGB-D scenes was developed, in which a human posture is represented by 15 skeletal joints. A healthy human sitting posture configuration was defined from an ergonomics viewpoint, and a Naïve Bayes classifier was used to learn the health-constrained spatial and contextual relationships between objects and the human skeletal joints in the RGB-D scene. At the estimation stage, the object spatial features (e.g., coordinates, distance, height and angle) in the RGB-D scene were obtained through scene labeling. Simultaneously, 15 human skeletal joints were extracted from Kinect as primary inputs, and algorithms were then developed to generate and classify the candidate healthy skeleton joints. Skeleton refinement then produced the skeleton joint distribution of a healthy sitting posture. The framework was tested on a dataset of RGB-D scenes collected from 3 subjects (3 types of sitting postures, each in 3 different offices). The experimental results indicate that the framework is feasible and reliable.
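The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature set (distance, height difference, elevation angle), the y-up axis convention, the Gaussian likelihood, the `spatial_features` helper, and every coordinate below are assumptions made purely for illustration, since the abstract does not specify them.

```python
import math
from collections import defaultdict


def spatial_features(joint, obj):
    """Return [distance, height difference, elevation angle in degrees]
    of a joint relative to an object. Points are (x, y, z) in metres,
    with y pointing up (an assumed convention)."""
    dx, dy, dz = (j - o for j, o in zip(joint, obj))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    angle = math.degrees(math.atan2(dy, math.hypot(dx, dz)))
    return [distance, dy, angle]


class GaussianNaiveBayes:
    """Naive Bayes with an independent Gaussian per feature and class."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.stats = {}
        for label, rows in grouped.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # A small variance floor keeps the density finite.
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                for c, m in zip(cols, means)
            ]
            log_prior = math.log(len(rows) / len(X))
            self.stats[label] = (log_prior, means, variances)
        return self

    def log_posterior(self, x):
        """Unnormalised log posterior for each class."""
        scores = {}
        for label, (log_prior, means, variances) in self.stats.items():
            log_likelihood = sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances)
            )
            scores[label] = log_prior + log_likelihood
        return scores

    def predict(self, x):
        scores = self.log_posterior(x)
        return max(scores, key=scores.get)


# Hypothetical toy data: a monitor position and a few head-joint positions.
monitor = (0.0, 1.2, 0.0)
healthy_heads = [(0.0, 1.25, 0.60), (0.02, 1.22, 0.65), (-0.03, 1.27, 0.58)]
unhealthy_heads = [(0.0, 1.05, 0.30), (0.05, 1.00, 0.35), (-0.02, 1.08, 0.28)]

X = [spatial_features(h, monitor) for h in healthy_heads + unhealthy_heads]
y = ["healthy"] * len(healthy_heads) + ["unhealthy"] * len(unhealthy_heads)
model = GaussianNaiveBayes().fit(X, y)

# Classify a candidate head joint generated near the observed skeleton.
candidate = (0.0, 1.24, 0.62)
print(model.predict(spatial_features(candidate, monitor)))
```

In this sketch each candidate joint is scored independently against one object; a fuller treatment would combine evidence from several labeled objects and all 15 joints before any refinement step.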


Sitting posture estimation; Health; RGB-D scene; Ergonomics



This work was funded by the China Natural Science Foundation (No. 61272304) and the Zhejiang Provincial Natural Science Foundation of China (No. LY14F020027, No. LQ16F020007). Many thanks to all reviewers; we appreciate their valuable suggestions and questions.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou, China
  2. College of Mathematics & Information Science, Wenzhou University, Wenzhou, China
