Abstract
Automatic human action recognition has been widely researched in the computer vision and image processing communities in recent years. Here we propose a real-time, embedded vision solution for human action recognition, implemented on an FPGA-based ubiquitous device. This paper makes three main contributions. First, we have developed a fast human action recognition system that uses simple motion features and a linear support vector machine classifier. The method has been tested on a large, public human action dataset and achieves competitive performance for the temporal template class of approaches, which includes “Motion History Image” based techniques. Second, we have developed a reconfigurable, FPGA-based video processing architecture. One advantage of this architecture is that the system's processing performance can be reconfigured for a particular application by adding new or replicated processing cores. Finally, we have successfully implemented a human action recognition system on this reconfigurable architecture. With a small number of human actions (hand gestures), this stand-alone system operates reliably at 12 frames/s, with an 80% average recognition rate using limited training data. This type of system has applications in security systems, man–machine communications and intelligent environments.
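As a rough illustration of the temporal template idea mentioned above, the following sketch updates a Motion History Image from frame differences and scores the result with a linear decision function. It is not the system described in this paper: the update rule is the classic one from Bobick and Davis, and the function names, the difference threshold, the duration `tau`, and the weights are all hypothetical values chosen for the example.

```python
def update_mhi(mhi, prev_frame, frame, tau=10, threshold=25):
    """One Motion History Image step (classic Bobick-Davis rule).

    Pixels whose intensity changed by more than `threshold` are set to
    `tau`; all other pixels decay by 1, floored at 0. `tau` and
    `threshold` are illustrative assumptions, not values from the paper.
    """
    out = []
    for mhi_row, prev_row, cur_row in zip(mhi, prev_frame, frame):
        out_row = []
        for h, p, c in zip(mhi_row, prev_row, cur_row):
            moving = abs(c - p) > threshold
            out_row.append(tau if moving else max(0, h - 1))
        out.append(out_row)
    return out


def flatten(mhi, tau):
    """Turn the MHI into a feature vector scaled to the range [0, 1]."""
    return [h / tau for row in mhi for h in row]


def linear_svm_score(weights, bias, features):
    """Linear decision function w.x + b; the sign gives the class."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Tiny 2x2 greyscale "video": one pixel changes once, then stays still.
f0 = [[0, 0], [0, 0]]
f1 = [[100, 0], [0, 0]]
f2 = [[100, 0], [0, 0]]

mhi = [[0, 0], [0, 0]]
mhi = update_mhi(mhi, f0, f1, tau=3)   # motion detected at pixel (0, 0)
mhi = update_mhi(mhi, f1, f2, tau=3)   # no motion, history decays by 1

features = flatten(mhi, tau=3)
# Hypothetical weights; a real classifier would learn them from training data.
score = linear_svm_score([1.0, 0.0, 0.0, 0.0], -0.5, features)
label = 1 if score > 0 else -1
```

Both steps use only per-pixel comparisons and a single dot product, which is one reason this family of methods lends itself to real-time hardware implementations.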
Acknowledgments
The authors would like to thank DTI and Broadcom Ltd. for the financial support for this research.
Cite this article
Meng, H., Freeman, M., Pears, N. et al. Real-time human action recognition on an embedded, reconfigurable video processing architecture. J Real-Time Image Proc 3, 163–176 (2008). https://doi.org/10.1007/s11554-008-0073-1
DOI: https://doi.org/10.1007/s11554-008-0073-1