Developmental Learning for User Activities

  • Xiao Huang
  • Juyang Weng
  • Zhengyou Zhang


This chapter presents a brain-inspired developmental learning system. A personal computer lives with its human user as long as the power is on. It can therefore act like a shadow machine: a virtual machine that runs in the background, developing and reporting some of the user's activities while the user goes about regular activities, on or off the computer. The goal of the teacher of this shadow machine is to enable it to observe the user's status, recognize the user's activities, and produce the taught actions as the desired reports. The shadow machine uses both visual and acoustic contexts to infer the user's activities (e.g., in an office). A major challenge is that the system must be applicable to open domains, that is, without a handcrafted environmental model: there are no handcrafted constraints on office lighting, size, or setting, and the user is not required to wear a head-mounted close-talk microphone. Rather, a room microphone sits somewhere near the computer, so the distance between the sound sources and the microphone varies significantly. The system is designed to respond to its sensory inputs. A more challenging issue is to make the system adapt to different users and different environments. Instead of building all the world knowledge in advance (which is intractable), the system's adaptive capability enables it to learn sensorimotor associations (which is tractable). The real-time prototype system has been tested in different office environments.


Keywords: Hidden Markov Model, Recognition Rate, Acoustic Signal, Sound Source, Activity Recognition



  • Bayesian belief network
  • fast Fourier transformation
  • frames per second
  • Gram–Schmidt orthogonalization
  • hidden Markov model
  • inverse discrete Fourier transform
  • incremental hierarchical discriminant regression
  • linear discriminant analysis
  • layered hidden Markov model
  • Markov decision process
  • mel-frequency cepstral coefficient
  • observation-driven Markov decision process
  • partially observable MDP
  • logistic regression



Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. Bellevue, USA
  2. Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
  3. Microsoft Research, Microsoft, Redmond, USA
