Discrete Event Dynamic Systems

Volume 19, Issue 3, pp 377–422

Partially Observable Markov Decision Process Approximations for Adaptive Sensing

  • Edwin K. P. Chong
  • Christopher M. Kreucher
  • Alfred O. Hero III

Abstract

Adaptive sensing involves actively managing sensor resources to achieve a sensing task, such as object detection, classification, and tracking, and represents a promising direction for new applications of discrete event system methods. We describe an approach to adaptive sensing based on approximately solving a partially observable Markov decision process (POMDP) formulation of the problem. Such approximations are necessary because of the very large state space involved in practical adaptive sensing problems, precluding exact computation of optimal solutions. We review the theory of POMDPs and show how the theory applies to adaptive sensing problems. We then describe a variety of approximation methods, with examples to illustrate their application in adaptive sensing. The examples also demonstrate the gains that are possible from nonmyopic methods relative to myopic methods, and highlight some insights into the dependence of such gains on the sensing resources and environment.
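
The abstract refers to belief-state (POMDP) formulations of adaptive sensing and to the gains that nonmyopic (lookahead) policies can offer over myopic ones. Purely as illustration, the following is a minimal Python sketch, not drawn from the paper itself, of a toy adaptive-sensing POMDP: a single target occupies one of N cells, each action senses one cell, the belief is updated by Bayes' rule, and a greedy entropy-minimizing (myopic) policy is contrasted with a two-step lookahead. All names and parameters (N, P_D, P_FA, the entropy objective) are illustrative assumptions, not the formulation used by the authors.

    # Minimal sketch (illustrative assumptions only): discrete POMDP for
    # adaptive sensing of a single target hidden in one of N cells.
    import numpy as np

    N = 8            # number of cells (assumed)
    P_D = 0.9        # detection probability when sensing the correct cell (assumed)
    P_FA = 0.1       # false-alarm probability otherwise (assumed)
    rng = np.random.default_rng(0)

    def likelihood(action, z):
        """Probability of observation z (0/1) in each cell, given we sense `action`."""
        return np.where(np.arange(N) == action,
                        P_D if z else 1 - P_D,
                        P_FA if z else 1 - P_FA)

    def update_belief(belief, action, z):
        """Bayes update of the belief after observing z in cell `action`."""
        post = likelihood(action, z) * belief
        return post / post.sum()

    def expected_entropy(belief, action):
        """Expected posterior entropy after sensing cell `action` (myopic cost)."""
        cost = 0.0
        for z in (0, 1):
            p_z = likelihood(action, z) @ belief   # predictive probability of z
            post = update_belief(belief, action, z)
            cost += p_z * -(post * np.log(post + 1e-12)).sum()
        return cost

    def myopic_action(belief):
        # Greedy: minimize expected entropy one step ahead.
        return min(range(N), key=lambda a: expected_entropy(belief, a))

    def lookahead_action(belief):
        # Nonmyopic: also account for the value of the observation after next.
        def two_step_cost(a):
            cost = 0.0
            for z in (0, 1):
                p_z = likelihood(a, z) @ belief
                post = update_belief(belief, a, z)
                cost += p_z * min(expected_entropy(post, a2) for a2 in range(N))
            return cost
        return min(range(N), key=two_step_cost)

    # Usage: run the nonmyopic policy for a few steps against a random true cell.
    belief = np.full(N, 1.0 / N)
    true_cell = rng.integers(N)
    for t in range(5):
        a = lookahead_action(belief)
        z = int(rng.random() < (P_D if a == true_cell else P_FA))
        belief = update_belief(belief, a, z)
        print(t, a, np.round(belief, 2))

In this toy setting the two-step policy differs from the greedy one only when the immediate information gains of several actions are nearly tied; richer models (moving targets, heterogeneous sensors, resource constraints) are where the nonmyopic gains discussed in the paper become pronounced.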

Keywords

Markov decision process · POMDP · Sensing · Tracking · Scheduling

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Edwin K. P. Chong (1)
  • Christopher M. Kreucher (2)
  • Alfred O. Hero III (3)

  1. Colorado State University, Fort Collins, USA
  2. Integrity Applications Incorporated, Ann Arbor, USA
  3. University of Michigan, Ann Arbor, USA