Approximate Dynamic Programming Applied to UAV Perimeter Patrol

  • K. Krishnamoorthy
  • M. Park
  • S. Darbha
  • M. Pachter
  • P. Chandler
Part of the Lecture Notes in Control and Information Sciences book series (LNCIS, volume 444)

Abstract

The curse of dimensionality arises when dynamic programming is applied to determine optimal policies for large-scale controlled Markov chains. In this chapter, we consider a base perimeter patrol stochastic control problem. Determining the optimal control policy requires solving a Markov decision problem whose large size renders exact dynamic programming methods intractable. We therefore propose a state-aggregation-based approximate linear programming method that constructs provably good sub-optimal policies instead. The state space is partitioned, and the optimal cost-to-go, or value function, is approximated by a constant over each partition. By minimizing a non-negative cost function defined on the partitions, one can construct an approximate value function that is also an upper bound on the optimal value function of the original Markov chain. As a general result, we show that this approximate value function is independent of the non-negative cost function (the state-dependent weights, as it is referred to in the literature) and, moreover, that it is the least upper bound obtainable given the partitions. Furthermore, we show that the restricted system of linear inequalities also embeds a family of lower-dimensional Markov chains, one of which can be used to construct a tight lower bound on the optimal value function. In general, constructing the lower bound requires solving a combinatorial problem, but the perimeter patrol problem exhibits a special structure that enables tractable linear programming formulations for both the upper and lower bounds. We demonstrate this and provide numerical results that corroborate the efficacy of the proposed methodology.
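The abstract's core construction can be illustrated on a toy problem: partition the state space, replace the value function by one variable per partition, and solve a restricted LP whose feasible points dominate the Bellman update and hence upper-bound the optimal cost-to-go. The sketch below is only a minimal illustration under invented assumptions: a 6-state ring MDP with state-dependent stage costs and a hand-picked partition, not the chapter's patrol model.

```python
# Illustrative sketch only: a toy 6-state ring MDP (not the chapter's patrol
# model); the states, costs, and partition below are invented for demonstration.
import numpy as np
from scipy.optimize import linprog

n, gamma = 6, 0.9                # states on a ring, discount factor
actions = (+1, -1)               # move clockwise / counter-clockwise
part = [0, 0, 1, 1, 2, 2]        # partition (aggregate) containing each state
K = max(part) + 1                # number of partitions

def cost(s, a):
    return float(s)              # stage cost, state-dependent (arbitrary choice)

# Exact optimal cost-to-go V* by value iteration: V(s) = min_a [c(s,a) + g*V(s')].
V = np.zeros(n)
for _ in range(1000):
    V = np.array([min(cost(s, a) + gamma * V[(s + a) % n] for a in actions)
                  for s in range(n)])

# Restricted LP: one variable w_k per partition. The constraints force the
# piecewise-constant function s -> w[part[s]] to dominate its Bellman update,
#   w[part[s]] >= cost(s, a) + gamma * w[part[s']]   for all (s, a),
# so every feasible w is an upper bound on V*, as the abstract states.
A_ub, b_ub = [], []
for s in range(n):
    for a in actions:
        row = np.zeros(K)
        row[part[s]] -= 1.0
        row[part[(s + a) % n]] += gamma
        A_ub.append(row)
        b_ub.append(-cost(s, a))

def solve(weights):
    res = linprog(weights, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * K)
    return res.x

w_uniform = solve(np.ones(K))                  # uniform state weights
w_skewed = solve(np.array([1.0, 5.0, 0.1]))    # a different positive weighting
```

Both solves should return (numerically) the same vector, illustrating the invariance result: the feasible set has a componentwise-least element, which minimizes any positive weighting, so the approximate value function does not depend on the choice of state-dependent weights. That least element also satisfies `w[part[s]] >= V[s]` for every state, i.e. it is the tightest piecewise-constant upper bound the partition admits.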

Keywords

Optimal Policy · Unmanned Aerial Vehicle · Stochastic Dynamic Program · Approximate Dynamic Program · Service Delay
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • K. Krishnamoorthy (1)
  • M. Park (2)
  • S. Darbha (2)
  • M. Pachter (3)
  • P. Chandler (4)
  1. Infoscitex Corporation, Dayton, USA
  2. Texas A & M University, College Station, USA
  3. Air Force Institute of Technology, Wright-Patterson A.F.B., Dayton, USA
  4. Air Force Research Laboratory, Wright-Patterson A.F.B., Dayton, USA
