Approximate Dynamic Programming Applied to UAV Perimeter Patrol
Abstract
One encounters the curse of dimensionality when applying dynamic programming to determine optimal policies for large-scale controlled Markov chains. In this chapter, we consider a base perimeter patrol stochastic control problem. Determining the optimal control policy requires solving a Markov decision problem whose large size renders exact dynamic programming methods intractable. We therefore propose a state-aggregation-based approximate linear programming method to construct provably good sub-optimal policies instead. The state space is partitioned, and the optimal cost-to-go or value function is approximated by a constant over each partition. By minimizing a non-negative cost function defined on the partitions, one can construct an approximate value function which is also an upper bound for the optimal value function of the original Markov chain. As a general result, we show that this approximate value function is independent of the non-negative cost function (or state-dependent weights, as it is referred to in the literature) and, moreover, that it is the least upper bound one can obtain given the partitions. Furthermore, we show that the restricted system of linear inequalities also embeds a family of Markov chains of lower dimension, one of which can be used to construct a tight lower bound on the optimal value function. In general, constructing the lower bound requires the solution of a combinatorial problem, but the perimeter patrol problem exhibits a special structure that enables tractable linear programming formulations for both the upper and lower bounds. We demonstrate this and provide numerical results that corroborate the efficacy of the proposed methodology.
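The construction described above can be illustrated on a small example. The sketch below uses a hypothetical toy MDP (random rewards and transitions, not the chapter's patrol model) with discounted reward maximization. It exploits the weight-invariance result stated in the abstract: the least piecewise-constant upper bound on the optimal value function, for a given partition, is the unique fixed point of a monotone contraction that applies the Bellman backup and takes the maximum over each partition cell. All names and the partition choice here are illustrative assumptions.

```python
import random

random.seed(0)
S, A, GAMMA = 6, 2, 0.9  # toy sizes; illustrative only

# Random rewards r[s][a] in [0, 1] and row-stochastic transitions P[s][a].
r = [[random.random() for _ in range(A)] for _ in range(S)]
P = []
for s in range(S):
    P.append([])
    for a in range(A):
        row = [random.random() for _ in range(S)]
        tot = sum(row)
        P[s].append([p / tot for p in row])

def exact_value(iters=2000):
    """Value iteration for the optimal value function V* of the full chain."""
    V = [0.0] * S
    for _ in range(iters):
        V = [max(r[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(S))
                 for a in range(A))
             for s in range(S)]
    return V

def aggregated_upper_bound(part, iters=2000):
    """Least piecewise-constant upper bound on V* for partition map `part`.

    Iterates (Tv)_k = max_{s in cell k, a} [ r(s,a) + gamma * sum_t P(t|s,a) v_{part(t)} ],
    a gamma-contraction whose fixed point is the restricted-LP optimum
    for any positive weights (the invariance property noted in the abstract).
    """
    K = max(part) + 1
    # Any starting point converges; a uniform upper bound is a natural choice.
    v = [max(max(r[s]) for s in range(S)) / (1.0 - GAMMA)] * K
    for _ in range(iters):
        v = [max(r[s][a] + GAMMA * sum(P[s][a][t] * v[part[t]] for t in range(S))
                 for s in range(S) if part[s] == k
                 for a in range(A))
             for k in range(K)]
    return v
```

Lifting the fixed point back to the full state space (state `s` gets value `v[part[s]]`) yields a function that dominates its own Bellman backup, hence dominates V* state by state; this is the upper-bound property claimed above, obtained here without an explicit LP solver.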
Keywords
Optimal Policy · Unmanned Aerial Vehicle · Stochastic Dynamic Program · Approximate Dynamic Program · Service Delay