Deep Reinforcement Learning for Multi-satellite Collection Scheduling

  • Jason T. Lam
  • François Rivest
  • Jean Berger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11934)


Multi-satellite scheduling often involves generating a fixed number of candidate task schedules, evaluating them all, and selecting the one that yields the highest expected reward. However accurate, this approach is nearly impossible to scale to large, realistic problems because of combinatorial explosion. Furthermore, regenerating solutions each time the tasks change is costly and slow. To address these issues, we adapt a deep reinforcement learning solution that automatically learns both a policy for multi-satellite scheduling and a representation of the problems. The algorithm learns a heuristic that selects the next best task given the current problem and partial solution, avoiding any search when building the schedule. Although preliminary results in learning a collection satellite scheduling heuristic still fail to outperform baseline domain-specific methods, the trained system may be fast enough to generate decisions in near real time.
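
The search-free construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `greedy_schedule`, `toy_score`, and `toy_feasible`, the reward and duration values, and the single capacity constraint are all hypothetical. In the setting of the abstract, a trained network would presumably supply the score of each candidate task; here a simple reward-per-duration ratio stands in for it.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical signature: (partial schedule, candidate task) -> estimated value.
ScoreFn = Callable[[Sequence[int], int], float]
FeasibleFn = Callable[[Sequence[int], int], bool]


def greedy_schedule(num_tasks: int, score: ScoreFn, feasible: FeasibleFn) -> List[int]:
    """Build a schedule one task at a time, with no backtracking or search."""
    schedule: List[int] = []
    remaining: Set[int] = set(range(num_tasks))
    while True:
        # Only tasks that can still be appended to the partial schedule are considered.
        candidates = [t for t in sorted(remaining) if feasible(schedule, t)]
        if not candidates:
            break
        # Pick the candidate with the highest score given the partial schedule.
        best = max(candidates, key=lambda t: score(schedule, t))
        schedule.append(best)
        remaining.remove(best)
    return schedule


# Toy stand-ins (assumptions for illustration only, not data from the paper).
rewards = [3.0, 1.0, 4.0, 2.0]
durations = [2, 1, 3, 1]
capacity = 5  # total time available on a single hypothetical resource


def toy_score(schedule: Sequence[int], task: int) -> float:
    # Placeholder for a learned value estimate: reward per unit of duration.
    return rewards[task] / durations[task]


def toy_feasible(schedule: Sequence[int], task: int) -> bool:
    # A task fits if the total scheduled duration stays within capacity.
    return sum(durations[t] for t in schedule) + durations[task] <= capacity


example = greedy_schedule(len(rewards), toy_score, toy_feasible)
print(example)
```

Each decision costs one scoring pass over the remaining feasible tasks, which is why a heuristic of this shape can respond quickly when the task set changes.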


Keywords: Planning and scheduling; Deep reinforcement learning; Graph embedding; Multi-satellite collection scheduling


Copyright information

© Crown 2019

Authors and Affiliations

  1. School of Computing, Queen’s University, Kingston, Canada
  2. Department of Mathematics and Computer Science, Royal Military College of Canada, Kingston, Canada
  3. Defence Research Development Canada, Valcartier, Canada
