The K Best-Paths Approach to Approximate Dynamic Programming with Application to Portfolio Optimization

  • Nicolas Chapados
  • Yoshua Bengio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4013)


We describe a general method to transform a non-Markovian sequential decision problem into a supervised learning problem using a K-best-paths algorithm. We consider an application in financial portfolio management in which a controller is trained to directly optimize a Sharpe ratio (or another risk-averse, non-additive utility function). We illustrate the approach with experimental results using a kernel-based controller architecture that would not normally be considered in traditional reinforcement learning or approximate dynamic programming.
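The core mechanism sketched in the abstract — enumerate the K best paths under an additive surrogate cost, then rescore them with a non-additive utility such as the Sharpe ratio — can be illustrated on a toy trellis. This is a minimal sketch under our own assumptions (the function names, the best-first enumeration, and the toy data are illustrative, not the authors' implementation):

```python
import heapq
import statistics

def k_best_paths(succ, cost, source, target, k):
    """Enumerate the k lowest-cost source->target paths in a DAG
    by best-first search over partial paths (additive cost)."""
    found = []
    heap = [(0.0, (source,))]  # entries: (accumulated cost, path)
    while heap and len(found) < k:
        c, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((c, list(path)))
            continue
        for nxt in succ.get(node, []):
            heapq.heappush(heap, (c + cost[(node, nxt)], path + (nxt,)))
    return found

def sharpe(returns):
    """Sample Sharpe ratio (mean over sample std) of per-period returns."""
    mu = statistics.mean(returns)
    sd = statistics.stdev(returns)
    return mu / sd if sd > 0 else float("inf")

# Toy 3-period trellis: nodes are (time, position); each edge carries a
# per-period return, and the additive search cost is the negated return.
succ = {
    ("t0", 0): [("t1", 0), ("t1", 1)],
    ("t1", 0): [("t2", 0)],
    ("t1", 1): [("t2", 0)],
    ("t2", 0): [("t3", 0)],
}
ret = {
    (("t0", 0), ("t1", 0)): 0.02, (("t0", 0), ("t1", 1)): 0.05,
    (("t1", 0), ("t2", 0)): 0.02, (("t1", 1), ("t2", 0)): -0.01,
    (("t2", 0), ("t3", 0)): 0.02,
}
cost = {e: -r for e, r in ret.items()}

def path_returns(path):
    return [ret[(a, b)] for a, b in zip(path, path[1:])]

# Step 1: enumerate K candidate paths under the additive cost.
paths = k_best_paths(succ, cost, ("t0", 0), ("t3", 0), k=2)
# Step 2: rescore with the non-additive Sharpe ratio and keep the best.
best = max(paths, key=lambda cp: sharpe(path_returns(cp[1])))
```

In this toy instance both paths have the same total return (0.06), so the additive cost cannot distinguish them; the Sharpe rescoring prefers the constant-return path, which is exactly the kind of risk-averse distinction a non-additive utility captures.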


Keywords: Utility Function, Portfolio Optimization, Portfolio Management, Sharpe Ratio, Good Path





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nicolas Chapados ¹
  • Yoshua Bengio ¹

  1. Dept. IRO, Université de Montréal, Montréal, Canada
