Sequence Labeling with Reinforcement Learning and Ranking Algorithms

  • Francis Maes
  • Ludovic Denoyer
  • Patrick Gallinari
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4701)


Many problems in areas such as Natural Language Processing, Information Retrieval, or Bioinformatics involve the generic task of sequence labeling. In many cases, the aim is to assign a label to each element in a sequence. Until now, this problem has mainly been addressed with Markov models and Dynamic Programming.

We propose a new approach where the sequence labeling task is seen as a sequential decision process. This method is shown to be very fast with good generalization accuracy. Instead of searching for a globally optimal label sequence, we learn to construct this optimal sequence directly in a greedy fashion. First, we show that sequence labeling can be modelled using Markov Decision Processes, so that several Reinforcement Learning (RL) algorithms can be used for this task. Second, we introduce a new RL algorithm which is based on the ranking of local labeling decisions.
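The sequential-decision view can be illustrated with a small sketch. It is not the authors' algorithm, whose details are not given in this abstract. It assumes that a state is the input sequence plus the labels chosen so far, that an action is the label for the next element, and that the reward is per-element agreement with the true label, so the best action in any state is the true label. The feature function, the perceptron-style ranking update, and all names (`features`, `score`, `greedy_label`, `train`) are hypothetical choices made for illustration.

```python
from collections import defaultdict

def features(x, t, prev_label, label):
    """Hypothetical features describing the action `label` at position t."""
    return [("tok", x[t], label), ("prev", prev_label, label)]

def score(w, x, t, prev_label, label):
    """Linear score used to rank the candidate labels in the current state."""
    return sum(w[f] for f in features(x, t, prev_label, label))

def greedy_label(w, x, labels):
    """Greedy policy: one labeling decision per element, left to right."""
    out, prev = [], "<s>"
    for t in range(len(x)):
        # sorted() makes tie-breaking between equal scores deterministic
        prev = max(sorted(labels), key=lambda a: score(w, x, t, prev, a))
        out.append(prev)
    return out

def train(data, labels, epochs=5):
    """Perceptron-style ranking update (an assumption, not the paper's rule):
    whenever the true label is not ranked first, move weight towards it."""
    w = defaultdict(float)
    for _ in range(epochs):
        for x, y in data:
            prev = "<s>"
            for t in range(len(x)):
                pred = max(sorted(labels), key=lambda a: score(w, x, t, prev, a))
                if pred != y[t]:
                    for f in features(x, t, prev, y[t]):
                        w[f] += 1.0
                    for f in features(x, t, prev, pred):
                        w[f] -= 1.0
                prev = pred  # continue from the learner's own decision
    return w

# Toy data, invented for this example only
data = [(["the", "dog", "runs"], ["D", "N", "V"]),
        (["a", "cat", "sleeps"], ["D", "N", "V"])]
labels = {"D", "N", "V"}
w = train(data, labels)
print(greedy_label(w, ["the", "cat", "runs"], labels))  # ['D', 'N', 'V']
```

Decoding here costs one pass over the sequence with one ranking of the candidate labels per element, whereas dynamic programming over label pairs considers every pair of adjacent labels at each position. This is the kind of speed argument the greedy formulation allows.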


Input Sequence, Markov Decision Process, Conditional Random Field, Ranking Algorithm, Sequence Label
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Francis Maes (1)
  • Ludovic Denoyer (1)
  • Patrick Gallinari (1)
  1. LIP6 - University of Paris 6
