Active Hidden Markov Models for Information Extraction

  • Tobias Scheffer
  • Christian Decomain
  • Stefan Wrobel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2189)


Information extraction from HTML documents requires a classifier capable of assigning semantic labels to the words or word sequences to be extracted. If completely labeled documents are available for training, well-known Markov model techniques can be used to learn such classifiers. In this paper, we consider the more challenging task of learning hidden Markov models (HMMs) when only partially (sparsely) labeled documents are available for training. We first give a detailed account of the task and its appropriate loss function, and show how the loss can be minimized given an HMM. We describe an EM-style algorithm for learning HMMs from partially labeled data. We then present an active learning algorithm that selects “difficult” unlabeled tokens and asks the user to label them. We study empirically by how much active learning reduces the required data labeling effort, or increases the quality of the learned model achievable with a given amount of user effort.
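
The abstract does not spell out the labeling rule or the selection criterion, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that each HMM state corresponds to one semantic label, that the loss is the per-token error rate (in which case picking the most probable state for each token separately minimizes the expected loss), and that a "difficult" token is one with a small margin between its two most probable states. The function names `state_posteriors`, `label_tokens`, `margins`, and `select_queries` are hypothetical.

```python
import numpy as np

def state_posteriors(pi, A, B, obs):
    """Scaled forward-backward pass.

    pi: (N,) initial state probabilities
    A:  (N, N) transition matrix, A[i, j] = P(next state j | state i)
    B:  (N, M) emission matrix, B[i, o] = P(symbol o | state i)
    obs: sequence of symbol indices
    Returns gamma with gamma[t, i] = P(state at t is i | whole sequence).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    scale = np.zeros(T)

    # Forward pass, rescaled at each step to avoid underflow.
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling factors.
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def label_tokens(gamma):
    """Assumed decision rule: most probable state for each token separately."""
    return gamma.argmax(axis=1)

def margins(gamma):
    """Assumed difficulty score: gap between the two largest posteriors."""
    top2 = np.sort(gamma, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def select_queries(gamma, labeled, k):
    """Return indices of the k unlabeled tokens with the smallest margin."""
    m = margins(gamma)
    mask = np.zeros(len(m), dtype=bool)
    for i in labeled:
        mask[i] = True
    m = np.where(mask, np.inf, m)  # never query tokens that already have labels
    return np.argsort(m)[:k]
```

In an active learning loop of this kind, the tokens returned by `select_queries` would be shown to the user, their labels added to the training data, and the model re-estimated before the next round of queries.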


Hidden Markov Model, Information Extraction, Observation Sequence, Small Margin, Semantic Label





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Tobias Scheffer (1, 2)
  • Christian Decomain
  • Stefan Wrobel (1)

  1. University of Magdeburg, Magdeburg, Germany
  2. SemanticEdge, Berlin, Germany
