Entropy Guided Transformation Learning

  • Cícero Nogueira dos Santos
  • Ruy Luiz Milidiú
Part of the Studies in Computational Intelligence book series (SCI, volume 201)


This work presents Entropy Guided Transformation Learning (ETL), a new machine learning algorithm for classification tasks. ETL generalizes Transformation Based Learning (TBL) by automatically solving the main TBL bottleneck: the construction of good template sets. ETL uses information gain to select the feature combinations that provide good template sets.
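As a rough illustration of the information gain measure mentioned above, the sketch below ranks candidate features by how much they reduce the entropy of the class labels. It is not the ETL template-generation procedure itself, which the abstract does not detail; the feature names (`prev_tag`, `is_capitalized`), the toy examples, and the helper functions are all hypothetical and chosen only to show the computation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the examples on one feature."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the partitions induced by the feature.
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Made-up tagging examples (assumption: not taken from any corpus in the book).
rows = [
    {"prev_tag": "DET", "is_capitalized": False},
    {"prev_tag": "DET", "is_capitalized": True},
    {"prev_tag": "VERB", "is_capitalized": False},
    {"prev_tag": "VERB", "is_capitalized": True},
]
labels = ["NOUN", "NOUN", "ADV", "ADV"]

# Rank features from most to least informative about the label.
ranking = sorted(rows[0], key=lambda f: information_gain(rows, labels, f), reverse=True)
print(ranking)
```

In this toy data `prev_tag` separates the two classes perfectly while `is_capitalized` carries no information, so a selection step guided by information gain would prefer `prev_tag` when building a template.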

We describe the application of ETL to two language-independent Text Mining preprocessing tasks: part-of-speech tagging and phrase chunking. We also report our findings on one language-independent Information Extraction task: named entity recognition. Overall, we successfully apply ETL to six different languages: Dutch, English, German, Hindi, Portuguese and Spanish.

For each task, the ETL modeling phase is quick and simple. ETL requires only the training set and no handcrafted templates. Furthermore, our extensive experimental results demonstrate that ETL is an effective way to learn accurate transformation rules. We believe that by avoiding the use of handcrafted templates, ETL extends the use of transformation rules to a greater range of Text Mining applications.


Keywords: Information Gain; Transformation Rule; Name Entity Recognition; Entity Recognition; Semantic Role Label
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Cícero Nogueira dos Santos (1)
  • Ruy Luiz Milidiú (1)
  1. Departamento de Informática, PUC-Rio, Rio de Janeiro, Brazil
