Building a Spanish MMTx by Using Automatic Translation and Biomedical Ontologies

  • Francisco Carrero
  • José Carlos Cortizo
  • José María Gómez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5326)


The use of domain ontologies is becoming increasingly popular in medical natural language processing systems. A wide variety of knowledge bases in multiple languages have been integrated into the Unified Medical Language System (UMLS) to create a huge knowledge source that can be accessed with diverse lexical tools. MetaMap (and its Java version, MMTx) is a tool for extracting medical concepts from free text, but no Spanish version currently exists. Our ongoing research centers on applying biomedical concepts to cross-lingual text classification, which makes a Spanish MMTx necessary. We have combined automatic translation techniques with biomedical ontologies and the existing English MMTx to produce a Spanish version of MMTx. We have evaluated different approaches and applied several types of evaluation according to different concept representations for text classification. Our results show that existing translation tools such as Google Translate produce translations that are highly similar to the original texts in terms of extracted concepts.
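The idea of comparing texts "in terms of extracted concepts" can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the evaluation used in the paper: the Jaccard measure, the function names `jaccard` and `concept_similarity`, the toy translator, the toy concept extractor, and the placeholder concept identifiers are all hypothetical. In a real setting, `translate` would wrap a machine translation service and `extract_concepts` would wrap the English MMTx.

```python
from typing import Callable, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two concept-identifier sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def concept_similarity(
    spanish_text: str,
    english_reference: str,
    translate: Callable[[str], str],
    extract_concepts: Callable[[str], Set[str]],
) -> float:
    """Translate the Spanish text, extract concepts from both English texts, compare."""
    translated = translate(spanish_text)
    return jaccard(extract_concepts(translated), extract_concepts(english_reference))


# Hypothetical stand-ins for a translation service and an English concept extractor.
TOY_TRANSLATIONS = {"dolor de cabeza y fiebre": "headache and fever"}
TOY_CONCEPTS = {
    "headache": "CONCEPT_HEADACHE",  # placeholder identifiers, not real UMLS codes
    "fever": "CONCEPT_FEVER",
    "pain": "CONCEPT_PAIN",
}


def toy_translate(text: str) -> str:
    """Look up a canned translation (assumption: the text is in the toy table)."""
    return TOY_TRANSLATIONS[text]


def toy_extract(text: str) -> Set[str]:
    """Return the placeholder concept for every known term found in the text."""
    lowered = text.lower()
    return {cid for term, cid in TOY_CONCEPTS.items() if term in lowered}


score = concept_similarity(
    "dolor de cabeza y fiebre",
    "headache, fever and pain",
    toy_translate,
    toy_extract,
)
print(f"concept overlap: {score:.3f}")
```

Passing the translator and extractor as parameters keeps the comparison independent of any particular tool, so different translation approaches could be scored against the same reference texts.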


Semantic techniques; data pre- and post-processing; information filtering; recommender systems





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Francisco Carrero (1)
  • José Carlos Cortizo (1, 2)
  • José María Gómez (3)

  1. Universidad Europea de Madrid, Madrid, Spain
  2. Artificial Intelligence & Network Solutions S.L., Spain
  3. Departamento de I+D, Optenet, Parque Empresarial Alvia, Las Rozas, Madrid, Spain
