Thesaurus or Logical Ontology, Which One Do We Need for Text Mining?

Language Resources and Evaluation

Abstract

Ontologies are recognised as important tools, not only for effective and efficient information sharing, but also for information extraction and text mining. In the biomedical domain, the need for a common ontology for information sharing has long been recognised, and several ontologies are now widely used. However, there is confusion among researchers concerning the type of ontology that is needed for text mining, and how it can be used for effective knowledge management, sharing, and integration in biomedicine. We argue that there are several different ways to define an ontology and that, while the logical view is popular for some applications, it may be neither possible nor necessary for text mining. We propose a text-centered approach for knowledge sharing, as an alternative to formal ontologies. We argue that a thesaurus (i.e. an organised collection of terms enriched with relations) is more useful for text mining applications than formal ontologies.



Author information

Corresponding author

Correspondence to Junichi Tsujii.

About this article

Cite this article

Tsujii, J., Ananiadou, S. Thesaurus or Logical Ontology, Which One Do We Need for Text Mining? Language Res Eval 39, 77–90 (2005). https://doi.org/10.1007/s10579-005-2697-0

  • DOI: https://doi.org/10.1007/s10579-005-2697-0
