Neuroanatomical term generation and comparison between two terminologies
An approach and software tools are described for identifying and extracting compound terms (CTs), acronyms, and their associated contexts from textual material associated with neuroanatomical atlases. A set of simple syntactic rules was appended to the output of a commercially available part-of-speech (POS) tagger (Qtag v 3.01) to extract CTs and their associated context from the texts of neuroanatomical atlases. This “hybrid” parser appears to be highly sensitive: it recognized 96% of the potentially germane neuroanatomical CTs and acronyms present in the cat and primate thalamic atlases.
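The idea of layering syntactic rules over POS-tagger output can be illustrated with a minimal sketch. It assumes the tagger emits Penn Treebank style tags (`JJ`, `NN`, and so on) and uses a single hypothetical rule: a candidate CT is a run of adjectives and nouns that ends in a noun and contains at least two tokens. The function name, the tag sets, and the rule itself are illustrative assumptions, not the rules used with Qtag in this work.

```python
from typing import List, Tuple

# Assumed Penn Treebank style tags; the tagset actually used may differ.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = {"JJ"} | NOUN_TAGS


def extract_compound_terms(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return runs of adjectives/nouns that end in a noun and span >= 2 tokens."""
    terms: List[str] = []
    run: List[Tuple[str, str]] = []

    def flush() -> None:
        # Drop trailing adjectives so every candidate ends in a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if len(run) >= 2:
            terms.append(" ".join(token for token, _ in run))
        run.clear()

    for token, tag in tagged:
        if tag in MODIFIER_TAGS:
            run.append((token, tag))
        else:
            flush()  # any other tag closes the current candidate
    flush()  # close a candidate that reaches the end of the input
    return terms


if __name__ == "__main__":
    sentence = [
        ("The", "DT"), ("ventral", "JJ"), ("posterior", "JJ"), ("nucleus", "NN"),
        ("lies", "VBZ"), ("near", "IN"), ("the", "DT"),
        ("internal", "JJ"), ("medullary", "JJ"), ("lamina", "NN"), (".", "."),
    ]
    print(extract_compound_terms(sentence))
```

A rule this simple over-generates on general text; a real rule set would presumably add filters such as stop lists or acronym patterns.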
A comparison of neuroanatomical CTs and acronyms between the cat and primate atlas texts was initially performed using exact-term matching. Implementing string-matching algorithms significantly improved the identification of relevant terms and acronyms shared between the two domains. The End Gap Free string matcher identified 98% of CTs, and the Needleman-Wunsch (NW) string matcher matched 36% of acronyms between the two atlases.
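The difference between the two matchers can be sketched with one dynamic-programming routine. The sketch assumes that "End Gap Free" denotes the standard semi-global alignment in which leading and trailing gaps are not penalized, and it uses illustrative scores (match +1, mismatch -1, gap -1). The function name and the scoring values are assumptions; the parameters used in this work are not stated here.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1, free_end_gaps: bool = False) -> int:
    """Score the best alignment of a and b.

    free_end_gaps=False: Needleman-Wunsch global alignment.
    free_end_gaps=True:  end-gap-free (semi-global) alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not free_end_gaps:
        # Global alignment charges for gaps at the start of either string.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    if free_end_gaps:
        # Trailing gaps are free: take the best cell in the last row or column.
        return max(max(score[-1]), max(row[-1] for row in score))
    return score[-1][-1]


if __name__ == "__main__":
    short, long_term = "nucleus", "ventral posterior nucleus"
    print(alignment_score(short, long_term))                      # global
    print(alignment_score(short, long_term, free_end_gaps=True))  # end-gap-free
    print(alignment_score("VPL", "VPM"))                          # short acronyms
```

With these scores, the end-gap-free variant does not penalize a term for being embedded in a longer compound term, whereas the global score falls with every unmatched character. That is one plausible reason to apply the two matchers to different kinds of strings.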
Combining several simple grammatical and lexical rules with the POS tagger (the “hybrid” parser) (1) extracted complex neuroanatomical terms and acronyms from selected cat and primate thalamic atlases and (2) facilitated the semi-automated generation of a highly granular thalamic terminology. The implementation of string-matching algorithms (1) reconciled terminological errors introduced by the optical character recognition (OCR) software used to generate the neuroanatomical text and (2) increased the sensitivity of matching the neuroanatomical terms and acronyms that the “hybrid” parser generated for the two neuroanatomical domains.
Index Entries: Term similarity; thalamic atlas; neuroanatomical indexing; information retrieval; string matching; statistical parser
- American Heritage Dictionary of the English Language, The: Fourth Edition. 2000, Houghton-Mifflin, Boston, MA.
- Assadi, H. and Bourigault, D. (1996) Acquisition and modeling of knowledge starting from texts: data-processing tools and methodological elements. In: Acts of 10th Congress Pattern Recognition and Artificial Intelligence, Rennes, France.
- Berman, A. L. and Jones E. G. (1982) The Thalamus and Basal Telencephalon of the Cat. A Cytoarchitectonic Atlas with Stereotaxic Coordinates. University of Wisconsin Press, Madison, WI.
- Gardner, D., Abato, M., Knuth, K. H., Debellis, R., and Gardner, E. P. (2001a) A functional ontology for neuroinformatics. The Human Brain Project/Neuroinformatics Annual Spring Meeting, May 21–22, 2001, Bethesda, MD.
- Gusfield, D. (1997) Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge, UK.
- Jacquemin, C. and Bourigault, D. (2002) Term extraction and automatic indexing. In: Handbook of Computational Linguistics. (Mitkov, R., ed.) Oxford University Press, Oxford, UK, Chapter 19.
- Jones, E. G. (1998) The thalamus of primates. In: Handbook of Chemical Neuroanatomy, Volume 14. (Bloom, F. E., et al., eds.) Elsevier, Amsterdam, The Netherlands.
- Kuang-Hua, C. and Chert, I. (1994) Extracting noun phrases from large-scale texts: A hybrid approach and its automatic evaluation. In: 32nd Annual Meeting of the Association for Computational Linguistics, June 27–30, New Mexico State University, Las Cruces, NM.
- Language Technology Group. http://www.ltg.ed.ac.uk/software/chunk/
- Lopresti, D. and Wilfong, G. (1999) Cross-domain approximate string matching. Sixth International Symposium on String Processing and Information Retrieval. Cancun, Mexico, September 22–24, pp. 120–127.
- Manning, C. D. and Schütze, H. (2000) Foundations of statistical natural language processing. MIT Press, Cambridge, MA, p. 83.
- Maynard, D. and Ananiadou, S. (1999) Identifying contextual information for multi-word term extraction. In: 5th International Congress on Terminology and Knowledge Engineering (TKE99), pp. 212–221.
- Monge, A. E. and Elkan, C. P. (1996) The field matching problem: Algorithms and applications. Second International Conference on Knowledge Discovery and Data Mining (KDD96), Portland, OR, August 2–4, pp. 267–270.
- Penn Tree Bank. http://www.cis.upenn.edu/~treebank/home.html
- Qtag v 3.01, Portable POS Tagger. Oliver Mason, Department of English, School of Humanities, The University of Birmingham, UK. http://web.bham.ac.uk/O.Mason/
- SPECIALIST Lexicon. http://www.nlm.nih.gov/research/umls/META4.HTML#s4
- Zhu, J. J. and Ungar, L. H. (2000) String Edit Analysis for merging databases. Knowledge Discovery and Data Mining Workshop, August 20, Boston, MA. ACM SIG KDD, Jan 2001, Vol. 2, No. 2, p. 3.