BioDI: A New Approach to Improve Biomedical Documents Indexing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8055)

Abstract

Partial matching between biomedical documents and controlled vocabularies makes it possible to find more term variants in the documents than exist in the dictionaries. However, it also generates irrelevant information. We propose a new approach for indexing biomedical documents with the Medical Subject Headings (MeSH) thesaurus that aims to overcome this limitation of partial matching. Our indexing approach restricts the stemming process in the preprocessing step. The descriptor extraction step is based essentially on the vector space model and combines semantic and statistical methods to compute a score that estimates the relevance of a descriptor to a given document. The knowledge provided by the Unified Medical Language System (UMLS) is then used for filtering, which aims to keep only the relevant descriptors. Experiments carried out on the OHSUMED collection showed very encouraging results.
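
To make the vector space idea concrete, here is a minimal sketch of scoring descriptors against a document with plain term-frequency cosine similarity. It is not the BioDI score: the abstract does not specify how the semantic and statistical components are combined, so none of that is reproduced here, and the UMLS filtering step is left out. The function names (`tokenize`, `cosine`, `rank_descriptors`) and the small descriptor dictionary are hypothetical, written by hand for illustration and not loaded from MeSH. No stemming is applied, a simplifying assumption loosely in the spirit of the restricted stemming described above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-letters; no stemming is applied."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * vec_b[term] for term, weight in vec_a.items() if term in vec_b)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_descriptors(document, descriptors):
    """Rank descriptors by cosine similarity to the document.

    `descriptors` maps a descriptor name to a list of its terms
    (a hypothetical, hand-written structure for this sketch).
    """
    doc_vec = Counter(tokenize(document))
    scores = {
        name: cosine(doc_vec, Counter(tokenize(" ".join(terms))))
        for name, terms in descriptors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative input only; the terms listed are assumptions for this example.
example_descriptors = {
    "Kidney Failure, Chronic": ["kidney failure chronic", "chronic renal failure"],
    "Heart Failure": ["heart failure", "cardiac failure"],
}
example_document = "Chronic kidney failure is treated with dialysis"
ranking = rank_descriptors(example_document, example_descriptors)
```

In this toy input the first descriptor shares three words with the document and the second shares only one, so the first ranks higher. A filtering step such as the UMLS-based one mentioned in the abstract would then operate on a ranked list like this.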


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chebil, W., Soualmia, L.F., Darmoni, S.J. (2013). BioDI: A New Approach to Improve Biomedical Documents Indexing. In: Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 2013. Lecture Notes in Computer Science, vol 8055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40285-2_9

  • DOI: https://doi.org/10.1007/978-3-642-40285-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40284-5

  • Online ISBN: 978-3-642-40285-2

  • eBook Packages: Computer Science, Computer Science (R0)
