BioDI: A New Approach to Improve Biomedical Documents Indexing

Chebil, Wiem; Soualmia, Lina Fatima; Darmoni, Stéfan Jacques

doi:10.1007/978-3-642-40285-2_9

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8055)

Abstract

Partial matching between biomedical documents and controlled vocabularies makes it possible to find more term variants in the documents than exist in the dictionaries. However, it also generates irrelevant information. We propose a new approach for indexing biomedical documents with the Medical Subject Headings (MeSH) thesaurus that aims to overcome this limitation of partial matching. Specifically, our indexing approach restricts the stemming process in the preprocessing step. The descriptor extraction step is based essentially on the vector space model and combines semantic and statistical methods to compute a score that estimates the relevance of a descriptor to a given document. The knowledge provided by the Unified Medical Language System (UMLS) is then used for filtering, which aims to keep only the relevant descriptors. Experiments with our approach, carried out on the OHSUMED collection, showed very encouraging results.
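As a rough illustration of the vector space idea mentioned in the abstract, the sketch below scores toy descriptors against a document with plain term-frequency vectors and cosine similarity. It is not the BioDI scoring formula, which the abstract does not give: the function names (tokenize, cosine, rank_descriptors), the toy descriptor entries, and the choice of unweighted term frequencies are all assumptions made for illustration, and no stemming, semantic weighting, or UMLS filtering is applied.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens; no stemming is applied."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_descriptors(document, descriptors):
    """Rank descriptors by cosine similarity between the document vector
    and a vector built from each descriptor's terms (hypothetical scoring)."""
    doc_vec = Counter(tokenize(document))
    scores = {}
    for name, terms in descriptors.items():
        desc_vec = Counter(tokenize(" ".join(terms)))
        scores[name] = cosine(doc_vec, desc_vec)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data for illustration only; not taken from MeSH or OHSUMED.
toy_descriptors = {
    "Myocardial Infarction": ["myocardial infarction", "heart attack"],
    "Asthma": ["asthma"],
}
toy_document = "Patients with a heart attack were treated early."

ranked = rank_descriptors(toy_document, toy_descriptors)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setting the document shares the tokens "heart" and "attack" with one descriptor and nothing with the other, so only the first receives a non-zero score. A filtering step such as the one the abstract describes would then decide which scored descriptors to keep.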

Author information

Authors and Affiliations

Normandie Univ, CISMeF Team, LITIS-TIBS EA 4108, Rouen University and Hospital, France
Wiem Chebil, Lina Fatima Soualmia & Stéfan Jacques Darmoni
Research Unit MARS, Monastir University, Tunisia
Wiem Chebil

Editor information

Editors and Affiliations

Instituto Tecnológico de Informática, Valencia, Spain
Hendrik Decker
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Department of Computer Science, The University of Auckland, 1010, Auckland, New Zealand
Sebastian Link
Department of Information Technologies, University of Economics, Winston Churchill Square 4, 130 67, Prague 3, Czech Republic
Josef Basl
Institute of Software Technology, Vienna University of Technology, Favoritenstraße 9-11 / 188, 1040, Vienna, Austria
A Min Tjoa

About this paper

Cite this paper

Chebil, W., Soualmia, L.F., Darmoni, S.J. (2013). BioDI: A New Approach to Improve Biomedical Documents Indexing. In: Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 2013. Lecture Notes in Computer Science, vol 8055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40285-2_9

DOI: https://doi.org/10.1007/978-3-642-40285-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40284-5
Online ISBN: 978-3-642-40285-2
eBook Packages: Computer Science, Computer Science (R0)
