Sentence Filtering for Information Extraction in Genomics, a Classification Problem

Nédellec, Claire; Abdel Vetah, Mohamed Ould; Bessières, Philippe

doi:10.1007/3-540-44794-6_27


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2168)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery


Abstract

In some domains, Information Extraction (IE) from texts requires syntactic and semantic parsing. This analysis is computationally expensive, and when the relevant information is sparse, applying IE to the whole document set is also potentially noisy. A preprocessing phase that selects the potentially relevant fragments increases the efficiency of the IE process. This phase has to be fast and based on a shallow description of the texts. We applied three classification methods (IVI, a Naive Bayes learner, and C4.5) to this fragment filtering task in the domain of functional genomics. This paper describes the results of this study. We show that the IVI and Naive Bayes methods with feature selection give the best results, compared both with their own results without feature selection and with the results of C4.5.
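The abstract frames sentence filtering as a binary classification problem over a shallow description of each sentence. As an illustration only, the Python sketch below trains a Bernoulli Naive Bayes filter on word-presence features, optionally restricted to the k words with the highest information gain. Everything specific here is an assumption, not the paper's setup: the tokenizer, the choice of information gain as a filter-style selection criterion, the value of k, the helper names, and the toy sentences with invented names such as `ProtA` and `geneB`. IVI and C4.5 are not shown.

```python
import math
import re
from collections import Counter

def tokens(sentence):
    """Shallow description (assumed): the set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def entropy(ys):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(ys)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())

def information_gain(docs, labels, word):
    """Information gain of the binary feature 'word occurs in the sentence'."""
    with_w = [y for d, y in zip(docs, labels) if word in d]
    without_w = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    return (entropy(labels)
            - len(with_w) / n * entropy(with_w)
            - len(without_w) / n * entropy(without_w))

def select_features(docs, labels, k):
    """Filter-style selection (assumed): keep the k words with the highest information gain."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, labels, w), w))
    return set(ranked[:k])

class BernoulliNB:
    """Naive Bayes over binary word-presence features, with Laplace smoothing."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        self.log_prior, self.p_word = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            # P(word present | class), smoothed so it is never exactly 0 or 1
            self.p_word[c] = {
                w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
                for w in self.vocab
            }
        return self

    def predict(self, doc):
        def log_score(c):
            s = self.log_prior[c]
            for w in self.vocab:
                p = self.p_word[c][w]
                # Bernoulli model: absent features also contribute evidence
                s += math.log(p if w in doc else 1.0 - p)
            return s
        return max(self.classes, key=log_score)

def filter_sentences(model, sentences):
    """Keep only the sentences classified as relevant (label 1) for the IE step."""
    return [s for s in sentences if model.predict(tokens(s)) == 1]

# Hypothetical toy training set: 1 = relevant to IE, 0 = not relevant.
train = [
    ("ProtA activates transcription of geneB", 1),
    ("ProtC represses expression of geneD", 1),
    ("ProtA binds the promoter of geneE", 1),
    ("Cells were grown in rich medium", 0),
    ("Samples were collected every hour", 0),
    ("Strains used in this study are listed", 0),
]
docs = [tokens(s) for s, _ in train]
labels = [y for _, y in train]

vocab = select_features(docs, labels, k=4)
model = BernoulliNB().fit(docs, labels, vocab)
print(filter_sentences(model, ["ProtF activates expression of geneG",
                               "Cells were washed in buffer"]))
# -> ['ProtF activates expression of geneG']
```

Information gain is used here only because it is a compact filter criterion; a wrapper approach, which scores candidate feature subsets by re-running the classifier, is a different design choice, and this sketch makes no claim about which one the authors used. The point it illustrates is that the filter only needs cheap token-level features, so the expensive parsing step runs on the retained sentences alone.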



Author information

Authors and Affiliations

LRI UMR 8623 CNRS, Université Paris-Sud, 91405, Orsay cedex
Claire Nédellec & Mohamed Ould Abdel Vetah
ValiGen SA, Tour Neptune, 92086, La-Défense
Mohamed Ould Abdel Vetah
Mathématique, Informatique et Génome (MIG) INRA, 78026, Versailles cedex
Philippe Bessières


Editor information

Editors and Affiliations

Department of Computer Science, Albert-Ludwigs University Freiburg, Georges Köhler-Allee, Geb. 079, 79110, Freiburg, Germany
Luc De Raedt
Inst. of Information and Computing Sciences, Dept. of Mathematics and Computer Science, University of Utrecht, Padualaan 14, de Uithof, 3508 TB Utrecht, The Netherlands
Arno Siebes


About this paper

Cite this paper

Nédellec, C., Abdel Vetah, M.O., Bessières, P. (2001). Sentence Filtering for Information Extraction in Genomics, a Classification Problem. In: De Raedt, L., Siebes, A. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2001. Lecture Notes in Computer Science, vol 2168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44794-6_27


DOI: https://doi.org/10.1007/3-540-44794-6_27
Published: 28 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42534-2
Online ISBN: 978-3-540-44794-8
eBook Packages: Springer Book Archive
