Information Extraction from Bibliography for Marker-Assisted Selection in Wheat

  • Claire Nédellec
  • Robert Bossy
  • Dialekti Valsamou
  • Marion Ranoux
  • Wiktoria Golik
  • Pierre Sourdille
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 478)


Improving most animal and plant species of agronomic interest has become an international challenge for the near future, both because of the increasing demand for food from a growing world population and because of the need to mitigate the reduction in industrial resources. The recent advent of genomic tools has made it easier to discover linkages between molecular markers and genes involved in the control of traits of agronomic interest, such as grain number or disease resistance. This information is mostly published in scientific papers and is rarely available in databases. Here, we present a method for automatically extracting this information from the scientific literature. The method relies on a knowledge model of the target information and on the WheatPhenotype ontology, which we developed for this purpose. The information extraction results were evaluated and integrated into the online semantic search engine AlvisIR WheatMarker.
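The abstract describes the extraction method only at a high level. As a rough illustration of the general idea (pairing molecular markers with traits mentioned in the same sentence), here is a minimal Python sketch. It is not the authors' pipeline: the tiny phenotype dictionary merely stands in for the WheatPhenotype ontology, the marker regular expression assumes SSR-style wheat marker names such as Xgwm261, and naive sentence-level co-occurrence is a deliberately simple substitute for the knowledge-model-based extraction the paper describes. The names `TOY_PHENOTYPES`, `MARKER_RE`, `Candidate` and `extract_candidates` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: surface form -> phenotype concept.
# This is NOT the real WheatPhenotype ontology, only a stand-in for illustration.
TOY_PHENOTYPES = {
    "grain number": "grain_number",
    "leaf rust resistance": "disease_resistance",
    "disease resistance": "disease_resistance",
}

# Assumption: markers follow common wheat SSR naming (Xgwm, Xbarc, Xwmc, Xcfd + digits).
MARKER_RE = re.compile(r"\bX(?:gwm|barc|wmc|cfd)\d+\b")

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


@dataclass(frozen=True)
class Candidate:
    """A marker/trait pair that co-occurs in one sentence (a candidate, not a proven link)."""
    marker: str
    trait: str
    sentence: str


def extract_candidates(text: str) -> list[Candidate]:
    """Return every marker/trait pair found together in the same sentence."""
    results: list[Candidate] = []
    for sentence in SENTENCE_SPLIT_RE.split(text.strip()):
        lowered = sentence.lower()
        markers = MARKER_RE.findall(sentence)
        # Map every matching surface form to its concept; a set removes duplicates.
        traits = {concept for form, concept in TOY_PHENOTYPES.items() if form in lowered}
        for marker in markers:
            for trait in sorted(traits):
                results.append(Candidate(marker, trait, sentence))
    return results


if __name__ == "__main__":
    example = ("The QTL for grain number was flanked by Xgwm261 and Xbarc8. "
               "Plants were grown in the field.")
    for cand in extract_candidates(example):
        print(cand.marker, cand.trait)
```

Co-occurrence only yields candidate pairs: a sentence can mention a marker and a trait without asserting any linkage between them, so a realistic system would also need to analyse the relation the sentence actually expresses.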


Keywords: information extraction, corpus annotation, natural language processing, ontology building, biology, genetics



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Claire Nédellec (1)
  • Robert Bossy (1)
  • Dialekti Valsamou (1)
  • Marion Ranoux (2)
  • Wiktoria Golik (1)
  • Pierre Sourdille (2)

  1. INRA, unité UR1077 MIG (Mathématique, Informatique et Génome), Domaine de Vilvert, Jouy-en-Josas, France
  2. INRA, UMR1095 GDEC (Génétique, Diversité, Ecophysiologie des Céréales), Domaine de Crouël, Clermont-Ferrand cedex, France
