Text Mining for Systems Modeling

  • Axel Kowald
  • Sebastian Schmeier
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 696)

Abstract

The yearly output of scientific papers is rising steadily, often making it impossible for an individual researcher to keep up. Text mining of scientific publications is therefore an attractive way to automate the retrieval of knowledge and data from the literature. In this chapter, we discuss the specific tasks that text mining requires, together with their problems and limitations. The second half of the chapter demonstrates the various aspects of text mining with a practical example: publications are transformed into a vector space representation, and support vector machines are then used to classify papers according to their kinetic parameter content, which is required for model building in systems biology.
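To make the two steps named in the abstract concrete, the following is a minimal sketch, not the chapter's actual pipeline. It assumes a TF-IDF weighting for the vector space representation and trains a linear support vector machine by plain sub-gradient descent on the hinge loss, without a bias term. The function names, the stopword list, and the example sentences are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the chapter).
STOPWORDS = {"the", "of", "for", "was", "were", "and", "we", "is", "in", "by", "a"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_vector_space(docs):
    """Derive the vocabulary and smoothed IDF weights from the training texts."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf

def vectorize(text, vocab, idf):
    """Map a text to a length-normalised TF-IDF vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    vec = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Minimise the L2-regularised hinge loss by sub-gradient descent.

    Labels must be +1 or -1. No bias term is fitted, to keep the sketch short.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            margin = y_i * sum(w_j * x_j for w_j, x_j in zip(w, x_i))
            w = [w_j * (1.0 - eta * lam) for w_j in w]  # regularisation step
            if margin < 1.0:  # hinge loss is active for this example
                w = [w_j + eta * y_i * x_j for w_j, x_j in zip(w, x_i)]
    return w

def predict(w, x):
    """Return +1 if the vector lies on the positive side of the hyperplane."""
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) > 0 else -1

# Made-up training sentences: +1 mentions kinetic parameters, -1 does not.
train_docs = [
    "The Km of hexokinase for glucose was 0.1 mM and Vmax was 5 units",
    "We measured kcat and Km values for the purified enzyme",
    "The gene is expressed in liver tissue during development",
    "Protein localization was studied by fluorescence microscopy",
]
train_labels = [1, 1, -1, -1]

vocab, idf = build_vector_space(train_docs)
X_train = [vectorize(d, vocab, idf) for d in train_docs]
weights = train_linear_svm(X_train, train_labels)

new_docs = [
    "Km and kcat were determined for the mutant enzyme",
    "The gene was expressed in muscle tissue",
]
predictions = [predict(weights, vectorize(d, vocab, idf)) for d in new_docs]
print(predictions)  # [1, -1]
```

Terms that never occur in the training texts are simply ignored when a new text is vectorised. A real application would train on full abstracts or papers and would normally rely on an established SVM library rather than this hand-written optimiser.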

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Axel Kowald (1)
  • Sebastian Schmeier (2)

  1. Protagen AG, Dortmund, Germany
  2. South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
