Text Mining for Systems Modeling

  • Protocol
Data Mining in Proteomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 696)

Abstract

The yearly output of scientific papers is constantly rising, which often makes it impossible for an individual researcher to keep up. Text mining of scientific publications is therefore an interesting method to automate the retrieval of knowledge and data from the literature. In this chapter, we discuss the specific tasks required for text mining, including their problems and limitations. The second half of the chapter demonstrates the various aspects of text mining using a practical example: publications are transformed into a vector space representation, and support vector machines are then used to classify papers according to whether they contain kinetic parameters, which are required for model building in systems biology.
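
The pipeline named in the abstract (vector space representation followed by a support vector machine) can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The toy sentences, the stop-word list, the TF-IDF weighting, the bias-free linear SVM trained by subgradient descent, and all function names are assumptions chosen only to keep the example short and dependency-free.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for this illustration.
STOPWORDS = {"the", "of", "and", "was", "were", "for", "in", "with",
             "a", "is", "we", "by", "per", "during"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def fit_tfidf(docs):
    """Learn the vocabulary and smoothed inverse document frequencies."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def vectorize(text, vocab, idf):
    """Map a text to an L2-normalised TF-IDF vector; unknown words are ignored."""
    counts = Counter(tokenize(text))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Linear SVM without bias term: subgradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1.0 - eta * lam) for wj in w]  # L2 shrinkage
            if yi * score < 1.0:  # margin violated: step towards the example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def decision(text, w, vocab, idf):
    """Signed score; positive means 'contains kinetic parameters'."""
    return sum(wj * xj for wj, xj in zip(w, vectorize(text, vocab, idf)))

# Invented toy corpus: +1 = mentions kinetic parameters, -1 = does not.
train_docs = [
    "The Km of the enzyme was 0.5 mM and kcat was 12 per second",
    "We measured Vmax and Km values for hexokinase",
    "Kinetic constants kcat and Km were determined for the mutant enzyme",
    "The gene is expressed in liver tissue during development",
    "Protein localization was studied with fluorescence microscopy",
    "Sequence alignment revealed a conserved domain in the gene family",
]
train_labels = [1, 1, 1, -1, -1, -1]

vocab, idf = fit_tfidf(train_docs)
X = [vectorize(d, vocab, idf) for d in train_docs]
w = train_linear_svm(X, train_labels)

for text in ("Km and kcat of the purified enzyme",
             "Gene expression in tissue was studied by microscopy"):
    label = 1 if decision(text, w, vocab, idf) > 0 else -1
    print(label, text)
```

On this toy corpus the first query sentence receives a positive score and the second a negative one. A realistic setting would use full abstracts or papers, a proper stop-word list and stemming, and an established SVM library with cross-validated parameters.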

Copyright information

© 2011 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Kowald, A., Schmeier, S. (2011). Text Mining for Systems Modeling. In: Hamacher, M., Eisenacher, M., Stephan, C. (eds) Data Mining in Proteomics. Methods in Molecular Biology, vol 696. Humana Press. https://doi.org/10.1007/978-1-60761-987-1_19

  • DOI: https://doi.org/10.1007/978-1-60761-987-1_19

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60761-986-4

  • Online ISBN: 978-1-60761-987-1

  • eBook Packages: Springer Protocols