Analysis of Biological Processes and Diseases Using Text Mining Approaches

  • Protocol
Bioinformatics Methods in Clinical Research

Part of the book series: Methods in Molecular Biology (MIMB, volume 593)

Abstract

A number of biomedical text mining systems have been developed to extract biologically relevant information directly from the literature, complementing bioinformatics methods in the analysis of experimentally generated data. We provide a short overview of the general characteristics of natural language data, existing biomedical literature databases, and lexical resources relevant in the context of biomedical text mining. A selection of practically useful systems is introduced, together with the types of user queries they support and the results they generate. The extraction of biological relationships, such as protein–protein interactions and metabolic and signaling pathways, using information extraction systems is discussed through example cases of cancer-relevant proteins. Basic strategies for detecting associations of genes with diseases are described, together with literature mining of mutations, SNPs, and epigenetic information (methylation). We provide an overview of disease-centric and gene-centric literature mining methods for linking genes to phenotypic and genotypic aspects. Moreover, we discuss recent efforts to find biomarkers through text mining and to analyze and prioritize gene lists. Relevant issues for implementing a customized biomedical text mining system are also pointed out. To demonstrate the usefulness of literature mining for the molecular oncology domain, we implemented two cancer-related applications. The first is a literature mining system that retrieves human mutations together with supporting articles and links specific gene mutations to a set of predefined cancer types. The second is a text categorization system that supports breast cancer-specific literature search and document-based ranking of breast cancer genes.
Future trends in text mining emphasize the importance of community efforts such as the BioCreative challenge for the development and integration of multiple systems into a common platform provided by the BioCreative Metaserver.
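The abstract does not describe how the mutation retrieval application works, so what follows is only an illustrative sketch of one common approach, not the authors' implementation. It assumes that protein point mutations are written in wild-type/position/mutant notation (for example V600E or p.Val600Glu), that the input abstracts have already been selected for a single gene, and that each predefined cancer type can be recognized by simple keyword matching. The dictionary `CANCER_TERMS` and the functions `extract_mutations`, `cancer_types`, and `link_mutations` are hypothetical names introduced here for illustration.

```python
import re
from collections import defaultdict

# Standard one-letter amino acid codes and the matching three-letter codes.
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Wild-type residue, sequence position, mutant residue, with an optional
# "p." prefix: matches e.g. "V600E" and "p.Val600Glu".
ONE_LETTER = re.compile(rf"\b(?:p\.)?([{AA1}])(\d+)([{AA1}])\b")
THREE = "|".join(AA3)
THREE_LETTER = re.compile(rf"\b(?:p\.)?({THREE})(\d+)({THREE})\b")

# Hypothetical keyword dictionary for a small set of predefined cancer types.
CANCER_TERMS = {
    "breast cancer": ["breast cancer", "breast carcinoma"],
    "melanoma": ["melanoma"],
    "colorectal cancer": ["colorectal cancer", "colon cancer"],
}


def extract_mutations(text):
    """Return point mutations found in text, normalized to one-letter form."""
    found = set()
    for wt, pos, mt in ONE_LETTER.findall(text):
        if wt != mt:  # skip silent "mutations" such as A123A
            found.add(f"{wt}{pos}{mt}")
    for wt, pos, mt in THREE_LETTER.findall(text):
        if wt != mt:
            found.add(f"{AA3[wt]}{pos}{AA3[mt]}")
    return found


def cancer_types(text):
    """Return the predefined cancer types whose keywords occur in text."""
    lowered = text.lower()
    return {
        ctype
        for ctype, terms in CANCER_TERMS.items()
        if any(term in lowered for term in terms)
    }


def link_mutations(documents, gene):
    """Map (gene, mutation, cancer type) triples to supporting document IDs.

    `documents` is an iterable of (doc_id, text) pairs that are assumed to
    be about `gene` already; gene name recognition is out of scope here.
    """
    links = defaultdict(set)
    for doc_id, text in documents:
        types = cancer_types(text)
        for mutation in extract_mutations(text):
            for ctype in types:
                links[(gene, mutation, ctype)].add(doc_id)
    return dict(links)


if __name__ == "__main__":
    docs = [
        ("doc1", "The BRAF V600E mutation is frequent in melanoma."),
        ("doc2", "We detected p.Val600Glu in colorectal cancer samples."),
    ]
    for key, ids in link_mutations(docs, "BRAF").items():
        print(key, sorted(ids))
```

Co-occurrence of a mutation and a cancer term in the same abstract is only a weak signal, and the one-letter pattern also matches strings that are not mutations, such as the cell line name T47D. A practical system would likely need additional filtering, gene name normalization, and manual inspection of the supporting articles.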

Copyright information

© 2010 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Krallinger, M., Leitner, F., Valencia, A. (2010). Analysis of Biological Processes and Diseases Using Text Mining Approaches. In: Matthiesen, R. (eds) Bioinformatics Methods in Clinical Research. Methods in Molecular Biology, vol 593. Humana Press. https://doi.org/10.1007/978-1-60327-194-3_16

  • DOI: https://doi.org/10.1007/978-1-60327-194-3_16

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-193-6

  • Online ISBN: 978-1-60327-194-3

  • eBook Packages: Springer Protocols
