Bioinformatics Methods in Clinical Research

Volume 593 of the series Methods in Molecular Biology, pp. 341-382


Analysis of Biological Processes and Diseases Using Text Mining Approaches

  • Martin Krallinger, affiliated with Centro Nacional de Investigaciones Oncológicas
  • Florian Leitner, affiliated with Centro Nacional de Investigaciones Oncológicas
  • Alfonso Valencia, affiliated with Centro Nacional de Investigaciones Oncológicas


A number of biomedical text mining systems have been developed to extract biologically relevant information directly from the literature, complementing bioinformatics methods in the analysis of experimentally generated data. We provide a short overview of the general characteristics of natural language data, existing biomedical literature databases, and lexical resources relevant to biomedical text mining. A selection of practically useful systems is introduced, together with the types of user queries they support and the results they generate. The extraction of biological relationships, such as protein–protein interactions and metabolic and signaling pathways, using information extraction systems is discussed through example cases of cancer-relevant proteins. Basic strategies for detecting gene-disease associations are described, together with literature mining of mutations, SNPs, and epigenetic information (methylation). We provide an overview of disease-centric and gene-centric literature mining methods for linking genes to phenotypic and genotypic aspects. Moreover, we discuss recent efforts to find biomarkers through text mining and to support gene list analysis and prioritization. We also point out issues relevant to implementing a customized biomedical text mining system. To demonstrate the usefulness of literature mining for the molecular oncology domain, we implemented two cancer-related applications. The first is a literature mining system for retrieving human mutations together with supporting articles, in which specific gene mutations are linked to a set of predefined cancer types. The second is a text categorization system supporting breast cancer-specific literature search and document-based breast cancer gene ranking.
Future trends in text mining emphasize the importance of community efforts such as the BioCreative challenge for the development and integration of multiple systems into a common platform provided by the BioCreative Metaserver.

Key words

Text mining, information extraction, natural language processing, pathways, cancer, diseases, gene ranking, document classification, biomarkers, epigenetics