Introduction

In recent decades, the establishment of high-throughput screening methods has brought major breakthroughs in molecular biology and biomedicine. Because researchers in these fields must interpret enormous quantities of data, and because the rate at which scientific articles are published is growing rapidly, the demand for text mining technology increases every year.

Medie (http://www-tsujii.is.s.u-tokyo.ac.jp/medie/) and Info-pubmed (http://www-tsujii.is.s.u-tokyo.ac.jp/info-pubmed/) were developed in response to these information needs. Medie is an integrated, general-purpose PubMed search engine, and Info-pubmed is a targeted system for finding information about interactions between key biomedical entities.

In this work, which presents the first update of these systems since their introduction, we describe several extensions based on recent advances in biomedical text mining.

Extensions of Medie and Info-pubmed

Medie and Info-pubmed are based on deep syntactic analysis of sentence structure. To allow users to take advantage of the latest parsing technology, the current release integrates an improved parser [1].

To extend its semantic search capabilities, the updated Medie system supports ontology-based search, in which the query verb can be replaced by any term from the GENIA event ontology (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/). Such queries are expanded to the set of verbs annotated as expressing the given event type in the GENIA corpus [2]: for example, a search for Positive regulation will now match activate, induce, etc.
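
The verb expansion described above can be illustrated with a minimal sketch, assuming that the parser output has already been reduced to subject-verb-object triples. The `Relation` type, the `EVENT_TO_VERBS` table and the `search` function are hypothetical names for illustration only; the verb lists shown are a hand-written excerpt, whereas the system described here derives them from GENIA corpus annotations.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Relation(NamedTuple):
    """A subject-verb-object triple, assumed to come from a deep parser."""
    subject: str
    verb: str  # verb in base form, e.g. "activate"
    obj: str


# Hypothetical excerpt of an event-type -> verb table; in the described system
# this set is derived from GENIA corpus annotations, not written by hand.
EVENT_TO_VERBS: Dict[str, Set[str]] = {
    "positive regulation": {"activate", "induce", "enhance", "upregulate"},
    "negative regulation": {"inhibit", "suppress", "repress", "block"},
}


def expand_verb(term: str) -> Set[str]:
    """Expand an event-ontology term to its verbs; a plain verb matches only itself."""
    key = term.strip().lower().replace("_", " ")
    return set(EVENT_TO_VERBS.get(key, {key}))


def search(relations: List[Relation], subject: Optional[str] = None,
           verb: Optional[str] = None, obj: Optional[str] = None) -> List[Relation]:
    """Return the triples that match every slot filled in the query (None = wildcard)."""
    verbs = expand_verb(verb) if verb is not None else None
    return [
        r for r in relations
        if (subject is None or r.subject == subject)
        and (verbs is None or r.verb in verbs)
        and (obj is None or r.obj == obj)
    ]
```

With toy triples such as `Relation("A", "activate", "B")` and `Relation("C", "induce", "D")`, a query with `verb="Positive regulation"` returns both, while `verb="activate"` returns only the first. Treating an unknown term as a literal verb keeps ordinary verb queries working unchanged.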

To allow more focused searches, we incorporated the section labeling method of Hirohata et al. [3], adding search options that restrict queries to specific types of sentences, such as those describing methods, results or conclusions. The indexing system and search options were further augmented with PubMed metadata, allowing searches to be limited by MeSH term, author or journal.
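
Combining sentence-type labels with article metadata amounts to a conjunction of optional filters over indexed sentences. The following is a minimal sketch under assumed names: `IndexedSentence` and `filter_sentences` are hypothetical, and the section label strings stand in for whatever label set the classifier of Hirohata et al. actually produces.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class IndexedSentence:
    pmid: str
    text: str
    section: str  # label from a sentence classifier (label strings assumed)
    mesh_terms: FrozenSet[str] = frozenset()
    authors: FrozenSet[str] = frozenset()
    journal: str = ""


def filter_sentences(sentences: Iterable[IndexedSentence],
                     sections: Optional[Set[str]] = None,
                     mesh_term: Optional[str] = None,
                     author: Optional[str] = None,
                     journal: Optional[str] = None) -> List[IndexedSentence]:
    """Keep sentences that satisfy every filter supplied (None means no filter)."""
    kept = []
    for s in sentences:
        if sections is not None and s.section not in sections:
            continue
        if mesh_term is not None and mesh_term not in s.mesh_terms:
            continue
        if author is not None and author not in s.authors:
            continue
        if journal is not None and s.journal != journal:
            continue
        kept.append(s)
    return kept
```

Because each filter defaults to `None`, a plain query is unaffected, and a user can, for example, restrict a search to result and conclusion sentences from articles indexed with a given MeSH term. A production index would apply these constraints inside the search engine rather than by scanning a list.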

The initial release of Info-pubmed supported searching for automatically detected protein-protein interactions. We have extended this capability to include gene-disease associations [4], so that the system can also be used to study the epidemiological connections of biomolecules.
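
One way to serve both relation types from the same machinery is to index each automatically extracted pair under its relation type and rank partners by the number of supporting articles. This is a minimal sketch, not Info-pubmed's actual design: the tuple layout, the `"PPI"` and `"gene-disease"` labels, and the `build_index` and `partners` functions are all hypothetical, and the entity names in the usage note are placeholders.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

# (entity_a, entity_b, relation_type, pmid); the relation-type labels are assumptions.
Extraction = Tuple[str, str, str, str]
Index = Dict[Tuple[str, str], Dict[str, Set[str]]]


def build_index(extractions: Iterable[Extraction]) -> Index:
    """Map (entity, relation type) -> partner entity -> set of supporting PMIDs."""
    index: DefaultDict = defaultdict(lambda: defaultdict(set))
    for a, b, rel_type, pmid in extractions:
        # Index both directions so a disease can be looked up as easily as a gene.
        index[(a, rel_type)][b].add(pmid)
        index[(b, rel_type)][a].add(pmid)
    return index


def partners(index: Index, entity: str, rel_type: str) -> List[Tuple[str, int]]:
    """Partners of an entity, ranked by the number of distinct supporting articles."""
    hits = index.get((entity, rel_type), {})  # .get does not insert into a defaultdict
    return sorted(((p, len(pmids)) for p, pmids in hits.items()),
                  key=lambda item: (-item[1], item[0]))
```

For example, two extractions linking placeholder `geneA` to `diseaseX` from different articles and one linking it to `diseaseY` would make `partners(index, "geneA", "gene-disease")` return `[("diseaseX", 2), ("diseaseY", 1)]`. Storing PMID sets rather than counts lets repeated extractions from the same article be counted once and keeps the evidence available for display.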

Finally, we have extended the coverage of both systems to all of PubMed and added scheduled update modules that refresh the system database daily, fully automating data retrieval, analysis and indexing.
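
A daily update step of this kind can be sketched as applying one batch of changes to an index keyed by PMID. The sketch assumes that each batch lists new or revised records together with deleted PMIDs; `DailyBatch`, `apply_daily_update` and the `analyze` callback are hypothetical names, with `analyze` standing in for the full parsing and extraction pipeline. Scheduling the job (for example with cron) is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class DailyBatch:
    """One day's changes: new or revised records plus deleted PMIDs (format assumed)."""
    records: Dict[str, str] = field(default_factory=dict)  # PMID -> title/abstract text
    deleted: List[str] = field(default_factory=list)


def apply_daily_update(index: Dict[str, Any], batch: DailyBatch,
                       analyze: Callable[[str], Any]) -> Tuple[int, int, int]:
    """Apply one batch to an in-memory index; return (added, updated, deleted) counts."""
    deleted = 0
    for pmid in batch.deleted:
        if pmid in index:
            del index[pmid]
            deleted += 1
    added = updated = 0
    for pmid, text in batch.records.items():
        if pmid in index:
            updated += 1  # revised citation: re-analyze and overwrite the stale entry
        else:
            added += 1
        index[pmid] = analyze(text)  # stands in for parsing, tagging and extraction
    return added, updated, deleted
```

Only records that appear in the batch are re-analyzed, which is what keeps a daily update cheap compared with reprocessing all of PubMed; the returned counts are convenient for logging each scheduled run.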

Figure 1 shows an example Medie search result that illustrates several of the newly introduced functions.

Figure 1. Snapshot of updated Medie: “What disease does dystrophin cause?”

Conclusions

We have introduced extended and updated functionality for Medie and Info-pubmed, search systems that integrate state-of-the-art text mining technology. The updates allow advanced semantic searches of the latest published information across all of PubMed.