Prioritizing Literature Search Results Using a Training Set of Classified Documents
Finding relevant articles is becoming an increasingly demanding task for researchers in the biomedical field, due to the rapid growth of the scientific literature. We investigate the use of ranking strategies for prioritizing literature search results given an initial topic of interest. Focusing on the topic of protein-protein interactions (PPIs), we compare ranking strategies based on different classifiers and features. The best result obtained on the BioCreative III PPI test set was an area under the interpolated precision-recall curve of 0.629. We then analyze the use of this method for ranking the results of PubMed queries. The results indicate that this strategy can be used by database curators to prioritize articles for the extraction of protein-protein interactions, and also by researchers in general who are looking for publications describing protein-protein interactions within a particular area of interest.
Keywords: Information Retrieval, Biomedical Literature, Protein-protein Interactions, Article Classification
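The abstract combines two steps: ordering retrieved articles by a classifier's score, and evaluating that ordering with the area under the interpolated precision-recall curve. The Python sketch below illustrates both. It is an illustration under stated assumptions, not the authors' implementation. It assumes the classifier scores have already been computed, and it uses a common step-wise definition of the metric. Under this definition, the interpolated precision at recall r is the highest precision observed at any recall of at least r, and each relevant document adds a step of width 1/|relevant|. The official BioCreative III evaluation may differ in details such as how it handles tied scores. The function names and document IDs are hypothetical.

```python
from typing import Dict, List, Sequence


def rank_documents(scores: Dict[str, float]) -> List[str]:
    """Order document IDs by descending classifier score.

    Hypothetical helper: `scores` maps a document ID to a relevance score
    (e.g. an estimated probability of describing PPIs). Ties keep the
    dictionary's insertion order because Python's sort is stable.
    """
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def interpolated_pr_auc(ranked_labels: Sequence[bool]) -> float:
    """Area under the interpolated precision-recall curve of a ranked list.

    ranked_labels[i] is True if the document at rank i + 1 is relevant.
    Assumes the step-wise definition: recall increases by 1/|relevant| at
    each relevant document, and the curve height at that step is the
    interpolated precision (the maximum precision at any recall >= r).
    """
    total_relevant = sum(1 for label in ranked_labels if label)
    if total_relevant == 0:
        return 0.0

    # Precision at each rank where a relevant document is retrieved.
    precisions: List[float] = []
    hits = 0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)

    # Interpolate by sweeping from the end and keeping the running maximum,
    # so each point takes the best precision at the same or higher recall.
    interpolated = [0.0] * len(precisions)
    best = 0.0
    for i in range(len(precisions) - 1, -1, -1):
        best = max(best, precisions[i])
        interpolated[i] = best

    # Sum of step heights times the common step width 1/total_relevant.
    return sum(interpolated) / total_relevant


if __name__ == "__main__":
    # Hypothetical classifier scores and gold-standard relevance judgments.
    scores = {"doc1": 0.91, "doc2": 0.15, "doc3": 0.78, "doc4": 0.40}
    relevant = {"doc1", "doc4"}

    ranking = rank_documents(scores)
    print(ranking)  # → ['doc1', 'doc3', 'doc4', 'doc2']
    labels = [doc_id in relevant for doc_id in ranking]
    print(round(interpolated_pr_auc(labels), 4))  # → 0.8333
```

The interpolation matters when precision increases again later in the list. For labels `[True, False, False, True, True]`, the raw precisions at the relevant ranks are 1.0, 0.5 and 0.6. Interpolation raises the middle value to 0.6, which gives an area of 2.2/3 ≈ 0.733 instead of 0.7.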