Journal of Intelligent Information Systems, Volume 34, Issue 3, pp 249–274

Passage extraction and result combination for genomics information retrieval


DOI: 10.1007/s10844-009-0097-4

Cite this article as:
Hu, Q. & Huang, J.X. J Intell Inf Syst (2010) 34: 249. doi:10.1007/s10844-009-0097-4


In this paper, we first propose passage extraction algorithms that build indices from which more accurate passages can be returned as query answers. Second, we propose a basic result combination method and an improved one, both of which combine the results retrieved from different indices in order to select and merge relevant passages as output. For passage extraction, we propose three new algorithms: paragraphParsed, sentenceParsed and wordSentenceParsed. For result combination, we propose a novel method that uses factor analysis to generate a better baseline result for combination by finding hidden common factors that can be used to estimate the importance of keywords and keyword associations. Finally, we report experimental results that confirm the effectiveness and superiority of the factor analysis based method for result combination. Our proposed approaches achieve excellent results on the TREC 2006 and 2007 Genomics data sets, suggesting a promising avenue for constructing high performance information retrieval systems in biomedicine.


Information retrieval, Passage extraction, Result combination, Linear regression, Factor analysis, Genomics

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Computer Science & Engineering, York University, Toronto, Canada
  2. School of Information Technology, York University, Toronto, Canada