An Automatic Unsupervised Querying Algorithm for Efficient Information Extraction in Biomedical Domain

  • Min Song
  • Il-Yeol Song
  • Xiaohua Hu
  • Robert B. Allen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3518)

Abstract

In bioinformatics, extracting a relation such as protein-protein interactions from a large database of text documents is a challenging task. A major issue in biomedical information extraction is how to process the sheer volume of unstructured biomedical text efficiently. Often, only a small fraction of the documents in such a corpus contain information relevant to the extraction task. We propose a novel query expansion algorithm that automatically discovers the characteristics of documents that are useful for extracting a target relation. Our technique introduces a hybrid query re-weighting algorithm that combines a modified Robertson Sparck-Jones query ranking algorithm with a keyphrase extraction algorithm. It also adopts a novel query translation technique that incorporates part-of-speech (POS) categories into query translation. We conduct a series of experiments and report the results. They show that our technique retrieves more documents containing protein-protein pairs from MEDLINE as the number of iterations increases. We also compare our technique with SLIPPER, a supervised rule-based query expansion technique. Our technique outperforms SLIPPER by 17.90% to 29.98% over four iterations.
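As a rough illustration of the term-ranking component mentioned in the abstract, the sketch below computes the classic Robertson/Sparck Jones relevance weight with the usual 0.5 smoothing. It is not the authors' modified version, which the abstract does not specify, and it omits the keyphrase extraction and POS-based query translation steps. The function names, the `term_stats` layout, and the example terms and counts are assumptions made for illustration only.

```python
import math


def rsj_weight(r, R, n, N):
    """Classic Robertson/Sparck Jones relevance weight (0.5 smoothing).

    r: number of relevant documents that contain the term
    R: number of relevant documents
    n: number of documents in the collection that contain the term
    N: number of documents in the collection
    """
    odds_in_relevant = (r + 0.5) / (R - r + 0.5)
    odds_in_non_relevant = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_in_relevant / odds_in_non_relevant)


def rank_terms(term_stats, R, N, top_k=3):
    """Return the top_k candidate query terms by descending weight.

    term_stats maps each term to a (r, n) pair (hypothetical layout).
    """
    scored = {t: rsj_weight(r, R, n, N) for t, (r, n) in term_stats.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


# Made-up counts: 10 relevant documents in a 100-document collection.
stats = {"binds": (8, 20), "the": (10, 95), "weather": (0, 50)}
best = rank_terms(stats, R=10, N=100, top_k=2)
```

A term concentrated in the relevant documents ("binds" here) receives a higher weight than one that is spread across the whole collection, so it is a better candidate for the expanded query in the next iteration.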

References

  1. Agichtein, E., Gravano, L.: Querying Text Databases for Efficient Information Extraction. In: Proceedings of the 19th IEEE International Conference on Data Engineering (ICDE), pp. 113–124 (2003)
  2. Chang, K.C., Garcia-Molina, H., Paepcke, A.: Boolean Query Mapping Across Heterogeneous Information Sources. IEEE Transactions on Knowledge and Data Engineering 8(4), 515–521 (1996)
  3. Cohen, W.W., Singer, Y.: Simple, Fast, and Effective Rule Learner. In: Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, July 18-22, pp. 335–342 (1999)
  4. French, J.C., Brown, D.E., Kim, N.H.: A Classification Approach to Boolean Query Reformulation. Journal of the American Society for Information Science 48(8), 694–706 (1997)
  5. Salton, G., Buckley, C., Fox, E.A.: Automatic query formulations in information retrieval. Journal of the American Society for Information Science 34(4), 262–280 (1983)
  6. Song, M., Song, I.Y., Hu, T.: KPSpotter: A Flexible Information Gain-based Keyphrase Extraction System. In: Fifth International Workshop on Web Information and Data Management (WIDM 2003), pp. 50–53 (2003)
  7. Van Der Pol, R.: Dipe-D: a Tool For Knowledge-based Query Formulation. Information Retrieval 6, 21–47 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Min Song (1)
  • Il-Yeol Song (1)
  • Xiaohua Hu (1)
  • Robert B. Allen (1)
  1. College of Information Science and Technology, Drexel University, Philadelphia
