PubMiner: Machine Learning-Based Text Mining System for Biomedical Information Mining

  • Conference paper
Artificial Intelligence: Methodology, Systems, and Applications (AIMSA 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3192)

Abstract

PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature, is introduced. PubMiner uses natural language processing and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from massive literature data. The system recognizes biological terms such as genes, proteins, and enzymes and extracts the interactions described in a document through natural language analysis. The extracted interactions are then analyzed together with a set of features for each entity, constructed from related public databases, to infer further interactions from the original ones. Both the inferred interactions and the native interactions are provided to the user with links to their literature sources. System performance was evaluated with protein interaction data for S. cerevisiae (baker's yeast) from MIPS and SGD.
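To make the "recognize terms, then extract interactions" step concrete, the following is a minimal sketch only. It assumes a fixed dictionary of protein names and a fixed list of interaction verbs, whereas the abstract describes machine learning-based recognition and natural language analysis. The names `PROTEIN_LEXICON`, `INTERACTION_VERBS`, and `extract_interactions` are hypothetical and do not come from the paper.

```python
import re
from itertools import combinations

# Hypothetical toy lexicon and verb list, for illustration only.
# The system described in the abstract uses trained recognizers,
# not a fixed dictionary.
PROTEIN_LEXICON = {"Cdc28", "Cln2", "Sic1", "Clb5"}
INTERACTION_VERBS = {"binds", "phosphorylates", "inhibits", "activates"}


def extract_interactions(sentence):
    """Return (first_protein, verb, second_protein) triples from one sentence."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    # Positions of recognized protein names and interaction verbs.
    proteins = [(i, t) for i, t in enumerate(tokens) if t in PROTEIN_LEXICON]
    verbs = [(i, t) for i, t in enumerate(tokens) if t in INTERACTION_VERBS]
    triples = []
    for (i, a), (j, b) in combinations(proteins, 2):
        # Keep a pair only when an interaction verb sits between the two names.
        for k, verb in verbs:
            if i < k < j:
                triples.append((a, verb, b))
    return triples


if __name__ == "__main__":
    print(extract_interactions("Cdc28 phosphorylates Sic1 in late G1."))
```

A sentence that names two proteins without an interaction verb between them yields no triple, which is the main limitation of such a pattern: it ignores syntax, negation, and passive constructions that a real parser would handle.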

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Eom, JH., Zhang, BT. (2004). PubMiner: Machine Learning-Based Text Mining System for Biomedical Information Mining. In: Bussler, C., Fensel, D. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2004. Lecture Notes in Computer Science, vol 3192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30106-6_22

  • DOI: https://doi.org/10.1007/978-3-540-30106-6_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22959-9

  • Online ISBN: 978-3-540-30106-6

  • eBook Packages: Springer Book Archive