An Environment for Data Analysis in Biomedical Domain: Information Extraction for Decision Support Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6096)

Abstract

This paper addresses the problem of extracting and processing relevant information from unstructured electronic documents in the biomedical domain, specifically full-text scientific papers. The problem poses several challenges: identifying text passages that contain relevant information, collecting the relevant pieces of information, populating a database and a data warehouse, and mining the resulting data. To this end, the paper proposes IEDSS-Bio, an environment for Information Extraction and Decision Support System in the Biomedical domain. In a case study on Sickle Cell Anemia papers, machine learning experiments for identifying relevant text passages (disease and treatment effects, and information on the number of patients) showed that the best results (95.9% accuracy) were obtained with a statistical method combined with preprocessing techniques that resample the examples and eliminate noise.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Matos, P.F., Lombardi, L.O., Pardo, T.A.S., Ciferri, C.D.A., Vieira, M.T.P., Ciferri, R.R. (2010). An Environment for Data Analysis in Biomedical Domain: Information Extraction for Decision Support Systems. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_31

  • DOI: https://doi.org/10.1007/978-3-642-13022-9_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13021-2

  • Online ISBN: 978-3-642-13022-9

  • eBook Packages: Computer Science, Computer Science (R0)
