Combined Search in Structured and Unstructured Medical Data

High-Performance In-Memory Genome Data Analysis

Part of the book series: In-Memory Data Management Research (IMDM)

Abstract

Today, structured medical data is often considered apart from its unstructured counterpart. When searching for a specific piece of information, either structured sources (e.g., genomic variant lists and electronic medical records) or unstructured sources (e.g., medical papers, research documentation, and trial descriptions) are examined. However, structured data, such as a patient's genomic data, can be valuable when searching unstructured data, such as clinical trial proposals, for information relevant to that patient. Consequently, today's separation of the two source types impedes insight into the relationships between them. In this contribution, I propose using in-memory databases to combine search results from structured and unstructured medical data, and I introduce a research prototype for a clinical trial search tool. The prototype suggests matching clinical trials based on a patient's genome and benefits from the analytical performance of the in-memory database. Furthermore, I investigate how an increasing amount of medical input data affects the performance of the prototype.
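
The idea of combining both source types can be illustrated with a minimal sketch. It is not the chapter's prototype, which runs inside an in-memory database; it is plain Python under stated assumptions: the structured input is reduced to a set of gene symbols affected in the patient, the unstructured input is a mapping from trial identifiers to free-text descriptions, and a trial's relevance is simply the number of patient genes its text mentions. The function name `rank_trials`, the trial identifiers, and the sample texts are all hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def rank_trials(patient_genes: Set[str], trials: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank trials by how many of the patient's affected genes their text mentions.

    Assumption: gene symbols appear in the text as whole alphanumeric tokens.
    """
    wanted = {gene.upper() for gene in patient_genes}
    ranked = []
    for trial_id, description in trials.items():
        # Tokenize the free text into upper-case alphanumeric words.
        tokens = set(re.findall(r"[A-Za-z0-9]+", description.upper()))
        hits = len(wanted & tokens)
        if hits:
            ranked.append((trial_id, hits))
    # Most matching genes first; ties broken by trial identifier.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical structured data: genes with variants in the patient's genome.
patient_genes = {"BRAF", "KRAS"}

# Hypothetical unstructured data: free-text trial descriptions.
trials = {
    "TRIAL-A": "Study of an inhibitor in patients with BRAF-mutated melanoma.",
    "TRIAL-B": "Combination therapy for tumours with KRAS or BRAF mutations.",
    "TRIAL-C": "Exercise intervention after cardiac surgery.",
}

print(rank_trials(patient_genes, trials))
```

Plain token matching misses gene synonyms and ambiguous symbols, so a realistic system would need entity recognition over the trial text rather than exact token comparison.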

Author information

Correspondence to David Heller.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Heller, D. (2014). Combined Search in Structured and Unstructured Medical Data. In: Plattner, H., Schapranow, MP. (eds) High-Performance In-Memory Genome Data Analysis. In-Memory Data Management Research. Springer, Cham. https://doi.org/10.1007/978-3-319-03035-7_8
