Data Management, Databases, and Warehousing

Biomedical Informatics for Cancer Research

Abstract

This chapter discusses data management, databases, and data warehousing, with particular reference to their use in cancer research. The section on data management describes the special requirements of data collected for research purposes. It covers the policies, ethics, and protocols involved in data collection, standardization, confidentiality, data entry and preparation, storage, quality assurance, and security. We focus on the issues of data uniformity and consistency that facilitate multi-institutional data sharing, data transfer, and collaboration. The section on databases elaborates on the architecture and components of database systems. It also discusses the various types of database systems, with emphasis on the commonly employed relational model, and on database functions and properties. The section on data warehousing discusses the concept of a data warehouse along with warehouse architecture, technology, tools, and applications. A further section details existing data resource systems, focusing on those currently employed at the University of Pittsburgh to facilitate translational cancer research. The chapter also briefly discusses issues and approaches related to both databases and warehouses, emphasizing their individual strengths and attributes.

Acknowledgement

This work was supported by the following grants: Cooperative Prostate Cancer Tissue Resource – NCI U01 CA86735; Pennsylvania Cancer Alliance Bioinformatics Consortium – PA DOH – ME 01-740; Cancer Center Support Grant – NCI – P30 CA47904; Shared Pathology Informatics Network – NCI – U01 CA091338; caBIG – NCI Contracts – 94125DBS47 and 28XS210; NMVB – CDC NIOSH – 5 – U19 OH009077-02 and U24 OH009077-03; Clinical and Translational Science Award – NCRR – (UL1 RR024153-03).

Author information

Corresponding author

Correspondence to Michael J. Becich.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Amin, W., Kang, H.P., Becich, M.J. (2010). Data Management, Databases, and Warehousing. In: Ochs, M., Casagrande, J., Davuluri, R. (eds) Biomedical Informatics for Cancer Research. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5714-6_3

  • DOI: https://doi.org/10.1007/978-1-4419-5714-6_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5712-2

  • Online ISBN: 978-1-4419-5714-6

  • eBook Packages: Medicine, Medicine (R0)
