Abstract
This chapter discusses data management, databases, and data warehousing, with particular reference to their use in cancer research. The section on data management describes the special requirements that research places on data. It covers the policies, ethics, and protocols involved in data collection, standardization, confidentiality, data entry and preparation, storage, quality assurance, and security, and it focuses on the issues of data uniformity and consistency that facilitate multi-institutional data sharing, data transfer, and collaboration. The section on databases describes the architecture and components of database systems. It also discusses the various types of database systems, with emphasis on the more commonly employed relational model, database functions, and properties. The section on data warehousing discusses the concept of the data warehouse along with warehouse architecture, technology, tools, and applications. A further section details existing data resource systems, focusing on those currently employed at the University of Pittsburgh to facilitate translational cancer research. A brief discussion of issues and approaches related to both databases and warehouses emphasizes their individual strengths and attributes.
Acknowledgement
This work was supported by the following grants: Cooperative Prostate Cancer Tissue Resource – NCI U01 CA86735; Pennsylvania Cancer Alliance Bioinformatics Consortium – PA DOH – ME 01-740; Cancer Center Support Grant – NCI – P30 CA47904; Shared Pathology Informatics Network – NCI – U01 CA091338; caBIG – NCI Contracts – 94125DBS47 and 28XS210; NMVB – CDC NIOSH – 5 – U19 OH009077-02 and U24 OH009077-03; Clinical and Translational Science Award – NCRR – (UL1 RR024153-03).
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Amin, W., Kang, H.P., Becich, M.J. (2010). Data Management, Databases, and Warehousing. In: Ochs, M., Casagrande, J., Davuluri, R. (eds) Biomedical Informatics for Cancer Research. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5714-6_3
DOI: https://doi.org/10.1007/978-1-4419-5714-6_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5712-2
Online ISBN: 978-1-4419-5714-6
eBook Packages: Medicine (R0)