Data Management, Databases, and Warehousing

  • Waqas Amin
  • Hyunseok Peter Kang
  • Michael J. Becich


This chapter discusses data management, databases, and data warehousing, with particular reference to their use in cancer research. The section on data management describes the special requirements of research data. It covers the policies, ethics, and protocols involved in data collection, standardization, confidentiality, data entry and preparation, storage, quality assurance, and security, and it focuses on the issues of data uniformity and consistency that facilitate multi-institutional data sharing, data transfer, and collaboration. The section on databases describes the architecture and components of database systems. It also discusses the various types of database systems, with emphasis on the more commonly employed relational model, as well as database functions and properties. The section on data warehousing discusses the concept of a data warehouse, along with warehouse architecture, technology, tools, and applications. A further section describes existing data resource systems, focusing on those currently employed at the University of Pittsburgh to facilitate translational cancer research. The chapter also includes a brief discussion of issues and approaches related to both databases and warehouses, emphasizing their individual strengths and attributes.


  • Application Program Interface
  • Data Warehouse
  • Structure Query Language
  • Query Tool
  • Honest Broker

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was supported by the following grants: Cooperative Prostate Cancer Tissue Resource – NCI U01 CA86735; Pennsylvania Cancer Alliance Bioinformatics Consortium – PA DOH – ME 01-740; Cancer Center Support Grant – NCI – P30 CA47904; Shared Pathology Informatics Network – NCI – U01 CA091338; caBIG – NCI Contracts – 94125DBS47 and 28XS210; NMVB – CDC NIOSH – 5 – U19 OH009077-02 and U24 OH009077-03; Clinical and Translational Science Award – NCRR – (UL1 RR024153-03).



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Waqas Amin
  • Hyunseok Peter Kang
  • Michael J. Becich
    • 1
  1. Department of Biomedical Informatics, UPMC Cancer Pavilion, Pittsburgh, USA
