Abstract
This chapter presents an integrative visual data mining approach to biomedical data. The approach and its supporting methodology are presented at a high level. They consistently combine a set of visualisation and data mining techniques that operate over an integrated data set with several diverse components, including medical (clinical) data, patient outcome and interview data, corresponding gene expression and SNP data, domain ontologies and health management data. The practical application of the methodology and of the specific data mining techniques is demonstrated in two case studies focused on the biological mechanisms of two different diseases: Chronic Fatigue Syndrome and Acute Lymphoblastic Leukaemia. What the two cases have in common is the structure of their data sets.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kennedy, P., Simoff, S.J., Catchpoole, D.R., Skillicorn, D.B., Ubaudi, F., Al-Oqaily, A. (2008). Integrative Visual Data Mining of Biomedical Data: Investigating Cases in Chronic Fatigue Syndrome and Acute Lymphoblastic Leukaemia. In: Simoff, S.J., Böhlen, M.H., Mazeika, A. (eds) Visual Data Mining. Lecture Notes in Computer Science, vol 4404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71080-6_21
DOI: https://doi.org/10.1007/978-3-540-71080-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71079-0
Online ISBN: 978-3-540-71080-6
eBook Packages: Computer Science, Computer Science (R0)