Grid-Based Knowledge Discovery in Clinico-Genomic Data

  • Michael May
  • George Potamias
  • Stefan Rüping
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4345)


Knowledge discovery in clinico-genomic data is a task that requires integrating not only highly heterogeneous kinds of data but also the requirements and interests of very different user groups. Grid computing technologies promise to be an effective tool for combining all these requirements in a single architecture. In this paper, we describe scenarios and future research directions related to grid-based knowledge discovery in clinico-genomic data, and introduce the approach taken by the recently launched ACGT project. The whole endeavor is considered in the context of biomedical informatics research and aims at the realization of an integrated, grid-enabled biomedical infrastructure. The presented integrated clinico-genomics knowledge discovery (ICGKD) scenario and its process realization are based on a multi-strategy data-mining approach that seamlessly integrates three distinct data-mining components: clustering, association rules mining, and feature selection. Preliminary experimental results are indicative of the rationale and reliability of the approach.
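The abstract names the three data-mining components but not how they are chained. The following is a minimal, hypothetical sketch in plain Python (standard library only) of one way they could fit together on a toy expression matrix: k-means clustering of samples, rules of the form `cluster = c -> outcome = y` scored by support and confidence, and a simple mean-difference filter that selects genes separating the clusters. The function names (`kmeans`, `cluster_rules`, `select_features`, `multi_strategy_sketch`), the toy data, the first-k centroid initialisation, the 0.8 confidence threshold, and the filter score are all assumptions for illustration; this is not the ICGKD implementation described in the paper.

```python
"""Hypothetical three-component pipeline sketch (assumption, not the ICGKD code)."""
from collections import Counter
from statistics import mean


def kmeans(points, k, iters=20):
    """Plain k-means on numeric vectors; deterministic init with the first k points."""
    centroids = [list(p) for p in points[:k]]

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance to p.
        return min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))

    assign = [nearest(p) for p in points]
    for _ in range(iters):
        # Move each centroid to the mean of its members (an empty cluster keeps its centroid).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
        new_assign = [nearest(p) for p in points]
        if new_assign == assign:  # converged: no sample changed cluster
            break
        assign = new_assign
    return assign


def cluster_rules(assign, outcomes, min_conf=0.8):
    """Rules 'cluster = c -> outcome = y' as (c, y, support, confidence), confidence >= min_conf."""
    n = len(assign)
    cluster_counts = Counter(assign)
    rules = []
    for (c, y), count in Counter(zip(assign, outcomes)).items():
        confidence = count / cluster_counts[c]
        if confidence >= min_conf:
            rules.append((c, y, count / n, confidence))
    return rules


def select_features(samples, labels, top):
    """Rank genes by |difference of group means| between exactly two groups; keep the top ones."""
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this toy filter expects exactly two groups")

    def score(gene):
        a = [s[gene] for s, y in zip(samples, labels) if y == groups[0]]
        b = [s[gene] for s, y in zip(samples, labels) if y == groups[1]]
        return abs(mean(a) - mean(b))

    return sorted(samples[0], key=score, reverse=True)[:top]


def multi_strategy_sketch(samples, outcomes, k=2, top=1):
    """Cluster samples, link clusters to a clinical outcome, then pick cluster-separating genes."""
    genes = list(samples[0])
    points = [[s[g] for g in genes] for s in samples]
    assign = kmeans(points, k)                       # 1. clustering
    rules = cluster_rules(assign, outcomes)          # 2. association rules
    markers = select_features(samples, assign, top)  # 3. feature selection
    return assign, rules, markers


if __name__ == "__main__":
    # Toy expression values for three hypothetical genes and a hypothetical binary outcome.
    samples = [
        {"g1": 5.0, "g2": 1.0, "g3": 2.0},
        {"g1": 1.0, "g2": 1.1, "g3": 2.1},
        {"g1": 5.2, "g2": 0.9, "g3": 1.9},
        {"g1": 0.8, "g2": 1.0, "g3": 2.0},
        {"g1": 4.9, "g2": 1.2, "g3": 2.2},
        {"g1": 1.1, "g2": 0.8, "g3": 1.8},
    ]
    outcomes = ["yes", "no", "yes", "no", "yes", "no"]
    print(multi_strategy_sketch(samples, outcomes))
```

Running feature selection on the cluster labels rather than on the clinical outcome is a deliberate choice in this sketch: it avoids selecting genes with the very label the rules are later evaluated against.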







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Michael May (1)
  • George Potamias (2)
  • Stefan Rüping (1)
  1. Fraunhofer AIS, Schloss Birlinghoven, St. Augustin, Germany
  2. Institute of Computer Science, FORTH, Heraklion, Crete, Greece
