Rough Sets in Bioinformatics

  • Torgeir R. Hvidsten
  • Jan Komorowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4400)


Rough set-based rule induction yields easily interpretable descriptions of complex biological systems. Here, we review a number of applications of rough sets to problems in bioinformatics, including cancer classification, gene and protein function prediction, gene regulation, protein-drug interaction and drug resistance.





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Torgeir R. Hvidsten (1)
  • Jan Komorowski (1)
  1. The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
