Rough Sets in Bioinformatics

  • Torgeir R. Hvidsten
  • Jan Komorowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4400)


Rough set-based rule induction allows easily interpretable descriptions of complex biological systems. Here, we review a number of applications of rough sets to problems in bioinformatics, including cancer classification, gene and protein function prediction, gene regulation, protein-drug interaction and drug resistance.
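
To make "rough set-based rule induction" concrete, the following is a minimal sketch on a toy decision table. It is not taken from the chapter. The table, the attribute names (`geneA`, `geneB`, `cls`) and the helper functions are hypothetical and chosen only for illustration. The sketch groups objects that are indiscernible on the condition attributes, computes the lower and upper approximation of one decision class, and reads off the certain IF-THEN rules.

```python
from collections import defaultdict

# Hypothetical toy decision table (an assumption, not data from the chapter):
# discretized expression of two genes and a sample class as decision.
TABLE = [
    {"geneA": "high", "geneB": "low",  "cls": "tumour"},
    {"geneA": "high", "geneB": "low",  "cls": "tumour"},
    {"geneA": "low",  "geneB": "low",  "cls": "normal"},
    {"geneA": "low",  "geneB": "high", "cls": "tumour"},
    {"geneA": "low",  "geneB": "high", "cls": "normal"},
]


def indiscernibility_classes(table, attrs):
    """Group row indices that share the same values on attrs."""
    groups = defaultdict(list)
    for i, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].append(i)
    return dict(groups)


def approximations(table, attrs, decision, value):
    """Return (lower, upper) approximation of the set {rows with decision == value}."""
    target = {i for i, row in enumerate(table) if row[decision] == value}
    lower, upper = set(), set()
    for members in indiscernibility_classes(table, attrs).values():
        block = set(members)
        if block <= target:   # block lies entirely inside the class
            lower |= block
        if block & target:    # block overlaps the class
            upper |= block
    return lower, upper


def certain_rules(table, attrs, decision):
    """One IF-THEN rule per indiscernibility class with a single outcome."""
    rules = []
    for key, members in indiscernibility_classes(table, attrs).items():
        outcomes = {table[i][decision] for i in members}
        if len(outcomes) == 1:
            rules.append((dict(zip(attrs, key)), outcomes.pop()))
    return rules


lower, upper = approximations(TABLE, ["geneA", "geneB"], "cls", "tumour")
rules = certain_rules(TABLE, ["geneA", "geneB"], "cls")
for condition, outcome in rules:
    clause = " AND ".join(f"{a} = {v}" for a, v in condition.items())
    print(f"IF {clause} THEN cls = {outcome}")
```

In this toy table, rows 3 and 4 agree on both genes but differ in class, so they fall in the boundary region (upper minus lower approximation) and yield no certain rule. Practical rule induction would also involve discretization and attribute reduction, which this sketch omits.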


Keywords (added by machine, not by the authors): Gene Ontology; Local Descriptor; Functional Site; Rule Model; Local Substructure





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Torgeir R. Hvidsten (1)
  • Jan Komorowski (1)
  1. The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
