Rough Sets in Ortholog Gene Detection

Selection of Feature Subsets and Case Reduction Considering Imbalance
  • Deborah Galpert Cañizares
  • Reinier Millo Sánchez
  • María Matilde García Lorenzo
  • Gladys Casas Cardoso
  • Ricardo Grau Abalo
  • Leticia Arco García
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8537)


Ortholog detection deserves improvement because ortholog genes are valuable for predicting protein function. Datasets for this binary classification problem can be represented as information systems. We use an extended similarity relation between gene pairs, based on an extension of Rough Set Theory, with aggregated gene similarity measures as features. With it, we select feature subsets guided by quality measures that take class imbalance into account. The proposed procedure can be useful for datasets with few features and discrete parameters. The case reduction obtained by approximating the ortholog and non-ortholog concepts may be an effective way to cope with extremely high class imbalance in supervised classification.
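The abstract packs several rough-set steps into a few sentences. The following is a minimal sketch under explicit assumptions. It is not the authors' procedure. It shows how a similarity relation yields lower and upper approximations of the ortholog and non-ortholog concepts. It also shows how an imbalance-aware quality measure can drive feature subset selection, and how the approximations can drive case reduction. Everything specific here is hypothetical: the similarity relation (mean per-feature similarity compared against a threshold `eps`), the quality measure (per-class accuracy of approximation |lower| / |upper|, averaged over classes), the exhaustive subset search, the reduction rule, the function names, and the toy data.

```python
from itertools import combinations

# Toy information system (hypothetical data): each row is a gene pair described
# by two aggregated similarity measures in [0, 1]; label 1 = ortholog, 0 = not.
X = [(0.90, 0.80), (0.86, 0.76), (0.20, 0.10),
     (0.24, 0.14), (0.28, 0.13), (0.82, 0.25)]
y = [1, 1, 0, 0, 0, 0]


def similar(a, b, features, eps):
    """Assumed similarity relation: the mean of 1 - |a_f - b_f| over the chosen
    features must reach the threshold eps (reflexive and symmetric)."""
    s = sum(1.0 - abs(a[f] - b[f]) for f in features) / len(features)
    return s >= eps


def similarity_classes(X, features, eps):
    """R(x): indices of all cases similar to case x on the given features."""
    return [{j for j in range(len(X)) if similar(X[i], X[j], features, eps)}
            for i in range(len(X))]


def approximations(classes, members):
    """Lower: cases whose similarity class lies inside the concept.
    Upper: cases whose similarity class intersects the concept."""
    lower = {i for i, r in enumerate(classes) if r <= members}
    upper = {i for i, r in enumerate(classes) if r & members}
    return lower, upper


def mean_class_accuracy(X, y, features, eps):
    """Average of |lower| / |upper| over the decision classes, so the small
    ortholog class weighs as much as the large non-ortholog class."""
    classes = similarity_classes(X, features, eps)
    scores = []
    for label in sorted(set(y)):
        members = {i for i, lab in enumerate(y) if lab == label}
        lower, upper = approximations(classes, members)
        # Reflexivity puts every member in upper, so upper is never empty.
        scores.append(len(lower) / len(upper))
    return sum(scores) / len(scores)


def select_features(X, y, n_features, eps):
    """Exhaustive search over all non-empty feature subsets; feasible only
    when there are few features. Ties keep the smaller, earlier subset."""
    best, best_score = None, -1.0
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            score = mean_class_accuracy(X, y, subset, eps)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


def reduce_cases(X, y, features, eps, majority=0):
    """Hypothetical case reduction: keep every minority case, but keep a
    majority case only if it lies in the majority lower approximation."""
    classes = similarity_classes(X, features, eps)
    majority_members = {i for i, lab in enumerate(y) if lab == majority}
    lower, _ = approximations(classes, majority_members)
    keep = [i for i in range(len(y)) if y[i] != majority or i in lower]
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    print(mean_class_accuracy(X, y, (0,), 0.9))  # 0.25
    print(select_features(X, y, 2, 0.9))         # ((1,), 1.0)
    print(reduce_cases(X, y, (0,), 0.9)[1])      # [1, 1, 0, 0, 0]
```

In the toy data, feature 0 makes gene pair 5, a non-ortholog, look like the two orthologs. The ortholog lower approximation on that feature is therefore empty. The feature scores 0.25 and is rejected in favour of feature 1. The classical quality of classification (the fraction of all cases lying in some lower approximation) would give 0.5 for the same subset, because it is dominated by the majority class. Averaging per class is one simple way to keep the minority class visible. The reduction step drops exactly the overlapping majority case, pair 5.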


Keywords: Ortholog Detection; Rough Sets; Classification





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Computer Science Department, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba (all authors)
