Rough Sets in Ortholog Gene Detection

Selection of Feature Subsets and Case Reduction Considering Imbalance

  • Conference paper
Rough Sets and Intelligent Systems Paradigms

Abstract

Ortholog detection deserves improvement because ortholog genes are valuable for predicting protein function. The datasets of this binary classification problem (ortholog versus non-ortholog gene pairs) can be represented as information systems. We use an extended similarity relation between gene pairs, based on an extension of Rough Set Theory, with aggregated gene similarity measures as features, and we select feature subsets with the aid of quality measures that take class imbalance into account. The proposed procedure can be useful for datasets with few features and discrete parameters. The case reduction obtained from the approximations of the ortholog and non-ortholog concepts might be an effective way to cope with extremely high imbalance in supervised classification.
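The rough-set idea in the abstract can be illustrated with a small sketch. The code below is not the authors' procedure: the similarity relation (a per-feature tolerance `tol`), the toy gene-pair data, and the reduction rule in `reduce_cases` are all assumptions made for illustration. It computes the lower and upper approximations of a concept under a symmetric, reflexive similarity relation, then keeps every ortholog pair plus only those non-ortholog pairs that fall outside the lower approximation of the non-ortholog concept.

```python
from typing import List, Sequence, Set, Tuple

Features = Tuple[int, ...]  # discretized similarity scores for one gene pair


def similar(a: Features, b: Features, tol: int = 1) -> bool:
    """Hypothetical similarity relation: every feature differs by at most tol."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def similarity_class(i: int, data: Sequence[Features]) -> Set[int]:
    """Indices of all objects similar to object i (includes i itself)."""
    return {j for j, other in enumerate(data) if similar(data[i], other)}


def approximations(
    data: Sequence[Features], labels: Sequence[int], concept: int
) -> Tuple[Set[int], Set[int]]:
    """Lower and upper approximation of the set of objects labeled `concept`."""
    members = {i for i, y in enumerate(labels) if y == concept}
    lower: Set[int] = set()
    upper: Set[int] = set()
    for i in range(len(data)):
        cls = similarity_class(i, data)
        if cls <= members:   # everything similar to i belongs to the concept
            lower.add(i)
        if cls & members:    # something similar to i belongs to the concept
            upper.add(i)
    return lower, upper


def reduce_cases(
    data: Sequence[Features],
    labels: Sequence[int],
    minority: int = 1,
    majority: int = 0,
) -> List[int]:
    """Assumed reduction rule: keep all minority cases and only those
    majority cases that are not certainly majority (outside its lower
    approximation)."""
    lower_major, _ = approximations(data, labels, majority)
    return [
        i for i, y in enumerate(labels)
        if y == minority or i not in lower_major
    ]


# Toy gene pairs: two discretized similarity features on a 0-10 scale.
pairs: List[Features] = [(9, 8), (8, 9), (5, 5), (1, 2), (2, 1), (1, 1), (5, 6)]
labels: List[int] = [1, 1, 0, 0, 0, 0, 1]  # 1 = ortholog, 0 = non-ortholog

lower_orth, upper_orth = approximations(pairs, labels, concept=1)
kept = reduce_cases(pairs, labels)

print("lower(ortholog):", sorted(lower_orth))
print("upper(ortholog):", sorted(upper_orth))
print("kept cases:", kept)
```

In this toy data the three clearly dissimilar non-ortholog pairs are discarded, while the ambiguous non-ortholog pair that resembles an ortholog pair is retained, which shrinks the majority class without touching the boundary region.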

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Galpert Cañizares, D., Millo Sánchez, R., García Lorenzo, M.M., Casas Cardoso, G., Grau Abalo, R., García, L.A. (2014). Rough Sets in Ortholog Gene Detection. In: Kryszkiewicz, M., Cornelis, C., Ciucci, D., Medina-Moreno, J., Motoda, H., Raś, Z.W. (eds) Rough Sets and Intelligent Systems Paradigms. Lecture Notes in Computer Science, vol 8537. Springer, Cham. https://doi.org/10.1007/978-3-319-08729-0_15

  • DOI: https://doi.org/10.1007/978-3-319-08729-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08728-3

  • Online ISBN: 978-3-319-08729-0

  • eBook Packages: Computer Science, Computer Science (R0)
