Rough Sets in Ortholog Gene Detection

Galpert Cañizares, Deborah; Millo Sánchez, Reinier; García Lorenzo, María Matilde; Casas Cardoso, Gladys; Grau Abalo, Ricardo; Arco García, Leticia

doi:10.1007/978-3-319-08729-0_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8537)

Abstract

Ortholog detection merits improvement because ortholog genes are valuable for predicting protein function. When detection is posed as a binary classification problem over gene pairs (ortholog versus non-ortholog), the datasets can be represented as information systems. We use an extended similarity relation on gene pairs, based on an extension of Rough Set Theory, with aggregated gene similarity measures as gene features, to select feature subsets with the aid of quality measures that take class imbalance into account. The proposed procedure can be useful for datasets with few features and discrete parameters. The case reduction obtained from the approximation of the ortholog and non-ortholog concepts might be an effective way to cope with extremely high imbalance in supervised classification.
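
The idea of approximating the ortholog and non-ortholog concepts under a similarity relation can be illustrated with a minimal sketch. It is not the authors' procedure: the toy feature values, the `pair_similarity` function, the `THRESHOLD` value and the `balanced_quality` measure are all assumptions made here for illustration. Two gene pairs are treated as similar when the mean absolute difference of their feature values is small; the lower approximation of a concept collects the cases whose similarity class lies entirely inside the concept, and the upper approximation is the union of the similarity classes of its members.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical toy data: each row is one gene pair described by two
# aggregated similarity measures in [0, 1]; labels: 1 = ortholog, 0 = not.
features: List[Tuple[float, float]] = [
    (0.90, 0.80), (0.85, 0.90), (0.50, 0.50),                 # orthologs
    (0.45, 0.50), (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),   # non-orthologs
]
labels: List[int] = [1, 1, 1, 0, 0, 0, 0]
THRESHOLD = 0.85  # assumed similarity threshold, not taken from the chapter


def pair_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity: 1 minus the mean absolute feature difference."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity_classes(rows: Sequence[Sequence[float]], t: float) -> List[Set[int]]:
    """For each case, the set of cases at least t-similar to it (reflexive, symmetric)."""
    n = len(rows)
    return [{j for j in range(n) if pair_similarity(rows[i], rows[j]) >= t}
            for i in range(n)]


def approximations(classes: List[Set[int]], concept: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Lower and upper approximation of a concept under the similarity relation."""
    lower = {i for i in concept if classes[i] <= concept}
    upper = set().union(*(classes[i] for i in concept))
    return lower, upper


classes = similarity_classes(features, THRESHOLD)
orthologs = {i for i, y in enumerate(labels) if y == 1}
non_orthologs = {i for i, y in enumerate(labels) if y == 0}

lower_orth, upper_orth = approximations(classes, orthologs)
lower_non, upper_non = approximations(classes, non_orthologs)

# Boundary region: cases that cannot be assigned with certainty to either concept.
boundary = upper_orth - lower_orth

# Assumed imbalance-aware quality: the mean of the per-class approximation
# accuracies, so the minority class weighs as much as the majority class.
balanced_quality = (len(lower_orth) / len(upper_orth)
                    + len(lower_non) / len(upper_non)) / 2

print("ortholog lower/upper:", sorted(lower_orth), sorted(upper_orth))
print("non-ortholog lower/upper:", sorted(lower_non), sorted(upper_non))
print("boundary:", sorted(boundary), "balanced quality:", balanced_quality)
```

Averaging the per-class accuracies, rather than pooling all cases, is one simple way to keep a very large non-ortholog class from dominating the score; comparing such a score across candidate feature subsets is one plausible reading of feature selection guided by imbalance-aware quality measures.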

Author information

Authors and Affiliations

Computer Science Department, Universidad Central “Marta Abreu” de Las Villas, Carretera a Camajuaní km 5½, Santa Clara, Cuba
Deborah Galpert Cañizares, Reinier Millo Sánchez, María Matilde García Lorenzo, Gladys Casas Cardoso, Ricardo Grau Abalo & Leticia Arco García

Editor information

Editors and Affiliations

Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Marzena Kryszkiewicz & Zbigniew W. Raś
Department of Computer Science and Artificial Intelligence, University of Granada, Calle del Periodista Daniel Saucedo Aranda s/n, 18071, Granada, Spain
Chris Cornelis
DISCo, Università di Milano – Bicocca, Viale Sarca 336 – U14, 20126, Milano, Italy
Davide Ciucci
Dpt. de Matemáticas, University of Cádiz, Spain
Jesús Medina-Moreno
School of Computing and Information Systems, University of Tasmania, Japan
Hiroshi Motoda

About this paper

Cite this paper

Galpert Cañizares, D., Millo Sánchez, R., García Lorenzo, M.M., Casas Cardoso, G., Grau Abalo, R., García, L.A. (2014). Rough Sets in Ortholog Gene Detection. In: Kryszkiewicz, M., Cornelis, C., Ciucci, D., Medina-Moreno, J., Motoda, H., Raś, Z.W. (eds) Rough Sets and Intelligent Systems Paradigms. Lecture Notes in Computer Science, vol 8537. Springer, Cham. https://doi.org/10.1007/978-3-319-08729-0_15

DOI: https://doi.org/10.1007/978-3-319-08729-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08728-3
Online ISBN: 978-3-319-08729-0
eBook Packages: Computer Science (R0)
