Abstract
Ortholog detection merits further improvement because orthologous genes are valuable for predicting protein function. The datasets of this binary classification problem can be represented as information systems. We use an extended similarity relation over gene pairs, based on an extension of Rough Set Theory, with aggregated gene similarity measures as features, and select feature subsets with the aid of quality measures that take class imbalance into account. The proposed procedure can be useful for datasets with few features and discrete parameters. The case reduction obtained from the approximation of the ortholog and non-ortholog concepts might be an effective way to cope with extremely high imbalance in supervised classification.
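To make the idea of approximating the ortholog and non-ortholog concepts concrete, the following is a minimal sketch in Python, not the procedure of the paper. The toy gene pairs, their two discretized similarity features, the tolerance-based similarity relation, and the reduction policy at the end are all assumptions chosen for illustration.

```python
# Toy information system (hypothetical data): each gene pair has a tuple of
# discretized aggregated similarity features and a label
# (1 = ortholog, 0 = non-ortholog).
PAIRS = {
    "p1": ((2, 2), 1),
    "p2": ((2, 1), 1),
    "p3": ((2, 1), 0),
    "p4": ((0, 0), 0),
    "p5": ((0, 1), 0),
    "p6": ((1, 0), 0),
}


def similar(x, y, tol=0):
    """Assumed similarity relation: every feature differs by at most tol."""
    return all(abs(a - b) <= tol for a, b in zip(x, y))


def similarity_class(name, data, tol=0):
    """All gene pairs similar to the given one (the relation is symmetric)."""
    fx = data[name][0]
    return {other for other, (fy, _) in data.items() if similar(fx, fy, tol)}


def approximations(data, label, tol=0):
    """Lower and upper approximation of the concept with the given label."""
    concept = {n for n, (_, y) in data.items() if y == label}
    lower, upper = set(), set()
    for n in data:
        cls = similarity_class(n, data, tol)
        if cls <= concept:   # every similar pair belongs to the concept
            lower.add(n)
        if cls & concept:    # at least one similar pair belongs to it
            upper.add(n)
    return lower, upper


orth_lower, orth_upper = approximations(PAIRS, label=1)
non_lower, non_upper = approximations(PAIRS, label=0)

# Assumed reduction policy (illustrative only): keep every ortholog pair and
# only those non-ortholog pairs that fall in the boundary region, dropping
# the non-ortholog pairs that are certainly negative.
boundary = orth_upper - orth_lower
orthologs = {n for n, (_, y) in PAIRS.items() if y == 1}
reduced = orthologs | boundary

print(sorted(orth_lower), sorted(orth_upper))
print(sorted(non_lower), sorted(non_upper))
print(sorted(reduced))
```

In this toy example the non-ortholog class shrinks from four cases to one, which illustrates how discarding cases outside the boundary region could reduce imbalance. Whether such a policy is appropriate depends on the data and on the similarity relation actually used.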
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Galpert Cañizares, D., Millo Sánchez, R., García Lorenzo, M.M., Casas Cardoso, G., Grau Abalo, R., García, L.A. (2014). Rough Sets in Ortholog Gene Detection. In: Kryszkiewicz, M., Cornelis, C., Ciucci, D., Medina-Moreno, J., Motoda, H., Raś, Z.W. (eds) Rough Sets and Intelligent Systems Paradigms. Lecture Notes in Computer Science, vol 8537. Springer, Cham. https://doi.org/10.1007/978-3-319-08729-0_15
DOI: https://doi.org/10.1007/978-3-319-08729-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08728-3
Online ISBN: 978-3-319-08729-0
eBook Packages: Computer Science (R0)