Abstract
The problem of genome annotation (i.e., establishing the biological roles of proteins and of the corresponding genes) is one of the major tasks of postgenomic bioinformatics. This paper further develops a previously proposed formalism for studying the local solvability of the genome annotation problem. We introduce the concepts of elementary motifs, positional independence of motifs, heuristic evaluation of informativeness, and solvability on sets of elementary motifs. We show that introducing a linear order on a set of elementary motifs makes it possible to calculate irreducible motif sets. The formalism was applied in experiments to compute the sets of the most informative motifs for several protein functions.
Author information
Ivan Yur’evich Torshin. Born in 1972. Graduated from the Chemistry Department of Moscow State University (MSU) in 1995 and received his Candidate’s degree there in 1997. Senior researcher at the Russian Branch of the Institute of Trace Elements for UNESCO, lecturer at the Moscow Institute of Physics and Technology (MIPT) and MSU, and a member of the Center of Forecasting and Recognition. Author of 84 papers in refereed journals in biology, chemistry, medicine, and computer science, including 3 monographs in the series “Bioinformatics in the Post-Genomic Era” (Nova Biomedical Publishers, NY, 2006–2009).
About this article
Cite this article
Torshin, I.Y. The study of the solvability of the genome annotation problem on sets of elementary motifs. Pattern Recognit. Image Anal. 21, 652–662 (2011). https://doi.org/10.1134/S1054661811040171
DOI: https://doi.org/10.1134/S1054661811040171