On the Procedures of Generation of Numerical Features over Partitions of Sets of Objects in the Problem of Predicting Numerical Target Variables

Abstract

An analysis of the criteria for the solvability/regularity of problems and for the correctness of algorithms is applied here to the problem of predicting the values of numerical variables. It is shown that partial regularity is a necessary and sufficient condition for the solvability of the corresponding system of classification problems. Cross-validation experiments on several datasets from biomedicine (non-invasive diagnostics of magnesium concentration in blood plasma), bioinformatics (prediction of protein secondary structure), and solid-state physics (prediction of the properties of high-temperature superconductors) demonstrated the effectiveness of the developed methods for generating “synthetic” informative numerical features and for increasing the accuracy of prediction of numerical target variables.

ACKNOWLEDGMENTS

We are grateful to Prof. O.A. Gromova for useful discussions on the expert analysis of biomedical data.

Funding

This work was supported by the Russian Foundation for Basic Research, project nos. 19-07-00356, 18-07-01022, 17-07-01419, 16-07-01129, and 18-07-00944.

Author information

Correspondence to I. Yu. Torshin or K. V. Rudakov.

Ethics declarations

We declare that we have no conflicts of interest related to the preparation and publication of this article.

Additional information

Ivan Yur’evich Torshin. Born 1972. Graduated from the Department of Chemistry, Moscow State University, in 1995. Received candidate’s degrees in chemistry in 1997 and in physics and mathematics in 2011. He is currently a senior researcher at the Dorodnicyn Computing Centre, an associate professor at the Moscow Institute of Physics and Technology, a lecturer at the Faculty of Computational Mathematics and Cybernetics, Moscow State University, a leading scientist at the Russian Branch of the Trace Elements Institute for UNESCO, and a member of the Center of Forecasting and Recognition. Author of 450 publications in peer-reviewed journals in biology, chemistry, medicine, and informatics and of 9 monographs: 5 in Russian and 4 in English (the series “Bioinformatics in Post-genomic Era”, Nova Biomedical Publishers, NY, 2006–2009).

Konstantin Vladimirovich Rudakov. Born 1954. Russian mathematician, corresponding member of the Russian Academy of Sciences, Head of the Department of Computational Methods of Forecasting at the Dorodnicyn Computing Centre, Informatics and Control Federal Research Center, Russian Academy of Sciences, and Head of the Intelligent Systems Chair at the Moscow Institute of Physics and Technology.

Translated by I. Nikitin

Cite this article

Torshin, I.Y., Rudakov, K.V. On the Procedures of Generation of Numerical Features over Partitions of Sets of Objects in the Problem of Predicting Numerical Target Variables. Pattern Recognit. Image Anal. 29, 654–667 (2019). https://doi.org/10.1134/S1054661819040175

Keywords:

  • algebraic approach
  • regularity of problems
  • topologies and lattices
  • subquadratic algorithms
  • big data