The authors propose an approach to constructing classifiers within the class of Random Forest algorithms. A genetic algorithm determines the optimal combination and composition of the feature ensembles used to build the forest trees. The principles of the group method of data handling are used to optimize the structure of the trees, and the analytic hierarchy process is used to optimize the tree voting procedure in the forest. The proposed algorithm is applied to the detection of pathologies on medical images, and its classification results are compared with those of other known analogs.
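As a rough illustration of how the analytic hierarchy process can turn pairwise comparisons of trees into voting weights, the following minimal sketch may help. It is not the authors' implementation: the function names `ahp_weights` and `weighted_vote`, the use of validation scores to build the comparison matrix, and the scores themselves are all assumptions made for this example.

```python
from collections import defaultdict


def ahp_weights(pairwise, iterations=100, tol=1e-12):
    """Approximate the AHP priority vector (principal eigenvector of a
    positive pairwise comparison matrix) by power iteration.
    The returned weights are normalized to sum to 1."""
    n = len(pairwise)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    return w


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among the tree votes."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical validation scores of three trees (illustrative values only).
tree_scores = [0.9, 0.5, 0.3]

# Assumption: tree i is preferred over tree j in proportion to their scores,
# which gives a reciprocal, perfectly consistent comparison matrix.
pairwise = [[a / b for b in tree_scores] for a in tree_scores]

weights = ahp_weights(pairwise)
tree_votes = ["pathology", "norm", "norm"]
decision = weighted_vote(tree_votes, weights)
print(weights, decision)
```

In this toy case a simple majority vote would pick "norm", whereas the weighted vote follows the single tree with the highest priority. For a consistent matrix like this one, the priority vector is just the normalized scores; the power iteration matters when the comparisons are not perfectly consistent.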
Additional information
Translated from Kibernetyka ta Systemnyi Analiz, No. 2, March–April, 2023, pp. 190–202.
About this article
Cite this article
Babenko, V., Nastenko, I., Pavlov, V. et al. Classification of Pathologies on Medical Images Using the Algorithm of Random Forest of Optimal-Complexity Trees. Cybern Syst Anal 59, 346–358 (2023). https://doi.org/10.1007/s10559-023-00569-z
DOI: https://doi.org/10.1007/s10559-023-00569-z