Abstract
Classifying subjects into risk categories is a common challenge in medical research. Machine Learning (ML) methods are widely used for risk prediction and classification. The primary objective of these algorithms is to predict dichotomous responses (e.g. healthy/at risk) based on several features. Like statistical inference models, ML models are subject to the common problem of class imbalance. As a result, they tend to be biased towards the majority class, which increases the false negative rate.
In this paper, we built and evaluated eighteen ML models that classify approximately 4300 female participants from the UK Biobank into three risk categories, defined by discretising visceral adipose tissue values measured by magnetic resonance imaging. We also examined the effect of sampling techniques on classification modelling when dealing with class imbalance.
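To make the idea of resampling an imbalanced training set concrete, here is a minimal sketch. The abstract does not say which sampling techniques were evaluated, so random oversampling is assumed purely for illustration; the function name `random_oversample`, the class labels `low`/`medium`/`high`, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class is as large as the largest one (random oversampling)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is topped up to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy training set: one feature, three imbalanced risk classes.
train_rows = [[v] for v in range(10)]
train_labels = ["low"] * 6 + ["medium"] * 3 + ["high"] * 1

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
class_counts = Counter(balanced_labels)
print(class_counts)
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keep the original class distribution.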
Results showed that the use of sampling techniques had a significant impact. They not only improved the prediction of patients' risk status, but also increased the information contained within each variable. Based on domain experts' criteria, the three best classification models were then identified.
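One common way to quantify the information a variable carries about the class is information gain, the reduction in class entropy after splitting on the variable. The abstract does not state which measure was used, so the sketch below is an assumption for illustration only; the function names and the toy values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Class entropy minus the weighted entropy within each feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: two balanced classes and two discretised features.
labels = ["at_risk", "at_risk", "healthy", "healthy"]
informative = ["high", "high", "low", "low"]    # separates the classes
uninformative = ["high", "low", "high", "low"]  # tells us nothing

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
print(gain_informative, gain_uninformative)
```

Comparing such a score for each variable before and after resampling is one way to check whether a variable has become more informative about the class.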
These encouraging results will guide further developments of classification models for predicting visceral adipose tissue without the need for a costly scan.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Aldraimli, M. et al. (2020). Machine Learning Classification of Females Susceptibility to Visceral Fat Associated Diseases. In: Henriques, J., Neves, N., de Carvalho, P. (eds) XV Mediterranean Conference on Medical and Biological Engineering and Computing – MEDICON 2019. MEDICON 2019. IFMBE Proceedings, vol 76. Springer, Cham. https://doi.org/10.1007/978-3-030-31635-8_81
DOI: https://doi.org/10.1007/978-3-030-31635-8_81
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31634-1
Online ISBN: 978-3-030-31635-8
eBook Packages: Engineering (R0)