Abstract
Cataract is a very common eye disease and the leading cause of blindness. Given its burden on society, this study tests risk factors for cataract and builds robust machine learning models that use these factors to predict cataract risk. The data were collected by a physical examination center in Shanghai, China, and cover more than 120,000 examinees and about 500 physical examination metrics. First, association rules were used to select 39 abnormalities that are more likely to carry a risk of cataract, and the significance of these abnormalities was tested with univariate and multivariate analyses. The results indicate that age, diabetes, refractive error, retinal arteriosclerosis, thyroid nodules, and incomplete mammary gland degeneration significantly increase the likelihood of cataract. Various machine learning models were then compared on their ability to predict cataract risk from these six factors; logistic regression and decision-tree-based ensemble methods outperform the others, with a test-set AUC of up to 0.84.
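To make the association-rule screening step concrete, the following is a minimal sketch of how support, confidence, and lift could be computed for a single rule of the form "abnormality implies cataract". It is not the authors' implementation: the function name `rule_metrics`, the field names, and the ten toy records are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def rule_metrics(records: List[Dict[str, bool]],
                 antecedent: str,
                 consequent: str) -> Dict[str, float]:
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(1 for r in records if r[antecedent])
    n_c = sum(1 for r in records if r[consequent])
    n_both = sum(1 for r in records if r[antecedent] and r[consequent])

    support = n_both / n                          # P(A and C)
    confidence = n_both / n_a if n_a else 0.0     # P(C | A)
    base_rate = n_c / n                           # P(C)
    lift = confidence / base_rate if base_rate else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical toy records (NOT data from the study).
records = (
    [{"diabetes": True, "cataract": True}] * 3
    + [{"diabetes": True, "cataract": False}] * 1
    + [{"diabetes": False, "cataract": True}] * 1
    + [{"diabetes": False, "cataract": False}] * 5
)

metrics = rule_metrics(records, "diabetes", "cataract")
print(metrics)
```

A lift above 1 means the abnormality co-occurs with cataract more often than the overall cataract rate would suggest, which is one plausible criterion for shortlisting abnormalities before formal significance testing.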
Acknowledgments
This work has been supported by the National Key R&D Program of China under Grant No. 2020AAA0103800.
Author information
Jianqiao Hao is a master's student at the School of Economics and Management, Tsinghua University, China.
Yongbo Xiao is a professor (with tenure) at the School of Economics and Management, Tsinghua University, China. He received his Ph.D. and M.A. in Management Science and Engineering in 2006, and his B.E. in Management Information Systems in 2000, all from Tsinghua University. His research interests include revenue and pricing management, service management, supply chain management, and healthcare management. His research papers have been published in international journals including Operations Research, Production and Operations Management, Decision Sciences, Naval Research Logistics, and IIE Transactions.
Shudi Du is a Ph.D. student at the School of Economics and Management, Tsinghua University, China. She received her master's degree in Management and Systems from New York University in 2018. Her research interests include public healthcare and smart cities.
Cite this article
Hao, J., Xiao, Y. & Du, S. Physical Examination Data Based Cataract Risk Analysis. J. Syst. Sci. Syst. Eng. 30, 198–214 (2021). https://doi.org/10.1007/s11518-021-5477-5
DOI: https://doi.org/10.1007/s11518-021-5477-5