Introduction

Intraocular pressure (IOP) is a measure of the fluid pressure within the eye and an important marker for many ophthalmological diseases, including glaucoma, one of the world’s leading causes of irreversible blindness1. IOP results from the balance between the rate of aqueous humor production at the ciliary body and the rate of aqueous outflow from the eye through the conventional and uveoscleral pathways. The magnitude of IOP is primarily determined by local factors, such as the resistance of the trabecular meshwork and juxtacanalicular connective tissues2,3,4. However, in the conventional pathway, aqueous humor drains into Schlemm’s canal and ultimately the episcleral veins2, and thus IOP is also affected by exogenous (systemic) factors, as suggested by a recent study5. Indeed, we recently investigated the associations of various systemic factors with IOP using data from a health examination program database and found that several of them, including age, percent body fat, systolic blood pressure (SBP), pulse rate, albumin, and hemoglobin A1c (HbA1c), were significantly associated with IOP level3. The first purpose of the current study was to investigate how much of the variation in IOP can be explained by systemic factors.

It would be beneficial to predict IOP accurately using only systemic factors, without tonometry, in various settings such as medical check-ups; however, IOP is presumably determined not only by systemic factors but also by local (ocular) conditions. Fundus photography is one of the most representative and basic ophthalmological examinations, and there have been remarkable recent developments in artificial intelligence (AI), in particular deep learning (DL), applied to fundus photography. For instance, Poplin et al. showed that the sex of an individual can be identified from a color fundus photograph using DL with 97% accuracy6. We have also reported that an accurate diagnosis of glaucoma can be achieved using a number of fundus photographs (3132 images) similar to that of the current study7,8,9, in line with other recent studies10,11,12,13,14,15,16. These results imply that useful ophthalmological information can be extracted from a color fundus photograph using DL. The second purpose of the current study was therefore to investigate the accuracy of predicting IOP from fundus photographs using DL.

Methods

Subjects

The Institutional Review Board of the Shimane University Faculty of Medicine approved this study (IRB No. 20190131-1), which was conducted according to the tenets of the Declaration of Helsinki. Each participant provided informed consent. The cohort database included 6272 examinations from 2577 subjects who participated in a health examination system at the Shimane Institute of Health Science17,18 from August 3, 1998, to March 28, 2019. We chose 6519 examinations from 5645 eyes of 2835 subjects from the database who had complete measurements of: age, sex, height, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), history of diabetes mellitus (DM), history of hypertension (HT), history of hyperlipidemia, past and current smoking habit, 25 blood examinations (total protein (TP), albumin/globulin ratio (A/G), aspartate aminotransferase (AST), alanine aminotransferase (ALT), γ-glutamyl transpeptidase (γGTP), alkaline phosphatase (ALP), total cholesterol (TC), triglyceride (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), hemoglobin A1c (HbA1c), white blood cell (WBC) count, red blood cell (RBC) count, hemoglobin (Hb), hematocrit (Ht), platelet (Plt) count, fibrinogen, blood urea nitrogen (BUN), creatinine (Cre), sodium (Na), potassium (K), chloride (Cl), calcium (Ca), uric acid (UA), and amylase), IOP, and a color fundus photograph. BMI was calculated as body weight (kg) divided by the square of body height (m). Experienced laboratory technicians measured IOP using a non-contact tonometer (Full Auto Tonometer TX-F, Canon Incorporated, Tokyo, Japan). Color fundus photographs were obtained using a non-mydriatic fundus camera with a 45° view angle (CR6-45NM, Canon, Tokyo, Japan before December 2012, and CR-2, Canon from January 2013).

Training and validation datasets

All of the measurements obtained by December 31, 2016 were assigned to the training dataset, which consisted of 3883 examinations from 3883 eyes of 1945 subjects. A validation dataset was also prepared for the purpose of DL parameter tuning, using data obtained between January 1, 2017 and December 31, 2017 (454 examinations from 454 eyes from 229 subjects).

Testing dataset

The testing dataset was prepared using data obtained between January 1, 2017 and December 31, 2017 (289 examinations from 289 eyes from 146 subjects). There was no overlap among the three datasets.

DL model to predict IOP from fundus photography

We adopted a type of convolutional neural network (CNN) known as ResNet6 to predict IOP from fundus photographs, following our previous studies in which a diagnosis of glaucoma was predicted from fundus photographs7,8,19. Unlike a plain CNN, ResNet uses identity skip connections that bypass one or more layers so that features are propagated to succeeding layers; this allows a deeper and larger network to be trained without overfitting, which helps the model acquire more effective and abstract features and is well known to be useful for image classification and feature extraction. In the current study, a ResNet model with 18 layers was pre-trained on the ImageNet classification dataset20. This methodology is inspired by recent successes in fine-tuning deep neural networks21, whereby the parameters of a network are first learned on a different but large pre-training dataset and then used to initialize training on a new, smaller training dataset. We attempted to further improve the model by applying image augmentation to the training data22: all of the images in the training dataset were horizontally flipped. The last fully-connected layer in ResNet was used to output the predicted value of IOP. Left eyes were mirror-imaged to right eyes. The parameters used for ResNet were: learning rate 0.01, batch size 100, momentum 0.9, and weight decay 0.0001.
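
For concreteness, a minimal sketch of this fine-tuning setup is given below, assuming a PyTorch implementation; the framework, input resolution, normalization, and data-loading code are not specified in the text and are assumptions here.

```python
# A minimal sketch of the fine-tuning setup described above, assuming a
# PyTorch implementation. The framework, input resolution, normalization,
# and data loading are assumptions and not taken from the original study.
import torch
import torch.nn as nn
from torchvision import models, transforms

# ResNet-18 pre-trained on ImageNet; the last fully-connected layer is
# replaced by a single output unit so that the network regresses IOP (mmHg).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 1)

# Augmentation of the training data: horizontal flipping (the study flipped
# every training image; random flipping is used here as a close equivalent).
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),            # assumed input size
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# Hyper-parameters reported in the text: learning rate 0.01, momentum 0.9,
# weight decay 0.0001, batch size 100; mean squared error loss for regression.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

def train_one_epoch(loader):
    """One pass over a DataLoader yielding (images, iop) batches."""
    model.train()
    for images, iop in loader:                # iop: float tensor, shape (batch, 1)
        optimizer.zero_grad()
        loss = criterion(model(images), iop)
        loss.backward()
        optimizer.step()
```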

Models to predict IOP from systemic variables

First, using the training dataset, a multivariate linear regression model (MLM) was built to predict IOP from 35 variables (age, sex, height, BMI, SBP, DBP, history of DM, history of HT, history of hyperlipidemia, past and current smoking habit, and the 25 blood examinations). Using this model, IOP values in the testing dataset were predicted, and the absolute prediction error was calculated. A number of other prediction models were also constructed using the following machine learning methods: (1) support vector machine (SVM)23, (2) Random Forest (RF)24, and (3) least absolute shrinkage and selection operator regression (LASSO)25,26. The support vector machine performs regression in a latent (kernel) space, which yields accurate predictions even when the relationship is non-linear. A Random Forest consists of many decision trees (here, regression trees) and outputs the value averaged over all individual trees; each tree is constructed on a different bootstrap sample of the original data (bootstrapping is sampling with replacement until the original sample size is reached). In LASSO, the sum of the absolute values of the regression coefficients is constrained (penalized), so that the final model gives an accurate prediction. The details of each method were as follows.

  1. Support vector machine: radial basis function kernel, penalty parameter = 1.0.

  2. Random Forest: number of trees = 10,000, criterion = Gini index, minimum number of samples required to split an internal node = 2, minimum number of samples required to be at a leaf node = 1.

  3. LASSO: the optimum lambda value was determined by minimizing the prediction error with leave-one-out cross-validation within the training dataset.

Subsequently, using these models, IOP values in the testing dataset were predicted, and absolute prediction errors were calculated.
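
A minimal sketch of these four systemic-variable models is shown below, assuming a scikit-learn implementation (the software used is not stated in the text). The placeholder data and variable names are illustrative only, and the regression default squared-error split criterion is used for the Random Forest, since the Gini index applies to classification trees.

```python
# A minimal sketch of the four systemic-variable models, assuming a
# scikit-learn implementation (the software used is not stated in the text).
# The placeholder arrays stand in for the 35 systemic variables and the
# measured IOP (mmHg) of the training and testing datasets.
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 35)), rng.normal(13, 3, size=200)  # placeholders
X_test, y_test = rng.normal(size=(50, 35)), rng.normal(13, 3, size=50)      # placeholders

models = {
    "MLM": LinearRegression(),               # multivariate linear regression
    "SVM": SVR(kernel="rbf", C=1.0),         # radial basis function, penalty parameter 1.0
    "RF": RandomForestRegressor(             # 10,000 regression trees on bootstrap samples;
        n_estimators=10_000,                 # default squared-error split criterion used,
        min_samples_split=2,                 # as the Gini index applies to classification
        min_samples_leaf=1,
    ),
    "LASSO": LassoCV(cv=LeaveOneOut()),      # lambda chosen by leave-one-out CV to
}                                            # minimize prediction error

for name, model in models.items():
    model.fit(X_train, y_train)
    abs_error = np.abs(model.predict(X_test) - y_test)   # absolute prediction error
    print(f"{name}: mean absolute error = {abs_error.mean():.2f} mmHg")
```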

Statistical analysis

Absolute prediction errors were compared using a linear mixed model in which values were nested within patients. The linear mixed model accounts for the hierarchical structure of the data by grouping measurements within subjects, thereby reducing the possible bias derived from the nested structure of the data27,28.
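
A minimal sketch of this comparison is shown below, assuming a random-intercept model fitted with statsmodels; the software, column names, and placeholder values are assumptions, not taken from the study.

```python
# A minimal sketch of the error comparison, assuming a random-intercept linear
# mixed model fitted with statsmodels (the software used is not stated in the
# text); column names and placeholder values are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
methods = ["MLM", "LASSO", "SVM", "RF", "DL"]
n_subjects = 50

# Long-format table: one row per prediction, with the absolute error (mmHg),
# the prediction method, and the subject the eye belongs to.
errors = pd.DataFrame({
    "abs_error": rng.gamma(shape=2.0, scale=1.2, size=n_subjects * len(methods)),
    "method": np.tile(methods, n_subjects),
    "subject_id": np.repeat(np.arange(n_subjects), len(methods)),
})

# Fixed effect: prediction method; random intercept: subject, so that repeated
# measurements nested within the same subject are accounted for.
mixed = smf.mixedlm("abs_error ~ method", data=errors, groups=errors["subject_id"])
print(mixed.fit().summary())
```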

Furthermore, the association between the predicted and actual IOP values in the testing dataset was quantified using the correlation coefficient. Again, considering the nested structure of the current dataset, the association was also quantified using the marginal R-squared (mR2) value from the linear mixed model, following the method proposed by Nakagawa and Schielzeth29.
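
For a random-intercept model such as the one used here, this statistic is the proportion of the total variance that is attributable to the fixed effects; in Nakagawa and Schielzeth's formulation29,

$$
\mathrm{mR}^2 = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
$$

where $\sigma^2_f$ is the variance of the fixed-effect predictions, $\sigma^2_\alpha$ the between-subject (random-intercept) variance, and $\sigma^2_\varepsilon$ the residual variance.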

Results

The characteristics of the 1569 study subjects (819 men, 52%; 750 women, 48%; mean age, 62.2 ± 8.7 years; range 27–92 years) are summarized in Table 1. The mean IOP was 12.8 ± 3.0 mmHg (range 7–33.1 mmHg) in the right eye and 12.8 ± 3.0 mmHg (range 7–33.8 mmHg) in the left eye.

Table 1 Subjects' demographic data.

The results of univariate analyses between various systemic parameters and IOP are summarized in Table 2. Among the 35 parameters, 28 showed a significant association with IOP when not adjusted for age and sex (p < 0.05). When adjusted for age and sex, 23 (of 33) parameters showed a significant association with IOP.

Table 2 Results of univariate analyses between IOP and various systemic parameters.

The absolute prediction error with each method is shown in Table 3.

Table 3 The absolute prediction error with each method.

Table 4 shows the results of the MLM obtained with the training dataset. Among the 35 parameters, 11 showed a significant association with IOP (p < 0.05), including height, BMI, age, sex, smoking habit, TP, HbA1c, and SBP.

Table 4 Result of MLM obtained with the training dataset.

The mean squared error of the DL model on the validation dataset saturated within 100 epochs, as shown in Fig. 1. The predicted IOP values were derived at epoch = 100. The relationship between the predicted IOP values from each prediction method and the actual IOP values is shown in Fig. 2a–e as Bland–Altman plots. The correlation coefficient and mR2 values are shown in Table 5. Significant correlations were observed between the actual IOP and the IOP values predicted with MLM, LASSO, SVM, and RF (p < 0.05), but not with the DL model using color fundus photographs (p = 0.16 or 0.17). There was a significant association between the difference between predicted and actual IOP and the mean of predicted and actual IOP with all models (p < 0.001).

Figure 1

The mean squared error of the DL model on the validation dataset at each epoch. The mean squared error saturated within 100 epochs.

Figure 2

The relationship between the predicted IOP values from each prediction method and the actual IOP values, shown as Bland–Altman plots. (a) MLM, (b) LASSO, (c) SVM, (d) RF, (e) DL. Data are shown as smoothed scatter plots. MLM multivariate linear regression, LASSO least absolute shrinkage and selection operator regression, SVM support vector machine, RF random forest, DL deep learning.

Table 5 The correlation coefficient and mR2 values for each prediction model.

The absolute error associated with MLM is illustrated in Fig. 3.

Figure 3

Histogram of the absolute prediction error associated with the MLM. MLM multivariate linear regression.

Discussion

In the current study, IOP was predicted using a variety of modelling methods and different types of data. A considerably more accurate prediction of IOP was achieved using an MLM of systemic variables (mean absolute error = 2.29 mmHg and mR2 = 0.15) than with a DL model applied to color fundus photography (mean absolute error = 2.70 mmHg and mR2 = 0.0066). Machine learning methods (LASSO, SVM, and RF) did not improve prediction accuracy.

The MLM included 11 variables that were significantly correlated with IOP. We recently reported that several systemic factors were associated with IOP level, including age, percent body fat, SBP, pulse rate, albumin, and HbA1c30. In the current study, age, SBP, and HbA1c were again significantly associated with IOP. The effect of age on IOP is controversial. Previous cross-sectional studies from Italy31 and the United States32,33 suggested a significant positive association between age and IOP; however, the inverse effect has also been reported in cross-sectional and longitudinal studies from other countries, mainly in Asia, including Japan34,35,36,37,38,39. The current study, conducted in Japan, also suggested a negative association between age and IOP. The significant positive correlation between SBP and IOP agrees with previous studies33,35,37,38,39,40,41,42,43,44,45,46; the speculated mechanism is that an increased filtration fraction of the aqueous humor due to elevated ciliary artery pressure, increased serum corticoids, and heightened sympathetic tone result in elevated IOP47,48. The association between HbA1c and IOP is also in agreement with previous studies33,34,35,37,39,42,43,44,46,47,49,50. Several mechanisms have been proposed for the association between obesity and increased IOP, such as sympathetic hyperactivation, increased corticosteroids, excessive intraorbital adipose tissue, increased blood viscosity with high hemoglobin and hematocrit values, increased episcleral venous pressure and a consequent decrease in the facility of aqueous outflow, and also transitory elevations in IOP resulting from breath-holding and thorax compression when tonometry is performed during slit-lamp examinations in obese patients47,51,52,53,54. Our previous study suggested that percent body fat was associated with increased IOP, whereas this was the case for BMI in the current study. Smoking status was significantly associated with elevated IOP, in agreement with a previous study55.

It is widely acknowledged that ordinary statistical models, such as linear or binomial logistic regression, may be over-fitted to the original sample, especially when the number of predictor variables is large. We have previously reported on the usefulness of machine learning methods, compared to ordinary linear or logistic regression, for many applications, including diagnosing glaucoma from optical coherence tomography measurements56,57,58,59, predicting vision-related quality of life60, and predicting visual field (VF) progression61,62,63. Nonetheless, in the current study, there was no improvement in prediction accuracy with the machine learning methods compared to the MLM. This may be because the training dataset was quite large (5540 examinations) and therefore overfitting was less of a problem. Despite the significant association between predicted IOP and true IOP, only a moderate mR2 value was obtained (up to 0.15). The coefficient of determination represents how much of the variance in the data is explained by the model, and the correlation coefficient is the square root of the coefficient of determination. The mR2 value shows how much of the variance can be explained by the fixed effects in the linear mixed model. Hence, the current results suggest that approximately 15% of the variance in IOP was explained by the MLM and the other machine learning models. In other words, IOP can be only partially explained by systemic factors, and the remaining part may only be described locally (using measurements from the eye). As shown in the Bland–Altman plots (Fig. 2), the distribution of the differences between the predicted and actual IOP values was not horizontal, and correlated with the mean of these values. This is because the prediction accuracy was relatively poor and the predicted values were relatively constant regardless of the actual IOP value. Furthermore, although it has been suggested that the Random Forest method is more useful than other machine learning methods64,65,66, this merit was not observed in the current study. These findings also support the view that IOP can be only partially explained by systemic factors, and that the predictability cannot be considerably improved merely by applying machine learning methods such as Random Forest.

A recent study revealed that DL could discriminate sex from fundus photography with very high accuracy6. In contrast, we recently suggested that the discrimination of sex can be achieved, at least to some extent (AUC = 77.9%), using a ‘visible’ machine learning method (LASSO) with clinically meaningful variables such as color intensities, tessellation, and geometrical information of the optic disc and retinal vessels. This implies that the DL model learned a principle for discriminating sex from color fundus photographs. On the other hand, the current study suggested that DL could not accurately predict IOP from fundus photographs, since only a poor association (mR2 = 0.0066) was observed between the IOP predicted by this approach and the actual IOP. We also attempted other DL architectures instead of ResNet-18 (VGG1667 and Inception-v368); however, the results were not improved (data not shown). This may suggest that little information regarding IOP is present in color fundus photographs. The current study included a fairly large training dataset, but it was much smaller than other representative datasets used for DL, such as ImageNet (14,000,000 images)20 and CIFAR-10 (60,000 images, https://www.cs.toronto.edu/~kriz/cifar.html), although we have recently suggested that the diagnosis of glaucoma from color fundus photographs with DL can be achieved with an even smaller sample size (N = 3132)7,8,9. Better results might be observed if DL were applied to a larger dataset. The current results suggest that IOP can only be partially explained using systemic factors (15%, as suggested by the mR2 value) or color fundus photography with DL (0.66%), which implies that we need to continue to measure IOP using tonometry. Nevertheless, the merit of accurately predicting systemic factors from a color fundus photograph, as shown in69, cannot be overstated, for example for medical check-ups in developing countries without access to tonometry. This is particularly true for smartphone-based fundus photography, since recent studies have suggested the usefulness of deep learning-assisted programs for screening retinal diseases using a smartphone70,71.

The current study had several limitations, the first of which was the use of non-contact tonometry, which is generally believed to be less reliable than Goldmann applanation tonometry (the repeatability coefficient of non-contact tonometry has been reported as ± 3.2 mmHg, whereas that of Goldmann applanation tonometry is between ± 2.2 and 2.5 mmHg)72,73, although IOP is usually measured using non-contact tonometry in health examinations outside eye clinics. Furthermore, central corneal thickness, which is known to induce measurement errors during tonometry74,75, was not measured. In addition, the usefulness of applying DL to color fundus photography in glaucomatous eyes should be investigated in a future study. The current study consisted of a health examination cohort, and hence the vast majority of cases had normal IOP values; a further study is needed to investigate whether the current approach is more useful in eyes with higher IOP values. In particular, it should be further investigated whether DL enables more accurate prediction of IOP using a larger dataset.

In conclusion, the current study, using a health examination cohort, suggested that IOP cannot be adequately predicted from clinical parameters or retinal photographs, even using state-of-the-art machine learning techniques. Further investigation with DL using a larger amount of data is needed.