Abstract
Previous studies have shown controversial results about the role of androgens in coronary artery disease (CAD). We performed this study to examine and compare the relationship between androgenic hormones and CAD using conventional linear statistical techniques as well as novel non-linear approaches. The study was conducted on 502 consecutive men who were referred for selective coronary angiography at Tehran Heart Center for different indications. We studied the relationship between androgenic hormones and CAD using generalized linear models, generalized additive models, and neural networks. Free testosterone (fT), total testosterone (tT), and dehydroepiandrosterone sulfate levels in patients with significant CAD versus normal individuals were 6.69 ± 3.20 pg/ml, 16.60 ± 6.66 nmol/l, and 113.38 ± 72.9 μg/dl versus 7.12 ± 3.58 pg/ml, 15.82 ± 7.26 nmol/l, and 109.03 ± 68.19 μg/dl, respectively (P > 0.05). Generalized linear models were unable to show any significant relationship between androgenic hormones and CAD, while generalized additive models and neural networks supported a significant effect of androgenic hormones on CAD. This finding suggests a non-linear association of tT levels with CAD: lower levels have a preventive effect on CAD, whereas higher values increase the risk of CAD. Emphasizing the non-linearity of the variables may provide new insight into possible explanations of the effect of androgenic hormones on CAD.
Abbreviations
- AIC: Akaike information criterion
- ANOVA: Analysis of variance
- BMI: Body mass index
- BIC: Bayesian information criterion
- CAD: Coronary artery disease
- CHOL: Cholesterol
- CRP: C-reactive protein
- DHEAS: Dehydroepiandrosterone sulfate
- DM: Diabetes mellitus
- EF: Ejection fraction
- ELISA: Enzyme-linked immunosorbent assay
- fT: Free testosterone
- GLM: Generalized linear models
- HDL: High density lipoprotein
- HTN: Hypertension
- LDL: Low density lipoprotein
- Lp(a): Lipoprotein(a)
- MLP: Multi-layer perceptron
- MSE: Mean square error
- ROC: Receiver operating characteristic
- SCA: Selective coronary angiography
- SD: Standard deviation
- SLP: Single layer perceptron
- TC: Total cholesterol
- TGs: Triglycerides
- tT: Total testosterone
- edf: Equivalent degrees of freedom
References
Liu PY, Death AK, Handelsman DJ. Androgens and cardiovascular disease. Endocr Rev. 2003;24:313–40. doi:10.1210/er.2003-0005.
Callies F, Stromer H, Schwinger RH, et al. Administration of testosterone is associated with a reduced susceptibility to myocardial ischemia. Endocrinology. 2003;144:4478–83. doi:10.1210/en.2003-0058.
Channer KS, Jones TH. Cardiovascular effects of testosterone: implications of the “male menopause”? Heart. 2003;89:121–2. doi:10.1136/heart.89.2.121.
Dobs AS, Bachorik PS, Arver S, et al. Interrelationships among lipoprotein levels, sex hormones, anthropometric parameters, and age in hypogonadal men treated for 1 year with a permeation-enhanced testosterone transdermal system. J Clin Endocrinol Metab. 2001;86:1026–33. doi:10.1210/jc.86.3.1026.
Malkin CJ, Pugh PJ, Jones TH, Channer KS. Testosterone for secondary prevention in men with ischaemic heart disease? QJM. 2003;96:521–9. doi:10.1093/qjmed/hcg086.
Manson JE, Bassuk SS, Harman SM, et al. Postmenopausal hormone therapy: new questions and the case for new clinical trials. Menopause. 2006;13:139–47. doi:10.1097/01.gme.0000177906.94515.ff.
Costarella CE, Stallone JN, Rutecki GW, Whittier FC. Testosterone causes direct relaxation of rat thoracic aorta. J Pharmacol Exp Ther. 1996;277:34–9.
Deenadayalu VP, White RE, Stallone JN, Gao X, Garcia AJ. Testosterone relaxes coronary arteries by opening the large-conductance, calcium-activated potassium channel. Am J Physiol Heart Circ Physiol. 2001;281:H1720–7.
English KM, Jones RD, Jones TH, Morice AH, Channer KS. Testosterone acts as a coronary vasodilator by a calcium antagonistic action. J Endocrinol Invest. 2002;25:455–8.
Malkin CJ, Pugh PJ, Jones RD, Jones TH, Channer KS. Testosterone as a protective factor against atherosclerosis-immunomodulation and influence upon plaque development and stability. J Endocrinol. 2003;178:373–80. doi:10.1677/joe.0.1780373.
Wu FC, von Eckardstein A. Androgens and coronary artery disease. Endocr Rev. 2003;24:183–217. doi:10.1210/er.2001-0025.
Yue P, Chatterjee K, Beale C, Poole-Wilson PA, Collins P. Testosterone relaxes rabbit coronary arteries and aorta. Circulation. 1995;91:1154–60.
Kamischke A, Heuermann T, Kruger K, et al. An effective hormonal male contraceptive using testosterone undecanoate with oral or injectable norethisterone preparations. J Clin Endocrinol Metab. 2002;87:530–9. doi:10.1210/jc.87.2.530.
Zitzmann M, Nieschlag E. Testosterone levels in healthy men and the relation to behavioural and physical characteristics: facts and constructs. Eur J Endocrinol. 2001;144:183–97. doi:10.1530/eje.0.1440183.
Davoodi G, Amirezadegan A, Borumand MA, Dehkori MR, Kazemisaeid A, Yaminisharif A. The relationship between level of androgenic hormones and coronary artery disease in men. Cardiovasc J Afr. 2007;18:362–6.
Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006.
Faraggi D, Simon R. The maximum likelihood neural network as a statistical classification model. J Stat Plan Inference. 1995;46:93–104. doi:10.1016/0378-3758(95)99068-2.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York: Springer; 2001.
Ripley BD. Pattern recognition and neural networks. Cambridge: Cambridge University Press; 1996.
Ripley RM, Harris AL, Tarassenko L. Non-linear survival analysis using neural networks. Stat Med. 2004;23:825–42. doi:10.1002/sim.1655.
Friedewald WT, Levy RI, Fredrickson DS. Estimation of the concentration of low-density lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clin Chem. 1972;18:499–502.
Judkins MP. Selective coronary arteriography. I. A percutaneous transfemoral technic. Radiology. 1967;89:815–24.
Gensini GG. A more meaningful scoring system for determining the severity of coronary heart disease. Am J Cardiol. 1983;51:606. doi:10.1016/S0002-9149(83)80105-2.
Pollak A, Rokach A, Blumenfeld A, Rosen LJ, Resnik L, Dresner Pollak R. Association of oestrogen receptor alpha gene polymorphism with the angiographic extent of coronary artery disease. Eur Heart J. 2004;25:240–5. doi:10.1016/j.ehj.2003.10.028.
Pastor R, Guallar E. Use of two-segmented logistic regression to estimate change-points in epidemiologic studies. Am J Epidemiol. 1998;148:631–42.
Funahashi K. On the approximate realization of continuous mapping by neural networks. Neural Netw. 1989;2:183–92. doi:10.1016/0893-6080(89)90003-8.
Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989;2:359–66. doi:10.1016/0893-6080(89)90020-8.
Mathieson MJ. Ordinal models for neural networks. In: Refenes A-P, Abu-Mostafa Y, Moody J, Weigend A, editors. Neural networks in financial engineering: Proceedings of the Third International Conference on Neural Networks in the Capital Markets. Singapore: World Scientific; 1996. p. 523–36.
Nabney IT. Netlab: algorithms for pattern recognition. London: Springer; 2001.
Pearlmutter BA. Fast exact multiplication by the Hessian. Neural Comput. 1994;6:147–60. doi:10.1162/neco.1994.6.1.147.
Fallah N, Faghihzadeh S, Mahmoudi M. Comparing and Contrasting Fuzzy Min-Max Neural Network with the Classical Statistical Clustering Methods in classification of Rickets Disease. Bulletin of the 53rd session of the International Statistical Institute. 2001;2:445–6.
Baxt WG. Use of an artificial neural network for the diagnosis of myocardial infarction. Ann Intern Med. 1991;115:843–8.
Kemeny V, Droste DW, Hermes S, et al. Automatic embolus detection by a neural network. Stroke. 1999;30:807–10.
Das A, Ben-Menachem T, Cooper GS, et al. Prediction of outcome in acute lower-gastrointestinal haemorrhage based on an artificial neural network: internal and external validation of a predictive model. Lancet. 2003;362:1261–6. doi:10.1016/S0140-6736(03)14568-0.
Vijaya G, Kumar V, Verma HK. ANN-based QRS-complex analysis of ECG. J Med Eng Technol. 1998;22:160–7.
Song X, Mitnitski A, MacKnight C, Rockwood K. Assessment of individual risk of death using self-report data: an artificial neural network compared with a frailty index. J Am Geriatr Soc. 2004;52(7):1180–4. doi:10.1111/j.1532-5415.2004.52319.x.
Song X, Mitnitski A, Cox J, Rockwood K. Comparison of machine learning techniques with classical statistical models in predicting health outcomes. Medinfo. 2004;11:736–9.
Penedo MG, Carreira MJ, Mosquera A, Cabello D. Computer-aided diagnosis: a neural-network-based approach to lung nodule detection. IEEE Trans Med Imaging. 1998;17:872–80. doi:10.1109/42.746620.
Izenberg SD, Williams MD, Luterman A. Prediction of trauma mortality using a neural network. Am Surg. 1997;63:275–81.
Li YC, Liu L, Chiu WT, Jian WS. Neural network modeling for surgical decisions on traumatic brain injury patients. Int J Med Inform. 2000;57:1–9. doi:10.1016/S1386-5056(99)00054-4.
Grigsby J, Kooken R, Hershberger J. Simulated neural networks to predict outcomes, costs, and length of stay among orthopedic rehabilitation patients. Arch Phys Med Rehabil. 1994;75:1077–81. doi:10.1016/0003-9993(94)90081-7.
Tu JV, Guerriere MR. Use of a neural network as a predictive instrument for length of stay in the intensive care unit following cardiac surgery. Comput Biomed Res. 1993;26:220–9. doi:10.1006/cbmr.1993.1015.
Nguyen T, Malley R, Inkelis S, Kuppermann N. Comparison of prediction models for adverse outcome in pediatric meningococcal disease using artificial neural network and logistic regression analyses. J Clin Epidemiol. 2002;55:687–95. doi:10.1016/S0895-4356(02)00394-3.
Dorsey SG, Waltz CF, Brosch L, Connerney I, Schweitzer EJ, Bartlett ST. A neural network model for predicting pancreas transplant graft outcome. Diabetes Care. 1997;20:1128–33. doi:10.2337/diacare.20.7.1128.
Buscema M, Grossi E, Snowdon D, Antuono P. Auto-contractive maps: an artificial adaptive system for data mining, an application to Alzheimer disease. Curr Alzheimer Res. 2008;5:481–98. doi:10.2174/156720508785908928.
Rossini PM, Buscema M, Capriotti M, Grossi E, Rodriguez G, Del Percio C, et al. Is it possible to automatically distinguish resting EEG data of normal elderly vs. mild cognitive impairment subjects with high degree of accuracy? Clin Neurophysiol. 2008;119:1534–45. doi:10.1016/j.clinph.2008.03.026.
Allore H, Tinetti ME, Araujo KL, Hardy S, Peduzzi P. A case study found that a regression tree outperformed multiple linear regression in predicting the relationship between impairments and social and productive activities scores. J Clin Epidemiol. 2005;58:154–61. doi:10.1016/j.jclinepi.2004.09.001.
DiRusso SM, Chahine AA, Sullivan T, et al. Development of a model for prediction of survival in pediatric trauma patients: comparison of artificial neural networks and logistic regression. J Pediatr Surg. 2002;37:1098–104. discussion 1098–104. doi:10.1053/jpsu.2002.33885.
Eftekhar B, Mohammad K, Ardebili HE, Ghodsi M, Ketabchi E. Comparison of artificial neural network and logistic regression models for prediction of mortality in head trauma based on initial clinical data. BMC Med Inform Decis Mak. 2005;5:3. doi:10.1186/1472-6947-5-3.
Kattan MW, Hess KR, Beck JR. Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression. Comput Biomed Res. 1998;31:363–73. doi:10.1006/cbmr.1998.1488.
Costanza MC, Paccaud F. Binary classification of dyslipidemia from the waist-to-hip ratio and body mass index: a comparison of linear, logistic, and CART models. BMC Med Res Methodol. 2004;4:7. doi:10.1186/1471-2288-4-7.
Marble RP, Healy JC. A neural network approach to the diagnosis of morbidity outcomes in trauma care. Artif Intell Med. 1999;15:299–307. doi:10.1016/S0933-3657(98)00059-1.
Flouris AD, Duffy J. Application of artificial intelligence systems in the analysis of epidemiological data. Eur J Epidemiol. 2006;21:167–70. doi:10.1007/s10654-006-0005-y.
Grassi M, Villani S, Marinoni A. Classification methods for the identification of 'case' in epidemiological diagnosis of asthma. Eur J Epidemiol. 2001;17(1):19–29. doi:10.1023/A:1010987521885.
Wolfe R, McKenzie DP, Black J, Simpson P, Gabbe BJ, Cameron PA. Models developed by three techniques did not achieve acceptable prediction of binary trauma outcomes. J Clin Epidemiol. 2006;59:26–35. doi:10.1016/j.jclinepi.2005.05.007.
Acknowledgments
This study was sponsored by Tehran University of Medical Sciences. Part of this work was carried out during a sabbatical year (student visitor period) of Nader Fallah at the Department of Mathematics and Statistics of Dalhousie University and the University of British Columbia. The authors wish to thank R. Ripley for her help and I. Nabney for his Matlab code. The authors also thank Janet Brush, Parveer Pannu, and Catherine Pretty for editing this manuscript.
Appendices
Appendix 1
Generalized additive models
Generalized additive models and generalized linear models can be applied in similar situations, but they serve different analytic purposes. Generalized linear models emphasize estimation and inference for the parameters of the model, while generalized additive models focus on exploring the data non-parametrically. This class of models can therefore uncover non-linear relationships between predictor and response variables. Generalized additive models permit the response probability distribution to be any member of the exponential family of distributions. Many widely used statistical models belong to this general class, including additive models for Gaussian data, additive logistic models for binary data, and non-parametric log-linear models for Poisson data.
Suppose that Y is a response random variable and X1, …, Xp is a set of predictor variables. A regression procedure can be viewed as a method for estimating how the value of Y depends on the values of X1, …, Xp. Given a sample of values for Y and X, estimates of β0, β1, …, βp are often obtained by the least squares method. In regression models, the effects of prognostic factors xj are expressed in terms of a linear predictor of the form \( \sum {x_{j} \beta_{j} } \), where the βj are parameters. The generalized additive model replaces \( \sum {x_{j} \beta_{j} } \) with \( \sum {f_{j} (x_{j} )} \), where fj is an unspecified (non-parametric) function. This function is estimated in a flexible manner using a scatter plot smoother. The estimated function \( \hat{f}_{j} (x_{j} ) \) can reveal possible non-linearity in the effect of xj. Suppose y is a response or outcome variable and x is a prognostic factor. We wish to fit a smooth curve f(x) that summarizes the dependence of y on x. If we were to find the curve that simply minimizes \( \sum {(y_{i} - f(x_{i} ))^{2} } \), the result would be an interpolating curve that would not be smooth at all. The cubic spline smoother imposes smoothness on f(x). We seek the function f(x) that minimizes

\( \sum\limits_{i = 1}^{n} {(y_{i} - f(x_{i} ))^{2} } + \lambda \int {f^{\prime\prime}(x)^{2} \,dx} \)
Notice that \( \int {f^{\prime\prime}(x)^{2} } \) measures the "curvature" of the function f; λ is a non-negative smoothing parameter that must be chosen by the data analyst. The parameter λ is inversely related to the equivalent degrees of freedom: larger values of λ force f to be smoother. More details are presented elsewhere [18].
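As a concrete illustration of this penalized criterion, the following Python sketch (ours, not the authors' code) fits a discrete analogue of the cubic smoothing spline, replacing the curvature integral with squared second differences:

```python
import numpy as np

# Illustrative sketch (not the study code): a discrete analogue of the cubic
# smoothing spline.  The curvature integral of f''(x)^2 is approximated by
# squared second differences D f, giving the penalized least-squares problem
#     minimize ||y - f||^2 + lam * ||D f||^2,
# whose closed-form solution is f = (I + lam * D'D)^{-1} y.
def whittaker_smooth(y, lam):
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)  # second-difference operator, (n-2) x n
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

f_rough = whittaker_smooth(y, lam=0.0)     # lam = 0 interpolates the data exactly
f_smooth = whittaker_smooth(y, lam=100.0)  # larger lam forces a smoother curve
```

Increasing `lam` trades fidelity to the data for smoothness, exactly as the smoothing parameter does in the spline criterion.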
Appendix 2
Artificial neural networks
Neural network
A neural network is a set of interconnected simple processing elements called neurons, where each connection has an associated weight. Each neuron or unit processes its inputs to create an output. The network consists of a number of input units representing the predictors, one or more output units corresponding to the predicted variables, and possibly some internal units to increase the model's complexity or flexibility. The weights associated with the interconnections between the units are optimized in fitting the model to the data. The most commonly used form of neural network is the multi-layer perceptron (MLP). An MLP consists of one input layer of units, one output layer of units, and possibly one or more layers of 'hidden' units. The input units pass their inputs to the units in the first hidden layer or directly to the output units. Each of the hidden-layer units adds a constant (termed a 'bias') to a weighted sum of its inputs and calculates an activation function \( \phi_{h} \) of the result. This is then passed to the hidden units in the next layer or to the output unit(s). The activation function is usually chosen in advance. Common choices include the logistic function, the hyperbolic tangent function, and other monotonic functions. In this paper we fix the activation function as the hyperbolic tangent function. The output units apply a linear, logistic, threshold, or other function \( \phi_{0} \) to the weighted sum of their inputs plus a 'bias'. In this paper we use an exponential function in the output layer.
Denote the inputs as \( x_{i} \)'s and the outputs as \( t_{k} \)'s. For an MLP with one hidden layer,

\( t_{k} = \phi_{0} \left( {b_{k} + \sum\limits_{h} {w_{hk} \,\phi_{h} \left( {b_{h} + \sum\limits_{i} {w_{ih} x_{i} } } \right)} } \right) \)
If we have only one output node, k is equal to one. The weights can be determined by optimizing some proper criterion function, such as minimizing the sum of squared errors of the predicted variable or maximizing the log-likelihood of the data in cases where a distribution of the response variable can be assumed. The structure of the MLP makes it possible to fit very general non-linear functional relationships between inputs and outputs. Research results have shown that neural networks with enough hidden units can approximate arbitrary functional relationships [26, 27]. However, over-fitting can be a serious problem in such a framework. This problem is usually overcome either by stopping the optimization early or, more often, by using regularization techniques to penalize the optimization criterion. By adding a penalty term to the optimization criterion, the estimates of the weights are shrunk; this is also termed a shrinkage method. The following smoothness penalty is often used in the shrinkage method:

\( \lambda \sum\limits_{j} {w_{j}^{2} } \)
This process is also known as weight decay in the neural network literature. The tuning parameter λ can be chosen by cross-validation. For a fixed number of hidden units, we minimize the penalized log-likelihood in (3) to estimate the weights. To control the complexity of the model due to the number of hidden units, criteria such as AIC and BIC are used.
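A minimal numeric sketch of such a network (our illustration; the weights are arbitrary random placeholders rather than fitted values) with a tanh hidden layer, an exponential output unit, and the weight-decay penalty:

```python
import numpy as np

# Minimal sketch of an MLP forward pass as described above: hyperbolic tangent
# activations in the hidden layer and an exponential output activation.
# Weights here are random placeholders, not fitted values.
rng = np.random.default_rng(1)

def mlp_forward(X, W1, b1, W2, b2):
    hidden = np.tanh(X @ W1 + b1)    # hidden-layer activations phi_h
    return np.exp(hidden @ W2 + b2)  # exponential output activation phi_0

n_inputs, n_hidden = 3, 4
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = rng.normal(size=1)

X = rng.normal(size=(5, n_inputs))
out = mlp_forward(X, W1, b1, W2, b2)

# Weight-decay (shrinkage) penalty: lambda times the sum of squared weights,
# added to the error criterion during fitting.
lam = 0.012
penalty = lam * sum(np.sum(w ** 2) for w in (W1, b1, W2, b2))
```

The exponential output guarantees strictly positive predictions, which is what makes this architecture suitable for the Poisson means discussed below in the appendix.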
Optimization criteria
Given a training set comprising a set of input vectors \( \left\{ {x_{n} } \right\} \), where n = 1, …, N, together with the corresponding target vectors \( \left\{ {y_{n} } \right\} \), if we assume that the data points \( y_{n} \) (n = 1, …, N) are independent conditional on \( x_{n} \), the likelihood function can be written as:

\( L(w) = \prod\limits_{n = 1}^{N} {p(y_{n} |x_{n} ;w)} \)

or

\( \ln L(w) = \sum\limits_{n = 1}^{N} {\ln p(y_{n} |x_{n} ;w)} \)

The error function can be defined as the negative log-likelihood:

\( E(w) = - \sum\limits_{n = 1}^{N} {\ln p(y_{n} |x_{n} ;w)} \)
Linear and logistic regression
For regression problems with the normality assumption, this can be reduced to the most commonly used squared error criterion:

\( E(w) = \frac{1}{2}\sum\limits_{n = 1}^{N} {(y_{n} - t(x_{n} ;w))^{2} } \)
For classification problems, it is often advantageous to associate the network outputs with the posterior probabilities of each class. For a problem with two classes (such as normal and CAD), the target variable \( \left\{ {y_{n} } \right\} \) is binary and can be assumed to follow a binomial distribution with probability \( t_{n} (x_{n} ;w) \). The error function in (5) then yields the cross-entropy error function:

\( E(w) = - \sum\limits_{n = 1}^{N} {\left\{ {y_{n} \ln t_{n} + (1 - y_{n} )\ln (1 - t_{n} )} \right\}} \)
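The cross-entropy criterion can be computed directly; this short sketch (ours, not the study code) uses made-up targets and predicted probabilities:

```python
import numpy as np

# Sketch: cross-entropy error for binary targets y_n and predicted
# probabilities t_n, as in the two-class (normal vs. CAD) problem.
# The target and probability values below are illustrative only.
def cross_entropy(y, t):
    return -np.sum(y * np.log(t) + (1 - y) * np.log(1 - t))

y = np.array([1.0, 0.0, 1.0, 0.0])
t_good = np.array([0.9, 0.1, 0.8, 0.2])  # probabilities close to the targets
t_poor = np.array([0.6, 0.5, 0.5, 0.4])  # less confident predictions

err_good = cross_entropy(y, t_good)
err_poor = cross_entropy(y, t_poor)
```

Better-calibrated probabilities yield a smaller error, which is why minimizing cross-entropy drives the network outputs toward the class posterior probabilities.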
This definition has been extended to other generalized linear models (GLM) by other researchers, for example to multinomial logistic regression, ordinal logistic regression, and Cox regression for survival models [16–20, 26–30]. We consider Poisson regression in the following.
Poisson regression
Suppose we have a single target variable with a count response. We consider non-linear Poisson regression for neural networks as an extension of generalized linear models. To our knowledge, this model has not been introduced in the literature before.
The Poisson probability distribution for count data is given by:

\( p(y_{n} ) = \frac{{e^{{ - \lambda_{n} }} \lambda_{n}^{{y_{n} }} }}{{y_{n} !}},\quad y_{n} = 0,1,2, \ldots \)
In linear Poisson regression, the most commonly used formulation is the log-linear link function: \( \ln \lambda_{n} = x^{\prime}_{n} \beta \). Thus the expected value for \( y_{n} \) is given by \( E\left[ {y_{n} |x_{n} } \right] = \lambda_{n} = e^{{x_{n}^{\prime } \beta }} . \)
Here we model \( \lambda_{n} \) as a function of \( x_{n} \) by an MLP neural network:

\( \lambda_{n} = \phi_{0} \left( {b + \sum\limits_{h} {w_{h} \,\phi_{h} \left( {b_{h} + \sum\limits_{i} {w_{ih} x_{ni} } } \right)} } \right) \)
where \( \phi_{0} \) is fixed as an exponential function.
Substituting the Poisson probability function in (5) and using (9) as the Poisson means, the negative log-likelihood criterion can be obtained as:

\( E(w) = \sum\limits_{n = 1}^{N} {\left[ {\lambda_{n} - y_{n} \ln \lambda_{n} + \ln (y_{n} !)} \right]} \)
Eliminating the last term, which is not related to the model fitting, we have:

\( E(w) = \sum\limits_{n = 1}^{N} {\left[ {\lambda_{n} - y_{n} \ln \lambda_{n} } \right]} \)
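The simplified criterion can be checked numerically; this sketch (ours, with illustrative counts) verifies that it is minimized when the fitted means equal the observed counts:

```python
import numpy as np

# Sketch: Poisson negative log-likelihood with the ln(y_n!) term dropped,
# as in the simplified criterion above.  The counts are illustrative only.
def poisson_nll(y, lam_hat):
    return np.sum(lam_hat - y * np.log(lam_hat))

y = np.array([2.0, 1.0, 3.0, 1.0])

# Setting the fitted means equal to the observed counts minimizes the
# criterion (the derivative 1 - y/lam vanishes at lam = y).
nll_at_truth = poisson_nll(y, y)
nll_off = poisson_nll(y, 2 * y)
```

Dropping the constant ln(y_n!) term shifts the criterion by a fixed amount, so it does not change which weights minimize it.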
Model fitting
We compare the performances of different models using simulations. Likelihood error criterion functions such as that in (11) are used to fit models with a fixed number of units in the hidden layer. To guard against over-fitting, a penalized version of (11), given below, is used in the non-linear model fitting:

\( E(w) + \lambda \sum\limits_{j} {w_{j}^{2} } \)
For each neural network model, to identify the number of units in the hidden layer, both the Akaike Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC) are calculated:

\( {\text{AIC}} = - 2\ln L + 2m \)

\( {\text{BIC}} = - 2\ln L + m\ln N \)
where m is the number of estimated parameters and N is the number of observations. The model with the smallest value of the information criterion is considered to be the best. However, it should be noted that in our neural network model fittings, for each fixed number of hidden units, the negative log-likelihood score we obtain is suboptimal, since the weights are optimized on a penalized version of (11). We can thus only get approximations of the AIC and BIC values. We also calculated the MSE for the testing set as a reference measure of accuracy, where the MSE is defined as

\( {\text{MSE}} = \frac{1}{N}\sum\limits_{n = 1}^{N} {(y_{n} - \hat{y}_{n} )^{2} } \)
The predictions by different models are ranked by MSE.
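The model-selection quantities above can be sketched as follows (our illustration; the log-likelihoods and parameter counts are hypothetical placeholder values):

```python
import numpy as np

# Sketch: information criteria and MSE used to compare fitted models.
# The negative log-likelihoods and parameter counts below are hypothetical.
def aic(neg_loglik, m):
    return 2 * neg_loglik + 2 * m

def bic(neg_loglik, m, N):
    return 2 * neg_loglik + m * np.log(N)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

# A larger model fits slightly better but pays a complexity penalty;
# since ln(N) > 2 for N > 7, BIC penalizes the extra parameters more than AIC.
N = 502
small = {"nll": 300.0, "m": 10}
large = {"nll": 298.0, "m": 40}

aic_small, aic_large = aic(small["nll"], small["m"]), aic(large["nll"], large["m"])
bic_small, bic_large = bic(small["nll"], small["m"], N), bic(large["nll"], large["m"], N)
```

Here both criteria prefer the smaller model: the two-unit improvement in log-likelihood does not offset the 30 extra parameters.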
The models considered include 2, 3, 4, 5, 10, and 20 hidden units. To save computation time, the weight decay parameter is pre-fixed at 0.012 in our computations. This value was chosen based on an empirical study of different choices of the weight decay parameter.
Error gradient calculation
Back-propagation is a general computing technique to fit the parameters of an MLP. The computation involves the numerical evaluation of derivatives of the error function with respect to the weights and biases. The general form of back-propagation is described elsewhere [18]. Here we use a special algorithm based on the article by Pearlmutter [30] for computation of the Hessian matrix, similar to Nabney's approach [29]. The scaled conjugate gradient algorithm is used for optimization. The code is written in R 2.5 and Matlab 7.2.
Clinical application
Reports in the medical literature suggest that neural network models are applicable in diagnosing conditions such as rickets [31], myocardial infarction [32], pulmonary emboli [33], and gastrointestinal hemorrhage [34], in waveform analysis of EKGs [35], in prediction of health outcomes [36, 37], and in the analysis of radiographic images [38]. Neural networks have also been successfully applied to clinical outcome prediction for trauma mortality [39], surgical decision making on traumatic brain injury patients [40], recovery from surgery [41, 42], pediatric meningococcal disease [43], transplantation outcome [44], Alzheimer's disease [45], and dementia [46]. In addition, several technical comparisons between statistical methods and artificial intelligence techniques on medical data exist [45–55].
Cite this article
Fallah, N., Mohammad, K., Nourijelyani, K. et al. Nonlinear association between serum testosterone levels and coronary artery disease in Iranian men. Eur J Epidemiol 24, 297–306 (2009). https://doi.org/10.1007/s10654-009-9336-9