Nonlinear association between serum testosterone levels and coronary artery disease in Iranian men

  • Cardiovascular Disease
European Journal of Epidemiology

Abstract

Previous studies have shown controversial results about the role of androgens in coronary artery disease (CAD). We performed this study to examine and compare the relationship between androgenic hormones and CAD using conventional linear statistical techniques as well as novel non-linear approaches. The study was conducted on 502 consecutive men who were referred for selective coronary angiography at Tehran Heart Center for various indications. We studied the relationship between androgenic hormones and CAD using generalized linear models, generalized additive models, and neural networks. Free testosterone (fT), total testosterone (tT), and dehydroepiandrosterone sulfate levels in patients with significant CAD versus normal individuals were 6.69 ± 3.20 pg/ml, 16.60 ± 6.66 nmol/l, and 113.38 ± 72.9 μg/dl versus 7.12 ± 3.58 pg/ml, 15.82 ± 7.26 nmol/l, and 109.03 ± 68.19 μg/dl, respectively (P > 0.05). Generalized linear models were unable to show any significant relationship between androgenic hormones and CAD, whereas generalized additive models and neural networks supported a significant effect of androgenic hormones on CAD. This finding suggests a nonlinear association of tT levels with CAD: lower levels have a preventive effect, whereas higher values increase the risk. Emphasizing the non-linearity of the variables may provide new insight into possible explanations of the effect of androgenic hormones on CAD.

Fig. 1

Abbreviations

AIC:

Akaike information criterion

ANOVA:

Analysis of variance

BMI:

Body mass index

BIC:

Bayesian information criterion

CAD:

Coronary artery disease

CHOL:

Cholesterol

CRP:

C-reactive protein

DHEAS:

Dehydroepiandrosterone sulfate

DM:

Diabetes mellitus

EF:

Ejection fraction

ELISA:

Enzyme-linked immunosorbent assay

fT:

Free testosterone

GLM:

Generalized linear models

HDL:

High density lipoprotein

HTN:

Hypertension

LDL:

Low density lipoprotein

Lp(a):

Lipoprotein(a)

MLP:

Multi-layer perceptron

MSE:

Mean square error

ROC:

Receiver operating characteristic

SCA:

Selective coronary angiography

SD:

Standard deviation

SLP:

Single layer perceptron

TC:

Total cholesterol

TGs:

Triglycerides

tT:

Total testosterone

edf:

Effective degrees of freedom

References

  1. Liu PY, Death AK, Handelsman DJ. Androgens and cardiovascular disease. Endocr Rev. 2003;24:313–40. doi:10.1210/er.2003-0005.

  2. Callies F, Stromer H, Schwinger RH, et al. Administration of testosterone is associated with a reduced susceptibility to myocardial ischemia. Endocrinology. 2003;144:4478–83. doi:10.1210/en.2003-0058.

  3. Channer KS, Jones TH. Cardiovascular effects of testosterone: implications of the “male menopause”? Heart. 2003;89:121–2. doi:10.1136/heart.89.2.121.

  4. Dobs AS, Bachorik PS, Arver S, et al. Interrelationships among lipoprotein levels, sex hormones, anthropometric parameters, and age in hypogonadal men treated for 1 year with a permeation-enhanced testosterone transdermal system. J Clin Endocrinol Metab. 2001;86:1026–33. doi:10.1210/jc.86.3.1026.

  5. Malkin CJ, Pugh PJ, Jones TH, Channer KS. Testosterone for secondary prevention in men with ischaemic heart disease? QJM. 2003;96:521–9. doi:10.1093/qjmed/hcg086.

  6. Manson JE, Bassuk SS, Harman SM, et al. Postmenopausal hormone therapy: new questions and the case for new clinical trials. Menopause. 2006;13:139–47. doi:10.1097/01.gme.0000177906.94515.ff.

  7. Costarella CE, Stallone JN, Rutecki GW, Whittier FC. Testosterone causes direct relaxation of rat thoracic aorta. J Pharmacol Exp Ther. 1996;277:34–9.

  8. Deenadayalu VP, White RE, Stallone JN, Gao X, Garcia AJ. Testosterone relaxes coronary arteries by opening the large-conductance, calcium-activated potassium channel. Am J Physiol Heart Circ Physiol. 2001;281:H1720–7.

  9. English KM, Jones RD, Jones TH, Morice AH, Channer KS. Testosterone acts as a coronary vasodilator by a calcium antagonistic action. J Endocrinol Invest. 2002;25:455–8.

  10. Malkin CJ, Pugh PJ, Jones RD, Jones TH, Channer KS. Testosterone as a protective factor against atherosclerosis-immunomodulation and influence upon plaque development and stability. J Endocrinol. 2003;178:373–80. doi:10.1677/joe.0.1780373.

  11. Wu FC, von Eckardstein A. Androgens and coronary artery disease. Endocr Rev. 2003;24:183–217. doi:10.1210/er.2001-0025.

  12. Yue P, Chatterjee K, Beale C, Poole-Wilson PA, Collins P. Testosterone relaxes rabbit coronary arteries and aorta. Circulation. 1995;91:1154–60.

  13. Kamischke A, Heuermann T, Kruger K, et al. An effective hormonal male contraceptive using testosterone undecanoate with oral or injectable norethisterone preparations. J Clin Endocrinol Metab. 2002;87:530–9. doi:10.1210/jc.87.2.530.

  14. Zitzmann M, Nieschlag E. Testosterone levels in healthy men and the relation to behavioural and physical characteristics: facts and constructs. Eur J Endocrinol. 2001;144:183–97. doi:10.1530/eje.0.1440183.

  15. Davoodi G, Amirezadegan A, Borumand MA, Dehkori MR, Kazemisaeid A, Yaminisharif A. The relationship between level of androgenic hormones and coronary artery disease in men. Cardiovasc J Afr. 2007;18:362–6.

  16. Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006.

  17. Faraggi D, Simon R. The maximum likelihood neural network as a statistical classification model. J Stat Plan Inference. 1995;46:93–104. doi:10.1016/0378-3758(95)99068-2.

  18. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York: Springer; 2001.

  19. Ripley BD. Pattern recognition and neural networks. Cambridge: Cambridge University Press; 1996.

  20. Ripley RM, Harris AL, Tarassenko L. Non-linear survival analysis using neural networks. Stat Med. 2004;23:825–42. doi:10.1002/sim.1655.

  21. Friedewald WT, Levy RI, Fredrickson DS. Estimation of the concentration of low-density lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clin Chem. 1972;18:499–502.

  22. Judkins MP. Selective coronary arteriography. I. A percutaneous transfemoral technic. Radiology. 1967;89:815–24.

  23. Gensini GG. A more meaningful scoring system for determining the severity of coronary heart disease. Am J Cardiol. 1983;51:606. doi:10.1016/S0002-9149(83)80105-2.

  24. Pollak A, Rokach A, Blumenfeld A, Rosen LJ, Resnik L, Dresner Pollak R. Association of oestrogen receptor alpha gene polymorphism with the angiographic extent of coronary artery disease. Eur Heart J. 2004;25:240–5. doi:10.1016/j.ehj.2003.10.028.

  25. Pastor R, Guallar E. Use of two-segmented logistic regression to estimate change-points in epidemiologic studies. Am J Epidemiol. 1998;148:631–42.

  26. Funahashi K. On the approximate realization of continuous mapping by neural networks. Neural Netw. 1989;2:183–92. doi:10.1016/0893-6080(89)90003-8.

  27. Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989;2:359–66. doi:10.1016/0893-6080(89)90020-8.

  28. Mathieson MJ. Ordinal models for neural networks. In neural networks in financial engineering. In: Refences A-P, Abu-Mostafa Y, Moody J, Weigend A, editors. Proceedings of the Third International Conference on Neural Networks in the Capital Markets. Singapore: World Scientific; 1996. p. 523–36.

  29. Nabney IT. Netlab: algorithms for pattern recognition. London: Springer; 2001.

  30. Pearlmutter BA. Fast exact multiplication by the Hessian. Neural Comput. 1994;6:147–60. doi:10.1162/neco.1994.6.1.147.

  31. Fallah N, Faghihzadeh S, Mahmoudi M. Comparing and Contrasting Fuzzy Min-Max Neural Network with the Classical Statistical Clustering Methods in classification of Rickets Disease. Bulletin of the 53rd session of the International Statistical Institute. 2001;2:445–6.

  32. Baxt WG. Use of an artificial neural network for the diagnosis of myocardial infarction. Ann Intern Med. 1991;115:843–8.

  33. Kemeny V, Droste DW, Hermes S, et al. Automatic embolus detection by a neural network. Stroke. 1999;30:807–10.

  34. Das A, Ben-Menachem T, Cooper GS, et al. Prediction of outcome in acute lower-gastrointestinal haemorrhage based on an artificial neural network: internal and external validation of a predictive model. Lancet. 2003;362:1261–6. doi:10.1016/S0140-6736(03)14568-0.

  35. Vijaya G, Kumar V, Verma HK. ANN-based QRS-complex analysis of ECG. J Med Eng Technol. 1998;22:160–7.

  36. Song X, Mitnitski A, MacKnight C, Rockwood K. Assessment of individual risk of death using self-report data: an artificial neural network compared with a frailty index. J Am Geriatr Soc. 2004;52(7):1180–4. doi:10.1111/j.1532-5415.2004.52319.x.

  37. Song X, Mitnitski A, Cox J, Rockwood K. Comparison of machine learning techniques with classical statistical models in predicting health outcomes. Medinfo. 2004;11:736–9.

  38. Penedo MG, Carreira MJ, Mosquera A, Cabello D. Computer-aided diagnosis: a neural-network-based approach to lung nodule detection. IEEE Trans Med Imaging. 1998;17:872–80. doi:10.1109/42.746620.

  39. Izenberg SD, Williams MD, Luterman A. Prediction of trauma mortality using a neural network. Am Surg. 1997;63:275–81.

  40. Li YC, Liu L, Chiu WT, Jian WS. Neural network modeling for surgical decisions on traumatic brain injury patients. Int J Med Inform. 2000;57:1–9. doi:10.1016/S1386-5056(99)00054-4.

  41. Grigsby J, Kooken R, Hershberger J. Simulated neural networks to predict outcomes, costs, and length of stay among orthopedic rehabilitation patients. Arch Phys Med Rehabil. 1994;75:1077–81. doi:10.1016/0003-9993(94)90081-7.

  42. Tu JV, Guerriere MR. Use of a neural network as a predictive instrument for length of stay in the intensive care unit following cardiac surgery. Comput Biomed Res. 1993;26:220–9. doi:10.1006/cbmr.1993.1015.

  43. Nguyen T, Malley R, Inkelis S, Kuppermann N. Comparison of prediction models for adverse outcome in pediatric meningococcal disease using artificial neural network and logistic regression analyses. J Clin Epidemiol. 2002;55:687–95. doi:10.1016/S0895-4356(02)00394-3.

  44. Dorsey SG, Waltz CF, Brosch L, Connerney I, Schweitzer EJ, Bartlett ST. A neural network model for predicting pancreas transplant graft outcome. Diabetes Care. 1997;20:1128–33. doi:10.2337/diacare.20.7.1128.

  45. Buscema M, Grossi E, Snowdon D, Antuono P. Auto-contractive maps: an artificial adaptive system for data mining, an application to Alzheimer disease. Curr Alzheimer Res. 2008;5:481–98. doi:10.2174/156720508785908928.

  46. Rossini PM, Buscema M, Capriotti M, Grossi E, Rodriguez G, Del Percio C, et al. Is it possible to automatically distinguish resting EEG data of normal elderly vs. mild cognitive impairment subjects with high degree of accuracy? Clin Neurophysiol. 2008;119:1534–45. doi:10.1016/j.clinph.2008.03.026.

  47. Allore H, Tinetti ME, Araujo KL, Hardy S, Peduzzi P. A case study found that a regression tree outperformed multiple linear regression in predicting the relationship between impairments and social and productive activities scores. J Clin Epidemiol. 2005;58:154–61. doi:10.1016/j.jclinepi.2004.09.001.

  48. DiRusso SM, Chahine AA, Sullivan T, et al. Development of a model for prediction of survival in pediatric trauma patients: comparison of artificial neural networks and logistic regression. J Pediatr Surg. 2002;37:1098–104. discussion 1098–104. doi:10.1053/jpsu.2002.33885.

  49. Eftekhar B, Mohammad K, Ardebili HE, Ghodsi M, Ketabchi E. Comparison of artificial neural network and logistic regression models for prediction of mortality in head trauma based on initial clinical data. BMC Med Inform Decis Mak. 2005;5:3. doi:10.1186/1472-6947-5-3.

  50. Kattan MW, Hess KR, Beck JR. Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression. Comput Biomed Res. 1998;31:363–73. doi:10.1006/cbmr.1998.1488.

  51. Costanza MC, Paccaud F. Binary classification of dyslipidemia from the waist-to-hip ratio and body mass index: a comparison of linear, logistic, and CART models. BMC Med Res Methodol. 2004;4:7. doi:10.1186/1471-2288-4-7.

  52. Marble RP, Healy JC. A neural network approach to the diagnosis of morbidity outcomes in trauma care. Artif Intell Med. 1999;15:299–307. doi:10.1016/S0933-3657(98)00059-1.

  53. Flouris AD, Duffy J. Application of artificial intelligence systems in the analysis of epidemiological data. Eur J Epidemiol. 2006;21:167–70. doi:10.1007/s10654-006-0005-y.

  54. Grassi M, Villani S, Marinoni A. Classification methods for the identification of ‘case’ in epidemiological diagnosis of asthma. Eur J Epidemiol. 2001;17(1):19–29. doi:10.1023/A:1010987521885.

  55. Wolfe R, McKenzie DP, Black J, Simpson P, Gabbe BJ, Cameron PA. Models developed by three techniques did not achieve acceptable prediction of binary trauma outcomes. J Clin Epidemiol. 2006;59:26–35. doi:10.1016/j.jclinepi.2005.05.007.

Acknowledgments

This study was sponsored by Tehran University of Medical Sciences. Part of this work was carried out during a sabbatical year (student visitor period) of Nader Fallah at the Mathematics and Statistics Departments of Dalhousie University and the University of British Columbia. The authors wish to thank R. Ripley for her help and I. Nabney for his Matlab code. The authors also thank Janet Brush, Parveer Pannu, and Catherine Pretty for editing this manuscript.

Author information

Corresponding author

Correspondence to Kazem Mohammad.

Appendices

Appendix 1

Generalized additive models

Generalized additive models and generalized linear models can be applied in similar situations, but they serve different analytic purposes. Generalized linear models emphasize estimation and inference for the parameters of the model, while generalized additive models focus on exploring the data non-parametrically. A generalized additive model can therefore detect nonlinear relationships between predictor and response variables. Generalized additive models permit the response probability distribution to be any member of the exponential family. Many widely used statistical models belong to this general class, including additive models for Gaussian data, additive logistic models for binary data, and additive log-linear models for Poisson data.

Suppose that Y is a response random variable and \( X_{1}, \ldots, X_{p} \) is a set of predictor variables. A regression procedure can be viewed as a method for estimating how the value of Y depends on the values of \( X_{1}, \ldots, X_{p} \). Given a sample of values for Y and X, estimates of \( \beta_{0}, \beta_{1}, \ldots, \beta_{p} \) are often obtained by the least squares method. Regression models express the effects of prognostic factors \( x_{j} \) through a linear predictor of the form \( \sum {x_{j} \beta_{j} } \), where the \( \beta_{j} \) are parameters. The generalized additive model replaces \( \sum {x_{j} \beta_{j} } \) with \( \sum {f_{j} (x_{j} )} \), where \( f_{j} \) is an unspecified (non-parametric) function. This function is estimated in a flexible manner using a scatter-plot smoother. The estimated function \( \hat{f}_{j} (x_{j} ) \) can reveal possible non-linearity in the effect of \( x_{j} \). Suppose y is a response or outcome variable and x is a prognostic factor. We wish to fit a smooth curve f(x) that summarizes the dependence of y on x. If we were to find the curve that simply minimizes \( \sum {(y_{i} - f(x_{i} ))^{2} } \), the result would be an interpolating curve that would not be smooth at all. The cubic spline smoother imposes smoothness on f(x). We seek the function f(x) that minimizes

$$ \sum {(y_{i} - f(x_{i} ))^{2} } + \lambda \int {f^{\prime\prime}(x)^{2} \,dx} . $$
(1)

Notice that \( \int {f^{\prime\prime}(x)^{2} \,dx} \) measures the “curvature” of the function f; λ is a non-negative smoothing parameter that must be chosen by the data analyst. The value of λ determines the effective degrees of freedom of the fit: larger values of λ force f to be smoother. More details are presented elsewhere [18].
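As an illustrative sketch (not the authors' code), the criterion in (1) has a simple discrete analogue on a grid: replace the integrated squared second derivative with a sum of squared second differences, which turns the smoother into a single linear solve.

```python
import numpy as np

def penalized_smoother(y, lam):
    """Discrete analogue of the cubic smoothing spline criterion (1):
    minimize sum (y_i - f_i)^2 + lam * sum (second differences of f)^2.
    The minimizer solves (I + lam * D'D) f = y, with D the
    second-difference matrix."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)  # (n-2) x n second-difference matrix
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).normal(size=50)
f_rough = penalized_smoother(y, lam=0.0)   # lam = 0 interpolates the data exactly
f_smooth = penalized_smoother(y, lam=1e4)  # large lam forces f toward a straight line
```

As in the text, λ = 0 gives an interpolating (not smooth) curve, while large λ forces the fit to be smoother.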

Appendix 2

Artificial neural networks

Neural network

A neural network is a set of interconnected simple processing elements called neurons, where each connection has an associated weight. The neuron or unit processes its inputs to create an output. The network consists of a number of input units representing the predictors, one or more output units corresponding to the predicted variables, and possibly some internal units to increase the model's complexity or flexibility. The weights associated with the interconnections between the units are optimized in fitting the model to the data. The most commonly used form of neural network is the multi-layer perceptron (MLP). An MLP consists of one input layer of units, one output layer of units, and possibly one or more layers of ‘hidden’ units. The input units pass their inputs to the units in the first hidden layer or directly to the output units. Each hidden-layer unit adds a constant (termed the ‘bias’) to a weighted sum of its inputs and calculates an activation function \( \phi_{h} \) of the result. This is then passed to hidden units in the next layer or to the output unit(s). The activation function is usually chosen in advance; common choices include the logistic function, the hyperbolic tangent, and other monotonic functions. In this paper we fix the hidden activation function as the hyperbolic tangent. The output units apply a linear, logistic, threshold, or other function \( \phi_{0} \) to the weighted sum of their inputs plus a ‘bias’. In this paper we use an exponential function in the output layer.

Denoting the inputs as \( x_{i} \) and the outputs as \( t_{k} \), an MLP with one hidden layer computes

$$ t_{k} = \phi_{0} \left( {\alpha_{k} + \sum\limits_{j \to k} {\omega_{jk} \phi_{h} \left( {\alpha_{j} + \sum\limits_{i \to j} {\omega_{ij} x_{i} } } \right)} } \right). $$
(2)

If we have only one output node, k equals one. The weights can be determined by optimizing a suitable criterion function, such as minimizing the sum of squared errors of the predicted variable or maximizing the log-likelihood of the data when a distribution for the response variable can be assumed. The structure of the MLP makes it possible to fit very general non-linear functional relationships between inputs and outputs. Research results have shown that neural networks with enough hidden units can approximate any arbitrary functional relationship [26, 27]. However, over-fitting can be a serious problem in such a framework. This problem is usually overcome either by stopping the optimization early or, more often, by using regularization techniques that penalize the optimization criterion. Adding a penalty term to the optimization criterion shrinks the estimates of the weights toward zero; this is also termed a shrinkage method. The following smoothness penalty is often used:

$$ L = - \log \,{\text{likelihood}} + \lambda \sum\limits_{\text{weights}} {\omega_{ij}^{2} } . $$
(3)

This process is also known as weight decay in the neural network literature. The tuning parameter λ can be chosen by cross-validation. For a fixed number of hidden units, we minimize the penalized log-likelihood in (3) to estimate the weights. To control the complexity of the model due to the number of hidden units, criteria such as AIC and BIC are used.
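To make the pieces concrete, here is a minimal sketch (not the authors' R/Matlab code) of the forward pass of Eq. (2) with the activation choices fixed in this paper, together with the weight-decay penalty of Eq. (3); the dimensions and weight values are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W_ih, b_h, w_ho, b_o):
    """One-hidden-layer MLP as in Eq. (2): hyperbolic tangent hidden
    activation, exponential output activation (phi_0 = exp)."""
    h = np.tanh(b_h + x @ W_ih)    # hidden layer
    return np.exp(b_o + h @ w_ho)  # output layer

def penalized_loss(neg_log_lik, weights, lam):
    """Eq. (3): negative log-likelihood plus the weight-decay penalty."""
    return neg_log_lik + lam * sum(np.sum(w ** 2) for w in weights)

# Toy network: 3 inputs, 4 hidden units, 1 output.
W_ih = rng.normal(scale=0.1, size=(3, 4))
b_h = np.zeros(4)
w_ho = rng.normal(scale=0.1, size=4)
t = mlp_forward(np.array([0.5, -1.0, 2.0]), W_ih, b_h, w_ho, 0.0)
```

The exponential output guarantees a positive prediction, which is what the Poisson formulation below requires.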

Optimization criteria

Given a training set comprising input vectors \( \left\{ {x_{n} } \right\} \), n = 1, …, N, together with the corresponding targets \( \left\{ {y_{n} } \right\} \), if we assume that the data points \( y_{n} \) (n = 1, …, N) are independent conditional on \( x_{n} \), the likelihood function can be written as:

$$ P(y|x) = \prod\limits_{n = 1}^{N} {p(y_{n} |x_{n} )} $$
(4)

or

$$ P(y_{1} , \ldots ,y_{N} |x_{1} , \ldots ,x_{N} ) = \prod\limits_{n = 1}^{N} {p(y_{n} |x_{n} )} . $$

The error function can be defined as the negative log-likelihood:

$$ E = - \log P(y_{1} , \ldots ,y_{N} |x_{1} , \ldots ,x_{N} ) = - \sum\limits_{n = 1}^{N} {\log p(y_{n} |x_{n} )} . $$
(5)

Linear and logistic regression

For regression problems with a normality assumption, this reduces to the most commonly used squared-error criterion:

$$ E(w) = \frac{1}{2}\sum\limits_{n = 1}^{N} {\left\{ {y_{n} - t_{n} (x_{n} ;w)} \right\}^{2} } . $$
(6)

For classification problems, it is often advantageous to associate the network outputs with the posterior probabilities of each class. For a problem with two classes (such as normal and CAD), the target variable \( \left\{ {y_{n} } \right\} \) is binary and can be assumed to follow a binomial distribution with probability \( t_{n} (x_{n} ;w) \). The error function in (5) then yields the cross-entropy error function:

$$ E = - \sum {\left\{ {y_{n} \ln t_{n} + (1 - y_{n} )\ln (1 - t_{n} )} \right\}} . $$
(7)

This definition has been extended to other generalized linear models (GLMs), such as multinomial logistic regression, ordinal logistic regression, and Cox regression for survival models [16–20, 26–30]. We consider Poisson regression in the following.
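The cross-entropy criterion in (7) is straightforward to compute; as an illustrative restatement (not code from the paper):

```python
import numpy as np

def cross_entropy(y, t):
    """Eq. (7): cross-entropy error for binary targets y and predicted
    class probabilities t (0 < t < 1)."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    return -np.sum(y * np.log(t) + (1 - y) * np.log(1 - t))
```

For example, predicting probability 0.5 for two cases gives an error of 2 ln 2, and the error shrinks as the predicted probabilities approach the observed labels.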

Poisson regression

Suppose we have a single target variable with a count response. We consider non-linear Poisson regression for neural networks as an extension of generalized linear models; this model does not appear to have been introduced in the literature before.

The Poisson probability distribution for count data is given by:

$$ P\left[ {Y_{n} = y_{n} } \right] = \frac{{e^{{ - \lambda_{n} }} \lambda_{n}^{{y_{n} }} }}{{y_{n} !}},y_{n} = 0,1,2, \ldots . $$
(8)

In linear Poisson regression, the most commonly used formulation is the log-linear link function: \( \ln \lambda_{n} = x^{\prime}_{n} \beta \). Thus the expected value for \( y_{n} \) is given by \( E\left[ {y_{n} |x_{n} } \right] = \lambda_{n} = e^{{x_{n}^{\prime } \beta }} . \)

Here we model \( \lambda_{n} \) as a function of \( x_{n} \) by an MLP neural network:

$$ t_{n} = \hat{\lambda }_{n} = \phi_{0} \left( {\alpha + \sum\limits_{j} {\omega_{j} \phi_{h} } \left( {\alpha_{j} + \sum\limits_{i \to j} {\omega_{ij} x_{n} } } \right)} \right) $$
(9)

where \( \phi_{0} \) is fixed as an exponential function.

Substituting the Poisson probability function into (5) and using (9) as the Poisson mean, the negative log-likelihood criterion is obtained as:

$$ E = - \sum\limits_{n = 1}^{N} {\left[ { - t_{n} + y_{n} \log t_{n} - \ln y_{n} !} \right]} . $$
(10)

Eliminating the last term, which does not depend on the model fit, we have:

$$ E = - \sum\limits_{n = 1}^{N} {\left[ { - t_{n} + y_{n} \log t_{n} } \right]} . $$
(11)
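As a sketch of Eqs. (10) and (11) (illustrative, not the authors' code):

```python
import numpy as np
from math import lgamma

def poisson_nll(y, t, include_constant=False):
    """Negative log-likelihood for counts y with predicted Poisson means t.
    With include_constant=True this is Eq. (10); without it, Eq. (11)."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    e = -np.sum(-t + y * np.log(t))
    if include_constant:
        e += sum(lgamma(v + 1) for v in y)  # the ln y! term dropped in Eq. (11)
    return e
```

Dropping the ln y! term shifts the criterion by a constant, so it does not change which weights minimize it; the criterion is smallest when the predicted means match the observed counts.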

Model fitting

We compare the performances of the different models using simulations. Likelihood error criterion functions such as that in (11) are used to fit models with a fixed number of units in the hidden layer. To guard against over-fitting, a penalized version of (11), given below, is used in the non-linear model fitting.

$$ E_{r} = E + \lambda \sum\limits_{\text{weights}} {\omega_{ij}^{2} } . $$
(12)

For each neural network model, to identify the number of units in the hidden layer, both the Akaike Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC) are calculated:

$$ {\text{AIC}} = - 2\,{\text{Log}}\,{\text{likelihood}} + 2m $$
(13)
$$ {\text{BIC}} = - 2\,{\text{Log}}\,{\text{likelihood}} + m\log (N) $$
(14)

where m is the number of estimated parameters and N is the number of observations. The model with the smallest value of the information criterion is considered the best. However, it should be noted that in our neural network model fittings, for each fixed number of hidden units, the negative log-likelihood score we obtain is suboptimal, since the weights are optimized on a penalized version of (11). We can therefore only obtain approximations of the AIC and BIC values. We also calculated the MSE for the testing set as a reference measure of accuracy, where the MSE is defined as

$$ \frac{1}{N}\sum\limits_{n = 1}^{N} {\left( {\lambda_{n} - t_{n} } \right)^{2} } . $$
(15)

The predictions by different models are ranked by MSE.
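These selection criteria, Eqs. (13)–(15), are direct to compute; an illustrative sketch (not the authors' code):

```python
import numpy as np

def aic(log_lik, m):
    """Eq. (13): Akaike Information Criterion."""
    return -2 * log_lik + 2 * m

def bic(log_lik, m, n):
    """Eq. (14): Schwarz Bayesian Information Criterion."""
    return -2 * log_lik + m * np.log(n)

def mse(lam, t):
    """Eq. (15): mean squared error between means lam and predictions t."""
    lam, t = np.asarray(lam, float), np.asarray(t, float)
    return np.mean((lam - t) ** 2)
```

Because BIC multiplies the parameter count by log(N) rather than 2, it penalizes extra hidden units more heavily than AIC whenever N > e² ≈ 7.4.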

The models considered have 2, 3, 4, 5, 10, or 20 hidden units. To save computation time, the weight-decay parameter is pre-fixed at 0.012 in our computation. This value was chosen based on an empirical study of different choices of the weight-decay parameter.

Error gradient calculation

Back-propagation is a general computing technique for fitting the parameters of an MLP. The computation involves the numerical evaluation of derivatives of the error function with respect to the weights and biases. The general form of back-propagation is described elsewhere [18]. Here we use a special algorithm based on the article by Pearlmutter [30] for computation of the Hessian matrix, similar to the approach of Nabney [29]. The scaled conjugate gradient algorithm is used for optimization. The code is written in R 2.5 and Matlab 7.2.
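A standard way to sanity-check the derivatives used by back-propagation or a conjugate-gradient optimizer is a central-difference approximation; a minimal sketch (illustrative, not the paper's R/Matlab implementation):

```python
import numpy as np

def numerical_grad(error_fn, w, eps=1e-6):
    """Central-difference approximation to dE/dw_i, useful for checking
    an analytic (back-propagated) gradient of an error function."""
    w = np.asarray(w, float)
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (error_fn(w + step) - error_fn(w - step)) / (2 * eps)
    return g

# Example: the quadratic error E(w) = sum(w^2) has exact gradient 2w.
g = numerical_grad(lambda w: np.sum(w ** 2), np.array([1.0, -2.0, 0.5]))
```

This O(size of w) finite-difference check is far too slow for training, which is why exact gradient and Hessian-vector computations [29, 30] are used instead.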

Clinical application

Reports in the medical literature suggest that neural network models are applicable in diagnosing conditions such as rickets [31], myocardial infarction [32], pulmonary emboli [33], and gastrointestinal hemorrhage [34]; in waveform analysis of EKGs [35]; in prediction of health outcomes [36, 37]; and in analysis of radiographic images [38]. Neural networks have also been successfully applied to clinical outcome prediction of trauma mortality [39], surgical decision making for traumatic brain injury patients [40], recovery from surgery [41, 42], pediatric meningococcal disease [43], transplantation outcome [44], Alzheimer’s disease [45], and dementia [46]. In addition, more technical comparisons between statistical methods and artificial intelligence techniques on medical data exist [45–55].

About this article

Cite this article

Fallah, N., Mohammad, K., Nourijelyani, K. et al. Nonlinear association between serum testosterone levels and coronary artery disease in Iranian men. Eur J Epidemiol 24, 297–306 (2009). https://doi.org/10.1007/s10654-009-9336-9
