Background

Circadian rhythms are biological processes that display endogenous, entrainable oscillation cycles that last approximately 24 hours (owing to the Earth’s rotation around its own axis) [1]. These rhythms tune internal physiology, behaviour and metabolism to external conditions and are considered to be a feature of most living cells and organisms [1].

Central to circadian rhythms is melatonin (MLT), or N-acetyl-5-methoxytryptamine, an indoleamine primarily produced by the pineal gland and secreted into the blood [2, 3]. It can also be administered exogenously: orally (as capsules, tablets or liquids), sublingually, or as transdermal patches. It is available without prescription (over the counter) in many countries for the treatment of insomnia and depression. MLT synchronises the internal hormonal environment to the light–dark cycle of the external environment and controls circadian rhythms [4, 5]. Unfortunately, at night, artificial lighting such as light-emitting diodes (LEDs) continues to activate the suprachiasmatic nucleus of the brain, suppressing the natural release of MLT and potentially causing health problems [6]. Previous studies have provided evidence of the role of MLT in the regulation of circadian rhythms as well as its connection with the development of various cancers (breast, prostate, endometrial, ovarian, colorectal and skin), cardiovascular diseases, gastrointestinal and digestive problems, diabetes, obesity, depression, sleep deprivation, premature ageing and cognitive impairment [7,8,9,10,11,12,13,14,15,16].

A comprehensive, informed and up-to-date review of the current knowledge on the effects of MLT on health is not only timely but urgent, given the technological and lifestyle changes (e.g. chronodisruption) that have followed the widespread use of LEDs, which are omnipresent in computers, smartphones and tablets.

Therefore, the objectives of this umbrella review were to evaluate the evidence for the effects of MLT on health from the published literature, specifically systematic reviews (SRs) and narrative reviews (NRs), to investigate the potential mechanisms of action and to identify which health outcomes are associated with the production and/or supplementation of MLT.

Methods

The Cochrane Handbook for Systematic Reviews of Interventions and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [17] were adhered to while writing and reporting this review (PROSPERO registration number: CRD42016039840; available at www.crd.york.ac.uk/PROSPERO) [18].

Literature search and eligibility criteria

For the electronic search, the following databases were searched for entries from January 1996 until July 2017: MEDLINE (via Ovid), EMBASE (via Ovid), Web of Science, CENTRAL (Wiley), PsycINFO (Ovid) and CINAHL (via EBSCO). We hypothesised that any significant reviews or studies would have been captured by reviews conducted since January 1996 (our search start date). A detailed search strategy for MEDLINE is presented in the Appendix. In addition to the electronic searches, the reference lists of all eligible articles were reviewed for further potentially relevant studies. Only data from the published papers were used; the study authors were not contacted.

We included SRs (defined as research articles with a replicable methods section, e.g. searches, eligibility criteria and critical appraisal of primary studies) [19] or NRs (defined as articles without a replicable methods section) [20] of studies involving both healthy and ill individuals of any age and gender using both endogenous and exogenous MLT and MLT agonists. Reviews that relied on data from animal, human and/or in vitro studies with any type of health-related outcome measures were eligible. All SRs and NRs addressing the same associations were eligible throughout the search period, regardless of the amount and level of overlap (i.e. one primary study included in two or more reviews and/or two or more identified reviews on the same topic). We excluded reviews of plants, abstracts or review protocols and reviews not published in English.

Study selection

The data screening and selection process was performed by the first reviewer (PP) and verified and validated by a second reviewer (BMK). All identified references were imported into EndNote (X7.7.1). The search results from all the bibliographic searches were merged and duplicate records removed.

Data extraction

Working in groups of two, four authors (BMK, UD, GD and SB) independently extracted relevant information from the studies included using a custom-made data extraction form. The data were subsequently validated by a fifth author (PP). The following information was extracted from the reviews included: first authors’ names and publication date, total number of primary studies, total number of patients included, quality of SRs (Oxman checklist score), quality of primary studies (low, moderate or high as determined by the authors of the reviews), subject/condition/indication, administration of MLT (dose, route, frequency and duration), details of any meta-analyses (MAs), health outcomes/effects/overall results, confounders, and any additional comments. Any disagreements were resolved by discussion between the authors.

Quality assessment

The methodological quality of SRs was independently evaluated by five reviewers using the Oxman checklist [21]. This validated tool assesses the quality of review articles across nine domains: (1) reporting of search strategy, (2) comprehensiveness of searches, (3) repeatable eligibility criteria, (4) avoidance of selection bias, (5) presence of a validity assessment tool, (6) use of the validity assessment tool, (7) robustness of data analysis, (8) appropriateness of data analysis and (9) supportiveness of conclusions. Each question was scored as 1 (fulfilled), 0 (partially fulfilled) or -1 (not fulfilled). A score of 1 or below indicates extensive flaws, 2–3 indicates the presence of major flaws, 4–5 means minor flaws and 6–9 indicates minimal or no flaws. Again, any disagreements (N = 6) were resolved by discussion between the authors.
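The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' tooling: the function names `oxman_total` and `oxman_category` are hypothetical, and the category labels simply restate the cut-offs given in the text.

```python
OXMAN_ITEMS = 9  # nine domains, as listed in the text


def oxman_total(item_scores):
    """Sum nine item scores, each 1 (fulfilled), 0 (partially) or -1 (not fulfilled)."""
    if len(item_scores) != OXMAN_ITEMS:
        raise ValueError("expected exactly 9 item scores")
    if any(s not in (-1, 0, 1) for s in item_scores):
        raise ValueError("each item score must be -1, 0 or 1")
    return sum(item_scores)


def oxman_category(total):
    """Map a total score (-9 to 9) to the flaw category used in the text."""
    if not -OXMAN_ITEMS <= total <= OXMAN_ITEMS:
        raise ValueError("total score must lie between -9 and 9")
    if total <= 1:
        return "extensive flaws"
    if total <= 3:
        return "major flaws"
    if total <= 5:
        return "minor flaws"
    return "minimal or no flaws"
```

For example, a review that fulfils three items, partially fulfils five and fails one has a total of 2 and falls into the "major flaws" band.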

Statistical analysis

The results from NRs or SRs that did not pool data quantitatively (N = 164) are presented narratively using descriptive tables. Sub-group analyses were conducted for the subset of 31 SRs that had pooled their data quantitatively. For that purpose, the approach by Bellou et al. [22] was used. For each health outcome, we recorded the number of participants and original studies involved in the MA and calculated summary effect sizes [with 95% confidence intervals (CI) and P values] using both random- and fixed-effects models. The 95% prediction interval (PI) was also calculated; it further accounts for between-study heterogeneity and estimates the uncertainty around the effect that would be anticipated in a new study evaluating the same association. Between-study heterogeneity was measured with the I² statistic. An I² value of 50% or more is considered to represent a substantial level of heterogeneity, whereas values exceeding 75% are considered to represent considerable heterogeneity. These values also need to be interpreted in light of the size and direction of effects and the strength of the evidence for heterogeneity, based on the P value from Cochran’s Q test [18]. Evidence of small-study effects (i.e. the tendency of smaller studies to produce substantially larger effect size estimates than larger studies) was evaluated with Egger’s regression asymmetry test [23]. As a conservative threshold, a P value of less than 0.10 from Egger’s test was considered to be evidence of small-study effects. Wherever possible, we extracted the estimate of the largest study (the one with the smallest standard error) in each MA and compared it with the random-effects summary estimate to interpret the direction and magnitude of the effect size.
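To make these quantities concrete, the following Python sketch computes a fixed-effect estimate, a random-effects estimate, Cochran's Q, I² and a 95% prediction interval from study-level effects and standard errors. Several points are assumptions rather than details taken from the review: the between-study variance is estimated with the DerSimonian–Laird method (the text does not name the estimator), the function name `random_effects_meta` is hypothetical, and the t critical value for the prediction interval (k − 2 degrees of freedom) is supplied by the caller so that the sketch needs only the standard library. The analyses in the review itself were run in R, and Egger's test is not covered here.

```python
import math
from statistics import NormalDist


def random_effects_meta(effects, std_errors, t_crit=None):
    """Inverse-variance meta-analysis with a DerSimonian-Laird tau^2 (assumed estimator).

    effects, std_errors: per-study estimates and their standard errors.
    t_crit: 97.5% quantile of Student's t with k - 2 degrees of freedom,
            needed only for the 95% prediction interval (requires k >= 3).
    """
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")

    # Fixed-effect (inverse-variance) pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q, DerSimonian-Laird tau^2 and I^2 (as a percentage).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects pooled estimate with a normal-based 95% CI.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    se_re = math.sqrt(1.0 / sw_re)
    z = NormalDist().inv_cdf(0.975)
    ci = (pooled - z * se_re, pooled + z * se_re)

    # 95% prediction interval: pooled +/- t * sqrt(tau^2 + SE^2).
    pi = None
    if t_crit is not None and k >= 3:
        half = t_crit * math.sqrt(tau2 + se_re ** 2)
        pi = (pooled - half, pooled + half)

    return {"fixed": fixed, "random": pooled, "ci_random": ci,
            "q": q, "tau2": tau2, "i2": i2, "pi": pi}
```

Because the prediction interval adds the between-study variance to the uncertainty of the pooled estimate, it is never narrower than the confidence interval and can include the null value even when the confidence interval does not.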
We characterised associations as convincing if they met all of the following criteria: a P value of less than 0.001 under a random-effects meta-analysis, more than 1000 participants, between-study heterogeneity (I²) < 50%, a 95% PI that excluded the null value, and no evidence of small-study effects or excess significance bias. MAs for which the required information was not available were excluded from the main analyses and presented in a separate table. The statistical analyses were done with open-source R software (version 3.3.1) for Windows using the meta package. The Pieper et al. formula [24] was used to calculate the amount of overlap of primary trials in the included SRs as a percentage (the corrected covered area). A corrected covered area of 0–5% indicates a slight overlap, 6–10% a moderate overlap, 11–15% a high overlap and > 15% a very high overlap.
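As an illustration of the overlap measure, the corrected covered area is commonly written as CCA = (N − r) / (r × c − r), where N is the total number of primary-study inclusions counted across all reviews (with repeats), r is the number of unique primary studies and c is the number of reviews. The Python sketch below applies that formula to a mapping from reviews to the primary studies they include. The function names and the example identifiers are hypothetical, and treating the bands as "up to 5%", "up to 10%" and "up to 15%" is an assumption made to close the gaps between the ranges quoted in the text.

```python
def corrected_covered_area(inclusions):
    """Return the corrected covered area as a fraction.

    inclusions: dict mapping a review identifier to the set of primary-study
    identifiers that the review includes.
    """
    c = len(inclusions)                              # number of reviews
    unique = set().union(*inclusions.values()) if inclusions else set()
    r = len(unique)                                  # unique primary studies
    n = sum(len(s) for s in inclusions.values())     # inclusions, with repeats
    if c < 2 or r == 0:
        raise ValueError("need at least two reviews and one primary study")
    return (n - r) / (r * c - r)


def overlap_category(cca_fraction):
    """Label the overlap using the bands quoted in the text (boundaries assumed)."""
    pct = cca_fraction * 100.0
    if pct <= 5:
        return "slight"
    if pct <= 10:
        return "moderate"
    if pct <= 15:
        return "high"
    return "very high"
```

With three made-up reviews, `{"A": {"s1", "s2", "s3"}, "B": {"s2", "s3", "s4"}, "C": {"s5"}}`, there are seven inclusions of five unique studies, so the CCA is (7 − 5) / (5 × 3 − 5) = 0.20, which falls in the "very high" band.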

Results

Our searches identified a total of 4329 records; 195 review articles met the inclusion criteria (Fig. 1). Table 1 presents the biological mechanisms of action involved. Tables 2 and 3 summarise MAs of MLT for health with and without sufficient data for quantitative synthesis, respectively. Table 4 summarises reviews with overlapping conditions (Fig. 2). The key data from the included SRs or NRs are summarised in Additional file 1: Table S1 and Additional file 2: Table S2. Additional file 3: Table S3 gives the methodological quality of the papers included. Additional file 4: Table S4 lists all randomised controlled trials (RCTs) covered in the subset of 31 SRs and indicates the amount of overlap (Fig. 3). Additional file 5: Table S5 lists adverse effects (AEs) reported in SRs. Altogether, 31 reviews were synthesised quantitatively, whereas the remaining 164 reviews were synthesised narratively.

Fig. 1 Flow diagram for studies included. MLT melatonin

Table 1 Biological functions and processes that may be affected by MLT and suggested mechanisms of action in various models
Table 2 Characteristics and quantitative synthesis of the eligible MAs of MLT for health
Table 3 Characteristics of the eligible MAs of MLT for health (with insufficient data for quantitative synthesis)
Table 4 Reviews with overlapping conditions

Fig. 2 Health conditions with more than ten systematic reviews

Fig. 3 Distribution of citations of different RCTs in the subset of 31 SRs and MAs included. MA meta-analysis, RCT randomised controlled trial, SR systematic review

Characteristics of studies included (N = 195)

The number of primary studies in each SR ranged from 0 to 68 (mean 6.5 ± 10.78). The total number of participants was inestimable due to overlapping studies (range, where reported, 61 to 5812). In 117 of the reviews (60%), either the number of primary studies or the number of participants was not available. None of the included SRs or MAs had access to individual participant data and all relied on summary-level data from the published literature. Eighteen SRs relied on continuous data for their respective MAs [standardised mean difference (SMD), mean difference (MD) and weighted mean difference (WMD)], 12 (6.1%) used dichotomous data for pooling [odds ratio (OR) and risk ratio (RR)], and only one MA used both types of data and analyses (RR and MD) [25]. Three MAs used effect sizes for presenting the overall estimates [26,27,28].

Various conditions were evaluated, ranging from acute coronary syndrome to various cancers, with insomnia/sleep disorders being the most frequent (N = 50; 25.6%). Of these, 26 focused on insomnia/primary sleep disorders only, whereas the remaining 24 evaluated other health conditions with underlying (secondary) sleep disorders. Four reviews (2%) included healthy individuals only and six (3%) evaluated a mixture of healthy and unhealthy individuals. Human studies varied from case studies (N = 4), case series (N = 4), case-control (N = 2), cohort (N = 1), open-label (N = 13) and uncontrolled before–after (N = 2) studies to RCTs of parallel and cross-over design with or without the use of a placebo (N = 71).

Administration routes varied from oral and intravenous to sublingual, and MLT preparations included patches, pills, capsules and solutions. In total, 99 reviews (50.7%) included animal/in vivo studies and 55 reviews (28.2%) also included in vitro studies, whereas 84 reviews (43%) included humans only. Confounding factors were not mentioned in 82 reviews (42%). In the remaining 113 reviews, both exogenous and endogenous MLT levels were reported to be influenced by a range of genetic, epigenetic and environmental factors including age, gender, menopausal status, parity, oestrogen levels, lifestyle (alcohol use, body mass index, body posture, caffeine, diet, supplements, drug use, night-shift work, artificial light at night, physical activity, psychological stress and sleep hygiene) and others, including individual chronotypes, seasonal variations and the time, dose and route of MLT administration. In medically compromised patients, e.g. those with cancer, MLT was frequently used as an adjunct to usual care or conventional treatment such as chemotherapy, radiotherapy, supportive care and palliative care.

The most commonly cited effects of MLT were its anti-oxidative, anti-inflammatory and immunomodulatory properties (Table 1). In neoplastic diseases, the most common mechanisms of action included free radical scavenging (hydroxyl radical, hydrogen peroxide, hypochlorous acid, singlet oxygen, the peroxynitrite anion and peroxynitrous acid); stimulation of the immune system; improvement of oxidative phosphorylation and ATP generation; co-activation of protein kinase enzymes; reduction of cellular proliferation; and inhibition of angiogenesis, prostaglandin E2, 17β-oestradiol, the uptake of linoleic acid, DNA methyltransferase or telomerase.

Evaluation of the evidence

Four MAs [25, 29,30,31] had large levels of heterogeneity (I² ≥ 50% and ≤ 75%) and six SRs [32,33,34,35,36,37] had very large levels of heterogeneity (I² > 75%). The median number of studies per MA was 5 (IQR = 4.75) with a median of 557 participants (IQR = 1561). In 13 of the MAs, more than 1000 cases were analysed. Ten (32%) of the 31 MAs, covering sleep latency, pre-operative anxiety, prevention of agitation or risk of breast cancer, reported effects that were significant at P values less than 0.05 under the random-effects model, and seven (23%) were significant at P values less than 0.001 under the random-effects model [31, 33, 38,39,40,41]. For eight MAs (25.8%), we were unable to calculate 95% PIs. The remaining 23 MAs had a 95% PI that included the null value, meaning that, although on average MLT improves various health outcomes, this might depend on dose, duration, intensity, age, gender or underlying co-morbidities. Evidence for small-study effects was noted in three MAs (9.6%). These MAs pertained to the incidence of delirium [35], spinal cord injury [32] or post-operative pain [33] (Table 2).

Only one review [39], on the association between MLT and sleep quality, met our predefined criteria for a convincing association. It highlighted that ramelteon can improve sleep quality in insomnia (SMD = -0.08, 95% CI = -0.13 to -0.03). If we reduced the minimum number of participants in an MA to ≥500, then one more review [31] would satisfy the criteria. It highlighted that melatonin therapy can improve the partial and complete remission of solid tumour cancers (RR = 1.95, 95% CI = 1.49 to 2.54).
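The way the participant threshold interacts with the other criteria can be illustrated with a small Python sketch. It encodes the criteria listed in the Methods as a single check with an adjustable participant threshold. The class `MetaAnalysisSummary`, the function `is_convincing` and all field names are hypothetical, and the record in the usage note is invented for illustration; none of it is taken from the included reviews. How excess significance bias was assessed is not described in the text, so it is represented only as a flag supplied by the user.

```python
from dataclasses import dataclass


@dataclass
class MetaAnalysisSummary:
    p_random: float            # P value under the random-effects model
    participants: int          # total participants in the meta-analysis
    i2_percent: float          # between-study heterogeneity, I^2 in percent
    pi_low: float              # lower bound of the 95% prediction interval
    pi_high: float             # upper bound of the 95% prediction interval
    null_value: float          # 0 for SMD/MD/WMD, 1 for RR/OR
    small_study_effects: bool  # e.g. Egger's test P < 0.10
    excess_significance: bool  # flag supplied by the analyst


def is_convincing(ma, min_participants=1000):
    """Return True only if every criterion is met.

    min_participants is a strict lower bound, mirroring the wording
    "greater than 1000 participants".
    """
    pi_excludes_null = not (ma.pi_low <= ma.null_value <= ma.pi_high)
    return (
        ma.p_random < 0.001
        and ma.participants > min_participants
        and ma.i2_percent < 50
        and pi_excludes_null
        and not ma.small_study_effects
        and not ma.excess_significance
    )
```

For an invented record with 800 participants that satisfies every other criterion, `is_convincing(record)` is false at the default threshold but true with `min_participants=499`, which corresponds to the relaxed "≥500" rule described above.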

Quality of SRs

The quality of the reviews as measured with the Oxman checklist was typically low (range = -9 to 9; mean = -4.5, SD = 6.7) (Additional file 3: Table S3). Of the reviews included, 153 (153/195; 78.4%) did not use appropriate methods for combining studies and hence were scored as -1.

Quality (and number) of primary studies

Altogether 154 reviews (78.9%) did not evaluate the methodological quality of the primary studies (no validity assessments). In 41 reviews (21.1%) that did undertake this, the methodological quality of the primary data ranged from poor (N = 5) to high (N = 13), with moderate being most commonly reported (N = 18), as assessed by the Cochrane Risk of Bias Tool or the Jadad Scale. The median number of primary studies included was N = 9 (when possible to estimate).

Melatonin receptor agonists

Melatonin receptor agonists, such as Circadin® (prolonged-release MLT), ramelteon, agomelatine or tasimelteon, bind to and activate the MLT receptors 1 and 2 [42]. These analogues of MLT are believed to have the same mechanisms of action as MLT and are typically used for the treatment of sleep disorders and depression [43]. Two reviews of Circadin, four of ramelteon, two of agomelatine and one of tasimelteon were included. The dose, duration and frequency of administration varied across the reviews, with 8 mg being the most commonly used dose in ramelteon studies, 2 mg for Circadin, 25–50 mg for agomelatine and 1–50 mg for tasimelteon.

Endogenous vs. exogenous MLT

In total, 31 reviews (15.8%) evaluated both exogenous and endogenous MLT. However, it was often difficult to ascertain the number of studies looking at exogenous MLT only vs. endogenous MLT only. Exogenous and endogenous MLT doses are also not comparable, as the routes of administration and types of studies differed considerably (range, where reported, 0.003 mg to 3 g).

Discussion

This umbrella review aimed to summarise and critically evaluate the evidence from SRs and NRs of the effects of MLT on health and to identify the biological mechanisms of action involved. In total, 195 reviews were included (96% of the reviews were published after 2000). Of the reviews, 99 included evidence from in vitro or animal experiments, which highlights the still experimental phase of some MLT research and the translational potential for human trials.

There was considerable clinical and methodological heterogeneity in terms of populations evaluated (from neonates to the elderly), doses, excipients, quality or purity of MLT preparations, comparators, outcome measures, study designs, lengths of follow-up, settings, etc. Despite that, the present review does lend support to the notion that endogenous and exogenous MLT are associated with improved health outcomes. However, caution is advised for the use or supplementation of MLT in some immune-mediated conditions and settings, such as rheumatoid arthritis, asthma or organ transplantation, as MLT has been reported to stimulate the function of the immune system via the production of interleukins (IL-1, IL-2, IL-6 and IL-12), interferon γ (IFN-γ), Th cells, cytotoxic T cells, and B- and T-cell precursors [44].

Overall, although the connection between MLT and health seems well founded, there is less evidence connecting MLT with specific diseases in a systematic way. The physiological role of MLT, as uncovered by various experimental studies, points quite robustly to a direct relation between MLT and critical elements of health. However, the connection with specific conditions needs to be researched comprehensively. Thus, we suggest the need for high-quality primary data and we underline the importance of targeted studies on specific conditions, such as Alzheimer’s disease or cardiovascular diseases.

Mechanisms of action

Some of the effects of MLT are via anti-oxidative (e.g. [45,46,47,48,49]), anti-inflammatory (e.g. [50,51,52]), anti-apoptotic (e.g. [53, 54]), anti-nociceptive (e.g. [33, 55]), anti-hypertensive (e.g. [56,57,58]), cytoprotective, neuroprotective, cardioprotective or nephroprotective effects (e.g. [59,60,61,62,63,64]), and by enhancing mitochondrial function and protecting nuclear and mitochondrial DNA or regulating homeostasis (e.g. [53, 65]; Table 1). Even though some of the mechanisms of action are well established, the lack of clarity about the exact influence of confounding factors such as diet, exercise, sleep and genetics on the relationship between MLT and health limits the generalisability of the results. We identify here three important factors that future researchers could take into account. Firstly, climatic conditions, and especially latitude, could bias the physiological response. Secondly, the urban environment of cities and the presence of LED light could disrupt circadian rhythms and suppress the production of MLT. Finally, the overall cultural background could also have a significant impact, as this affects nutrition and clothing.

Safety

AEs of exogenous MLT and MLT analogues were reported in 11 (5.6%) of the included reviews. Two reviews pooled the safety data [40, 66]. In Liu and Wang [40], there were more subjective reports of at least one AE after treatment with ramelteon compared to placebo (RR = 1.11, 95% CI = 1.03 to 1.20, P < 0.01; seven studies). In Huang et al. [66], however, agomelatine showed a lower rate of discontinuation due to AEs compared with selective serotonin reuptake inhibitors or serotonin–norepinephrine reuptake inhibitors (RR = 0.38, 95% CI = 0.25 to 0.57). AEs were typically mild and included worsening of symptoms (seizures, asthma or headaches), transient headaches and dizziness, abdominal pain, pharyngitis, back pain, asthenia, somnolence, fatigue, nasopharyngitis, upper respiratory infection, nausea, diarrhoea, dyspepsia, dysmenorrhoea, dry mouth, increased alanine aminotransferase, nightmares, morning drowsiness, enuresis, rash and hypothermia (Additional file 5: Table S5). Given the reported benefits of MLT treatment and the few, mild AEs (including with long-term use), the risk–benefit ratio appears to favour MLT.

Cost-effectiveness

Only two reviews undertook any health economic analysis of MLT. One review stated that the cost of a 30-tablet pack of 2 mg of Circadin was £15.39 [67], whereas Liira et al. [38] ‘did not find evidence on the cost-effectiveness of the drugs in the included trials’. More cost-effectiveness or cost-benefit analyses would be required to confirm the economic benefits of MLT and to inform various stakeholders and policymakers.

Quality (and quantity) of primary data

In 154 (78.9%) of the reviews, the quality of the primary data was not evaluated. In the 41 reviews (21%) that did evaluate it, the quality of the primary data ranged from poor to high (average = moderate), as judged by the authors of the included reviews, primarily using the Cochrane Risk of Bias Tool. The relatively low number of primary studies (median 9) included in the SRs or NRs might be of potential concern, and signals the need for more research into a wide range of conditions and clinical areas including oncology, emergency medicine, neurology, metabolic diseases, cardiovascular medicine, gynaecology, paediatrics, psychiatry, mental health, gastrointestinal diseases and pain management.

Review quality

The methodological quality of the included SRs was frequently poor (Additional file 3: Table S3). Most of the articles that scored poorly on the Oxman checklist (quality rating scale) were NRs, which are often of poorer quality compared to SRs. As these articles do contribute relevant information, we decided to include them in our study. Of the reviews, however, 36 (18.4%) scored 6–9 on the Oxman checklist, meaning they had minimal or no flaws.

Strengths and weaknesses

This umbrella review has important strengths, such as the inclusion and critical appraisal of 195 review articles, identification of gaps and uncertainties in the evidence base, and categorisation of significant health-related effects and associated mechanisms of action. However, this umbrella review of both SRs and NRs has several limitations that ought to be kept in mind when interpreting its results. First and foremost, even though comprehensive searches were employed, there is no guarantee that all relevant SRs of MLT were included. The searches were restricted to the past 21 years and to reviews published in English, and may therefore have omitted older but potentially important reviews as well as reviews published in other languages.

Secondly, many SRs analysed the same primary studies. This overlap between SRs is important when interpreting the results of this overview (Additional file 4: Table S4, Fig. 3). For instance, due to the double counting of patient data resulting from the overlapping studies, the total number of patients included in our analyses is inestimable. Also, in the subset of 31 MAs, 238 RCTs were included. These RCTs were frequently used in more than one MA (range = 1–4, mean = 1.4, SD = 0.66), meaning that there were overlapping studies and double counting of the data (Fig. 3). To further illustrate this, three [31, 37, 68] of five MAs [31, 37, 68,69,70] evaluating MLT for cancers relied on the same data from the same four primary trials (Lissoni 1996, 1997, 1999, 2003). However, the amount of overlap, calculated as the corrected covered area using the Pieper et al. formula, was 1.2%, which is classed as 'slight'.

Thirdly, although four SRs were methodologically sound (Oxman checklist score ≥ 6), they were based on poor-quality primary data, which might seem contradictory.

Fourthly, we did not evaluate whether there was evidence for small-study effects using funnel plot asymmetry [23] (publication bias) because of insufficient data.

Fifthly, reviewing SRs might overlook nuances embedded in the original data, such as conflicts of interest, sources of funding, validity and generalisability.

Sixthly, various animal, human and in vitro models; different modes of administration; and exogenous and endogenous MLT were frequently analysed together, thereby giving limited understanding of how the results vary depending on the health outcomes evaluated.

Lastly, there is no commonly accepted cut-off point for differentiating NRs from SRs using the Oxman scoring system. For example, a review that scored 2–3 on the scale (indicating the presence of major flaws) could be classified as either an NR or an SR, as the definition itself is arbitrary. Similarly, reviews that could be judged as narrative with extensive flaws (a score of 1 or below), e.g. De Jonghe et al. [71], may report the number of primary studies and the total sample size (9 and 330, respectively). On the other hand, reviews with minimal or no flaws (a score of 6–9) may not report that information, e.g. Liira et al. [38]. Taken together, these limitations reduce the conclusiveness of our findings and leave them open to criticism.

Conclusions

Despite the abundance of evidence, more systematic research is needed to understand and establish the connection between MLT and specific aspects of health, potentially as a function of important lifestyle choices.