Postoperative sleep disturbances are common; several studies have reported an incidence ranging from 19.5% to 70%.1,2,3,4 Although pain is one of the most important contributing factors to postoperative sleep disturbance, patients with low postoperative pain levels also suffer from poor sleep quality.5,6 Postoperative sleep disturbance is also associated with subsequent cognitive dysfunction,7 and there is expert consensus that postoperative sleep quality is one of the patient outcomes to be studied during the postoperative period.8 Accordingly, the prevention of postoperative sleep disturbance is attracting attention.

Melatonin is a central circadian regulator, primarily produced by the pineal gland, and a normal melatonin rhythm is partly instrumental in the regulation of the sleep–wake cycle.9 The circadian rhythm of melatonin secretion is disturbed after surgery,10,11 and this disturbance is thought to contribute to postoperative sleep disturbance.12 Therefore, melatonin administration may modify the circadian rhythm and improve postoperative sleep disturbance. Furthermore, melatonin is known to have an excellent safety profile13 and has no major adverse effects, such as respiratory depression. Therefore, melatonin has been researched in nonperioperative settings such as insomnia,14 jet lag,15 and shift work.16 In the perioperative period, melatonin is reportedly effective in decreasing preoperative anxiety,17 emergence agitation,18 and postoperative delirium.19 Since ramelteon, one of the melatonin agonists, has a six-fold higher affinity for MT1 receptors and a three-fold higher affinity for MT2 receptors than that of melatonin,20,21 ramelteon may be more effective than melatonin in improving sleep quality.

The few studies investigating the preventive effect of melatonin administration on postoperative sleep quality have not reached a definitive conclusion.22,23,24 Therefore, a systematic review and meta-analysis of these small studies could add value to our current knowledge of postoperative melatonin administration. Zhang et al. conducted a systematic review and meta-analysis of randomized controlled trials (RCTs) and suggested that melatonin intervention did not significantly influence sleep quality compared with a control group.25 Nevertheless, since they limited the surgery to laparoscopic cholecystectomy and analyzed only two RCTs for sleep quality, their analysis may have been underpowered.

The primary purpose of this systematic review was to compare the effects of melatonin or melatonin agonists on postoperative sleep quality with those of no treatment or placebo in adult patients who underwent surgery under general or regional anesthesia. Secondarily, we sought to assess the effects on pain, opioid consumption, quality of recovery, and any adverse events.

Methods and analysis

The manuscript was prepared following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.26 The protocol for this systematic review was registered in PROSPERO on 27 October 2020, with registration number CRD42020180167 and published elsewhere.27

Eligibility criteria

We included all RCTs that tested the effect of melatonin or melatonin agonists on postoperative sleep quality in adult patients (≥ 18 yr) undergoing general or regional anesthesia with sedation for any surgery. We excluded patients who received surgical intervention under regional anesthesia without sedation. We excluded data from case reports, observational studies, reviews, and animal studies. Eligibility was not restricted by language, type of surgery, or type of anesthesia.

The intervention of interest was the perioperative (seven days before and after the date of surgery) administration of melatonin or melatonin agonists. There were no restrictions on dosing, frequency, timing, route of administration, or therapy duration. No treatment or placebo was included as the control intervention.

Information sources and search strategy

We conducted a search in MEDLINE, Cochrane Central Register of Controlled Trials (CENTRAL), EMBASE, and Web of Science. The final search was conducted on 18 April 2022. We also searched the reference lists of relevant articles. Furthermore, we conducted a search of ClinicalTrials.gov, the World Health Organization (WHO) International Clinical Trials Registry Platform, and the University Hospital Medical Information Network (UMIN) Clinical Trials Registry. We searched the gray literature using Web of Science. The literature search was limited to studies with human participants. The search strategy, combining free text and Medical Subject Headings terms for PubMed, is shown in Electronic Supplementary Material (ESM) eTable.

Study records

Two authors (A. T. and T. T.) independently scanned the titles and abstracts of the reports identified using the search strategies described above. When eligibility was determined based on the title or abstract, the full paper was retrieved. Potentially relevant studies chosen by at least one author were retrieved and evaluated in full-text versions. Articles that met the inclusion criteria were assessed independently by two authors, and any discrepancies were resolved through discussion.

Two authors (A. T. and T. T.) extracted data independently and in duplicate from each eligible study. The reviewers resolved disagreements through discussion. We contacted the authors of the study to resolve any uncertainties.

Outcomes and prioritization

Primary outcome

The primary outcome was sleep quality measured using a visual analog scale (VAS) (0 mm = best conceivable sleep and 100 mm = worst conceivable sleep) during the early postoperative period. We defined the early postoperative period as the period between the first postoperative night and the fourth night after surgery. If the outcomes were measured several times during the early postoperative period, we considered the mean of these values as our primary outcome. Although we defined the early postoperative period as the interval between the day after surgery and three days after surgery (i.e., between the second postoperative night and the fourth night after surgery) in the protocol,27 we changed the definition as described above because almost all included studies assessed sleep quality from the first night after surgery.

Secondary outcomes

The secondary outcomes were as follows:

  1. Total sleep time (minutes);

  2. Sleepiness during the daytime was measured using VAS, the Karolinska Sleepiness Scale (KSS),28 or the Stanford Sleepiness Scale (SSS);29

  3. Quality of recovery assessed by the QoR-4030 or QoR-15 measures;31

  4. Opioid consumption (expressed as cumulative dose [mg] of intravenous morphine or morphine equivalent);

  5. Pain as assessed using validated assessment tool scores, such as a VAS, numerical rating scale (NRS), and verbal categorical rating scale; and

  6. Any adverse events such as dizziness, desaturation event, or delayed recovery.

Risk of bias in individual studies

We assessed the risk of bias using version 2 of the Cochrane risk-of-bias tool for randomized trials (RoB 2).32 The RoB 2 assessment for individually randomized trials (including crossover trials) comprises five domains and one overall risk-of-bias judgment, as follows:

  1. Bias arising from the randomization process;

  2. Bias due to deviations from intended interventions;

  3. Bias due to missing outcome data;

  4. Bias in the measurement of outcomes;

  5. Bias for selection of the reported result; and

  6. Overall risk of bias

The risk of bias was assessed as “low,” “some concerns,” or “high” in each domain.

Data synthesis

We summarized continuous data using mean differences (MDs) or standardized mean differences (SMDs) with a 95% confidence interval (CI), and dichotomous data as risk ratio (RR) with 95% CI. If the 95% CI included a value of 0 or 1 for continuous or dichotomous data, respectively, we considered the difference not statistically significant. We contacted the original authors of the study to obtain the relevant missing data. Heterogeneity was quantified using the I2 and Cochran’s Q statistics. We considered significant heterogeneity to exist when the I2 statistic exceeded 50%. We conducted a subgroup analysis to explore the possible causes of the high heterogeneity. We used a random-effects model (DerSimonian and Laird methods33) considering clinical and methodological heterogeneity to combine the results.
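The pooling step described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run with the R "meta" package); the function name and the input numbers are hypothetical, and the sketch assumes each study contributes an effect estimate and its variance. It computes Cochran's Q, the I2 statistic, the DerSimonian and Laird between-study variance, and the random-effects pooled estimate with a 95% CI.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooling (DerSimonian-Laird); inputs are per-study
    effect estimates (e.g., mean differences) and their variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    k = len(effects)
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
    return {
        "pooled": pooled, "ci": (lo, hi), "q": q, "i2": i2, "tau2": tau2,
        # for mean differences, a CI that includes 0 is "not significant"
        "significant": not (lo <= 0.0 <= hi),
    }

# Hypothetical mean differences (mm) and standard errors, for illustration only
example = dersimonian_laird([-2.0, 1.5, -4.0, 0.5],
                            [se ** 2 for se in (3.0, 4.0, 5.0, 3.5)])
```

For risk ratios, the same function could be applied to log risk ratios and the result exponentiated, in which case the null value is 1 rather than 0.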

We conducted a subgroup analysis according to the following predefined factors when the I2 statistic exceeded 50%: 1) type of anesthesia (regional anesthesia, inhaled general anesthesia, or total intravenous general anesthesia); 2) type of surgery; 3) timing of surgery (daytime vs nighttime); 4) drug type (melatonin vs melatonin agonists); 5) dose of melatonin/melatonin agonists; 6) age; or 7) type of control (placebo vs no treatment). Subgroup analyses were not performed if there were fewer than three studies. Sensitivity analysis excluding studies with a high risk of bias was performed for the primary outcome.

Trial sequential analysis (TSA) was performed to correct for random error and repetitive testing of accumulating and sparse data34 using TSA viewer version 0.9.5.10 β (www.ctu.dk/tsa).35 The risk of type 1 error was maintained at 5%, with a power of 90%. We considered a 10-mm reduction in MD for the primary outcome clinically meaningful. We used diversity (D2) as an estimator of heterogeneity for the required information size calculation.36 We assumed a D2 of 30%, or the model variance-based value if it was higher than 30%. If the cumulative z-curve did not cross the TSA monitoring boundaries, we downgraded the certainty of the evidence owing to imprecision in the results.
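As a rough illustration of how a diversity-adjusted required information size follows from these settings, the sketch below uses the standard two-group sample size formula for a continuous outcome and inflates it by 1/(1 - D2). It is a simplified approximation, not the TSA software's implementation; the function name is hypothetical, and the standard deviation of 20 mm in the example is an assumed value that is not reported in this review.

```python
from statistics import NormalDist

def required_information_size(delta, sd, alpha=0.05, power=0.90, diversity=0.30):
    """Approximate total number of participants (both groups combined) needed
    to detect a mean difference `delta` given a common standard deviation `sd`,
    inflated for heterogeneity by the diversity D2."""
    if not 0.0 <= diversity < 1.0:
        raise ValueError("diversity must be in [0, 1)")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided type 1 error
    z_beta = NormalDist().inv_cdf(power)               # power = 1 - type 2 error
    n_fixed = 4.0 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return n_fixed / (1.0 - diversity)

# 10-mm difference, 5% type 1 error, 90% power, D2 = 30%; sd = 20 mm is assumed
ris = required_information_size(delta=10.0, sd=20.0)
```

Because the variance used by the authors is not stated here, the output of this sketch should not be expected to match the required information sizes reported in the Results.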

Statistical analyses were performed using R software, version 4.1.0 (R Foundation for Statistical Computing, Vienna, Austria), and the “meta” package was used to perform the meta-analysis.

Reporting bias and publication bias

To determine whether reporting bias was present, we determined whether the RCT protocol was published before patients were recruited for the study. For studies published after July 1, 2005, we screened the Clinical Trial Register at ClinicalTrials.gov (https://clinicaltrials.gov/), the WHO International Clinical Trials Registry Platform (https://www.who.int/clinical-trials-registry-platform), and the UMIN Clinical Trials Registry (https://www.umin.ac.jp/ctr/). We evaluated whether selective reporting of outcomes was present (outcome reporting bias) by comparing the outcomes mentioned in the published study protocol or trial registry with the outcomes reported in the paper. The small study effect was assessed using a funnel plot and Egger’s regression asymmetry test37 and was considered positive if P < 0.1 in the regression asymmetry test.
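The regression asymmetry test mentioned above can be sketched as follows: each study's standardized effect (effect divided by its standard error) is regressed on its precision (the reciprocal of the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is a minimal illustration with a hypothetical function name, not the authors' implementation. It returns the intercept, its standard error, and the t statistic; the P value would come from a t distribution with k - 2 degrees of freedom, which is left out to keep the example dependency-free.

```python
import math

def egger_regression(effects, std_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE); the intercept
    is the Egger asymmetry coefficient."""
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = rss / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t_stat, "df": n - 2}
```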

Summary of evidence

We graded the certainty of the evidence of the main outcomes using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach38,39 with the GRADEpro guideline development tool (https://gradepro.org/). The certainty of the evidence was judged based on the presence or absence of the following variables: limitations in the study design, inconsistency, indirectness, imprecision of the results, and publication bias. The certainty of the evidence for the main outcomes was graded as very low, low, moderate, or high.

Results

Search selection and study characteristics

In the initial search of the databases, 2,526 articles were identified. We examined the full texts of 40 articles in detail. Of these, 19 trials were included in this systematic review, and eight trials with 516 patients were included for quantitative synthesis of sleep quality. The PRISMA flow diagram detailing the disposition of the retrieved publications is shown in Fig. 1. The features of the randomized studies included in this meta-analysis are listed in Table 1. Oral or sublingual melatonin or ramelteon was used in 17 trials, intravenous melatonin was used in one trial, and a dermal melatonin patch was used in one trial. The doses of oral melatonin or ramelteon ranged from 5.0 to 12.0 mg, and 13 of the 19 studies used only a short duration of melatonin administration. Of these, six studies investigated melatonin administration on the night before and on the day of surgery, and seven studies assessed melatonin use during the day of the surgery, before surgery only (i.e., lack of nighttime dosing). The type of control was placebo except for one study where there was no control. To date, there have been no studies on regional anesthesia under sedation.

Fig. 1

PRISMA flow diagram. PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Table 1 Features of the included studies measuring sleep quality using a visual analog scale

Sleep quality assessed by VAS

Eight trials evaluated postoperative sleep quality using the VAS.23,24,40,41,42,43,44,45 The time point of assessment of sleep quality varied between the included studies. The combined results are shown in Fig. 2. Melatonin did not improve VAS scores compared with placebo (MD, -0.75 mm; 95% CI, -4.86 to 3.35; I2, 5%; Cochran’s Q, 7.38). The CI was corrected to -5.46 to 3.97 by TSA. Trial sequential analysis revealed that the accrued information size (n = 516) reached the estimated required information size (n = 295) (ESM eFig. 1). We considered three trials to have some concerns of risk of bias, whereas five were at a high risk of bias (Fig. 3). A sensitivity analysis that removed studies with a high risk of bias showed results consistent with the primary meta-analysis (MD, -0.37 mm; 95% CI, -8.60 to 7.87; I2, 0%; Cochran’s Q, 0.14; ESM eFig. 2). We did not conduct a subgroup analysis because the I2 statistic did not exceed 50%. We did not conduct an asymmetry test for the funnel plot because only eight trials were included. The overall GRADE certainty of the evidence was rated moderate (Table 2). The evidence was downgraded for the risk of bias.

Fig. 2

Meta-analysis of sleep quality measured by visual analog scale. Graphs represent the mean differences (MDs) with 95% confidence intervals (CIs).

Fig. 3

Risk of bias in the included studies measuring sleep quality by visual analog scale

Table 2 Evidence profile

Sleep time

Three trials41,44,45 evaluated perioperative sleep time by actigraphy on the preoperative night and on the first to third postoperative nights. Postoperative sleep time was expressed as the difference from the preoperative sleep time and averaged over the three postoperative nights. The combined results are shown in ESM eFig. 3. Melatonin and ramelteon did not improve sleep time compared with placebo (MD, 30.2 min; 95% CI, -1.7 to 62.0; I2, 47.5%; Cochran’s Q, 3.81). We considered all trials to be at high risk of bias (ESM eFig. 4). The CI was corrected to -30.81 to 91.10 by TSA. The TSA showed that the estimated required information size was 382; however, the accrued information size was only 125 (32.7%). The z-curve did not cross the TSA monitoring boundary or reach the required information size (ESM eFig. 5). This indicates that sufficient data were not accumulated to conclusively determine whether melatonin improves sleep duration. The overall GRADE certainty of the evidence was rated very low (Table 2). The evidence was downgraded for the risk of bias, inconsistency, and imprecision.

Sleepiness

Five trials evaluated postoperative sleepiness using the VAS,45,46 KSS,40,44 and SSS.41 We did not combine these results because the measurements differed in nature; the VAS score measures only sleepiness whereas the KSS and the SSS contain several conditions (alert/vital/active, relaxed, and sleepy) that are not always considered sequential. Moreover, the absolute scores for the relaxation state considered the best condition differed between the KSS and the SSS. In studies measuring sleepiness with KSS, the MD was 0.00 (95% CI, -2.70 to 2.70)43 and -0.78 (95% CI, -4.51 to 2.95).40 In studies measuring sleepiness with VAS, the MD was 0.51 (95% CI, -1.16 to 2.18)46 and 0.56 (95% CI, -0.24 to 1.36).45 In a study measuring sleepiness with SSS,41 the MD was 0.00 (95% CI, -0.39 to 0.39) (ESM eFig. 6). We considered three studies to be at a high risk of bias and two studies to be at some concerns of risk of bias (ESM eFig. 7).

Pain

Fifteen trials evaluated postoperative pain using a VAS23,24,41,42,43,44,45,46,47,48,49,50,51 or NRS.52,53 Scores recorded up to postoperative day 3 were averaged. We considered an NRS score (measured from 0 to 10) as equivalent to the VAS and multiplied NRS values by ten before synthesizing them with the VAS values. The combined results are presented in Fig. 4. Melatonin improved pain compared with placebo (MD, -6.89 mm; 95% CI, -11.50 to -2.28; I2, 86%; Cochran’s Q, 103.32). We considered seven, six, and two studies to be at high risk of bias, some concerns of risk of bias, and low risk of bias, respectively (ESM eFig. 8). The CI was corrected to -11.89 to -1.88 by TSA. Trial sequential analysis revealed that the accrued information size (n = 962) reached the estimated required information size (n = 562) (ESM eFig. 9). The overall GRADE certainty of the evidence was rated very low (Table 2). In post hoc analyses, we performed subgroup analyses according to 1) type of anesthesia (inhaled vs total intravenous general anesthesia), 2) type of surgery (major vs minor surgery), and 3) age (> 50 vs < 50 yr), which had been preplanned as subgroup analyses for the primary outcome. The interaction P values were 0.82, 0.10, and 0.43 for the type of anesthesia, type of surgery, and age, respectively.
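The harmonization of pain scores described in this paragraph can be sketched in a few lines of Python. The function names are hypothetical, and the sketch covers only the rescaling of a 0 to 10 NRS to the 0 to 100 mm VAS and the averaging of group means across assessment days; how standard deviations were combined across days is not described here and is not addressed.

```python
def to_vas_mm(score, scale):
    """Express a pain score on the 0-100 mm VAS; an NRS score (0-10) is
    treated as equivalent to the VAS and multiplied by ten."""
    if scale == "VAS":
        return float(score)
    if scale == "NRS":
        return float(score) * 10.0
    raise ValueError(f"unsupported scale: {scale}")

def mean_pain_mm(daily_scores, scale):
    """Average the scores reported up to postoperative day 3, in mm."""
    if not daily_scores:
        raise ValueError("at least one score is required")
    values = [to_vas_mm(s, scale) for s in daily_scores]
    return sum(values) / len(values)

# Hypothetical group means on postoperative days 1 to 3
nrs_group = mean_pain_mm([3, 2, 1], "NRS")   # reported on a 0-10 NRS
vas_group = mean_pain_mm([30, 20], "VAS")    # reported in mm on the VAS
```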

Fig. 4

Forest plot of postoperative pain measured using the visual analog scale. Graphs represent the mean differences (MDs) with 95% confidence intervals (CIs).

Opioid consumption

Nine trials23,24,40,41,46,47,49,50,54 evaluated postoperative opioid use. One study40 was excluded from the analysis because it had no opioid use in the control group. We recalculated accumulated morphine consumption. The combined results are shown in ESM eFig. 10. Melatonin and ramelteon reduced opioid consumption compared with placebo (SMD, -3.25; 95% CI, -4.50 to -2.01; I2, 97%; Cochran’s Q, 221.40). We considered three, five, and one studies to be at high risk of bias, some concerns of risk of bias, and low risk of bias, respectively (ESM eFig. 11). Trial sequential analysis showed that the accrued information size (n = 550) reached only 2.1% of the estimated required information size; thus, we could not calculate the TSA-adjusted CI. The z-curve did not cross the TSA monitoring boundary or reach the required information size (ESM eFig. 12). The overall GRADE certainty of the evidence was rated very low (Table 2).

Quality of recovery

Three trials evaluated the postoperative quality of recovery by the QoR-4041,55 or QoR-15 measures.43 The combined results are shown in ESM eFig. 13. Melatonin did not improve postoperative quality of recovery compared with placebo (SMD, 0.26; 95% CI, -0.15 to 0.67; I2, 25%; Cochran’s Q, 2.66). All trials were considered to have a high risk of bias (ESM eFig. 14). The CI was corrected to -0.78 to 1.31 by TSA. The TSA showed that the estimated required information size was 667; however, the accrued information size was only 126 (18.9%). The z-curve did not cross the TSA monitoring boundary or reach the required information size (ESM eFig. 15). The overall GRADE certainty of the evidence was rated very low (Table 2).

Adverse events and other outcomes

Six trials evaluated postoperative nausea and vomiting (PONV) using VAS,46,47,50 NRS,48 or a dichotomous variable.23,41 The results of one study50 were not used for the synthesis because the original data were not available. Visual analog scale scores ranged from 0 (absence of symptoms) to 10 (maximum symptoms). Melatonin did not affect the incidence and severity of PONV (RR, 1.20; 95% CI, 0.63 to 2.28; I2, 45%; Cochran’s Q, 1.82 and MD, -1.91 mm; 95% CI, -4.84 to 1.02; I2, 0%; Cochran’s Q, 0.03) compared with placebo (ESM eFigs 16 and 17).

Three trials23,41,56 evaluated dizziness; the combined results are shown in ESM eFig. 18. Melatonin did not affect the incidence of dizziness compared with placebo (RR, 0.78; 95% CI, 0.60 to 1.02; I2, 0%; Cochran’s Q, 1.91).

Two trials23,56 evaluated headaches; the combined results are shown in ESM eFig. 19. Melatonin did not affect the incidence of headaches compared with placebo (RR, 1.21; 95% CI, 0.78 to 1.88; I2, 0%; Cochran’s Q, 0.16).

Included studies did not report any other secondary outcomes preplanned in the published protocol.27

Discussion

Our systematic review’s results indicate that melatonin supplementation does not improve postoperative sleep quality measured with the VAS compared with placebo in adult patients (GRADE: moderate). Trial sequential analysis indicated sufficient precision for the conclusion, but we downgraded the certainty of evidence because of the high risk of bias. Furthermore, melatonin may not improve sleep time, daytime sleepiness, or quality of recovery (GRADE: very low). Nevertheless, melatonin may reduce postoperative pain scores (GRADE: very low) and opioid consumption (GRADE: very low).

Moderate-certainty evidence indicates that, compared with placebo, melatonin has no positive effect on postoperative sleep disturbance in adult patients. Both limits of the CI for the present result were smaller in magnitude than the minimal clinically important difference (10 mm on the VAS), indicating reasonably strong evidence. There are several possible reasons for the finding that melatonin did not improve postoperative sleep disturbance. First, the doses of melatonin ranged from 5 to 10 mg, which may be inadequate for improving postoperative sleep disturbances. In healthy volunteers, the same dose mediates the circadian rhythm and improves sleep disturbances.57,58 In contrast, this dose might be insufficient to shift the circadian rhythm in the perioperative period. The effects of melatonin, such as circadian phase-shifting and hypnotic effects,59 have also been reported to be dose-dependent. Therefore, a higher dose might mediate circadian rhythm and improve sleep disturbances. Second, the duration of melatonin administration was short in four of the eight studies: melatonin was given only before surgery on the day of surgery40,42 or on the day before surgery and before surgery on the day of surgery.24,43 The half-life of both oral and intravenous melatonin is 45 min;60 therefore, melatonin administered only before surgery might not affect postoperative sleep disturbances. Although postoperative administration of melatonin in four studies23,41,44,45 might have positively affected sleep disturbance, we could not discern such an effect from the forest plot (Fig. 2). Third, even if the dose or duration is sufficient and melatonin shifts the circadian rhythm, it might not have a clinical impact on postoperative sleep disturbances. A prior systematic review found that melatonin or ramelteon compared with placebo was not efficacious for insomnia disorders.14 Fourth, factors other than circadian rhythm may be related to postoperative sleep disturbances. Natural sleep has been recognized to be controlled by the combined actions of two different but related mechanisms: the sleep homeostat and the circadian rhythm.61 Even if melatonin improves disturbed circadian rhythm, postoperative sleep disturbance may remain because of disturbed sleep homeostasis. Slow-wave sleep during nonrapid eye movement sleep is a classical marker of sleep homeostatic processes. Several factors arising from perioperative periods, such as surgical endocrine62,63 and cytokine response,64,65 anesthesia,66 and opioids,67 disturb rapid eye movement sleep and slow-wave sleep. Moreover, daytime surgery might disturb sleep homeostasis, and thus lead to sleep disturbance, to a greater extent than nighttime surgery.

A prior systematic review,25 which included RCTs on laparoscopic cholecystectomy, synthesized only two RCTs and found that melatonin had no substantial effects; the present results appear to be largely in line with those of this small review. Because the present study did not restrict the type of surgery and synthesized more studies, our results can be generalized to broader populations and rest on more robust evidence than the previous systematic review.

Melatonin decreased postoperative pain scores, which is in line with a prior study.68 Previous studies suggested that melatonin may exert antinociceptive and antihyperalgesic effects through central69 and peripheral70 mechanisms. Melatonin may also affect the opioid and gamma-aminobutyric acid systems.70,71,72 Therefore, the present finding that melatonin decreased postoperative pain scores is biologically plausible. To clarify the reason for the high heterogeneity, we conducted post hoc subgroup analyses but could not identify any possible explanation. Moreover, the difference in VAS scores was associated with very low certainty of evidence. Therefore, our study could not conclude that melatonin was clinically effective at improving postoperative pain.

The results for sleep time were in line with those for the primary outcome. Although there were no statistically significant differences in sleep time, the TSA result indicates that further studies are needed. Melatonin might be biologically effective in reducing sleep disturbance and improving sleep time (MD, 30 min; 95% CI, -1.7 to 62.0), but this might not be enough to improve the patient's subjective assessment of sleep quality as measured by the VAS. Postoperative sleepiness and quality of recovery are concepts that encompass sleep disturbances. Although we did not combine the results of the studies assessing sleepiness, none of the individual studies appeared to show a positive effect on sleepiness. Melatonin did not improve the quality of recovery, which was similar to the primary outcome. Nevertheless, all secondary outcomes had a very low certainty of evidence, and it was difficult to draw plausible implications from them.

The present study has several limitations. First, many studies had a high risk of bias. The reasons for the high risk of bias are as follows: 1) an intention-to-treat analysis was not performed; 2) the number of missing outcomes was substantial and the data could not be considered to be missing completely at random; and 3) there was no protocol, and it was difficult to determine whether the reported results were selected. If patients with low melatonin efficacy were missing or if the authors failed to report unfavorable results, the direction of bias would favor melatonin. Nevertheless, our systematic review concluded that melatonin had no effect. Moreover, we did not detect inconsistency, indirectness, imprecision, or publication bias. Therefore, we downgraded the overall GRADE certainty of the evidence by only one level and rated it as moderate for the primary outcome. Second, melatonin was the only intervention drug among the included studies that assessed sleep quality. Since ramelteon has a six-fold higher affinity for MT1 receptors and a three-fold higher affinity for MT2 receptors than melatonin does,20,21 ramelteon may be more effective than melatonin in improving sleep quality. Therefore, our results cannot be extrapolated to melatonin agonists, and further RCTs are needed to determine the effects of these drugs.

In conclusion, the findings from this systematic review indicate that melatonin supplementation does not reduce sleep disturbances after general anesthesia but may reduce postoperative pain scores and opioid consumption. Further RCTs with a low risk of bias are needed to assess the effect of melatonin or melatonin agonists on postoperative sleep disturbances.