Introduction

Hypnosis can be an effective treatment for numerous disorders [1,2,3]. The hypnotic response may depend on factors such as hypnotisability, the hypnotic state, expectations [4,5,6], motivation, the therapeutic relationship, etc. [7]. Hypnotisability is a personal trait that greatly affects treatment outcomes [8,9,10,11,12,13]. Nevertheless, hypnosis is currently defined as a state of consciousness [14], and this study is focused solely on the hypnotic state. Hypnotic “depth” has been a contentious issue, but it is still used in contemporary research to describe the hypnotic state quantitatively [5, 15,16,17,18,19,20,21,22], alongside the related concepts of “deep hypnosis” [6, 18, 19, 23,24,25,26,27,28,29,30,31] and “deepening” [16, 32,33,34,35,36,37,38,39]. Multiple studies indicate that sufficient depth is beneficial in some cases, e.g., in non-pharmacological analgesia [24, 25, 40], general hypnotic anaesthesia [26, 30, 33], etc. Greater depth could result in subjects’ feeling more influenced by hypnotic procedures, leading to better compliance [41, 42]. Depth self-ratings can correlate with hypnotisability scores [19].

A hypnotic state probably evolves gradually during a session and tends to fluctuate [43]. The potential value of estimating subtle changes in depth that cannot be observed visually led researchers to the idea of a “hypnometer” [40], a device that measures hypnotic depth in real time to help a practitioner decide whether to continue deepening or to begin therapeutic suggestions. Heart rate variability (HRV) [40] and an EEG-based parameter, the bispectral index (BIS), have been considered as bases for such measures. BIS is a promising method [36, 44]; however, its calculation algorithm is designed primarily for pharmacological anaesthesia rather than hypnotherapy. Moreover, although hypnosis is characterised by some common EEG correlates [18, 31, 36, 44,45,46,47,48,49,50,51,52,53,54,55,56,57], differences between subjects are also observed. An approach that addresses individual variability could therefore have benefits.

Passive brain-computer interfaces (pBCI) are used to assess mental states such as fatigue, concentration, etc. [58, 59]. We hypothesise that machine learning might be used to recognise and continuously quantify EEG correlates of hypnosis specific to a person. We aim to explore this possibility by designing a prototype system that uses an EEG-based pBCI to monitor hypnosis deepening in real time and by conducting an initial feasibility test of it.

Materials and methods

Participants

Nine outpatients (six women, mean age: 38.33 ± 10.61 years) underwent up to seven hypnosis sessions. Inclusion criteria: age 18–65 years; consent to participate. Patients had previously reported experiencing deep hypnosis, described as a lack of self-awareness, external awareness, and memories for the deepest period of a session. Thus, all included participants could probably be classified as “dissociators” [60,61,62], “amnesia-prone” [63], or “dissociative” subtype individuals [64]. This is a common, yet not the only, type of hypnotic experience [4, 5, 15, 65]. Assumed neural correlates of such phenomena were shown in several studies [29, 66]. Exclusion criteria: cognitive decline, epilepsy, psychosis, and no episode of feeling deeply hypnotised during the first session. See “Supplements A(1)” for the participants’ summary.

Hardware and software equipment

The BCI system included equipment for synchronised EEG and video recording and the following software: WinEEG 2.130.101 [67], EEGLab 2019.0 [68], and OpenVibe 2.2.0 [69]. See “Supplements A(2)” for details.

A brief description of a hypnotic session

For each patient, after the electrodes and the equipment for EEG and video recording had been set up, a baseline EEG was recorded for 3–5 min with eyes closed. Hypnosis was then induced and deepened by the counting method. After therapeutic suggestions, the patient was awakened, and feedback was collected.

The principle of the proposed approach

We used the passive type of BCI [58, 59, 70] and supervised learning [71, 72]. A recording from the first session with each patient was used as a calibration file to train a classifier. We first manually identified and then labelled the EEG intervals corresponding to two opposite states of an implied neurophysiological continuum: wakefulness (which matched the baseline recording periods) and the deepest states. The timing of the deepest states was identified using two criteria that had to be met simultaneously:

  a) The physical signs recommended to verify sufficient depth [23, 24, 26]: substantial changes in breathing, relaxation of facial muscles, etc.

  b) The patients’ post-session feedback on the hypnotherapist’s counting range during which they felt most deeply hypnotised (see “Supplements A(3)” for this procedure’s details).

The file with two sets of labels was then used to train a classifier to continuously recognise (“predict”) these two opposite states in subsequent sessions with interdependent probabilities. Assuming that deepening is a continuous transition from wakefulness to the deepest hypnotic state, we hypothesise that continual real-time measurement of the probability of deep hypnosis during a session could tentatively, to some extent, serve as a quantitative reflection of the deepening process.

Analysis and processing of obtained recordings

The collected data were processed in four ways.

  a) Using WinEEG to compare the averaged power spectra of the identified deep hypnosis and waking periods (5–10 min each) in each EEG, we obtained an overview of each patient’s assumed deep hypnosis patterns (which rhythms at which locations tend to change while the patient is deeply hypnotised). We also assessed their putative stability throughout the treatment by qualitatively comparing them across each patient’s sessions. This analysis was conducted in parallel with all the others.

  b) During supervised learning, we trained prediction models, and those derived from the first sessions were then used in the following sessions for real-time classification. The Common Spatial Pattern (CSP) [73,74,75,76] method was employed for spatial filtering of the signal, and Linear Discriminant Analysis (LDA) was the classification method. We tested four frequency bands (1.5–45 Hz, 1.5–8 Hz, 1.5–14 Hz, and 4–15 Hz) to determine which could yield the models with the highest classification accuracy, as assessed by the 10-fold cross-validation test. To obtain cross-validation results for all sessions, the training procedure had to be performed for each session as it was for the first (calibration) one. Thus, the second and each subsequent session produced “auxiliary” models. See “Supplements A(3)” for details.

  c) Using models derived from the first sessions for real-time state predictions in subsequent sessions yielded a Probability Value parameter, varying between 0 and 1 and displayed as a moving curve, which informed the hypnotherapist of the probability that a deep hypnotic state was occurring. We called this curve the Predictive curve. See “Supplements A(4)” for details.

  d) The second and each subsequent session was labelled by a specialist not privy to the predictive outcomes, and the percentage of correctly predicted states for different epochs was calculated as an additional test of accuracy [77]. The above-mentioned “auxiliary” models from these sessions were applied to classify the same data on which they had been trained, in order to plot a curve that reflected depth dynamics in a session most accurately: the Native curve. The Native and Predictive curves of the same sessions were then visually compared. For details, see “Supplements A(5)”.
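The chain described in steps b) and c), that is, CSP spatial filtering followed by LDA yielding a probability of the deep state, can be illustrated with a minimal sketch. It is not the OpenVibe pipeline used in the study: it assumes NumPy, synthetic 8-channel data, equal class priors, and log-variance features, and all function names (`csp_filters`, `log_var_features`, `fit_lda`, `prob_deep`) are hypothetical.

```python
import numpy as np

def csp_filters(epochs_a, epochs_b, n_pairs=2):
    """CSP filters from two classes of epochs shaped (n_epochs, n_channels, n_samples)."""
    def mean_cov(epochs):
        # Trace-normalised spatial covariance, averaged over epochs.
        return np.mean([e @ e.T / np.trace(e @ e.T) for e in epochs], axis=0)

    cov_a, cov_b = mean_cov(epochs_a), mean_cov(epochs_b)
    # Whiten the composite covariance, then diagonalise class A in the whitened space.
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    _, rotation = np.linalg.eigh(whitener @ cov_a @ whitener.T)
    filters = rotation.T @ whitener  # rows = spatial filters, ascending eigenvalue
    keep = list(range(n_pairs)) + list(range(-n_pairs, 0))  # both extremes
    return filters[keep]

def log_var_features(epochs, filters):
    """Log-variance of each spatially filtered epoch: shape (n_epochs, n_filters)."""
    projected = np.einsum("fc,ecs->efs", filters, epochs)
    return np.log(projected.var(axis=2))

def fit_lda(x_wake, x_deep):
    """Two-class LDA with a pooled covariance and equal priors (an assumption)."""
    mu_w, mu_d = x_wake.mean(axis=0), x_deep.mean(axis=0)
    centred = np.vstack([x_wake - mu_w, x_deep - mu_d])
    pooled = centred.T @ centred / (len(centred) - 2)
    w = np.linalg.solve(pooled, mu_d - mu_w)
    b = -0.5 * w @ (mu_d + mu_w)
    return w, b

def prob_deep(x, w, b):
    """Posterior probability of the deep state; P(wake) is 1 minus this value."""
    z = np.clip(x @ w + b, -50.0, 50.0)  # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in for labelled calibration data (not real EEG).
rng = np.random.default_rng(0)

def synth(n_epochs, strong_channel):
    x = rng.standard_normal((n_epochs, 8, 500))
    x[:, strong_channel, :] *= 3.0  # one channel carries more power per class
    return x

wake_train, deep_train = synth(40, 7), synth(40, 0)
filters = csp_filters(wake_train, deep_train)
w, b = fit_lda(log_var_features(wake_train, filters),
               log_var_features(deep_train, filters))

# "Subsequent session": new epochs are scored with the calibration model.
p_wake = prob_deep(log_var_features(synth(20, 7), filters), w, b)
p_deep = prob_deep(log_var_features(synth(20, 0), filters), w, b)
print(round(float(p_wake.mean()), 3), round(float(p_deep.mean()), 3))
```

Because the two posterior probabilities sum to 1, a single value per epoch is enough to draw a curve such as the Predictive curve. The band-pass filtering step that precedes CSP in the study is omitted here.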

Results and their discussion

Patient T reported no periods of deep hypnosis in the first session and was excluded from the analysis. Owing to artefacts, the recording from session #5 of Patient E was also excluded. Thus, the total number of EEGs from the remaining eight participants was 27.

Results of the qualitative assessment of the estimated patterns of deep hypnosis in patients and their stability throughout the treatment

Figure 1 demonstrates examples of topographic maps representing these results for three patients.

Fig. 1

Comparison of power spectra between periods of deep hypnosis and wakefulness

This is an example of topographic maps displaying the differences in the averaged power spectra between the deep hypnosis and waking periods of the EEG (“deep hypnosis” minus “wakefulness”) for three patients. The sessions displayed were selected arbitrarily. The maps show how the power of different rhythms changed at different locations as deep hypnosis was reached. Power changes are colour-coded according to the adjacent colour scale (in µV²). The bands used are: Delta (1.5–4 Hz), Theta (4–8 Hz), Alpha (8–12 Hz), Sensory-motor or Low beta (12–15 Hz), Beta1 (15–18 Hz), Beta2 (18–25 Hz) and Gamma (25–45 Hz).

As seen, the electrophysiological changes tend to be generally similar across different sessions of a particular patient, which suggests that a classifier could correctly recognise the target patterns in the following sessions. Correlates common to the patients are seen, e.g., an increase in slow-wave activity, which is consistent with the literature [18, 31, 45,46,47,48,49,50,51,52,53,54,55,56,57]. Several differences between individuals are also shown; therefore, an individualised approach to quantifying hypnosis appears preferable. These points suggest that machine learning may be applicable to hypnosis. However, the qualitative analysis is approximate, and further statistical assessment may be useful.

“Supplements B” contain all the maps for all 27 sessions for all patients and the details on electrophysiological changes revealed.
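The “deep hypnosis” minus “wakefulness” comparison of averaged band power can be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy with a plain Hann-windowed periodogram on synthetic data, its scaling is not calibrated to the µV² values produced by WinEEG, and the names `band_power`, `BANDS_HZ`, and the 250 Hz sampling rate are hypothetical.

```python
import numpy as np

# Band limits taken from the caption of Fig. 1.
BANDS_HZ = {"Delta": (1.5, 4), "Theta": (4, 8), "Alpha": (8, 12),
            "Low beta": (12, 15), "Beta1": (15, 18), "Beta2": (18, 25),
            "Gamma": (25, 45)}

def band_power(segment, fs, bands=BANDS_HZ, win_s=4.0):
    """Average band power per channel for a (n_channels, n_samples) segment."""
    n = int(win_s * fs)
    taper = np.hanning(n)
    n_windows = segment.shape[1] // n
    # Cut the segment into non-overlapping windows and average their periodograms.
    windows = segment[:, :n_windows * n].reshape(segment.shape[0], n_windows, n)
    spectrum = np.abs(np.fft.rfft(windows * taper, axis=2)) ** 2
    psd = spectrum.mean(axis=1) / (fs * np.sum(taper ** 2))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return {name: psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
            for name, (lo, hi) in bands.items()}

# Synthetic stand-in: 60 s of noise per state, with an added 6 Hz rhythm
# on channel 0 in the "deep" segment only.
fs = 250
rng = np.random.default_rng(1)
t = np.arange(60 * fs) / fs
wake = rng.standard_normal((4, t.size))
deep = rng.standard_normal((4, t.size))
deep[0] += 2.0 * np.sin(2 * np.pi * 6.0 * t)

power_wake, power_deep = band_power(wake, fs), band_power(deep, fs)
# Per-band, per-channel difference: the quantity a topographic map would colour.
difference = {band: power_deep[band] - power_wake[band] for band in BANDS_HZ}
print({band: int(np.argmax(values)) for band, values in difference.items()})
```

In this toy case the theta difference is concentrated on channel 0, which is the kind of pattern the maps are meant to reveal across electrode locations.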

Results of the 10-fold cross-validation test of classification accuracy

Table 1(A) shows that the accuracy exceeded 85% in the majority of cases. Results of 100% were possibly due to the optimistic calculation outcomes of the software [78] or the over-fitting phenomenon [79], which are undesirable and should be addressed further.
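For readers unfamiliar with the procedure, a 10-fold cross-validation estimate can be sketched in a few lines. This is a generic illustration, not the routine implemented in OpenVibe: the one-dimensional toy features, the midpoint classifier, and the names `k_fold_accuracy`, `fit_midpoint`, and `predict_midpoint` are all hypothetical.

```python
import random

def k_fold_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Mean held-out accuracy over k folds of shuffled epoch indices."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # The model never sees the held-out epochs during fitting.
        model = fit([features[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, features[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / k

def fit_midpoint(xs, ys):
    """Toy classifier: threshold halfway between the two class means."""
    mean_wake = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean_deep = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean_wake + mean_deep) / 2

def predict_midpoint(threshold, x):
    # Assumes the "deep" class (label 1) has the larger feature values.
    return int(x > threshold)

# Toy one-dimensional features: label 0 = wakefulness, label 1 = deep hypnosis.
features = [0.1 * i for i in range(20)] + [5 + 0.1 * i for i in range(20)]
labels = [0] * 20 + [1] * 20
accuracy = k_fold_accuracy(features, labels, fit_midpoint, predict_midpoint)
print(accuracy)
```

With perfectly separable toy data such as these, every fold is classified correctly, which shows that a result of 100% can arise whenever the held-out epochs closely resemble the training epochs.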

Table 1 Deep hypnosis and wakefulness classification accuracy according to the band used to train the models

Results of real-time visual testing of the method (in the second and subsequent sessions)

As an example of such results, Fig. 2(A) shows Patient A’s Predictive curve from his seventh session.

Fig. 2

Examples of the Predictive (A) and Native (B) curves. The horizontal axis is a time scale (s), and the vertical axis is a Probability Value scale (from 0 to 1). (A) The Predictive curve graph was played back using data from the seventh session of Patient A. In this example, the classification model was trained using the 1.5–14 Hz range. The curve was smoothed by the Moving epoch average (Immediate) function. The number of 4-s epochs with an overlap of 0.5 s used for averaging was 50. This curve was obtained by feeding the patient’s real-time EEG during the seventh session to the model trained on data from the first session, and it represents the changing probability of a deep hypnotic state over the session. “Supplements C” contain a detailed case-related analysis of how it could potentially describe session dynamics. (B) The Native curve of the seventh session with Patient A. The band and the smoothing settings are the same as for the Predictive curve. This curve was obtained after this (seventh) session by training an auxiliary model on data from the same session and then feeding the same EEG recording to this model. Thus, this curve, which could only be constructed after the session was over, reflects the changes in the probability of deep hypnosis with very high accuracy.

“Supplements C” contain the Predictive curves for each patient. In general, patients’ post-session reports revealed an approximate match between the time when a curve was consistently above 0.7 (approximately, depending on smoothing settings) and a period of unawareness. High-amplitude wave-like motion of the curve was associated with alternating awareness, fragmentary memories, etc. Thus, this could potentially provide instrumental confirmation of the literature reports of hypnotic depth variations during a session [43]. “Supplements C” contain a detailed analytical description based on an individual case.
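The smoothing and the reading of the curve against the approximate 0.7 level can be sketched as below. This is an illustration only: it assumes that the “Immediate” averaging mode outputs the mean of however many epochs are available so far, and that consecutive probability values are a fixed step apart; the function names and the `step_s` value of 0.5 s are hypothetical and do not reproduce the OpenVibe settings.

```python
from collections import deque

def moving_epoch_average(values, n_epochs=50):
    """Running mean over the last n_epochs values, emitted from the first value on."""
    window = deque(maxlen=n_epochs)  # oldest value drops out automatically
    smoothed = []
    for value in values:
        window.append(value)
        smoothed.append(sum(window) / len(window))
    return smoothed

def intervals_above(smoothed, threshold=0.7, step_s=0.5):
    """(start_s, end_s) spans during which the curve stays at or above threshold."""
    spans, start = [], None
    for i, p in enumerate(smoothed):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            spans.append((start * step_s, i * step_s))
            start = None
    if start is not None:  # curve still above threshold at the end
        spans.append((start * step_s, len(smoothed) * step_s))
    return spans

# Toy raw classifier output: wake, then a stable deep period, then wake again.
raw = [0.0] * 10 + [1.0] * 20 + [0.0] * 10
curve = moving_epoch_average(raw, n_epochs=4)
print(intervals_above(curve, threshold=0.7, step_s=0.5))
```

A longer averaging window gives a smoother curve but delays the moment at which it crosses the threshold, which is one reason the 0.7 level can only be approximate and dependent on the smoothing settings.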

Results of the classification accuracy evaluation based on the data from the second and subsequent sessions

Table 1(B) shows these results. As seen, the accuracy was high in many cases. Some poor results were due to either myographic artefacts or drift [80]. For a detailed analysis of the case-related findings, see “Supplements D”. In each session, at least one band yielded an accuracy exceeding 74%. Also, each patient had their “preferred” band for which the model was most accurate, e.g., for Patient A, it was 1.5–14 Hz; for Patients G and S, 1.5–8 Hz, etc. This might be consistent with the variation in findings on electrophysiological correlates of hypnosis reported in the literature [31].

The accuracy averaged across all these sessions was highest when using bands 1.5–14 and 4–15 Hz. This is also in line with studies that reported the most changes in alpha and theta activity [18, 31, 36, 51, 54, 55].

Results of a visual comparison of the Predictive and Native curves

Figure 2(B) shows the Native curve of Patient A’s seventh session. The configurations of curves (A) and (B) largely coincide, giving us additional confirmation that the model can reflect the real picture relatively accurately. “Supplements C” contain the pairs of the Predictive and Native curves for each patient for the second and following sessions.

This study extends the idea of a “hypnometer” [40] but focuses on direct monitoring of brain activity rather than peripheral measurements. Using BIS for this purpose is promising [36, 44]; however, we suggest that a new approach, which addresses individual correlates of neural activity, may have benefits. It could be used to optimise therapy by controlling depth more precisely in cases where sufficient depth can be helpful [24,25,26, 30, 32, 33, 40,41,42,43, 81].

The designed system is a trial version only and requires substantial further improvement, drawing on both the results and the issues we identified. However, we suggest that it provides an initial demonstration that pBCI can be applied to hypnosis.

Limitations

  • Small sample size, the heterogeneous number of sessions, and the artefacts in some recordings.

  • Involvement only of those patients who described their deepest experiences as a lack of external awareness and self-awareness. Although these phenomena are common in hypnotic experiences [4, 5, 15, 29, 65, 66], they are probably characteristic only of “dissociators” [60,61,62] or “amnesia-prone” individuals [63]. The “fantasizers” [60,61,62] or “fantasy-prone” individuals [63] may have fallen outside our focus, and further research should include this group.

  • We did not measure the participants’ hypnotisability. Trait effects are considered to have a different basis [4,5,6,7, 9], and research on the interaction of state and trait is a serious task that we believe deserves a separate investigation.

  • A single rater was used to label each given recording.

  • Using the measurement of the continuously changing deep hypnosis probability as a tentative reflection of the deepening process is a hypothetical idea in the early stages of testing. The qualitative analysis conducted could partly underpin the information from a Predictive curve, but it is not comprehensive. Further studies, utilising the classification of intermediate levels of hypnosis, are needed.

  • The qualitative analysis of estimated hypnotic patterns is approximate, and further studies can incorporate statistical assessment to strengthen these findings as well as quantify differences in accuracies across subjects and bands and identify similarities between the Native and Predictive curves.

  • Our system is a prototype only, and the signal processing techniques used are an initial option rather than a comprehensive set. Further research is necessary to test other feature extraction approaches and classification methods [74, 82].