The present results confirm that the accuracy of MMA event scoring varies between recording setups, and that this is largely attributable to the ability to distinguish MMA events from OFA and OMA events. As we hypothesized, the MMA event scoring accuracy of the PSG-A setup was the most similar to that of PSG-AV. The PSG-A setup exhibited the highest ICC and the best linear correlation with PSG-AV, both of which were close to the level of the excellent intra-scorer accuracy. The possibility of excluding video recordings without a significant loss of scoring accuracy is especially beneficial for home PSG setups.
There seem to be two significant differences between the accuracies of FES-A and PSG-A: on average, more MMA events were scored with the FES-A setup, and the variance in the MMA index difference compared to PSG-AV was slightly higher. The differences in sleep stage scoring did not explain this discrepancy, since the phenomenon was still observable when events occurring during wakefulness were included in the MMA indices under comparison. The most probable explanation for the differences is the use of two EMG channels for MMA scoring in FES-A instead of the four-channel setup in PSG-A. The data in Fig. 3 reveal that more false positive events are scored with the FES-A setup than with PSG-A. It is possible that the 2-channel setups are more prone to detecting a higher number of events (true or false), categorized here as false positives, than the 3/4-channel setups. This is also the case with the 2- and 3/4-channel EMG-only setups. We conclude that there are events that are visible only on the masseter channels but not on the temporalis channels. On the other hand, as the number of false negatives with the 2-channel EMG setup was also higher than with the other EMG-only setups, there are also events that could easily be scored on three out of four masseter or temporalis EMG channels but not on both masseter EMG channels (Fig. 4a).
The poor accuracy of PSG-N in MMA scoring was somewhat of a surprise. Compared with PSG-A and FES-A, PSG-N showed a significantly lower correlation with PSG-AV, significantly more false positive and false negative events, fewer true positive and true negative events, and a larger variance in the difference of MMA indices compared to PSG-AV. Furthermore, PSG-N suffered from poor intra-scorer accuracy (0.764), a value which was significantly lower than that reported by Carra et al. for PSG-N (0.97). One possible explanation for these results could be that, with PSG-N, it is very difficult to recognize those MMA events that occur simultaneously with other body movements (16–68% of MMA events). MMA could also be confused with other activities, e.g., swallowing or yawning [11, 12, 18], which could be easily recognized from an audio recording (Fig. 4b). Some of these other activities may have clinical relevance, especially when utilized for the recognition of conditions other than SB (or their features), e.g., myoclonus related to rapid eye movement sleep behavior disorder, swallowing related to gastroesophageal reflux disorder, and movements related to restless legs syndrome. The artifacts present in the EEG, EOG, and chin EMG channels do not appear to be sufficiently specific indicators for reliable scoring of OFA/OMA events. This is supported by the present data, as the number of OFA events detected was much lower with PSG-N than with the other PSG setups (Fig. 2, Table 3). It should be noted that we did not record leg EMG, which may be one way to improve the recognition of OMA. Carra et al. did utilize leg EMG in their study and reported a higher ICC (0.92) between PSG-N and PSG-AV than observed here (0.835).
All of the EMG-only setups had very poor accuracy. Interestingly, we found that the more channels were recorded, the fewer events were scored (Fig. 1). As others have also concluded [7, 18, 20], it seems that setups based exclusively on EMG provide a poor indication of the true MMA activity. There is a significant risk of overscoring MMA events with EMG-only setups, as OFA and OMA events could not be recognized reliably (Figs. 2, 3, and 4b). Audio and video footage are considered to provide the most accurate recognition of true MMA events that involve actual tooth grinding or clenching. However, it should also be noted that even with audio and video footage, scoring is not always unambiguous, and some events involving tooth grinding or clenching might go unnoticed, e.g., when the patient is fully under the blanket, or when a true MMA-related movement blends in with other major body movements, such as a change of position, in the video footage. The significance of these concomitant MMA and movement events for the clinical consequences of SB is currently not clear and requires further examination. Nonetheless, although EMG-only setups do not provide satisfactory MMA recognition, it has been shown that they may be used for other purposes, such as determining the general level of EMG activity, e.g., by calculating the root mean square of the entire signal, which has been found to be a good indicator of the occurrence of temporomandibular pain.
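As an illustration of the root mean square measure mentioned above, the following minimal sketch computes the RMS of a whole EMG trace. The function name `rms` and the sample values are hypothetical and are not taken from the present recordings; real EMG data would normally be filtered and artifact-checked first, which is not shown here.

```python
import math


def rms(samples):
    """Return the root mean square of a sequence of EMG samples.

    Assumption: `samples` is a plain sequence of amplitude values
    (e.g., in microvolts) covering the entire recording.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    # Mean of the squared amplitudes, then the square root of that mean.
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Hypothetical amplitude values, for illustration only.
emg_trace = [2.0, -2.0, 2.0, -2.0]
general_emg_level = rms(emg_trace)
```

A single RMS value summarizes the overall level of muscle activity but carries no information about individual events, which is consistent with its use as a general activity indicator rather than as a means of MMA recognition.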
Sleep stage scoring and assessing only sleep-time events were found to improve the accuracy of all setups. The main reason for this improvement is probably that the proportion of OFA/OMA events occurring during wakefulness (PSG-AV: OFA 42%, OMA 46%, calculated from the data shown in Fig. 2) is significantly higher than the proportion of MMA events occurring during wakefulness (PSG-AV: 12%). With respect to OMA/OFA, these percentages are lower than those reported by Yamaguchi et al. (71%). Compared to the report of Carra et al., our proportion for MMA during wakefulness is somewhat lower (12% vs 26%) and for OFA somewhat higher (42% vs 26%), whereas for OMA the percentages are exactly the same (46%). However, these differences may be caused by the different study populations, which were rather small (n < 20) in all of these studies. It should be noted that MMA events during wakefulness (just as MMA during body movements) may have clinical relevance in the same way as events during sleep. In this study, the exclusion of MMA during wakefulness was used only as a means to evaluate the effect on the scoring reliability between different PSG setups, and the effect of this exclusion on the reliability of the MMA index as a predictor of the clinical consequences of SB should be assessed separately.
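The effect of excluding wake-time events can be sketched as follows. This is a minimal illustration under stated assumptions: each scored event is labeled with the sleep stage in which it occurred, the label "W" denotes wakefulness, and the index is expressed as events per hour of a caller-supplied duration. The function names, stage labels, and example values are hypothetical and do not reproduce the scoring software or data used in this study.

```python
def wake_proportion(event_stages):
    """Return the fraction of events that occurred during wakefulness ("W")."""
    if not event_stages:
        raise ValueError("event_stages must not be empty")
    wake_events = sum(1 for stage in event_stages if stage == "W")
    return wake_events / len(event_stages)


def event_index(event_stages, hours, include_wake):
    """Return events per hour, optionally leaving out wake-time events.

    Assumption: `hours` is the duration matching the chosen event set
    (e.g., total sleep time when wake-time events are excluded).
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    counted = [s for s in event_stages if include_wake or s != "W"]
    return len(counted) / hours


# Hypothetical stage labels for eight scored events, for illustration only.
stages = ["W", "N2", "N2", "N1", "W", "R", "N2", "N2"]
share_awake = wake_proportion(stages)                            # 2 of 8 events
index_sleep_only = event_index(stages, 6.0, include_wake=False)  # 6 events over 6 h
```

Because OFA and OMA events are proportionally more frequent during wakefulness than MMA events, restricting the count to sleep-time events removes more potential confounders than true MMA events.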
This study has its limitations, especially the small study population. The original, larger study population of 31 subjects had to be narrowed down to 19 subjects, as we wanted to avoid any bias in the results caused by subjects whose recordings lacked complete sets of scorable data or audio and video footage. The activities in which each subject engages during sleep (as well as their frequency) may differ vastly, and the risk of bias due to individual subjects with atypical sleep behavior is higher in small study populations. In the future, it would be preferable to verify the present results in a larger population. We had only one scorer of MMA events in this study and thus cannot provide an estimate of the possible inter-scorer differences with different sleep study setups. This issue would nevertheless be interesting to assess with a wide range of setups, in a similar fashion as in the present study. Furthermore, caution is advised in generalizing the findings of this study to PSG montages that differ from the present ones, as any missing or extra channels may affect the reliability of the scoring.
With manual MMA scoring, achieving high scoring accuracy unavoidably involves a certain level of inconvenience. PSG-A and FES-A were the two most accurate of the setups but also the most time-consuming to score, especially due to sleep staging and listening to the audio associated with every possible event. There seems to be a trade-off between accuracy on the one hand and applicability, affordability, and accessibility on the other. A good example is that if PSG-AV were to be replaced with the FES-A setup, one would lose only minimal accuracy but gain significant benefits, i.e., improved applicability, affordability, and especially accessibility, since recordings would no longer be confined to the sleep laboratory. For EMG-only systems, on the other hand, the trade-off is less favorable: applicability, affordability, and accessibility improve, but at the cost of unsatisfactory accuracy. Besides clinical settings, accurate scoring of MMA events is also necessary for obtaining reliable results in research settings. For example, it has proven difficult to establish links between the level of MMA and the clinical findings of SB, such as tooth wear [28, 29]. As the level of MMA is highly variable between nights [13–17], and tooth wear accumulates over the course of several years, it would be beneficial to have means to study the contribution of SB to the tooth wear process accurately in long-term follow-ups rather than with the one-night studies that are commonly utilized. Besides accurate scoring of MMA, this requires recordings that can be obtained in a widely available and affordable fashion and that can be reliably repeated over the years. In the present study, we included only manually scored setups; nevertheless, any setup, automated or manual, that is used to assess SB activity should always be tested for its event recognition accuracy, and the resulting trade-offs between requirements have to be acceptable.
To conclude, accurate MMA scoring seems to be possible even without video recordings, which is especially beneficial for quantifying SB activity with home PSG. The present results showed that either audio or audio-video recordings are required if MMA scoring is to achieve the best accuracy; in contrast, relying exclusively on EMG is unsatisfactory and unreliable. Furthermore, it was shown that the scoring accuracy and repeatability could be improved by using only sleep-time MMA events when assessing the MMA index. Finally, it was observed that the number of EMG channels and the MMA scoring rules may affect the scoring outcome.