We conducted a dual-site exploratory Ambulatory Assessment study with a total of N = 63 undergraduate students (study site 1: n = 37; study site 2: n = 26). Data from three participants (n = 30 single observations) were excluded because they completed fewer than 33% of the total signals. Thus, the final sample consisted of 60 participants (37 women), aged 18 to 34 years (M = 22.4 years, SD = 3.5). Inclusion criteria were sufficient mastery of the German language and the ability to operate a smartphone. Participants were excluded if they indicated that they were pregnant, currently breastfeeding, or suffering from a mental disorder. The study protocol was approved by the local ethics committees at both study sites.
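The compliance criterion can be illustrated with a minimal sketch. It assumes that "total signals" refers to the number of scheduled signals (six per day over seven days at study site 1 and six days at study site 2, as described in the procedure); the function and constant names are hypothetical and not taken from the study materials.

```python
# Illustrative sketch only: names are hypothetical, and "total signals" is
# assumed to mean the number of scheduled signals per participant.
SIGNALS_PER_DAY = 6
DAYS_BY_SITE = {1: 7, 2: 6}   # assessment days at study site 1 and 2
MIN_COMPLIANCE = 0.33         # exclusion threshold reported in the text


def compliance(completed: int, site: int) -> float:
    """Share of scheduled signals that a participant completed."""
    scheduled = SIGNALS_PER_DAY * DAYS_BY_SITE[site]
    return completed / scheduled


def is_excluded(completed: int, site: int) -> bool:
    """True if the participant completed fewer than 33% of scheduled signals."""
    return compliance(completed, site) < MIN_COMPLIANCE
```

Under these assumptions, a participant at study site 1 has 42 scheduled signals, so 13 completed entries (about 31%) would fall below the threshold, whereas 14 (about 33.3%) would not.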
Informed consent was obtained from all individual participants included in the study. Basic demographic characteristics and self-reports on health using the Patient Health Questionnaire were obtained from all participants. Then, participants received instructions on how to use the electronic diary device (iPod touch at study site 1; department- or participant-owned smartphones at study site 2). Since participants were instructed to listen to music only using the study device via the application “Simple Last.fm Scrobbler” (The SLS Team, 2016), the music files to which they intended to listen during the ensuing week were uploaded onto the electronic diary device. Then, the use of the application was explained. The “Simple Last.fm Scrobbler” application automatically logged the exact time point of music listening for any song that was listened to for at least half the duration of the track. The collected data on music listening for each participant were saved on the Last.fm servers (Last.fm Limited, London, UK). Starting from the next day, for a total of six (study site 2) or seven (study site 1) consecutive days, participants received six signals per day over a time window of 12 h, beginning at 10:00. Upon each signal, participants were asked to complete items concerning stress, mood, and music listening behavior, among others. Following the recommendations of Hektner, Schmidt, and Csikszentmihalyi, this time window was divided into six blocks of 2 h, with the condition that consecutive signals were at least 30 min apart. If participants failed to respond to a signal or did not fully complete data entry, they were reminded twice with further signals. Participants were also able to postpone the signal if they were unable to complete the scheduled data entry at the current moment. Additionally, directly after waking up, participants completed a questionnaire on their electronic diary device, which included items on sleep quality, mood, and stress.
However, data from this assessment are not included in the current analyses as no items on music listening were presented at this point. After completion of data collection, participants returned to the lab and were debriefed and reimbursed for their participation (either 20 euros or research credits).
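The signal schedule described above (six 2-h blocks starting at 10:00, consecutive signals at least 30 min apart) can be sketched as follows. This is one possible sampling scheme, not the algorithm used by iDialog Pad or movisensXS, which the text does not describe; the assumption that signals were drawn at random within each block, as well as the function name and its parameters, are hypothetical.

```python
import random
from datetime import datetime, timedelta


def draw_daily_signals(day_start, n_blocks=6, block_minutes=120,
                       min_gap_minutes=30, rng=None):
    """Draw one signal time per block, keeping consecutive signals apart.

    Assumption: signals fall at random whole minutes within each block.
    Each draw is restricted to the part of the block that lies at least
    `min_gap_minutes` after the previous signal.
    """
    rng = rng or random.Random()
    signals = []
    for block in range(n_blocks):
        block_start = day_start + timedelta(minutes=block * block_minutes)
        block_end = block_start + timedelta(minutes=block_minutes)
        earliest = block_start
        if signals:
            # Push the earliest admissible time past the minimum gap.
            earliest = max(block_start,
                           signals[-1] + timedelta(minutes=min_gap_minutes))
        span_minutes = int((block_end - earliest).total_seconds() // 60)
        offset = rng.randrange(span_minutes)  # 0 .. span_minutes - 1
        signals.append(earliest + timedelta(minutes=offset))
    return signals
```

Calling `draw_daily_signals(datetime(2016, 1, 4, 10, 0))` (an arbitrary example date) returns six timestamps between 10:00 and 22:00, one per block. Because each draw depends on the previous one, the times are not uniformly distributed within blocks; a rejection-sampling scheme would be an alternative.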
A key advantage of Ambulatory Assessment methodology is its high external and ecological validity, as processes are investigated as they unfold in participants' daily lives. The internal validity of this approach is limited, though, given the lack of rigorous experimental control, which renders causal inferences difficult. Ambulatory Assessment thus serves as a methodological approach that complements experimental studies. It opens up opportunities to collect ecologically valid data and allows for the identification of meaningful associative patterns.
Participants at study site 1 received an iPod touch with the pre-installed application “iDialog Pad” (G. Mutz, University of Cologne, Germany), which was used for presenting the items. The data were stored locally on the iPod touch and exported upon completion of the Ambulatory Assessment period when participants returned to the lab. Participants at study site 2 either received an Android-based smartphone with Android OS 5.0.1 (Google, Mountain View, CA, USA) or used their own Android-based smartphones. The data were collected via the movisensXS experience sampling application, version 0.8.4203 (movisens GmbH, Karlsruhe, Germany), which was downloaded and installed on either the personal or the department-owned smartphone at the beginning of the study. The data were stored locally and uploaded to a secure server when the smartphone was connected to the internet. The administration of the items was comparable between the iPod and the smartphones.
Ambulatory Assessment Measures
To limit the burden on the participants, perceived stress (M = 1.17, SD = 1.02) was assessed using one item (“At this moment, I feel stressed,” 5-point scale ranging from 0 = not at all to 4 = very much).
Self-Reported Music Listening
For each signal, participants were asked whether they were currently listening to music (yes vs. no) and, if not, whether they had listened to music since the last signal (yes vs. no). Participants also indicated the duration of music listening on a 4-point scale: < 5 min, 5–20 min, 21–45 min, > 45 min.
Objectively Assessed Music Listening
The “Simple Last.fm Scrobbler” application collected information for each track (artist, title, date and time at the start of the track).
To test for associations between self-reported music listening and stress, we created the dichotomous variable self-reported music listening, which was coded “1” if participants indicated either currently listening to music at the time of the signal or having listened to music since the last signal. It was coded “0” if they had not listened to music. To test for the latency with which self-reported music listening and stress might be correlated, we created another variable, time lag, which distinguished whether participants were currently listening to music (1) or had listened to music since the last signal (0) according to self-reports. This enabled an assessment of whether self-reported music listening was associated with stress acutely and/or with a delay.
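The coding of the two self-report variables can be sketched as follows. The function and argument names are hypothetical, and treating time lag as undefined (`None`) when no music listening was reported is an assumption, since the text defines this variable only for entries with music listening.

```python
from typing import Optional, Tuple


def code_self_report(currently_listening: bool,
                     listened_since_last: Optional[bool]
                     ) -> Tuple[int, Optional[int]]:
    """Return (music_listening, time_lag) for one diary entry.

    music_listening: 1 = listening now or since the last signal, 0 = no music.
    time_lag:        1 = listening now, 0 = listened since the last signal,
                     None = no music reported (assumption: left undefined).
    """
    if currently_listening:
        return 1, 1
    if listened_since_last:
        return 1, 0
    return 0, None
```

The second item was only presented when participants were not currently listening, which is why `listened_since_last` may be missing for entries with current listening.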
In line with the variables on self-reported music listening, we created a binary variable, with current music listening or music listening since the last signal coded as “1” and no music listening in this time frame coded as “0,” respectively. Since Last.fm also stores exact dates and times, we generated the duration of music listening in this time frame (M = 36.7 min, SD = 33.9) and the time lag between the current signal and the most recent track (M = 46.5, SD = 44.6) as two continuous variables. This enabled us to quantify and directly test associations among the amount of music listening, the latency of music listening relative to the assessment of stress, and stress reports.
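A minimal sketch of how the binary variable and the time lag could be derived from logged track start times is given below. It rests on several assumptions not stated in the text: the window runs from the previous signal (exclusive) to the current signal (inclusive), the lag is measured from the start of the most recent track, and all names are hypothetical. Duration is left out because it would require track lengths, and the text does not describe how they were obtained.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def objective_listening(track_starts: List[datetime],
                        prev_signal: datetime,
                        signal: datetime
                        ) -> Tuple[int, Optional[float]]:
    """Return (music_listening, lag_minutes) for one signal.

    music_listening: 1 if any logged track started after the previous
                     signal and no later than the current signal, else 0.
    lag_minutes:     minutes between the start of the most recent such
                     track and the current signal (None without music).
    """
    in_window = [t for t in track_starts if prev_signal < t <= signal]
    if not in_window:
        return 0, None
    lag_minutes = (signal - max(in_window)).total_seconds() / 60
    return 1, lag_minutes
```

For example, with tracks logged at 10:05 and 10:50, a previous signal at 10:00, and a current signal at 11:30, the entry would be coded as music listening with a lag of 40 min under these assumptions.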
Since multiple data entries (Level 1) are nested within participants (Level 2), we computed two-level models with random intercepts in Stata 14 (Stata Corporation, College Station, TX, USA) and investigated within-person processes. Continuous independent variables on Level 1 were person-mean centered, and those on Level 2 were grand-mean centered. We did not enter the averaged continuous Level-1 independent variables as a measure of between-subject processes, since we were primarily interested in within-subject processes only and thus wished to keep the model as parsimonious as possible. Categorical variables were dummy-coded (reference of subjectively assessed duration: “< 5 min”).
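The two centering schemes can be illustrated with a short sketch. The analyses themselves were run in Stata 14; the Python code below only demonstrates the arithmetic, and the function names, field names, and example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def person_mean_center(rows, key, id_key="participant"):
    """Level-1 centering: subtract each participant's own mean from
    each of that participant's observations."""
    by_person = defaultdict(list)
    for row in rows:
        by_person[row[id_key]].append(row[key])
    person_means = {pid: mean(vals) for pid, vals in by_person.items()}
    return [row[key] - person_means[row[id_key]] for row in rows]


def grand_mean_center(values):
    """Level-2 centering: subtract the sample mean from each
    participant-level value (one value per participant)."""
    grand = mean(values)
    return [v - grand for v in values]


# Hypothetical example: listening duration (min) per diary entry.
entries = [
    {"participant": 1, "duration": 10},
    {"participant": 1, "duration": 20},
    {"participant": 2, "duration": 40},
]
centered = person_mean_center(entries, "duration")
```

After person-mean centering, a Level-1 coefficient describes deviations from a participant's own average, which is what makes its interpretation a within-person one.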
We computed separate multilevel models for self-reported and objective music listening predictors to increase statistical power, since self-reported and objectively assessed data on music listening did not always converge. We first fitted the unconditional model, which revealed that 68.3% of the total variance in stress was attributable to within-person variability (ICC (type 1) = 0.317). Then, in a first step, we entered music listening as a Level-1 predictor to test its association with stress. In the second step, we included the variables assessing duration of music listening and the time lag between the last track that was listened to and the current signal. Besides time, all analyses controlled for study site and gender, given the outlined methodological idiosyncrasies between the study sites and the unequal distribution of gender across the study sites (χ²(1) = 2.37, p = .124, Cramér's V = .20). We additionally included the lag-1 serial autocorrelation, that is, the stress level reported at the preceding signal (see Footnote 1). Finally, we did not include random effects of the Level-1 predictors, because their inclusion did not improve model fit (χ²(3) = 4.35, p = .226 for the model with the subjective data and χ²(3) = 2.76, p = .430 for the model with the objective data).
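The lag-1 stress predictor can be constructed as sketched below. Whether the lag was carried across days is not stated in the text, so the `reset_each_day` parameter (default: no carry-over from the last signal of one day to the first signal of the next) is an assumption, as are all function and field names. The sketch also treats the previous completed entry as the preceding signal and does not handle missed signals separately.

```python
def add_lag1_stress(rows, reset_each_day=True):
    """Add 'stress_lag1' (stress at the preceding entry of the same person).

    rows: dicts with 'participant', 'day', 'signal', and 'stress'.
    reset_each_day: if True (assumption), the first entry of each day
    has no lagged value; if False, the lag carries over across days.
    """
    ordered = sorted(rows,
                     key=lambda r: (r["participant"], r["day"], r["signal"]))
    result = []
    prev = None
    for row in ordered:
        same_person = (prev is not None
                       and prev["participant"] == row["participant"])
        same_day = same_person and prev["day"] == row["day"]
        carry = same_day if reset_each_day else same_person
        result.append({**row,
                       "stress_lag1": prev["stress"] if carry else None})
        prev = row
    return result
```

Entries without a lagged value (`None`) would drop out of a model that includes the lag-1 term, so the choice made for `reset_each_day` affects the number of observations analyzed.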
As there are no widely accepted recommendations for computing statistical power in multilevel models, we opted for the general recommendation of having at least 50 participants to obtain acceptable estimates of standard errors of predictors. P values < .050 were considered significant. All tests were two-tailed.