There are dozens of definitions and operationalizations of well-being (Linton et al. 2016). In the present research, we adopt a common broad and global definition of subjective well-being, following Diener et al. (2009), who defined well-being as “the fact that the person subjectively believes his or her life is desirable, pleasant, and good” (p. 1). In other words, well-being can be understood as whether people are overall satisfied with their lives and believe the conditions of their lives are excellent (Diener et al. 1985). Psychological variables such as anxiety, loneliness, or stress can be understood as parts of general well-being or as determinants thereof (Keyes and Waterman 2003). We consider those variables as determinants and assess the degree to which they play a role in software engineers’ overall well-being.
The variables we plan to measure in the present two-wave longitudinal study are displayed in Fig. 1. To facilitate its interpretation, we categorized the variables into four broad, partly overlapping sets of predictors. To summarize, while the initial selection of predictors is theory-driven, based on previous research or recent guidelines, the selection of predictors included in the second wave is data-driven. In other words, we used a two-step approach to select our variables: First, the initial selection of 51 predictors was based on existing theory; we then reduced this set, based on how strongly the predictors were associated with well-being and productivity in an initial multiple regression analysis, for the subsequent longitudinal analysis. This approach helped us to focus on the most relevant predictors while keeping their number manageable.
During the COVID-19 pandemic, many governments and organizations have called for volunteers to support people in self-isolation (see, for example, NHS 2020b; City of New York 2020). Besides being valuable to the community at large, research suggests that acts of kindness positively affect people’s well-being (Buchanan and Bardi 2010). Additionally, volunteering has the benefit of providing a legitimate reason to leave one’s home and reducing cabin fever. We, therefore, decided to include volunteering as a potential predictor of well-being.
Coping strategies such as making plans or reappraising the situation are, in general, effective for one’s well-being (Webb et al. 2012; Carver et al. 1989). For example, altruistic acceptance (accepting restrictions because they serve a greater good) while being quarantined was negatively associated with depression rates three years later (Liu et al. 2012). Conversely, believing that the quarantine measures are redundant because COVID-19 is nothing but an ordinary flu, or that the virus was intentionally released by the Chinese government (i.e., beliefs in conspiracy theories), will likely lead to dissatisfaction because of greater feelings of non-autonomy. Indeed, beliefs in conspiracy theories are associated with lower well-being (Freeman and Bentall 2017).
We further propose that three needs are relevant to people’s well-being and productivity (Ryan and Deci 2000). Specifically, we propose that many people who are quarantined are deprived of the need for autonomy, which negatively affects well-being and motivation (Calvo et al. 2020). Further, we propose that the need for competence was deprived, mainly for people who cannot maintain their productivity level. This might especially be the case for those living with their families. In contrast, the need for relatedness might be over-satisfied for those living with their families.
Another important factor associated with one’s well-being is the quality of one’s social relationships (Birditt and Antonucci 2007). As people have fewer opportunities to engage with others they know less well, such as colleagues in the office or sports teammates, the quality of existing relationships becomes more important: having more good friends facilitates social interactions, either in person (e.g., with a partner in the same household) or online (e.g., video chats with friends).
Moreover, we expect that extraversion is linked to well-being and productivity. For example, extraverted people prefer more sensory input than introverted people (Ludvigh and Happ 1974), which is why they might struggle more with being quarantined. Extraversion correlated negatively with support for social distancing measures (Carvalho et al. 2020), which is a proxy of stimulation (e.g., being closer to other people will more likely result in sensory stimulation). Finally, research on productivity predictors while working from home can be theoretically grounded in models of job satisfaction and productivity, such as Herzberg’s two-factor theory (Herzberg et al. 2017). This theory states that causes of job satisfaction can be clustered into motivators and hygiene factors. Motivators are intrinsic and include advancement, recognition, the work itself, growth, and responsibility. Hygiene factors are extrinsic and include the relationship with peers and supervisor, supervision, policy and administration, salary, working conditions, status, personal life, and job security. Both types of factors are positively associated with productivity (Bassett-Jones and Lloyd 2005). As there are few differences between remote and on-site workers in terms of motivators and hygiene factors (Green 2009), the two-factor theory provides a good theoretical basis for predicting the productivity of people working remotely.
Our two-wave study covers an extensive set of 51 predictors, as identified above. Based on the literature mentioned earlier, we expected the strength of the association between the predictors and the outcomes well-being and productivity to vary between medium and large. Therefore, we assumed for our power analysis a medium-to-large effect size of f² = .20 and a power of .80. A power analysis with G*Power 3.1 (Faul et al. 2009) revealed that we would need a sample size of 190 participants.
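The reported sample size can be approximated numerically. The sketch below, assuming a fixed-effects multiple regression F test with 51 predictors and α = .05 (the exact G*Power settings are not reported in the text, so results may differ slightly), searches for the smallest N that reaches .80 power at f² = .20:

```python
from scipy import stats

def regression_power(n, n_predictors, f2, alpha=0.05):
    """Power of the overall F test in multiple regression for effect size f^2."""
    df1 = n_predictors
    df2 = n - n_predictors - 1
    nc = f2 * n  # noncentrality parameter, G*Power convention
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    # Probability that the noncentral F statistic exceeds the critical value
    return stats.ncf.sf(f_crit, df1, df2, nc)

def required_n(n_predictors, f2, power=0.80, alpha=0.05):
    """Smallest sample size reaching the target power."""
    n = n_predictors + 2  # smallest N with positive error df
    while regression_power(n, n_predictors, f2, alpha) < power:
        n += 1
    return n
```

With these assumptions, `required_n(51, 0.20)` lands close to the 190 participants reported above.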
To collect our responses, we used Prolific, a data collection platform commonly used in computer science (see, e.g., Hosio et al. 2020). We opted for this solution because of the high reliability, replicability, and data quality of dedicated platforms, especially as compared with, e.g., mailing lists (Peer et al. 2017; Palan and Schitter 2018).
Specifically, the use of crowdsourcing platforms allows us to (i) avoid overloading members of mailing lists or groups on social media (e.g., LinkedIn, Discord) with unsolicited participation requests; (ii) recruit participants of the target population (e.g., only software engineers) using automatic screening options or by running ad hoc screening studies; (iii) recruit only participants who are interested in the research; (iv) have a high degree of control with regard to data quality, since low-quality responses can be rejected without payment, which lowers participants’ acceptance rate and influences their future recruitment; (v) compensate participants for their time, so that they will take care over their responses due to a contractual obligation; and (vi) minimize self-selection bias, since potential candidates are randomly assigned to each study (if they meet the inclusion criteria), lowering the probability that opinionated individuals take part in the survey. In sum, it is a convenient, fair, and efficient way to recruit survey informants (Bethlehem 2010). For these reasons, crowdsourcing platforms are commonly used in studies published in top-tier outlets (Anumanchipalli et al. 2019; Kraft-Todd et al. 2018; Berens et al. 2020).
To administer the surveys, we used Qualtrics and shared the survey links on the Prolific platform. To ensure data quality and consistency, and to account for potential dropout of participants between the two waves, we invited almost 500 participants who were identified as software engineers in a previous study (Russo and Stol 2020) to participate in a screening study in April 2020. These 483 candidates had already passed a multi-stage screening process, as described by Russo and Stol, to ensure the highest possible data quality through cluster sampling (Baltes and Ralph 2020).
To run a coherent and reliable investigation, we only recruited software engineers who were undergoing similar experiences from both a professional and a personal perspective (i.e., working remotely during a lockdown). Thus, we performed a screening study completed by 305 software professionals who agreed to participate in a multi-wave study. From the 305 candidates, we excluded those living in countries with unclear or mixed policies or an early reopening (e.g., Denmark, Germany, Sweden) and professionals working from home during the lockdown for less than 20 hours a week (i.e., excluding the unemployed and developers who had to work in their offices). In both waves, all participants stated that they were working from home during the lockdown (a negative answer to either of these two conditions would have resulted in discarding the delivered responses from our data set).
As a result of this screening, in the first wave of data collection, which took place in the week of April 20–26, 2020, 192 participants completed the first survey. Participation in the second wave (May 4–10) was high (96%), with 184 completed surveys. Participants were uniquely identified through their Prolific ID, which was essential to run the longitudinal analysis while allowing participants to remain anonymous.
Additionally, to enhance the reliability of our responses, we included three test items in each survey (e.g., “Please select response option ‘slightly disagree’”). As none of our participants failed two or more of the three test items, all participants reported working remotely, and all answered the survey in an appropriate time frame, we did not exclude anyone.
The 192 participants’ mean age was 36.65 years (SD = 10.77, range = 19–63; 154 men, 38 women). Participants were compensated in line with the current US minimum wage (average completion time 1202 seconds, SD = 795.41). Out of our sample of 192 participants, 63 were based in the UK, 52 in the USA, 19 in Portugal, 10 in Poland, 7 in Italy, and 6 in Canada; the remaining 35 participants were based in other countries in Europe. A minority of 30 participants reported living alone, with most participants (162) reporting living together with others, including babies, children, and adults. Our participants were employed primarily at private companies (156), followed by 30 participants employed at a public institution. Six participants indicated that they work either for a different type of company or were unsure how to categorize their employer. When asked in our screening study what percentage of their time they had been working remotely (i.e., not physically in their office) over the past 12 months, 54.7% of participants reported 25% or less of their time, 15.6% between 25% and 50%, 2.1% between 50% and 75%, and 27.1% reported working remotely for at least 75% of their time.
We employed a longitudinal design, with two waves set two weeks apart from each other towards the end of the lockdown, which allowed us to test for internal replication. Also, running this study towards the end of the lockdowns in the vast majority of countries allowed participants to provide a more reliable interpretation of lockdown conditions. We chose a period of two weeks because we wanted to balance change in our variables over time against the end of the stricter lockdowns that many countries were discussing when we ran wave 2. Many of our variables are thought to be stable over time. That is, a person’s scores on X at time 1 are strongly predictive of that person’s scores on X at time 2 (indeed, the test-retest reliabilities we found support this assumption, see Table 1). The closer the temporal distance between waves 1 and 2, the higher the stability of a variable. In other words, if we had measured the same variables again after only one or two days, there would not have been much variance left to be explained by any other variable, because X measured at time 1 would already explain almost all the variance of X measured at time 2. At the same time, we aimed to collect data for wave 2 while people were still quarantined: if people had still been in lockdown at time 1 but the lockdown had been eased by time 2, this would have introduced a major confounding factor. Thus, to balance these two conflicting design requirements, we opted for a two-week break between the two waves.
We describe the measures of the two dependent (or outcome) variables in Section 3.3. Predictors (or independent variables) are explained in Sections 3.4, 3.5, 3.6, and 3.7. Wherever possible, we relied on validated scales. If this was not possible (e.g., COVID-19 specific conspiracy beliefs), we created a scale. In those cases, we followed scale development guidelines, including avoiding negatives and especially double negatives, two statements within one item, and less common expressions (Boateng et al. 2018). The questionnaires are reported in the Supplemental Materials, while a summary of the measurement instruments with their reliabilities is listed in Table 9. Test score reliability was measured using Cronbach’s alpha and is reported for each instrument. If the instrument was used in wave 1 and wave 2, we report both Cronbach’s alpha values (i.e., αtime1, αtime2); if we used it only in the first wave, we report only the result for wave 1 (α1). Additionally, we also explore whether there are any mean changes in the variables we measured at both times (e.g., has people’s well-being changed?), and mean differences between genders and between countries.
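Cronbach’s alpha, the reliability coefficient reported throughout this section, can be computed directly from item-level responses. A minimal sketch (the data layout, persons as rows and items as columns, and the function name are ours):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a persons x items matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```

As a sanity check, two perfectly correlated items yield α = 1, while uncorrelated items drive α toward 0.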
Measurement of the Dependent Variables
Well-being was measured with an adapted version of the 5-item Satisfaction with Life Scale (Diener et al. 1985). We adapted the items to measure satisfaction with life in the past week, which is in line with recommendations that the scale can be adapted to different time frames (Pavot and Diener 2009). Example items include “The conditions of my life in the past week were excellent” and “I was satisfied with my life in the past week”. Responses were given on a 7-point Likert scale ranging from 1 (Strongly disagree) to 7 (Strongly agree; αtime1 = .90, αtime2 = .90).
Productivity was measured relative to the expected productivity. We contrasted productivity in the past week with the participant’s expected productivity (i.e., the productivity level without the lockdown). As we recruited participants working in different positions, including freelancers, we could use neither objective measures of productivity nor supervisor assessments, and had to rely on self-reports. We expect limited effects of socially desirable responding, as the survey was anonymous. We operationalized productivity as a function of time spent working and efficiency per hour, compared to a normal week. Specifically, we asked participants: “How many hours have you been working approximately in the past week?” (Item P1) and “How many hours were you expecting to work over the past week assuming there would be no global pandemic and lockdown?” (Item P2). Finally, to measure perceived efficiency, we asked: “If you rate your productivity (i.e., outcome) per hour, has it been more or less over the past week compared to a normal week?” (Item P3). Responses to the last item were given on a bipolar slider ranging from ‘100% less productive’ to ‘0%: as productive as normal’ to ‘≥ 100% more productive’ (coded as -100, 0, and 100). To compute an overall productivity score for each participant, we used the following formula: productivity = (P1/P2) × ((P3 + 100)/100). Values between 0 and .99 reflect that people were less productive than normal, and values above 1 indicate that they were more productive than usual. For example, if a person worked only 50% of their normal time in the past week but was twice as efficient, their total productivity was considered the same as in a normal week.
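The scoring formula above is a simple arithmetic combination of the three items; as a sketch (the function and parameter names are ours):

```python
def productivity(hours_worked, hours_expected, efficiency_change):
    """Productivity relative to a normal week: (P1/P2) * ((P3 + 100)/100).

    hours_worked      -- item P1, hours worked in the past week
    hours_expected    -- item P2, hours expected without the lockdown
    efficiency_change -- item P3, slider coded from -100 (less) to +100 (more)
    """
    return (hours_worked / hours_expected) * ((efficiency_change + 100) / 100)
```

The worked example from the text holds: half the expected hours at double the per-hour efficiency, `productivity(20, 40, 100)`, gives 1.0, i.e., the same productivity as a normal week.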
We preferred this approach over the use of other self-report instruments, such as the WHO’s Health at Work Performance Questionnaire (Kessler et al. 2003), because we were interested in the change of productivity while being quarantined as compared to ‘normal’ conditions. The WHO’s questionnaire, for example, also assesses productivity in comparison to other workers. We deemed this unfit for our purpose, as it is unclear to what extent software engineers who work remotely are aware of other workers’ productivity. Also, our measure consists of only three items and showed good test-retest reliability (Table 1). Test-retest reliability is the agreement or stability of a measure across two or more time points. A coefficient of 0 would indicate that responses at time 1 are not linearly associated with those at time 2, which is typically undesired. Higher coefficients are an additional indicator of the reliability of a measure, although they can be influenced by a range of factors, such as the internal consistency of the measure itself and external factors. For example, the test-retest reliability for productivity (r = .50) is lower than for most other variables, such as needs or well-being, but this is because the latter constructs are operationalized as stable over time. In contrast, productivity can vary more extensively due to external factors, such as the number of projects or the reliability of one’s internet connection.
Self-control was measured with three items of the Brief Self-Control Scale (Tangney et al. 2004). Example items include “I am good at resisting temptation” and “I wish I had more self-discipline” (recoded). Responses were registered on a 5-point scale ranging from 1 (Not at all) to 5 (Very; α = .64).
Coping strategies were measured using the 28-item Brief COPE scale, which measures 14 coping dimensions (Carver 1997). Example items include “I’ve been trying to come up with a strategy about what to do” (Planning) and “I’ve been making fun of the situation” (Humor). Responses were given on a 5-point scale ranging from 0 (I have not been doing this at all) to 4 (I have been doing this a lot). The internal consistencies were satisfactory to very good for two-item scales: Self-distraction (α = .65), Active coping (α = .61), Denial (α = .66), Substance use (α = .96), Use of emotional support (α = .77), Use of instrumental support (α = .75), Behavioral disengagement (α1 = .76, α2 = .71), Venting (α = .65), Positive reframing (α = .72), Planning (α = .76), Humor (α = .83), Acceptance (α = .61), Religion (α = .83), and Self-blame (α1 = .75, α2 = .71).
Loneliness was measured using the 6-item version of the De Jong Gierveld Loneliness Scale (Gierveld and Tilburg 2006). The items are equally distributed among two factors, emotional (α1 = .68, α2 = .69; e.g., “I often feel rejected”) and social (α1 = .84, α2 = .87; e.g., “There are plenty of people I can rely on when I have problems”). Participants indicated how lonely they felt during the past week. Responses were given on a 5-point scale ranging from 1 (Not at all) to 5 (Every day).
Compliance with official recommendations was measured using three items of a compliance scale (Wolf and Maio 2020). The items are ‘Washing hands thoroughly with soap’, ‘Staying at home (except for groceries and 1x exercise per day)’, and ‘Keeping a 2m (6 feet) distance to others when outside’. Responses were given on a 7-point scale ranging from 1 (never complying with this guideline) to 7 (always complying with this guideline; α = .71).
Anxiety was measured using an adapted version of the 7-item Generalized Anxiety Disorder scale (Spitzer et al. 2006). Participants indicated how often they had experienced anxiety over the past week in different situations. Example questions are “Feeling nervous, anxious, or on edge” and “Not being able to stop or control worrying”. Responses were given on a 5-point scale ranging from 1 (Not at all) to 5 (Every day; α1 = .93, α2 = .93). Additionally, we measured specific COVID-19 and future pandemic related concerns with two items: “How concerned do you feel about COVID-19?” and “How concerned do you feel about future pandemics?” Responses were given on a 5-point scale ranging from 1 (Not at all concerned) to 5 (Extremely concerned; α = .82) (Nelson et al. 2020).
Stress was measured using a four-item version of the Perceived Stress Scale (Cohen 1988). Participants indicated how often they experienced stressful situations in the past week. Example items include “In the last week, how often have you felt that you were unable to control the important things in your life?” and “In the last week, how often have you felt confident about your ability to handle your personal problems?”. Responses were registered on a 4-point scale ranging from 1 (Never) to 4 (Very often; α1 = .80, α2 = .77).
Boredom was measured using the 8-item version (Struk et al. 2017) of the Boredom Proneness Scale (Farmer and Sundberg 1986). Example items include “It is easy for me to concentrate on my activities” and “Many things I have to do are repetitive and monotonous”. Responses were given on a 7-point Likert scale ranging from 1 (Strongly disagree) to 7 (Strongly agree; α1 = .87, α2 = .87).
Daily routines were measured with five items: “I am planning a daily schedule and follow it”, “I follow certain tasks regularly (such as meditating, going for walks, working in timeslots, etc.)”, “I am getting up and going to bed roughly at the same time every day during the past week”, “I am exercising roughly at the same time (e.g., going for a walk every day at noon)”, and “I am eating roughly at the same time every day”. Responses were given on a 7-point Likert scale ranging from 1 (Does not apply at all) to 7 (Fully applies; α1 = .75, α2 = .78).
Beliefs in conspiracy theories were measured with a 5-item scale that we designed for this study. The first two items were adapted from the Flexible Inventory of Conspiracy Suspicions (Wood 2017), whereas the latter three are based on more specific conspiracy beliefs: “The real truth about Coronavirus is being kept from the public.”, “The facts about Coronavirus simply do not match what we have been told by ‘experts’ and the mainstream media”, “Coronavirus is a bio-weapon designed by the Chinese government because they are benefiting from the pandemic most”, “Coronavirus is a bio-weapon designed by environmental activists because the environment is benefiting from the virus most”, and “Coronavirus is just like a normal flu”. Responses were collected on a 7-point Likert scale ranging from 1 (Totally disagree) to 7 (Totally agree; α = .83).
Extraversion was measured using the 4-item extraversion subscale of the Brief HEXACO Inventory (de Vries 2013). Responses were given on a 5-point Likert scale ranging from 1 (Strongly disagree) to 5 (Strongly agree; α1 = .71, α2 = .69). Low scores on extraversion are an indication of introversion. Since we found at wave 1 that extraversion and well-being were positively correlated, contrary to our hypothesis (see below) and, in our view, contrary to widespread expectations, we decided to measure in wave 2 what participants’ views are regarding the association between extraversion and well-being. We measured expectations with one item: “Who do you think struggles more with the current pandemic, introverts or extraverts?” Response options were ‘Introverts’, ‘Both around the same’, and ‘Extraverts’.
Autonomy, Competence, and Relatedness
The needs for autonomy, competence, and relatedness from self-determination theory (Ryan and Deci 2000) were measured using the 18-item Balanced Measure of Psychological Needs scale (Sheldon and Hilpert 2012). Example items include “I was free to do things my own way” (need for autonomy; α1 = .72, α2 = .76), “I did well even at the hard things” (competence; α1 = .77, α2 = .77), and “I felt unappreciated by one or more important people” (recoded; relatedness; α1 = .79, α2 = .78). Participants were asked to report how true each statement was for them in the past week. Responses were given on a 5-point scale ranging from 1 (no agreement) to 5 (much agreement).
Extrinsic and Intrinsic Work Motivation
Extrinsic and intrinsic work motivation were measured with the 6-item extrinsic regulation and the 3-item intrinsic motivation subscales of the Multidimensional Work Motivation Scale (Gagné et al. 2015). The extrinsic regulation subscale measures social and material regulation. Specifically, participants were asked to answer some questions about why they put effort into their current job. Example items include “To get others’ approval (e.g., supervisor, colleagues, family, clients ...)” (social extrinsic regulation; α = .85), “Because others will reward me financially only if I put enough effort in my job (e.g., employer, supervisor...)” (material extrinsic regulation; α = .71), and “Because I have fun doing my job” (intrinsic motivation; α = .94). Responses were given on a 7-point scale ranging from 1 (not at all) to 7 (completely).
Mental exercise was measured with two items: “I did a lot to keep my brain active” and “I performed mental exercises (e.g., Sudokus, riddles, crosswords)”. Participants indicated the extent to which the items were true for them in the past week on a 7-point scale ranging from 1 (Not at all) to 7 (Very; α = .56).
Technological skills were measured with one item: “How well do your technological skills equip you for working remotely from home?” Responses were given on a 7-point scale ranging from 1 (Far too little) to 7 (Perfectly).
Diet was measured with two items (European Social Survey 2014): “How often do you eat fruit, excluding drinking juice?” and “How often do you eat vegetables or salad, excluding potatoes?”. Responses were given on a 7-point scale ranging from 1 (Never) to 7 (Three times or more a day; α = .60).
Quality of Sleep
Quality of sleep was measured with one item: “How has the quality of your sleep overall been in the past week?” Responses were given on a 7-point scale ranging from 1 (very low) to 7 (perfectly).
Exercise was measured with an adapted version of the 3-item Leisure Time Exercise Questionnaire (Godin and Shephard 1985). Participants were asked to report how many hours in the past week they had been mildly, moderately, and strenuously exercising. The overall score was computed as follows (Godin and Shephard 1985): 3 × mild + 5 × moderate + 9 × strenuous. Missing responses for one or more of the exercise intensities were treated as 0.
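The scoring is a weighted sum of the three intensity items; a minimal sketch treating missing responses as 0, per the text (the function and parameter names are ours):

```python
def leisure_score(mild=None, moderate=None, strenuous=None):
    """Leisure Time Exercise score: 3*mild + 5*moderate + 9*strenuous.

    Missing (None) responses are scored as 0, as described in the text.
    """
    mild = mild if mild is not None else 0
    moderate = moderate if moderate is not None else 0
    strenuous = strenuous if strenuous is not None else 0
    return 3 * mild + 5 * moderate + 9 * strenuous
```

For example, two hours of mild, one of moderate, and one of strenuous exercise, `leisure_score(2, 1, 1)`, yields a score of 20.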
Quality and Quantity of Social Contacts Outside of Work
The quality and quantity of social contacts were measured with three items. We adapted two items from the social relationship quality scale (Birditt and Antonucci 2007) and added one item to measure quantity: “I feel that the people with whom I have been in contact over the past week support me”, “I feel that the people with whom I have been in contact over the past week believe in me”, and “I am happy with the amount of social contact I had in the past week.” Responses were given on a 6-point Likert scale ranging from 1 (Strongly disagree) to 6 (Strongly agree; α1 = .73, α2 = .77).
Volunteering was measured with three items that capture people’s behavior over the past week: “I have been volunteering in my community (e.g., supported elderly or other people in high-risk groups)”, “I have been supporting my family (e.g., homeschooling my children)”, and “I have been supporting friends and family members (e.g., listened to the worries of my friends)”. Responses were given on a 7-point scale ranging from 1 (Not at all) to 7 (Very often; α = .45).
Quality and Quantity of Communication with Colleagues and Line Managers
The quality and quantity of communication with colleagues and line managers were measured with three items: “I feel that my colleagues and line manager have been supporting me over the past week”, “I feel that my colleagues and line manager believed in me over the past week”, and “Overall, I am happy with the interactions with my colleagues and line managers over the past week.” Responses were given on a 6-point Likert scale ranging from 1 (Strongly disagree) to 6 (Strongly agree; α1 = .88, α2 = .92).
Situational Factors and Demographics
Distractions at Home
Distractions at home were measured with two items: “I am often distracted from my work (e.g., noisy neighbors, children who need my attention)” and “I am able to focus on my work for longer time periods” (recoded). Responses were given on a 5-point scale ranging from 1 (Not at all) to 5 (Very often; α1 = .64, α2 = .63).
Whether participants lived alone or with other people was assessed by asking them how many babies, toddlers, children, teenagers, and adults they were currently living with. We asked for the five groups separately because this allowed us to explore whether, for example, toddlers had a different impact on well-being and productivity than teenagers. However, the numbers of babies, toddlers, children, teenagers, and adults the participants were living with were uncorrelated with their well-being and productivity, all rs ≤ .19. Therefore, we summed them into one variable, which we called people (i.e., the number of people the participant was living with).
Financial situation was measured with two items that reflect the current as well as the expected financial situation (Glei et al. 2019): “Using a scale from 0 to 10 where 0 means ‘the worst possible financial situation’ and 10 means ‘the best possible financial situation’, how would you rate your financial situation these days?” and “Looking ahead six months into the future, what do you expect your financial situation will be like at that time?”. Responses were given on an 11-point scale ranging from 0 (the worst possible financial situation) to 10 (the best possible financial situation; α = .81).
Home office equipment was measured with three items: “In my home office, I have the technical equipment to do the work I need to do (e.g., appropriate PC, printer, stable and fast internet connection)”, “On the computer or laptop I use while working from home, I have the software and access rights I need”, and “My office chair and desk are comfortable and designed to prevent back pain or other related issues”. Responses were given on a 7-point Likert scale ranging from 1 (Strongly disagree) to 7 (Strongly agree; α = .65).
Demographic information was assessed with the following items: “What is your gender?”, “How old are you?”, “What type of organization do you work in?” (public, private, unsure, other), “What is your yearly gross income?” (US$ < 20,000, US$20,000 − 40,000, US$40,001 − 60,000, US$60,001 − 80,000, US$80,001 − 100,000, > US$100,000; converted to the participant’s local currency), “In which country are you based?”, “What percentage of your time have you been working remotely (i.e., not physically in your office) over the past 12 months?”, “In which region/state and country are you living?”, and “Is there still a lockdown where you are living?”.