Life course studies require longitudinal data, and a focus on vulnerability processes that unfold in the medium or long term only reinforces this need by requiring very long data sequences. For example, the quality of a person’s educational trajectory may be considered a resource or a reserve (Cullati et al., 2018) that could be used later to help cope with adverse events such as health issues (see Chap. 14). Therefore, complete data sequences, ideally starting at birth, are often required to fully study life course trajectories.

Most commonly, life course research relies on data from panel surveys that do not always collect detailed data on periods of time preceding the start of the survey. For example, if data are collected only among the elderly, researchers may only know the level of education ultimately achieved but not the entire educational trajectory. Hence, complete data regarding all domains of the life course and vulnerability processes are often unavailable. A few birth cohort studies, such as the British Cohort Study (e.g., Elliott & Shepherd, 2006), have been created to overcome this gap, collecting data on the respondents from birth to present. However, studies of this type are quite rare, and their use in terms of cross-country comparisons or target-population studies (e.g., on migrants) is limited. Moreover, such prospective surveys are very costly, both in terms of time and money, and complete data sequences become available only several decades after the start of the survey.

Given the difficulties of collecting long sequences of data prospectively, this chapter considers the possibility of collecting data retrospectively instead of or complementarily to prospective collection, and it draws on research with life calendars performed within the NCCR LIVES research program. We show that life calendars are able to capture accurate retrospective data and that this method can be used in a variety of modes, including web surveys. We provide a reflection, and some guidelines, on critical points that researchers should consider either when designing a retrospective survey or when analysing this type of data.

Retrospective Data

An alternative to prospective research designs is to collect data retrospectively (Scott & Alwin, 1998). This approach can be used to complete the portion of the life course that is not covered by a prospective survey. The advantages of retrospective designs are obvious: All past information can be collected at once, saving both time and funding, and the exploitation of the data can begin very quickly. It also implies a lower long-term burden for the interviewees and eliminates the risk of attrition often encountered in prospective designs (e.g., Drasch & Matthes, 2013; Assaad et al., 2018).

Because retrospective data strongly depend on the respondent’s ability to recollect events, they have often been accused of being less accurate than prospective data, and for this reason, they are sometimes considered a sort of B-class data (e.g., Nagurney et al., 2005; Pina-Sánchez et al., 2014; Song & Mare, 2015). However, even with a prospective approach, most of the data collected are still partially retrospective in nature, since they usually cover the year or month prior to the survey. Nevertheless, we can argue that it is easier to remember what happened over a short period of time rather than over several years or decades, with more remote events and conditions being more difficult to remember. This characteristic is even more relevant in subjective aspects such as health, well-being or mood, for which recollecting experiences might be severely imprecise. Therefore, the quality of retrospective and prospective data is likely not the same.

Retrospective data collection does depend on memory and related cognitive processes. For instance, the so-called telescoping effect observed in retrospective data collection is a tendency to perceive old events as having occurred more recently and recent events as having occurred more remotely (e.g., Bradburn et al., 1994; Sudman & Bradburn, 1973). In addition, the respondents’ current situation when completing a survey is likely to influence their recall of the past, whether in terms of the number of events recalled, the interpretation of events (good or bad) or the exact moment of their occurrence (e.g., Couppié & Demazière, 1995). The collection of accurate retrospective data thus requires the use of specially designed survey tools and checking procedures to boost memory recollection and data accuracy. To attain this goal, researchers must first understand which mnemonic processes are at play in the recollection of a specific event.

Friedman (1993) argued that cognitive theories on memory recollection have highlighted three different memory types or retrieval mechanisms (Fig. 20.1). The distance mechanism focuses on the perceived relative distance of an event and demonstrates that episodes of which people have more accurate knowledge are remembered as more recent (Brown et al., 1985; Hinrichs, 1970). Location mechanisms show that an event is placed within a larger time period, which is used to recollect the date. Finally, serial order memory stresses the sequential aspect of the retrieval mechanism, illustrating that people use anchoring points to orient themselves in time to judge whether the event happened before or after such points. Research has shown that instead of relying on a single mechanism, people use a combination of them to date events (e.g., Betz & Skowronski, 1997; Janssen et al., 2006; Shimojima, 2002). Conway (2001) argued that these mechanisms operate simultaneously when reconstructing autobiographical memories and life-course trajectories. In particular, Conway has stressed that people simultaneously employ top-down (from general periods and life domains to single events), sequential (the order of different events within the same life domain) and parallel (in relation to events in other life domains) mechanisms to remember when something has occurred. In the last three decades, survey methodology has integrated this research to maximise data quality in retrospective survey designs (Auriat, 1992; Belli, 1998; Tourangeau et al., 2000; Van der Vaart, 2004).

Fig. 20.1
A flowchart of 3 retrieval mechanisms. They are distance mechanism, location mechanism, and serial order mechanism in relation to the present time frame.

The three kinds of retrieval mechanisms. Better-remembered events are considered more recent. An event is first assigned to a time period and then compared to well-known events (e.g., a marriage)

To date, two main approaches have been developed to collect retrospective data in surveys. The first involves the use of a question list of biographical events, such as that used for the first two samples of the Swiss Household Panel (Scherpenzeel et al., 2002). In these questionnaires, respondents are asked to answer a series of questions about their personal history to provide information about several domains, such as employment history, family events, or places of residence. Such questioning is usually conducted by applying a sequential approach in which respondents report events from the most recent to the most remote, or vice versa, and treat each life domain in succession. Although this method might offer some advantages, as it is easy to design and implement, it strongly limits the use of some retrieval mechanisms. For instance, it limits the possibility of establishing links among domains (parallel processing) that could help the respondent remember more events or better date them.

To overcome this limitation, in recent decades, the life history calendar (LHC), also known as the event history calendar method, has progressively become more popular among survey researchers. The LHC (Fig. 20.2) is a specially designed tool for the collection of retrospective data (Freedman et al., 1988). It consists of a diary in grid form with, generally, different rows that each correspond to a specific time unit (e.g., a year, a trimester or a month) and different columns that each correspond to a different life domain (e.g., place of residence, family events, education, professional life). A set of predefined information, or cues, can also appear on the calendar. Respondents are asked to indicate all the events that happened in these different domains, the idea being that the recall of events in one specific domain should trigger the recall of events in the other domains, thereby maximising the complementary use of top-down, parallel, and sequential mechanisms (Belli, 1998; Caspi et al., 1996). Punctual events as well as events extending over a longer or shorter period of time can be easily recorded through this method. Figure 20.2 provides an example of a completed paper-and-pencil LHC.

Fig. 20.2
A life history calendar has 5 columns labeled time reference, milestone in society, moving, education, and cultural, sports, political, and associative activities.

Example of a paper-and-pencil LHC (Berchtold, Wicht, & Rohrer, 2021)
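
As a rough illustration of the grid structure described above, the following minimal sketch represents an LHC as rows of time units crossed with columns of life domains. The class, method, and domain names are hypothetical choices made for this example; they are not taken from any existing LHC software or from the instruments discussed in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class LifeHistoryCalendar:
    """Illustrative grid: one row per time unit, one column per life domain."""

    time_units: list[str]   # e.g., years, trimesters, or months, in order
    domains: list[str]      # e.g., residence, education, employment
    cells: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def record(self, time_unit: str, domain: str, event: str) -> None:
        """Store an event in the cell (time_unit, domain)."""
        if time_unit not in self.time_units:
            raise ValueError(f"unknown time unit: {time_unit}")
        if domain not in self.domains:
            raise ValueError(f"unknown domain: {domain}")
        self.cells.setdefault((time_unit, domain), []).append(event)

    def row(self, time_unit: str) -> dict[str, list[str]]:
        """Everything reported for one time unit, across all domains."""
        return {d: self.cells.get((time_unit, d), []) for d in self.domains}

    def column(self, domain: str) -> list[tuple[str, str]]:
        """All events of one domain, in the order of the time units."""
        return [
            (t, event)
            for t in self.time_units
            for event in self.cells.get((t, domain), [])
        ]
```

Reading a row corresponds to the parallel use of the calendar (what else happened in the same period), whereas reading a column corresponds to its sequential use (the order of events within one domain).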

Overall Design of a Life History Calendar

The goal of an LHC, compared to biographical question lists, is to improve respondents’ accuracy by relying on various retrieval mechanisms to elicit information recall from several life domains. To do so, the LHC structure needs to facilitate links among the events and answers provided by the respondents. For instance, if the LHC is administered by an interviewer, the interviewing protocol and information technology interface should help the interviewer detect inconsistencies between answers and highlight missing information. Similarly, in self-administered designs, the graphical layout of the LHC is pivotal to give respondents an overview of their previous answers and their overall life trajectory. The graphic design of the calendar tool is therefore crucial, all the more so if the LHC is to be used in a self-administered mode, as in the case of online surveys.

Regardless of the substantive data that need to be collected with the LHC, researchers and (survey) designers should focus on at least three aspects: the general layout of the different parts of the calendar (e.g., time dimension from top to bottom, bottom to top, or left to right; order of the different life domains); the number, location, and types of cues appearing on the calendar (e.g., age of the respondent, year of measurement, seasons, important public events, events specific to the respondents); and the instructions provided on how to fill in the calendar (e.g., directly on the calendar or on a different sheet, written or orally). Although these aspects are independent of the substantive issue, they are not trivial because they exert a strong impact on data quality. For instance, a selection of meaningful cues such as major political or cultural events will help respondents date their own life events (serial order memory mechanism), and ordering life domains from the simplest to the most difficult will decrease participants’ stress in completing them. Researchers should thus be aware that, as this method is becoming more common in surveys, data collected with calendars with very different designs may lack comparability.

The overall design of the calendar can be considered in different ways. For instance, research within the LIVES research program has shown that reliable data can be collected with self-administered LHCs if respondents are provided with detailed instructions, thereby considerably reducing the costs and disadvantages of face-to-face interviews (Morselli et al., 2019). Self-administered LHCs have been used to collect data from large samples, such as the third sample of the Swiss Household Panel (N = 6090) and the LIVES-FORS Cohort Survey (N = 1691; Spini et al., 2019), for which face-to-face methods would have been prohibitively expensive. Moreover, with the advent of online data collection, the possibility of replacing paper-and-pencil life calendars with web-based calendars, in addition to making them self-administered, could greatly increase the applicability of LHC methods. Given events such as the COVID-19 pandemic, it is becoming vital to offer such new approaches for the collection of retrospective data because direct contact with some categories of people might be diminished or become impossible. Therefore, it is crucial to develop life calendars that are suitable to many substantive areas and to ensure and demonstrate the accuracy of the resulting data.

Improving Memory with Cues

The LHC makes it easier to recall the exact time of occurrence of different events and to track inconsistencies in respondents’ answers. Specific cues can also be added to an LHC to further enhance memory recall. The most common memory cue is the respondent’s age at each calendar period. It is also possible to indicate important cues at the international, national, or regional level as an additional means to anchor one’s life events. Such cues can also be respondent-specific, for example, by inviting respondents to provide a set of personal landmarks at the beginning of the LHC.

Traditionally, LHCs are offered in a paper-and-pencil format and are answered in the presence of a trained interviewer who can assist the respondents or even trigger the recollection of additional events. However, LHCs are not a fixed tool, and many developments are still desirable. For instance, the kinds of cues that are most efficient in triggering memories of past events remain a matter of debate. Some studies have shown important differences in the representation of collective and historical events across cohorts (e.g., Martenot & Cavalli, 2014; Dasoki et al., 2016) and genders (e.g., Dasoki et al., 2018). Indeed, the chosen cue events may be helpful for some respondents but neutral or even detrimental for others, which might be particularly true when investigating vulnerable populations. For instance, cues referring to traumatic or politically connoted events might induce cognitive closure, thereby making the respondents less cooperative or misdirecting their focus. Collective memories often have group-specific meanings that are entangled with relationships among social groups, historical narratives, and political manifolds. The type of cues indicated in a questionnaire and how they are mentioned colour the respondents’ information in the research and can make them more cooperative or reluctant to answer or can even direct the recollection process. For instance, while the 1967 Six-Day War in the Gaza Strip is referred to as a ‘settlement’ by Israel, for Palestinians, it is an ‘occupation’. Choosing either of the terms to refer to the event on the questionnaire would suggest that the researcher favours one of the two groups and potentially make the respondent reluctant to answer. Indeed, pilot research using LHCs in postconflict regions has shown that collective events are not neutral but burdened with historical and political meanings that could trigger adverse reactions from respondents and, consequently, affect their reporting of events (Spini et al., 2011).

Furthermore, cognitive research on memory has shown that priming positive events might inhibit the recollection of negative events, and vice versa (e.g., Barnier et al., 2004; García-Bajos & Migueles, 2017; Harris et al., 2010). Cognitive processes reduce the accessibility of negative experiences, thereby preventing negative events from coming to mind through bias (Anderson & Hanslmayr, 2014). Such processes can be triggered by priming positive memories or as blocking responses to offensive memories (Pica et al., 2016).

Along these lines, we conducted an experiment regarding the use of public vs. private cues using a paper-and-pencil setting with a time accuracy of one trimester. One hundred and four students answered a self-administered LHC about information over the previous ten years in five different domains (moving, education, holidays, activities (cultural, sports, political and associative), and employment). We implemented four different conditions, each prompted by the type of cues that we provided in the LHC. In addition to the year of occurrence (printed on the calendar) and the respondents’ age in each year (written by the respondents themselves), condition 1 included a list of public events (Fukushima nuclear accident, Donald Trump elected President of the United States, …), condition 2 asked the respondents to recall their own important personal events, condition 3 combined the cues of the first two conditions, and condition 4 provided no cues at all. The results indicated that public events used as cues are much less useful to recall events compared to personal events provided by respondents themselves, with the difference being significant both among all respondents and among condition 3 respondents who accessed both types of cues simultaneously (Berchtold et al., 2021).

Thus, in many contexts, using personal, respondent-chosen cues may be an easier and safer choice when designing an LHC. However, little research has been conducted on the data quality of the answers given by respondents that indicate some categories of cues (e.g., family related, such as marriage and child births) instead of others (e.g., socially or job related, such as promotions or social events). Further research should deepen the understanding between respondent-driven cues and the quality of their answers. For instance, reporting some cues (e.g., child births, deaths in the family) might prime some life domains (e.g., family) over others (e.g., occupational trajectory). Such influence could bias the data in the sense that the absence of reported events in some life domains may not indicate that the event did not happen but that the recollection process was overfocused on a different life domain. We would thus like to stress the importance of this aspect and how it should be thoughtfully handled by survey designers.

Online Life History Calendars

In recent years, the general trend in social sciences data collection has been to ‘go online,’ and this tendency has also affected the assessment of retrospective data. Only a few years ago, web-based LHCs seemed too complex to be put into practice. Thus, survey agencies generally opted for face-to-face (e.g., SHARELIFE, Börsch-Supan, 2019), telephone interviews (e.g., Panel Study of Income Dynamics, Beaule et al., 2007), or the classic self-administered paper-and-pencil format (e.g., Swiss Household Panel III, Tillmann et al., 2016; LIVES-FORS cohort survey, Spini et al., 2019). Recently, more affordable and modern online LHC formats have been proposed, but in addition to considering cues, layout/usability, and instructions as carefully as in paper-and-pencil LHCs, they must also consider the constant evolution of software development and the change of technical aspects over time.

The easiest solution for online LHCs at the time of writing consists of a fully self-administered web-based tool. This approach requires considerable thought about the layout and usability of the tool, via, for instance, graphical user interfaces (GUIs), as well as clear and detailed instructions on how to complete the questionnaire. To the greatest extent possible, the instructions should be included in the GUI, for instance, through contextual tooltips, rather than buried in a long, external document. In collaboration with the research group on adolescent health (GRSA) of the Lausanne University Hospital, the LIVES research program developed an online LHC for the ‘Sexual health and behaviour of young people in Switzerland’ study (Barrense-Dias et al., 2018). The results from a pilot study comparing data from the LHC with data from a traditional questionnaire indicated that the two methods were able to collect similar amounts of both sensitive and nonsensitive information. More importantly, data obtained through the LHC were shown to be more consistent than those from the traditional questionnaire (Morselli et al., 2016). Two main design features can explain these results. First, the GUI was designed to be not only clear but also captivating for the target population. Unlike other implementations of online LHCs (e.g., Glasner et al., 2015; Sage et al., 2013; Wieczorek et al., 2020), we used icons to mark events and thus facilitate the visualisation of the respondents’ life trajectory at a single glance (Fig. 20.3). The idea was to improve the visual appeal of the LHC to increase associations among life domains and to rely on the reported events to trigger the recollection of new ones. These icons allowed respondents to view all life domains and events together and to enter new events in their preferred, rather than a predefined, order. Second, the GUI also facilitated the editing of reported events by allowing respondents to delete and modify any event or date. The online format also allowed us to analyse the respondents’ behaviour during the LHC thanks to the automatically recorded log data. These analyses showed that the GUI features were effectively used by the respondents, thereby increasing the overall data accuracy.

Fig. 20.3
2 tables present different versions of LHC. Each table has a set of criteria and checkboxes. The timeline is from 2007 to 2010.

Pilot (top) and final (bottom) versions of the online LHC used for the ‘Sexual health and behaviour of young people in Switzerland’ study

Improved versions of this first online LHC were also used in two further LIVES studies, ‘A retrospective look at your career path’ (2019–2020) and ‘The long-term consequences of mass layoffs’ (2020). The most important change was the possibility for the respondents to enter events extending over a period of time in addition to punctual events (Fig. 20.4).

Fig. 20.4
The online portal of LHC with various columns for 5 years, starting from 2008.

Online LHC with the possibility of indicating both punctual events (represented by icons) and time periods (represented in colours)

Missing Data in Life History Calendars

Thus far, we have stressed several aspects that should be considered when designing an LHC to boost and facilitate recollection processes, such as an effective visual layout of the LHC, the use of cues, and the possibility to edit answers. Further reflection is needed on data quality, and in particular missingness, in LHCs, as some measures can be implemented to either prevent or check it.

The quality of retrospective data can be conceptualised not only in terms of the quantity and time accuracy of collected information but also in terms of missing data. In particular, missing data in LHCs must be evaluated differently than in traditional questionnaires, where an answer to each question is expected, and therefore, the precise amount of missing information can be estimated. With an LHC, on the one hand, researchers are able to collect a more exhaustive amount of information or events because the number of questions asked does not limit what can be reported, but on the other hand, we cannot generally know for sure whether all events experienced by the respondent were reported. This is particularly true for self-administered questionnaires for which there is no interviewer intervening to help respondents remember their past.

In some situations, it is possible to identify missing data by considering the mandatory chronology of specific events (e.g., before becoming a widow, one must have been married) or when information takes the form of successive periods (e.g., a temporal gap in the different places of residence). In these circumstances, online modes offer a critical advantage over paper-and-pencil self-administered LHCs. The GUI can be developed to include instant checks for gaps and to warn the respondent of a possible mistake. However, not all life-history events can be checked for missingness. For instance, a romantic relationship can be missing without the researcher being able to detect the omission. In other words, an empty cell in an LHC grid may often mean either a correct nonoccurrence of an event or the omission of an event that actually occurred.
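
The two kinds of automatic checks mentioned above could be sketched as follows. This is only an illustration under simple assumptions: periods are given as inclusive integer years, the prerequisite table contains only the widowhood example from the text, and all function and variable names are hypothetical rather than part of any LHC tool described in this chapter.

```python
def find_gaps(spells):
    """Return uncovered (start, end) ranges between successive periods.

    `spells` is a list of (start, end) tuples with inclusive integer
    time units, for example successive places of residence.
    """
    gaps = []
    covered_until = None
    for start, end in sorted(spells):
        if covered_until is not None and start > covered_until + 1:
            gaps.append((covered_until + 1, start - 1))
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


# Illustrative rule table: each event requires an earlier event.
PREREQUISITES = {"widowhood": "marriage"}


def chronology_warnings(events):
    """Return warnings for events that violate a mandatory chronology.

    `events` maps an event name to the time unit of its first occurrence.
    """
    warnings = []
    for later, earlier in PREREQUISITES.items():
        if later not in events:
            continue
        if earlier not in events:
            warnings.append(f"'{later}' reported without '{earlier}'")
        elif events[earlier] > events[later]:
            warnings.append(f"'{earlier}' dated after '{later}'")
    return warnings
```

In an online LHC, such functions would typically run each time the respondent adds or edits an entry, so that a warning can be shown immediately instead of being discovered during data cleaning.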

To perform a missing data analysis on LHCs for the ‘Sexual health and behaviour of young people in Switzerland’ project, we asked participants the age at which different events occurred twice: once in an online LHC and once using a traditional questionnaire. We thus could compare both answers and evaluate their quality (Berchtold et al., 2021). Although our results confirmed that accurate data can be obtained when using an online LHC, we found that data quality was quite variable across respondents. Women were more consistent in their answers than men, in particular with regard to the time at which an event occurred. More generally, it was less difficult for participants to remember the occurrence of an event than its exact timing. Recollecting an event did not imply that its time of occurrence was also correctly remembered.
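
The logic of such a double measurement can be sketched as follows. This is a simplified illustration and not the analysis actually carried out in the cited study; the function name, the category labels, and the example events are assumptions made for this sketch.

```python
def compare_reports(lhc, questionnaire):
    """Classify each event by agreement between two measurements.

    Both arguments map an event name to the reported age, or to None
    when the event was not reported in that instrument.
    """
    summary = {
        "same_age": 0,          # reported in both, identical age
        "different_age": 0,     # reported in both, ages differ
        "one_source_only": 0,   # reported in only one instrument
        "neither": 0,           # reported in neither instrument
    }
    for event in set(lhc) | set(questionnaire):
        age_lhc = lhc.get(event)
        age_quest = questionnaire.get(event)
        if age_lhc is None and age_quest is None:
            summary["neither"] += 1
        elif age_lhc is None or age_quest is None:
            summary["one_source_only"] += 1
        elif age_lhc == age_quest:
            summary["same_age"] += 1
        else:
            summary["different_age"] += 1
    return summary
```

Separating "different_age" from "one_source_only" reflects the distinction made above between remembering that an event occurred and remembering when it occurred.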

Although data consistency of some sequential events can be ensured during collection, especially in web modes, these other types of checks are only possible if a series of questions are repeated in the questionnaire, thereby raising concerns about questionnaire length and redundancy. In our experience, it is worth introducing even a limited number of these checks to allow for the assessment of data quality. Other complementary solutions can also reduce missingness when collecting data. For instance, in an online survey, the GUI can ask for confirmation of the columns in the calendar in which there are no or few reported events, or it can ask the respondent to indicate a nonresponse option. Although this measure does not ensure the completeness of the data, it stresses to the respondents the importance of double checking their answer, thereby possibly increasing data quality.

Conclusions and Recommendations

Whether paper-and-pencil or online, the LHC is a cost-effective method to collect complete life course data in the social sciences. This is especially true in the context of vulnerability: Respondents can use the nonstructured approach of the LHC (compared to a standard list of questions) to report events that otherwise could go unnoticed because the researcher would not think to ask the right question.

In this chapter, we underscored two important aspects of the LHC, and retrospective surveys in general, that researchers should keep in mind when designing this kind of tool. First, mnemonic and cognitive processes should be considered before designing data collection tools. It is important that the tool be adapted to the target population. For instance, younger respondents may use different cognitive mechanisms than elderly respondents, and memory can be boosted by specific features of the questionnaire. Similarly, the recollection of some types of events (e.g., traumatic events) may require particular attention because it may be facilitated by some cognitive mechanisms while being obstructed by others. For instance, happy events can induce a bias towards positive memories, while negative events may trigger other negative memories. In addition, the same event can have different connotations for different people. Marriage (but also the birth of a child or divorce) can be a positive memory for some and very negative for others. Along these lines, the cues or anchoring points that can be primed to facilitate recollection should be carefully evaluated. Interdisciplinary research between social and cognitive scientists would help further develop these aspects.

Second, we urge researchers to carefully design the visual layout and instructions to be implemented in an LHC. In social research, LHCs are most often self-administered, meaning that the respondents cannot rely on external help. Hence, researchers should consider the visual features of the tool (or of its interface) and how they may influence mnemonic processes. Ultimately, given the different cognitive styles and mnemonic mechanisms that shape the recollection of one’s life events, the flexibility of the LHC, especially when implemented online, makes it a powerful tool to shed light on the complexity of idiosyncratic life trajectories. Moreover, LHCs allow respondents to cross-link events from different life domains and to edit and modify their answers.

The constant technical advancement in the field of informatics opens new possibilities for collecting retrospective data via LHC methods. Developments in artificial intelligence and natural language processing may open new scenarios for implementing LHC interviews online with fully automated assistance. Similarly, the availability of open-source packages (e.g., Wieczorek et al., 2020) may greatly facilitate the implementation of LHC in web surveys. Nevertheless, we insist that technological developments should not supersede a thorough examination of the specific mnemonic processes, supporting memory cues, flexibility and clarity of the user interface, and measures for limiting or checking missingness. On the contrary, these aspects are key to ensuring good data quality in life course research.