1 Introduction

Much has been written regarding the rise of women’s participation in the labour force during the second half of the twentieth century. In particular, recent research highlights the potential multiplier effect of intergenerational transmission mechanisms on the dynamics of female labour force participation (FLFP) subsequent to the second world war.Footnote 1 Yet much less is known about women’s participation in earlier periods, arguably due to the difficulty in finding full participation information of mother–daughter pairs in historical registries. In this paper, we propose a methodology that addresses this problem and renders consistent estimates even in the presence of a high proportion of non-ignorable missing values. Using unique individual-level Portuguese data from 1675 to 1925 with a high proportion of missingness, we show the existence of an earlier and stronger association between mothers’ and daughters’ labour market participation decisions. Our finding of this early and substantial intergenerational link, which acts as a potent catalyst of shocks, contributes to the understanding of the long-term dynamics of FLFP.

We model a woman’s decision to participate in the labour market as a function of her mother’s participation. One difficulty in studying the mother–daughter transmission of FLFP involves disentangling it from the transmission originated from other factors. The influence of factors such as the introduction or enforcement of mandatory education and the expansion of the service sector is minimized by choosing a sample period before these factors became important.Footnote 2 Time and location dummy variables are included to capture factors such as labour market conditions and social multiplier effects (Maurin and Moschion 2009; Olivetti et al. 2018). In an extended variable specification, controls are added for the social status of the father, for the ownership of property, and a for a proxy for household income to prevent confoundedness with other sources of inertia within the family.

Like most historical data, ours are characterized by abundant non-ignorable missing information, which affects both mothers’ and daughters’ participation statuses. Nevertheless, the number of mother–daughter pairs without missing values is not small (\(n=696\)). An enticing approach is to estimate the model only with observations for which we have complete information. (Hereinafter, this approach is labelled “Ignorability”.) However, when the missing process is not random, then Ignorability suffers from sample selection bias. We consider two approaches to address this non-ignorability problem in our estimation. First, informed by historical records, we conservatively impute missing values with the perceived predominant female labour market status at the time, that is, non-participation, which sets the participation rate at an unrealistically low value relative to census data. This approach is labelled “Imputation”. Second, we adapt a methodology based on Ramalho and Smith (2013), which allows models to be estimated in contexts in which missing data are abundant and non-random. This methodology involves maximum likelihood (ML) estimation by using all observations and allowing the missing values to be endogenous. The identification of the parameters of the model is improved by incorporating external information regarding aggregate female labour market participation from historical census data. This approach is labelled “the Likelihood Approach”. We show that failure to recognize the endogeneity of the missing process reduces the estimates of the mother–daughter link by more than a half.

Using the Likelihood Approach with only location and time dummies, we estimate a positive and statistically significant average marginal effect of the mother’s participation status on the daughter’s probability of participation (hereinafter, AME) of 39.7 percentage points (pp) with a standard error (s.e.) of 3.0 pp. This estimate comprises a number of mother-to-daughter intergenerational links: (a) the transmission of human capital such as that involved in certain crafts; (b) the transmission of physical capital, such as a small shop, or valuable craft-specific tools; and (c) the transmission of preferences, beliefs, traits, and values regarding the participation in the labour market from a mother to her daughter. There is evidence of such transmission within the family during the twentieth century (Fernández et al. 2004; Farré and Vella 2013), and hence, it is possible that it was also present previously. We explore the importance of the transmission of human and physical capital in a variable specification where controls are added for several socio-economic factors. In this extended specification, the AME estimates remain large and statistically significant at 26.7 pp (s.e. 2.8 pp).

Our Likelihood Approach estimates of the mother–daughter link are always substantially higher than those by Morrill and Morrill (2013) for the late twentieth century: in the range of 2–7 pp. Although differences with Morrill and Morrill (2013) partly stem from using different periods, a reduction in the effect should be expected as the importance of the mother vis-a-vis external factors, such us access to formal education, decreases with economic and cultural development. Indeed, when we split our sample into different periods, the AME declines with time.

Estimating the model with the Likelihood Approach allows us to test whether the missing process is ignorable. We strongly reject Ignorability, which implies that observations with missing values have valuable information and should not be discarded. Moreover, the implicit assumption under Imputation, that is, that the missing occupations correspond to non-participation in the labour market, is inconsistent with our Likelihood Approach estimates.

Finally, by means of simulation methods we evaluate the sensitivity of Ignorability and the Likelihood Approach to varying degrees of the incidence of missing information. This is carried out by taking the estimated parameters to simulate the data and applying different percentages of missing observations. We then re-estimate the AME in these different samples. We find that ignoring the missing observations (Ignorability) results in estimates with large downward bias, which increases with the incidence of the missing information. For example, when the level of missing information matches that of the actual data, estimates are on average less than half of the true value in the data generating process. In contrast, the Likelihood Approach always delivers unbiased estimates, although, as expected, their standard errors increase with the incidence of missingness. By attaining unbiased estimation in the presence of non-random missing data, our approach confers considerable potential to incomplete historical data.

The remainder of this paper proceeds as follows. Sections 2 and 3 describe the dataset and the econometric model, respectively. Section 4 presents our main estimation results, and Section 5 discusses the robustness of the Likelihood Approach using simulated samples with varying prevalence of missing observations. Section 6 yields the conclusions, while Appendix A provides institutional background, Appendix B describes additional features of the data, and Appendices C, D, and E contain technical details.

2 Data

2.1 Parish data

The main data source is parish information that dates back to the end of the sixteenth century and was extracted from parish records in the villages of Ronfe and Ruivães in the Minho region, in the north-western part of Portugal, and the city of Horta on Faial Island located in the Azores. A research team from the University of Minho collected the main datasets based on all baptism, marriage, and death certificates found in the local churches.Footnote 3\(^{,}\)Footnote 4 The resulting dataset was matched with other church individual-specific records known as “rol de confessados” (literally, the list of the confessed). The latter originates from parochial censuses of residents over seven years of age. They were initially produced by the priest during Lent to administer the sacrament of penance to the parishioners and contain information regarding occupation and/or social status.

The original baptism, marriage, and death certificate records allowed family linkages to be reconstructed within each location beginning in the 1550s up until the mid-twentieth century (Amorim 1991). Altogether, following basic cleaning, the dataset includes entries for 34,897 women (50.2 per cent of all records). All observations of slaves and their children (almost exclusively found in Horta) have also been excluded, and the number of female observations drops to 34,075.

The dataset holds information on the dates of birth, marriage, and death together with a family identification code. Gender is inferred from the individual’s first name in the baptism Parish registry. We use the family identification code to link women with their mothers (and fathers) and to compute the number of siblings for each woman. The year of birth is available for 59.3 per cent of all records. In contrast, death information is available for only 33.3 per cent of all records. We complete records for which the date of birth is missing using birth date information from other members of the family. To do so, all individuals for which the information is available are grouped into generational cohorts spanning 25 years. Observations for which the year of birth is missing are completed by sequentially examining the 25-year birth period of siblings, spouse, and children, in that order. Once the cohort of the siblings or the spouse is identified, then the record is completed with the 25-year birth period of the spouse or sibling. In the event that only cohorts of the children are identified, then the 25-year period previous to the cohort of the eldest child is assigned to the missing record. This procedure is repeated until no changes are produced. As a result, 82.4 per cent of the original data can be associated with a given 25-year period.Footnote 5 Finally, records of individuals born before 1675 and after 1950 become sporadic in the original dataset and hence a restriction is imposed of only dates of birth between 1675 and 1950, resulting in a total of 21,645 female observations.

2.2 Non-ignorable missing occupations

The occupation/social status information contained in the matched records is not as complete as the baptism and marriage information. Columns 1, 3, and 5 of Table 1 report the number of observations per location and quarter-century. (Throughout the paper, each quarter-century is identified by its initial year.) Columns 2, 4, and 6 show the proportion (as a percentage) of observations with occupation/social status information per quarter-century and location.

Table 1 Original Sample Sizes & Reported Occupations/Social Status

Occupations/social status coverage varies by quarter-century and across locations. For Ruivães from 1700 until 1800 and for Horta in 1900, there is no occupations/social status information in the matched records. These quarter-century/location combinations are therefore omitted from our analysis. We also discard the observations for Horta and Ruivães in 1675 and Horta in 1875 since both the number of observations and coverage are unusually low. Hence, our working sample includes only observations from Horta in the period 1700–1850, Ronfe in the period 1675–1925, and Ruivães in the period 1825–1925.

Our working sample shows location-specific trends in the coverage. For example, whereas coverage in Horta in 1725 is 5.6 per cent, it increases monotonically to reach 25.8 per cent one century later. Coverage in Ronfe follows a U-shaped trend decreasing steadily from 15.7 per cent in 1675 to 3.4 per cent in 1775 to rise again to around 18 per cent levels in the beginning of the twentieth century. Location-specific trends suggest that location factors are at work. This does not preclude the influence of individual-specific factors, which are a potential source of selection bias.

Coverage never exceeds 36 per cent. Three reasons for these low figures can be discarded. First, accounting for early deaths can at most only explain a small proportion of non-coverage (Brettell 1998). Second, gender bias in priests’ recording practices appears to be minor because the gender differential in coverage is only 2.2 pp and follows a similar location/period pattern. Third, the priest could report occupation or social status to differentiate between women with a common name. If there were an excessive number of, say, “Marias”, then their occupation or social status would help distinguish between them. However, common names, such as Maria and Ana, are typically followed by a second given name and are as likely among those reported as among those unreported. For example, among the reported, 18.3 per cent are named “Maria” and 4.7 per cent “Ana” compared with 16.5 per cent and 5.2 per cent, respectively, among the unreported.

Why would the priest report some women’s occupation/social status and not that of others? We believe that there are at least four potential sources of selection bias. The first arises from the Lent census data-collection practices. The censuses were organized per street and gathered by priests door-to-door. Whenever priests gave priority to nearby households, then remote rural locations (where farmers and poorer people would tend to live) would be less likely to be covered.

The second source of selection bias stems from the activity itself. Priests might tend to only record the activity of those whose labour or social status was uncommon in the region and period, such as that of civil servants or of the miller in a village of farmers. Indeed, in the two rural locations, the share of farmers among reported occupations is unrealistically low: 10.3 per cent in Ronfe and 8.2 per cent in Ruivães.

The third source originates from social status: parochial priests are arguably more likely to register those parishioners who give large donations to the church.

The fourth and last source of non-coverage is due to the absence of identification of the mother–daughter pair. For 31.5 per cent of the daughters (5, 839 observations), the mother’s identification code is unknown. By construction, in all these cases, the participation status of the mother is missing. There are several reasons for this failure in identification. When the mother’s record precedes the first currently available registries, or it is illegible, or it contains coding errors, then the match with the daughter’s record is impossible. These problems are slightly more likely to occur the older the records are: in our working sample, around 35 per cent of the observations from 1675 have an unidentified mother compared with 33 per cent one hundred years later and 30 per cent two hundred years later. These registry issues affecting the identification of the mother–daughter pair are not systematically related to participation decisions or to the priest’s recording practices and are thus not a likely source of selection bias in our estimates. However, there are at least two sources of non-identification of the mother that may lead to non-ignorability. The first source is illegitimacy. Annotations from the University of Minho’s team suggest that at least 4 per cent of all cases with an unidentified mother were orphans or illegitimate children. Historical accounts suggest that there were more cases. For example, Scott (1999) reports that in Ronfe, 20.7 per cent of the heads of households were single females in 1750 and at least 18 per cent of the children baptized in 1700 were illegitimate. Illegitimate children and orphans are more likely to be poor, landless, and to work for pay. The second source involves the absence of the mother in the records: mothers who were not born, married, or deceased in the parish leave no personal records there and, thus, cannot be found in our dataset. This situation arises, for example, when, for reasons of work or marriage, a woman migrates from another location to one of our three locations and her mother remains in her original place of residence. The raw data show that daughters with identified mothers are less likely to participate. (Identification of the mother correlates with the woman’s participation conditional on reported, correlation value of \(-0.202\) with a p value smaller than 0.01.) Hence, this source might be non-ignorable.

To sum up, lack of coverage, that is, missingness, might be associated with non-random individual factors such as social status and activity choices and, thus, is probably non-ignorable.

2.3 Labour force participation

We construct women’s occupation/social status using information from two variables from the original files: “profissão” (profession) and “título” (title). In most cases, “profissão” reports professions and provides information on participation. Although “título” could be interpreted as social status, in many cases it also provides information on participation. Therefore, we infer participation from all the information available, including that in “título”.

The information on occupation/social status is not systematically classified across parochial priests and across time. As a result, the original data include more than 500 occupations/social status categories, many of which are close substitutes for one another. To make this information tractable, we group all categories into four major classes: employee/farmer, professional/capital owner, domestic production, and unproductive.Footnote 6 The class employee/farmer includes all paid and unqualified jobs. professional/capital owner includes landlords, liberal professions, traders, businesswomen, the self-employed and qualified and managerial jobs. Domestic production includes observations classified as “doméstica” in Portuguese (a term that can be interpreted as a housewife) and women to whom the priest listed as “dona”, a term originally employed to signal a woman of the upper class, and this was gradually adopted to also indicate the bourgeoisie during the eighteenth and nineteenth centuries. The unproductive category includes the indigent or those described as very poor, and individuals registered as nobility by the priest, and others. Based on these four major categories, we define labour force participation as being an employee/farmer or a professional/capital owner. Our definition mostly differs from the usual labour market participation measure due to the lack of precision in the priest’s registry of owners, given in Portuguese as “proprietário” (owner), whereas we correctly include as participants all small capital owners who are self-employed, such as shop owners; the term “proprietário” also pertains to landlords who have no labour market involvement. As a robustness check, we also conduct our main analysis under an alternative definition of participation that excludes property owners as participants. (See footnote 11.)

Table 2 Distribution (%) of Women Across Four Major Economic Categories

In Table 2, we report the distribution of women across our four major categories by location. Distributions vary significantly across locations and may be the combination of location-specific economic factors and differentials in the incidence of missing information. Certain categories appear sometimes under-represented (i.e. employee/farmer in Horta), whereas others seem over-represented (i.e. professional/capital owner in Ronfe). As shown in the last row of Table 2, these disparities lead to large differences in the proportion of women participating per location.

Observed participation rates are presumably contaminated by non-random missingness. More importantly for our objective, disregarding observations with missing participation status may bias estimates of the association between the mother’s labour force participation and the daughter’s probability of participation. Table 3 shows participation rates in terms of mother participation statuses in the subsample of mother–daughter pairs for whom the participation information is available. The raw estimate for the marginal effect (\(69.72-7.33=62.39\) pp) is probably biased due to non-random missingness.

Table 3 Women Participation Frequencies by Mother’s Participation

2.4 External data on female labour market participation rates

We use census data on local and national FLFP rates as external sources of information to enhance model identification. These census data, however, suffer from at least three limitations. First, the first census in Portugal with information on labour market participation was taken in the year 1890. Second, for Ronfe and Ruivães only information on the administrative regions that they belong to, Guimarães and Vila Nova de Famalicão, can be used. Third, the definition of labour participation used in census data in Western countries has changed over time. The main difference is found in the concepts of occupation and profession. The concept of occupation, adopted early on, classifies most women as active in the labour market, while the concept of profession, adopted later, does not. The difference applies most notably to the case of women whose (at times irregular) work was carried out in the home or the family farm/business, who would only be included as part of the labour force in early censuses (Goldin 1995). The rates such as that obtained from the 1890 Census should therefore be regarded as approximations relative to modern definitions. Reis (2005) provides a national-level estimate of the female labour market participation in 1862 that is substantially lower (19.1 per cent) than the rate obtained for 1890 using census data (38.5 per cent); for a detailed account of the census data, see Appendix B. On the other hand, it has been observed that women’s work has been vastly under-reported in historical official statistics during the same period in other countries (Humphries and Sarasúa 2012; Goldin 1995).

For the purpose of obtaining FLFP rates for our three locations, we construct predictions for participation rates using a log functional specification with the 1862 data from Reis (2005) and data from all censuses up to 1991 (see Appendix B). These predictions rank our three locations (Ronfe, Ruivães, and Horta) in the same way as the reported participation rates in our data (see Table 2). The high participation rates and the fact that women in rural settings participated more in the labour market than urban women are consistent with historical accounts for the seventeenth, eighteenth, and nineteenth centuries (Humphries and Sarasúa 2012).

For all the years since 1890, we use our predictions as external data (see Fig. 3 in Appendix B). Before 1890, neither census data nor predictions are available. In their stead, we postulate alternative scenarios. We initially report results under what we refer to as the “Baseline scenario”. In this scenario, we take the values of the predictions for our three locations in 1890 as the external female participation rates for all previous periods. In order to gauge the validity of these results, we estimate the model for a very large number (over 65,000) of alternative scenarios in Sect. 4.1.

3 The econometric model

We model a woman’s participation decision as dependent on her mother’s participation. Woman i chooses either to participate in the labour market, \(y_{i}=1\), or not, \(y_{i}=0\). The discrete choice is expressed in the following linear specification:

$$\begin{aligned} y_{i}={\varvec{1}}\left\{ \alpha y{}_{i}^{m}+x_{i}\beta +\epsilon _{i}>0\right\} \end{aligned}$$

where dummy variable \(y_{i}^{m}\) indicates the labour force participation status of the mother. Parameter \(\alpha \) captures the effect of a mother’s participation status on that of her daughter. Vector \(x_{i}\) includes additional controls.

3.1 Likelihood with missing information

Most observations have missing entries in \(\left\{ y_{i},y_{i}^{m}\right\} \) (86.1 per cent for \(y_{i}\) and 87.9 per cent for \(y_{i}^{m}\)). Let us define a binary indicator \(I_{i}\), which takes value 1 if the daughter’s occupation or social status is reported by the priest and available in the dataset and 0 otherwise. Similarly, let \(I_{i}^{m}\) take value 1 if the mother’s occupation or social status is reported and 0 otherwise.

When the probability of a missing observation is independent of \(y_{i}\), missing observations are ignorable in the sense that their omission from estimation does not bias the results. As argued in Sect. 2, missing participation is likely to be related to non-random individual factors such as social status and activity choices and is, therefore, non-ignorable.

The aim is to estimate parameter vector \(\theta \equiv \left\{ \alpha ,\beta \right\} \) where:

$$\begin{aligned} \Pr \left\{ y_{i}=v\left| y_{i}^{m}=w,x_{i}\right. \right\} =F\left( v,w,x_{i};\theta \right) \end{aligned}$$

where v and w are values that y and \(y^{m}\) can take, that is \(v,w\in \left\{ 0,1\right\} \). By assuming normality, the conditional probit model is obtained:

$$\begin{aligned} F\left( v,w,x_{i};\theta \right) \equiv \left\{ \begin{array}{ll} \Phi \left( \alpha w+x_{i}\beta \right) &{} \text{ if } v=1\\ 1-\Phi \left( \alpha w+x_{i}\beta \right) &{} \text{ otherwise } \end{array}\right. . \end{aligned}$$

The missingness mechanism and the participation process jointly define the event’s probability:

$$\begin{aligned}{} & {} \Pr \left\{ I_{i}=r,I_{i}^{m}=s,y_{i}=v,y{}_{i}^{m}=w,x_{i}\right\} \nonumber \\{} & {} \quad = \Pr \left\{ I_{i}=r,I_{i}^{m}=s\left| y_{i}=v,y{}_{i}^{m}=w,x_{i}\right. \right\} \nonumber \\{} & {} \qquad \times F\left( v,w,x;\theta \right) \times \Pr \left\{ y_{i}^{m}=w,x_{i}\right\} . \end{aligned}$$

for \(r,s\in \left\{ 0,1\right\} \). For an observation with non-missing information, the joint probability of non-missingness, that is, \(I_{i}=I_{i}^{m}=1\), and the vector variables \(\left\{ y_{i},y{}_{i}^{m},x_{i}\right\} \) is \(\Pr \left\{ I_{i}=I_{i}^{m}=1,y_{i}=v,y{}_{i}^{m}=w,x_{i}\right\} \), which is a particular case of Eq. (3.3).

There are three situations in which a given observation may have missing information: when the daughter’s information is missing but the mother’s is not, when the mother’s information is missing but the daughter’s is not, and when information for both the daughter and the mother is missing. This implies the following probability of observation i:

$$\begin{aligned} \begin{array}{ll} p_{i}= &{} \left( \Pr \left\{ I_{i}=I_{i}^{m}=1,y_{i},y{}_{i}^{m},x_{i}\right\} \right) ^{I_{i}I_{i}^{m}}\times \\ &{} \left( \Pr \left\{ I_{i}=0,I_{i}^{m}=1,y{}_{i}^{m},x_{i}\right\} \right) ^{(1-I_{i})I_{i}^{m}}\times \\ &{} \left( \Pr \left\{ I_{i}=1,I_{i}^{m}=0,y_{i},x_{i}\right\} \right) ^{I_{i}(1-I_{i}^{m})}\times \\ &{} \left( \Pr \left\{ I_{i}=I_{i}^{m}=0,x_{i}\right\} \right) ^{(1-I_{i})(1-I_{i}^{m})}. \end{array} \end{aligned}$$

Appendix C shows the expression of the different terms of \(p_{i}\) with probabilities given by the model in Eq. (3.3).

3.1.1 Ignorability of the Missing Process

If, conditional on vector \(x_{i}\), the missing mechanisms of the mother and the daughter are independent of their participation decisions, then we can simplify Eq. (3.3) to

$$\begin{aligned}{} & {} \Pr \left\{ I_{i}=r,I_{i}^{m}=s,y_{i}=v,y{}_{i}^{m}=w,x_{i}\right\} \\{} & {} \quad =\Pr \left\{ I_{i}=r,I_{i}^{m}=s\right\} F\left( v,w,x;\theta \right) \Pr \left\{ y_{i}^{m}=w,x_{i}\right\} . \end{aligned}$$

The probability of a non-missing observation is

$$\begin{aligned}{} & {} \Pr \left\{ I_{i}=1,I_{i}^{m}=1,y_{i}=v,y{}_{i}^{m}=w,x_{i}\right\} \\{} & {} \quad =\Pr \left\{ I_{i}=I_{i}^{m}=1\right\} F\left( v,w,x;\theta \right) \Pr \left\{ y_{i}^{m}=w,x_{i}\right\} \end{aligned}$$

, and the probability conditional on the observation being non-missing is

$$\begin{aligned} F\left( v,w,x_{i};\theta \right) \times \Pr \left\{ y_{i}^{m}=w,x_{i}\right\} . \end{aligned}$$

Thus, \(\theta \) can be consistently estimated by maximum likelihood using only the observations for which no information is missing, and the missing process is ignorable.

3.2 The likelihood approach to addressing non-ignorable missingness

The traditional solution to non-ignorable missingness is to perform a procedure in which the missing values are imputed (see, among others, Little and Rubin 2002). This presumes that certain events have zero probability. In Sect. 4, we present estimations under the assumption that daughters (mothers) with missing participation do not participate in the labour force. This is consistent with assuming that all missing observations engage in domestic production (either as housewives or as unpaid farmers).Footnote 7

These imputation procedures are ad hoc. An alternative approach is to propose a model for the missing data mechanism and jointly estimate the participation model together with the missing data generation process. In this section, we follow Ramalho and Smith (2013) and state weaker assumptions regarding the missing data mechanism to identify participation while controlling for potentially non-ignorable missing information. Ramalho and Smith (2013) consider the situation in which the missingness mechanism is conditionally dependent on the outcome variable and on a discrete partition of the covariates. Our empirical application calls for further adjustments to this strategy. From Eq. (3.3) and without any loss of generality,

$$\begin{aligned} \Pr \left\{ I_{i},I_{i}^{m}\left| y_{i},y_{i}^{m},x_{i}\right. \right\}&=\Pr \left\{ I_{i}\left| I_{i}^{m},y_{i},y_{i}^{m},x_{i}\right. \right\} \times \Pr \left\{ I_{i}^{m}\left| y_{i},y_{i}^{m},x_{i}\right. \right\} . \end{aligned}$$

Our strategy depends on additional assumptions regarding both \(\Pr \left\{ I_{i}\left| I_{i}^{m},y_{i},y_{i}^{m},x_{i}\right. \right\} \) and \(\Pr \left\{ I_{i}^{m}\left| y_{i},y_{i}^{m},x_{i}\right. \right\} .\)

Priests might have been more likely to under-report the incidence of occupations that were common, such as farmers, and more likely to record the occupations for those whose labour status presented a differentiating characteristic. Hence, the chance of coverage is probably conditioned by the type of occupation of the woman. However, due to the high non-coverage rate, insufficient information is available to control for detailed occupations. We solve this problem by exploiting the information contained in the joint missing process for mothers and daughters. As our data show, the coverage of mothers and daughters is correlated. Daughters of women whose occupation/social status is not reported by the priest have an 88.4 per cent chance of not being reported. Likewise, daughters of women whose status is reported have a much larger chance of having theirs also reported compared to the average daughter (31.0 per cent versus 13.9 per cent). By assuming that observability of the daughter’s participation depends both on her participation status and on the observability of her mother, which can be interpreted as a proxy of social status, the model can capture different effects on coverage by occupation. These considerations warrant the following:

Assumption 3.1

(Daughter’s Response Conditional Independence) Non-missingness in \(y_{i}\) is conditionally independent of \(y_{i}^{m}\) and \(x_{i}\), that is,

$$\begin{aligned} \Pr \left\{ I_{i}\left| I_{i}^{m},y_{i},y_{i}^{m},x_{i}\right. \right\} =\Pr \left\{ I_{i}\left| I_{i}^{m},y_{i}\right. \right\} . \end{aligned}$$

Assumption 3.1 is not an imputation procedure because it does not replace the missing observations with any set of values. All information affecting the probability of daughter’s occupation coverage is contained jointly in the daughter’s occupation status and the mother’s coverage. Since the mother’s information is probably collected early on and through a similar process, an assumption closely related to Assumption 3.1 but referring to the availability of the mother’s participation decision can also be made. Note, however, that although the mother’s coverage status is allowed to correlate with the daughter’s through Assumption 3.1, Eq. (3.6) warrants a simpler assumption for the mother:

Assumption 3.2

(Mother’s Response Conditional Independence) Non-missingness in \(y_{i}^{m}\) is conditionally independent of \(y_{i}\), and \(x_{i}\), that is,

$$\begin{aligned} \Pr \left\{ I_{i}^{m}\left| y_{i},y_{i}^{m},x_{i}\right. \right\} =\Pr \left\{ I_{i}^{m}\left| y_{i}^{m}\right. \right\} . \end{aligned}$$

Given the values of \(y_{i}\) and \(I_{i}^{m}\) (\(y_{i}^{m}\)), Assumptions 3.1 and 3.2 state that the missing process is independent of the covariates. This implies, for example, that priests of any place and location in our sample use the same criteria to decide whether to report or not the labour status of women. This restriction is mitigated because Assumption 3.1 allows for the missing processes for daughter and mother to be related. In this way, the correlation of daughters’ and mothers’ coverage may be due to common unobservable factors affecting the priests’ decisions.Footnote 8 An important implication of Assumptions 3.1 and 3.2 is that covariates in \(x_{i}\) affect the probability that the priest reports the information only through their effects on \(y_{i}\) and \(y_{i}^{m}\). Indeed, it is precisely through this implied correlation between controls and priests’ reporting decisions that the parameter of interest \(\alpha \) is identified.Footnote 9

Henceforth, we assume that control vector \(x_{i}\) is a vector of discrete variables, which is the case in our data.Footnote 10 Assumptions 3.1 and 3.2 are sufficient conditions to identify the parameter vector \(\theta \). Let \(H_{rsv}\equiv \Pr \left\{ I_{i}=r,I_{i}^{m}=s,y_{i}=v\right\} \) and \(H_{sw}^{m}\equiv \Pr \left\{ I_{i}^{m}=s,y_{i}^{m}=w\right\} \) with \(r,s,v,\text { and }w\in \left\{ 0,1\right\} \). Furthermore, the unconditional probabilities of discrete variables \(y_{i}\) and \(y_{i}^{m}\) are denoted by \(\Pr \{y_{i}=v\}=\Pi _{v}\) and \(\Pr \{y_{i}^{m}=w\}=\Pi _{w}^{m}\), respectively. Finally, \(\Pi _{w,x}\equiv \Pr \left\{ y_{i}^{m}=w,x_{i}\right\} \), where the number of parameters in \(\Pi _{w,x}\) is given by the number of different value combinations of variables \(y_{i}^{m}\) and \(x_{i}\) observed in the data. Assumptions 3.1 and 3.2 imply that:

$$\begin{aligned} \Pr \left\{ I_{i}=I_{i}^{m}=1,y_{i}=y{}_{i}^{m}=1,x_{i}=x\right\} =\left( \frac{H_{111}}{\Pi _{11}}\right) \left( \frac{H_{11}^{m}}{\Pi _{1}^{m}}\right) F\left\{ 1,1,x;\theta \right\} \Pi _{1,x},\nonumber \\ \end{aligned}$$

where \(\Pi _{11}=\Pr \left\{ I_{i}^{m}=1,y_{i}=1\right\} \) and \(\Pi _{1,x}\) is the parameter of the matrix \(\Pi _{w,x}\) that corresponds to the specific combination of values of variables \(\left( y_{i}^{m},x_{i}\right) =\left( 1,x\right) \).

Consider the case in which only the daughter’s information is missing. The joint probability for \(\left\{ I_{i}=0,I_{i}^{m}=1,y{}_{i}^{m}=1,x_{i}=x\right\} \), which corresponds to the second term in Eq. (3.4), is:

$$\begin{aligned} \begin{array}{cc} \begin{array}{cc} &{}\Pr \left\{ I_{i}=0,I_{i}^{m}=1,y{}_{i}^{m}=1,x_{i}=x\right\} =\left\{ \Pr \left\{ I_{i}=0,I_{i}^{m}=1,y_{i}=1,y{}_{i}^{m}=1,x_{i}\right\} \right. \\ &{}\qquad \qquad \left. +\Pr \left\{ I_{i}=0,I_{i}^{m}=1,y_{i}=0,y{}_{i}^{m}=1,x_{i}\right\} \right\} \\ &{}\quad =\left( \frac{H_{011}}{\Pi _{11}}\right) \left( \frac{H_{11}^{m}}{\Pi _{1}^{m}}\right) F\left\{ 1,1,x_{i};\theta \right\} \Pi _{1,x_{i}}+\left( \frac{H_{010}}{\Pi _{10}}\right) \left( \frac{H_{11}^{m}}{\Pi _{1}^{m}}\right) F\left\{ 0,1,x_{i};\theta \right\} \Pi _{1,x_{i}} \end{array}\end{array} \end{aligned}$$

Following a similar argument, the joint probability of an observation in which the daughter participates and the mother’s participation decision is missing is:

$$\begin{aligned} \Pr \left\{ I_{i}=1,I_{i}^{m}=0,y{}_{i}=1,x_{i}\right\}&=\left( \frac{H_{101}}{\Pi _{01}}\right) \left( \frac{H_{01}^{m}}{\Pi _{1}^{m}}\right) F\left\{ 1,1,x_{i};\theta \right\} \Pi _{1,x_{i}}\nonumber \\&\qquad +\left( \frac{H_{101}}{\Pi _{01}}\right) \left( \frac{H_{00}^{m}}{\Pi _{0}^{m}}\right) F\left\{ 1,0,x_{i};\theta \right\} \Pi _{0,x_{i}}. \end{aligned}$$

The joint probability of an observation in which both participation decisions are missing is:

$$\begin{aligned} \begin{array}{ll} \Pr \left\{ I_{i}=0,I_{i}^{m}=0,x_{i}\right\} = &{} \left( \frac{H_{001}}{\Pi _{01}}\right) \left( \frac{H_{01}^{m}}{\Pi _{1}^{m}}\right) F\left\{ 1,1,x_{i};\theta \right\} \Pi _{1,x_{i}}+\\ &{} \left( \frac{H_{000}}{\Pi _{00}}\right) \left( \frac{H_{01}^{m}}{\Pi _{1}^{m}}\right) F\left\{ 0,1,x_{i};\theta \right\} \Pi _{1,x_{i}}+\\ &{} \left( \frac{H_{001}}{\Pi _{01}}\right) \left( \frac{H_{00}^{m}}{\Pi _{0}^{m}}\right) F\left\{ 1,0,x_{i};\theta \right\} \Pi _{0,x_{i}}+\\ &{} \left( \frac{H_{000}}{\Pi _{00}}\right) \left( \frac{H_{00}^{m}}{\Pi _{0}^{m}}\right) F\left\{ 0,0,x_{i};\theta \right\} \Pi _{0,x_{i}}. \end{array} \end{aligned}$$

Thus, Eq. (3.4) simply becomes:

$$\begin{aligned} \begin{array}{ll} p_{i}= &{} \left( \left( \frac{H_{11y_{i}}}{\Pi _{1y_{i}}}\right) \left( \frac{H_{1y_{i}^{m}}^{m}}{\Pi _{y_{i}^{m}}^{m}}\right) F\left\{ y_{i},y_{i}^{m},x_{i};\theta \right\} \Pi _{y_{i}^{m},x_{i}}\right) ^{I_{i}I_{i}^{m}}\times \\ &{} \left( {\sum }_{v\in \left\{ 0,1\right\} }\left( \frac{H_{01v}}{\Pi _{1v}}\right) \left( \frac{H_{1y_{i}^{m}}^{m}}{\Pi _{y_{i}^{m}}^{m}}\right) F\left\{ v,y_{i}^{m},x_{i};\theta \right\} \Pi _{y_{i}^{m},x_{i}}\right) ^{(1-I_{i})I_{i}^{m}}\times \\ &{} \left( {\sum }_{w\in \left\{ 0,1\right\} }\left( \frac{H_{10y_{i}}}{\Pi _{0y_{i}}}\right) \left( \frac{H_{0w}^{m}}{\Pi _{w}^{m}}\right) F\left\{ y_{i},w,x_{i};\theta \right\} \Pi _{w,x_{i}}\right) ^{I_{i}(1-I_{i}^{m})}\times \\ &{} \left( {\sum }_{v,w\in \left\{ 0,1\right\} }\left( \frac{H_{00v}}{\Pi _{0v}}\right) \left( \frac{H_{0w}^{m}}{\Pi _{w}^{m}}\right) F\left\{ v,w,x_{i};\theta \right\} \Pi _{w,x_{i}}\right) ^{(1-I_{i})(1-I_{i}^{m})}. \end{array} \end{aligned}$$

A clarification concerning our notation is perhaps in order. The meaning of subscript i in a given variable is that the function is to be evaluated at the value of the variable at observation i. For example, \(F\left\{ y_{i},w,x_{i};\theta \right\} \) in the third row of Eq. (3.13) should be evaluated at the value that variables y and x have at observation i and a running value w over the two potential values \(\left\{ 0,1\right\} \) for \(y_{i}^{m}\). In contrast, whenever the subscript i is used in parameters, it indicates that the relevant parameter is that which corresponds to the value at that observation. For example, if, for observation i, \(y_{i}^{m}=a\) and \(x_{i}=b\), then \(\Pi _{y_{i}^{m},x_{i}}\equiv \Pr \left\{ y_{i}^{m}=a,x_{i}=b\right\} \).

The vector of parameters includes, in addition to \(\theta \), the probabilities \(\left\{ H_{rsv}\right\} \), for \(r,s,v\in \left\{ 0,1\right\} \),\(\left\{ H_{sw}^{m}\right\} \), for \(s,w\in \left\{ 0,1\right\} \), and \(\left\{ \Pi _{w,x}\right\} \), which has as many parameters as the combinations of the values of \(y_{i}^{m}\) and \(x_{i}\) in the data. Equation (3.13) represents the likelihood \({\mathcal {L}}_{i}\) for any given observation i as a function of the expanded parameter vector \(\left\{ \theta ,\left\{ H_{rsv}\right\} ,\left\{ H_{sw}^{m}\right\} ,\left\{ \Pi _{w,x}\right\} \right\} \). The log-likelihood function results from the sum of the log of \({\mathcal {L}}_{i}\), \(\log \left( {\mathcal {L}}\right) =\sum _{i=1}^{N}\log \left( {\mathcal {L}}_{i}\right) \), subject to the following constraints:

$$\begin{aligned}{} & {} \begin{array}{l} {\sum }_{r,s,v}H_{rsv}=1\end{array} \end{aligned}$$
$$\begin{aligned}{} & {} \begin{array}{l} {\sum }_{s,w}H_{sw}^{m}=1\end{array} \end{aligned}$$
$$\begin{aligned}{} & {} \begin{array}{l} {\sum }_{r,v}H_{r,s,v}={\sum }_{w}H_{sw}^{m} \text{ for } \end{array}s=0,1 \end{aligned}$$
$$\begin{aligned}{} & {} \begin{array}{l} {\sum }_{w,x}\Pi _{w,x}=1\end{array} \end{aligned}$$
$$\begin{aligned}{} & {} \begin{array}{l} \Pi _{v}={\sum }_{w,x}F\left\{ v,w,x;\theta \right\} \Pi _{w,x}\,\text{ for } v=0,1\end{array} \end{aligned}$$
$$\begin{aligned}{} & {} \begin{array}{l} \Pi _{v}={\sum }_{r,s}H_{rsv}\, \text{ for } v=0,1\end{array} \end{aligned}$$
$$\begin{aligned}{} & {} \begin{array}{l} \Pi _{w}^{m}={\sum }_{s}H_{sw}^{m}\text { for }w=0,1\end{array} \end{aligned}$$
$$\begin{aligned}{} & {} \begin{array}{l} \Pi _{w}^{m}={\sum }_{x}\Pi _{w,x}\,\text{ for } w=0,1\end{array}. \end{aligned}$$

Maximum likelihood estimation yields consistent and asymptotically efficient estimates of \(\theta \). One of the difficulties associated with this model is that the vector of parameters \(\Pi _{w,x}\) grows with the number of different value combinations of variables \(y_{i}^{m}\) and \(x_{i}\) that are observed in the data. In our application, the number of parameters may be reduced by introducing additional restrictions. In particular, if we decompose vector x into location and time dummies, \(x_{1}\), and additional controls, \(x_{2}\), then we have without any loss of generality:

$$\begin{aligned} \Pi _{w,x}=\Pi _{x_{2}|w,x_{1}}\Pi _{w|x_{1}}\Pi _{x_{1}} \end{aligned}$$

In order to reduce the number of parameters, we assume that the distribution of \(x_{2}\) depends only on the location and time dummies and not on the mother’s working status:

$$\begin{aligned} \Pi _{w,x}=\Pi _{x_{2}|x_{1}}\Pi _{w|x_{1}}\Pi _{x_{1}}. \end{aligned}$$

Finally, we also assume that \(\Pi _{v}=\Pi _{v}^{m}\) for \(v\in \left\{ 0,1\right\} \). Although the model is identified, a very large proportion of missing observations probably affects the concavity of the likelihood function, thereby impairing the identification of the parameters of the participation process. To improve sample identification, external information that provides direct values for \(\Pi _{w|x_{1}}\) can be plugged into the likelihood function resulting in a reduction of the set of parameters (Imbens and Lancaster 1994). Furthermore, we use this external information and estimates for \(\Pi _{x_{2}|x_{1}}\) and \(\Pi _{x_{1}}\) to construct \(\Pi _{1}^{m}=\Pi _{1}\) (i.e. the unconditional probability of participation) which, in conjunction with Eq. (3.18), simplifies the likelihood function and allows the identification of the constant \(\beta _{0}\) in Eq. (3.2).

4 Main results

In our initial specification, vector \(x_{i}\) includes location and time dummies. We present estimates of \(\alpha \) and AME under three major strategies in Table 4. First, in the column labelled “Ignorability”, we report estimates using the sample obtained after having dropped missing observations. Second, we use the sample obtained after imputing non-activity to missing values on both mother’s and daughter’s participation status. The mother is not identified for 5,839 women, or 31.5 per cent of our sample. Mothers may remain unidentified: (i) when the mother’s record precedes the first currently available registries, (ii) when there are coding errors, (iii) when the daughter is an illegitimate child, or (iv) when she or her mother change residency. Since sample selection cannot be ruled out due to these reasons, we consider here two approaches under Imputation. In the column labelled “Imputation A”, we show results using the sample with identified mothers and impute all missing values as non-participation. In column “Imputation B”, we additionally impute non-participation for the missing values when the mother’s record is not identified within the sample. Third, we present results using the Likelihood Approach developed in the previous section for the Baseline scenario described in Sect. 2.4.

Table 4 Benchmark results: Estimates, AMEs, and Ignorability Tests

In all but one of the estimations (Imputation B), we find a positive and statistically significant estimate of the parameter associated with the mother’s participation status (parameter \(\alpha \) in Eq. (3.1)) and of the corresponding average marginal effect (AME), that is, the average change in the probability of participation when the mother’s participation status changes from non-participation to participation. By comparing the AME from column “Ignorability” in Table 4 with the raw marginal effect obtained using the ratios computed from the information in Table 3, we observe a drop from 62.4 to 12.9 pp. (12.9 pp corresponds to an AME estimate in Table 4 of 0.129.) This drop occurs because no factors are controlled for in the raw estimate. Similarly, for Imputation A and B estimates, the AME drops from 5.2 and 2.9 pp in the raw data to 2.0 and \(-1.9\) pp, respectively. Following the patterns found in the raw data (Table 3 and Appendix D), the AME under Ignorability is larger than under Imputation.

The size of the AME using the Likelihood Approach is substantially larger than under Ignorability or Imputation. According to the results using the Likelihood Approach, a woman whose mother participates in the labour market has a probability of participation which is 39.7 pp larger than a woman whose mother does not participate.Footnote 11 Naturally, sample averages of the dependent variable differ across Ignorability and Imputation. They also differ between Ignorability and the Likelihood Approach because the sample employed to compute \({\overline{y}}\) expands from 570 under Ignorability to additionally include observations with missing participation status of the mother to a total of 2, 584 under the Likelihood Approach. For the latter, we also report the ML estimate for the unconditional probability of participation (parameter \(\Pi _{1}\) in our model), which is approximately 9 pp smaller than \({\overline{y}}\). As anticipated, priests seem to selectively report participants.

In order to formally test Ignorability in the model (3.13), we test whether \(\Pr \left\{ I_{i}=1\left| I_{i}^{m},y_{i}\right. \right\} \) and \(\Pr \left\{ I_{i}^{m}=1\left| y_{i}^{m}\right. \right\} \) do not vary with y, \(I^{m}\), and \(y^{m}\). In the bottom half of Table 4, we report these conditional probabilities and the Wald test (which we label “Ignorability test”) for the equality of all conditional probabilities in the Likelihood Approach.Footnote 12 The differences in conditional probabilities are sizable and congruent with the hypothesis that priests selectively report participants. For example, the probability that the mother’s participation status is observed almost doubles when she participates. Similarly, the probability that the daughter’s participation status is observed increases from 8.6 per cent to 20.3 per cent when she participates if the mother’s participation status is not observed. Notably, this statistical association between participation and its observability reverses and amplifies when the participation of the mother is observed. In this case, the probability that the participation of the daughter is observed when she is participating is 10.6 per cent, whereas it reaches 51.5 per cent when she is not participating. When the priest reports the mother, signalling her as socially important, the fact that her daughter does not participate is statistically associated with a very high probability that the priest reports her as “dona”; that is, she does not participate. These large differences in conditional probabilities lead to a strong rejection of the Ignorability test.

In addition, the ML estimates discard the plausibility of the implicit assumption under Imputation, that is, that missing participation implies non-participation in the labour market. For example, given that

$$\begin{aligned} \text{ Pr }\left( y^{m}=1|I^{m}=0\right) =\text{ Pr }\left( I^{m}=0|y^{m}=1\right) \times \frac{\text{ Pr }\left( y^{m}=1\right) }{\text{ Pr }\left( I^{m}=0\right) }, \end{aligned}$$

the ML estimate for the probability that mothers participate if their information is missing is \(\left( 1-0.182\right) \times \frac{0.288}{0.8787}\times 100\approx 26.8\) per cent, a value which is well over zero.

4.1 Alternative scenarios

Hitherto, we have reported results under the Baseline scenario. In this scenario, the local predictions of female participation rates in 1890 from the estimation of Eq. (B.1) in Appendix B are used for all quarter-centuries before the year 1890. This is perhaps restrictive. In this section, we propose two alternative strategies to assess the validity of the results under the Baseline scenario.

The first strategy considers two scenarios with constant but extreme local participation rates before 1890. In what we refer to as the “Low Participation scenario”, we take the smallest local participation rate for which we have information in all censuses from 1890 to 1950. Similarly, in the “High Participation scenario”, the largest local participation rate is taken. Specifically, we take the smallest and largest rates in the borough of Horta for Horta and the smallest and largest values in the nearby borough of Vila Nova de Famalicão for Ronfe and Ruivães.

In the second strategy, a large number of paths for each location are simulated backwards under the weak assumption that 25-year changes before 1890 were no larger, in absolute value, than the average changes from 1890 to 1950. To simulate the change in the local participation rate from one quarter-century to the previous quarter-century, we consider two possibilities: increase or decrease. In the case of an increase, the change is drawn from the uniform distribution between zero and the local mean increase in the 1890–1950 period. In the case of a decrease, the value is drawn from the uniform distribution between the local mean decrease and zero. For Horta, there are a total of \(2^{7}=128\) combinations of possible increases and decreases in the seven quarter-centuries from 1700–1724 to 1850–1874. The number of possible combinations for Ronfe is 256 because it has an additional quarter-century, while for Ruivães it is 2 because there is only one quarter-century before 1890. The interaction of all the combinations of the three locations generates a total of \(128\times 256\times 2=65,536\) scenarios, each with a different participation path per location. Figure 1 shows several examples of simulated paths for Horta and Ronfe.

Fig. 1
figure 1

Local female participation rates. Note: Female participation rates in the quarter-century following the quarter-century birth. “Model fit” refers to predictions in midpoint years of each quarter-century from the estimation of Eq. (B.1) in Appendix B. “Illustratory scenarios” are seven examples of simulated paths obtained from the algorithm explained in Sect. 4.1 for the period before any external sources are available

We estimate the model using the Likelihood Approach for all scenarios under the two strategies. Table 5 shows the Ignorability and Baseline results from Table 4, the estimates under the High and Low Participation scenarios, and the minimum and maximum AME estimates out of the 65,536 simulated paths (columns “Minimum AME” and “Maximum AME”, respectively).

Table 5 Alternative Scenarios for External Information: Estimates and AMEs

In contrast to the Ignorability case, the Baseline results lie comfortably within the range defined by the Minimum and Maximum AMEs. Furthermore, the latter two values are surprisingly similar to those of the High and Low Participation scenarios, which strongly suggests that the High and Low Participation scenarios provide reasonable extremes to capture the potential variability of the AME given the lack of external information before 1890. Henceforth, for computational reasons, we only report Baseline and High and Low Participation results.

4.2 The stability of the role of mothers

Results from the Likelihood Approach in Table 4 are obtained under the assumption that \(\alpha \) in Eq. (3.1) is invariant across periods. To assess the validity of this assumption, our working sample is divided into three periods: the first period ranges from 1675 to 1750; the second period starts in 1775 and ends in 1825; and the remaining four quarter-centuries constitute the last period (i.e. 1850–1925).Footnote 13

Sample identification problems arise in the first period because all reported mother–daughter pairs are non-participants. This makes estimation under Ignorability impossible and leads to non-convergence and instability under the Likelihood Approach. The first period is therefore disregarded, and in the remaining part of this section the focus is on the second and third periods. Table 6 presents the Baseline and High and Low Participation results for the second period and, for brevity, only the Baseline results for the third period.Footnote 14

Table 6 ML Estimates for Different Periods: Estimates, AMEs, and Ignorability Tests

All AME estimates are large and statistically significantly different from zero. For the sample without the first period, that is, for 1775–1925, the results increase, but the confidence intervals overlap with those shown in Table 4 using all periods. In the second period 1775–1825, which ends before 1890, the external information varies across scenarios for all observations. Unsurprisingly, it is for this period where the largest differences are found across scenarios in the AME estimates. For the third period, that is, for 1850–1925, we find a smaller effect: 39.7 vs 49.9 pp. This is consistent with a decreasing role of mothers over time as participation increases from a \(100\times {\widehat{\Pi }}_{1}\) estimate of 23.9 per cent to an estimate of 38.5 per cent.

The bottom half of Table 6 deserves several comments. Ignorability tests are again strongly rejected in both periods. Naturally, as the number of missing values decreases over time, most of the estimates of the conditional probabilities of observability increase. More interestingly, by distinguishing between the two periods, we find evidence of decreasing influence of social status on the reporting activity of the priest at the turn of the century: for example, for \(y=0\), the role of \(I^{m}\) on \(\text{ Pr }\left( I=1|y,I^{m}\right) \) decreases substantially from \(83.3-9.7=73.6\) pp in the 1775–1825 period to \(15.1-14.8=0.3\) pp in the 1850–1925 period.

4.3 Additional controls

4.3.1 Family size and father’s social status

Permanent and quasi-permanent factors induce intergenerational persistence in participation decisions. Inasmuch as the goal is to assess persistence arising from the mother–daughter link, other factors that induce persistence need to be controlled for. These include labour market conditions, sex-ratio differentials, and social multiplier effects captured by the location and time dummies used in our specification in Table 4. In this section, we add to Eq. (3.1) family characteristics that may induce persistence. Within family characteristics, we consider family size, which is proxied by an indicator of more than four siblings, siblings, and two dummy variables related to the father’s socio-economic status: (i) whether the priest records that the father is a rentier, an owner, a merchant, a high-ranking civil servant, or an officer, \( father \,SES\), and (ii) whether the priest reports the father to be an owner, \( father \, owner \). These two dummy variables are important because they capture at least two sources of persistence within the family: the transmission of wealth (which probably leads to lower participation rates) and that of physical capital (which should lead to higher rates according to our participation measure). The dummy variable \( father \,owner\) identifies the latter effect of property transmission.

One could argue that a measure of human capital should also be added. Throughout the sample period, however, any proxy for wealth is also a proxy for educational achievement: female illiteracy rates remained over 68 per cent in Portugal in the 1930s and formal education was only accessible to a small and privileged group (Candeias et al. 2004). Hence, the controls for wealth indirectly control for human capital.Footnote 15

Table 7 Results controlling for SES: Estimates, AMEs, and Ignorability Tests

Table 7 replicates Table 4 with the additional controls. As in Table 4, the results under Ignorability and Imputation differ widely from those obtained with the Likelihood Approach. The AME estimates with the Likelihood Approach are smaller because the additional controls capture part of the inertia within the family. Contrary to the results in Table 5, the AME estimates under the Baseline, and the High and Low Participation scenarios are very similar. Under the Baseline scenario, a woman whose mother participates in the labour market has a probability of participation which is 26.7 pp greater than a woman whose mother does not participate. This value is relatively large as it represents over 70 per cent of the sample average participation and implies an increase of 126.5 per cent in the probability of participation for those women whose mothers do not participate.Footnote 16 If the actual effect were zero and our AME estimate were biased due to selection on unobservables, this bias would double that from selection on observables (that is, the difference between 39.7 and 26.7). The magnitude of the estimated effect suggests that this is not the situation.Footnote 17

Regarding the estimates of the additional controls, their signs are not always consistent across the three approaches. Under the Likelihood Approach, they are always smaller in magnitude than the estimate for the mother participation coefficient \({\widehat{\alpha }}\). Conditional on the participation of the mother, a large number of siblings, a low SES status of the father, and an owner status of the father increase the probability that the woman participates in the labour market.

The estimated probabilities of observability are similar to the ones presented in the Likelihood Approach in Table 4. Moreover, we also strongly reject that these probabilities do not vary with y, \(I^{m}\), and \(y^{m}\).

4.3.2 Migration

The transmission of wealth could also be reflected in migration flows. On the one hand, large families and low socio-economic status may increase the probability of migration for economic reasons. On the other hand, marriages among wealthy families may involve migration flows and be associated with non-participation. Hence, migration status could be an additional control for transmission of wealth. Unfortunately, place of birth (which identifies migration status) is not reported in all records. In Ronfe and Ruivães, the place of birth is missing for 77.7 per cent and 70.9 per cent of the observations, respectively. In contrast, in Horta only 6.1 per cent of observations have this information missing.

Here, we discuss results under the Likelihood Approach when (i) adding a dummy variable for migrant status for the sample of Horta and (ii) estimating the model for the subgroup of migrant women in Horta.

When we restrict the sample to immigrants in Horta, the number of observations drops to 3,571, and we lose the sample identification for the extended model with socio-economic variables. We can nevertheless get estimates by ML in this small sample with a simpler variable specification. Specifically, we replace the quarter-century dummies with century dummies and, given that the average family size in Horta is smaller, redefine the dummy variable siblings to the existence of more than two siblings. With this new specification, the AME for the complete Horta sample becomes 36.8 pp (s.e. 6.5 pp). Addition of the migrant status dummy variable only slightly changes the AME: 32.4 pp (s.e. 6.5 pp). The estimate of the coefficient for migrant status is negative and significant: \(-0.349\) (s.e. 0.090). This suggests that migrants into Horta were, on average, non-participants and, hence, migration was probably associated with marriage decisions. Finally, for the sample of migrants, the AME is 27.2 pp (s.e. 11.8 pp). To sum up, all these AME estimates overlap, which suggests that the results are robust to migration flows.

5 Sensitivity of results to prevalence of missingness

Our working sample is very large (\(n=18,523\)), and the number of mother–daughter pairs with observed participation (696, see Table 3) is not unusual for historical data. Under Assumptions 3.1 and 3.2, the ML estimator is consistent. However, the proportion of missing values for y (86.1 per cent) and for \(y^{m}\) (87.9 percent) might raise doubts regarding the robustness and reliability of the results.

Fig. 2
figure 2

Smoothed densities for AME estimators. Note: Smoothed densities for AME estimators using simulation results with 250 replications. “AME” refers to the estimated average marginal effect, i.e. the sample average change in the estimated probability of participation when the mother participation status changes from no-participation to participation. The smoothed densities are obtained using the Epanechnikov kernel function computed using Stata® kdensity command with default bandwidth parameter. Parameters of the DGP are the estimates in Baseline column in Table 7 (AME value in the DGP referred to as “True Value”). Participation and missing statuses for daughters and mothers are simulated as described in the text. “20% missing” (“50%”) means that on average 20% (50%) of mothers and daughters have their participation status unreported. “87%” means that missing statuses for mothers and daughters are simulated to target the actual proportion of missing values in the original dataset. In Panel “Ignoring missing values”, we show smoothed densities for probit estimators ignoring the simulated observations with missing participation status of mothers and daughters. We also show probit estimates obtained with the simulated data without missing values. In Panel “Likelihood Approach”, we show smoothed densities from ML estimates of the model in Sect. 3.2 using the external information under the Baseline scenario.

In order to evaluate the impact of the high proportion of missing values on our estimates, we perform a Monte Carlo experiment with \(R=250\) simulations based on our data and the extended model with additional controls from Sect. 4.3.1. (See Appendix E for the algorithm details.)

Figure 2 presents the smoothed densities of all estimators of the AME obtained from the Monte Carlo experiment, both under Ignorability and the Likelihood Approach. These show that ignoring the missingness process results in downward-biased estimates of the effect, even for a proportion of missing values as low as 20 per cent (upper panel in Fig. 2). In contrast, under the Likelihood Approach, the AME is effectively unbiased regardless of the proportion of missing values (lower panel in Fig. 2). The larger the proportion of missing values, the larger the standard errors. However, the AME is precisely estimated even for the largest missing incidence (with a standard deviation, 2.39 pp, more than ten times smaller than the true AME, 26.74 pp), probably due to the large sample size. In addition to the results shown in Fig. 2, the proportion of simulations in which the true value falls within the 95 per cent confidence interval in the Likelihood Approach with the largest missing incidence is 94 per cent.

To sum up, our Monte Carlo analysis highlights the fundamental role played by the information contained in I and \(I^{m}\), and by the modelling of the missing process on the robustness and reliability of the Likelihood Approach even when the incidence of missing values is as large as 87 per cent.

6 Conclusions

In this paper, we use historical parish registry data from the late seventeenth century until the beginning of the twentieth century from three Portuguese locations to estimate the relation between mothers’ labour market participation and that of their daughters. Our main data source is drawn from baptism, marriage, and death certificates found in the local churches. In addition, our data were matched with information from parochial censuses carried out by the priests during Lent to administer the sacrament of penance to the parishioners. Although these censuses contain invaluable information on occupation and/or social status, the coverage never exceeds 36 per cent and, in certain locations and periods, this falls to below 10 per cent. We argue that coverage of occupation/social status is associated with non-random individual factors such as social status and activity choices. Therefore, any restriction of the estimating sample to only those observations with complete coverage would result in a selected sample and estimation biases. We address the problem of non-random missing values by adapting the methodology proposed by Ramalho and Smith (2013). By allowing the estimation of models in contexts in which missing data are abundant and non-random, this methodology confers considerable potential to the examination of historical data.

Our results show a large and positive statistically significant relation between the mother’s working status and the daughter’s decision to participate in the labour market. After having controlled for location and time dummies and socio-economic characteristics, the probability that a woman participates in the labour market increases by 26.7 pp if her mother also works. One way to assess the potential importance of this estimate is to simulate the long-term evolution of the FLFP process. The average probability of participation in our sample is 29 per cent, although for women whose mothers did not work, this is only 19 per cent. Now imagine a 50 per cent increase of the latter to 29 per cent. This level would be the expected long-term participation rate in the absence of a mother-to-daughter association. In contrast, with the estimated association between mothers and daughters’ FLFP, the long-term participation rate would be expected to increase to 52 percent.

The existence of such an early transmission mechanism acting as a catalyst of change contributes towards the understanding of the long-term dynamics of female labour force participation.