1 Introduction

In a pioneering paper, Goodman et al. (1974) presented a general analytic system for studying the relationships between mortality and fertility and kin numbers. For stable populations with varying regimes of fertility and mortality, they provided formulas to calculate average numbers of kin, by category of kin, for females of various ages.

Of great substantive importance was their demonstration of the strong relationship between kin numbers and fertility levels for all categories of kin except ascendants in the direct line. The general relationship, obvious after the fact, was not widely recognized before their work [more attention had been devoted to the effect of mortality on kinship], nor had it been quantified even roughly. The relationship, combined with current low levels of fertility in many societies [for example, Italy with a total fertility rate of 1.3, or about 0.65 daughters born per woman] points to a continuing decline in numbers of kin for the average person in the future, and probably an associated decline in the importance of family and kinship in everyday life.Footnote 1

The potential importance of this finding can be illustrated by a mental experiment. Suppose China’s ‘one-child’ policy were perfectly realized, with no one having more than one birth. In a generation or two, collateral kinship would disappear: there would be no brothers or sisters, aunts or uncles, nieces or nephews, or cousins. Only parents, grandparents, child, and grandchild would remain.

Despite its substantive importance, their approach has not seen much further development [for example, by the inclusion of data on proportions married, or the relaxation of the stable population assumption], nor has it been widely used for the exploration of substantive questions relating to kinship (with the major exceptions of Goldman 1978, 1984 and Coresh and Goldman 1988). One practical barrier has been the difficulty of estimating the integral equations in which the basic relations are stated, equations containing up to quadruple integrals.

In their original paper the authors comment: ‘Ordinarily, we cannot evaluate the l(x) and m(x) functions for arbitrary values of x, since the data are usually collected for 5-year age intervals’ (p. 24). To estimate the equations, they develop finite approximations of the multiple integrals, programmed in Fortran by Pullum. In its original form, this Fortran code ran to more than ten single-spaced pages. It has been used in the later work by Goldman, and more recently by Keyfitz (1986), in an analysis of Canadian kinship numbers. But such code, written by someone else, is often difficult to master or to modify correctly.

This note illustrates an alternative procedure for evaluating the kinship integrals, using computer software developed since their paper first appeared. The procedure allows one in effect to ‘evaluate the l(x) and m(x) functions for arbitrary values of x.’ It involves a minimum of programming, yields results that agree well with the Pullum approximations, and has the advantage, both scientific and pedagogical, of working directly with the theoretical equations rather than with long finite approximation algorithms. Theory and computation are more closely linked.

The procedure involves two steps: (1) analytic expressions are found to represent empirical data on age-specific fertility and survivorship; (2) these expressions are substituted into the theoretical integral equations for kin numbers [with appropriate arguments and limits of integration], which are then evaluated numerically.

In the present note, the first step has been accomplished using TableCurve, an automated curve-fitting package using standard algorithms for linear or non-linear fitting.Footnote 2 Any general-purpose curve-fitting routine could be used. TableCurve has the advantage, for this application, that the user does not have to supply a functional form ahead of time, although user-defined functions are an option. The program has a built-in library of over 3500 functions, and can successfully fit most sets of demographic data by age or duration.Footnote 3

The resulting analytic expressions and parameter estimates are used solely to represent particular schedules of age-specific mortality and fertility. They do not have, nor need they have for this application, any theoretical rationale or interpretation for their parameters. The only requirement is a close fit to the data at hand. Of course, if functional forms better grounded either in mathematics, empirical research, or substantive theory are available, their use in this application would be possible and desirable.

The second step uses the numerical integration capabilities of Mathcad, a numerical mathematics package.Footnote 4 Again, other mathematics packages could be used, so long as they can evaluate multiple integrals. Mathcad has the advantage that basic formulas are entered and appear [on the screen and in hardcopy] in standard mathematical notation, tying the calculations more closely to theoretical equations. Note, however, that the results still are based on underlying numerical approximation procedures not unlike those of Pullum.Footnote 5

The procedure is illustrated for children and grandchildren for 1981 Canadian data, and the results compared with those in Keyfitz (1986). Since both techniques start with data for 5-year age intervals to approximate theoretical integrals, neither can be said to yield ‘correct’ estimates of kin numbers, so that Keyfitz’s results cannot serve as an absolute standard against which to judge the new procedure proposed. In any case, the agreement is close,Footnote 6 and the choice between the two computational techniques can be made on other grounds – ease of application, transparency, and flexibility.

Canadian 1981 age-specific fertility rates from Keyfitz (1986) were modified by adding zero values at ages 10 and 52.5, and fit by TableCurve.Footnote 7 Perfect fits were given by high-order polynomials, with eight to ten parameters. But for convenience in further use, more compact functions, with three or four parameters, were examined. The following function was chosenFootnote 8:

$$ \begin{array}{l} f(x)=e^{a+bx\sqrt{x}+c\sqrt{x}}\\ a=-35.1\quad b=-0.122\quad c=9.66 \end{array} $$

When the resulting function f(x) is integrated over the same reproductive span as given by the original data (ages 10–50), the total fertility rate agrees with that computed in the usual way to within 0.1%. As well, visual inspection and conventional measures of goodness of fit suggest that f(x) provides a reasonable fit to the fertility data at hand. To repeat, that is the only goal for the present application. No theoretical or substantive claims are made for the resulting functions; we use them as approximating functions, defined by TableCurve as ‘…nothing more than an equation which is used to represent X-Y data’ (Systat 2002, pp. 20–1).Footnote 9
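The integration check described above can be sketched in a few lines of Python. This is an illustration, not the author's Mathcad worksheet: the parameter values are the rounded ones printed in the text, and the `simpson` helper is a hypothetical stand-in for a package's numerical integrator. Because the exponent is sensitive to rounding in a, b and c, the total obtained this way need not reproduce the 0.1% agreement reported for the unrounded fit.

```python
import math

# Parameter values as printed in the text (rounded to three significant digits).
A, B, C = -35.1, -0.122, 9.66


def f(x):
    """Fitted age-specific fertility rate (all births) at exact age x."""
    return math.exp(A + B * x * math.sqrt(x) + C * math.sqrt(x))


def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals; 0 if hi <= lo."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


# Total fertility rate: integral of f(x) over the reproductive span 10-50.
tfr = simpson(f, 10.0, 50.0)

print(f"f(27.5) = {f(27.5):.4f}")
print(f"TFR over ages 10-50 = {tfr:.3f}")
```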

To eliminate small non-zero values of f(x) outside the reproductive ages, the function is redefined by inserting conditions on x which evaluate the function as zero when x is less than 10 or greater than 52.5. The function is also re-defined to adjust for the sex ratio at birth [since the kinship equations relate to one-sex, stable population models], yielding m(x), a maternity function for female births.
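A minimal sketch of this redefinition follows. The age limits 10 and 52.5 are those given in the text; the share of births that are female is an assumption made here for illustration (105 male births per 100 female), since the text does not state the sex ratio actually used.

```python
import math

# Rounded fertility parameters as printed in the text.
A, B, C = -35.1, -0.122, 9.66

ALPHA, BETA = 10.0, 52.5       # age limits stated in the text
SHARE_FEMALE = 100.0 / 205.0   # ASSUMPTION: 105 male births per 100 female


def f(x):
    """Fitted age-specific fertility rate (all births) at exact age x."""
    return math.exp(A + B * x * math.sqrt(x) + C * math.sqrt(x))


def m(x):
    """Maternity function: female births per woman at exact age x.

    Returns exactly zero outside the reproductive ages, removing the
    small non-zero tails of the fitted curve.
    """
    if x < ALPHA or x > BETA:
        return 0.0
    return SHARE_FEMALE * f(x)


for age in (5, 15, 27.5, 45, 60):
    print(age, round(m(age), 5))
```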

A similar curve-fitting procedure was applied to L_x values from the 1981 abridged life table for Canada [the data used by Keyfitz] to fit a survivor function.Footnote 10 In this case, four-parameter functions were required to get an adequate fit. The chosen function was:

$$ \begin{array}{l} s(x)=a+\dfrac{b}{1+e^{(c-x)/d}}\\ a=-0.741\quad b=5.66\quad c=84.4\quad d=-8.85 \end{array} $$

As with the fertility function, conditions on x were inserted to ensure that the curve behaves properly at ages outside the range of observation.Footnote 11 The values were also adjusted to take account of the 5-year intervals of the original L_x data, yielding a survivorship function p(x) (see Appendix A.1).
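The sketch below shows one way such a survivorship function might be built from the printed parameters. The exact adjustment is given in Appendix A.1 and is not reproduced here; the code assumes, purely for illustration, a life-table radix of 1, division of the fitted value by the interval width of 5, and clamping of the result to the range 0 to 1. With the rounded parameters the raw curve becomes negative a little past age 101, which is why some condition on x is needed.

```python
import math

# Rounded survivor-function parameters as printed in the text.
A, B, C, D = -0.741, 5.66, 84.4, -8.85


def s(x):
    """Fitted curve through the 5-year L_x values (roughly 5 at young ages)."""
    return A + B / (1.0 + math.exp((C - x) / D))


def p(x):
    """ASSUMED survivorship to exact age x.

    Not the author's Appendix A.1 definition: here s(x) is simply divided
    by the 5-year interval width and clamped to [0, 1].
    """
    if x <= 0:
        return 1.0
    return min(1.0, max(0.0, s(x) / 5.0))


for age in (1, 20, 60, 80, 100, 110):
    print(age, round(p(age), 4))
```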

2 Estimating Kin Numbers

Figure 7.1 defines the Goodman, Keyfitz and Pullum equations for daughters born, living daughters, granddaughters born and living granddaughters by age a of an average woman [ego]. The fertility and survivorship functions m(a) and p(a) are as defined above. Given these equations and function definitions, Mathcad evaluates the integrals (see Appendix A.2). The results are given in Fig. 7.2.
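For readers without access to Mathcad, the same integrals can be evaluated with a short script. The equations coded below are the Goodman, Keyfitz and Pullum expressions as they are usually stated in the literature, with α the lowest age of childbearing (10 here):

$$ \begin{array}{l} \text{daughters born: } \int_{\alpha}^{a} m(x)\,dx \qquad \text{living daughters: } \int_{\alpha}^{a} m(x)\,p(a-x)\,dx\\ \text{granddaughters born: } \int_{\alpha}^{a} m(x)\int_{\alpha}^{a-x} p(y)\,m(y)\,dy\,dx\\ \text{living granddaughters: } \int_{\alpha}^{a} m(x)\int_{\alpha}^{a-x} p(y)\,m(y)\,p(a-x-y)\,dy\,dx \end{array} $$

The script is a sketch under several assumptions: the fitted parameters are the rounded values printed earlier, the share of female births and the scaling of p(x) are illustrative choices rather than the author's, and Simpson's rule stands in for Mathcad's integrator. Its output should therefore be read as indicative only, not as a reproduction of Fig. 7.2.

```python
import math

ALPHA, BETA = 10.0, 52.5       # age limits stated in the text
SHARE_FEMALE = 100.0 / 205.0   # ASSUMPTION: 105 male births per 100 female


def m(x):
    """Maternity function (female births) from the rounded fitted parameters."""
    if x < ALPHA or x > BETA:
        return 0.0
    rate = math.exp(-35.1 - 0.122 * x * math.sqrt(x) + 9.66 * math.sqrt(x))
    return SHARE_FEMALE * rate


def p(x):
    """ASSUMED survivorship: fitted s(x) divided by 5 and clamped to [0, 1]."""
    if x <= 0:
        return 1.0
    s = -0.741 + 5.66 / (1.0 + math.exp((84.4 - x) / -8.85))
    return min(1.0, max(0.0, s / 5.0))


def simpson(func, lo, hi, n=60):
    """Composite Simpson's rule with n (even) subintervals; 0 if hi <= lo."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def daughters(a, living=False):
    """Expected daughters ever born (or still living) to a woman aged a."""
    def integrand(x):
        return m(x) * (p(a - x) if living else 1.0)
    # m(x) is zero beyond BETA, so the upper limit is capped there.
    return simpson(integrand, ALPHA, min(a, BETA))


def granddaughters(a, living=False):
    """Expected granddaughters through daughters, for a woman aged a."""
    def outer(x):
        # A daughter born when ego was aged x is now aged a - x.
        def inner(y):
            alive = p(a - x - y) if living else 1.0
            return p(y) * m(y) * alive
        return m(x) * simpson(inner, ALPHA, min(a - x, BETA))
    return simpson(outer, ALPHA, min(a, BETA))


for age in (20, 40, 60):
    print(age,
          round(daughters(age), 3), round(daughters(age, living=True), 3),
          round(granddaughters(age), 3),
          round(granddaughters(age, living=True), 3))
```

Capping each upper limit at the highest reproductive age keeps the integrator from stepping across the cutoff in m(x), where the integrand is zero anyway.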

Fig. 7.1 Estimating Kin Numbers

Fig. 7.2 Comparison of estimates

Estimates by the proposed procedure are in close agreement with those of Keyfitz (1986), presented for comparison. Agreement is to within 1.1 per 100 kin for all categories and ages. The largest relative errors are for daughters and living daughters at age 20 of the reference woman – about 15%. These presumably relate to differences in procedures for dealing with fertility rates in the earliest ages of childbearing. But notice that the substantive story is not appreciably different: 6 or 7 daughters born per 100 women by age 20.

3 Discussion

The differences between the results of the proposed computational procedure and those produced by the Pullum algorithm are negligible, within the bounds of error of the original data. Moreover, the results are precise enough for any likely substantive use to which they might be put, given that they relate to a highly abstract model of kinship [a one-sex stable population model, with no input for marriage patterns].

The general approach used above clearly has applications to other areas of population mathematics. The approach is not entirely novel, but until recently it was impractical and beyond the capabilities of many researchers. Finite sums using grouped data became conventional. Writing as recently as 1985, for example, Keyfitz could note correctly with respect to an expression for the intrinsic growth rate r: ‘no direct use can be made of a continuous form like (5.1.4) – it must be converted to the discrete form for calculations’ (1985, p. 115), and more generally: ‘Although the stable age distribution is easier to think about in the continuous version, application requires a discrete form’ (1985, p. 81).

Due to recent developments in computer software, this is no longer the case. As illustrated above, it is now relatively easy to find continuous functions to represent many demographic data sets, and to do direct numerical evaluation of integrals and other analytic expressions. In some contexts, working with analytic expressions for processes such as fertility, survivorship and marriage may be a more effective way to derive numerical results than traditional finite sums. At the very least, one now has a choice.
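As one illustration of this point, the intrinsic growth rate r can be obtained by solving the standard continuous characteristic (Lotka) equation, in which the integral of exp(-rx) p(x) m(x) over the reproductive ages equals 1, directly from fitted functions. The sketch below makes the same assumptions as before: rounded fitted parameters from the text, an illustrative share of female births, an assumed scaling of the survivor function, and a hypothetical `simpson` helper. It is not claimed to reproduce any published value of r.

```python
import math

ALPHA, BETA = 10.0, 52.5       # age limits stated in the text
SHARE_FEMALE = 100.0 / 205.0   # ASSUMPTION: 105 male births per 100 female


def m(x):
    """Maternity function (female births) from the rounded fitted parameters."""
    if x < ALPHA or x > BETA:
        return 0.0
    rate = math.exp(-35.1 - 0.122 * x * math.sqrt(x) + 9.66 * math.sqrt(x))
    return SHARE_FEMALE * rate


def p(x):
    """ASSUMED survivorship: fitted s(x) divided by 5 and clamped to [0, 1]."""
    if x <= 0:
        return 1.0
    s = -0.741 + 5.66 / (1.0 + math.exp((84.4 - x) / -8.85))
    return min(1.0, max(0.0, s / 5.0))


def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def lotka_residual(r):
    """Left side of the characteristic equation minus 1; decreasing in r."""
    return simpson(lambda x: math.exp(-r * x) * p(x) * m(x), ALPHA, BETA) - 1.0


# Bisection: the residual is positive at r = -0.05 and negative at r = 0.05.
lo, hi = -0.05, 0.05
for _ in range(60):
    mid = (lo + hi) / 2
    if lotka_residual(mid) > 0:
        lo = mid
    else:
        hi = mid
r = (lo + hi) / 2

print(f"net reproduction rate = {lotka_residual(0.0) + 1.0:.3f}")
print(f"intrinsic growth rate r = {r:.5f}")
```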

Approximating functions also can be effective for interpolation and – with due caution – extrapolation.

The suggested procedure is a reminder of Hakkert’s (1992) argument that many standard demographic algorithms were derived for purposes of hand calculation, and may need to be revised to make greater use of modern developments in statistics and computer software.Footnote 12

As with any use of computerized ‘black box’ procedures, of course, one must balance the potential advantages in ease, speed and flexibility of computation against the possibility of unrecognized pitfalls leading to seriously incorrect results. In the case at hand, for example, it would be easy to select a survivorship function that rises after age 100 or so. The careless use of such a function in the kinship equations would lead to meaningless results for some kinship categories. Computer mathematics software is at best a partial substitute for mathematical skill, and no substitute at all for thoughtful analysis.
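A simple guard against this kind of pitfall is to test a candidate survivorship function on a fine grid of ages before using it in the kinship equations. The sketch below is one hypothetical way to do so; the function names are illustrative, and the curve tested is the fitted logistic from the text divided by 5 (an assumed scaling) without any conditions on x. With the rounded parameters, that curve does not rise at high ages but does fall below zero, which the check reports.

```python
import math


def raw_survivorship(x):
    """Fitted curve s(x)/5 WITHOUT out-of-range conditions (assumed scaling)."""
    return (-0.741 + 5.66 / (1.0 + math.exp((84.4 - x) / -8.85))) / 5.0


def check_survivorship(p, ages, tol=1e-9):
    """Return (age, problem) pairs where p is not a valid survivorship curve.

    A valid curve stays within [0, 1] and never increases with age.
    """
    problems = []
    previous = None
    for x in ages:
        value = p(x)
        if value < -tol or value > 1.0 + tol:
            problems.append((x, "outside [0, 1]"))
        if previous is not None and value > previous + tol:
            problems.append((x, "rises with age"))
        previous = value
    return problems


ages = [0.5 * i for i in range(241)]   # ages 0 to 120 in half-year steps
problems = check_survivorship(raw_survivorship, ages)

print("number of flagged ages:", len(problems))
print("first flagged:", problems[0] if problems else None)
```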

Finally, it should be emphasized once more that in this approach, the analytic expressions are used solely to represent specific sets of data. Fertility schedules for a high-fertility population might lead to different functions being selected. The discovery of general analytic expressions for such processes, especially expressions with theoretically meaningful parameters, is another, more difficult and more important task.