Estimating the Goodman, Keyfitz and Pullum Kinship Equations: An Alternative Procedure

As is often the case in demography, Goodman et al. (Theoretical Population Biology, 5:1–27, 1974) developed their theory of the interrelationships of fertility, mortality and kinship numbers by means of continuous mathematics [integrals], but resorted to finite approximations for calculating results. Recent developments in computer software now provide an alternative procedure that avoids extensive programming of finite approximation algorithms: (1) continuous functions are found to represent discrete data on fertility and mortality; (2) the resulting functions and parameter estimates are then inserted directly into the kinship equations, and the integrals evaluated numerically. This procedure has the potential for use in many other areas of population mathematics, where theory is given by integrals and other continuous expressions, but data are for discrete age groups.

1 A major qualification of this statement relates to the potential role of high levels of divorce and remarriage in supplying an individual with 'new' kin (step kin) in addition to those resulting from first marriage and birth.
[...] disappear: there would be no brothers or sisters, aunts or uncles, nieces or nephews, or cousins; only parents, grandparents, child, and grandchild.
Despite its substantive importance, their approach has not seen much further development [for example, by the inclusion of data on proportions married, or the relaxation of the stable population assumption], nor has it been widely used for the exploration of substantive questions relating to kinship (with the major exceptions of Goldman 1978, 1984 and Coresh and Goldman 1988). One practical barrier has been the difficulty of estimating the integral equations in which the basic relations are stated, equations containing up to quadruple integrals. In their original paper the authors comment: 'Ordinarily, we cannot evaluate the l(x) and m(x) functions for arbitrary values of x, since the data are usually collected for 5-year age intervals' (p. 24). To estimate the equations, they develop finite approximations of the multiple integrals, programmed in Fortran by Pullum. In its original form, this Fortran code ran to more than ten single-spaced pages. It has been used in the later work by Goldman, and more recently by Keyfitz (1986), in an analysis of Canadian kinship numbers. But such code, written by someone else, is often difficult to master or to modify correctly.
This note illustrates an alternative procedure for evaluating the kinship integrals, using computer software developed since their paper first appeared. The procedure allows one in effect to 'evaluate the l(x) and m(x) functions for arbitrary values of x.' It involves a minimum of programming, yields results that agree well with the Pullum approximations, and has the advantage, both scientific and pedagogical, of working directly with the theoretical equations rather than with long finite approximation algorithms. Theory and computation are more closely linked.
The procedure involves two steps: (1) analytic expressions are found to represent empirical data on age-specific fertility and survivorship; (2) these expressions are substituted into the theoretical integral equations for kin numbers [with appropriate arguments and limits of integration], which are then evaluated numerically.
In the present note, the first step has been accomplished using TableCurve, an automated curve-fitting package using standard algorithms for linear or non-linear fitting. 2 Any general-purpose curve-fitting routine could be used. TableCurve has the advantage, for this application, that the user does not have to supply a functional form ahead of time, although user-defined functions are an option. The program has a built-in library of over 3500 functions, and can successfully fit most sets of demographic data by age or duration. 3 The resulting analytic expressions and parameter estimates are used solely to represent particular schedules of age-specific mortality and fertility. They do not have, nor need they have for this application, any theoretical rationale or interpretation for their parameters. The only requirement is a close fit to the data at hand. Of course, if functional forms better grounded either in mathematics, empirical research, or substantive theory are available, their use in this application would be possible and desirable.
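The first step can be sketched in a few lines of general-purpose code. The example below is only an illustration of the idea, not a reproduction of TableCurve: it fits a single hand-picked functional form (a Gaussian-shaped curve, obtained by ordinary least squares on the logarithms of the rates) instead of searching a library of functions. The ages and rates are hypothetical values invented for the example, not the 1981 Canadian data, and the helper names `solve_linear` and `fit_gaussian_shape` are likewise assumptions. Zero rates cannot be log-transformed, so this sketch assumes they are left out of the fit and handled by truncating the curve afterwards.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_gaussian_shape(ages, rates):
    """Least-squares fit of ln(rate) = c0 + c1*u + c2*u**2, with u = age - mean age.

    Returns the fitted curve f(x) and the coefficients. All rates must be > 0.
    """
    mean_age = sum(ages) / len(ages)
    us = [a - mean_age for a in ages]          # centring improves conditioning
    ys = [math.log(r) for r in rates]
    # Normal equations (X'X) c = X'y for a design matrix with columns 1, u, u^2.
    powers = [sum(u ** k for u in us) for k in range(5)]
    A = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(3)]
    c0, c1, c2 = solve_linear(A, b)

    def f(x):
        u = x - mean_age
        return math.exp(c0 + c1 * u + c2 * u * u)

    return f, (c0, c1, c2)

# Hypothetical rates at the mid-points of 5-year age groups (NOT the Canadian data).
ages = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5]
rates = [0.026, 0.095, 0.125, 0.068, 0.020, 0.003]
f, coeffs = fit_gaussian_shape(ages, rates)
for age, rate in zip(ages, rates):
    print(f"age {age}: observed {rate:.3f}, fitted {f(age):.3f}")
```

As in the text, the only claim made for such a curve is that it represents the data at hand; the coefficients carry no substantive interpretation.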
The second step uses the numerical integration capabilities of Mathcad, a numerical mathematics package. 4 Again, other mathematics packages could be used, so long as they can evaluate multiple integrals. Mathcad has the advantage that basic formulas are entered and appear [on the screen and in hardcopy] in standard mathematical notation, tying the calculations more closely to theoretical equations. Note, however, that the results still are based on underlying numerical approximation procedures not unlike those of Pullum. 5
The procedure is illustrated for children and grandchildren for 1981 Canadian data, and the results compared with those in Keyfitz (1986). Since both techniques start with data for 5-year age intervals to approximate theoretical integrals, neither can be said to yield 'correct' estimates of kin numbers, so that Keyfitz's results cannot serve as an absolute standard against which to judge the proposed procedure. In any case, the agreement is close, 6 and the choice between the two computational techniques can be made on other grounds: ease of application, transparency, and flexibility.
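For reference, the four quantities illustrated here are usually written as follows in the Goodman et al. framework, for a reference woman aged a. This is the standard textbook form as best it can be stated here, not a transcription of the equations used in this note, which are not reproduced in the text. The labels B_1, B_2 and the superscript L are hypothetical labels chosen for the display; m(x) is the maternity function for female births, p(x) the survivorship function, and α the youngest age of childbearing.

```latex
\begin{align*}
B_1(a)     &= \int_{\alpha}^{a} m(x)\,dx
           && \text{daughters ever born} \\
B_1^{L}(a) &= \int_{\alpha}^{a} m(x)\,p(a-x)\,dx
           && \text{living daughters} \\
B_2(a)     &= \int_{\alpha}^{a} m(x)\int_{\alpha}^{a-x} p(y)\,m(y)\,dy\,dx
           && \text{granddaughters ever born} \\
B_2^{L}(a) &= \int_{\alpha}^{a} m(x)\int_{\alpha}^{a-x} p(y)\,m(y)\,p(a-x-y)\,dy\,dx
           && \text{living granddaughters}
\end{align*}
```

The inner integral is empty, and the expression zero, whenever a − x falls below α: a daughter born when the mother was x cannot yet have children of her own.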
Canadian 1981 age-specific fertility rates from Keyfitz (1986) were modified by adding zero values at ages 10 and 52.5, and fit by TableCurve. 7 Perfect fits were given by high-order polynomials, with eight to ten parameters. But for convenience in further use, more compact functions, with three or four parameters, were examined. The following function was chosen 8 : [equation for f(x) not reproduced]
4 PTC Inc., Needham, Mass.
5 It is conceivable that expressions for fertility and survivorship could be found that would lead to closed-form solutions of the kinship equations. But these still would not be exact solutions given the approximation involved in the underlying data.
6 As it should be, given that both are using essentially the same data and similar numerical approximation procedures. The small differences observed presumably relate to small differences in input [for example, treatment of extreme ages of fertility or survivorship, age indexing, etc.] and in numerical procedures.
7 For fitting, age-specific fertility rates were associated with the mid-points of their respective age intervals. This clearly involves error, especially in the intervals 10-14 and 45-49. With more information [e.g., data on births by single years of age], average ages instead of mid-points could be used. Or one could simply assume that the rate for 10-14 should be associated with some age greater than 12.5. But such refinements are not necessary for present purposes.
8 For readability, only three digits are given for parameter values. For accurate graphing of these functions more digits may be needed, especially if the function is non-linear. See Note to
When the resulting function f(x) is integrated over the same reproductive span as given by the original data (ages 10-50), the total fertility rate agrees with that computed in the usual way to within 0.1%. As well, visual inspection and conventional measures of goodness of fit suggest that f(x) provides a reasonable fit to the fertility data at hand.
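The check on the total fertility rate can be sketched as follows. The curve `f` below is a hypothetical Gaussian-shaped stand-in for the fitted function, and, lacking the original rates, the 'grouped' rates are simply taken as the curve's values at the mid-points of the eight 5-year age groups from 10-14 to 45-49. With real data the grouped rates would be the observed ones. The integral is evaluated with composite Simpson's rule, one of many possible quadrature methods.

```python
import math

def f(x):
    """Hypothetical fitted fertility curve (births per woman per year of age)."""
    return 0.13 * math.exp(-0.5 * ((x - 27.0) / 6.0) ** 2)

def simpson(g, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

# Conventional TFR: five times the sum of the rates for the 5-year age groups,
# each rate attached to the mid-point of its interval (12.5, 17.5, ..., 47.5).
midpoints = [12.5 + 5 * i for i in range(8)]
tfr_grouped = 5.0 * sum(f(x) for x in midpoints)

# TFR from the continuous curve, integrated over the same span (ages 10-50).
tfr_integral = simpson(f, 10.0, 50.0)

relative_gap = abs(tfr_integral - tfr_grouped) / tfr_grouped
print(f"grouped {tfr_grouped:.4f}  integral {tfr_integral:.4f}  "
      f"gap {100 * relative_gap:.3f}%")
```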
To repeat, that is the only goal for the present application. No theoretical or substantive claims are made for the resulting functions; we use them as approximating functions, defined by TableCurve as '. . .nothing more than an equation which is used to represent X-Y data' (Systat 2002, pp. 20-1). 9 To eliminate small non-zero values of f(x) outside the reproductive ages, the function is redefined by inserting conditions on x which evaluate the function as zero when x is less than 10 or greater than 52.5. The function is also re-defined to adjust for the sex ratio at birth [since the kinship equations relate to one-sex, stable population models], yielding m(x), a maternity function for female births.
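A minimal sketch of this redefinition is given below. The curve `f` is again a hypothetical stand-in for the fitted function, and the sex ratio at birth of 1.05 males per female is an assumed value, since the figure actually used is not stated in the text.

```python
import math

ALPHA, BETA = 10.0, 52.5    # ages outside which fertility is set to zero
SEX_RATIO_AT_BIRTH = 1.05   # assumed males per female birth; not given in the text

def f(x):
    """Hypothetical fitted fertility curve for all births."""
    return 0.13 * math.exp(-0.5 * ((x - 27.0) / 6.0) ** 2)

def m(x):
    """Maternity function for female births, zero outside [ALPHA, BETA]."""
    if x < ALPHA or x > BETA:
        return 0.0
    # Share of births that are female is 1 / (1 + sex ratio at birth).
    return f(x) / (1.0 + SEX_RATIO_AT_BIRTH)

for age in (5.0, 12.5, 27.0, 52.5, 60.0):
    print(age, m(age))
```

Writing the conditions into the function itself means that the kinship integrals can be evaluated over any limits without small spurious contributions from the tails of the fitted curve.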
A similar curve-fitting procedure was applied to L_x values from the 1981 abridged life table for Canada [the data used by Keyfitz] to fit a survivor function. 10 In this case, four-parameter functions were required to get an adequate fit. The chosen function: [equation not reproduced]
As with the fertility function, conditions on x were inserted to assure that the curve behaves properly at ages outside the range of observation. 11 The values were also adjusted to take account of the 5-year intervals of the original L_x data, yielding a survivorship function p(x) (see Appendix A.1).
Estimates by the proposed procedure are in close agreement with those of Keyfitz (1986), presented for comparison. Agreement is to within 1.1 kin per 100 women for all categories and ages. The largest relative errors are for daughters and living daughters at age 20 of the reference woman, about 15%. These presumably relate to differences in procedures for dealing with fertility rates in the earliest ages of childbearing. But notice that the substantive story is not appreciably different: 6 or 7 daughters born per 100 women by age 20.
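The second step, evaluating the kinship integrals once m(x) and p(x) are available as functions, can be sketched as follows. Everything specific in this example is an assumption made for illustration: the Gaussian-shaped `m`, the Gompertz-type `p`, the maximum age `OMEGA`, and the use of nested Simpson's rule in place of Mathcad's integrator. The integrands are the standard Goodman et al. expressions for daughters and granddaughters as understood here; they are not copied from the equations used for the reported results.

```python
import math

ALPHA, BETA = 10.0, 52.5   # limits of childbearing
OMEGA = 110.0              # assumed maximum age (hypothetical)

def m(x):
    """Hypothetical maternity function (female births), zero outside [ALPHA, BETA]."""
    if x < ALPHA or x > BETA:
        return 0.0
    return (0.13 / 2.05) * math.exp(-0.5 * ((x - 27.0) / 6.0) ** 2)

def p(x):
    """Hypothetical survivorship: 1 at birth, non-increasing, 0 from OMEGA onward."""
    if x <= 0.0:
        return 1.0
    if x >= OMEGA:
        return 0.0
    return math.exp(-(0.0001 / 0.09) * (math.exp(0.09 * x) - 1.0))

def simpson(g, a, b, n=100):
    """Composite Simpson's rule with n (even) subintervals; 0 for an empty interval."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def daughters(a):
    return simpson(m, ALPHA, min(a, BETA))

def living_daughters(a):
    return simpson(lambda x: m(x) * p(a - x), ALPHA, min(a, BETA))

def granddaughters(a):
    def inner(x):  # births to a daughter who was born when the mother was aged x
        return simpson(lambda y: p(y) * m(y), ALPHA, min(a - x, BETA))
    return simpson(lambda x: m(x) * inner(x), ALPHA, min(a, BETA))

def living_granddaughters(a):
    def inner(x):
        return simpson(lambda y: p(y) * m(y) * p(a - x - y),
                       ALPHA, min(a - x, BETA))
    return simpson(lambda x: m(x) * inner(x), ALPHA, min(a, BETA))

kin = (daughters, living_daughters, granddaughters, living_granddaughters)
for age in (20, 40, 60):
    print(age, [round(100 * k(age), 1) for k in kin])  # kin per 100 women
```

The code mirrors the equations almost line for line, which is the point of the procedure: changing an assumption means editing one function, not reworking a finite approximation algorithm.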

Discussion
The differences between the results of the proposed computational procedure and those produced by the Pullum algorithm are negligible, within the bounds of error of the original data. Moreover, the results are precise enough for any likely substantive use to which they might be put, given that they relate to a highly abstract model of kinship [a one-sex stable population model, with no input for marriage patterns].
The general approach used above clearly has applications to other areas of population mathematics. The approach is not entirely novel, but until recently it was impractical and beyond the capabilities of many researchers. Finite sums using grouped data became conventional. Writing as recently as 1985, for example, Keyfitz could note correctly with respect to an expression for the intrinsic growth rate r: 'no direct use can be made of a continuous form like (5.1.4) – it must be converted to the discrete form for calculations' (1985, p. 115), and more generally: 'Although the stable age distribution is easier to think about in the continuous version, application requires a discrete form' (1985, p. 81).
Due to recent developments in computer software, this is no longer the case. As illustrated above, it is now relatively easy to find continuous functions to represent many demographic data sets, and to do direct numerical evaluation of integrals and other analytic expressions. In some contexts, working with analytic expressions for processes such as fertility, survivorship and marriage may be a more effective way to derive numerical results than traditional finite sums. At the very least, one now has a choice.
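As one illustration of direct use of a continuous form, the intrinsic growth rate r can be obtained by solving the continuous characteristic (Lotka) equation, the integral of exp(−rx) p(x) m(x) over the reproductive ages set equal to 1, without first converting it to a discrete sum. Whether this is the exact expression Keyfitz numbers (5.1.4) is not established here; it is used as a standard example of the kind. The functions `m` and `p` are hypothetical stand-ins for fitted curves, and bisection with Simpson's rule is just one workable numerical choice.

```python
import math

ALPHA, BETA = 10.0, 52.5   # limits of childbearing

def m(x):
    """Hypothetical maternity function (female births)."""
    if x < ALPHA or x > BETA:
        return 0.0
    return (0.13 / 2.05) * math.exp(-0.5 * ((x - 27.0) / 6.0) ** 2)

def p(x):
    """Hypothetical survivorship function (Gompertz-type)."""
    if x <= 0.0:
        return 1.0
    return math.exp(-(0.0001 / 0.09) * (math.exp(0.09 * x) - 1.0))

def simpson(g, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def psi(r):
    """Left-hand side of the characteristic equation for a trial value of r."""
    return simpson(lambda x: math.exp(-r * x) * p(x) * m(x), ALPHA, BETA)

def intrinsic_rate(lo=-0.1, hi=0.1, iterations=60):
    """Solve psi(r) = 1 by bisection; psi is strictly decreasing in r."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if psi(mid) > 1.0:
            lo = mid   # trial r too low: discounted births still exceed 1
        else:
            hi = mid
    return 0.5 * (lo + hi)

r = intrinsic_rate()
print(f"net reproduction rate {psi(0.0):.4f}, intrinsic rate r = {r:.5f}")
```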
Approximating functions also can be effective for interpolation and, with due caution, extrapolation.
The suggested procedure is a reminder of Hakkert's (1992) argument that many standard demographic algorithms were derived for purposes of hand calculation, and may need to be revised to make greater use of modern developments in statistics and computer software. 12

Table: Number of Kin per 100 Women
Columns: Age; Daughters; Living Daughters; Granddaughters; Living Granddaughters
Panels: Results from equations in Figure 1; Results from Keyfitz (1986)
[Cell values are not recoverable from this text; the only surviving entry is 50.9.]

As with any use of computerized 'black box' procedures, of course, one must balance the potential advantages in ease, speed and flexibility of computation against the possibility of unrecognized pitfalls leading to seriously incorrect results. In the case at hand, for example, it would be easy to select a survivorship function that rises after age 100 or so. The careless use of such a function in the kinship equations would lead to meaningless results for some kinship categories. Computer mathematics software is at best a partial substitute for mathematical skill, and no substitute at all for thoughtful analysis.
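A simple safeguard against the pitfall just described is to scan a candidate survivorship function on a fine age grid before using it. The sketch below is one possible check, with hypothetical names throughout: `bad_p` is an artificial cubic that declines to zero at age 100 and then turns upward, and `safe_p` is the same curve with a condition on x added.

```python
def first_increase(p, lo=0.0, hi=120.0, step=0.5, tol=1e-12):
    """Return the first grid age at which p exceeds its previous value, else None."""
    n_steps = int(round((hi - lo) / step))
    prev = p(lo)
    for i in range(1, n_steps + 1):
        x = lo + i * step
        cur = p(x)
        if cur > prev + tol:
            return x
        prev = cur
    return None

def bad_p(x):
    """Artificial curve: falls from 1 to 0 over ages 0-100, then rises again."""
    s = x / 100.0
    return 1.0 - s * s * (3.0 - 2.0 * s)

def safe_p(x):
    """Same curve with survivorship forced to zero from age 100 onward."""
    return 0.0 if x >= 100.0 else bad_p(x)

print("bad_p first rises at age:", first_increase(bad_p))
print("safe_p first rises at age:", first_increase(safe_p))
```

A grid check of this kind can miss a rise narrower than the grid step, so it supplements, but does not replace, a look at the graph of the fitted function.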
Finally, it should be emphasized once more that in this approach, the analytic expressions are used solely to represent specific sets of data. Fertility schedules for a high-fertility population might lead to different functions being selected. The discovery of general analytic expressions for such processes, especially expressions with theoretically meaningful parameters, is another, more difficult and more important task.

Appendix A: TableCurve Output for Fit of Survival Curve
Note. This is a facsimile of the TableCurve graphic output for the function fit to L_x data, to represent survivorship. Parameter values and measures of goodness of fit are given to 15-digit accuracy. This is not justified by the accuracy of the basic data. But if one wishes to graph the function independently of TableCurve, many digits may be required to get an accurate graph, for example, with the correct range or specific values of y. Other output, not shown here, gives summary statistics and confidence intervals for parameters.