Modelling Variation in Fertility Rates Using Geographically Weighted Regression

Australia is one of the largest countries in the world, and also one of the most urbanised. At the national level it can be characterised as a country with low fertility and high income per capita. However, there are significant geographic inequalities between different areas of the country in terms of fertility rates as well as education levels, income and employment opportunities. In this paper, we use birth registration and Census data to explore the spatial variation of fertility in Australia and how it relates to compositional and contextual characteristics of places. Geographically weighted regression allows us to analyse the spatial relationships and identify the geographical variability in the fertility experience of Australian women. We find substantial variation with some areas having a total fertility rate well above or below the national average. Around metropolitan areas much of the variation can be explained by differences in the socio-economic composition of the local population. However in more rural or remote parts of the country, understanding the variation in fertility is more difficult.


Introduction
Australia is a large land with considerable variation in population density across the country. It is characterised by a densely populated eastern seaboard, with 50% of the population living in the three largest cities Sydney, Melbourne and Brisbane, and less populated inland regions. As observed in other countries, there is significant spatial heterogeneity in fertility rates when comparisons are made across major cities, suburban, regional and remote areas of Australia (Australian Bureau of Statistics 2010). This paper addresses the distribution of fertility across Australia, and considers the compositional and contextual factors associated with regional differences.
In 2015, Australia had a total fertility rate (TFR) of 1.81, but fertility differs starkly between areas. Australia has eight States and Territories: New South Wales, Australian Capital Territory, Victoria, South Australia, Western Australia, Northern Territory, Queensland and Tasmania. Comparing the States and Territories, TFR is as high as 2.11 in the Northern Territory, and the lowest is observed in Victoria (1.68). There are also notable differences across urban and rural areas, with a TFR of 1.74 in major cities and 2.31 in remote areas, as classified by Remoteness Area (Australian Bureau of Statistics 2016). But it is not only TFR that varies across geography. There are also remarkable spatial disparities in socio-economic disadvantage comparing urban versus rural areas, coastal versus inland areas, as well as inner versus outer suburbs (Hugo 2002). The spatial pattern of social disadvantage across different regions of Australia, as well as within cities, is an area of study which has attracted much attention (Badcock 1997; Stilwell and Jordan 2007; Saunders and Wong 2014). Stilwell and Jordan (2007) found that housing, employment, education and infrastructure were key factors driving inequalities in income and wealth in Australia and emphasised the importance of space and place for public policy development. They also argue for greater attention to spatial issues more broadly when conducting social and economic research. Compared to the literature on spatial inequalities in social deprivation there has been very little research on spatial inequalities in fertility, and indeed on how the two might be related.
The absence of research on spatial variation in fertility is not unique to Australia. While geographical awareness was a common feature of early demographic research, by the mid-twentieth century it started to lose its popularity. This move away from geography was due in part to the general shift in focus from macro-level to micro-level explanations and theories and to the growing awareness of the problems of aggregation bias or the ecological fallacy (Voss 2007). In addition, as fertility levels across the world and within countries began to converge, there was a belief that geography was no longer an interesting factor in explaining fertility (Wilson 1990). As a result, geographical considerations have been largely absent from contemporary studies of fertility (Boyle 2003).
However, interest in the geography of demographic processes, including fertility, has made a strong resurgence (Voss 2007), which has coincided with the greater availability of geo-referenced data and the development of statistical techniques and software especially designed for spatial analysis (de Castro 2007). In addition, there is a general movement in social research to link micro and macro levels and people and places (Matthews and Parker 2013). There is growing evidence from a range of recent studies that even in developed countries with low fertility, significant differences in childbearing behaviours still persist across regions within countries (see Hank 2001 for Western Germany; Khawaja et al. 2006 for New Zealand; Tromans et al. 2008 for England and Wales; Walford and Kurek 2016 for Poland and England and Wales; Vitali and Billari 2015 for Italy; de Beer and Deerenberg 2007 for the Netherlands).
From a policy perspective, differences in sub-national fertility rates have important implications for the planning of local level provision of social services, education and health care (Khawaja et al. 2006). Geographic differences in fertility are also an important concern when developing national population projections (Tromans et al. 2008; Wilson 2015). More broadly, an understanding of the determinants of geographical differences in fertility in low fertility countries such as Australia is important, as it provides insight into the extent to which we can expect regional differences to persist or to converge in the future, which can inform planning at the subnational level. In this paper we examine the geographic variation of fertility rates across Australia, using a combination of birth registration and Census data. We start by outlining the broad trends in geographical variation. We then use bivariate and multivariate analysis to examine the factors that could explain the spatial variation in fertility. We are particularly interested in measures of social disadvantage and how these relate to the spatial variation in fertility.

Background
The TFR for each State and Territory, as well as for Australia as a whole, for a 25-year period is shown in Fig. 1. Although the TFRs in different states have converged over time, it is evident that the Northern Territory (NT) and Tasmania (TAS) have consistently had a TFR above the national average, whereas the Australian Capital Territory (ACT) and Victoria (VIC) have for the most part had a TFR lower than other states and territories. It is a well-established pattern, even in modern societies such as Australia, that rural areas tend to have higher levels of fertility than urban areas (Boyle 2003). Part of the state differences in fertility could therefore be due to the varying levels of urbanisation. However, similar differences also hold between the capital cities of these states and territories. For example, between 2013 and 2015, Hobart (the capital of Tasmania) had a TFR of 1.98 whereas Melbourne (the capital of Victoria) had a TFR of 1.70 (Australian Bureau of Statistics 2016). What lies behind these regional differences? There is an almost complete lack of research examining the factors behind sub-national fertility differences in Australia.
In a comprehensive review of research on regional fertility differences in Europe, Duchêne et al. (2004) distinguish between two sets of factors that may be relevant: differences in the structure or composition of the population (e.g. level of education, ethnicity) and differences in the environment or "contextual" factors of the place of residence (e.g. specific cultural attitudes, availability of infrastructure, housing market situation). Which compositional elements are important will vary from country to country depending on the characteristics of their population. Cultural background is known to be an important factor affecting spatial variation in fertility. In Australia, there is considerable variation in the percentage of the population that is Indigenous in different regions. While the Indigenous population comprises roughly 3% of the population of Australia, it is not evenly distributed geographically across Australia. In fact, Indigenous Australians are more likely to be living in remote parts of Australia (Biddle and Wilson 2013). With a TFR of 2.27 (Australian Bureau of Statistics 2016), the Indigenous population also has higher fertility than the non-Indigenous population, and therefore areas with a higher percentage of the population identifying as Indigenous could be expected to have higher fertility. However, there is some evidence that the relationship is not necessarily straightforward: for example, in the south and east of the country fertility patterns of the Indigenous population are very similar to those of the non-Indigenous population (Biddle and Johnstone 2015).
Ethnic background is also important. Australia has a large percentage of first and second generation immigrants who have settled unevenly across the country. If there are particularly high concentrations of immigrants from a particular country in a region the fertility in that place could be affected. As the source countries of migrants have changed over time (Wilson and Raymer 2017) so too has their impact on fertility. The composition of each migrant intake in terms of source country and age also has the potential to impact local area fertility. Of course the impact of concentrations of migrants may not only be compositional as there would also likely be strong contextual factors related to cultural or religious beliefs.
Socio-economic status is also an important compositional determinant that is related to fertility, at least at the micro level. In Australia as in many other countries, there is a negative relationship between educational attainment and fertility (Heard and Arunachalam 2015). Women with higher levels of education have lower numbers of children. However, there are indications that the relationship with fertility varies by place. Heard and Arunachalam (2015), in looking at cohort fertility rates by remoteness area and level of education, found that the relationship is not straightforward. The level of remoteness made the most difference to those with lower levels of education where there is a clear relationship between increasing remoteness and completed fertility. For other education levels the relationship was weaker.
In this paper we move beyond simple comparisons of urban versus rural fertility to examine the spatial variation across the country. We are interested to find out why some areas have higher fertility than others, and whether the reasons behind fertility levels vary across place.

Data
The data come from the 2011 Census of Australia and birth registration data. The outcome variable of interest is the TFR between 2011 and 2013. Data for the three years are averaged to smooth fluctuations in TFR in areas with smaller populations. The TFR is presented at the level of Statistical Area Level 2 (SA2), based on birth registration data. The SA2 is a geographical unit defined by the Australian Standard Geographical Classification (ASGC). According to the 2011 ASGC there are 2214 SA2s, which cover the whole country without overlap. The purpose of the SA2 is to classify the country into roughly suburb-sized areas (Capuano 2011). The average population in each SA2 is around 11,000, although in large cities the population of a single SA2 can be over 20,000. After excluding SA2s covering national parks or places with a population of under 2000, the final analytical sample is 2062 units across Australia.
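As a rough illustration of the outcome variable, the sketch below computes a TFR from age-specific fertility rates for five-year age groups and then averages it over three years. The counts, the function name and the decision to average the annual TFRs (rather than pool births across years) are assumptions made for illustration; the exact calculation behind the published figures is not described here.

```python
from statistics import mean

AGE_GROUP_WIDTH = 5  # years in each age group (15-19, 20-24, ..., 45-49)


def total_fertility_rate(births, women):
    """Return the TFR implied by births and female population per age group."""
    if len(births) != len(women):
        raise ValueError("births and women must cover the same age groups")
    # Age-specific fertility rate: births per woman in each age group.
    asfr = [b / w for b, w in zip(births, women)]
    # Each rate applies for every year a woman spends in the age group.
    return AGE_GROUP_WIDTH * sum(asfr)


# Hypothetical counts for one small area, age groups 15-19 to 45-49.
women = [400, 450, 500, 480, 460, 440, 420]
births_by_year = {
    2011: [4, 27, 55, 58, 28, 6, 0],
    2012: [6, 30, 50, 60, 30, 5, 1],
    2013: [5, 24, 52, 61, 25, 7, 0],
}

annual_tfr = {
    year: total_fertility_rate(births, women)
    for year, births in births_by_year.items()
}
# Averaging three years dampens year-to-year noise in small populations.
tfr_2011_2013 = mean(annual_tfr.values())
print(round(tfr_2011_2013, 2))
```

In a small area a handful of extra births in one year can move the annual TFR noticeably, which is the motivation for the multi-year average.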
The distribution of TFR across the 2062 SA2s under observation is shown in Fig. 2a. For illustrative purposes Sydney and Melbourne are shown separately in Fig. 2b and c, respectively. These figures show the variability in the level of TFR across Australia, the higher fertility in regional and remote Australia, and the lower fertility in the inner suburbs of Australia's two largest cities.

Independent Variables
Following Duchêne et al. (2004) we examine how compositional as well as structural (or contextual) characteristics are associated with the geographical pattern of fertility. To measure compositional features of the population we use area level measures of education, cultural background and relationship status. For the contextual variables we include population density, the level of unemployment, and a measure of public housing in the area. These measures allow us to examine the link between social disadvantage and fertility and are further described below.

Compositional
Level of education is measured as the proportion of residents of the SA2, aged 20-29, who have completed Year 12. In Australia, high school is completed at Year 12; however, schooling is only compulsory until age 16 (approx. Year 10). Students who complete Year 12 have a greater likelihood of continuing with further study, particularly university, as well as entering the workforce. We focused on the education of young people in their twenties, rather than all adults, as some areas that have an older age distribution may have a high percentage of residents with very low levels of education but this would have little relationship to current fertility. We expect SA2s where a higher percentage of the population had completed Year 12 to display lower fertility, but that there will be more variation across geography for those with lower education.
Cultural background is measured by five indicators which represent, respectively, the proportion of residents who identify as Indigenous, the proportion who were of South and Eastern European ancestry, the proportion who were of Middle Eastern ancestry, the proportion who were of North-East Asian ancestry, and the proportion who were of South-Central Asian ancestry. These indicators were positively skewed, so each one was log transformed to improve the fit in the linear and geographically weighted regressions. These ancestry groups were chosen as they represent some of the largest cultural backgrounds in Australia and have diverse fertility rates. For example, in 2015 the TFR of women born in South-East Asia was 1.40, whereas the TFR of those born in Southern and Central Asia was 2.03 (Australian Bureau of Statistics 2016).
Relationship status is measured as the percentage of women aged 20-49 in each SA2 who are married or cohabiting.

Contextual
For contextual or structural variables, we include an indicator of publicly provided housing, of the employment market, and the level of population density. The indicator of the employment market is the unemployment rate in the SA2 and population density is measured as number of people per 1000 square meters. To reduce the impact that very small and large populations have on the measure of unemployment, the logarithm of this indicator was taken. Similarly, in the modelling, the square root of the population density is used. Kulu et al. (2007) found that settlement size had a large and persistent impact on individual level fertility when comparing people living in different settlements across Denmark, Finland, Norway and Sweden with the larger settlements having lower fertility. Although this was based on individual level data, we hypothesise for Australia that population density will also be negatively related to fertility.
The effect of unemployment on fertility is difficult to predict. Kravdal (2002) found that in Norway unemployment at the individual level had a negligible effect, whereas at the aggregate level it led to a small reduction in fertility. For teenage fertility, Shoff and Yang (2012) found that more socially disadvantaged areas, including those with higher unemployment levels, tended to have higher teenage fertility rates. In Australia, Evans (2003) found similar geographic variation in teenage fertility rates but also noted that lack of access to abortion services was fundamental to this outcome.
We also include the percentage of households in an area that are reported as public housing as this is a key indicator of disadvantage and low income (Baum and Gleeson 2010).
The distributions of the dependent variable and the independent variables are shown in Table 1.
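The transformations described in this section can be sketched as follows. The values, the helper names and the refusal to transform zeros are assumptions for illustration only; how zero proportions were handled before taking logarithms is not stated in the text.

```python
import math


def log_transform(value):
    """Natural log of a strictly positive proportion or rate."""
    if value <= 0:
        raise ValueError("log transform needs a strictly positive value")
    return math.log(value)


def sqrt_transform(value):
    """Square root of a non-negative measure such as population density."""
    if value < 0:
        raise ValueError("square root needs a non-negative value")
    return math.sqrt(value)


def skewness(values):
    """Moment-based sample skewness (positive means a long right tail)."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical ancestry shares (%) across seven areas: one area dominates.
ancestry_share = [0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 25.0]
logged_share = [log_transform(v) for v in ancestry_share]

# Hypothetical population densities, compressed with a square root.
density = [0.0, 4.0, 9.0, 2500.0]
root_density = [sqrt_transform(v) for v in density]

print(round(skewness(ancestry_share), 2), round(skewness(logged_share), 2))
```

Both transformations pull in the long right tail, so a few very dense or very concentrated areas have less leverage on the fitted coefficients.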

Methods
We start with bivariate analyses to examine how each of the independent variables relates to TFR across the country, and use linear regression to see how well the model fits for the country as a whole. Then we use geographically weighted regression (GWR) to see if there is any spatial variability in the relationship between the independent variables and TFR. GWR allows us to analyse the spatial relationships and identify geographical variability in the fertility experience of Australian women. It is primarily an exploratory method for analysing spatially varying relationships (Fotheringham et al. 2002). Whereas linear regression assumes that the relationship between the independent variables and TFR is the same in every SA2 across Australia, GWR produces local regression estimates for each location. The software GWR4 is used to implement the analysis.
Adaptive bi-square spatial kernels are used because Australia is unevenly populated. Some SA2s are small in size and will be surrounded within a fixed distance by many other small SA2s, while others, particularly in Central Australia, will have very few neighbouring SA2s. So rather than weighting based on fixed distance, adaptive weighting does so based on the number of nearest neighbours. Kernels are made to adapt themselves to variations in the density of the data so that kernels have larger bandwidths where the data are sparse and smaller bandwidths where the data are plentiful (Fotheringham et al. 2002). The optimal bandwidth chosen was the one that minimised the corrected Akaike Information Criterion (AICc). Sensitivity tests were also conducted by changing the number of nearest neighbours to examine the effect on the results. Importantly, GWR is flexible enough to take into account that some regression coefficients may vary geographically, while others may be global or not vary across space (Fotheringham et al. 2002).
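A minimal sketch of an adaptive bi-square kernel is given below. It assumes planar coordinates with straight-line distances and sets the bandwidth to the distance of the n-th nearest neighbour, not counting the focal point itself; the function name, the coordinates and that counting convention are illustrative assumptions and need not match GWR4.

```python
import math


def adaptive_bisquare_weights(coords, focal_index, n_neighbours):
    """Bi-square weights around one focal point with a nearest-neighbour bandwidth."""
    if not 0 < n_neighbours < len(coords):
        raise ValueError("n_neighbours must be between 1 and len(coords) - 1")
    focal = coords[focal_index]
    # Straight-line distance from the focal point to every point (itself included).
    distances = [math.dist(focal, point) for point in coords]
    # The focal point is first in the sorted list (distance 0), so index
    # n_neighbours holds the distance to the n-th nearest neighbour.
    bandwidth = sorted(distances)[n_neighbours]
    if bandwidth == 0:
        raise ValueError("bandwidth is zero: too many coincident points")
    weights = []
    for d in distances:
        ratio = d / bandwidth
        # Weight falls smoothly from 1 at the focal point to 0 at the bandwidth.
        weights.append((1.0 - ratio ** 2) ** 2 if ratio < 1.0 else 0.0)
    return weights


# Hypothetical centroids: a tight urban cluster plus two remote areas.
centroids = [(0, 0), (1, 0), (0, 1), (1, 1), (50, 50), (120, 40)]
urban_weights = adaptive_bisquare_weights(centroids, focal_index=0, n_neighbours=3)
remote_weights = adaptive_bisquare_weights(centroids, focal_index=4, n_neighbours=3)
print(urban_weights)
print(remote_weights)
```

With the same number of neighbours, the remote focal point ends up with a far wider bandwidth than the urban one, which is the behaviour the adaptive kernel is meant to provide.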
The mixed GWR model can be written as

$$y_i = \sum_{j=1}^{k_a} a_j \, x_{ij}(a) + \sum_{l=1}^{k_b} b_l(u_i, v_i) \, x_{il}(b) + \varepsilon_i$$

where for each SA2 $i$, $y_i$ is the TFR, and $(u_i, v_i)$ is the geographical location (longitude and latitude). The first group of variables is the 'a' group, $x_{i1}(a), \ldots, x_{ik_a}(a)$, and these are the global variables, whose coefficients do not vary across space. In our model this group includes population density and four of our measures of ethnicity (excluding Indigenous). Their coefficients are denoted $a_1, \ldots, a_{k_a}$. In contrast, the 'b' group, $x_{i1}(b), \ldots, x_{ik_b}(b)$, contains the local variables, whose coefficients $b_1(u_i, v_i), \ldots, b_{k_b}(u_i, v_i)$ do vary across space, and $\varepsilon_i$ is the error term. Scatterplots were used to examine potential collinearity among locally estimated regression coefficients (Wheeler and Tiefelsdorf 2005).
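The local part of such a model amounts to a separate weighted least squares fit at each location. The sketch below shows that single step for a basic GWR in which every coefficient is local; the globally fixed terms of the mixed model would need an additional estimation step that is not shown. The data are synthetic, and the function names and the fixed bandwidth of 3 are assumptions for illustration.

```python
import numpy as np


def local_wls_coefficients(X, y, weights):
    """Solve (X'WX) b = X'Wy for one focal location."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    Xw = X * w[:, None]  # each row of X scaled by its kernel weight
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


def bisquare(distances, bandwidth):
    """Fixed-bandwidth bi-square kernel, used here only to keep the demo short."""
    ratio = distances / bandwidth
    return np.where(ratio < 1.0, (1.0 - ratio ** 2) ** 2, 0.0)


# Synthetic one-dimensional "country": the slope on x rises from west to east.
rng = np.random.default_rng(0)
u = np.linspace(0.0, 10.0, 41)            # hypothetical locations
x = rng.normal(size=u.size)               # hypothetical predictor
y = 2.0 + (1.0 + 0.2 * u) * x             # true slope is 1 at u=0 and 3 at u=10
X = np.column_stack([np.ones_like(x), x])

west = local_wls_coefficients(X, y, bisquare(np.abs(u - 0.0), 3.0))
east = local_wls_coefficients(X, y, bisquare(np.abs(u - 10.0), 3.0))
print("slope near the western edge:", west[1])
print("slope near the eastern edge:", east[1])
```

A single global regression on these data would report one averaged slope, whereas the local fits recover the drift from west to east.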

Bivariate Analysis
Figure 3a-f shows the relationship between TFR and the main independent variables of interest. Overall the results show that there is a positive relationship between the TFR and the per cent in the region that are: married or cohabiting; unemployed; living in public housing; or Indigenous. Both the per cent of 20-29 year olds with Year 12 education and population density are negatively related to TFR.
With regard to the per cent of women aged 20-49 who are married or cohabiting, there is overall a positive correlation (r = 0.48, p < 0.01). There are two notable outliers. The Statistical Area of Acton stands out as having an extremely low percentage of women who are married or cohabiting. This is a suburb that encompasses the Australian National University (in the ACT), and its residents are largely university students. While the percentage married or cohabiting in Acton is extremely low, so is the TFR (0.14). A contrasting outlier is the remote (primarily Indigenous) community of Palm Island (in QLD), which has high fertility but low levels of marriage/cohabitation.
In relation to the percentage of 20-29 year olds that have completed Year 12, a negative relationship with TFR is evident at the SA2 level (r = -0.61, p < 0.01). As before, Acton stands out as an outlier, due to its extremely low level of fertility. Crace and Bonner stand out as having relatively high levels of fertility but also high levels of education. These SA2s are in the Australian Capital Territory, and are relatively new suburbs with many amenities considered good for starting a family. East Pilbara, a remote area of Western Australia, is an example of a place with low levels of education, but also low fertility. Mining is the main industry in this area.
The percentage of public housing, as a percentage of all households, in an SA2 is weakly but positively related to TFR (r = 0.12, p < 0.001). Once again there are some outliers from the Australian Capital Territory, with the SA2s of Braddon and Reid as two examples of places with relatively high levels of public housing and low levels of fertility. In the case of Braddon and Reid, this can be explained by the fact that these inner city suburbs have a mixed composition of public housing tenants as well as young professionals who rent or own apartments, in addition to very expensive private homes. The Tiwi Islands also stand out as having high levels of public housing but relatively low fertility, which could be due to poor registration of Aboriginal births (State of Queensland (Queensland Health) 2014; Gibberd et al. 2016).
As expected, population density (here we work with the square root of population density) is negatively related to TFR at the SA2 level (r = -0.57, p < 0.001). Lakemba-Wiley Park in Sydney is an example of an SA2 with a high population density, but also high fertility. This can be explained by the fact that Lakemba-Wiley Park is one of the most deprived and socially disadvantaged areas of Sydney (Baum and Gleeson 2010), and 49% of its population is Muslim (Australian Bureau of Statistics 2013).
Unemployment is positively related to fertility at the SA2 level (r = 0.26, p < 0.001). Roxby Downs (South Australia) and Nhulunbuy (Northern Territory) stand out as having very low unemployment and yet reasonably high fertility. Roxby Downs is a mining area, which could explain the very low unemployment (1.8% of people aged 15 and over).
The per cent of the population identifying as Indigenous is positively related to TFR at the SA2 level (r = 0.55, p < 0.001).

Linear Regression
Table 2 shows the results of the linear regression. All the predictors are statistically significant at p < 0.001, with the exception of the percentage of North-East Asian ancestry and South-Central Asian ancestry. Variance inflation factors showed no significant collinearity.
The compositional factors predict TFR in the expected direction. As predicted, there is a negative relationship between the percentage of 20-29 year olds that have completed Year 12 in an SA2 and TFR. For every one percentage point increase in the percentage who had completed Year 12, TFR is predicted to be 0.003 lower.
Ethnic background is also important. Not surprisingly, the higher the percentage of the population that is Indigenous, the higher the TFR. A higher percentage of the population of Middle Eastern ancestry is also associated with higher TFR, whereas Southern and Eastern European ancestry is associated with lower fertility.
In terms of the composition of the population by marital status, the higher the percentage of women married or cohabiting, the higher the TFR.
Turning to the contextual variables, the percentage of the population unemployed is strongly and positively related to TFR. Public housing rates in the SA2 are positively related to fertility. In contrast, population density is negatively related to fertility as predicted.
So how well does this linear regression explain the variation in fertility? Overall, the variables included explain 63% of the variation in TFR across the 2062 SA2s. An examination of the residuals from the linear regression shows that many of the largest belong to SA2s already identified as outliers in the scatterplots in Fig. 3. The largest negative standardised residual from the linear regression is found in the APY Lands in outback South Australia. There, TFR is low (1.44), despite the area having characteristics associated with a high TFR. According to this model, predicted TFR for the APY Lands would be 2.97, suggesting that the global model is poorly suited to predicting regional variation. In contrast, other places have large positive residuals, with observed TFR being much higher than predicted from the model. For example, in the largely Indigenous settlement of Palm Island, the observed TFR in 2011 is higher than the predicted value of 2.61. Lakemba-Wiley Park in Inner South West Sydney as well as Lethbridge Park-Tregear in the Blacktown area also record considerably higher TFR than predicted by this model. Lethbridge Park-Tregear has a high percentage of the population unemployed (17%) and low levels of education (47% of 20-29 year olds having completed Year 12). It has a recorded TFR of 2.98, whereas the predicted value is 2.49.
Overall the linear regression performed well and the independent variables included explained just over 60% of the variation in TFR among the different SA2s. However, linear regression is a global model in that it assumes that the relationship between contextual and compositional factors and fertility is the same across all areas of the country. It does not take into account that some factors may be important predictors of fertility in some areas but not others.
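The global diagnostics used in this section (model fit, variance inflation factors and standardised residuals) can be sketched as below. The data are synthetic, the variable names and coefficient values are invented for illustration, and the standardised residuals use a simple version that ignores leverage; none of this reproduces the fitted model in Table 2.

```python
import numpy as np


def ols_fit(X, y):
    """Return OLS coefficients and residuals; X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def r_squared(y, residuals):
    """Share of the variance in y explained by the fitted model."""
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)


def variance_inflation_factors(predictors):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on all the others."""
    n, k = predictors.shape
    vifs = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(predictors, j, axis=1)])
        _, resid = ols_fit(others, predictors[:, j])
        vifs.append(1.0 / (1.0 - r_squared(predictors[:, j], resid)))
    return np.array(vifs)


# Synthetic areas with two hypothetical predictors.
rng = np.random.default_rng(42)
n = 500
education = rng.normal(70.0, 10.0, n)     # % of 20-29 year olds with Year 12
unemployment = rng.normal(6.0, 2.0, n)    # unemployment rate (%)
tfr = 2.2 - 0.003 * education + 0.05 * unemployment + rng.normal(0.0, 0.2, n)

predictors = np.column_stack([education, unemployment])
X = np.column_stack([np.ones(n), predictors])
beta, residuals = ols_fit(X, tfr)
fit_r2 = r_squared(tfr, residuals)
vifs = variance_inflation_factors(predictors)

# Simple standardised residuals: residual divided by the residual standard error.
z = residuals / np.sqrt(residuals @ residuals / (n - X.shape[1]))
flagged = np.flatnonzero(np.abs(z) > 3)
print(fit_r2, vifs, flagged)
```

Areas flagged with large absolute standardised residuals are the ones where a single set of national coefficients fits worst, which is what motivates the local model that follows.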

Geographically Weighted Regression
To examine the effect of location, we require a technique that can determine the spatial variability in the relationship between the variables and TFR. Geographically weighted regression produces a set of estimates for each location in the study and tells us how the effects of predictors of fertility differ in their strength, and possibly direction, across different locations. A geographical variability test was conducted, which indicates that population density and the cultural background measures (apart from percentage Indigenous) do not vary in their relationship with fertility across the locations. Therefore they were included in the model as fixed components.
The remaining variables were included as local components as their relationship to fertility changes across space. For the fixed components, we show the coefficient, standard error and t-statistic in Table 3. For the spatially varying variables the mean, median, lower quartile and upper quartile are displayed instead. The adjusted R² has increased from 0.63 to 0.74, although the two values are not strictly comparable, as an increase is expected given the difference in degrees of freedom. However, the decline in AICc shows a noticeable improvement in the GWR model. Figure 4 shows the local R² based on the GWR. The pattern illustrates that there is a better fit in urban areas. In inner regional and outback Australia the variability in TFR explained by the GWR is very low.
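To illustrate how AICc can favour a model despite its larger effective number of parameters, the sketch below uses the form of AICc given by Fotheringham et al. (2002), in which the penalty depends on the trace of the hat matrix. Whether GWR4 evaluates exactly this expression is an assumption, and the residual sums of squares and trace values are hypothetical, not taken from Tables 2 or 3.

```python
import math


def gwr_aicc(n_obs, rss, trace_hat):
    """Corrected AIC from the residual sum of squares and the hat-matrix trace."""
    if n_obs - 2 - trace_hat <= 0:
        raise ValueError("too many effective parameters for this sample size")
    sigma_hat = math.sqrt(rss / n_obs)  # ML estimate of the error standard deviation
    return (
        2 * n_obs * math.log(sigma_hat)
        + n_obs * math.log(2 * math.pi)
        + n_obs * (n_obs + trace_hat) / (n_obs - 2 - trace_hat)
    )


# Hypothetical comparison: a global model with few parameters against a local
# model that fits better but uses many more effective parameters.
n_areas = 2062
aicc_global = gwr_aicc(n_areas, rss=110.0, trace_hat=11.0)
aicc_local = gwr_aicc(n_areas, rss=80.0, trace_hat=150.0)
print(aicc_global, aicc_local)
```

The model with the lower AICc is preferred, so a local model wins only when its reduction in residual variance outweighs the penalty for the extra effective parameters.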
Casewise diagnostics based on standardised residual and leverage values were examined to identify areas where TFR was severely over- or underestimated by the model. Six areas had large negative standardised residuals below -3, and these were similar to the ones identified in the linear regression model. These were primarily remote areas with a high Indigenous population where observed TFR was substantially lower than what the model predicted based on the characteristics of the area and the composition of the population.
Education is significantly related to TFR in over half of the SA2s, and in 93% of areas the relationship between the percentage of 20-29 year olds who had completed Year 12 and fertility was negative.
The SA2s with the strongest negative relationship between education and fertility are predominantly in Sydney, Melbourne and Brisbane. In these inner-city areas low fertility is strongly associated with a high proportion of the population with high school qualifications.
Fig. 4 Local R²
Unemployment had a positive relationship with fertility. It was significant in 80% of SA2s, including SA2s such as Logan Central (in QLD) which had high fertility (3.03) and high rates of unemployment (22%) as well as SA2s with low unemployment and low fertility. The percentage of women aged 20-49 that were married was significantly related to fertility in 97% of cases.
The public housing rate was significantly related to fertility in only 22% of SA2s, and where it was, the relationship was primarily positive and relatively strong. In those SA2s, the higher the public housing rate, the higher the fertility rate.
The percentage of the population with an Indigenous background was a significant predictor of TFR in around 30% of SA2s, and in 9 out of 10 cases the relationship was positive. As seen in Fig. 5, there is a positive relationship in Cairns, Darwin, most of the Northern Territory and other parts of outback Australia. There were a handful of SA2s such as Bankstown or Auburn where the coefficient was negative. As an example, these two places had less than 0.5% Indigenous population, yet relatively high TFRs of 2.51 and 2.7 respectively.
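Statements such as "significant in 30% of SA2s, and positive in 9 out of 10 of those" summarise one local coefficient per area. A minimal sketch of that summary follows; the coefficients and standard errors are hypothetical, and the cut-off of |t| > 1.96 is an assumption, since the threshold used for the local tests is not stated in the text.

```python
def summarise_local_estimates(coefficients, standard_errors, critical_t=1.96):
    """Return (share of areas significant, share of those with a positive sign)."""
    if len(coefficients) != len(standard_errors):
        raise ValueError("one standard error is needed per coefficient")
    significant = [
        b for b, se in zip(coefficients, standard_errors)
        if abs(b / se) > critical_t  # local t-value for this area
    ]
    share_significant = len(significant) / len(coefficients)
    if not significant:
        return share_significant, float("nan")
    share_positive = sum(b > 0 for b in significant) / len(significant)
    return share_significant, share_positive


# Hypothetical local coefficients for ten areas, all with standard error 0.1.
local_coefficients = [0.5, 0.4, -0.3, 0.05, 0.02, 0.01, -0.02, 0.03, 0.6, 0.01]
local_standard_errors = [0.1] * 10

share_sig, share_pos = summarise_local_estimates(
    local_coefficients, local_standard_errors
)
print(share_sig, share_pos)
```

Because one test is run per area, a stricter cut-off than 1.96 may be appropriate in practice to allow for multiple testing.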

Discussion
This paper has demonstrated the importance of place in understanding fertility patterns in Australia. We show that close to three-quarters of the variation in the TFR across the country can be explained by the social, economic, demographic and contextual characteristics in the geographically weighted regression model. The model works best in urban areas, where education, employment, housing and Indigenous status are closely linked with fertility. The model is less successful in rural and remote areas, where the contextual drivers are likely to be quite different from those in the city. The failure of the model to correctly predict fertility in some remote Indigenous communities highlights the different cultural and structural context of Indigenous fertility. It is also possible that measurement issues associated with sparsely populated areas, including lack of birth registration, are affecting the model.
Population density and ethnic background did impact fertility, but the relationship did not vary across areas. Fertility is lower in more dense areas and fertility varies in areas with different concentrations of ethnic backgrounds.
The paper is limited in the range of data that are considered. We do not consider child care availability or the ability of family to provide support. Nor do we measure other local structural constraints to education and employment. Another limiting factor is the local level structure of the population in terms of age, which could be a particularly significant factor in the impact of migrant groups on local level fertility.
When interpreting the results it is important not to extend any conclusions about the relationships between the variables in this analysis and TFR to the individual level, as doing so would result in what has been termed an 'ecological fallacy' (Hank 2001). To take an example given by Courgeau and Baccaini (1998) for migration, and transposing it to fertility, the ecological fallacy would be to conclude, after observing that fertility is high in areas with high unemployment levels, that unemployed people have particularly high fertility. We also cannot generalise these results to smaller or larger aggregations of geography. However, we have found that areas with a high proportion of individuals with these characteristics are more likely to have high fertility rates when compared with areas with low prevalence. Notwithstanding these caveats, the paper does demonstrate that the composition of the population is not the only driver of fertility and that contextual and other unmeasured factors play a role. Additionally, the impact of context differs across the country.