1 Background

Lung cancer is one of the top five contributors to disease burden globally [1] and in Australia [2]. This burden is mainly due to early death [1, 2] as 5-year survival is relatively poor whether compared against all cancers combined (19% versus 70% in 2012–2016) or the wider population [3].

Most health loss from lung cancer is amenable to change by prevention and early detection [4, 5] as incident cases are often attributable to risk exposure from tobacco use [6, 7], chemicals in the workplace or wider environment [7], inadequate diet and physical inactivity [4, 8]. As with health risks generally [9, 10], exposure to lung cancer related risks is unequally, and often inequitably, distributed. Risks cluster among groups within the population [4] and lead to disparities in lung cancer incidence and outcomes [11]. Put another way, people with a higher likelihood of lung cancer are also more likely to share particular demographic characteristics, and health and socio-economic disadvantages. Demographic correlates include sex [7, 12, 13, 14], ancestry, ethnicity [13, 14] and Aboriginal status [3], living in remote locations [4] and living alone [15]. Pre-existing physical health conditions are common because of the strong associations with tobacco smoking and ageing [16, 17, 18, 19]. Socio-economic disadvantage is strongly associated with tobacco use [20, 21] and, in turn, heavily contributes to lung cancer incidence [10, 22, 23, 24]. In Australia, relative disadvantage at area levels is estimated using 5-yearly census records [25] and matching with population cancer registry records allows for monitoring ecological cancer trends. Person level descriptions of some demographic characteristics, most health characteristics and the discrete factors contributing to socio-economic disadvantage in Australia are not routinely available and rely on infrequent survey samples [14, 26] or dedicated epidemiological studies. Other jurisdictions face similar limitations. For example, in the US, where lung cancer screening has been recommended for over a decade, information on eligibility for and uptake of screening relies heavily on National Health Survey samples [27, 28].

A comprehensive, systematic approach toward person-centred information in cancer control is warranted. Information describing the characteristics of people diagnosed with lung cancer can help meet that need [29]. The knowledge gained will assist with targeting interventions [30] to address health inequities across areas of prevention, screening and early detection, treatment and palliation while making best use of the limited resources available [31].

This study examines associations between demographic, health and socio-economic characteristics of adults in New South Wales and the subsequent diagnosis of lung cancer. The purpose is to identify population groups at elevated risk of lung cancer who may require additional attention in service planning and delivery [32].

2 Methods

2.1 Study design

We conducted a statewide, population-based study of adults aged 18 years or more in New South Wales (NSW) at Australia’s 2016 Census (August 2016). Participants were then observed over the contiguous period from September 2016 to December 2018 for a first registration of lung cancer.

2.2 Data sources

We used person level, unit records from three data sources: the Australian census, Australia’s universal Pharmaceutical Benefits Scheme (PBS) and the NSW Cancer Registry (NSWCR). Census records described the demographic characteristics of individuals and household composition as well as the principal components of socio-economic disadvantage. PBS records covered prescription medications in the 12 months before census among people without lung cancer, or the 12 months before lung cancer diagnosis as recorded by the NSWCR.

Data were deterministically linked by the Australian Bureau of Statistics (ABS) within their Person-Level Integrated Data Asset (PLIDA) [33, 34] using a unique Person Linkage Spine (PLS) for people recorded in the Australian Medicare Consumer Directory, Centrelink or Taxation datasets.

Our study included linked PLIDA records with a PLS. Those without a PLS, or those with a first, invasive cancer diagnosis other than lung cancer occurring from September 2016 to December 2018 were excluded.

2.3 Variables

The primary outcome is a first diagnosis of lung cancer (ICD10 C33-34) from September 2016 to December 2018.

Demographic characteristics included: age at census, arranged in categories (18–39, 40–44, 45–49, 50–54, 55–59, 60–64, 65–69, 70–74, 75–79, 80–84, 85 years or more) for tabled results and as a continuous measure in multivariable models; sex; geographic remoteness using the Accessibility and Remoteness Index of Australia (ARIA), categorised as major city, inner regional, or outer regional/remote; country of birth and Aboriginal self-identification, adopted as proxy measures of ethnicity and ancestry [35] and grouped as Aboriginal, other Australian, China, Germany, Greece, Italy, Lebanon, New Zealand, the Philippines, the United Kingdom, Vietnam, “other mainly English speaking” countries and “mainly non-English speaking” countries [36]; and single occupant households.

PBS records included the Anatomical Therapeutic Chemical (ATC) classification of prescribed medications. We mapped medications to a set of specific conditions in the Rx Risk comorbidity index using a method validated in Australian settings [37].
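To illustrate the kind of mapping described, the following is a minimal sketch that looks up conditions by ATC code prefix and counts distinct medicated conditions per person. The three prefixes shown are illustrative assumptions only and do not reproduce the validated mapping [37]; the function names and the banding into 0, 1, 2, 3 and "4+" are likewise hypothetical.

```python
# Illustrative ATC-prefix lookup. This is NOT the validated Rx Risk mapping;
# the prefixes and condition labels are assumptions chosen for the example.
ATC_PREFIX_TO_CONDITION = {
    "A10": "diabetes",
    "C10": "hyperlipidaemia",
    "R03": "chronic airways disease",
}


def conditions_for(atc_codes):
    """Return the set of conditions implied by a person's dispensed ATC codes."""
    found = set()
    for code in atc_codes:
        for prefix, condition in ATC_PREFIX_TO_CONDITION.items():
            if code.upper().startswith(prefix):
                found.add(condition)
    return found


def condition_count_band(atc_codes):
    """Band the number of distinct medicated conditions (hypothetical bands)."""
    n = len(conditions_for(atc_codes))
    return "4+" if n >= 4 else str(n)
```

Counting distinct conditions rather than dispensings means that repeat prescriptions for the same condition are not double counted.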

Socio-economic disadvantage characteristics included area Index of Relative Socio-Economic Disadvantage (IRSD) [25] quintiles based on Statistical Area 2 level geography. Each of the principal components within IRSD was dichotomised following ABS methods [37, 38]. Those variables included poor English language proficiency, low household income, core function limiting disability at age less than 70, employment status and occupation (as drivers and labourers), no education or non-completion of Year 12 high school, households with children, resident parent numbers and rental through a housing authority. The latter provided a signal on the nature of housing in lieu of overcrowding which, along with information on household internet connection and motor car availability, was not available in PLIDA.

The potential for selection bias within the whole of the enumerated population was examined by testing for PLS presence (or not) for each census variable. A difference in PLS presence for a given characteristic indicates a systematic difference between the population as a whole and the cases and non-cases compared. For example, if younger adults in the population census are less likely to have an administrative health, social security or taxation record, they are less likely to have a PLS, and less likely to be included in analyses.
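The linkage check can be sketched as a simple tabulation of the share of records without a linkage key within each category of a census variable. This is an illustrative sketch only: the record layout, the `has_pls` field and the function name are assumptions, and the study's own analyses were run in Stata.

```python
from collections import defaultdict


def unlinked_share(records, variable):
    """Share of records lacking a linkage key, per category of `variable`.

    `records` is an iterable of dicts holding the census variable and a
    boolean `has_pls` flag (a hypothetical field name).
    """
    totals = defaultdict(int)
    unlinked = defaultdict(int)
    for rec in records:
        level = rec[variable]
        totals[level] += 1
        if not rec["has_pls"]:
            unlinked[level] += 1
    return {level: unlinked[level] / totals[level] for level in totals}
```

Categories with a markedly higher unlinked share than the rest are those most likely to be under-represented in the analysed cohort.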

2.4 Statistical analysis

Analyses were stratified by sex because patterns of lung cancer incidence vary markedly for males and females and reflect historical differences in tobacco and other risk exposures [39]. We first compared characteristics of participants diagnosed/not diagnosed with lung cancer using cross-tabulations with odds ratios and 95% confidence intervals from multiple logistic regressions (Step 1). We repeated this for each demographic, health and socio-economic characteristic. The odds ratios reported are age-adjusted (aOR) as age was strongly associated with most characteristics. Step 2 focussed on multivariable analyses. Starting with all potential covariates, we purposefully removed the least-contributing variables (with Wald statistic p-values of > 0.2 as a guide), then refitted the model with remaining covariates until deriving a main effects model where each retained covariate substantially contributed [40].
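As a hedged illustration of the quantities involved, the sketch below converts a logistic regression coefficient and its standard error into an odds ratio with a 95% confidence interval, computes a two-sided Wald p-value, and picks the candidate for removal under the p > 0.2 guide. It assumes one coefficient per covariate and a normal approximation; categorical covariates with several levels would need a joint test. The function names are hypothetical and this is not the study's Stata code.

```python
import math


def odds_ratio_with_ci(beta, se, z_crit=1.96):
    """Odds ratio and Wald confidence limits from a log-odds coefficient."""
    return (
        math.exp(beta),
        math.exp(beta - z_crit * se),
        math.exp(beta + z_crit * se),
    )


def wald_p_value(beta, se):
    """Two-sided p-value for H0: beta = 0 under a normal approximation."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))


def least_contributing(estimates, threshold=0.2):
    """Name of the covariate with the largest Wald p-value above `threshold`.

    `estimates` maps covariate name -> (beta, se). Returns None when every
    covariate has p <= threshold, i.e. when elimination would stop.
    """
    if not estimates:
        return None
    p_values = {name: wald_p_value(b, se) for name, (b, se) in estimates.items()}
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > threshold else None
```

In a purposeful selection loop, the model would be refitted after each removal and `least_contributing` applied again until it returns `None`.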

We further summarised the multivariable model by counting the number of relevant demographic, health condition and socio-economic characteristics present for each person—awarding one “point” for each characteristic. That is, for males we totalled points for: relevant ancestries, living alone, having four or more medicated conditions, and each of the relevant socio-economic variables. Similarly, for females we totalled: relevant ancestries, living alone, having four or more medicated conditions, less than Year 12 education and public housing rental. We repeated cross-tabulations and multivariable analyses using the number of correlated characteristics in lieu of discrete variables, then estimated the marginal mean change in odds of lung cancer in the presence of 0, 1, or 2 or more characteristics. Behavioural characteristics such as tobacco use are not included in the available datasets and not open to direct inference.
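The point count described above can be sketched as follows, using the characteristics listed for females. The field names are hypothetical labels for this example; the grouping into 0, 1, or 2 or more follows the text.

```python
# Characteristics counted for females, as listed in the text.
# The key names are hypothetical labels chosen for this sketch.
FEMALE_CHARACTERISTICS = (
    "relevant_ancestry",
    "lives_alone",
    "four_plus_conditions",
    "less_than_year12",
    "public_housing_rental",
)


def characteristic_points(person, characteristics):
    """Award one point for each listed characteristic present for a person."""
    return sum(1 for name in characteristics if person.get(name, False))


def points_band(points):
    """Group the point total as '0', '1' or '2+'."""
    return "2+" if points >= 2 else str(points)
```

For example, a person flagged only as living alone would score one point and fall in band "1".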

In each step, we assessed potential collinearity among covariates using variance inflation factors. Data preparation and analyses using Stata 17 were conducted within the ABS DataLab.

3 Results

3.1 Participants

Of 6,120,982 adults enumerated in the 2016 Census, 89.9% were assigned a PLS and became available for analysis (Fig. 1). Linkage was less likely among the youngest and oldest adults (11.9% aged 18 to 23 years and 13.1% aged 85 years or more were not linked), those in remote areas (15.5%), and those born in unspecified, non-English speaking countries (29.1%). After excluding a further 76,484 adults with a first diagnosis of cancer other than lung cancer in the period September 2016 to the end of 2018, a group of 6160 with lung cancer and 5,422,133 community members without a first cancer diagnosis in the period were available for analysis.
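The participant counts can be cross-checked with simple arithmetic. The sketch below assumes that the linked population is the sum of the three groups reported (lung cancer cases, community members without a first cancer diagnosis, and those excluded for another first cancer), with no other exclusions; the variable names are illustrative.

```python
# Counts as reported in the text.
enumerated = 6_120_982
lung_cancer_cases = 6_160
non_cases = 5_422_133
other_first_cancers = 76_484

# Assumption: linked records = cases + non-cases + other-cancer exclusions.
linked = lung_cancer_cases + non_cases + other_first_cancers
linked_percent = round(100 * linked / enumerated, 1)
analysed = lung_cancer_cases + non_cases

print(linked, linked_percent, analysed)  # 5504777 89.9 5428293
```

Under that assumption, the implied linkage share agrees with the 89.9% reported.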

Fig. 1 Participant selection

Comparatively more males than females were diagnosed with lung cancer (3276 or 0.13% versus 2884 or 0.10% respectively) (Table 1). Participants diagnosed with lung cancer were substantially older than those not diagnosed. For example, 63% of males and 64% of females diagnosed with lung cancer were aged between 50 and 74 years, against 35% of male and female community members without a cancer diagnosis.

Table 1 Age distribution of NSW lung cancer cases (Sept 2016–Dec 2018) and NSW community controls (2016 census)

3.2 Males

Distributions of demographic, health and socio-economic disadvantage characteristics among males are described in Table 2.

Table 2 Demographic, health and socio-economic characteristics among lung cancer cases and community non-cancer cases—males

Age-adjusted odds of lung cancer among males increased in the presence of several demographic characteristics. For example, as area remoteness increased, so did the odds of lung cancer diagnosis. Compared with non-Aboriginal Australian born males, odds of lung cancer were more than twice as high among Aboriginal males (aOR = 2.19, 95% CI 1.75, 2.75), with elevated odds also observed among males born in Greece, Italy, Lebanon, New Zealand, the United Kingdom and other, non-specified non-English speaking countries, and among those living alone (aOR = 1.41, 95% CI 1.29, 1.53). Among the discrete, disadvantage related variables, increased odds of lung cancer were evident among males with poor English proficiency, living in low-income households, with disability, being unemployed or having labourer occupations, less education, or renting through a housing authority.

In the combined, multivariable model of male characteristics and lung cancer diagnosis, the demographic characteristics of increasing age, living outside of major cities and specific backgrounds continued to show a substantial correlation. Compared to other Australian born males, Aboriginal males and those born in China, Greece, New Zealand, the United Kingdom and other predominantly non-English speaking countries had increased odds of lung cancer, as did those who lived alone (aOR = 1.31, 95% CI 1.20, 1.43). Similarly, as the number of medicated health conditions increased so too did the odds of lung cancer. Discrete socio-economic characteristics rather than the area index showed substantive relationships with lung cancer among males. Specifically, living in a low-income household, being younger while living with disability, unemployment or labouring occupations, having less than Year 12 education and renting through a public housing authority were each related to lung cancer diagnosis.

3.3 Females

Table 3 summarises distributions of demographic, health and socio-economic disadvantage characteristics with lung cancer among females.

Table 3 Demographic, health and socio-economic characteristics among lung cancer cases and community non-cancer cases—females

After adjusting for age, the odds of lung cancer among females increased across several demographic characteristics, for example in outer regional areas (aOR = 1.20, 95% CI 1.05, 1.37). Aboriginal females had more than two-fold higher odds of lung cancer (aOR = 2.43, 95% CI 1.96, 3.01), with heightened odds also observed among those born in New Zealand and the United Kingdom, those living alone (aOR = 1.36, 95% CI 1.25, 1.47) and those with higher numbers of medicated health conditions. The odds of lung cancer increased along with area disadvantage, with aOR = 1.48 (95% CI 1.32, 1.66) in the most versus least disadvantaged areas. Among discrete disadvantage variables, increased odds of lung cancer were apparent among females in low-income households, living with disability, with less than Year 12 education and those renting through a housing authority.

The multivariable relationships among characteristics and lung cancer diagnosis for females also featured increased age, being Aboriginal and birth in New Zealand, the United Kingdom or other mainly English-speaking countries, along with living alone (aOR = 1.22, 95% CI 1.12, 1.33). An increased number of health conditions again accompanied higher odds of lung cancer as did the discrete, socio-economic characteristics of less than Year 12 education and renting through a public housing authority.

Conversely, several characteristics were associated with lower odds of lung cancer diagnosis among females. These included having been born in Greece, Italy or Lebanon, or reporting a lower proficiency with English (aOR = 0.65, 95% CI 0.55, 0.81).

Figure 2 summarises the estimated change in probability of lung cancer diagnosis as age and the number of characteristics increase among males and females (detailed in the Supplementary File). On average, being younger with none of the identified characteristics minimised the likelihood of subsequent lung cancer diagnosis. Age increased the likelihood of diagnosis and the presence of any identified characteristics further increased that likelihood in a stepwise manner. For example, a 60 year old female with one of the identified characteristics had almost three times the probability of lung cancer over the next two years of another with none of the characteristics (9 in 10,000 versus 3 in 10,000). The probability doubled again where two or more characteristics were present (20 in 10,000).
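The stepwise pattern in the worked example can be checked from the rounded probabilities quoted for a 60 year old female (3, 9 and 20 per 10,000 for none, one, and two or more characteristics). Because the inputs are rounded, the ratios below are approximate; the variable names are illustrative.

```python
# Two-year probabilities per 10,000, as quoted in the text (rounded values).
per_10k = {"0": 3, "1": 9, "2+": 20}

ratio_one_vs_none = per_10k["1"] / per_10k["0"]
ratio_two_plus_vs_one = per_10k["2+"] / per_10k["1"]

print(round(ratio_one_vs_none, 1))      # 3.0
print(round(ratio_two_plus_vs_one, 1))  # 2.2
```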

Fig. 2 Marginal means of characteristic numbers correlated with lung cancer, by sex

4 Discussion

Our population level analysis used a newly available digital platform to expand our knowledge of associations between lung cancer and a wide range of demographic, health and socio-economic characteristics. PLIDA’s extensive population coverage (90%) and person-centred construction allowed our analysis to move beyond ecological studies of socio-economic disadvantage onto individuals’ exposure to specific, disadvantage factors correlated with lung cancer. Having unpacked area level disadvantage into constituent parts at an individual person-level, we found some socio-economic characteristics were associated with lung cancer incidence in both males and females while other characteristics were associated for males rather than females and vice versa.

The likelihood of lung cancer increased among adults of older age, who lived alone, had more medicated health conditions and experienced economic disadvantage. More specifically, disadvantage in the form of not completing high school and renting through a housing authority correlated with lung cancer diagnosis. Aboriginal people along with those born in China, New Zealand and the UK had higher odds of lung cancer than other, non-Aboriginal Australians. So too did males born in Greece and Italy, yet the converse was observed among females of those countries and females reporting poor English proficiency more generally. This is consistent with cultural differences in tobacco use in those contexts [14]. Among males, several other characteristics also related to lung cancer and included: living outside major cities; unemployment or labouring roles; and, being younger while living with long-term disability.

4.1 Limitations

Our analysis was limited in four areas. First, having dichotomised predictor variables in line with a validated, disadvantage index producing method [25, 38], we acknowledge this purposefully focussed our examinations on the extreme end of distributions of income, education, employment and housing. As a result, our analysis may have less statistical power and may under-estimate [41] the correlation of disadvantage with lung cancer diagnosis. Second, smoking status is important in the aetiology of lung cancer but information on this and other behavioural risks is unavailable in census records [42], which limits the statistical modelling in the analysis presented. Given the strong association of tobacco exposure with socioeconomic disadvantage, we feel that it is reasonable to infer that this remains a likely key risk factor for lung cancer among cases. Third, several existing disadvantage measures (internet access, private motor vehicle access and household over-crowding) were unavailable to our first use of the data platform [25]. However, we partially offset this limitation, in housing characteristics at least, by including low-rental housing through a public housing authority. Finally, the administrative data sourced for describing health condition burden were limited to the fact of filled prescriptions for medications, rather than the severity of the conditions experienced [43].

4.2 Comparison and interpretation

Our results align with earlier studies in confirming correlations of lung cancer diagnosis across Aboriginal status and/or country of birth by sex [13, 14], among males in remote areas [44] in line with known patterns of tobacco use [14] and reduced contact with general practitioners in rural areas [45]. Results also confirm that one in nine adults lived alone [46, 47] which had a strong association with lung cancer diagnosis [15]. Living alone is a poor proxy for social support but nevertheless our findings reinforce the importance of social connectedness in maintaining positive health practices such as attending primary care for health checks, particularly where that care is less readily accessed in rural areas [44].

Physical comorbidities assessed using hospital records often pre-exist cancer diagnosis, including lung cancer [16,17,18,19]. Our results are consistent with those findings while using medicated conditions as a wider, population alternative to hospital records. That is, lung cancer was likely preceded by conditions that were not necessarily serious enough to involve hospitalisation.

Area socio-economic disadvantage is strongly related to lung cancer incidence in NSW and Australia more widely [7, 11, 22, 36]. Our study disaggregated area disadvantage into constituent parts. As expected, the highly prevalent, person-level characteristics of low income, low educational attainment, unemployment and particular occupations related strongly to lung cancer diagnosis. Two less prevalent characteristics with a similar strength of relationship to lung cancer diagnosis also became apparent: being younger with disability and renting through a public housing authority. We know tobacco use is up to two times more prevalent among adults living with disabilities in Australia [48], so the potential for lung cancer disparities in the presence or absence of disabilities exists but is little understood [49, 50]. One response to our findings is the further examination of particular limitations in mobility, sensory, learning, or cognition [50] in relationship with cancer diagnosis. Housing is more widely acknowledged as an environmental health risk [10] and exposure to sub-optimal housing tenure contributes to accumulating adversity. Suffice to say, our results may help sensitise, then orient preventive efforts toward previously under-recognised groups of people with potential to benefit from health related information and proportionately greater attention to their health, social and economic needs [10, 51].

4.3 Generalisability

Precision medicine asserts each tumour is different and should be treated with specific regard to its distinctive characteristics [52]. Precision prevention might similarly point to individual differences among people at risk of lung cancer and the need to move past a one size fits all approach [53] by “treating” people according to their distinctive characteristics [54]. Developing those interventions will benefit from knowing about the characteristics described in this paper and how they vary (or not) by sex. In the meantime, results provide a base for regional profiles used by local health authorities engaging with their communities. Predicted lung cancer distribution can be validated against observations within each local health district. If the model performs acceptably, the distribution of specific characteristics associated with lung cancer can be mapped to smaller areas within each local district. This could initially inform community collaborations on which characteristics are most prevalent. Secondly, it informs thinking about how cancer control initiatives might have regard for the nature and distribution of those characteristics [55] in prioritising higher-risk individuals in clinically effective, cost-effective and equitable programs [11, 56,57,58].

For example, current recommendations to Australian Government advocate the inclusion of adults aged 50 to 70 in screening “to support the early detection of lung cancer in asymptomatic high-risk individuals” [59]. High risk individuals are initially described in terms of current or past smoking history—information that is not routinely available or recorded outside of some clinical environments. Our results follow from information that is more routinely available and focus on characteristics relevant to lung cancer diagnosis, characteristics which are also shared among current smokers [14]. Geoffrey Rose identified several strengths in prioritising approaches toward smaller numbers of higher-risk people [60]. Here, prioritisation means applying more effort to greater capacity to benefit where the ability to readily participate may be lower. Such an approach may: reduce interfering with those at lower risk; offer more cost-effective resource use; and, use selectivity to improve case finding. A key advantage though, is focussing on issues appropriate to an individual. Knowing the characteristics of a person at heightened risk is an important step. Engaging with the person and learning about their context and perceived need is a next step. The person may become a candidate for lung cancer screening and/or perhaps a candidate for a tobacco control intervention. A lot depends on the flexibility of approach to deliver a fixed program, versus responding to a person’s need and their readiness and ability to participate [61]. None of this precludes other eligible candidates in the population from participating in screening or other assistance seeking activities. The example simply describes the opportunity to apply a vertical equity principle, by making a relevant program available to a wide audience or population while more actively seeking to offer assistance to a smaller group of individuals where assistance is most needed [51].

4.4 Further data enhancement and research

Our analysis improves Australian knowledge of social determinants on health inequity in lung cancer [62]. Further research using PLIDA will build on this study. For example, examination of potential interactions between living alone and/or the presence of disability prior to lung cancer diagnosis is warranted. Better understanding the nature and severity of disability will add to understanding disability’s aetiological role through increased tobacco use [63, 64] for example, and the potential for disability to mask or facilitate cancer diagnosis as seen with morbid health conditions generally [43]. Additional research may address our paper’s limitations by examining gradients within disadvantage related characteristics and lung cancer. For example, rather than dichotomising income, census records include 15 household income levels [65] making it possible to investigate incremental change across income level and determine income thresholds correlating with risk and/or protection against lung cancer. Further enhancement of PLIDA data could extend the look-back period used. This would support a deeper understanding of the longitudinal consequences of earlier life exposure to disadvantage characteristics and how those characteristics relate to health generally [10] and lung cancer specifically.

In the first instance, an easy data enhancement on the current platform is to add cause of death, then hospital and other health administration data. These data will further person-centred analyses along the cancer care pathway into treatment exposure and eventually the quantity and quality of life after lung cancer diagnosis. Taking account of the characteristics reported in this paper may enhance our understanding of how models of care and treatment pathways are influenced by each person’s circumstances and location.

More broadly, our correlational findings are not necessarily causal. For example, we noted an association between males living in more remote areas and increased risk of lung cancer. Living remotely can contribute to inequitable experiences of illness and cancer diagnosis. Conversely, individuals may also experience illness, reduced income as a result, and subsequently need to move out of major cities to more affordable housing because of their health.

5 Conclusion

Our analysis provided a population wide description of person-centred characteristics relevant to lung cancer diagnosis. Similarities and differences based on sex confirmed the varying relationships of high prevalence characteristics with disease attributes. The results also reveal less discussed but relevant characteristics of living alone, disability and housing. Improved knowledge of these characteristics in the community can inform the development of lung cancer prevention and detection programs, then guide the targeted delivery of those programs. The data platform and method applied are relevant not only to other cancer sites but other chronic diseases.