1 Background

1.1 Urbanization and Population Ageing

Today, 54% of the world’s population lives in urban areas (United Nations Department of Economic and Social Affairs Population Division 2014). This proportion is expected to increase to 66% by 2050. By then, the urban population will have nearly doubled in size since 2009, from 3.4 billion to over 6.4 billion. At the same time, the number of people aged 60 and older is expected to more than double worldwide, from 841 million in 2013 to more than two billion in 2050 (United Nations Department of Economic and Social Affairs Population Division 2013). Not surprisingly, as the world becomes increasingly older and more urbanized, the population of older adults living in urban areas is also growing dramatically in many parts of the world. The population of adults over 65 years of age living in cities of the Organisation for Economic Co-operation and Development (OECD) countries grew by 23.8% in the brief period from 2001 to 2011 (OECD 2015).

There are several commonalities between the global trends of urbanization and population ageing in terms of the policy challenges and opportunities they present (U.S. National Institute on Aging and World Health Organization 2011; WHO 2015b; WHO and UN-HABITAT 2010, 2016). They create pressures on health systems, their workforce and budgets, which countries are often unprepared to deal with. They can complicate health equity given the diversity in capacity and circumstance among urban residents and among people in older age. There are gaps in research and data on the complex dynamics that both constitute and influence urbanization, ageing and health. On the other hand, urbanization and population ageing are markers of economic and social development. Both extended life and urban centres are valuable resources to society which can be leveraged to enhance overall population health and to achieve sustainable development.

Multiple entry points for optimizing health in urban communities and in ageing populations, respectively, have been identified (WHO 2015b; WHO and UN-HABITAT 2016). A common feature is the importance of broader environmental changes engaging multiple sectors beyond the health sector, such as urban design, housing and transportation. However, the implications of the convergence of urbanization and population ageing for the kinds of environmental interventions that would be needed or appropriate are not as well understood. Thus, greater understanding is needed about the relationship between the urban environment, and environment more generally, and the health and wellbeing of an ageing population.

The key environmental features which make a city healthy through their contribution to disease prevention and health promotion are increasingly well understood: urban infrastructure and services that provide clean water, sanitation and waste management, adequate housing, accessible public transport, air pollution control, violence prevention, and others (Rydin et al. 2012; WHO and UN-HABITAT 2016). This has created heightened awareness about cities as an important arena for creating lasting positive change, as reflected in Goal 11 of the new Sustainable Development Goals. However, cities are complex systems and urban health outcomes are dependent on many interactions. This makes the analysis of causal associations between the urban environment and population health very challenging. Moreover, data at the urban scale are still sorely lacking, including disaggregated data that allow assessments of health equity (Corburn and Cohen 2012; Rydin et al. 2012; WHO and UN-HABITAT 2016).

Meanwhile, research on the relationship between the built environment and older people’s health has provided empirical support mainly for the effect on mobility or physical activity, especially walking (Rosso et al. 2011). A recent systematic review found evidence of associations with walking for transport (as opposed to leisure) in older persons aged 65 and above for residential density, walkability, street connectivity, access to destinations, land use mix, pedestrian-friendly features and cleanliness (Cerin et al. 2017). A body of research has also addressed the social environmental determinants and their association with a wider range of health outcomes among older adults. A systematic review of this broader literature found limited support that neighbourhood environment, including both social and physical dimensions, is a primary influence on older adults’ health and functioning (Yen et al. 2009). Nonetheless, there have been several studies supporting the notion that the social environment also has an important role in determining older people’s health and wellbeing. Recent studies have found, for example, the positive effects of social capital, measured by participation in groups, sense of belonging and relationship with neighbours (Norstrand et al. 2012) and of social network growth (Cornwell and Laumann 2015) on older adults’ functional, self-rated, and psychological health.

Since the development of the concept of Age-friendly Cities by the World Health Organization (WHO), discussed further in the subsequent section, research has also emerged on this topic (Fitzgerald and Caro 2014; Greenfield et al. 2015; Scharlach and Lehning 2013). Several models of an “age-friendly community” have been developed, not limited to urban settings (Menec and Nowicki 2014). A review of these models found that the emerging ideal of age-friendly communities is characterized by enabling social and physical environments that are mutually reinforcing, a participatory collaborative governance model, and importantly, inclusiveness (Lui et al. 2009). This notion is supported by research findings that professionally facilitated community development with and by older adults in the neighbourhood can result in more hospitable and supportive community environments for older people by increasing civic engagement and social capital (Austin et al. 2005; Buffel et al. 2012a). Several recent case studies of age-friendly communities (Buffel et al. 2014; Glicksman et al. 2013; Menec et al. 2014; Neal et al. 2014), as well as an evaluation of European Healthy Cities (Green et al. 2015), have also elucidated policy and governance factors that are conducive to age-friendly communities.

The research challenges in this area, however, are similar to those for urban health—an insufficient understanding of the actual holistic effects of physical and social environment interventions, and of the various dimensions of inequity and exclusion that affect older adults (Buffel et al. 2012b; Scharlach and Lehning 2013). Thus, more local knowledge and evidence is needed on how the physical and social environment can be improved in a coherent manner to affect the health and wellbeing of older adults and other people in the community, and to prevent older adults from being systematically excluded from society. The numerous difficulties in evaluating community-based initiatives notwithstanding, more rigorous research, routine evaluation and evidence of effectiveness are necessary to advance scientific knowledge, improve practice and persuade policy makers to support these initiatives when appropriate (Greenfield et al. 2015; Lui et al. 2009). To this end, a number of tools have been developed in recent years for practitioners and researchers to assess such initiatives (e.g. Handler 2014; Neal and Wernher 2014; Orpana et al. 2016; Public Health Agency of Canada 2015). The practical utility of these tools, however, is not yet well understood.

1.2 Creating an Age-Friendly City

The World Health Organization (WHO) defines an “Age-friendly City” (AFC) as an inclusive and accessible community environment that optimizes opportunities for health, participation and security, in order that quality of life and dignity are ensured as people age (WHO 2007). More specifically, in an AFC, policies, services, settings and structures support and enable people to age well by recognizing the wide range of capacities and resources among older people; anticipating and responding flexibly to ageing-related needs and preferences; respecting older people’s decisions and lifestyle choices; protecting those who are most vulnerable; and promoting older people’s inclusion in, and contribution to, all areas of community life (WHO 2007).

In 2007, WHO published the “Global Age-Friendly Cities: A Guide”, which was developed through qualitative research conducted in 33 cities across the world to identify the core features of an AFC through the perspective of older people, their care givers and municipal service providers (WHO 2007). The purpose of the guide is to help cities become more age-friendly. It describes the advantages and challenges that older people experience living in cities, and provides a checklist of age-friendly features. These features span eight domains of city life which cut across all government sectors: outdoor spaces and buildings, housing, transportation, community and health services, civic participation, respect and social inclusion, social participation, and communication and information. In 2010, the WHO Global Network of Age-Friendly Cities and Communities was established to facilitate the exchange of information, best practices and experiences on making cities, as well as other types of communities, more age-friendly. While AFC was developed primarily in response to the pressures and demands created by the converging trends of population ageing and urbanization occurring in specific regions of the world, the initiative is being increasingly adopted around the globe in both urban and rural settings (Fitzgerald and Caro 2014). As of April 2017, 400 communities in 37 countries had joined the network.

Recognizing the need to assist cities in measuring age-friendliness and to promote further research in this area, the WHO initiated a project in 2012 to develop a tool to provide guidance on the use of indicators to assess age-friendliness. Over the next few years a set of core indicators was developed through an iterative consultation process. Initially, WHO conducted a literature review and environmental scan to identify a comprehensive list of potential indicators. The list of indicators was then reduced and refined based on several criteria, including their relevance, feasibility of measurement and actionability, through an iterative process involving two rounds of international expert consultations and a survey of a purposive sample of local government authorities and community representatives from around the world. The indicators were further refined based on a pilot study of the indicators, which is the focus of this paper. The final list of core and supplementary indicators was selected considering the results of the consultation process, the pilot study and a scientific peer review. The final indicators were published in “Measuring the Age-friendliness of Cities: A Guide to Using Core Indicators” (WHO 2015a).

The main objective of the pilot study was to explore how these indicators would be applied in practice. The specific aims were to examine the fidelity of indicator measurement, the data sources used, practical problems encountered, and the perceived benefits to the community. The results have important implications for sorely needed evaluation research on Age-friendly City initiatives (Greenfield et al. 2015) and the potential enabling role of evaluation and indicator guides.

2 Methods

2.1 Pilot Sites

A Request for Proposal (RFP) to pilot the draft tool was distributed widely during October and November 2014. The RFP was posted on the WHO Centre for Health Development’s website, the Age-friendly World website (http://agefriendlyworld.org/en/), and the mailing list for the Global Network of Age-friendly Cities and Communities. It was also announced at relevant international conferences. As a result, a total of 29 proposals were received from across all six WHO Regions, which are Africa, the Americas, the Eastern Mediterranean, Europe, South-East Asia and the Western Pacific.

A total of 13 sites were chosen based on the overall quality of their proposal and to achieve a mix of geographical location, cultural and linguistic background, population size, population ageing rate, urban/rural setting, level of progress with age-friendly initiatives, and previous engagement in the consultation process to develop the indicator guide (with a preference for those that had not been engaged in the past). The selected sites were: La Plata, Argentina; Banyule, Australia; Hong Kong, China; Jing’an District of Shanghai, China; Dijon, France; New Delhi Municipal Council area, India; neighbourhoods of Eyvanak and Shahrak-e-Ghods in Region 7 District 2 of Tehran, Iran; Udine, Italy; informal settlements of Korogocho and Viwandani in Nairobi, Kenya; Tuymazy, Russia; Bilbao, Spain; Bowdoinham, USA; and Washington, DC, USA. These sites were each provided with US$5000 to conduct the pilot study in a period of 3 months. In addition, two sites—New Haven, USA and Fishguard-and-Goodwick of Wales, UK—which had submitted quality proposals volunteered to pilot test the guide using their own resources, bringing the total to 15 pilot sites. A map displaying the location of the pilot sites is shown in Fig. 1. Some of the pilot sites covered entire municipalities, while others were composed of select districts or neighbourhoods within a municipality. Specifically, the pilot sites included two neighbourhoods within one region of Tehran, one county area of New Delhi, one district within the greater city of Shanghai, and two slum neighbourhoods in Nairobi.

Fig. 1 Map of pilot site locations

The RFP process and decision criteria resulted in a very diverse set of pilot sites. Major cities such as Hong Kong, Tehran, New Delhi and Shanghai all assessed large urban populations. The towns of Bowdoinham and Fishguard-and-Goodwick, with populations of about 3000 and 5000 respectively, were in rural settings. Of the 15 pilot sites, all but five were members of the Global Network of Age-friendly Cities and Communities. English was the first language in only four of the pilot sites.

2.2 Pilot Study Protocol

The pilot sites were provided with a protocol for the study and an earlier version of the indicator guide in four of the six UN official languages: English, French, Spanish and Chinese. The official languages of the pilot sites were more varied, including Chinese (Cantonese, Mandarin), English, French, Hindi, Italian, Persian, Russian, Spanish, Swahili and Welsh. Pilot sites that required translation of the guide into another language were given an additional US$1000 toward the translation cost.

The main task for the pilot sites was to follow the guide and report relevant indicator data for their community. Two operational definitions are suggested in the guide for each indicator—one which mainly relies on administrative data sources and another on surveys of older residents in the community. This is to provide different options for measurement and thereby increase the likelihood that a community would be able to measure the indicator. For the purpose of the pilot study, the communities were instructed to measure the indicator using both definitions and to adapt the indicator definitions to their local context as necessary and appropriate. The list of core indicators and their definitions described in the guide are shown in Table 1. The pilot study also included reporting on a few additional supplementary indicators [all of which can be found in Sect. 4 of the indicator guide (WHO 2015a)], but this paper focuses on the measurement of the core indicators.

Table 1 List of core indicators and suggested definitions in the guide used for the pilot study

The use of existing data was prioritized, given time and financial resource constraints imposed by the study protocol. During the project period of December 2014 to March 2015, the pilot sites were required to provide regular progress reports and submit a final report at the end. The final report was required to follow a structured outline which included: (1) background and local context; (2) process used for the pilot study (e.g. stakeholder engagement, data collection); (3) indicator measurement (e.g. definition used, data source, year of data, population or sample); (4) discussion about indicator measurement and results (e.g. local relevance, challenges of data collection); (5) feedback on the content and usability of the indicator guide; and (6) reflections on the pilot study experience and its impact.

2.3 Analysis

To assess the fidelity of the indicator measurements, the authors extracted and collated information on the definitions and data sources used for measuring the indicators from the pilot site reports. Fidelity was assessed by coding each measurement into four categories based on the following predetermined criteria: (1) Exact: If the measurement adhered exactly to the operational definition provided in the guide. This included cases where the operational definition itself required local adaptation. For example, one of the definitions for neighbourhood walkability is, “Proportion of streets in the neighbourhood that have pedestrian paths which meet locally accepted standards” (see Table 1). By definition of this indicator, the measurement requires local adaptation, that is, a specification of locally accepted standards for accessible pedestrian walkways. Thus, in this case, if a city measured the “proportion of streets in the neighbourhood that have pedestrian paths with no steps to the road”, it was coded as an “exact” measurement. (2) Modified: If the measurement was based on a modified definition, but maintained face validity. In other words, this would apply to modified definitions that still adhered to the core concept of the original definition. One example of a modified definition would be a city which measured the “proportion of streets in the neighbourhood that have sidewalks” without further specification. (3) Replaced: If the measurement was replaced with a proxy indicator with face validity. One example of this would be the replacement of the original definition with a composite measure of walkability (i.e. a walkability score), calculated as a proxy for the neighbourhood walkability indicator. (4) Not measured: If the indicator was not measured at all, for example, due to absence of relevant data. If an indicator was purportedly measured, but significantly deviated from the operational definition described in the guide (i.e. lacked face validity), it was also coded as “not measured”.
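
The coding criteria above can be read as a small decision procedure. The following Python sketch is illustrative only: the function name `code_fidelity` and its boolean inputs are hypothetical and stand in for judgments that the coders made by hand when reading each pilot site report. It is not the authors' instrument.

```python
from enum import Enum


class Fidelity(Enum):
    EXACT = "exact"
    MODIFIED = "modified"
    REPLACED = "replaced"
    NOT_MEASURED = "not measured"


def code_fidelity(measured: bool, face_valid: bool,
                  follows_guide_definition: bool, is_proxy: bool) -> Fidelity:
    """Assign one of the four fidelity codes to a single reported measurement.

    All four inputs are hypothetical reviewer judgments, not fields
    taken from the study data.
    """
    # No data reported, or the measure deviates so far from the guide
    # that it lacks face validity.
    if not measured or not face_valid:
        return Fidelity.NOT_MEASURED
    # Adheres to the guide's operational definition; local adaptation that
    # the definition itself calls for still counts as exact.
    if follows_guide_definition:
        return Fidelity.EXACT
    # A different indicator used as a stand-in, for example a composite
    # walkability score.
    if is_proxy:
        return Fidelity.REPLACED
    # An altered definition that keeps the core concept of the original.
    return Fidelity.MODIFIED
```

Under these assumptions, a measurement of "streets that have sidewalks" without further specification would be passed in as measured, face-valid, not following the guide definition and not a proxy, and would come out as "modified".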

The pilot sites had the choice of calculating the equity indicators for one or more of the substantive indicators (e.g. neighbourhood walkability, social participation) and for comparisons of different subgroups (e.g. by gender). Fidelity was assessed by coding a measurement (or measurements) as “exact” if it (or at least one of them) was calculated using the exact equation in the guide (see Table 1); “modified” if none of the measures were calculated using the exact equation in the guide, but used some other calculation method consistent with the concept of equity as described in the guide (i.e. a comparison of two subgroups, or a comparison of the population average to a benchmark); and “not measured” if an equity assessment was not done at all, or done using a method that was inconsistent with the concept of equity as described in the guide.

The fidelity rate for each indicator definition was calculated as the proportion of communities that used the “exact” indicator measurements. The coding was performed by two of the authors with disagreements resolved by the third author.

The content of the pilot site report was also qualitatively coded to distil the key challenges and perceived benefits of conducting the indicator assessment. Given the exploratory rather than confirmatory nature of this study, the narrative text in the pilot site reports was analysed manually using a grounded approach to enable themes and patterns to emerge from the documents themselves, instead of using a priori coding. In vivo coding was used in the first cycle of coding to give the participants a voice in the research by reflecting the terms and phrases they used in the codes. The three authors independently coded the text for the first cycle, then collectively worked through a reflective process of comparing the memos they created during the coding process and reduced the themes to a manageable number. After completion of the pilot study, the researchers convened the representatives of the pilot sites for a single group discussion about the pilot study process. At least one representative from 13 of the 15 pilot sites attended the meeting. The discussion was unstructured, non-directive, and moderated by one of the authors. The main objective of the group discussion for the purpose of this study was to validate the analysis results of the pilot study process, and not to gather new data for further analysis. The two pilot sites that could not attend the meeting were given an opportunity to review and comment on the analysis results in advance by e-mail or by an unstructured, individual phone interview. They both offered comments only by e-mail. The outcomes of the group discussion and the feedback received by e-mail supported the research findings. Additional details were given that elaborated on certain aspects of the findings, but no information was offered that contradicted or deviated from the analysis results. Thus, the following section presents the results of the analysis as originally performed by the authors.

3 Results

3.1 Data Sources

The pilot sites shared similar approaches to collecting the various data. Lead agencies were typically county or municipal health departments, or other organizations with authority or influence within the community. Each of the pilot sites partnered with multiple external organizations in order to measure the range of indicators presented in the guide. This was seen as necessary by most sites due to the breadth of data required. These partners assisted in the selection and adaptation of indicators, collection of data, and analysis of the results. Partner organizations included state government agencies, municipal government departments, non-governmental organizations (NGOs), local community organizations, businesses, or public institutions.

Table 2 shows the number of pilot sites, out of the total of 15, that reported data for each of the two operational definitions of each indicator and their respective data sources. The data sources for equity indicators are not shown in this table because they were the same as those for the substantive indicators. That is, the data source for the substantive indicators also included the stratification variables (e.g. gender, race, income) in order for the two to be analyzed in combination to produce the equity indicators.

Table 2 Data sources used for measuring the core indicators

A majority of the pilot sites reported data using at least one of the two operational definitions for each of the indicators. Across all indicators, more pilot sites reported data using definition A, which relies on administrative data sources, than definition B, which relies on surveys of older residents. The only exceptions were the indicators of neighbourhood walkability and accessibility of public transportation stops, for which an equal number of pilot sites reported data for each of the definitions. Fewer than half of the pilot sites reported data for definition B for six of the indicators.

As expected by design, the most common data source for definition A of the indicators was routine government data, while special purpose surveys were the most common source for definition B. There were just two indicators that did not adhere to this pattern. Regardless of which definition was used, a special purpose survey was the most common data source for measuring affordability of housing, and government administrative data was the most common source for measuring engagement in socio-cultural activity.

Government administrative data were primarily housed within various departments and agencies at the local government level. These also included official regional and national data, such as census data, that were used as proxies in the absence of relevant local data. Special purpose surveys were mainly conducted locally with the older population, often in direct association with an age-friendly initiative. In other cases, data from surveys conducted on a specific issue (e.g. employment) were disaggregated by age to extract the data related to the older population. In addition to these two predominant data sources, field surveys and direct observation methods were used to assess physical accessibility when no routine government data or relevant survey data were available, for example, in the Nairobi slums. Other more rarely used data sources included local service agencies, both private and public, and the Global Burden of Disease study. The Global Burden of Disease data were used by Tehran to obtain a measure of healthy life expectancy at the country level as a proxy for the municipal-level healthy life expectancy.

Comparing the data from different sources was an important method of data validation. For example, based on data routinely collected by the municipal government (definition A), the accessibility of public spaces and buildings according to objective characteristics may be very good, whereas a survey of the older resident population (definition B) may reveal that perceived accessibility is poor. This is a typical example of when objective and perceived measures of environmental factors do not agree (Menec et al. 2016). They are known to capture different conceptual aspects of the environment, where the objective characteristics are relevant to policy interventions and perceptions are important for understanding an individual’s relationship with their environment (Rosso et al. 2011). The reactions to such data discrepancies in the pilot study included critical appraisal of the respective data sources, and efforts to understand and address the problem in order to improve both the objective and perceived measures of the indicator in the community.

Several of the pilot sites also engaged community stakeholders, including the older residents, to discuss the relevance of the core indicators in the context of their community, prioritize the indicators and discuss the results of the indicator assessment in a large group dialogue (e.g. World Cafe format). In effect, this offered another form of data validation, as well as a mechanism to ensure the inclusion of older adults and other beneficiaries of an age-friendly community.

3.2 Fidelity of Indicator Measurement

Table 3 shows the coding results of the data reported for each indicator for definitions A and B, respectively. Two statistics are presented for the fidelity rate (i.e. the percent of reported indicator measurements that used the exact operational definition provided in the guide): (1) fidelity rate among all sites and (2) fidelity rate among sites reporting a given indicator. Thus, for example, of the nine communities that reported accessibility of public transportation stops using definition A, all nine used the exact definition. The resulting fidelity rate for this indicator is reported as 60% among all pilot sites and 100% among the sites which reported this indicator.
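
The two statistics can be illustrated with a short calculation. This is a minimal Python sketch, not the authors' analysis code; the function name `fidelity_rates` is hypothetical, and the default of 15 sites is taken from the number of pilot sites in this study.

```python
from typing import Optional, Tuple


def fidelity_rates(n_exact: int, n_reporting: int,
                   n_sites: int = 15) -> Tuple[float, Optional[float]]:
    """Return (rate among all sites, rate among reporting sites) in percent.

    n_exact     -- sites whose measurement was coded "exact"
    n_reporting -- sites that reported the indicator at all
    n_sites     -- total number of pilot sites
    """
    if not 0 <= n_exact <= n_reporting <= n_sites or n_sites == 0:
        raise ValueError("expected 0 <= n_exact <= n_reporting <= n_sites, n_sites > 0")
    among_all = 100.0 * n_exact / n_sites
    # The rate among reporting sites is undefined when no site reported.
    among_reporting = 100.0 * n_exact / n_reporting if n_reporting else None
    return among_all, among_reporting


# Accessibility of public transportation stops, definition A:
# nine sites reported and all nine used the exact definition.
print(fidelity_rates(n_exact=9, n_reporting=9))  # (60.0, 100.0)
```

The gap between the two figures reflects sites that did not report the indicator at all, which lower the rate among all sites but are excluded from the rate among reporting sites.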

Table 3 Fidelity of core indicator measurement

On average, fidelity was highest for the physical environment indicators (32.7% among all sites, 51.4% among reporting sites), followed by the social environment indicators (20.4% among all sites, 37.3% among reporting sites), and the impact indicator (i.e. quality of life) (6.7% among all sites, 10% among reporting sites). The average fidelity rate among all sites was higher for definitions A (28.6%) than for definitions B (19.0%) of the substantive indicators. The difference in fidelity comparing the use of definition A (40.8%) and definition B (39.9%) for the substantive indicators was minimal when examining fidelity among reporting sites.

In general, the fidelity rate for the substantive indicators was low with an average of 23.8% across all sites. Only two indicator measures had a fidelity rate of greater than 50% among all sites. Accessibility of public transportation stops (definition A) had the highest fidelity rate of 60% among all sites, with nine communities using the exact definition given in the guide and the remaining six not measuring the indicator. Accessibility of public transportation vehicles (definition A) had the second highest fidelity rate of 53.3% among all sites and the fewest communities that did not measure the indicator at all. The fidelity rates calculated only among reporting sites were substantially higher, as would be expected, but still averaged only 45.2%. Accessibility of public transportation stops (definition A) had the highest fidelity rate of 100% among reporting sites, and both availability of information (definition A) and engagement in volunteer activity (definition A) had the second highest fidelity rate of about 60% among reporting sites.

The lowest fidelity rate of 0% among all sites was observed for three indicators (all using definition A): engagement in socio-cultural activity; availability of health and social services; and quality of life. While all three had the same fidelity rate of 0%, nine of the communities still measured engagement in socio-cultural activity using a modified definition or an alternative indicator, whereas for the other two, only one or two communities were able to measure them at all.

In some cases the low fidelity was due to the lack of relevant data, and thus, the communities’ inability to measure the indicator at all. For example, Definition A for engagement in socio-cultural activity required information on visitors to local cultural facilities and events, broken down by age group, which most communities did not have. Such visitor information was either not routinely collected or accessible, was collected only for a few specific events, or, even when collected, could not be disaggregated by age group. Definition A for availability of health and social services required that the denominator only include the older people with personal care or assistance needs so that gaps in service availability (i.e. proportion of people needing services but not receiving them) can be determined. In most cases, however, the communities only had data on the number or proportion of people receiving services in the total population of older people. This provides information on service utilization, but not about the extent to which the services are reaching those who need them. Definition A for quality of life required data on healthy life expectancy for the community. Only two communities were able to provide these data using a modified definition: Jing’an District in Shanghai calculated healthy life expectancy at age 60 instead of at birth, and Udine had these data for the province but not for the city. The others only had data for life expectancy (and not healthy life expectancy) at birth, and usually for a much broader geographic area like the state or country level. Given that the core indicator concept being measured was quality of life and not merely longevity, life expectancy was not considered an acceptable proxy.

In most other cases, low fidelity resulted because the communities measured the indicator using a slightly modified definition or an alternative measure. These alterations were motivated by the absence of data and the desire to find an appropriate proxy, or the result of a deliberate effort to adapt the definition or replace the indicator so that the measure would be more meaningful and appropriate for their community.

Despite the communities’ strong interest in assessing equity, practical limitations with the guidance document, data and technical capacity prevented many of the sites from calculating the equity indicators. Several communities found that the guide inadequately explained how to calculate the equity indicators; the data that were available could not be disaggregated into subgroups of interest; and they did not have the technical skills to perform the required calculations. Just over half of the communities were able to compute the inequality between two reference groups based on a ratio of the respective indicator values. The most frequently compared subgroups were those defined by gender, income, geographic area and age. Only four communities were able to compute the other three measures of equity using the equations in the guide. However, when they were calculated, it was with high fidelity (80.0–100.0% fidelity among reporting sites).
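The pairwise comparison described above can be sketched as a simple ratio of indicator values. This is an illustrative sketch under stated assumptions: the function name, the choice of reference group and the example values are hypothetical, and the exact equations in the guide are not reproduced here.

```python
def inequality_ratio(value_group: float, value_reference: float) -> float:
    """Ratio of an indicator value in one subgroup to that in a reference subgroup.

    A ratio of 1.0 indicates parity between the two groups; values further
    from 1.0 indicate a larger relative difference. (Assumed form.)
    """
    if value_reference == 0:
        raise ValueError("reference group value must be non-zero")
    return value_group / value_reference


# Hypothetical indicator values for two subgroups, e.g. the proportion of
# older people reporting a given outcome in each group.
ratio = inequality_ratio(0.30, 0.40)
print(round(ratio, 2))  # 0.75
```

Such a ratio only requires the indicator to be available for two subgroups, which may help explain why it was the equity measure most often reported.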

3.3 Problems Encountered

The problems experienced by the pilot sites during the implementation process related to three main issues: adaptation of the indicator definitions, data collection and indicator calculation. Several of the operational definitions required some degree of interpretation and adaptation by the community. They were designed assuming that such flexibility would be necessary to make the indicators measurable in widely diverse contexts. While most of the communities fully appreciated this flexibility, some struggled to interpret and adapt the definitions to their own context and expressed a desire for more rigidity in the definitions.

The greatest challenge for the communities was data collection. Relevant data for the indicators were often not easily identifiable or not within immediate access, and therefore considerable time and effort had to be spent on research and obtaining access. All communities experienced varying degrees of difficulty with data collection, regardless of whether they were members of the Global Network of Age-friendly Cities and Communities or not. Some of the data sought by the sites were proprietary to the data producer, and securing the collaboration of some producers proved to be elusive in the time allotted. While it is not possible to discern whether additional time may have improved their success in securing data, it was frequently the perception of the pilot sites that time limited their success rate. Even when data could be secured from external data producers, the sites experienced some difficulty in producing the desired results because the data were not available for the time period, population or geographic location of interest.

Many of the communities also struggled with calculating the indicators, especially the equity indicators. The constraining factors varied. Some sites perceived that they did not have enough time, some lacked technical capacity or struggled with the instructions provided in the guide, while others lacked appropriate data (for example, disaggregated data for the specific subgroups needed to calculate the equity indicators). Perhaps most importantly, however, the sites uniformly stressed the importance of the equity analysis for both the implementers and their constituents.

3.4 Secondary Benefits Generated from the Process

The communities demonstrated that there were beneficial outcomes of the pilot study beyond the main goal of generating local data for the core indicators. These related to strengthening local partnerships, improving local data quality and enhancing the inclusion and agency of older persons in the community.

The process of conducting the indicator assessment created significant opportunities for the pilot sites to engage other governmental agencies and non-governmental entities as partners. In this way, the successful implementation of a collaborative process was an important outcome in and of itself. Most of the sites had to seek proprietary data from other governmental bodies or organizations. At the lowest level of engagement, data producers were apprised of the importance of these data for the health and wellbeing of older adults. At the highest level of engagement, external data producers were enlisted as project collaborators and co-publishers of the assessment results. At all levels, the external engagement engendered by the indicator study was reported to have broadened ownership of the initiative, as well as created working partnerships for future collaboration. These outcomes were equally valued by communities that already have an ongoing age-friendly initiative and those that have yet to launch one.

The process of collecting, analysing and validating the data often shed light on the value of having these data for the betterment of the community. It also demonstrated the need to improve the local availability, accessibility and quality of data. This led to efforts to raise awareness about the problems with existing data, or the lack thereof, and to mobilize resources to collect more and better data. While the limitations of data produced by other agencies cannot be retroactively resolved, the sites that built partnerships with other agencies may be able to exert some influence on data collection moving forward to improve their usefulness.

For all of the sites, a necessary part of the piloting process involved the engagement of local communities of older adults. Engagement took many forms. In some cases, town hall meetings or focus groups were organized to gather inputs from the community in advance. In addition to being a mechanism for obtaining their inputs, these gatherings also signalled that their inputs were essential to the process. The results were later taken back to the community for joint review. In other cases, representatives of the older adult community were asked to join the leadership of the project. Where older adults were engaged in the pilot studies, it gave them agency over their health and wellbeing indirectly through their influence on how the indicators would be measured, interpreted and acted upon by the community to shape the environment in which they live.

4 Discussion

This study evaluated the process by which communities in widely different contexts implemented the measurement of core indicators of community age-friendliness as described in a new guidance document produced by the WHO. The results showed that the tool has practical utility for communities and local governments, and can play an important role in improving the quality and quantity of evidence on age-friendly initiatives and their impact on the communities they serve.

Specifically, the results showed that the flexibility and adaptability of the indicator guide enabled the majority of pilot sites to report data for most indicators, if not all. When given the choice, the communities tended to use government administrative data rather than special-purpose surveys as their data source. This may be because government data were more readily available, there were time and resource constraints on their ability to conduct new surveys, and there was a perception that official data reported by relevant government agencies were more objective than survey data. However, in some contexts, especially in lower resource settings, official government data were scarce, non-existent or of poor quality. Furthermore, the communities emphasized that the perceptions and opinions of community residents were as important as the official reports from government agencies, whether they were obtained through representative surveys or group dialogue. This is further supported by research showing that municipal officials consistently overestimate community age-friendliness relative to residents’ perceptions (Menec et al. 2016).

The engagement of older residents in the process was viewed as a valuable way to both validate the data and to build ownership and agency among older people. The struggles to obtain relevant data by reaching out to various government and community agencies also strengthened collaboration and sense of ownership within the communities. Thus, the tool can play a role in promoting participatory collaborative governance and inclusiveness of older adults, which are consistently emphasized in the literature on creating caring and supportive age-friendly environments for older people (Buffel et al. 2012a; Fitzgerald and Caro 2014; Green et al. 2015; Greenfield et al. 2015; Lui et al. 2009; Scharlach and Lehning 2013). The potential for such an outcome and its significance for practice have been posited in previous research (Corburn and Cohen 2012). Broad ownership of an age-friendly initiative and the indicators can contribute to effective advocacy and to the sustainability of the work in changing political environments (Farrer et al. 2015).

In general, however, the fidelity of indicator measurements tended to be low, indicating that the communities often used indicator definitions that differed from those in the guide. This finding was expected, given that the definitions were intentionally designed to allow some flexibility for the communities to measure the indicators in ways that would be most relevant to their context. Nonetheless, some cases of very low fidelity, such as the indicators for availability of health and social services and quality of life, where the communities were unable to find suitable modifications or substitutions, are problematic. These results indicate the particular need for better metrics and data on availability and coverage of home- and community-based health and social services for those who need them (DeJonge et al. 2009; Tappenden et al. 2012), and on social indicators of community quality of life (Diener and Suh 1997; Harrell et al. 2014; Swain and Hollar 2003).

The fact that fidelity was lower for the social environment indicators signals at least two things. First, relevant data for these indicators are not readily available. The availability of data on the social environment was more variable across the communities, and such data were often not routinely collected by administrative agencies, perhaps because they are less describable in objective terms. Secondly, these indicators are relatively more context-dependent than the physical environment indicators. While accessibility criteria for physical infrastructure such as sidewalks, public spaces, buildings and transportation facilities are becoming increasingly standardized nationally or internationally, and adopted by governments, the desirable features of the social environment are less generalizable. For instance, sociocultural context strongly influences what would be considered a positive social attitude toward older people, the forms of engagement in local decision-making that are available to older people, or the kinds of health and social services that are needed or provided to older people. Further research to develop the concept and metrics for relevant social environmental factors will help promote more standardized, comparable measurements, as well as enable communities to measure these indicators in ways that are responsive to their local context. The rapidly advancing research on the measurement of happiness and subjective wellbeing, and the increasing use of such indicators by governments and civil society to inform policy-making decisions (Dodge et al. 2012; Helliwell et al. 2017; Kahneman and Krueger 2006; OECD 2013) provide fertile ground for improving social metrics for age-friendly initiatives.

The lack of measurement of the equity indicators is concerning, especially given the importance of equity as a social value and its relevance to social justice (Braveman et al. 2011), but also because of the strong interest expressed by the communities to assess equity. While almost half of the communities were able to compare indicator values between two subgroups, this was not always possible for the indicator or subgroups of most interest to the community due to data limitations. Also, pairwise comparisons of subgroups provide only a partial picture of the inequities affecting the population. Ideally, equity indicators that use data from the total population or across all population subgroups should also be measured. The advancement of research and practice to improve equity depends largely on the ability to accurately measure it. Some of the practical challenges that hampered the communities’ ability to calculate the equity indicators can be overcome by ensuring that both routine administrative data and survey data can be disaggregated into meaningful social strata; by developing the capacity of local government agencies to conduct equity analysis; and by developing tools and resources that can ease the calculation of equity indicators.

Nearly all of the pilot sites adapted some proportion of the indicator definitions to ensure alignment with local needs, priorities and the availability of data. In most cases, the definitions were interpreted as guidance, rather than prescriptive instructions, which is consistent with the WHO’s intentions in developing this tool. Standardization and comparability are often hallmarks of global indicators, such as those established by international organizations like the WHO. Standardization of the indicators, including reference values, can help establish a common set of goals and targets. It can promote development and healthy competition by enabling comparisons, benchmarking and target-setting among cities in widely different contexts. The experiences of the pilot sites, however, demonstrated that the collection and monitoring of these indicators are most meaningful when they allow a locality to set its own benchmarks and compare results over time for self-improvement. Internally comparable data, including the ability to compare across smaller geographic areas within municipalities, such as neighbourhoods, were regarded as more valuable than comparisons to international benchmarks. This also makes it possible to identify subpopulations or areas that are doing better than the rest of the city, which is critical for assessing equity and establishing attainable goals for the city.

The pilot sites’ experiences in adapting the indicators support two important principles for developing globally useful indicators. First, in developing core indicators for use in diverse local contexts, it is important to keep them flexible, as more static, universal sets of indicators may be less useful (Corburn and Cohen 2012; Rothenberg et al. 2015). The flexibility to adapt to, and to be held accountable to, local needs and priorities should be more important than competing with cities that may not even be appropriate for comparison. Secondly, designing the core indicators to be locally adaptable can potentially facilitate more meaningful equity analysis (Corburn and Cohen 2012; Rothenberg et al. 2015). Health inequities are by definition the products of the societies and geographies in which people live, and are therefore highly context-driven. It is therefore an essential quality of these core indicators that they adequately allow for adaptation. Importantly, however, adaptation should be driven by a pursuit of relevance and alignment with local priorities and not by a pursuit of convenience.

One important limitation of this study was the short duration of the pilot study. This constrained the ability of communities to access and analyse data in time. The modest funding for this study was expected to be an additional constraint for data collection, but this was not reported to be the case, perhaps due to the emphasis on utilizing existing data. The time constraint and reliance on existing data may partly explain the low fidelity of indicator measurements observed in this study. Nonetheless, for most of the indicators (22 out of 28 substantive indicators), over half of the pilot sites reported data by adapting the indicator definition or by using proxy indicators. While low fidelity in measurement signals a problem for comparability of the indicators, it can also mean that the indicators are being adapted to be more relevant and appropriate for the local context. As the body of research in this field continues to mature, it can be expected that the core indicators will also evolve. New evidence may demand the addition, replacement or refinement of indicators.

It has been observed in this study and in the literature that city-level data on health and its determinants are often scanty in general, let alone for a subpopulation of the city which may not be a priority for city decision makers. Where city-level data are available, it is too often not possible to disaggregate them by social stratifiers long known to be important determinants of health equity. Even when these data are available, they are often proprietary and scattered. Increasing the quantity and quality of data on health and its determinants, particularly for older adults, should be one of the priorities in making cities and communities more age-friendly, alongside actions to address the real and immediate needs of older people. It is a necessary precursor for monitoring and evaluation to improve programme performance. It is also critical to the research needed to further refine the core indicators and to promote policy and programmes that are caring and supportive of older adults. This study demonstrated the promising ability of communities in vastly different contexts to gather locally relevant data with some guidance, and in the process of doing so, to also strengthen the engagement of their stakeholders, including older adults. In the current economic and political zeitgeist, policy decisions can be skewed toward those with greater agency and representation (Farrer et al. 2015). Developing local capacity to produce and analyse data for a core set of indicators on health determinants and equity in collaboration with older residents will be an important way of bringing agency to the older adult community, and ultimately improving their health and quality of life.