
Chapter 1 made the case for non-profits building their data capability as part of enabling their work for social good. This chapter jumps straight into the reality of how organisations start to work with different types of datasets and learn about working with data. We present three case studies of our own research working with different non-profit (and other) organisations and different internal re-used datasets, as well as open public datasets. Each case study features collaborative data action and—we argue—results in steps towards data capability. We jump straight to the projects here because this is really what happened in our work. We took our skillsets from our different research backgrounds—approximately data science, communications and community development—and looked at how we could partner with organisations to address their real challenges. As well as having a problem to solve, each partner organisation we worked with also had a curiosity to find out about whether data science could help. In our first case study, we worked with government departments and agencies to understand the public conversation on family violence and the impact of policy. For the second, we partnered with three non-profits looking to solve social problems with data. Our final case study is a collaboration with several community organisations and a bank in a regional city. The case studies illustrate the evolution of our work with data over 2017–2021, and how we came to arrive at collaborative data action as a methodology as it was trialled and refined over a series of studies. There are hints about what building data capability involves in each case study, but we only started to build in processes of evaluation as our studies progressed. Hence, the case studies have slightly different formats. And only over this evolution of cases and other data projects have we arrived at our understanding of data capability. This is explored in Chap. 3.

We suggest the case studies show how data projects that involve social mission-driven organisations benefit from combining multiple skills and perspectives. This is because applying data science in domains of social action is complex. It benefits from knowledge of relevant evidence, from acknowledging that ideology and values are always present, and above all from practitioner expertise: experience of working in context highlights what is significant and how to address it. Our case studies are light on the techniques of ‘big data’ science because this is not a book on how to do data analytics technically. That is covered in other texts (e.g., Aragon et al., 2022). In this chapter, we focus more on what we did from an operational, indeed co-operational, standpoint. We expand on what that means (the implications and how to build data capability) in Chaps. 3 and 4. Case study projects 2 and 3 took place during 2020–2021, in the COVID-19 pandemic, when extended lockdowns meant a lack of face-to-face engagement. The case studies are as follows:

The project featured in Case Study 1 involved re-using data for insights into the public conversation about family violence following implementation of new state family violence policy. Working mainly with a government department concerned with family violence policy, but also in consultations with non-profit stakeholders, the case study addresses how to gain information about social outcomes by re-using qualitative datasets generated via social media and public consultation. It thus exemplifies some of the kinds of datasets, analyses and visualisations that non-profits could use when looking for novel data to inform outcomes evaluation.

The project in Case Study 2 involved working with three non-profits of different sizes. They partnered to learn if and how they could use internal already-generated data to create added value, particularly around showing their organisations’ direct and wider social impacts and, on the other hand, to improve organisational effectiveness.

Case Study 3 illustrates how seven organisations, including non-profits and a bank, worked together to find out if and how they could use their internal data, plus open data, to find out more about their community. They brought data together to generate geospatially visualised data layers describing community resilience, including layers about social connection, financial wellbeing, homelessness and housing, and demand for social services. The case highlights some of the potential and challenges in sharing data amongst organisations.

Table 2.1 summarises the case studies including an overview of the topic and nature of the collaboration, datasets used, analyses and visualisations and key learnings.

Table 2.1 Summary data projects case comparison

At the end of this chapter, we compare some aspects across the cases, mainly considering what was learned as this informs the themes about building capability and collaboration that are extended in Chap. 3.

Case Study 1: Outcomes of Family Violence Policy—A Public Sector Collaboration

Project Goal

Explore the value of novel datasets to inform the State Government of Victoria, Australia, about changes to the public conversation after it introduced new policies to address family violence.

Project Description

The Victorian Government produced new family violence prevention policies in 2017 in response to a Royal Commission investigation (2015–2016). Alongside recommendations for public and community sector reform, the government produced a framework of outcome indicators. These tended to reflect aspirations for change and were considered difficult to measure, particularly those related to improved awareness, understanding and attitudes about family violence in the community. Some of the outcomes were complicated to assess; for example, while the policy sought a “reduction in all family violence behaviours” (State Government of Victoria, n.d., p. 6), family violence incident reporting rose, possibly because people were more comfortable with coming forward and were supported to do so with better services. Simply measuring changes in crime statistics, therefore, gave potentially misleading results.

We worked with government and government agency partners to target outcomes relating to changes in public discussion. We assessed changes by analysing: (a) the public consultation submissions that informed the new policy (to establish a baseline of core family violence issues) collected in 2015 and (b) public discussion through social media data (Twitter) and news media reporting to understand how the public conversation changed in response to public policy during 2014–2018.

Collaborating Partners

The project was instigated by the Victorian Department of Premier and Cabinet (DPC). The DPC leads the whole of Victoria state government policy and performance, coordinating activities to help the government achieve its strategic objectives.

Other partners that collaborated on this project were:

  • Women Victoria, a state government department promoting gender equality and women’s leadership.

  • Respect Victoria, an agency funded by but independent of state government, dedicated to the primary prevention of all forms of family violence and violence against women.

  • Family Violence Branch at the Department of Premier and Cabinet, Victorian Government.

  • Family Safety Victoria, the Victorian Government agency leading the implementation of family violence reforms.

  • Business Insights Unit at the Department of Premier and Cabinet, Victorian Government.

  • Social Data Analytics Lab at Swinburne University of Technology.

How the Project Began

The project started with discussions with the DPC in mid-2018 about the feasibility of re-using external data sources to inform outcomes. This was an exploratory project and, as a first step, our DPC partners spent several months identifying a suitable topic and group of stakeholders. Criteria for selection were as follows: that it should be a non-controversial topic area; there should be pre-existing good relationships between relevant agencies and departments; and stakeholders were open to novel data analytics. The DPC had its own Business Insights Unit that analysed data, so these staff were involved with the aim of complementing, not replicating, the work they were already doing. Initial workshops were held involving our multi-disciplinary university researcher team and partner staff, and this led to identifying data sources and likely useful types of analysis.

Summary of Datasets Used

Data sources (see Table 2.2) were selected to provide insights into public discussions about family violence over the five-year study period, allowing comparisons year by year.

Table 2.2 Data sources for public discussion of family violence

Methods

Discussion Workshops

A steering group with representatives of project partners met six times during the project. Early workshops established questions to pursue in the data analysis and identified a timeline of policy events from 2014. As data was analysed—and explored through subsequent workshops—the group gave feedback on findings and input to aid further analysis. Through these workshops, a collaborative analysis strategy was developed.

Data Analysis

Data analysis techniques were chosen to fit datasets and project goals. To discover semantic patterns within the large bodies of text data from the three datasets, natural language processing (NLP) was used to augment qualitative content and thematic analysis. This involved word frequency and clustering analysis, using the Pearson correlation coefficient (Pearson’s r), and the topic modelling method Latent Dirichlet Allocation (LDA). The approach to analysis is informed by established theory in policy analysis, frame analysis and socio-linguistics that addresses the formation of public social issues and understands the role of language and communication in ‘framing’ or shaping and contesting the parameters of those issues.
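To make the word frequency and correlation step concrete, the following is a minimal sketch and not the project's own code. It counts words in two texts and computes Pearson's r between their word-count vectors, which is one simple way to score how similar two documents are before clustering them. The function names and the tokenisation rule are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case a text and count its word tokens (simple illustrative tokeniser)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def pearson_r(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # r is undefined for a constant vector; treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def document_similarity(doc_a, doc_b):
    """Correlate the word counts of two documents over their shared vocabulary."""
    counts_a, counts_b = word_counts(doc_a), word_counts(doc_b)
    vocab = sorted(set(counts_a) | set(counts_b))
    return pearson_r([counts_a[w] for w in vocab], [counts_b[w] for w in vocab])
```

Pairwise scores of this kind can be arranged in a matrix and passed to a clustering routine; a real analysis would normally also remove stop words and normalise for document length.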

A timeline analysis of the Twitter dataset identified peaks in discussion across the five-year timeframe and matched these with known policy or public events. Named entity recognition was also used to identify key individuals and organisations and their prominence at different times.
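A timeline analysis of this kind can be sketched as follows. This is an illustration under stated assumptions, not the method used in the study: a week counts as a peak when it is a local maximum and lies more than a chosen number of standard deviations above the mean weekly volume, and a peak is matched to any known event within one week of it. The threshold, the matching window and the function names are all hypothetical.

```python
from statistics import mean, stdev


def find_peaks(weekly_counts, threshold_sd=2.0):
    """Return indices of weeks that are local maxima above mean + threshold_sd * sd."""
    cutoff = mean(weekly_counts) + threshold_sd * stdev(weekly_counts)
    last = len(weekly_counts) - 1
    peaks = []
    for i, count in enumerate(weekly_counts):
        left = weekly_counts[i - 1] if i > 0 else float("-inf")
        right = weekly_counts[i + 1] if i < last else float("-inf")
        if count > cutoff and count >= left and count >= right:
            peaks.append(i)
    return peaks


def match_events(peaks, events, window=1):
    """Map each peak week to the labels of events within `window` weeks of it.

    `events` is a dict of week index -> event label (hypothetical input format).
    """
    return {
        peak: [label for week, label in sorted(events.items()) if abs(week - peak) <= window]
        for peak in peaks
    }
```

Peaks with no matching event are as informative as matched ones, since they point to public events or controversies that were not on the policy timeline.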

Submissions to the Royal Commission Public Inquiry (2015)

The sample of public submissions was analysed using word frequency and thematic clustering, as well as qualitative content analysis to establish a baseline of the key policy dimensions framing family violence. The submissions were taken as a proxy for the attitudes and topics discussed by an informed public—that is, the diverse individuals, community sector and services, government and research voices, who have experiences of family violence or work with victim survivors or perpetrators.

Twitter Corpus (January 2014–December 2018)

To identify topics in the Twitter dataset over the target timeframe, a sampling strategy was used, generating a maximum of 500 tweets per week. To inform the timeline analysis, this sample was supplemented by querying the Twitter counts endpoint, which returns the total tweet count at each timepoint. This allowed quantification of tweets beyond the 500-per-week sample.
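The capped weekly sample can be illustrated with a short sketch. It assumes the tweets are already held locally as (date, text) pairs, which is not how the study obtained them, and it groups by ISO week as one possible definition of a week. The returned totals stand in for the per-week volumes that the study took from the counts endpoint.

```python
import random
from collections import defaultdict
from datetime import date


def sample_per_week(tweets, max_per_week=500, seed=0):
    """Return (sample, totals) for an iterable of (date, text) pairs.

    sample: at most `max_per_week` tweets drawn at random from each ISO week.
    totals: dict of (iso_year, iso_week) -> number of tweets seen that week.
    """
    by_week = defaultdict(list)
    for posted_on, text in tweets:
        iso = posted_on.isocalendar()
        by_week[(iso[0], iso[1])].append((posted_on, text))

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample, totals = [], {}
    for week in sorted(by_week):
        items = by_week[week]
        totals[week] = len(items)
        if len(items) > max_per_week:
            items = rng.sample(items, max_per_week)
        sample.extend(items)
    return sample, totals
```

Keeping the totals alongside the sample means that topic analysis can run on a manageable corpus while the timeline still reflects the full volume of discussion.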

LDA topic modelling was applied to Twitter posts for each year. Since LDA is an unsupervised learning model, there is no ground truth on the number of topics, and therefore it is the researcher’s responsibility to validate the appropriate number of topic clusters. For our study, the number of topics identified for each year was established by model parameter checks. The topic modelling process established a range of topic options, and these were reviewed by the researchers on the team to identify the most coherent and distinct topics, with the number of topics varying each year.
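The source does not specify which parameter checks were used. One commonly used check, shown here purely as an assumed example, is a topic coherence score: for each candidate number of topics, the top words of every topic are scored by how often they co-occur in the same documents, and the candidates are compared before human review. The sketch below implements the UMass coherence measure for a single topic; the input format (tokenised documents and a ranked list of top words) is an assumption.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: the topic's top words, ordered from most to least probable.
    documents: tokenised documents (each a list of words).
    Higher (closer to zero) means the words co-occur more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        earlier, later = top_words[i], top_words[j]
        base = doc_freq(earlier)
        if base == 0:
            continue  # word absent from the corpus; skip rather than divide by zero
        score += math.log((doc_freq(earlier, later) + 1) / base)
    return score


def mean_coherence(topics, documents):
    """Average coherence across the topics of one candidate model."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)
```

A score like this only narrows the options. As the text notes, deciding which topics are coherent and distinct still rests on the researchers' review.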

News Media Corpus (January 2014–December 2018)

The meta-data captured via the API for each article included the source name (media outlet), time and date of the article. We cleaned the media dataset by scraping the body of the articles from provided links. Stories with invalid URL links and duplicate stories published in more than one outlet were removed, retaining the first published article. LDA topic modelling was applied to the news media corpus, and a hand-annotated topic descriptor was associated with each cluster.
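The cleaning rule described above (drop stories whose link could not be scraped, and keep only the first published copy of a duplicated story) can be sketched as below. The record layout and the way duplicates are detected, by comparing whitespace-normalised, lower-cased body text, are assumptions for illustration; the source does not say how duplicates were identified.

```python
from datetime import datetime


def clean_articles(articles):
    """Remove unscraped stories and duplicates, retaining the first published copy.

    articles: list of dicts with 'source', 'published' (datetime) and 'body'
    keys, where 'body' is None or empty if the link was invalid.
    """
    first_seen = {}
    for article in sorted(articles, key=lambda a: a["published"]):
        body = article.get("body")
        if not body:
            continue  # invalid URL: nothing was scraped for this story
        key = " ".join(body.lower().split())  # normalise case and whitespace
        first_seen.setdefault(key, article)  # keep the earliest copy only
    return list(first_seen.values())
```

Exact matching of normalised text will miss syndicated stories that were lightly edited by each outlet, so a fuller pipeline might compare near-duplicates instead.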

With all the datasets, reliability of machine analysis was checked by manual qualitative coding of samples of data items (tweets, stories and public submissions) and inter-coder reliability checks involving four people independently coding samples. The team checked emergent topics against the outcomes framework we were seeking to inform, existing research evidence and the Royal Commission reports.
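The source does not name the agreement statistic used for the inter-coder checks. Where several people code the same items independently, Fleiss' kappa is one standard choice, and the sketch below shows how it could be computed. It assumes every item was coded by the same number of coders, and the category labels in the usage note are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items each coded by the same number of coders.

    ratings: list of items, each a list of category labels (one per coder).
    Returns 1.0 for perfect agreement; values near 0 indicate chance-level agreement.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: proportion of agreeing coder pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items

    # Expected agreement from the overall share of each category.
    shares = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    expected = sum(share ** 2 for share in shares)

    if expected == 1:
        return 1.0  # only one category was ever used
    return (observed - expected) / (1 - expected)
```

For example, `fleiss_kappa([["victims"] * 4, ["systems"] * 4])` returns 1.0, because all four coders agree on both items.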

Findings

We reported a range of findings that helped identify the longer-term changes in the way family violence was discussed and were able to estimate the main effects of the Royal Commission and subsequent policy initiatives. These changes, observable through the different public discourse datasets (news, Twitter, public inquiry submissions), were mapped against the government’s official outcome indicators. A number of diagrams and chart types were chosen to present the most salient findings. These choices matter, and working with large corpus natural language or text datasets meant that innovative techniques had to be used to convey findings concisely and dynamically.

A tree diagram was used to visualise five core thematic dimensions of family violence identified through analysis of the Royal Commission public submissions and policy reports, which were victims, perpetrators, causes and contexts, systems, and solutions. These dimensions served as a baseline and were used to compare changes to the public conversation thereafter.

Two standard graphs were used to quantify public discussion of family violence, and show change over time, against the five Royal Commission dimensions. This revealed alignment and divergence between public discourse and policy frameworks.

Two ribbon graphs (see Fig. 2.1) were used to represent and quantify the change in news media and Twitter topics, between 2014 and 2018, and the continuity and discontinuity of those topics. We drew out insights from this analysis. For example, in Twitter data, victim survivors and perpetrators are discussed more directly and pointedly, and victim survivors voice their own experiences, to a far greater extent than in news media and policy reports and inquiry submissions. At a high level, we showed that the public conversation changed in relation to the 2015 hearings of the Royal Commission and policy framing. Unlike Twitter, which consistently followed the hearings and amplified the issues they raised, news media reporting was much slower to change or respond to the Royal Commission. The news coverage only took off with the rise of the #MeToo movement in late 2017.

Fig. 2.1
A time series graph plots the volume of tweets from various categories from 2014 to 2018. Categories of the tweets are men's actions, culture, systems, law reform, advocacy, experiences, and policing.

Topic modelling analysis of Twitter topics related to family violence 2014–2018. Note: Ribbon graph adapted from data in “Community responses to family violence: Charting policy outcomes using novel data sources, text mining and topic modelling” by A. McCosker, J. Farmer, and A. Soltani Panah, 2020, Swinburne University of Technology, p. 24, https://apo.org.au/sites/default/files/resource-files/2020-03/apo-nid278041.pdf. (Copyright 2020 by Swinburne University of Technology. Adapted with permission)

A Twitter timeline graph identified key public events against peaks and troughs in Twitter activity (Fig. 2.2). This helped to discover when there was attention to key policy events and other influential public actions and controversies.

Fig. 2.2
A time series graph plots the volume of tweets over 12 months for the years 2015 and 2016. Peak points mark the events that address family violence.

Timeline and peaks of Twitter activity addressing family violence by year (2015 and 2016 represented). Note: Twitter timeline analysis graph adapted from data in “Community responses to family violence: Charting policy outcomes using novel data sources, text mining and topic modelling” by A. McCosker, J. Farmer, and A. Soltani Panah, 2020, Swinburne University of Technology, p. 29, https://apo.org.au/sites/default/files/resource-files/2020-03/apo-nid278041.pdf. (Copyright 2020 by Swinburne University of Technology. Adapted with permission)

Bubble charts were also used, drawing on named entity analysis, which quantifies mentions of people or organisations in the data. This showed the relationship between Twitter and news media items by key topic area and influential people and organisations. These changed over time. Through the named entity analysis, we identified key players in the public debates surrounding family violence over the target period. This included politicians, advocates and activists, as well as news organisations.

Outcomes and Lessons Learned

The data analysis gave fresh insights relating to how family violence was discussed and changes over time post-policy change. It showed the DPC that there were datasets that could inform their outcomes about public attitude and public discussion changes. Where they had previously relied on community surveys that tend to feature limited demographics in response, by re-using other datasets they could access a wider range of attitudes and language. Analysis raised new issues that they had not thought about previously, such as what topics were featured in policy compared with public concerns. For example, there was limited and abstract discussion of perpetrators, but as time passed, there was more nuanced discussion on Twitter about men as perpetrators and social and structural factors influencing family violence. That the news media continued sensationalising tropes about violence showed that government still needed to do more to influence news media reporting. They found out that the public uses different and diverse words (compared to policy) to depict and discuss forms of family violence, particularly using the term ‘abuse’. An evolving timeline of public responses highlighted that policy events influenced volume and duration of peaks in Twitter discussion more than some very serious crime events. Analyses also highlighted how particular people and organisations influence the conversation in different directions. Together, the analyses gave a much more nuanced perspective about how the public responds to policy that could inform useful changes to policy over time.

The project featured collaborative research around evaluating outcomes in relation to a significant social policy issue with government departments and arm’s-length agencies. As such, it showed that through collaborating to bring multiple knowledges and skills to the table, existing data could be re-used to find evidence, rather than collecting new data. We introduced new types of data and analytical methods and showed how partners’ current social media analysis could be refined and extended.

The work led to our research team developing ongoing relationships with the departments and agencies. Specifically, it also led to a presentation at a key government knowledge transfer event and to newly funded research about accessing, integrating and analysing the government’s longitudinal datasets on family violence.

Re-using data and using novel data analytics techniques is challenging, and in large, traditional, bureaucratic organisations requires determined champions to drive experimentation and change. While we were fortunate to work with a series of senior advocates within government, the project was hampered by multiple senior staff changes throughout the study period, affecting continuity, support and understanding of the work.

The collaborative processes we used may appear time-intensive, but they offer substantial methodological benefits from bringing in different expertise, perspectives and questions and achieve direct impact in influencing knowledge and awareness about data amongst those that participate. Potentially, these representatives are inspired to return to their departments and agencies and be more confident about advocating for using data and growing skills in data use.

For further information about the project see McCosker et al. (2020).

Case Study 2: Re-using Operational Data with Three Non-Profits

Project Goal

Explore the relevance and feasibility of data analytics for non-profits through deploying a collaborative data action methodology.

Project Description

Australian non-profits are aware of the rise of the data analytics movement, but many lack the capability and resources that would allow them to fully utilise their data via analytics. The three non-profit partners in this project provide services for different target groups and have different existing requirements to use data—including to report to external funders and government regulators. Each has gathered a set of datasets over a number of years in relation to their work.

We facilitated a series of iterative workshops with staff to identify their organisational ‘pain points’ (i.e., problems and questions), understand their datasets and determine if and how data analytics could be used to provide new insights that could guide future strategies. We also developed a series of educational webinars about working with data, including information on relevant laws, local policies, technological tools and open data portals. Non-profits’ staff were interviewed at the beginning of the project to assess aspects of their existing organisational data capability and their hopes and expectations. Interviews were repeated at the end of the project to discover benefits and reflect on learning and challenges.

The project ran from 2020 to early 2021. While originally we envisaged multiple face-to-face meetings and training sessions, ultimately all sessions were conducted online. Both non-profits’ staff and researchers spent several months in lockdown due to the COVID-19 pandemic and dealt with multiple operational challenges while they participated in the project.

Collaborating Partners

The project was funded by the Lord Mayor’s Charitable Foundation (LMCF) (a philanthropic foundation based in Melbourne), the non-profit organisations that participated, and a small grant from our university. The non-profit partners were:

  1. Yooralla, an organisation providing services for people with disabilities in their homes and the community.

  2. Good Cycles, a social enterprise that provides supported employment for young people who might otherwise have difficulty accessing jobs and training due to social and economic disadvantage. Good Cycles engages young people in work experience including in operating retail bicycle shops, mobile car share cleaning, bike share, and parcel deliveries and logistics—all using cargo bikes instead of cars or trucks. In addition to providing training and employment, the organisation promotes urban sustainability.

  3. Entertainment Assist, a charity that raises awareness about mental health and wellbeing in workplaces and for employees in the Australian entertainment industry. Entertainment Assist offers a mental health training programme (Intermission) for staff and employers.

How the Project Began

Leaders at the LMCF partnered with our team because they were interested in exploring how working with a university data lab to find, examine, analyse and visualise data could build new capabilities in understanding and using data.

Once initial partial funding from LMCF was secured, the next step was to identify and attract three or four non-profits that would also co-fund their participation. Establishing agreement from the non-profits to participate sometimes took several conversations over two to three months, involving researchers, non-profit managers and staff. The researchers shared examples from past data projects and from initiatives like The GovLab (https://datacollaboratives.org) and NESTA UK’s data analytics projects and reports. While there was strong initial interest from potential partners, negotiating to the point of securing participation and funding was a significant challenge. As the COVID-19 pandemic hit, one partner (a large community health service provider) was forced to withdraw to focus on core business.

Summary of Datasets Used

We focused on re-using non-profit partners’ internal datasets but drew on open public datasets to support and complement these datasets, helping to produce new insights (Table 2.3).

Table 2.3 Datasets used in the three non-profits’ analyses

Methods

Educational Webinar Series

A webinar series was designed aiming to familiarise non-profit partners’ staff with foundational concepts about data analytics in the context of their sector. Five webinars were pre-recorded by the research team and distributed via email weblink, with supporting resources and recommended readings. Webinars ran concurrently with the co-design workshops from August to November 2020. Topics covered included introducing data projects, data ethics and governance, data collaborative methodologies, sharing a technology toolkit and next steps in organisational data analytics. A final interactive webinar was conducted via Zoom in February 2021, bringing non-profit staff participants and the university team together to share project findings and insights.

Discussion Workshops

Staff from each non-profit participated in three data analytics workshops specifically exploring their questions and data. The workshops covered the following:

  • Workshop 1: Goals of the project, key ‘pain-points’ and questions, and identifying internal datasets;

  • Workshop 2: Review and discussion of initial data analyses and visualisations;

  • Workshop 3: ‘Deeper dives’ into organisational data visualisations, use of other open public datasets to enrich analyses and discussion of how to communicate and apply data analyses.

The non-profits were responsible for identifying relevant internal datasets and ensuring these were de-identified according to the Australian Privacy Act 1988. These datasets were shared with Swinburne researchers via SharePoint (a secure enterprise file-sharing platform).

Following workshops 1 and 2, the research team’s data scientists worked with non-profits’ staff to generate visualisations based on partners’ internal datasets. Following workshop 3, some open public data sources were analysed and visualised to compare or add value to internal data analyses. Involving non-profit staff in obtaining, cleaning, analysing and visualising data gave them opportunities to identify potential value from data analytics as well as to understand the work, technologies and governance issues involved. Collaborative working between university and non-profits’ staff inspired discussions about future investments in data science capability-building for their organisations.

The workshop approach drew on aspects of the data walk method pioneered by the Washington, DC-based Urban Institute (Murray et al., 2015). This method focuses on visualising data and on sharing and discussing visualisations as a way of collaborating, participating and iteratively honing analyses to address participants’ questions.

Data Analysis

Entertainment Assist

Data scientists from the research team worked with Entertainment Assist to generate several different visualisations using the Intermission course evaluation survey data. Descriptive statistics and sentiment analysis were applied. In workshop discussions, differences between managers and staff cohorts undertaking the training were identified, and this drove a next round of data analysis further exploring the responses from these groups. Workshop 3 raised the idea of comparing programme participants by job, as those taking the course range from young performing artists to older technical staff. Word clouds, sentiment analysis and other types of statistical analyses compared data from the Intermission dataset with data from the Australian Bureau of Statistics’ Australian National Survey of Mental Health and Wellbeing. The comparison generated new insights about the potential impacts of the Intermission programme for particular at-risk cohorts as highlighted by national data.

Good Cycles

Data about training by employee from the Transitional Employment Program dataset was initially used to generate an analysis tracking workers’ progress in building employment skills over time. Thereafter, worker journey data was used to generate a geospatial visualisation showing 2514 trainees’ bicycle journeys during the course of service delivery over three months. Bicycle journeys were visualised as trails on a map of Melbourne’s suburbs.

Building on these initial analyses, geospatial data about trainee journeys from Good Cycles facilities to customer sites was compared with environmental modelling data from the City of Melbourne Transport Strategy 2030 (City of Melbourne, 2020) to help calculate the environmental benefits, in terms of reduced traffic congestion, reduced carbon emissions and improved citizen health outcomes, of employees travelling by bicycle as opposed to car or truck.

Yooralla

Yooralla was interested in improving staff experiences of work, and analysis began by examining internal operational human resources and training datasets. Geospatial and temporal visualisations were initially generated, showing aggregated data about staff demographics, rostering history and training by Yooralla service location. Thereafter, an objective became to discover variables linked to staff retention, and one suggestion was to compare staff demographics with distances travelled to reach workplaces. A key question pursued was: might distance travelled to their workplace influence staff retention?

For discussion at workshop 3, datasets analysed included Australian Bureau of Statistics (ABS) data about median levels of general population employee income across Melbourne, compared with geospatial postcode data for Yooralla employees and geospatial postcode data about employees’ primary workplace (ABS, 2020a). Datasets were compared for any insights relating to associations between median income for suburbs and staff home and work locations.
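As an illustration of how commuting distance can be estimated from postcode data alone, the sketch below computes the great-circle (haversine) distance between the centroids of a home postcode and a work postcode, then averages the return journey across staff. It is not the project's code: the centroid lookup table, the postcode labels and the function names are assumptions, and straight-line distance will understate real road distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_return_commute_km(staff, centroids):
    """Average return-journey distance across staff.

    staff: list of (home_postcode, work_postcode) pairs.
    centroids: dict of postcode -> (latitude, longitude) of its centroid.
    """
    distances = [
        2 * haversine_km(*centroids[home], *centroids[work])
        for home, work in staff
    ]
    return sum(distances) / len(distances)
```

Because only postcodes are needed, an estimate like this can be produced from de-identified staff records.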

Findings

Insights from Data Analyses

Each non-profit participated in generating analyses and visualisations that they considered helpful in understanding and explaining the challenges they brought to the project. As examples, staff of Entertainment Assist were able to better understand the significance of their training course for particular target groups and to consider how training might be tailored for different groups. For example, young, mostly female dance students and stagehands who are mostly middle-aged men would both be key target groups but would need differently configured training content.

The data analysis and visualisations allowed Good Cycles to demonstrate their contribution to the environmental sustainability of Greater Melbourne because the impact of employees’ travel by bicycle could be calculated in terms of congestion, emissions and public health. Figure 2.3 provides an indication of how Good Cycles’ employee journey data can be shown. This particular depiction shows only three cycling employees’ journeys across Melbourne from the Good Cycles depot but serves to show the type of geospatial visualisation that Good Cycles found useful.

Fig. 2.3
A geospatial map of Greater Melbourne. It marks the path of three Good Cycles employees. All the paths are convoluted and overlap.

Geospatial visualisation of three Good Cycles’ employee journeys

Insights for Yooralla included understanding the impact of the locations of their service hubs (often in higher income suburbs) in relation to where their staff could afford to live (a majority resided in mid-lower income suburbs). These disparities meant staff had long journeys to work, which potentially related to staff retention. Through a visualisation of internal and ABS employment and income datasets, Yooralla saw that the average daily commute for their employees was nearly 60 km as a return journey. This is considerably further than the average Australian commuting distance (ABS, 2020b). This led the Yooralla team to consider whether new work practices and staff work locations could be significant when trying to improve staff retention. Insights generated from the work ultimately led Yooralla to develop new policies for employee rostering.

From the Before and After Interviews

In interviews held at the start of the project, the non-profits’ managers shared their initial goals for participating. The main themes are summarised below, with illustrative quotes.

Improve organisational data know-how: “The best-case outcome is that … we improve our definitions, we improve our measurement, and we improve our data collection … and we have a culture, we have a discipline around capturing data” (Entertainment Assist).

Inform organisation strategy: “I think we’ve got very rich data. We’ve got a lot of data. And obviously, it’s getting through all of that information and providing it that will inform change, that will inform improvements, that will make changes for the better” (Yooralla).

Generate new insights: “I think there is an opportunity…to look at what other areas we could be exploring with this data. I think there is an opportunity to actually look at all the information that we have—and look at it in different ways, and look at it in more meaningful ways” (Good Cycles).

Show outcomes and impacts to funders: “Obviously there are a number of incredibly generous philanthropic organisations out there and seeking support for particular programs and projects is an important part of our work. [This project] … helps us to quantify some of the outcomes that we’re seeking to achieve” (Entertainment Assist).

At the end of the project, participants identified immediate benefits from using data visualisations in reports to board members and funding bodies. For example, Good Cycles used a visualisation as part of a competitive tendering process to show the advantages their use of bicycle transport had for the environment:

[The client] said, ‘What’s your footprint? What sort of area can we cover?’ So, I got [Swinburne data scientist] to send me the heat map … I packaged that up and we sent that back to the client, to demonstrate how far north of the CBD [Central Business District] we go, how far south-east and west. It was good, it was a valuable piece of data. (Good Cycles)

All participants reported that the iterative workshop discussions of visualised data helped them to understand challenges and impacts associated with using their data, which built their skills for working with data. One organisation, for example, realised there was a need to streamline current use of open text in reporting processes to generate more consistent and useful information:

People would put in the same concept [into the database] in 40 different ways … [It was] a bit of a wake-up call for us, and it really clarified that there’s only five major classifications that we want to look at in terms of risk, and that it’s actually easier for us to show what the problems are to stakeholders if we just use five risk classifications. (Yooralla)

Outcomes and Lessons Learned

The project took a long time to start, partly due to challenges of the pandemic and lockdowns, but also because potential partner non-profits were uncertain about committing to participation. In preliminary interviews, staff ‘confessed’ their lack of formal training in data analytics or their lack of experience with specific tools or resources for managing and visualising data. Some expressed embarrassment about the ‘messiness’ of their organisation’s data. While most participants worked with data to some degree, all assessed their understanding of data practices as limited.

Concern was particularly acute where large volumes of data were already generated. Participants discussed workarounds to deal with poor systems or their lack of know-how. For example, one participant described downloading datasets from the organisation’s proprietary human resources software, which they then manually imported into Excel to generate monthly reports.

A key finding from the project was that through collaborating with the university team, non-profit staff and leaders developed a different philosophy of thinking about data. They started to view data, its collection, and stewardship as a resource management issue, with datasets as resources that were useful to them depending on their skills and knowledge around using them. This was a shift from thinking about data as a compliance issue, something they had to do to assuage funders and regulators. Non-profit participants started to think about protecting and owning the value in data with an eye to the insights they could glean from different types of analyses.

Despite multiple challenges caused by working during the pandemic and its lockdowns, project aims were met. Unforeseen impacts included participants reporting that working with data sparked new collaboration between internal staff teams that had previously been siloed. This prompted new thinking about ways the combined teams might work with other organisations to combine resources and build data collaborations.

For further information about the project, see Albury et al. (2021).

Case Study 3: City of Greater Bendigo Data Collaborative

Project Goal

Assess the feasibility and potential benefits of a community data collaborative.

Project Description

Place-based planning and collaboration to address community challenges is encouraged in Australian government policy (Government of Victoria, 2020). However, planning for rural places is challenged by lack of data at meaningful spatial levels (Payton Scally et al., 2020). Forming a data collaborative could help by enabling re-use and pooling of data from multiple sources, including non-profits’ internal data and open public data. In this project, seven organisations collaborated with university researchers to test the feasibility and potential of pooling and sharing data. The City of Greater Bendigo covers a population of 120,000 living in urban suburbs and rural localities. It is 153 km (two hours’ drive) from central Melbourne, the capital of the state of Victoria, Australia. Working with managers of the partner organisations, the project identified, obtained, analysed and visualised open public datasets and organisations’ internal datasets, with mainly geospatial analysis and visualisation by suburbs and localities. During 2021, a series of workshops involving organisation staff and researchers was held to discuss topics of interest, identify datasets, consider useful ways to analyse data and then discuss the resulting analyses and visualisations of the datasets, which were mainly geospatial. Ultimately, this process informed development of a prototype community resilience indicator dashboard.

Collaborating Partners

Partner organisations included a national bank; City of Greater Bendigo council; Haven Home Safe, a non-profit homelessness services provider; Murray Primary Health Network, a government-funded primary health services commissioning organisation; Women’s Health Loddon Mallee, a women’s health service; and Bendigo Community Health Service and Heathcote Health Service, two community healthcare providers servicing different parts of the City of Greater Bendigo area. Our Swinburne University Social Data Analytics Lab team worked alongside the community partners.

How the Project Began

The project started because a community health service manager was interested in exploring whether a data collaborative could help to overcome lack of data to help assess services’ impacts on local health and wellbeing. The manager mobilised a group of other managers of local organisations to form a data collaborative working with our team of data science and social science researchers.

An initial workshop discussed practicalities of data collaboratives and presented examples of international community data initiatives, such as those led by the National Neighborhood Indicators Partnership and The GovLab. Following this, the organisations each contributed to a fund (to an approximate total of US$50,000) to form a data collaborative, and they nominated a lead organisation. Their self-organisation meant the partners committed to work with each other from the start.

As well as an overall contract between the university and the lead organisation, individual data-sharing agreements had to be established between the university and each organisation. We provided a standard template, but each organisation had to generate separately a data-sharing document agreed by their lawyers. This variously took one to five months to organise. As each agreement was signed, we started working with their staff to identify datasets and analyse their data.

While established methodologies for data projects emphasise the need to start with a focused problem or question (The GovLab, 2022), our partners found it difficult to identify a specific shared problem. All were interested in community wellbeing and resilience and potentially had datasets that could inform those topics. Consequently, we suggested developing layers of geospatially visualised data, each layer broadly relating to a community resilience topic. Given the partner organisations, the topic-focused data layers we suggested were social connection/isolation, caring, financial wellbeing, housing/homelessness and community health service use.

Summary of Datasets Used

We used open public datasets as well as re-using partners’ internal datasets, as Table 2.4 shows.

Table 2.4 Datasets for community resilience data collaborative

Methods

Discussion Workshops

Six workshops of organisation representatives were held at key stages. Early workshops established organisations’ missions, topics of interest and relevant datasets. Discussions with organisations were ongoing between workshops, particularly about establishing data-sharing agreements. Datasets were analysed by the researchers in liaison with organisation staff and explored collaboratively through subsequent workshops. These workshops revealed insights, as identified by partner organisations, enabled discussion of the caveats of the datasets, and considered useful ways to present the data while maintaining unidentifiability and paying heed to emergent considerations for partners. For example, we discussed how to present bank data; ultimately, this was presented as an index of financial wellbeing, along with other relevant financial wellbeing datasets. The workshop process helped to build relationships, mutual knowledge and trust between the partners, even though most workshops were held online.

Data Analysis

Geospatial visualisation by suburbs was adopted as an analytical approach because most of the datasets had location data, and a place-based approach resonated with partners. As well as considering what open public data was available, each collaborating partner also worked to identify internal datasets that could be re-used and shared. A set of criteria drove identification of datasets to include, as follows:

  • data about a topic that aligns with the idea of community resilience;

  • data that is analysable by suburb;

  • data in which subjects are unidentifiable, or data that could be aggregated to achieve non-identifiability;

  • data for which the caveats are transparent (e.g., the denominator of the dataset, how data was collected and the nature of consent obtained must be known).

Flexibility was required because some datasets were not analysable by suburb, meaning we had to explore other ways to analyse and present some data.

Once each organisation worked through the process of generating a data-sharing agreement, partner organisation managers then shared their dataset(s) with researchers in a suitable format for analysis. Some organisations were able to navigate this stage more quickly than others, depending on data governance practices and availability of dedicated data staff. It was particularly challenging (and for some organisations, impossible) to obtain aggregated data about health services.

Some organisations requested help to export their data. Suburb was not a standard unit of recording for all organisations: some collect data at postcode or local government area (LGA) level, which was insufficiently granular for the analyses sought. Suburbs have the disadvantage of highly varied population sizes, with some (especially rural localities) having small populations (sometimes <50). This makes it challenging to report results as unidentifiable and reduces the reliability of the Census-derived datasets, because the Australian Bureau of Statistics (ABS) deliberately introduces small adjustments when numbers are low, to protect privacy.
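One common way to keep small-population suburbs unidentifiable is to withhold any count below a minimum cell size before results are shared. The sketch below illustrates that idea only; the chapter does not state which rule the project applied, and the function name, the threshold of 5 and the counts are assumptions made for illustration.

```python
def suppress_small_cells(counts, min_count=5):
    """Replace counts below min_count with None so they are not published."""
    return {suburb: (n if n >= min_count else None)
            for suburb, n in counts.items()}


# Invented counts of service users by suburb.
service_users = {"Suburb A": 120, "Suburb B": 3, "Suburb C": 48}
print(suppress_small_cells(service_users))
```

Suppressed suburbs can be shown on a map as "not reportable", or merged with neighbouring localities before counting, depending on what partners consider acceptable.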

Given the caveats above, datasets were aggregated by suburb where possible and then combined into a single table using the R programming language. The data was exported, joined to a shapefile of suburbs and displayed as a colour-coded geospatial visualisation (map) using Power BI.
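The project carried out this step in R; the sketch below shows the same general idea in Python purely for illustration and is not the project's code. It assumes each dataset is a list of records with a "suburb" field, counts records per suburb, and merges the per-dataset counts into one row per suburb, ready to be joined to a suburb boundary file. All names and records are invented.

```python
from collections import Counter


def count_by_suburb(records):
    """Count records per suburb, skipping records with no suburb recorded."""
    return Counter(r["suburb"] for r in records if r.get("suburb"))


def combine_indicators(indicators):
    """Merge {indicator: {suburb: value}} into one row per suburb."""
    suburbs = sorted({s for values in indicators.values() for s in values})
    return [
        {"suburb": s,
         **{name: values.get(s) for name, values in indicators.items()}}
        for s in suburbs
    ]


# Invented example records from two hypothetical partner datasets.
housing_records = [{"suburb": "Suburb A"}, {"suburb": "Suburb A"},
                   {"suburb": "Suburb B"}, {"suburb": None}]
survey_records = [{"suburb": "Suburb B"}, {"suburb": "Suburb C"}]

table = combine_indicators({
    "housing_clients": count_by_suburb(housing_records),
    "survey_responses": count_by_suburb(survey_records),
})
for row in table:
    print(row)
```

Suburbs missing from a dataset are kept with an empty value (None) rather than zero, so that "no data" is not mistaken for "no cases" on the map.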

To facilitate comparisons between datasets, data was expressed as proportions of people or households. Different datasets had different samples, so some were reported as a proportion of the entire population, while others were reported as proportions of other denominators, for example, of respondents to the council survey, by suburb.
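The point about denominators can be made concrete with a small sketch. It assumes each indicator comes with its own per-suburb denominator (for example, survey respondents rather than total residents); the function name and figures are hypothetical.

```python
def proportions_by_suburb(numerators, denominators):
    """Divide each suburb's count by that dataset's own denominator.

    Returns None where the denominator is missing or zero.
    """
    result = {}
    for suburb, count in numerators.items():
        base = denominators.get(suburb)
        result[suburb] = count / base if base else None
    return result


# Invented figures: survey respondents reporting low social connection,
# expressed as a share of survey respondents (not of all residents).
low_connection = {"Suburb A": 30, "Suburb B": 5, "Suburb C": 2}
respondents = {"Suburb A": 120, "Suburb B": 50}

print(proportions_by_suburb(low_connection, respondents))
```

Recording which denominator was used alongside each indicator keeps the caveats visible when maps for different datasets are viewed side by side.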

Findings

Community Resilience Data Dashboard

With most datasets analysed by suburb, the geospatial map format shown in Fig. 2.4 was favoured by most workshop participants. One, two or four maps could be shown on the screen so simultaneous comparisons could be made between different topics or different indicators or datasets about the same broad topic. Ultimately, a data dashboard was generated with an opening interface showing the different topics (Social Connection, Financial Wellbeing and so on). Users could click through to datasets on these topics and view data geospatially visualised as maps, with other graphical representations also available on-screen for deeper dives. For example, the social connection view also showed a bar graph by age group, and clicking a suburb on the map gave more granular information about age group and other demographics for that suburb.

Fig. 2.4
6 maps with certain areas shaded stacked diagonally. Each is labeled from bottom to top, social connection, caring, money, well-being, housing, and service use.

City of Greater Bendigo Community resilience dashboard layers by suburb

From the Before and After Interviews

Interviews with partner organisations were held at project start and end. Below, issues raised at each stage are summarised, with some example quotes. This serves to highlight the outcomes and process of change for participants.

At the start of the project, participants raised three main aspirations: access to data, connecting with data, and building capability. These are summarised below, sometimes with illustrative quotes.

Issues About Data

Themes discussed related to lack of access to useful data, including low granularity, insufficiently current data and decline in tailored help from government statistical agencies as their funding has contracted. Partners were frustrated by apparent complete inaccessibility of some datasets (e.g., health data) and hoped the project would help them to find ways to access this data or to find out why it was so hidden. In terms of their own data, partners sometimes noted feeling overwhelmed; for example, “We have just so much data that’s in our systems, but actually being able to pull it out and make sense of it and gain insight and intelligence from it is a continuous challenge” (homelessness service). All were intrigued by the potential to use data more and sought to probe the benefits and boundaries of data re-use.

Connecting with Data

Participants saw beyond the immediate challenges and thought working together with data could be a catalyst for bringing organisations together for community benefit: “For the health services and other providers as part of the co-op, it might just actually make a difference and be a way we can all collectively advocate for a more interconnected service system. We know at the moment there’s a lot of wasted time and effort and money for the service providers, but also the clients who just get shunted from one place to another” (homelessness service).

Building Capability

Generating data capability for individuals, organisations and the community was mentioned by most participants: “It’s actually growing some capacity in our region to use data together” (women’s health service); “So our organisation would have capacity in terms of well, how to design data sets for instance, so that they are analysable” (homelessness service).

By the end of the project, partner participants reported feeling more confident and empowered about using data. While they noted insights gained about their community from data analyses, their main reflections were about gains in data capability and collaborative relationships.

Insights About Community

Participants noted their preconceptions about more-or-less resilient suburbs were not all borne out when actual datasets were analysed. For one suburb not previously identified as having challenges, data analyses showed consistent deficits, when compared with other suburbs, on multiple resilience indicators. Another suburb, perceived as wealthy, appeared from the data analysis to be vulnerable regarding social isolation. Participants noted this made them want to find out more about what was happening in these suburbs, that is, to get some ground-truthing for verification of the information suggested by the data analyses.

Capability Built

All participants discussed increases in aspects of data capability. One participant highlighted appreciation of governance matters for using and sharing data, while another had started working with her organisation’s data specialist and was working more with data herself. One participant, a data manager at a health organisation, noted the project had made him question his organisation’s reluctance to share data: “I’ve come to question some really tired governance structures. Maybe it’s done because we don’t understand what’s being asked, but really, it’s about avoiding the risk. I don’t have a solution, but it’s become quite obvious” (health service commissioning organisation).

Participants discussed strategies developed to deal with data-sharing challenges, for example, making indices to show relative levels of indicators across different suburbs. The power of sophisticated visual displays was highlighted: “I found it really riveting the first time you guys showed those maps… it was just—I loved it” (community health service No. 2); and “Service managers are often quite visually driven, so it’s quite powerful in that sense, the power of the data seeing it displayed” (health service commissioning organisation).
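The chapter does not specify how the indices were constructed. One simple possibility, shown here only as an assumed illustration, is min-max scaling: each suburb's value is rescaled to a 0 to 100 score relative to the lowest and highest suburbs, so relative levels can be mapped without publishing the underlying figures. The function name and values are hypothetical.

```python
def relative_index(values):
    """Rescale values to 0-100 relative to the lowest and highest suburb.

    Missing values (None) stay missing. If every suburb has the same
    value, each is given the midpoint score of 50.
    """
    present = [v for v in values.values() if v is not None]
    if not present:
        return {suburb: None for suburb in values}
    low, high = min(present), max(present)
    if high == low:
        return {s: (None if v is None else 50.0) for s, v in values.items()}
    return {s: (None if v is None else 100 * (v - low) / (high - low))
            for s, v in values.items()}


# Invented raw indicator values by suburb.
raw = {"Suburb A": 10, "Suburb B": 25, "Suburb C": 40, "Suburb D": None}
print(relative_index(raw))
```

An index of this kind shows only each suburb's position relative to the others, which is why it suits sensitive data but also why it cannot be compared across time periods without fixing the scale.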

Connecting with Data

The project helped to build relationships and understanding between organisations. One said: “I guess I’ve become more aware of the value of the process, perhaps even more so than the value of the outcome” (Council). Talking about and with data was suggested as useful for building knowledge about each other’s work through data. Bank participants said they had increased understanding of community challenges and they were able to introduce this knowledge into other discussions within the bank.

Outcomes and Lessons Learned

Overall, the project was well received, with participants more enthused at the end than at the start. Participants worked their way through data challenges as they arose, finding workable solutions, for example, using an index when working with potentially sensitive data to avoid any risk of identifiability. On this topic, participants were primarily concerned about reputational risk for their organisation if someone used analysed data out of context as, in all other respects, they were sure they were re-using data safely and ethically.

Contrary to advice to start with an identified question (The GovLab, 2022), partners in this project benefited from a period of exploring data with each other. At the start, each had their own interests and did not know the work of other organisations. Significantly, they also did not know what data might be forthcoming from their own organisations. The project was a journey of discovery in many ways and, at the end, participants were more knowledgeable and confident to agree next steps of work with data as individual organisations and collaboratively.

While the project started with organisations focused on getting new insights from data, from around half-way through the project, partners agreed a different significant outcome was forthcoming. This was building mutual knowledge through exploring data together that enabled them to see what each could contribute to collective change at community level. Further, they felt empowered to use data in their own work and could see where it might support work of the organisation because they could now understand their operations and services through a lens of data. Some commented they had started to work more confidently on data governance issues. For example, the homelessness service identified gaps in data due to incomplete collection. Managers said they would use new data visualisations to illustrate to staff the benefit of collecting complete datasets.

Data sharing remains problematic. One health organisation simply did not provide data because of perceived challenges of sharing. The data manager explained it was too difficult and time-consuming to navigate the necessary processes, and he thought it potentially impossible. Most encouraging was that some organisations managed to navigate data sharing, helping to generate novel analyses that gave new perspectives about the community.

To read more on the City of Greater Bendigo Data Collaborative see Farmer et al. (2022) and https://datacoop.com.au/bendigo/.

Summary

Above we have provided three case studies of data projects from our research and work with partners. While each is different, they all involve collaboration between people and/or organisations with different expertise and perspectives. Each case also re-used different datasets and targeted different insights.

Each of the cases provides evidence of learning and changes in relation to using data among staff of the participating organisations. We understand this as influencing aspects of the data capability of the organisations that participated. With Case Studies 2 and 3, we were able to evidence changes through interview data collected before and after the project. With Case Study 1, the government Business Insights Unit was able to extend its range of types of analyses to inform policy once it learned new techniques of using social media data and found new data sources. In Case Study 2, each organisation’s participants expressed surprise that their routine datasets could be repurposed to address real operational and impact measurement challenges. Case Study 3 yielded several examples of changes in awareness. A participant from one organisation talked about using data much more in her own work, and most participants remarked on their increasing and more confident interactions with their data staff and teams, which they attributed to their practical and applied learning from the data collaborative project.

The datasets and analysis techniques varied. While Case Study 1 used innovative Natural Language Processing techniques and public ‘big data’, linking disparate existing datasets and geospatial analysis was more important for Case Studies 2 and 3. Common to each case was a collaborative process of data discovery, repurposing, linking and sense-making. That is, each case shows the significance of identifying and exploring existing datasets and considering how they can be re-used and linked with open and public data. Equally important is the process of data visualisation and, in each case, this enabled processes of collaborative sense-making with the data.

In terms of collaboration, Case Study 1 involved participants from different departments and agencies of government involved in generating, implementing and evaluating policy, but also staff of the Business Insights Unit who were already engaged in aspects of data analysis. In Case Study 2, the participants brought together around projects were from across departments within each of the non-profit organisations. These staff tended to note that they generally work in isolated departmental silos. The data project brought them together to discuss how their work interconnects, driven by working with data. In Case Study 3, the collaboration was among different organisations working in the same community. Interestingly, for each of these different types of collaborations, we noted the same set of emergent phenomena or benefits. Participants got to know and understand each other’s work partly through the purposeful action of the process, but also by discussing and probing data generated by the work of different participants at the table (or on the Zoom call). Further, new relationships were forged that could lead to more efficient and effective, and certainly better-informed, future working together. As a participant in Case Study 3 noted, she came to understand “the value of the process, perhaps even more so than the value of the outcome”.

Each case raised barriers and challenges that helped to ground participants’ expectations about the potential of data analytics, but also sent them back to their organisations to question practices or to make change. For example, in Case Study 3, the homelessness organisation wanted to improve the completeness of its data, and the participant from the health service commissioning organisation wanted to explore governance practices that served to keep health data hidden. In Case Study 1, participants came to understand the value of aligning the outcomes measurement framework with likely available data from the start, rather than trying to tack things together after policy implementation. All participants came to understand the challenges of sharing data between collaborating partner organisations.

Key Takeaways from This Chapter

In this chapter, we jumped straight into some case studies of non-profits and data analytics. This was done to ensure that readers know what kind of work we are talking about and to illustrate the range of possibilities for types of datasets to work with, visualisations and participants. Key points to take away from this chapter are listed below.

Key Takeaways

  • Small, experimental projects that address real-life challenges provide a ‘toe in the water’ for staff of non-profits and others to test the value that data analytics could have for them.

  • Collaborating on projects led to building relationships across departments and organisations that resulted in better-informed data products and in wider understanding among novel networks of people.

  • Work on the projects led to increases in knowledge, awareness and comfort in working with data among participants. We suggest this led to some building of data capability and also to understanding what their organisations need if they are to work more effectively with their data.

Undertaking the case study projects in this chapter with diverse organisational partners led to our conceptualisation of data capability and appreciating the benefits of collaborative working that are explored in Chap. 3.