Achieving Data Synergy: The Socio-Technical Process of Handling Data

Good quality research depends on good quality data. In multi-disciplinary projects with quantitative and qualitative data, it can be difficult to collect data and share it between partners with diverse backgrounds in a timely and useful way, limiting the ability of different disciplines to collaborate. This chapter will explore two examples of the impact of data collection and sharing on analysis in a recent Horizon 2020 project, RealValue. The main insight is that it is not only projects but also the processes within them, such as data collection, sharing and analysis, that are socio-technical. We use two examples within the project (validating the models and triangulating the qualitative data) to examine data synergy across four dimensions: time (synchronising activities), people (managing and coordinating actors), technology (in this case focusing mainly on connectivity) and quality. Recommendations include developing a data protocol for the energy demand community built on these four dimensions.
Keywords Data collection and sharing methods • Socio-technical • Multidisciplinary • Energy demand • Demand response • Smart grid

Introduction
A large number of field trials have attempted to understand energy use in buildings (e.g. Economidou et al. 2011; Jones et al. 2013; TSB 2014; Guerra-Santin et al. 2013; Gupta and Kapsali 2015). Nevertheless, the number of studies with complete monitoring equally capturing building data, technologies and people is limited, a fact recognised by the Buildings Performance Institute Europe (BPIE) as limiting the impact of this research on European policy (Economidou et al. 2011). Notwithstanding their size, samples and research scope, many studies experience similar pitfalls in their data collection processes. Despite recognition of the need to combine multiple methods to understand the multidimensional socio-technical issues (Topouzi et al. 2016) and ongoing recognition of the ontological and language challenges of multidisciplinary work (Mallaband et al. 2017; Robison and Foulds 2017; Sovacool et al. 2015), there is less focus on the challenge of data collection and the implementation of these methodologies. This chapter reflects on the socio-technical nature of data collection and sharing in multi-partner multidisciplinary projects: not just the fact that different types of data need to be collected and analysed but the expectations different disciplines have of data and the different skills they bring to the analysis. Recognising this and planning accordingly increases the chances of high-quality, useful data being used in collaborative ways in complex consortia. We suggest four dimensions to achieving data synergy in such contexts: synchronising data processes in time, coordinating the people involved both logistically and in terms of their skills and expectations, recognising the multiplicity of issues affecting both social and technical data collection and paying attention to data quality.
Although the chapter will use examples from RealValue (see Fig. 5.1), the issues discussed are common to most multidisciplinary projects with multiple actors. The chapter will use two illustrative examples. The first examines attempts to validate bottom-up models of energy demand using trial data collected during the project. The second explores efforts to triangulate the qualitative data collected on customers, using monitoring data from the heating and hot water appliances fitted in their homes. The chapter will start by introducing the background context of the project, move on to discussing the four dimensions of data synergy and finish with some recommendations for achieving data synergy.
RealValue was a 3-year demonstration project (2015-2018) exploring the potential of Demand Response (DR) through the installation of Smart Electric Thermal Storage (SETS) space- and water-heating systems in several hundred properties (domestic and non-domestic) across trial sites in Ireland, Germany and Latvia. Whereas previously, storage heating typically only charged up overnight, the aim was to demonstrate how smart electric storage and water heating might support the functioning of the grid through Demand Response (DR) if it was able to switch on or off at any time (provided customers' needs were being met) in order to match demand with available supply.
The project involved a multidisciplinary group of energy modellers, social scientists, manufacturers, engineers, software designers, network operators and the electricity supply industry and was divided into two strands: on-the-ground implementation, which collected data in properties, and a modelling component based on archetypal data and validated by trial data. Both strands started in parallel straight away. These fitted together as outlined in the diagram below, which shows the interrelationship between the two strands (to be achieved through data sharing); the importance of timing (given the need to synchronise the strands to produce deliverables within the project time-frame); and the difference between the original plan of the project and what actually happened (which is discussed in more detail later). This is a fairly standard project framework but has inherent difficulties built into the data collection process, which is what this paper addresses.

Background context
In order to later appreciate the data requirements of each project strand, it is necessary to describe them briefly.

Modelling
The plan for the modelling work was to integrate a building energy model (BEM) into power system models in order to assess the potential system value of deploying smart electric thermal storage (SETS) and then to validate them using trial data.
A BEM is a physics-based simulation of building energy use. Inputs into the model include physical characteristics such as building geometry, construction materials, lighting, HVAC and so on (Negendahl 2015; Clarke and Hensen 2015). The model also needs information about building use, occupancy and indoor temperature. A BEM program combines these inputs with information about local weather to calculate thermal loads and energy requirements, the HVAC system's response to those loads and resulting energy use. Such models are used by building professionals and researchers to evaluate the energy performance of buildings for applications like building design, retrofit decision-making, LEED certification and urban planning. Bottom-up models of demand are based on uncertain assumptions (McKenna et al. 2017). To help deal with some of these, the models were initially calibrated based on 'archetypal' data from national databases to allow time to run the simulations required by the project (Andrade-Cabrera et al. 2016). Originally, there was a plan to use trial data at a later stage, to validate the models and recalibrate them if necessary.
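To make the idea of a physics-based model concrete, the following is a minimal sketch of a single-zone heat balance. It is far simpler than any real BEM and is not the model used in RealValue. The function name, the parameters and the example values (heat loss coefficient, thermal capacity, heater power) are all illustrative assumptions.

```python
def simulate_indoor_temperature(outdoor_temps_c, heater_power_w,
                                heat_loss_w_per_k, thermal_capacity_j_per_k,
                                initial_temp_c, step_s=3600):
    """Explicit-Euler one-zone heat balance (illustrative only):

        C * dT_in/dt = P_heater - H * (T_in - T_out)

    Returns the indoor temperature at the end of each time step.
    """
    indoor = []
    t_in = initial_temp_c
    for t_out, power in zip(outdoor_temps_c, heater_power_w):
        # Net heat flow into the zone: heater input minus fabric losses.
        net_heat_w = power - heat_loss_w_per_k * (t_in - t_out)
        t_in += net_heat_w * step_s / thermal_capacity_j_per_k
        indoor.append(t_in)
    return indoor


# Hypothetical dwelling: H = 100 W/K, C = 1e7 J/K, 5 degC outside.
# A 1500 W heater exactly balances losses at 20 degC indoors.
steady = simulate_indoor_temperature([5.0] * 3, [1500.0] * 3,
                                     100.0, 1.0e7, 20.0)
# With the heater off, the zone cools during the first hour.
cooling = simulate_indoor_temperature([5.0], [0.0], 100.0, 1.0e7, 20.0)
```

Even in this toy version, the result depends on assumed inputs (heat loss, thermal mass, heating schedule), which is why validation against measured indoor temperature and occupancy data matters.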

Customer Impact Assessment
In parallel with the modelling work, customers were recruited for the live trial and had a combination of technologies installed in their homes, an experience captured in the Customer Impact Assessment (Darby et al. 2018). The technologies installed included heaters and/or hot water cylinders, an internet connection if not already present, a gateway to link the appliances to the cloud (where demand response (DR) would be facilitated), interval meters and, in a sample of homes, additional sensors (occupancy and temperature) and smart plugs. Each home was therefore a source of multiple data points, for assessing the potential for DR and other research purposes.
The social scientists also collected data, including surveys before and after the installation of the technology and at the end of the project, in-home interviews, observations and photographs in a subset of properties and interviews with other project actors (installers, project delivery coordinators, manufacturers, etc.) on their interactions with customers. The objectives were to understand the impact of the installed technologies and, eventually, DR, on customers, and to assess necessary conditions for a good customer experience and DR participation. Five conditions emerged: comfort, control, cost, care and connectivity.
Both the technical and social data were meant to facilitate multidisciplinary collaboration. Interesting data from the implementation phase included indoor and outdoor temperature, occupancy, building fabric, energy consumption (ideally, with heating consumption disaggregated) and customer data held by other partners, like billing, call centre data and DR performance data. The quantitative data from the technologies installed in homes was to be used to triangulate the qualitative data.

The Processes of Collecting, Sharing and Analysing Data Are Socio-Technical
Based on the social and technical contexts just described, researchers took the view that this was a socio-technical project. Following Powells et al. (2014) who argue that electricity 'load' is not an isolated physical phenomenon but also represents activities and social practices, we recognised that the technology and its users were inextricably interlinked and that, therefore, multiple disciplinary methods were necessary. Table 5.1 summarises the data collected.
It also became clear that the processes of collecting, sharing and analysing data were socio-technical, no matter whether the data being collected was qualitative or quantitative and irrespective of the use to which it was finally put. For example, the social scientists collected and shared customer satisfaction data with industry partners, building and occupancy data with the modelling team and the interview, observation and photographic data with several partners who were interested in a more in-depth insight into their customers, often to improve the technology on offer. In return, they hoped to receive more quantitative data such as heating periods and temperature settings from the SETS, call centre complaints/inquiries, cost data from the energy providers and consumption data from interval meters.
Having discussed the use of the data, we now turn our attention to the data itself. There is no space to deal with every data source in turn but, in the discussion that follows, we explore more fully the idea that dealing with data is socio-technical by focusing on four aspects of the data collection and sharing process necessary to achieve data synergy.

Data Synergy
We contend that good data depended on four interlinking dimensions:
• Time (synchronising the collection and sharing of data between different parts of the project)
• People (coordinating the different actors involved in the collection and sharing of data)
• Technology (establishing the connectivity between the different technologies so that data could be transmitted)
• Quality (ensuring data is good enough for the research purpose)
The discussion will examine challenges in relation to these dimensions in order to make recommendations for the development of a data protocol for appropriate data synergy for use in other multidisciplinary energy demand projects.

Time: Synchronisation
Figure 5.2 shows the timing of data collection in the project, including the winter periods (critical data collection opportunities in a heating project), the two strands of the project and the variety of data collection methods. It is noteworthy that most data was collected towards the end of the project, with a gap in the middle caused by recruitment difficulties.
Several issues emerged:
1. Multiple data collection methods required complex coordination with the main implementation phases of the project such as recruitment, installation and the three heating seasons, as well as maintaining a coherent approach across the three countries.
2. As different stages started and finished, the need to facilitate communication among actors across different stages of the process became more complicated, and, without a single data person to oversee this process, the inevitable result was that partners focused more on managing their own data and results than on collaboration.
3. Collecting the same data at different points in the project necessitated the altering of the data collection tools to reflect the changing priorities of partners, resulting in changed metrics in some cases, and this compromised the quality of the data and made comparisons across countries difficult.
4. Timing data collection to happen during the winter season was critical, and the ambitious timeframe meant there were only three heating seasons in which to test the technology and monitor behaviour. The first phase of installations had been done by the first heating season, but the connectivity problems discussed below meant data was absent or of poor quality. Further, recruitment was then delayed until just before the final heating season, so close to the end of the project that it was difficult to process data collected when the technologies were at their most reliable.

Fig. 5.2
The timing of data collection across the two project strands

People: Coordination
People are a crucial part of collecting data, even when the methods are apparently technical. It is worth noting the different roles of people in the project, each of whom impacted the data: customers, data collection agents, installers, industrial project partners and researchers. The nature of this project meant direct access to customers was restricted, and so data was generally not collected by researchers. This was problematic because those collecting it had other priorities and lacked the skills, training or appreciation of the final use of the data needed to collect it correctly. Previous research (Janda and Parag 2013; Wade et al. 2016) has highlighted the influence of different actors in socio-technical processes, and this project was a case in point. The otherwise excellent project management team had an industry background, and their priority was implementation rather than research. Thus, ensuring timely deliverables sometimes hampered the collection and sharing of research data. Table 5.2 serves to highlight the number of different actors involved in the project and consequent complexity of sharing different types of data.
Apart from the logistical challenge of coordinating the data across actors, working with multiple partners had other challenges, more widely discussed in the literature, such as a lack of shared ontology, vocabulary and culture (Hargreaves and Burgess 2009; Longhurst and Chilvers 2012; Robison and Foulds 2017; Sovacool et al. 2015). Data sets also had a different meaning for different partners, who brought different skills to the analysis and interpreted, and then used, the data differently. This had implications for the quality of data they needed and the way in which the data was interpreted, both of which are discussed later under data quality.

Technology: Connectivity
Given IOT [Internet of Things] is in the news… clean technology, all these buzzwords are always being used. But yet, when it comes to the practicalities of doing a project with [hundreds of] houses, it was incredibly difficult.
Project delivery coordinator, RealValue project
Good connectivity between the different technologies was essential, both for successful DR and to access most of the quantitative data. It is not necessary to dwell on the details of these connections (Fig. 5.3), but, in essence, it was necessary for the connected appliances to communicate through a gateway to a cloud-based aggregation platform that optimised the charging of those appliances according to the customer's comfort settings, cost algorithms and grid constraints. This was unexpectedly demanding. Unanticipated complications included the need to install internet connections, customers turning off one or other technology, power failures causing the appliances to revert to 'stand-alone' mode (i.e. not connected and so no longer transmitting data or available for DR), the need to develop interfaces for different technologies to communicate, organisational firewalls preventing communication, changing communication protocols necessitating ongoing modifications and a software update that disrupted the appliances. The variety of factors that can influence technical data is noteworthy. Spataru and Gauthier (2014) focused explicitly on the performance of various indoor environmental sensors for monitoring people and indoor temperatures; we are more interested in the impact of such issues on data collection and sharing. There were also significant impacts on the researchers (for a specific example, see Box 5.1).

Box 5.1 Attempts to collect temperature and occupancy data using technical and social methods
Temperature and occupancy data were important both to validate the models and triangulate the qualitative customer data, and there were multiple possible data sources (Table 5.1). The heaters had temperature sensors and timing settings, which offered a proxy for temperature and occupancy, respectively. However, the temperature sensors were on the heaters themselves and so could not measure the actual temperature of the room, and heating was often set to come on when people were not at home, making both proxies unreliable. Besides, data from most heaters was unavailable until much later in the project, as described. This meant additional temperature and occupancy sensors installed in a subset of homes were important both to help calibrate the models with this appliance data and to triangulate the customer impact assessment data, but there were two significant problems. The first was that most did not transmit data. The second was that the location of the sensors was not accurately noted by those who installed them, making interpretation of the data impossible.
Although the social scientists included occupancy and temperature questions in the surveys, these were filled in by agents with different objectives, and the data was incomplete and ultimately unusable. Follow-up home visits were carried out and did include questions and observations on temperature and occupancy that were shared with modellers, but it was not possible to visit the homes with additional sensors, again because of the need to coordinate with other project partners, and so remedying the connectivity issues or observing the location of the sensors was impossible. Despite multiple possible sources, therefore, the final data on temperature and occupancy was patchy. This prevented researchers collaborating as fully as they might have done otherwise.

Data Quality: Granularity, Reliability and Project Design
During the final heating season, recruitment was completed and attention turned to fixing the connectivity issues, with some success: data did become available. As partners started to work with it, however, the next major issue arose: the quality of the data, a product of the three dimensions discussed in the previous sections (Stevenson and Leaman 2010). All sorts of factors had affected the data but there are three main points to discuss here. First, expectations of the granularity (or resolution), and the duration of the data, varied depending on the partner and their purpose. So, whilst industrial partners needed single 24-hour periods of uninterrupted data to run equipment diagnostics, social scientists wanted data for participants for whom they had other data (such as surveys or interviews), and modellers needed several days of data to help them see patterns but did not mind some gaps, as long as they had an idea of occupancy (Fig. 5.4).
Second is the reliability, or consistency, of the data. As noted in Fig. 5.4, different methods of collecting apparently the same data yielded different results, making methodological transparency and accuracy vital for replicable research. Figure 5.5 demonstrates this from the viewpoint of the data. It shows two sets of temperature data: one from a SETS temperature sensor, the other from an additional temperature sensor (whose location was unknown). Based on the midday temperature spikes on the solid line, we could speculate that the additional sensor was warmed by the sun. Interestingly, the interpretation of what happened on the days without spikes differed between modellers and social scientists: the former assuming cloudy weather and the latter closed curtains, possibly indicating illness or shift work, for example. Without additional data on weather, the aspect of the room and occupancy, it is not possible to tell which of these is correct, but the different analyses indicate each discipline's bias.
Still on temperature, the 2-4 °C difference between the two sensors is striking. As the SETS sensor is on the metallic SETS surface near the warm air vent, it might well be warmer than the room. This might help explain the high temperature settings seen during the home visits: 24 °C at the appliance might translate to 20-22 °C in the room.
Both graphs also show gaps in the data, indicated by straight horizontal lines. Strangely, these do not always coincide, suggesting either that they were caused by different factors or that there were various combinations of factors affecting data quality. Again, without a home visit to verify, the cause cannot be known.
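The two checks described above (locating flat, gap-like stretches in each series and estimating the offset between the appliance sensor and the additional sensor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis carried out in the project: the function names, the `min_length` threshold and the sample readings are hypothetical, and a flat run may also be a genuinely steady temperature, so flagged runs still need checking against other evidence.

```python
def find_flat_runs(readings, min_length=3):
    """Return (start, end) index pairs for runs of identical consecutive
    readings at least `min_length` long. Such runs plot as straight
    horizontal lines and are candidate gaps, not confirmed ones."""
    runs = []
    start = 0
    for i in range(1, len(readings) + 1):
        if i == len(readings) or readings[i] != readings[start]:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = i
    return runs


def mean_offset(appliance, room, excluded_runs=()):
    """Mean of (appliance - room) over paired readings, skipping any
    indices that fall inside the excluded runs. Returns None when no
    usable pairs remain."""
    skip = {i for s, e in excluded_runs for i in range(s, e + 1)}
    diffs = [a - r for i, (a, r) in enumerate(zip(appliance, room))
             if i not in skip]
    return sum(diffs) / len(diffs) if diffs else None


# Hypothetical hourly readings in degrees Celsius.
appliance_sensor = [23.0, 24.0, 25.0, 24.0]
room_sensor = [20.0, 21.0, 21.0, 22.0]
offset = mean_offset(appliance_sensor, room_sensor)

suspect = find_flat_runs([20.1, 20.4, 21.0, 21.0, 21.0, 21.0, 20.7])
```

Running the flat-run check on each series separately and comparing the resulting index ranges is one way to see whether the gaps in the two sensors coincide.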
Third is the socio-technical project design. What has become clear upon examination of the data is that many of the problems stemmed from the design phase of the project. Rather than a socio-technical project design, this was in fact an industry-led technical demonstration project with some social inputs, which partly explains the incommensurability of the data discussed above. A socio-technical project design should encompass three phases: model, design and methods, and analysis, all of which should be socio-technical. This should start with a conceptual, theoretical phase that considers how the actions and states of people interact with the technical and physical properties of their environments. It might end with an analysis of socio-technical constructs such as a 'person-space-time mean internal temperature', a measure meant to get closer to the user experience of temperature in the home. The methods linking these have yet to be developed, but mobile phones and in-home temperature apps might offer some traction (Grunewald 2015).

Achieving Data Synergy
Epistemological debates run as an undercurrent through all of these issues. Fundamentally, the more positivist-grounded technical/monitoring sciences would define quality in very different ways to most critical social scientists, who would instead embrace subjectivity, implying that issues of 'validation' and 'calibration', in the traditional sense, are backgrounded or at least mean something different. Nevertheless, in the context of a replicability crisis in various disciplines, this chapter suggests that data processes in the energy demand research community could use improvement.
We have contributed to the conversation about ways in which this might happen and will finish with recommendations in each of the four dimensions discussed:
• Time: Synchronising research rests on critical dependencies that differ from those of project management, and it requires backup plans to ensure quality data, which is otherwise sometimes constrained by the project plan. Also, the duration of heating projects needs to be better aligned with their objectives.
• People: The impact of different actors should not be underestimated. Planning and responsive management are essential parts of real-world project delivery, and we would recommend four coordination roles: a project manager, a project delivery coordinator (for practical project implementation), a data analyst (from the start of the project, to organise, hold and facilitate access to a shared set of data) and a research coordinator (with a socio-technical background, to synchronise the research).
• Technology: Demonstration projects inevitably use novel technologies and the difficulty of managing the interfaces between them should be taken into account.
• Quality: The use of consistent metrics would allow better comparisons across different countries with different languages, contexts, technologies and participant groups. Data protocols need to be developed to establish conventions for collecting and sharing data, both quantitative (e.g. what to capture, how often and where) and qualitative (e.g. what scales to use for age, income and cost). This is not trivial and requires work from researchers and funders. However, the reward would be more robust, reliable data; better, more policy-relevant outcomes; and more replicable research.
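As one possible illustration of what the quantitative side of such a data protocol might look like in practice, the sketch below records what was captured, how often and where for each data series, and reports departures from an agreed convention. Everything in it is a hypothetical assumption for illustration: the field names, the variable and unit vocabulary and the 60-minute interval limit are not taken from the project or from any existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesMetadata:
    """Descriptive record shared alongside each data series (hypothetical)."""
    variable: str          # what was captured
    unit: str              # unit of measurement
    interval_minutes: int  # how often
    sensor_location: str   # where
    collected_by: str      # which actor collected it


# Assumed shared vocabulary: variable name -> agreed unit.
AGREED_UNITS = {
    "indoor_temperature": "degC",
    "electricity_consumption": "kWh",
}


def protocol_problems(meta, max_interval_minutes=60):
    """Return a list of human-readable departures from the protocol."""
    problems = []
    expected_unit = AGREED_UNITS.get(meta.variable)
    if expected_unit is None:
        problems.append(f"unknown variable: {meta.variable}")
    elif meta.unit != expected_unit:
        problems.append(f"unit {meta.unit!r} should be {expected_unit!r}")
    if not 0 < meta.interval_minutes <= max_interval_minutes:
        problems.append(f"interval of {meta.interval_minutes} min is out of range")
    if not meta.sensor_location.strip():
        problems.append("sensor location not recorded")
    return problems


complete = SeriesMetadata("indoor_temperature", "degC", 15,
                          "living room, north wall", "installer")
incomplete = SeriesMetadata("indoor_temperature", "degF", 15, "  ", "installer")
```

A check of this kind, run when data is handed over rather than at analysis time, would flag a missing sensor location while a return visit is still possible.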