1 Introduction

Legacy replacement projects constitute a large proportion of digital government projects [14, 31, 60, 78]. The need to deal with legacy systems, whether by integrating or replacing them, increases effort, complexity and duration, and the associated projects are overly costly and suffer from a high failure rate [57]. Upon closer examination, legacy replacement projects experience a very specific requirements-related dynamic. Government agencies often use the features of existing legacy systems as requirements for their replacement applications. This is because legacy features are perceived to be a stable requirements set validated through business use; hence they are adopted to minimize the risks associated with business change, for project management convenience, or because of legislative or policy constraints. While some of these reasons might be legitimate, others lead to unnecessary replication of legacy features and business processes and to a failure to take advantage of digital innovation. In fact, this is a common vicious circle in which new replacement systems are customized to accommodate existing business processes in government agencies, which are themselves often shaped by the technological limitations of the very same legacy systems being phased out. In our research, we have defined this phenomenon as the “legacy problem” [6]: due to its circular nature, compounded by government agencies’ bureaucratised decision-making processes, it can be seen as exhibiting characteristics of “wickedness” [17, 71].

From a requirements perspective, practitioners do not treat legacy replacement projects differently from any other IT project, and no legacy-centric requirements engineering approaches are utilized. We take the view that due to their criticality to project success, a focus on the requirements engineering activities and their social dynamics is of utmost importance. Specifically, the requirements discussions during legacy replacement projects are driven by two opposing practitioner attitudes: one promoting conservatism and risk aversion, and the other, innovation and transformation [5]. In our research, we investigate how these attitudes can be incorporated into gameplay and, more precisely, within a gamification of the inquiry-driven model proposed by Colin Potts et al. [68], a well-known inquiry model in requirements engineering, to contribute to tackling the legacy problem in the context of government bureaucracies. To this end, we have designed, developed and evaluated a game prototype, named RE-PROVO. Our research reveals the potential benefits of the approach and constitutes a timely and original contribution to the requirements engineering domain: the use of gamification in government legacy replacement projects is novel, and the context and the usage of a game are an uncommon combination in both requirements practice and academic research.

This paper is based on the first author’s doctoral dissertation [4]. The definition of the legacy problem was previously published in [6], with its bearing on requirements engineering discussed in [5]. The original, unpublished content of this paper concerns the design of RE-PROVO and its evaluation in real-world practice.

The paper is organized as follows. In Sect. 2 we review relevant background literature and introduce the concept of the “legacy problem.” In Sect. 3 we provide an overview of our methodological approach: the design and development of the game RE-PROVO and its evaluation with practitioners from two government agencies. Section 4 presents the results, with findings discussed and analyzed in Sect. 5. The conclusion in Sect. 6 presents a summary of the main takeaways of our research and some implications for future research.

2 Related work

Legacy technologies, such as mainframe systems or software applications developed using older platforms, have long been recognized as an obstacle to information technology innovations in public agencies and to establishing more flexible, transparent and responsive government services [24, 33, 37]. Legacy systems are also said to be barriers to strategic innovation [44], because they are difficult to modify, almost incapable of accommodating changing business processes, unable to provide new functionality and features easily, and difficult to integrate with [27].

Such characteristics are mostly regarded as technical in nature, so there has been substantial research dedicated to technologies that help extend the life of legacy systems and make integration with them easier, such as “wrappers,” web services, screen-scraping technology, etc. [16, 69]. Legacy technologies, however, pose more than technical challenges, in that due to their extensive usage (usually spanning decades) and scale, they become ingrained in work processes and organizational culture, to the extent that they come to define the modus operandi of public agencies. Kelly et al. [44], citing Kim [45], define legacy systems as an “accumulation of years of business rules, policies, expertise and knowhow.” The capabilities and limitations of legacy technologies are essentially a source of design of workflows and procedures used in many organizations—Lloyd et al. [52] provide multiple illustrations of how legacy applications “lock-in” inefficient processes.

The environment of bureaucratic and legislative rigidity in which public agencies operate [10] and the legacy technologies used in such agencies mutually reinforce each other in ways that make it hard to “disentangle” operational (or business) dimensions from technological (or software) functions and structures. In previous work, we have defined the “legacy problem” as the uncritical replication of legacy systems in the requirements for applications that supersede them [6]. Such replication is intended to minimize the changes to business processes which were shaped by the technological constraints of those same legacy systems. Government organizations are typically unable or reluctant to move away from anachronistic work practices defined by and embedded in legacy IT systems because the rationale for them has not been made explicit. For instance, Lauder and Kent [50] acknowledge “implicit business processes” as a legacy systems pattern, while Edwards and Millea [25] cite embedded business knowledge as one of four typical legacy issues that plague organizations. Furthermore, the business processes and practices embedded in legacy technologies are often uncritically accepted and “legitimized,” and they become an important source of requirements for future software applications.

The most explicit framing of the dynamics described so far is offered by Homburg [40] in his analysis of the national trajectories of digital government development. Homburg articulates the legacy problem in stating that “specifically mainframe technologies tended to be applied in such a way that they replicated the formal structures that already existed in classical bureaucracies.” He cites Nohria and Berkley [62]: “computer systems and software adopted the ‘architecture of bureaucracy.’ Not surprisingly the language of information systems became the language of bureaucracy.” This statement is not dissimilar to Conway’s Law, which, in an historical context of bespoke greenfield software development, states that software tends to replicate the structure of the organization which created it [19]. In adopting this perspective, transitioning from legacy systems is a critical step not just for technological modernization, but also in the sense of organizational and, even more, of civic and political transformations, as this step absolutely impacts the bureaucratic architecture of government agencies. If organizations in the government sector are still rigidly hierarchical, with formalized decision-making processes, rather than flat, flexible, collaborative and cross-functional entities [39], it is foreseeable that they will gravitate toward preservation of the systems that fit their culture and structural composition.

In their investigation of the innovation dynamics in public agencies, van Duivenboden et al. [79] argue that there are numerous environmental factors which stifle innovation and change in public administration, and cause government operational managers and staff to generally refrain from straying from established processes and workflows. These include lack of freedom to experiment, general aversion to risk, a punitive reaction to making mistakes, and no meaningful rewards provided when challenges are overcome. So, even if public employees see the benefits of departure from a legacy system, they may not choose the route of change, or might approach it conservatively, if a positive outcome is not guaranteed and a potential failure could be exposed by the media or by critics as yet another example of government incompetence and waste. The common denominator observed in most justifications for extending the operational models embedded in legacy systems is that change is just too risky. In fact, some organizations will make a substantial effort to prolong the life of a legacy system in various ways, with more radical changes entailing new systems implementation or development deemed too intimidating [20]. The risks associated with potential project or software failures and budget overruns during legacy software replacement are assumed to outweigh the benefits of the new systems and/or business models being introduced. Risk is usually defined as the possibility of loss expressed probabilistically [76], but often the risk discourse occurs in an ad-hoc manner [26] and no systematic or objective analysis to assess the potential for losses is actually undertaken. In such instances, the potential risks discussed by IT or business managers could be anecdotal, understated, overstated or mis-stated: what is communicated as risk might be a general feeling of discomfort, or fear of change instead. 
Ryan [74] confirms the prevalence of the affective heuristic, explaining that by default “humans possess a negativity bias” in which the potential for a loss is considered worse than the prospect of winning. In government organizations, the negativity bias is embedded in the institution’s policies and rules and is hence exacerbated by bureaucracy. This translates to situations in which potentially valuable information systems initiatives are stifled because their novelty or magnitude conjures up images of unknown and negative outcomes. Instead, a preservation of the status quo, or the legacy, is preferred.

We identified a dearth of academic publications dedicated explicitly to the significance of requirements practices in government agencies and their unique challenges in the context of digital government and also of research in the requirements engineering field which deals specifically with tools and methods to overcome the legacy problem. To bridge this gap we conducted a survey and a series of qualitative interviews [6] to explore the extent and dynamics of the legacy problem in government agencies, the insights from which informed our approach to developing a gamified tool to be applied in the context of legacy system replacement in government agencies. Key findings from those studies indicate that practitioners tend to use the descriptions of features of legacy systems as requirements for the new technologies that are supposed to replace them, motivated primarily by the wish to minimize the risk associated with changes to business processes. Also, two main practitioner personas appear to emerge during legacy replacement projects: those that try to preserve the operational status quo, and those in favor of introducing business process innovations. Therefore, whether a legacy replacement project will adopt the legacy system model as a set of requirements for the new system or promote new features and functionality depends on organizational dynamics along those personas, underpinned by the hierarchical position of key project actors, as well as on the quality of engagement in the requirements definition and analysis process. Moreover, adherence to traditional project management and formal requirements analysis practices (which are rarely applied: [43]) does not guarantee consideration of alternative approaches to the business model imposed by legacy systems, and project teams have little or no incentive to be creative during the legacy system replacement requirements analysis process.

Some of those findings are also echoed by Milne and Maiden’s analysis [58], which indicated that requirements engineering activities are perpetually impacted by organizational politics and power relationships. More importantly, key requirements and high-level goals are originally “constructed through a political decision process” [58], so their questioning might be construed by organizational practitioners as a subversive act per se. Therefore, the requirements engineering discipline must incorporate recognition, analysis, and sensitivity to organizational politics and conflict in order to support the elicitation, analysis and management of better requirements. However, research approaches like ethnography or social network analysis, to name just a few, are too time-consuming and even considered “intrusive” [58], so that alternative approaches from other domains must be sought to aid with the conflict and power dimensions of the legacy problem.

In summary, our previous research has uncovered that legacy replacement projects are inherently dialectical, with two general lines of disagreement prevalent throughout the requirements phase: the innovation stance and the risk-averse legacy preservation stance. Therefore, alongside negotiation, conflict resolution, and sensitivity to organizational politics, which are common to most requirements engineering endeavors, there is an additional need to enable creativity and imagination [12], while removing any inherent bias toward those two core dialectic positions.

Looking at negotiation, a number of requirements negotiation techniques have been established for the purpose of assisting discussions and mollifying conflict surrounding the selection and validation of functional requirements. Techniques such as WinWin [7] or the Requirements Negotiation Spiral Model [3] focus on identifying conflicting requirements, developing requirement acceptance criteria and requirements alternatives, and moderating disputes and deliberations. A well-known conceptual model developed to support the deliberation and reasoning process during design and requirements negotiation is the Issue-Based Information System (IBIS) authored by Kunz and Rittel [49], upon which the software platform gIBIS was developed [18]. IBIS is specifically equipped to tackle wicked issues—problems that are highly complex, intractable, often incomplete and difficult to define. It makes these issues explicit and enables participants to put forth arguments for and against certain positions. According to ([18] cited in [65]), it discourages “unconstructive rhetorical moves, such as arguments for and against certain positions,” while fostering constructive engagement focused on central issues and supporting evidence. The benefits of the software platform are that factors related to peer-pressure and “power moves” during face-to-face meetings are removed from the discussion, thus allowing participants to focus on the essentials. Other IBIS-related tools are Compendium [75] and Dialogue Mapping [17], which introduce the ability to visually represent diverging viewpoints, new ideas and decisions reached, thus mapping the interactive process of group discussion over a topic that needs action-based closure. Dialogue Mapping includes markers for questions, pros, cons, and ideas, while Compendium also introduces the concepts of notes and decisions. 
Therefore, such tools could be applied to the discussion of business requirements and open up the possibility for eliciting divergent ideas and attitudes toward “legacy-leaning” features. However, the proper use of these platforms and their notation elements is dependent on a skilled moderator or note-taker [17] and contingent upon sufficient engagement of all stakeholders when the meetings they document are in person or not anonymous. In such cases, specific political aspects would also be a challenge.

Therefore, despite their numerous benefits, negotiation techniques and tools alone do not inherently promote participation and engagement in requirements analysis activities, encourage creativity and imagination, or ensure that negotiations are not inherently biased toward either of the two core dialectic positions of the legacy problem. As a consequence, in our research we wish to go beyond negotiation to fill in this gap, by considering gamification and serious games as a way to both solve business problems and encourage participation and innovative thinking. Our approach is novel and a clear departure from prevalent business analysis and application development methodologies in government agencies, which tend to be standardized, well-established and highly structured, e.g., waterfall approaches to the systems development life-cycle [41, 66], capability assessments, workflow process analysis, standard systems specifications, and so forth—experimental techniques and innovative approaches remain rare.

The development and use of games and game-like simulation for learning, collaboration, knowledge-sharing and training is a relatively recent trend. This movement has been referred to as “serious games” or “serious gaming” [15]. In government, while occasionally attempted [13, 21], game utilization is still rare, despite evidence of the benefits of games and simulations for addressing a wide range of problems in various domains.

By establishing an environment that is “quasi-realistic” [47], games allow for actual business situations to be simulated. The advantages of using simulations have been highlighted by researchers, who have argued that participants in a simulation may be more proactive and experimental because the simulated context provides a “safe” space to try novel approaches [55, 65]. Safety in this context has dual significance: as safety to err, and as freedom from organizational or inter-personal pressures. Specifically, Ocker [65] highlights the benefits of anonymity in electronic brainstorming, resulting in a non-judgmental environment, conducive to risk-taking. A related study [22], examining the effects of communications technologies on requirements negotiations, revealed that despite the extolled advantages of the rich medium of in-person interaction, many meeting participants focused better on the tasks at hand when using technologies such as video-conferencing, with other participants not physically in the room. This is due to participants’ perceiving their partners as “less emotional” when they are interacting from a remote location. In the study of serious games, anonymity has been linked to increasing the inclusivity of a game and enhancing its educational potential [51]. The relationship is explained in part by the relief from performance pressure brought about by withholding one’s identity. The focus is transferred from the individual performing game actions to the game actions themselves. Also, the respondents to a survey we conducted, reported in [6], indicated that the position of key project actors in the organization’s hierarchy impacted the quality of participation in requirements activities, with project participants being less likely to be critical of the direction of a legacy system replacement project during meetings with executives and senior management present.
Therefore, we have deemed anonymity a potentially critical element common in the design of both serious and entertainment-focused games, which is worth considering in the legacy problem solution space.

Another core characteristic of games is the element of competition, or a dialectical dynamic, where the instinct to win or out-do an opponent is an accepted and benign form of behavior [55]. In contrast, in other contexts, disagreement, aggression and similar conduct may be discouraged and considered unprofessional.

The affective components of a game provide additional value to the exploration of the organizational dynamics we are interested in. Systems implementation activities do involve emotional aspects [61], and ascribing risk to certain requirements specifications for application development is certainly rooted in affect [76]. Maiden et al. [54] also note the importance of letting participants “let off steam” and have “shouting sessions” prior to engaging in creative brainstorming, as this removes inhibitions and accumulated frustrations, enables teamwork and an unencumbered perspective on the business problems discussed.

In summary, we argue that participative safety, competitive drive, emotional impact, and stimulation of creative solution development are key advantages of game-like methods pertinent to the legacy problem in digital government and to requirements engineering activities for legacy replacement projects in the public sector. The emphasis on competition and argument in a game setting could pair well with the nature of the legacy problem, as one involving a juxtaposition of conservatism and business transformation. Hence, a dialectically designed game should enable opposing positions to be made explicit as part of the goal of the game. Also, by incorporating game actions, rules and outcomes that express, or result directly from, affect, the feelings that arise in requirements activities and in exercises in innovation would be addressed explicitly.

There are only a few examples of the application of games to the requirements engineering domain, which are aimed at learning, creativity and problem-solving. Games such as Prune the Tree (for the creation of a product roadmap through requirements development) and Buy a Feature (for the prioritization of requirements in product releases), described in detail by Ghanbari et al. [32], have demonstrated some success in fostering innovation and collaboration in distributed teams, and in improving the quality and quantity of elicited software requirements. The Refine platform has focused on the benefits of crowdsourcing for requirements gathering [77] and the GREM model explores gamifying requirements elicitation in Agile processes [53]. In a similar vein, our research aims to evaluate the potential of games to augment requirements activities, but it addresses an issue which has hitherto been explored only sparingly: the development of requirements engineering tools specifically for legacy systems replacement efforts, and the application of game-driven incentives for stimulation of engagement and creativity in these efforts. We apply gamification to the requirements analysis phase and attempt to promote discussion along the themes of innovation and status quo preservation, thus establishing a technique specific to legacy system replacement projects in an attempt to address this research gap.

3 Methodological approach

Our research objective is to evaluate the utility of a game enabling the structured discussion of requirements along the themes of risk aversion (legacy preservation) and innovation, to foster participation and creativity in business (functional) requirements analysis during legacy system replacement projects. Through this evaluation we seek to answer the following questions:

  1. To what extent will a discussion organized specifically along the lines of explicit challenges to requirements lead practitioners to examine them more critically and to subject the legacy business model to explicit questioning?

  2. How can game mechanics impact participants’ creativity and engagement in such discussions?

Although relatively new, games have been employed as a research mechanism in psychological studies where emotions, cognitive processes, behavioral triggers and stimuli, and individual or group conduct are being observed [42]. While such studies often take the form of an experiment with an underlying causal model which must be tested—for example the study of requirements gamification in an Agile project followed a controlled experimentation method and determined that gamification does improve engagement and creativity in requirements elicitation [53], our approach is that of an exploratory examination of how a game environment impacts practitioner participation in the deliberation of functional requirements. Therefore, we have adopted what is referred to by Oates [64] as “design and creation research” or the offering of a working system that instantiates models, constructs, or methods, as a contribution to knowledge. This type of research corresponds to what Nunamaker et al. [63] classify as formulative and developmental research, or the creation of an artefact used to test underlying concepts or models—in this case the gamification of a requirements argumentation and deliberation model for legacy systems replacement. This research assumes subsequent epistemological cycles of design and evaluation: in the following, we summarize both design and evaluation research phases.

3.1 Game design

The non-domain specific Triadic Game Design framework [38] supported our game design process, and its principles were used as high-level design goals. It distinguishes between three main areas: ontological, semiotic and ludic. The ontological aspects of a game encompass the underlying model of the real-world domain the game is based on. The semiotic design incorporates the elements and approaches that make the game meaningful and generate lessons and useful information that can be transferred to the “real world.” The ludic aspects refer to the techniques by which a game is made interactive, challenging, fun and immersive. Well-designed games achieve a balance between these elements: without a strong ontological base, a game would have limited connection to the real world; without the semiotic emphasis, the game would be mostly fun, but not enable knowledge transfer; and without the ludic elements, the game would be merely a training or simulation tool [56].

With the Triadic Game Design principles in mind, a basic mapping between game elements, requirements engineering concepts and organizational goals was developed. It was essential to introduce game elements purposefully and to associate them with learning or pragmatic outcomes. This mapping helped in the evaluation of the utility of individual ludic concepts first at the design phase of the game, and next at the stage of assessing a functional game prototype. The mapping is presented in Table 2 at the end of this section, after the description of the game’s rules, roles and flow.

3.1.1 Game conceptual model

In our attempt to address the legacy problem, we integrated the themes of legacy preservation and innovation within the inquiry-driven process defined by Colin Potts et al. [68], which emphasizes the act of challenging and iterative discussion of existing requirement formulations. When requirements are derived from a legacy system, it is important to specifically analyze their linkage and similarities to the legacy system features, and seek justifications for their mimicking in the new system. Therefore, an inquiry process wherein a requirement is subjected to a deliberate challenge of its source (the legacy system) and rationale (e.g., the minimization of risk and change) appears to be a suitable approach to tackle the legacy problem. Potts et al.’s [68] Inquiry Cycle Model (represented in Fig. 1) offers concepts to support such an analysis and argumentation process. It defines an inquiry-driven cycle, where the concept of challenge involves scrutinizing a requirement: one must answer questions regarding the need for the requirement in its current form, and the reasoning behind it must be made explicit. This forms the basis of discussion, after which a decision can be reached as to whether and how the requirement should be modified. The decision to gamify the inquiry cycle was driven by the need to encourage non-conflict based competition between the legacy and innovation perspectives and to ensure that participants respond actively to the challenges by producing alternative requirements formulations. And since the findings from our survey [6] highlighted a disenchantment with traditional methods and forms of discussion prevalent in the workplace, we decided to develop and put to the test a concept and tool that is in itself novel and innovative.

Fig. 1 Potts et al.’s inquiry cycle model (re-drawn based on an image from https://www.ics.uci.edu/)

Game concepts such as rules and roles were defined to encourage competition, challenges and innovative thinking, and mitigate the risk that the discussion may not follow the prescribed themes or reach meaningful outcomes. We followed Maiden et al.’s lead in introducing roleplay to induce creativity during requirements discussions. Maiden et al. utilize a set of roles defined originally by Von Oech [80], to enable practitioners to channel their creative energies while performing requirements engineering activities. Our game purpose is analogous in encouraging participants to think beyond the status quo, by exploring alternative perspectives, but our game roles correspond to actor stereotypes commonly manifested in government bureaucracies.

Our game was meant to focus the participants’ discussion around the legacy and innovation viewpoints specifically, and in order to decide on the ultimate mechanism to achieve such focus, we put the game through several design reviews and playtest iterations with various practitioner groups. Eventually, after establishing the game model, we implemented it on a popular project management and issue tracking software platform, JIRA, developed by Atlassian. The game was named RE-PROVO, from the word “provo” which means test or attempt in the international language Esperanto, with “RE” serving both as a nod to “Requirements Engineering” and as an indication of repetitive action.

3.1.2 Game elements: roles, rules, and flow

The basic elements and principles of RE-PROVO gameplay are as follows. The business requirements related to a government legacy system to be replaced are entered one by one as separate discussion threads in an online repository. Each player is assigned to be either a “Heritage Keeper” or an “Innovator.” All players need to review the requirements. Those in the role of Heritage Keeper must issue a challenge to the requirements they think depart too much from the operational status quo and/or are too risky for implementation. Those in the role of Innovator must issue challenges to the requirements which too faithfully reproduce legacy workflows and features, and thus do not take advantage of digital innovation to streamline operations.

An example of the screen for an individual requirement is provided in Fig. 2.

Fig. 2 Requirement screen in RE-PROVO

When a challenge is issued, the player must state the reasons the requirement is being critiqued, either by a free-form comment or by selecting one out of a pre-defined checklist of issues (see Fig. 3). The checklist was designed based on secondary research [2, 27, 28, 30] and the outcome of a survey [6] which asked practitioners to identify key issues that can potentially occur during legacy system replacement projects. Two lists were then produced, based on the heritage versus innovation viewpoints (see Table 1).

Fig. 3 Innovation challenge screen

Table 1 Innovation and heritage challenge justifications

Re-using the categories from the survey was intended to assist players with the formulation of the challenges and as guidance on what type of issues one can look for in a requirement. An example of how they are used in the game interface is provided in Fig. 3.

After a requirement challenge, any player can respond to the challenge by proposing a modification to the requirement, i.e., by “morphing” it in a way that addresses the issues put forth in the challenge. Morphings can be challenged too, thus potentially producing a chain of modified requirements from the initial requirement. At the end of an agreed-upon timeframe (e.g., two weeks), the players vote on all the proposed requirement morphings, and those with the most votes become the winning versions of the requirements.
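The challenge, morphing and voting steps described above can be sketched as a small data model. This is only an illustrative sketch, not the RE-PROVO implementation (which was built on JIRA): the class names, the identifiers `REQ-1`, `CH-1`, `M-1` and so on, and the rule of keeping all tied morphings as winners are assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    challenge_id: str
    target_id: str   # requirement or morphing being challenged
    viewpoint: str   # "heritage" or "innovation"


@dataclass(frozen=True)
class Morphing:
    morphing_id: str
    challenge_id: str  # the challenge this morphing answers
    text: str


def root_requirement(morphing_id, morphings, challenges):
    """Follow morphing -> challenge -> target until an initial requirement is reached."""
    current = morphing_id
    while current in morphings:
        current = challenges[morphings[current].challenge_id].target_id
    return current


def winning_versions(votes, morphings, challenges):
    """Map each initial requirement to its most-voted morphing(s); ties are all kept."""
    tally = Counter(votes)
    by_root = {}
    for morphing_id, count in tally.items():
        if morphing_id not in morphings:
            raise KeyError(f"vote cast for unknown morphing {morphing_id!r}")
        root = root_requirement(morphing_id, morphings, challenges)
        by_root.setdefault(root, []).append((count, morphing_id))
    winners = {}
    for root, scored in by_root.items():
        best = max(count for count, _ in scored)
        winners[root] = sorted(m for count, m in scored if count == best)
    return winners


# Hypothetical chain: REQ-1 is challenged, morphed into M-1, and M-1 is challenged in turn.
challenges = {
    "CH-1": Challenge("CH-1", "REQ-1", "innovation"),
    "CH-2": Challenge("CH-2", "M-1", "heritage"),
}
morphings = {
    "M-1": Morphing("M-1", "CH-1", "first reformulation"),
    "M-2": Morphing("M-2", "CH-2", "second reformulation"),
}
votes = ["M-1", "M-2", "M-2"]  # one entry per vote cast
winners = winning_versions(votes, morphings, challenges)
print(winners)
```

Grouping the tally by the initial requirement lets morphings of morphings compete with their ancestors for the same requirement, which is one possible reading of “a chain of modified requirements.”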

A summary view illustrating the requirement morphing cycle in RE-PROVO is presented in Fig. 4.

Fig. 4 RE-PROVO requirement morphing cycle

In attempting to mimic Potts et al.’s Inquiry Cycle, the game’s ultimate goal is to establish at least one discussion iteration for each requirement, i.e., to ensure that a morphing cycle has commenced with a challenge and is “closed” with a proposed morphing, or answer. An example from one of our practitioner evaluation sessions is provided in Fig. 5, where the Requirement “Crime Stats: Online Access” (with ID LEIS-6) has been challenged once with a Heritage challenge (LEIS-13) and once with an Innovation challenge (LEIS-12). Two separate morphings based on the Heritage challenge were then produced: LEIS-19 and LEIS-17.
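The goal of at least one closed cycle per requirement can be expressed as a simple check. The sketch below encodes the LEIS-6 example as described in the text; the status labels `unchallenged`, `open` and `closed` and the function names are assumptions made for illustration, not terms used by the game itself.

```python
# Challenge id -> (challenged item id, viewpoint), as described for LEIS-6.
challenges = {
    "LEIS-13": ("LEIS-6", "heritage"),
    "LEIS-12": ("LEIS-6", "innovation"),
}

# Morphing id -> id of the challenge it answers.
morphings = {
    "LEIS-19": "LEIS-13",
    "LEIS-17": "LEIS-13",
}


def challenges_for(requirement_id, challenges):
    """List the challenges issued directly against one requirement."""
    return sorted(c for c, (target, _) in challenges.items() if target == requirement_id)


def cycle_status(requirement_id, challenges, morphings):
    """Return 'unchallenged', 'open' or 'closed' for one requirement.

    A cycle counts as closed once at least one of the requirement's
    challenges has been answered by a morphing.
    """
    own = challenges_for(requirement_id, challenges)
    if not own:
        return "unchallenged"
    answered = set(morphings.values())
    return "closed" if any(c in answered for c in own) else "open"


def unanswered_challenges(requirement_id, challenges, morphings):
    """List challenges against a requirement that no morphing has answered yet."""
    answered = set(morphings.values())
    return [c for c in challenges_for(requirement_id, challenges) if c not in answered]


print(cycle_status("LEIS-6", challenges, morphings))
print(unanswered_challenges("LEIS-6", challenges, morphings))
```

Under these assumptions LEIS-6 has a closed cycle through its Heritage challenge, while the Innovation challenge LEIS-12 remains unanswered.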

Fig. 5 Requirement challenge and morphing activity

Players accumulate points and badges for their activity. This practice is a standard mechanism in gamification to reward players both for overall engagement and for specific behaviors [23, 29, 81]. Points are awarded when a player issues a challenge, creates a requirement morphing, posts a comment, creates a new requirement, or ranks or votes on any object. Badges are awarded either for consistent actions (e.g., for creating mostly morphings, or for numerous comments), or when specific point levels are reached. The points and badges are listed in a section of RE-PROVO visually represented using a pirate theme, the default theme in the gamification plugin utilized for the JIRA platform: the players are pirate characters on a mission (see Fig. 6).
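The reward rules above can be illustrated with a short sketch. The rewarded actions come from the description; the point values, thresholds and badge names are hypothetical, since the actual configuration of the gamification plugin is not reported here.

```python
from collections import Counter

# Hypothetical point values: the text lists the rewarded actions, not the amounts.
POINTS = {
    "challenge": 10,
    "morphing": 15,
    "comment": 2,
    "requirement": 5,
    "rank": 1,
    "vote": 1,
}

# Hypothetical thresholds and badge names.
LEVEL_THRESHOLDS = {"Level 1": 20, "Level 2": 50}
COMMENT_BADGE_MIN = 5


def score(actions):
    """Total points for a list of action names."""
    return sum(POINTS[action] for action in actions)


def badges(actions):
    """Badges for consistent behavior and for reaching point levels."""
    counts = Counter(actions)
    earned = []
    # "Mostly morphings": morphings make up more than half of all actions.
    if actions and counts["morphing"] * 2 > len(actions):
        earned.append("Morpher")
    # "Numerous comments": at least COMMENT_BADGE_MIN comments.
    if counts["comment"] >= COMMENT_BADGE_MIN:
        earned.append("Commentator")
    total = score(actions)
    earned += [name for name, minimum in LEVEL_THRESHOLDS.items() if total >= minimum]
    return earned


player_actions = ["challenge", "morphing", "morphing", "morphing", "vote"]
print(score(player_actions), badges(player_actions))
```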

Fig. 6 Pirate character screen

Table 2 maps the main RE-PROVO game elements to requirements engineering concepts and organizational outcomes based on the Triadic Game Design framework. The ontological concepts are borrowed from the official curriculum of the International Requirements Engineering Board (IREB) as documented in Requirements Engineering Fundamentals [67]. The intent of the mapping is to demonstrate how a game component (in the “Ludic Element” column) approximates or simulates a requirements engineering concept (under “Ontological Elements”), and to highlight an activity or skill which could potentially be employed or affected as a result of engaging in the roleplay, game moves or actions, i.e., this is the area where a “meaningful effect beyond the game experience can be intentionally achieved” [48].

Table 2 Triadic game design element mapping

3.2 Game evaluation

RE-PROVO was made available online to teams of practitioners from two separate local government organizations with either ongoing or past legacy replacement projects. The two evaluations were designed to assess whether RE-PROVO could successfully enable a structured discussion of requirements along the themes of risk aversion (legacy preservation) and innovation, and foster creativity in business (functional) requirements analysis and development during legacy system replacement projects. Since the context of our research is government organizations, we selected practitioners from two public sector agencies who had been, or were involved, at the time of the evaluation, in projects related to the replacement of legacy technologies. The aim was to carry out two evaluations in different organizations so that the results could be compared—any consistent findings across both groups would be of greater significance considering they emerged from two appreciably different agency environments. In addition, it was important to achieve fidelity to the actual work environment of the participants, and this was accomplished by enabling them to play the game remotely and asynchronously from their own working environment, rather than in a controlled setting with an observer present.

The first group of practitioners was from a public library institution, hence employees of a public sector organization. With libraries frequently operating large-scale legacy systems which have reached their end-of-life, replacement projects are often underway. In this first evaluation of RE-PROVO, requirements for a new Integrated Library System replacing legacy cataloguing and patron management software were the subject of discussion. The second evaluation was conducted with employees from a different government agency and a substantially different domain—public safety and law enforcement. The requirements included in the game were from applications related to crime analytics, evidence management, incident records, and frequent offender lists. This second evaluation incorporated lessons learned from the first evaluation session. Details of both evaluations are provided in the next section.

As the game evaluations were being planned and prepared, we needed to address the question of whether requirements from real projects carried out at the participating organizations or requirements associated with hypothetical IT legacy systems should be used in the game. This question echoes the concept of “task fidelity” in experimental research, which posits that an evaluation or an experimental setting must be as realistic as possible for its findings to be of utility [9]. In a hypothetical scenario, an assumption could be made that participants might be more at ease when issuing challenges and critiquing the requirements, because this would not imply questioning actual system setup or management decisions at their organization. However, a potential drawback is that the players may not feel they have a sufficient understanding of the hypothetical system. On the other hand, in a real project scenario, the participating organization may not be willing or able to share project information, or the researcher administering the game may not be able to properly re-formulate, group or edit the requirements so they can be used in the game, due to lack of familiarity with the domain, the system or the organizational context. In the end, we tested both approaches. In particular, for the first evaluation with public library practitioners, the chosen requirements were for a hypothetical Integrated Library System, while for the second evaluation with the law enforcement agency, the requirements were from actual agency projects. In both cases, participants were not equally familiar with or involved in the chosen projects.

For the purpose of the evaluations, the JIRA software was licensed for ten users and installed on a self-hosted server by the researcher. A custom domain, www.egov-requirements.org, was used to access the RE-PROVO game. The players logged onto the system under fictitious usernames pre-defined by the game administrator to anonymize their identity to the other players and were provided an initial password which they could later change.

We collected game metrics, obtained automatically during gameplay (e.g., number of logins, number of challenges issued, number of morphings created, number of comments, number of votes). These metrics aimed to gauge participants’ level of activity and therefore assess the game’s ability to engage. As a second evaluation step, we obtained qualitative feedback through semi-structured follow-up interviews with participants, which addressed questions related to the utility of the two themes (heritage preservation and innovation) and of the various game elements to stimulate participation and creativity.
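As an illustration of how such activity metrics can be derived, the sketch below tallies a list of gameplay events per session and per player. The event log format, the usernames and the event names are hypothetical; the text does not describe how the platform stored or exported this data.

```python
from collections import Counter

# Hypothetical event log: (anonymized username, event type).
EVENTS = [
    ("player01", "login"),
    ("player01", "challenge"),
    ("player02", "login"),
    ("player02", "comment"),
    ("player01", "login"),
    ("player02", "morphing"),
    ("player03", "login"),
    ("player03", "vote"),
]

# The metric types named in the text.
METRICS = ("login", "challenge", "morphing", "comment", "vote")


def session_totals(events):
    """Count each metric across the whole session, including zero counts."""
    counts = Counter(kind for _, kind in events)
    return {metric: counts[metric] for metric in METRICS}


def per_player(events):
    """Count each event type separately for every player."""
    table = {}
    for player, kind in events:
        table.setdefault(player, Counter())[kind] += 1
    return table


totals = session_totals(EVENTS)
by_player = per_player(EVENTS)
print(totals)
print({player: dict(counts) for player, counts in by_player.items()})
```

Reporting zero counts explicitly matters in this setting, because the absence of an action (for example, no morphings in a session) is itself a finding.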

To supplement the practitioner evaluations, we also evaluated RE-PROVO using the Serious Games Design Assessment (SGDA) framework developed by Mitgutsch and Alvarado [59], which regards serious games as purpose-based games where entertainment is not the end goal, and where educational or business objectives need to be ostensibly incorporated in all game elements. SGDA includes the evaluation of various aspects of a game—content/information, framing, mechanics (rewards, rules etc.), fiction (narrative/roles) and aesthetic/graphics, and is commonly used to assess if a game is properly designed and could produce knowledge, behaviors and attitudes that are transferable outside its ludic context into the workplace/real world. We used the SGDA framework to examine how the miscellaneous game design features performed holistically, that is whether they were effective when working as a cohesive set.

4 Procedures and results

4.1 Evaluation session with Broward County Library

The first evaluation was carried out with employees from the Broward County Library (BCL), a public institution funded by Broward County in Florida, the United States. The recruited participants included nine individuals at different seniority levels in the organization, ranging from interns to heads of departments.

The requirements for the game were not based on an existing project at BCL because at the time of the evaluation there was no active system replacement effort that could be co-opted as the basis for a RE-PROVO session. However, the organization was planning to replace its Integrated Library System (ILS) in the near future. With that in mind, we considered a collection of surveys submitted by libraries across the world regarding their consolidated information systems, in which specific issues and experiences related to legacy replacement were shared, together with the transition processes from one type of software to a newer ILS [11]. Academic case studies on library software implementation [36] were also considered. From those sources, requirements were formulated for the purpose of the game. These described authentic challenges specific to the domain of library management and library information systems, and hence were likely to be familiar to the participants in the game evaluation. Requirements were listed alongside a short problem statement and organizational/business context description, aimed at supporting the understanding of the requirements and minimizing any ambiguous interpretations by the players. Roles were assigned to the players randomly.

Communication with the majority of the participants was primarily by electronic means. No group meetings or orientation sessions were organized due to the time constraints the practitioners had as working professionals.

Instead, participants were emailed information about the research, gameplay instructions, their role assignment, and their anonymized user name and password.

The game session was set to take place over a two-week period, but due to low activity levels, the gameplay period was subsequently extended by a week. During this timeframe, the participants would log into the game whenever they decided to. The first author, acting as game administrator, was available by email or phone if assistance was needed.

4.1.1 Participant feedback

Out of the nine players, two never logged in or participated in the game itself. The participants logged in a total of 32 times, with most participants logging in three to four times, and two being significantly more active. Five challenges were issued, but no morphings were generated, and no requirements or challenges were ranked using the star ranking feature. A call to vote on requirement versions was not issued, because there were no morphings available to be voted on.

The low level of participation in the game was initially attributed to low interest in the research project, or to the participants’ lack of spare time to conduct the game evaluation. However, post-game face-to-face interviews were performed with five of the practitioners and a different assessment emerged. The interview questions are listed in Table 3.

Table 3 Participant questions in BCL evaluation session

Here is a summary of the responses.


Engagement and level of participation

The interface was overloaded and confusing to most participants. They felt it was busy and they did not know where to start—as one participant noted “I could see where to read things, but not where to react to [them].” Another stated: “Components everywhere [that] didn’t relate to each other.” The unfamiliar layout left them confused and unable to take actions within the game: “It was busier than I thought it would be, [there were] a lot of places to look.” This was the primary reason for their lack of activity in the game. Additionally, they felt the instructions they were provided with were too lengthy and too extensive to peruse: one interviewee in particular commented “I tend to be a direction reader—but they [the directions] were long though.” Some felt they should have been "hand-held" more, and an in-person session would have substantially improved their understanding of how the game is organized and should be navigated—in the words of a participant: “it would have been better if you met with us.”

Not many players visited the tab with the points and badges. It was noted that in order for the points and rewards to have a tangible influence, they must be immediately visible. One player suggested: “People didn’t see it—it wasn’t obvious. I would change the opening screen to show the point total for me versus someone else.”

When asked why no requirement morphings were generated during the game, the participants overwhelmingly stated that it was not clear how to do so or, in one case, that the formulations of the requirements sounded too “authoritative” to be questioned or modified.


Innovation and heritage preservation themes

The overall concept of a structured discussion with heritage preservation and innovation roles and their respective challenge actions was well-received. The players could envision how, with a better visual layout, RE-PROVO would be very useful for their organization, since they acknowledged that individuals typically do gravitate toward either an innovative or a risk-averse persona. According to one participant, “it makes a lot of sense from a theoretical perspective, because people tend to be divided along those lines.”

A number of participants also felt they were not at ease with the concepts surrounding Integrated Library Systems. Even though the features listed were fairly generic, participants who had not actively used such technology were hesitant to issue challenges and suggest requirement modifications for it. The players who were interns at BCL were particularly reluctant to make suggestions given their lack of experience with library operational processes, or in the words of a participant: “[I felt] nervous—because we did not know a lot about ILS—the description was good but I felt uncomfortable.”


Induction process and user interface issues

A key insight from this game session was the importance of a proper participant induction, with some participants expressing a preference for face-to-face communication. Induction was seen as essential to ensure proper understanding of the purpose of the evaluation and of how to play the game. Even though email was their preferred mode of communication initially, as it was seen as a time-saver, it turned out to be insufficient as a single mode of communication.

Furthermore, from the post-game interviews, it became evident that the morphing and challenge dialogue menu did not encourage players to type in their own critiques or new requirement formulations. As a result, a modification was made to the RE-PROVO interface to prompt users specifically to define challenges in their own words, rather than just use problem categories from the pre-defined checklist. Similarly for morphings, the text of the initial requirement was not repeated in the morphing dialogue to encourage more creative reformulations.

By and large, the interviews with participants from BCL helped shape the communication and game instructions materials which were developed for the subsequent evaluation, where clarifications on the challenge and morphing concepts were included along with a statement that all constructive comments were safe to make in the ensuing discussion during the game: in-depth technical or business knowledge was not needed in order to pose challenges or suggest a reframing of a requirement. Players were also assured they could state their own assumptions about business processes. In other words, participants needed to be encouraged to be creative and be assured that there are no right or wrong answers.

4.2 South Florida Police Department

The second evaluation was carried out with non-sworn (civilian) employees from a Police Department (PD) in South Florida, the United States. The recruited participants included six individuals working in different units of PD: crime analysis, information technology services and the field technology team. This second evaluation had the same goals as the first session, but introduced minor modifications to the participant induction process, as a result of the feedback from the session with the BCL team.

The requirements for the game were derived from ongoing projects at PD which involved the replacement of either a legacy application or a legacy operational process with new technology. The majority of the participants had first-hand knowledge of these projects, but even those who were not directly involved in them had a basic understanding of the issues with the legacy software and the underlying business processes which were referenced. As in the BCL evaluation, the requirements were listed alongside a short problem description, which was intended to limit ambiguity of interpretation by the players.

Participants were emailed information about the research and instructions on how to play the game, but communication for the purpose of coordinating the game session was done in person by the first author, who also provided participants with a hard-copy “cheat-sheet” to guide them through common game actions and rules, and with a personalized hard-copy handout of the player’s role and their anonymized user name and password. The game session was initially set to take two weeks, but due to a slower than expected start in the first week the gameplay period was extended by a third week.

4.2.1 Participant feedback

The participants in the PD evaluation were generally more engaged in the game compared to the BCL participants. All players logged in several times and participated in the game by performing different actions. They accessed the game a total of 43 times, and nine challenges and three morphings were issued. The challenges and morphings, however, did not necessarily conform to the intended format: some of the critiques were generic, rather than specifically formulated to point out the adherence of a requirement to the legacy model, or a risky departure from it. Only one challenge and one morphing were ranked. A call to vote was issued, even though there was only a small number of morphings created and available to be voted on. During the session, one player remarked that they could tell the identities of the other players by hovering over a specific section of their profiles, and viewing the email addresses displayed as an alt-tag. After this was revealed, measures were taken to properly anonymize the players and the email addresses were changed to generic addresses which did not disclose the users’ identities.
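The activity figures reported for the two sessions can be put on a common per-player footing with simple arithmetic. The sketch below uses only the counts stated in the text; the per-active-player ratios are derived here purely for illustration, and with groups this small they are descriptive rather than statistically meaningful. Treating the seven BCL players who logged in as the “active” group is an assumption of this sketch.

```python
# Counts as reported in the text for each evaluation session.
SESSIONS = {
    "BCL": {"recruited": 9, "active": 7, "logins": 32, "challenges": 5, "morphings": 0},
    "PD": {"recruited": 6, "active": 6, "logins": 43, "challenges": 9, "morphings": 3},
}


def per_active_player(session, metric):
    """Average count of a metric per player who logged in at least once."""
    return round(session[metric] / session["active"], 2)


summary = {
    name: {
        metric: per_active_player(data, metric)
        for metric in ("logins", "challenges", "morphings")
    }
    for name, data in SESSIONS.items()
}

for name, ratios in summary.items():
    print(name, ratios)
```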

Five of the six participants discussed their RE-PROVO experience with the first author. The Police Department participants were asked the questions summarized in Table 4.

Table 4 Participant questions in PD evaluation session

All participants in the evaluation stated that playing the game was a positive experience, and they thought RE-PROVO was a useful tool to gather feedback and generate discussion—a “project marketplace” of sorts, as one person suggested. Another remarked: “This tool could assist in starting a discussion that would allow different parties to point out issues/concerns related to their specific divisions or process flow that the other part may not have been aware of/realized.”


Anonymity

The online/anonymous aspect of the game was definitely ranked highly, both in terms of convenience and also for its potential to generate honest arguments: “Anonymous was a good touch to the game. I find doing it that way keeps you guessing how things would play out” or as another participant commented: “anonymity tends to create a less filtered environment, which would be more beneficial in instances where the objective is to create an honest dialogue of current processes/programs involving various employment levels and/or divisions.” Participants also appreciated the element of competition in the tool: “[it] brings out the competitive side in you.”


Engagement and level of participation

The user interface of the system was deemed confusing by most participants, as it was in the BCL evaluation, and participants expressed difficulty navigating it. A player suggested the need for a “more intuitive user interface, […] remove the hmmm how do I navigate around here. You should want to expend brainpower in the requirements and the game, not on how to access information or use the system.” For instance, the unified listing of all requirements was deemed hard to locate, and an overview of all actions conducted by other players was not readily visible after log-in. However, another player felt that “[the GUI] was pretty straightforward and navigation was user-friendly.”

Several participants noted that there would be higher levels of participation if more requirements were available, because they did not feel at ease commenting and taking action on the requirements from some projects they were not deeply familiar with. The following related comments were made: “some of the topics may have [required] more than a tech understanding of the process, and perhaps the reasoning behind the current process was unknown[…] it may have been more [difficult to] morph the item,” and also: “[players would have been more active] with different scenarios. These were more geared toward law enforcement that other users may not be as familiar with.” These remarks indicated that even though participants were told their challenges and morphings could be somewhat hypothetical, and did not have to be entirely realistic as far as technology or business processes are concerned, they still made efforts to be factual and treat the game as a real requirements discussion.

The gamification elements such as points and badges mattered to most but not all players, and even for those players they were of secondary interest. As the point feedback was not immediate and the pirate character theme was not directly embedded in the individual requirements screens, the players did not visit the Points and Badges tab very often and did not fully appreciate the game elements. No participant kept up their activity just to accumulate points or earn a badge (although one player asked about the conditions to “level up”), which points to the importance of intrinsic motivation, in this case the motivation to generate a meaningful critique or propose a good solution to a problem.


Innovation and heritage preservation themes

In terms of the heritage preservation and innovation themes, all participants expressed the view that having the challenge actions available for their respective roles does help structure and focus the requirements discussion and requirements analysis effort around the topic of whether legacy features should be replicated. Some players felt they naturally gravitate toward a role opposite to the one they were assigned, but also felt it presented a good opportunity to explore a different perspective. One player remarked that generally IT staff gravitate toward an innovator persona: “IT [people] are mostly innovators because there is always new technology we want to try. It is the business side that often wants to preserve things the way they are.” This points to the need to determine which role assignment method is most suitable for generating more dynamic gameplay in RE-PROVO: a random one, which enables players to act differently from what their natural predisposition dictates, or one that matches their inclinations and allows them to make more authentic comments and critiques.

A succinct summary of the results from both evaluations is provided in Table 5.

Table 5 Summary of results from BCL and PD evaluations

5 Discussion

In this section, we discuss the extent to which the research has addressed the two key research questions of Sect. 3, together with strengths and limitations of the approach.

With reference to the Triadic Game Design framework, the discussion in Sect. 5.1, based on the participants’ feedback, focuses on the semiotic aspects of the game. This is supplemented in the following two subsections by a discussion of the conceptual design of the game through the prism of two different perspectives. The first, in Sect. 5.2, is the Serious Games Design Assessment (SGDA) framework, specifically designed to probe the linkages between a serious game’s purpose and its design elements, hence focusing on the ludic aspects of the game. The second, in Sect. 5.3, considers a series of success factors established for requirements engineering practices in complex government IT projects, such as those concerned with legacy system replacement, based on action design research conducted by Klier et al. [46]: this addresses the ontological dimension of the game.

5.1 Assessing RE-PROVO based on the participant evaluation sessions: semiotic aspects

Taking part in the game evaluation was largely an interesting and rewarding experience for all the participants due to the novelty of the tool. While participants were introduced to RE-PROVO as a game, most of them treated it, in effect, as a general discussion or message board, and appreciated being able to discuss and argue work-related topics online.

In evaluating the potential for increased engagement, the presence of gamification features such as points, badges and a leaderboard was of particular interest, but according to the interview responses, the participants did not consider those features to be of primary importance during the game. This is consistent with observations of serious games (and also entertainment games) where players tend to be more intrinsically motivated, rather than driven solely by the prospect of external rewards [72]. However, the interviews revealed that these game features created a predisposition toward competitiveness and introduced an element of entertainment. Also, a more prominent presence of activity recognition in the user interface of the game (e.g., through pop-ups or notifications about points gained) would have stimulated engagement further, according to some interviewees. Therefore, we received corroboration of the value of the concept of these gamification techniques, but not of the specific implementation we offered through RE-PROVO, which involved presenting them in a separate section of the user interface, making them not sufficiently prominent.

With regard to the game’s objective to encourage creativity, the outcomes are largely inconclusive, since the game produced few alternative formulations of requirements (i.e., morphings) for participants to assess. However, the feedback from the interviews suggested that the game is conducive to thinking which extends beyond the existing requirement formulations, as it prompts players not only to comment on or validate a requirement, but also to produce alternatives, and because it specifically engages the players in an innovation perspective and rewards them for the development of new formulations. Furthermore, the feature of anonymity was said to encourage exploration of ideas which might otherwise be considered too unusual and risky.

While increased engagement and creativity are a likely result of performing requirement analysis in a game setting such as RE-PROVO, there are both conceptual and implementation details in the game which may impact them adversely. For instance, requirement formulation matters significantly to the level of engagement: it is important to specify the requirements in a way that makes them both open for discussion and gives sufficient context for their analysis. In the case of the requirements used with the County Library participants, the project background information provided for each requirement was insufficient to trigger ideas for proper challenges and morphings. On the other hand, in the session with the Police Department the project background details were indeed sufficient for a higher level discussion, but not one that drilled down concretely into the legacy replication aspect of the projects.

An additional factor that possibly affected the level of engagement and led to a paucity of challenges and morphings in both sessions is that structured argumentation is typically more difficult and restrictive, even if deemed suitable in the context of the legacy problem. While there are studies that suggest that imposing constraints in the discussion format and providing limited options for action may lead to increased creativity [64], others indicate that discussions may be impeded if they are overly structured within a tool [17]. Participants in the RE-PROVO evaluations did not indicate that they felt restricted by the themes of innovation and heritage preservation: on the contrary they felt they were useful, in particular as the related roles in the game were assigned to them at random. The roles in RE-PROVO were clear and relatable to the players, because they matched existing organizational stereotypes. The challenge concept was generally understood as well; however, the critiques posed to the requirements were not always constructed within the particular heritage or innovation delineation. This was mostly due to the requirements themselves—participants did not feel confident they had sufficient background knowledge to discuss them, even after they were encouraged to make arguments that were somewhat hypothetical for purposes of the gameplay, so participants tended to be non-committal: they would critique, but in more general terms, and would not suggest a requirement reformulation with confidence.

Another important aspect to consider is how RE-PROVO was introduced within the two organizations which took part in the study. The lessons learned from the two different induction methods are relevant not just from a research methodology perspective, but potentially for the introduction of any new practice, tool or technique for use or evaluation by practitioners. The County Library sessions represented a more hands-off, autonomous approach in which participants received written guides and supporting documentation as well as electronic communications pertaining to the game, but no direct, in-person support. This proved to be insufficient, mainly because of a missed opportunity to stress the importance of approaching the game without any fear of “breaking the tool,” with the goal of testing its limitations and freely experimenting with the game’s features. Participants should have also been reassured that because the requirements presented in the game were hypothetical, their challenges and morphings could be similarly fictitious and that this would not compromise the game’s flow, and consequently the study. In the second evaluation with the Police Department, direct communication before gameplay ensured not only a proper understanding of the context and purpose of the research, and the nature of the game itself, but also created a sense of ease in participation in the study. Anonymity as a game feature, despite the positive comments it drew, in and of itself was not enough to promote uninhibited, active participation.

5.2 Assessing RE-PROVO using the SGDA evaluation framework: ludic aspects

According to the Serious Games Design Assessment (SGDA) framework [59], in order to be effective, a game must demonstrate cohesiveness between its elements and their alignment with its overall educational and functional purpose. The framework employs six criteria: content and information; framing; game mechanics; fiction and narrative; aesthetics and graphics; and coherence and cohesiveness.

With regard to the content criterion, the data included in RE-PROVO were requirements from legacy replacement projects. The relevance to the purpose of the game was therefore high and the content well-suited. It must be noted that the practitioner evaluations of RE-PROVO highlighted the importance of how the requirements are written and presented. Some pertinent guidelines emerging from the study are as follows: the requirements should be defined as neutrally as possible in relation to the themes of legacy and innovation; some context as to the problem space should be provided so that practitioners do not feel disadvantaged due to lack of background knowledge; this background description should not incorporate potential alternatives to the requirements (that is what the players should generate); and the terminology used in the requirements should not be too technical or rely excessively on business jargon, so that all players can understand them.

Framing, the next criterion, refers to ensuring that the game matches the participants’ play literacy, i.e., their experience level with the game technology and with gaming concepts. Framing in the case of RE-PROVO was essentially left to the supplemental “How-To” materials and the instructional documentation, with no framing mechanisms embedded in the game itself in the form of prompts, help pop-ups, or automatic step-by-step walkthroughs. For the purpose of evaluating the game concepts related to Potts et al.’s Inquiry Cycle and the game roles, this type of framing was not a substantial problem, but in a production-ready game it would be considered a deficiency.

In terms of game mechanics (i.e., issuing challenges, morphing, voting and assigning points to these actions) the game is straightforward, but not particularly exciting. In future iterations, these game actions should ideally be supplemented with better visuals or more expressive metaphors. As far as fiction and narrative are concerned, the only concepts representative of this element were the innovator and heritage keeper roles, and these were not incorporated as part of a story. The pirate theme of the points section of the game was not narratively tied to the roles either. This lack of attention to the fictional story component in RE-PROVO was due to an attempt to make the game domain agnostic (a single narrative relatable to all contexts would have been difficult to develop) and the technology constraints (it was not feasible to embed the narrative functionally or graphically in JIRA). It is possible that the presence of a narrative would have made RE-PROVO more engaging, but this would have to be confirmed through more gameplay sessions.

Aesthetics and graphics, and the GUI layout, were the biggest weakness of RE-PROVO as they reduced the usability of the software. As there was no overarching narrative theme, there were no corresponding graphics to be incorporated throughout the screens, and more importantly the platform used, JIRA, being an issue tracking and project management system, only offered minimal options for aesthetic improvement.

The final SGDA criterion is the cohesiveness and coherence of the game in relation to the game’s overall purpose. If we regard RE-PROVO as a serious game, the conclusion is undoubtedly that the inclusion of narrative components would have strengthened the linkage between all its elements. However, the lack of bridging narrative alone does not imply that RE-PROVO cannot be an effective tool for practitioners. In the discussion that follows, after the application of digital government project management assessment criteria, we demonstrate how RE-PROVO can accomplish important goals from a requirements engineering perspective.

5.3 Assessing RE-PROVO as a requirements tool: ontological aspects

In their analysis of information system requirements processes in the public sector, Klier et al. [46] establish four success factors for requirements engineering processes applied to complex government projects: communication, decision-making transparency, multi-stakeholder collaboration and the interleaving of the requirements process with the organization’s IT governance model. RE-PROVO enables structured communication between multiple stakeholders through its challenge and morphing, voting and commenting features. The decision-making transparency requirement is fulfilled by the visibility of players’ votes and the visualization of the discussion threads. Although the players are anonymous, the discussion around each requirement, which includes objections raised and justifications provided, can be easily perused. The final success factor, interleaving with the IT governance model of the agency, could be satisfied if the game is co-designed by practitioners from the organization employing it. Practitioners could customise the game’s rules, roles, rewards and incentives. That way, IT governance process elements unique to the organization could be incorporated in the game. In fact, this final factor also relates to the question of whether RE-PROVO is purely a simulation game or whether suggestions made in the course of the game will actually be considered for implementation (as an evaluation participant from the Police Department specifically inquired). The answer to this question will depend on the organization employing the game and its willingness to experiment with game-based tools by incorporating them into its decision-making process.

5.4 Assessment summary

The main purpose of our research was to conduct an exploration focused on structuring discussions along core themes common to legacy projects, and to identify game mechanisms which boost creativity and engagement in these discussions. According to the analysis of the feedback from participants in the RE-PROVO evaluations, aligning practitioner requirements deliberations with the concepts of heritage preservation and innovation is a direct and effective way of targeting the problem of “reflexive” and unexamined legacy replication. It allows requirements to be analyzed either as carriers of organizational legacy models or as drivers of business process change, and for their potential risks or benefits to be explicitly weighed. Our research participant statements provide a clear indication that the two opposing themes are meaningful and also true to organizational personas emerging regularly in practice (p. 15, 18). Additionally, the way roles are assigned in RE-PROVO (i.e., randomly instead of by choice or natural alignment) may facilitate the process of requirements negotiation in a legacy system replacement context, as it can potentially immerse practitioners in perspectives different from their own. Traditional project practices are less suitable for encouraging such perspective-taking.

In terms of creativity and engagement, our game evaluations highlighted the primary importance of anonymity and competition. Both creativity and engagement are boosted when (1) organizational hierarchies do not impede the discussion (p. 17); (2) ideas are presented in the context of a competition (p. 17) and (3) immediate feedback about participant actions is available (p. 14); this feedback includes up-to-date points, badges or other rewards, or information about the reactions of others to player actions. It is these particular game mechanics which demonstrated the greatest promise to bolster the engagement and creativity necessary for confronting the legacy problem.

5.5 Limitations

Undeniably, the evaluation of any software prototype has limited generalisability. Although our goal was primarily to evaluate if a game can, in principle, be useful in the requirements analysis process in government agencies performing legacy system replacement, there was no way of exploring the flow of the requirement morphing cycle and the anonymous challenge-based interaction between participants effectively other than through a high-fidelity online prototype. Such prototyping, however, has been known to have disadvantages for the identification and analysis of conceptual approaches [73]. This is because content/concept cannot be easily divorced from appearance/design. The very technical elements that made such an evaluation possible also got in the way by diverting attention from the conceptual structure of the game: the graphical user interface elements often confused the participants and became of primary interest to the players.

Although concept evaluation through a prototype is definitely challenging, the assessment of the concept can be separated from technical design issues with appropriate post-evaluation feedback gathering and analysis. In our research, this was achieved by asking the players to comment on conceptual elements such as roles or challenges separately from the graphical representation of the game. Whenever applicable in the face-to-face interviews, after commenting on their game experience, players were asked follow-up questions to distinguish between the model for the game and its implementation, and some gave suggestions on how the user interface can be improved, which demonstrated that they were able to distinguish between the RE-PROVO concept and its implementation.

A further weakness of the research was the inability to evaluate the game using requirements from a project all practitioners were directly involved with. As a result, the utility of the game could not be determined conclusively using criteria other than the participants’ feedback, which might have been skewed by factors such as the novelty effect and the Hawthorne effect, which are highlighted in the literature as common issues during similar evaluations [1, 34]. Therefore, any conclusions on the potential usage of RE-PROVO or similar games and tools during the requirements phase of legacy replacement projects should be treated as provisional and subject to further confirmation.

6 Conclusion and directions for future research

Legacy systems are an ongoing problem for government agencies. Their functionality is often replicated in the applications that are meant to replace them, as a way of mitigating the risks associated with business process change. As the existing bureaucratic structures and processes in government agencies favor risk aversion, methods and tools to promote innovative perspectives and to stimulate discussion during legacy replacement efforts must be applied.

The use of gamification in government legacy replacement projects is novel and has the potential to promote innovation and encourage practitioner creativity during requirements analysis. Specifically, creativity and engagement in requirements discussions are likely outcomes if gamification is properly introduced in the context of public sector IT systems replacement projects. As our research indicates, this requires that the games utilized are easy to play both from a conceptual and technical perspective, that they feature pertinent requirements and offer immediate interface-driven feedback to the players, that a proper incentivization model is used, which takes organizational culture and values into account, and also ensures participants’ freedom of expression.

Our design and evaluation of RE-PROVO have demonstrated that some of these benefits can be obtained by applying game elements to discussions of requirements along the themes of legacy preservation and innovation, structured around Potts et al.’s Inquiry Cycle. The Inquiry Cycle readily lends itself to being augmented with game elements, as its key steps can be transformed into a sequence of play actions, and rewards can be associated with their execution. The notion of the “challenge” in particular proved very germane to a game setting and was well understood by all evaluation participants.

Practitioner feedback obtained during our research suggests that the game’s competitive model itself could boost the discussion of requirements where uncritical legacy replication may be evident. Furthermore, it became clear that the effectiveness of the game concepts and mechanisms used in RE-PROVO, namely badges, points, roles and challenges, depends on a consistent game experience, something which was problematic in RE-PROVO, particularly during the first evaluation. That said, several general conclusions could be drawn on their potential utility for requirements negotiation and analysis. Specifically, utility can be drawn from their ability to stimulate participation via virtual rewards; to engage themes pertinent to the work environment, project or requirements engineering problem at hand through the establishment of different roles and thematically-rooted actions; to enable “automated”/non-moderated processes by virtue of the game flow itself and the definition of action sequences; to engage affective components through the awarding of badges, points, stars and other feedback; as well as to temper power relationships which otherwise affect project- and requirements-related outcomes (by virtue of the game’s anonymity). However, for a better assessment, RE-PROVO may need to be fine-tuned as a more immersive narrative-driven game through further interface and game mechanics adjustments, as per the Serious Games Design Assessment framework’s cohesiveness criterion.

In the course of our exploratory research of the legacy problem, a number of additional questions emerged which merit further academic research and practitioner inquiry. The design of a requirements game is an area of research ripe with possibilities for additional exploration. New features or adjustments to the RE-PROVO design emerged as options while the evaluations were progressing, but their technical or organizational implementation was not feasible at the time. One such example is the use of actual requirements from projects that all players are, or have been, involved in. As previously noted, the business content of the game (i.e., the requirements featured for discussion) was singled out as having significant influence on player activity and interest. Future evaluations of RE-PROVO (or similar requirements tools) will need to investigate specifically which scenario contributes to improved player engagement and creativity: one where the game is based on a real, ongoing project, or one where the requirements are hypothetical.

Even more important than the gameplay itself, however, is whether the players’ experience will have an impact on the outcomes of legacy replacement projects. A significant number of games, or gamified applications, primarily affect areas that are ancillary to core operations, i.e., they enable educational activities and training, brainstorming, or employee networking [70]. In the case of RE-PROVO, the game evaluation was undertaken for research purposes, and even though it contained real scenarios and requirements from actual ongoing projects, it was primarily an exercise in deliberation, and there is no guarantee that its outcomes will influence agency decision makers. RE-PROVO has been, in effect, a rehearsal for future discussions, just as many other games or gamified applications are primarily educational and simulation tools. This echoes the notion of “procedural rhetoric” introduced by Ian Bogost [8], which posits that the main impact of games is to imply and teach a certain procedural model of the world. It would be a relevant line of inquiry to determine whether requirements gamification can go beyond procedural rehearsals of requirements activities and be directly integrated into the management of legacy system replacement projects: for instance, the versions of system requirements with the most votes in RE-PROVO (or the highest score under an agency-specific scoring algorithm) would automatically become a part of the new system’s specification document.
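To make the integration idea concrete, the following is a minimal sketch of how the top-rated version of each requirement might be selected for inclusion in a specification. It is an illustration only and not part of the RE-PROVO prototype: the names RequirementVersion and select_for_specification, the data layout, and the tie-breaking rule (the earliest submitted version is kept) are all assumptions introduced here. The score parameter stands in for an agency-specific scoring algorithm and defaults to the raw vote count.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class RequirementVersion:
    """One original or morphed wording of a requirement (hypothetical model)."""
    requirement_id: str  # identifier shared by all versions of a requirement
    text: str            # wording of this particular version
    votes: int           # number of player votes received


def select_for_specification(
    versions: Iterable[RequirementVersion],
    score: Callable[[RequirementVersion], float] = lambda v: v.votes,
) -> Dict[str, RequirementVersion]:
    """Return the highest-scoring version of each requirement.

    `score` is a placeholder for an agency-specific scoring algorithm;
    by default it is the raw vote count. On a tie, the version that
    appears first in `versions` is kept (an assumption of this sketch).
    """
    best: Dict[str, RequirementVersion] = {}
    for version in versions:
        current = best.get(version.requirement_id)
        if current is None or score(version) > score(current):
            best[version.requirement_id] = version
    return best


if __name__ == "__main__":
    played = [
        RequirementVersion("R1", "Replicate the legacy search screen", 2),
        RequirementVersion("R1", "Provide a single keyword search box", 5),
        RequirementVersion("R2", "Export reports as PDF", 3),
    ]
    for req_id, chosen in select_for_specification(played).items():
        print(req_id, "->", chosen.text)
```

An agency wishing to weight votes differently, for example by player role, could pass its own score function without changing the selection logic.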

A valuable take-away from the RE-PROVO evaluations and the practitioner interviews was also that the act of game design may be as engaging and effective in addressing requirements problems during legacy replacement projects as gameplay itself. The suggestion to involve practitioners in serious game design would be a worthwhile thread of future research. The increased availability of flexible serious game platforms in recent years would make such an approach plausible. As RE-PROVO is designed to support practitioners in voicing opinions and suggestions about the features of new technologies in their organizations more freely, it would logically follow to enable them to shape the game itself. The involvement of players in the definition of game rules and parameters would constitute an act of empowerment in the spirit of the Scandinavian tradition [35], which engages end-users to co-create the software tools they would ultimately use. Furthermore, as organizational culture substantially impacts legacy system replacement project outcomes, it is sensible to design tools that fully take into consideration the agency context.