Setting the Scope for a New Agile Assessment Model: Results of an Empirical Study

Abstract. Agile software development methods have been increasingly adopted by many organizations at different organizational levels. Whether named agile adoption, agile transition, agile transformation, digital transformation or new ways of working, the success of embracing this change process mostly remains uncertain. This is primarily because there are many ways of evaluating success. Based on the existing agile assessment models, we developed a model of principles with associated practice clusters that serves as the core for a new agile assessment model capable of assessing agile organizations at different scales. Towards our ultimate goal of establishing a lightweight, context-sensitive agile maturity model, we validated our initial findings in an expert interview study to identify improvement points and to ensure the applicability, coherence and relevance of the model at hand. The results of the interview study show that both the structure and the content of our assessment model fit the experts' expectations and experience.


Introduction
Agile software development methodology has been a well-investigated topic over the past two decades. Its potential for enabling leaner and more customer-oriented value creation processes makes it valuable for almost any organization. On the other hand, it is known that the success and potential impact of agile software development methodology depend on how it is put into practice. While it is clear that merely applying certain practices falls short of reaping the value of agile methodology, forcing it onto inappropriate organizational contexts or considering it a silver bullet does more harm than good. In order to enable contextually appropriate adoptions, one important criterion is identifying the current state of an organization. Contextual appropriateness itself is a function of culture, the complexity of the problems at hand, the form of the value to be delivered and potentially many other aspects. Therefore, in order to evaluate an organization's current state with respect to the application of agile software development methodology, manifold aspects need to be rigorously assessed. Yet, several questions remain unanswered, including but not limited to: how to structure such an assessment model so that it provides enough flexibility to be applicable to different organizational contexts, which aspects to consider in the evaluation without compromising practical applicability, how to ensure objectivity of the results, and how to identify improvement areas and guide organizations towards them. Though there exists a multitude of models and various methodologies towards achieving this goal, the existing models mostly lack meticulous scientific and industrial validation, practical applicability and contextual appropriateness.
Our aim with this research endeavour is to tackle this unsolved, complex problem without compromising the aforementioned essential attributes. Tuncel et al. [1] state that there could be two approaches for model development. Based on our experience in evaluating the process maturity of development organizations, we could develop an assessment model for the agile software development methodology context. Or, we could assess existing agile maturity assessment models, identify their valuable components and concepts, learn from their mistakes and develop a model based on the existing scientific body of knowledge. We have been pursuing the latter approach. In this paper, we mainly explain the process of identifying five principles derived from the Agile Manifesto [2] as one of those valuable components of the existing assessment models. These are the condensed pillars of agile software development methodology, which are ideally capable of reflecting the reality of an organization with respect to its agility. Following up on this, we discuss the practice clusters we establish within each of these principle pillars. These principles and clusters, in the end, form the structural boundaries of the proposed assessment model. In this process itself, we aim to act in an agile manner and iterate over the model elements multiple times. In this paper, we share the results of this first iteration, which was obtained by means of expert interviews. Other parts of the assessment model, e.g., the questions for each cluster as well as the assessment and aggregation model, are out of scope for this paper. In order to investigate the importance, relevance and completeness of the consolidated principles and clusters within this first iteration, we formulated the following research questions:
- RQ1: Do the pillars of principles sufficiently cover the relevant aspects of agility in practice?
- RQ2: Do the clusters of a principle sufficiently operationalize the principle?
- RQ3: Does the importance of clusters differ considering the organizational levels?
This paper is organized as follows. In Sect. 2, we provide the necessary background information on the theory behind assessment model construction, and summarize front-line studies that provide an assessment model. In Sect. 3, we first elaborate on the research methodology; second, we establish the landscape for the structure of the proposed agile assessment model; third, we discuss the validation procedure of the presented elements of the proposed agile assessment model. In Sect. 4, we share the results of the interview study and objectively report the outcomes of the answers. In Sect. 5, we reflect on the results, share our key findings and discuss the potential threats to validity and our approach towards overcoming these limitations. Finally, in Sect. 6, we summarize our research and provide an outlook on the potential next steps to be taken.

Related Work
As discussed in Sect. 1, developing an agile assessment model that does not lack essential attributes is a challenging endeavour, and there have been attempts in this direction. It is important to highlight that, whether an assessment model or a maturity model, these models consist of multiple components: an overall structure defining which subject areas are to be asked about in an assessment, elements to look for within those subject areas, questions to find out about the state of these elements, a method for calculating the leveling structure and an aggregation mechanism. Since the scope of our research is establishing and validating the first of these components, this section discusses, first, the existing studies that are relevant for establishing a structure for an assessment model and, then, the pioneer studies that offer such a model structure themselves.
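The component list above can be made concrete with a minimal data-structure sketch. All class names, fields and the toy mean-based aggregation below are our own illustrative assumptions and are not part of any cited model.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """An element to look for within a subject area."""
    name: str
    questions: list[str] = field(default_factory=list)  # probe the element's state

@dataclass
class SubjectArea:
    """A subject area to be asked about in an assessment."""
    name: str
    elements: list[Element] = field(default_factory=list)

@dataclass
class AssessmentModel:
    areas: list[SubjectArea]

    def score(self, answers: dict[str, int]) -> float:
        """Toy aggregation mechanism: mean of answered questions on a 1..5 scale."""
        values = [answers[q] for a in self.areas
                  for e in a.elements for q in e.questions if q in answers]
        return sum(values) / len(values) if values else 0.0

# Hypothetical usage with one area, one element and one question
model = AssessmentModel([SubjectArea("Technical Excellence",
        [Element("Continuous Improvement", ["Are retrospectives held regularly?"])])])
print(model.score({"Are retrospectives held regularly?": 4}))
```

A real model would replace the mean with a dedicated leveling calculation; the sketch only shows how the structural components relate.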

Model Development Approaches
De Bruin et al. [3] propose a structured generic framework for developing a maturity model. The framework consists of six distinct phases of model development, namely: Scope, Design, Populate, Test, Deploy and Maintain. The authors then discuss each of these phases in detail by exemplifying them with two well-established models. They point out that the scope-setting procedure is followed by the actual design, both of which occur prior to populating the model. Becker et al. [4] establish an eight-step procedure for developing maturity models. This procedure provides not only the distinct phases but also the activity flow and the logic to be followed throughout the development process. The authors provide eight requirements for maturity model development, which are derived from the design science research guidelines provided by Hevner et al. [5]. The study highlights the importance of starting the model development procedure by comparing the model to be developed against existing models, as well as following an iterative procedure for the entire development process.
Von Wangenheim et al. [6] highlight the relevance of the process of creating software process capability/maturity models (SPCMMs) for the field of software engineering. This relevance is motivated by discussing how harmful the misuse of SPCMMs can be for organizations. The authors define five distinct phases: Knowledge identification, Knowledge specification, Knowledge refinement, Knowledge usage and Knowledge evolution. These phases encompass a total of sixteen steps, including but not limited to defining the scope, developing a draft and validating the draft. It is explicitly mentioned that a sound theoretical basis and proper evaluations of the models with respect to validity, reliability and generalizability are lacking in most of the models. The study concludes by noting the need for methodological support to enable model validations.
Maier et al. [7] emphasize the importance of maturity grids in terms of their capacity to enable organizational capability assessments. After reviewing twenty-four different grid structures, the authors provide guiding reference points for maturity grid development and define four phases, namely Planning, Development, Evaluation and Maintenance, together with thirteen decision points corresponding to these phases. Moreover, they provide the applicable decision options for each of these decision points.

Assessment Models
Sidky et al. [8] propose one of the essential approaches for guiding organizations' adoption of agile practices. To achieve this, the authors define an agile adoption framework consisting of two components: the agile measurement index and a four-stage process that together provide assistance for adopting agile practices. The measurement index is formed by agile levels, principles, practices and indicators. The five principles of the measurement index are condensed formulations of the twelve agile principles of the Agile Manifesto. Practices are the elements falling into the intersecting cells of the level-principle matrix, while the five levels represent the different stages of adoption. The four-stage process utilizes this measurement index. The model explicitly highlights the importance of the tailorability of the five levels by describing the challenges of reaching a consensus on the assignment of practices to the levels. It is concluded that the framework received overall positive feedback, yet has significant room for improvement.
Qumer and Henderson-Sellers [9] define an agile software solution framework that is built upon an agile conceptual aspect model, accompanied by an agile toolkit and a four-dimensional analytical tool. The authors define a method core comprising five aspects, namely Agility, People, Process, Product and Tools, plus an Abstraction aspect, to reflect an agile software development methodology. While the agile toolkit consists of seven main components, the provided analytical tool focuses on the following four dimensions: Method scope, Agility characterization, Agile value characterization and Software process characterization. To complement these two components of the framework for process adoption, the authors establish the Agile Adoption and Improvement Model (AAIM). AAIM is built on three agile blocks, namely Prompt, Crux and Apex, and six agile levels. This study emphasizes the relevance of having a model that is applicable to different situation-specific scenarios in the domain of software engineering.
Fontana et al. [10] suggest a framework for maturing in agile software development that has its roots in complex adaptive systems theory. While explicitly mentioning ambidexterity as a fundamental attribute of maturity, the provided framework focuses on outcomes rather than prescribing practices. The core role of people in software development organizations is explicitly acknowledged within the study. The contrast between exploitation and exploration is represented as an important element for balancing specific outcomes against adopting new practices. Further, by means of a cross-case analysis, the authors name six pursued outcomes: Practices, Team, Deliveries, Requirements, Product and Customer. In conclusion, this study draws attention to the importance of allowing context-specific practices in the maturing process without compromising the agile values.

An Assessment Model Proposal
As the focal point of this research, this section elaborates on the methodology behind the development procedure of the proposed agile assessment model and its overall structure, along with the definitions of the core model elements. Finally, without going into the detailed results, it discusses the validation phase of the model.

Methodology
In order to construct a scientifically founded agile assessment model that is capable of serving the needs of the industry, we have formulated a combination of research methodologies that enables us to perform the necessary research activities effectively. Towards this goal, we initiated our research with a Systematic Literature Review based on backward and forward snowballing as described by Wohlin [11]. This systematic literature review allowed us to identify reusable components of previously conducted research and of the already established assessment models. We have published the results of this review in [1]. Following up on these results, we conduct Design Science Research in accordance with Hevner et al. [5]. As described by the authors, developing the design artifact is the core activity of design science research. In the context of our research, building the overall structure of the proposed agile assessment model maps to this core activity. This core activity is to be followed by Action Design Research per Sein et al. [12]. Proceeding with action design research implies conducting an intervention; however, this is out of the scope of this paper and is not discussed in great detail.

Model Structure
The proposed agile assessment model consists of two fundamental elements: principles and clusters. Principles are, as the name suggests, a set of abstract notions that are essential to the agile software development methodology. Clusters, on the other hand, are semantic classifications of the practices of agile software development methodology within the principles. The proposed model has five principles and eighteen clusters within those principles. Its structure is established by the following procedure: Initially, principles that are capable of capturing and reflecting the agile reality of an organization are extracted from twelve notable agile assessment models. These twelve models are the prominent models among the 40+ models we examined in our previous research [1]. While establishing these principles, we considered the critical views on the principles of the Agile Manifesto, in alignment with Meyer [13], and the degree of acceptance of these principles in the scientific literature. As a result, for the initial structure of the proposed model, a condensed set of five principles essentially derived from the Agile Manifesto is adopted as-is, based on the well-received proposal of Sidky et al. [8]. Following that, the practices of agile software development methodology that are either implicitly or explicitly mentioned within the twelve models analyzed in our literature study are systematically extracted. Because some of these twelve models (e.g., Turetken et al. [14]) build upon one another (e.g., Sidky et al. [8]), or some (e.g., Patel and Ramachandran [15]) reuse certain practices such as "User stories are written." from another (e.g., Nawrocki et al. [16]), the identified duplicates in the collected practices are discarded. Additionally, since the scope of the proposed model goes beyond any particular agile software development framework, in the cases where models were developed specifically towards a certain framework (e.g., Turetken et al. [14]), the framework-specific practices are left out as well. In the end, to form the preliminary structure of the proposed agile assessment model, the remaining practices were classified under the five principles and, with respect to their conceptual proximity to each other, were clustered underneath those five principles. As discussed in Sect. 2, establishing this overarching model structure encompasses the initial steps of the model development. The resulting structure is depicted in Fig. 1.

Fig. 1. Initial structure of the proposed model
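The consolidation step described above (collect practices, drop duplicates and framework-specific items, bucket the remainder by principle) can be sketched as follows. Except for "User stories are written.", which is quoted in the text, the practice entries, the framework-specific flags and the principle assignments below are invented placeholders for illustration only.

```python
# Hedged sketch of the practice-consolidation procedure: deduplicate
# collected practices, filter out framework-specific ones, and group
# the remainder under their principles.
practices = [
    {"text": "User stories are written.",
     "principle": "Plan and Deliver Software Frequently", "framework_specific": False},
    # duplicate of the entry above, as reused by a later model
    {"text": "User stories are written.",
     "principle": "Plan and Deliver Software Frequently", "framework_specific": False},
    # hypothetical framework-specific practice, to be left out
    {"text": "Framework-prescribed cadence events are held.",
     "principle": "Plan and Deliver Software Frequently", "framework_specific": True},
    # hypothetical framework-agnostic practice
    {"text": "Customer feedback is collected each iteration.",
     "principle": "Customer Collaboration", "framework_specific": False},
]

seen: set[str] = set()
by_principle: dict[str, list[str]] = {}
for p in practices:
    if p["framework_specific"] or p["text"] in seen:
        continue  # discard framework-specific practices and duplicates
    seen.add(p["text"])
    by_principle.setdefault(p["principle"], []).append(p["text"])

for principle, items in by_principle.items():
    print(principle, "->", items)
```

In the actual procedure, clustering within each principle was done by conceptual proximity, a judgment call that this mechanical sketch does not attempt to reproduce.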

Validation
As mentioned in Sect. 1, in order to obtain critical feedback on the proposed model structure, we conducted an interview study involving six domain experts in agile software methodologies. The profiles of the interviewed experts range from senior software developers and architects to senior agile coaches, senior process consultants and senior technical team leads, working in different development organizations, each having at least seven years of experience in the domain. The validation interviews focused on receiving feedback with respect to the completeness of the five agile principles and the consolidated set of eighteen clusters positioned under these principles. During the interviews, for each cluster, experts were systematically asked to evaluate 1) how well the cluster fits the principle it is positioned within, 2) whether the cluster also fits into other principles, 3) whether the importance of the cluster varies with respect to different organizational scales, and 4) whether the cluster requires either a split or a merge with another cluster to establish a proper level of granularity. On top of these cluster-specific questions, experts were additionally asked to comment on the completeness of the five principles with respect to their capacity to reflect the agile reality of an organization, as well as of the eighteen clusters with respect to their capacity to completely reflect the principle they are positioned under. In order to establish the context of a principle and a cluster, descriptions of each cluster (e.g., Value Delivery: The organization uses proper methods, techniques and tools for planning the delivery of value by means of realized user stories, epics or features.) are provided along with certain exemplary aspects (e.g., Release Planning, Collaborative Planning, Backlog Management, User Stories) associated with that cluster.
While the results of this validation procedure and the key observations are reflected in Sects. 4 and 5 respectively, the complete list of cluster descriptions and exemplary aspects can be found at https://bit.ly/3eKj4im.

Results
The proposed model structure consists of five principles and eighteen clusters. In this section, we first present the results, grouped by each of the five principles. For each principle, the expert responses regarding the clusters within that principle are reflected. In particular, the responses regarding the importance of a cluster with respect to the organizational scale are visualized in Figs. 2, 3, 4, 5 and 6. Second, we share the results regarding the overall structure of the proposed model in Sect. 4.6. Due to space limitations, results with relatively low information content are provided at https://bit.ly/3eKj4im.
In the following figures, S refers to small organizational units such as agile teams, M refers to medium-level organizational units, which can be interpreted as project or product organizations consisting of multiple teams, and L refers to large organizational units. Depending on the organizational context, this last level can be perceived as the top-level management of development organizations, where the orchestration of multiple medium-level organizational units is required. In agile frameworks addressing scale (e.g., SAFe, LeSS, Nexus), scaling starts from single teams and reaches multiple teams or the entire organization. In order to abstract from specific agile frameworks and to avoid confusion regarding the terms, we use the more abstract terms small, medium and large. The coding mechanism of the results in the following figures is as follows: Very Important: 5, Important: 4, Neither/Nor: 3, Unimportant: 2, Very Unimportant: 1 and I don't know: -, while E1 to E6 refer to the interviewed experts.
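To illustrate the coding scheme, the sketch below computes per-scale average importance scores for a single cluster from a hypothetical response matrix. The concrete scores are invented for demonstration, and excluding "I don't know" answers from the averages is our own assumption about how such figures are typically aggregated.

```python
# Hypothetical expert responses for one cluster, coded per the paper's
# scheme: Very Important: 5 ... Very Unimportant: 1, "-" = I don't know.
responses = {
    "E1": {"S": 5, "M": 4, "L": 4},
    "E2": {"S": 5, "M": 5, "L": "-"},
    "E3": {"S": 4, "M": 4, "L": 3},
    "E4": {"S": 5, "M": 4, "L": 4},
    "E5": {"S": 4, "M": 5, "L": 4},
    "E6": {"S": 5, "M": "-", "L": 3},
}

def scale_average(scale: str) -> float:
    # "I don't know" ("-") answers are excluded from the average (assumption)
    vals = [r[scale] for r in responses.values() if r[scale] != "-"]
    return round(sum(vals) / len(vals), 2)

for scale in ("S", "M", "L"):
    print(scale, scale_average(scale))
```

With this toy matrix, the small-scale average exceeds the large-scale one, mirroring the pattern reported for several clusters in the results below.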

Embrace Change to Deliver Customer Value Principle
The results indicate that the "Lean Mindset" cluster is an overarching cluster, implying that it should be reflected under multiple principles. In fact, all of the interviewees explicitly stated that "Lean Mindset" should be reflected under at least two other principles. Additionally, the interviewees highlight that the difference between the mindset and actual practice may not always be clear, in that one can have a lean or lean-agile mindset yet fail to practice it in real-life scenarios. Regarding "Change Orientation", one important point is that, except for one interviewee, there is agreement that "Change Orientation" should likewise be reflected under multiple principles. On the other hand, "Iterative and Incremental Value Delivery" is expected to be found also under "Planning and Delivering Software Frequently". This is highlighted by two of the interviewees with the response "Does not fit well", one of whom also noted that the importance of this cluster decreases for larger organizational scales. Although there is no agreement with respect to its fit to the principle it is positioned under, "Flexibility in Value Delivery" is considered by half of the interviewees to fit also under "Technical Excellence".

Plan and Deliver Software Frequently Principle
Based on the interview study outcomes, there are two particular observations regarding this principle. First, except for one interviewee, both the clusters "Value Delivery Planning" and "Value Delivery Actualization" are found to "Fit Very Well" within this principle. Second, all of the interviewees found "Value Delivery Actualization" to be most important for "Small"-scale organizational units. While there is no general agreement about which other principles these two clusters should be reflected under, an important result is that "Value Delivery Actualization" is mostly perceived as related to "Technical Excellence", while "Value Delivery Planning" is associated with "Embrace Change to Deliver Customer Value".

Human Centricity Principle
One particular shared comment was that it is not easy to make a clear distinction between the clusters "Unit Empowerment" and "Unit Autonomy". Consequently, the idea of merging the two clusters was prominently mentioned. Further, human-related aspects were referred to by multiple experts as essential yet often overlooked. As can be observed from Fig. 4, all clusters are found to be at least "Important", especially at the scale of small organizational units.

Technical Excellence Principle
Regarding the four clusters under this principle, the primary remark is that the "Continuous Improvement" cluster is perceived as "Very Important" by almost all of the experts, irrespective of the organizational scale. Another important observation is that this cluster is expected by all but one of the experts to additionally be positioned under "Embrace Change to Deliver Customer Value".

Customer Collaboration Principle
While both of the clusters under this principle are found to "Fit Very Well", except in one case where "Customer Involvement" is evaluated as "Fits Well", half of the interviewee responses indicate an association with "Embrace Change to Deliver Customer Value" for both of these clusters. When it comes to the importance of a cluster with respect to the organizational scale, a pattern can be observed in Fig. 6. Specifically, whenever the interviewees gave an answer, all of them agree that the "Customer Decision Making" cluster is "Very Important" for medium-level and larger organizational units, whereas its importance decreases as the organizational scale gets smaller. Figure 7 shows that the different backgrounds of the interviewees produce certain patterns in the results. In particular, the importance of a cluster with respect to different organizational scales can be captured by the color transitions among the cells. The experts with consulting or coaching roles, for example, seem to put more importance on the small organizational units than the experts with a software implementation focus. Based on the collected responses, the initial model structure is to be updated with "Change Orientation" and "Continuous Improvement" becoming overarching concepts. Further, the "Unit Empowerment" and "Unit Autonomy" clusters are to be merged.

Discussion
Based on the results of our interview study, this section discusses the important findings in two groups: general findings and cluster-specific findings. Then, it provides answers to the research questions. Finally, in Fig. 8, we share the updated structure of the proposed model after an iteration over the discussed findings. General findings touch on important remarks with respect to the overall structure of the proposed model, whereas cluster-specific findings reflect some of the important patterns observed regarding the clusters. Figure 7 is formed by merging the aforementioned tables in their order of presentation. This consolidated view is provided to allow the reader to observe certain vertical patterns that can be associated with expert profiles. The complete table reflecting the fitness of each cluster with respect to the principle it is positioned under, as well as whether it also fits under multiple principles, can be found at the URL provided in Sect. 4.

Fig. 7. Importance of clusters with respect to the organizational scales

General Findings
Both Principle and Cluster Completeness are Highlighted. Although some additional remarks and suggestions were provided by the interviewees, all of the experts appreciated the completeness of the model elements.
Descriptive Texts and Exemplary Aspects Help Define the Boundaries. Experts provided positive feedback on being given clear cluster descriptions, example practices, artifacts and aspects associated with the clusters, so that they could easily establish the context of a cluster.
There Is No "One, All Agreed Positioning" of the Clusters. The concepts and practices of the agile software development methodology are perceived very differently based on the background and experience of the individuals.
In Fig. 7, we observe that most of the average cluster values are greater than or equal to 4.00. This is expected, as the underlying elements of the model are extracted from the scientific literature. Where the average values fall below 4.00, this can be attributed to the distance of the larger organizational units from the implementation-level concerns of software development. Even though there is no agreement on a single model structure, the experts provided valuable insights towards improving the proposed model to capture the reality of an organization. These findings show that the initially proposed model structure was too simplistic to reflect this reality.

Cluster Specific Findings
Technical Excellence Clusters Act as a Prerequisite for Frequent Delivery. Technical excellence is mostly interpreted as the first step towards making frequent delivery possible, as frequent delivery implies a certain level of automation, and involves making architectural decisions.
Technical Excellence Clusters are Perceived as Relatively Less Important for Higher-Level Organizational Units. As the technical excellence clusters reflect mostly implementation-level concerns, their importance is perceived to decrease as the scale of the organization increases. The lower importance scores of the technical clusters at higher levels were therefore not surprising to us.
Customer Collaboration Clusters Contribute to Planning. Especially for large organizational units, customer collaboration is commented to be very important, and is perceived as an enabler of the delivery planning activities.
Human Centricity Clusters are Well Perceived. In almost all of the interviews, the human centricity clusters received positive feedback. It was often commented that people play a central role in almost any process, and if the aim of a model is to capture the reality with respect to agile, people should never be overlooked.

RQ1: Do the Pillars of Principles Sufficiently Cover the Relevant Aspects of Agility in Practice?
The principles are evaluated to be complete in terms of reflecting the world of agile. Only one of the experts stated that communicating the purpose of agile transformation and the role of management should be reflected in this structure more explicitly.

RQ2: Do the Clusters of a Principle Sufficiently Operationalize the Principle?
The clusters are evaluated to be sufficient and complete in terms of spanning the principles they are positioned under. Only one of the experts mentioned that budgeting aspects may need to be appropriately positioned under a principle.
RQ3: Does the Importance of Clusters Differ Considering the Organizational Levels?

From Fig. 7, we observe that the importance of clusters differs with respect to the organizational scale. This is an important finding, as it can help in conducting contextually appropriate assessments, where the scale of the organization is considered a component of the organizational context.

Threats to Validity
In this section, we discuss the validity threats and our attempts to ensure high-quality research by keeping these threats minimal. We are aware of the four validity threats, namely Construct Validity, External Validity, Internal Validity and Reliability, as defined by Yin [17] and tailored to the software engineering domain by Runeson et al. [18]. However, as our methodology is not case study research, not all four of these validity threats are covered in this section. Rather, we concentrate on the following two aspects, as they are more relevant for our methodology: Construct Validity reflects how properly the examined concept represents the ideas of the researchers. Therefore, should there be any misunderstandings between the researchers and the interviewed parties with respect to the definitions or concepts being discussed, they should be addressed. In our interview study, in order to proactively avoid potential misunderstandings, we provided descriptive texts for each cluster, as well as exemplary practices falling under that particular cluster. This approach allowed us to establish the boundaries and the context of the inspected cluster. As our interview partners commented positively on the cluster descriptions and exemplary aspects, we can assume that we were able to properly address this threat to validity.
External Validity refers to the generalizability of the results derived from a research activity. In our research, the validation of the consolidated agile principles and practices is performed by means of expert interviews. This procedure makes the validation susceptible to converging towards, and being specific to, the potentially strong personal opinions of the experts. To mitigate this threat, we first derived the principles and practices from the scientific literature. This scientifically grounded approach provided a safety net for the further validation of the elements within the model. Moreover, as a second measure to ensure the external validity of the clusters and the structure of the model, we selected experts from different business organizations. Each of these experts has more than seven years of experience in the domain of agile software development methodologies, and they provide their expertise across a spectrum of domains ranging from consulting to software architecture. To conclude, although it is generally accepted that statistical generalizability should not be expected from empirical studies, we have put strong emphasis on ensuring the external validity of our findings.

Conclusion and Future Work
This section summarizes our research and provides an outlook on the further research that is necessary to improve the proposed agile assessment model following up on the validation procedure. Given the points discussed in Sect. 4 and Sect. 5, there is overall positive feedback on the proposed elements of the assessment model, which provides a promising outlook for the future of our research. Our research motivates the need for building an agile assessment model based on an in-depth, comparative analysis of the existing models. While learning from the relatively weaker aspects of the existing models, we established our model structure based on the principles of the Agile Manifesto. The clusters of practices also reflect approximately twenty years of scientific analysis of agile software development activities, from Nawrocki et al. [16] to Laanti [19].
Our endeavour towards the iterative and incremental development of this model requires us to frequently consult experts and receive continuous feedback on the evolving elements of the proposed model. As discussed in Sect. 2 to a great extent, establishing the boundaries and the overall structure of the assessment model is an important step of assessment model development, which needs to be followed by populating the assessment matrix with practices. In our context, this maps to identifying which practices should go under which clusters, and which practices should be discontinued or marked as irrelevant for the contemporary development activities of organizations. In the following steps, it will be necessary to further refine the practice clusters and establish boundaries across the principles for the clusters that are relevant to multiple principles. Once the overall structure, i.e., which subject areas are to be asked about in an assessment and which elements are to be looked for within those subject areas, is sufficiently identified, a method for calculating the leveling structure, as well as the appropriate questions to find out about the state of these elements, needs to be clarified. As the next step, we will develop questions for all clusters and validate them through detailed interviews with a focus on completeness and suitability for agile assessments. Upon completing these remaining phases, we will develop an aggregation mechanism so that meaningful outcomes can be retrieved when an assessment is conducted.
These activities are planned as part of an action design research as described by Sein et al. [12]. By means of action design research, we will be able not only to employ the proposed assessment model in a real business environment, but also to receive feedback regarding the methodology behind the assessment.