
Why are many businesses instilling a DevOps culture into their organization?

Empirical Software Engineering

Abstract

Context

DevOps can be defined as a cultural movement to improve and accelerate the delivery of business value by making the collaboration between development and operations effective. Although this movement is relatively recent, there exists intensive research around DevOps. However, the real reasons why companies move to DevOps, and the results they expect to obtain, have received little attention in real contexts.

Objective

This paper aims to help practitioners and researchers better understand the context and the problems that many companies face day to day in their organizations when they try to accelerate software delivery, as well as the main drivers that move these companies to adopt DevOps.

Method

We conducted an exploratory study by leveraging in-depth, semi-structured interviews with relevant stakeholders of 30 multinational software-intensive companies, together with industrial workshops and observations at the organizations’ facilities, which supported triangulation. Additionally, we conducted an inter-coder agreement analysis, which is not usually addressed in qualitative studies in software engineering, to increase the reliability of the drawn findings and reduce the authors’ bias.

Results

The research explores the problems and expected outcomes that moved companies to adopt DevOps and reveals a set of patterns and anti-patterns about the reasons why companies are instilling a DevOps culture.

Conclusions

This study aims to strengthen the evidence and support practitioners in making better-informed decisions about which problems trigger a DevOps transition and which results are most commonly expected.


Notes

  1. https://www.devopsagileskills.org/, last accessed 2020/01/01.

  2. http://bit.ly/2ky00LQ, last accessed 2020/01/01.

  3. https://www.devops-spain.com/ last accessed 2020/01/01.

  4. http://bit.ly/2ky0eCG, last accessed 2020/01/01.

  5. https://devopsdays.org/events/2020-madrid/welcome/ last accessed 2020/01/01.

  6. Spanish Law 5/2015 indicates that a micro enterprise is one that has less than ten workers and an annual turnover of less than two million euros or a total asset of less than two million euros; a small company is one that has a maximum of 49 workers and a turnover or total assets of less than ten million euros; medium-sized companies are those with less than 250 workers and a turnover of less than fifty million euros or an asset of less than 43 million euros; and large companies are those that exceed these parameters.

  7. https://github.com/jdiazfernandez/DevOpsInPractice/blob/master/codebook.md, last accessed 2020/01/01.

  8. http://bit.ly/2ky00LQ, last accessed 2020/01/01.

References

  • Allspaw J., Hammond P. (2009) 10+ deploys per day: Dev and ops cooperation at flickr. In: Velocity: web performance and operations conference

  • ATLAS.ti (2019) Scientific Software Development GmbH. Inter-coder agreement analysis: Atlas.ti 8. https://atlasti.com/. Last accessed: 2020-01-01

  • Beck K, Beedle M, Van Bennekum A, Cockburn A, Cunningham W, Fowler M, Grenning J, Highsmith J, Hunt A, Jeffries R. (2001) Manifesto for agile software development

  • Bosch J (2012) Building products as innovation experiment systems. In: Cusumano MA, Iyer B, Venkatraman N (eds) Software business, Springer, pp 27–39

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1):37–46

  • Corbin J, Strauss A (2008) Basics of qualitative research. (3rd ed): techniques and procedures for developing grounded theory. SAGE Publications, Inc

  • Creswell JW, Creswell JD (2017) Research design: Qualitative, quantitative, and mixed methods approaches. Sage publications

  • Cruzes DS, Dyba T (2011) Recommended steps for thematic synthesis in software engineering. In: 2011 International symposium on empirical software engineering and measurement, IEEE, pp 275–284

  • de França BBN, Jeronimo H, Travassos GH (2016) Characterizing devops by hearing multiple voices. In: Proceedings of the 30th brazilian symposium on software engineering, SBES ’16, pp 53–62, New York. Association for Computing Machinery

  • Debois P (2008) Agile infrastructure and Operations: How infra-gile are you?. In: Agile 2008 Conference, pp 202–207

  • Díaz J, Almaraz R, Pérez J, Garbajosa J (2018) DevOps in practice. In: Proceedings of the 19th international conference on agile software development companion - XP 18 ACM Press

  • Díaz J, Perez JE, Yague A, Villegas A, de Antona A (2019) Devops in practice – a preliminary analysis of two multinational companies. In: Franch X, Männistö T, Martínez-Fernández S (eds) Product-focused software process improvement. Springer International Publishing, Cham, pp 323–330

  • Dingsøyr T, Nerur S, Balijepally V, Moe NB (2012) A decade of agile methodologies: Towards explaining agile software development. J Syst Softw 85(6):1213–1221

  • Dyck A, Penners R, Lichter H (2015) Towards definitions for release engineering and DevOps. In: 2015 IEEE/ACM 3rd international workshop on release engineering. IEEE

  • Easterbrook S, Singer J, Storey M-A, Damian D (2008) Selecting Empirical Methods for Software Engineering Research. Springer, London, pp 285–311

  • Ebert C, Gallardo G, Hernantes J, Serrano N (2016) Devops. IEEE Softw 33(3):94–100

  • Erich F (2019) Devops is simply interaction between development and operations. In: Bruel J-M, Mazzara M, Meyer B (eds) Software engineering aspects of continuous development and new paradigms of software production and deployment. Springer International Publishing, Cham, pp 89–99

  • Erich F (2014). In: Amrit C, Daneva M, Jedlitschka A, Kuvaja P, Kuhrmann M, Männistö T, Münch J, Raatikainen M (eds) Product-focused software process improvement. Springer International Publishing, Cham, pp 277–280

  • Erich F, Amrit C, Daneva M (2017) A qualitative study of devops usage in practice. J Softw Evol Proc 29(6):e1885

  • Feitelson D, Frachtenberg E, Beck K (2013) Development and deployment at facebook. Internet Comput IEEE 17:8–17

  • Fitzgerald B, Stol K-J (2017) Continuous software engineering: a roadmap and agenda. J Syst Softw 123:176–189

  • Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76(5):378

  • Floersch J, Longhofer JL, Kranke D, Townsend L (2010) Integrating thematic, grounded theory and narrative analysis: a case study of adolescent psychotropic treatment. Qual Soc Work 9(3):407–425

  • Fowler M (2013) Continuous delivery. https://martinfowler.com/bliki/ContinuousDelivery.html. Last Accessed: 2020-01-01

  • Friese S (2019) Qualitative data analysis with ATLAS. ti SAGE Publications Limited

  • Gamma E, Helm R, Johnson R, Vlissides J (1995) Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Longman Publishing Co., Inc., Boston

  • Hayes AF, Krippendorff K (2007) Answering the call for a standard reliability measure for coding data. Commun Methods Meas 1(1):77–89

  • Iden J, Tessem B, Päivärinta T (2011) Problems in the interplay of development and it operations in system development projects: a delphi study of norwegian it experts. Inf Softw Technol 53(4):394–406

  • Jabbari R, bin Ali N, Petersen K, Tanveer B (2016) What is devops? a systematic mapping study on definitions and practices. In: Proceedings of the scientific workshop proceedings of XP2016, XP ’16 Workshops, New York, NY, USA, Association for Computing Machinery

  • Jabbari R, bin Ali N, Petersen K, Tanveer B (2018) Towards a benefits dependency network for devops based on a systematic literature review. J Softw Evol Proc 30(11):e1957

  • Kim G, Debois P, Willis J, Humble J (2016) The devops handbook: how to create world-class agility, reliability, and security in technology organizations. IT Revolution Press

  • Krippendorff K (2004) Reliability in content analysis: Some common misconceptions and recommendations. Hum Commun Res 30(3):411–433

  • Krippendorff K (2011) Computing krippendorff’s alpha-reliability. http://repository.upenn.edu/asc_papers/43. Last Accessed: 2020-01-01

  • Krippendorff K (2018) Content analysis: An introduction to its methodology, 4th edn. Sage Publications

  • Krippendorff K, Mathet Y, Bouvry S, Widlöcher A. (2016) On the reliability of unitizing textual continua: Further developments. Quality & Quantity: International Journal of Methodology 50(6):2347–2364

  • Kuusinen K, Balakumar V, Jepsen SC, Larsen SH, Lemqvist TA, Muric A, Nielsen AØ, Vestergaard O (2018) A large agile organization on its journey towards devops. In: 2018 44th Euromicro conference on software engineering and advanced applications (SEAA), IEEE, pp 60–63

  • Leite L, Kon F, Pinto G, Meirelles P (2020) Platform teams: An organizational structure for continuous delivery. In: Proceedings of the IEEE/ACM 42nd international conference on software engineering workshops, ICSEW’20, pp 505–511, New York, NY, USA. Association for Computing Machinery

  • Leite L, Rocha C, Kon F, Milojicic D, Meirelles P (2019) A survey of DevOps concepts and challenges. ACM Comput Surv 52(6):35. https://doi.org/10.1145/3359981

  • Leppänen M, Mäkinen S, Pagels M, Eloranta V, Itkonen J, Mäntylä MV, Männistö T (2015) The highways and country roads to continuous deployment. IEEE Softw 32(2):64–72

  • Lethbridge TC, Sim SE, Singer J (2005) Studying software engineers: Data collection techniques for software field studies. Empir Softw Eng 10:311–341

  • Luz WP, Pinto G, Bonifácio R (2019) Adopting devops in the real world: A theory, a model, and a case study. J Syst Softw 157:110384

  • Lwakatare LE, Kuvaja M, Pasiand O (2016) An exploratory study of devops - extending the dimensions of devops with practices. In: Proceedings the eleventh international conference on software engineering advances. pp 91–99

  • Lwakatare LE, Kuvaja M, Pasiand O (2016) Relationship of DevOps to Agile, Lean and Continuous Deployment. Springer International Publishing, Cham, pp 399–415

  • Mathet Y, Widlöcher A, Métivier J-P (2015) The unified and holistic method gamma (γ) for inter-annotator agreement measure and alignment. Comput Linguist 41(3):437–479

  • Miles MB, Huberman AM, Huberman MA, Huberman M (1994) Qualitative data analysis: An expanded sourcebook. Sage

  • Mäkinen S, Leppänen M, Kilamo T, Mattila A-L, Laukkanen E, Pagels M, Männistö T (2016) Improving the delivery cycle: A multiple-case study of the toolchains in Finnish software-intensive enterprises. Inf Softw Technol 80:175–194

  • Parnin C, Helms E, Atlee C, Boughton H, Ghattas M, Glover A, Holman J, Micco J, Murphy B, Savor T, Stumm M, Whitaker S, Williams L (2017) The top 10 adages in continuous deployment. IEEE Softw 34 (3):86–95

  • Poppendieck M, Poppendieck T (2006) Implementing lean software development: from concept to cash (the addison-wesley signature series). Addison-Wesley professional

  • Rafi S, Yu W, Akbar MA (2020) Towards a hypothetical framework to secure devops adoption: grounded theory approach. In: Proceedings of the Evaluation and Assessment in Software Engineering, EASE ’20, page 457–462, New York, NY, USA. Association for Computing Machinery

  • DevOps Research and Assessment (2018) Accelerate: State of DevOps 2018: strategies for a new economy. https://devops-research.com. Last accessed: 2020-01-01

  • Riungu-Kalliosaari L, Mäkinen S, Lwakatare LE, Tiihonen J, Männistö T, Felderer M (2016) Devops adoption benefits and challenges in practice: A case study. In: Abrahamsson P, Jedlitschka A, Nguyen Duc A, Amasaki S, Mikkonen T (eds) Product-focused software process improvement. Springer International Publishing, Cham, pp 590–597

  • Rodríguez P, Mäntylä M, Oivo M, Lwakatare LE, Seppänen P, Kuvaja P (2019) Advances in using agile and lean processes for software development. In: Adv Comput. Elsevier, Amsterdam, pp 135–224

  • Runeson P, Höst M (2008) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14(2):131–164

  • Salameh A, Bass JM (2019) Spotify tailoring for promoting effectiveness in cross-functional autonomous squads. In: Hoda R (ed) Agile processes in software engineering and extreme programming – workshops. Springer International Publishing, Cham, pp 20–28

  • Sánchez-Gordón M, Colomo-Palacios R (2018) Characterizing devops culture: A systematic literature review. In: Stamelos I, O’Connor RV, Rout T, Dorling A (eds) Software process improvement and capability determination. Springer International Publishing, Cham, pp 3–15

  • Scott WA (1955) Reliability of content analysis: the case of nominal scale coding. Public Opin Q 19(3):321–325. Accessed 2 Jan 2021. http://www.jstor.org/stable/2746450

  • Senapathi M, Buchan J, Osman H (2018) Devops capabilities, practices, and challenges: Insights from a case study. In: Proceedings of the 22nd international conference on evaluation and assessment in software engineering 2018, EASE’18, pp 57–67, New York, NY, USA. Association for Computing Machinery

  • Smeds J, Nybom K, Porres I (2015) Devops: A definition and perceived adoption impediments. In: Lassenius C, Dingsøyr T, Paasivaara M (eds) Agile processes in software engineering and extreme programming. Springer International Publishing, Cham, pp 166–177

  • Spearman C (1987) The proof and measurement of association between two things. Am J Psychol 100(3/4):441–471

  • Stamelos I (2010) Software project management anti-patterns. J Syst Softw 83(1):52–59

  • Thomas J, Harden A (2008) Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Medical Research Methodology 8 (1):45. https://doi.org/10.1186/1471-2288-8-45

  • Puppet (2018) State of DevOps report 2018. https://puppet.com/resources/whitepaper/state-of-devops-report. Last accessed: 2020-01-01

  • Wohlin C, Runeson P, Höst M., Ohlsson MC, Regnell B, Wesslen A (2012) Experimentation in software engineering. Springer Publishing Company, Incorporated

  • Yin RK (2018) Case study research and applications - design and methods, 6ed SAGE Publication

Acknowledgements

This research project is being performed thanks to Vass, Clarive, Autentia, Ebury, Carrefour, Vilt, IBM, AtSistemas, Entelgy, Analyticalways, Mango eBusiness, Adidas, Seur, Zooplus, as well as other participating companies.

Author information

Corresponding author

Correspondence to Jessica Díaz.

Additional information

Communicated by: Emerson Murphy-Hill

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Theoretical framework for α coefficients

In this appendix, we introduce a novel interpretation that unifies the different variants of the α coefficient into a common framework. In the literature, these coefficients are usually presented as unrelated, and a kind of ad hoc formulation is provided for each problem.

The key point of this section is to show that these versions can be translated into a simpler and universal version of Krippendorff’s α. For this purpose, we will formulate the α coefficients in terms of some ‘meta-codes’ that we will call ‘labels’. In each situation, we will provide a well-defined algorithm that translates semantic domains and codes (the units of judgment considered in thematic analysis) into labels. In this way, after this translation, all the coefficients reduce to the same mathematical computation of the universal α coefficient for labels.

In order to lighten this section, the mathematical formulation of this universal α coefficient has been moved to Appendix B. Nevertheless, for convenience, let us recall some notation introduced there that will be used along this section. We fix a finite set Λ of labels, and we deal with a collection J1,…,Jn of judges that will evaluate a set of items I1,…,Im. Each of the judges, Jα, evaluates the item Iβ with a subset \(\omega _{\alpha , \beta } \subseteq {\varLambda }\). The result of the evaluation process is gathered in a set \({\varOmega } = \left \{\omega _{\alpha , \beta }\right \}\) for 1 ≤ α ≤ n and 1 ≤ β ≤ m.

From Ω, we can compute Krippendorff’s coefficient, α = α(Ω), which is a real number with 0 ≤ α ≤ 1. Such a quantity is interpreted as a measure of the degree of agreement that is achieved beyond chance. The larger α is, the better the observed agreement. A common rule of thumb in the literature (Krippendorff 2018) is that α ≥ 0.667 is the minimal threshold required for drawing conclusions from the data. For α ≥ 0.80, we can consider that there exists statistical evidence of reliability in the evaluations.

However, this ideal setting, as described in Appendix B, might be too restrictive for the purposes of content analysis (particularly, as applied by the Atlas.ti software (ATLAS.ti 2019)). The most general setting of content analysis is as follows. We have a collection of s > 1 semantic domains, S1,…,Ss. A semantic domain defines a space of distinct concepts that share common meanings (for a concrete example, check our semantic domains in Section 4.1.1). Subsequently, each semantic domain comprises mutually exclusive concepts, each indicated by a code. Hence, the domain Si, for 1 ≤ i ≤ s, decomposes into ri ≥ 1 codes, which we will denote by \({C_{1}^{i}}, \ldots , C_{r_{i}}^{i}\). As pointed out in the literature, for design consistency these semantic domains must be logically or conceptually independent. This principle translates into the fact that there exist no shared codes between different semantic domains.

Now, the data under analysis (e.g. scientific literature, newspapers, videos, interviews) is divided into items, which in this context are known as quotations, and which represent parts of the data that are meaningful on their own. The decomposition may be decided by each of the judges (so different judges may have different quotations) or it may be pre-established (for instance, by the codebook creator or the designer of the ICA study). In the latter case, all the judges share the same quotations, so they cannot modify the limits of the quotations and they must evaluate each quotation as a block. To lighten the notation, we will suppose that we are dealing with this case of pre-established quotations. This is the setting of our study (see Section 3.4). Indeed, from a mathematical point of view, the former case can be reduced to this version by refining the data division of each judge to get a common decomposition into the same pieces.

Therefore, we will suppose that the data is previously decomposed into m ≥ 1 items or quotations, I1,…,Im. Observe that the union of all the quotations must be the whole matter, so, in particular, irrelevant matter is also included as quotations. Now, each of the judges Jα, 1 ≤ α ≤ n, evaluates the quotation Iβ, 1 ≤ β ≤ m, assigning to Iβ any number of semantic domains and, for each chosen semantic domain, one and only one code. No semantic domain is assigned in the case that the judge considers that Iβ is irrelevant matter, and several domains can be applied to Iβ by the same judge.

Hence, as a byproduct of the evaluation process, we obtain a collection of sets \({\varSigma } = \left \{\sigma _{\alpha , \beta }\right \}\), for 1 ≤ α ≤ n and 1 ≤ β ≤ m. Here, \(\sigma _{\alpha , \beta } = \left \{{C}_{j_{1}}^{i_{1}}, \ldots , {C}_{j_{p}}^{i_{p}}\right \}\) is the collection of codes that the judge Jα assigned to the quotation Iβ. The exclusion principle of the codes within a semantic domain means that the collection of chosen semantic domains i1,…,ip contains no repetitions.
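
As an illustration, the evaluations Σ can be stored as a mapping from (judge, quotation) pairs to sets of codes. The following Python sketch, with hypothetical judge, quotation, and code names (not taken from our codebook), shows such a representation together with a check of the exclusion principle:

```python
# Hypothetical codebook with two semantic domains; all names are illustrative.
DOMAINS = {
    "S1": {"C1_1", "C2_1"},          # codes C_j^1 of domain S1
    "S2": {"C1_2", "C2_2", "C3_2"},  # codes C_j^2 of domain S2
}

# Sigma: for each (judge, quotation) pair, the set of assigned codes.
# The empty set means the judge considered the quotation irrelevant matter.
SIGMA = {
    ("J1", "I1"): {"C1_1", "C2_2"},
    ("J2", "I1"): {"C1_1"},
    ("J1", "I2"): set(),
    ("J2", "I2"): {"C3_2"},
}

def respects_exclusion(sigma, domains):
    """Check the exclusion principle: every judge applies at most one code
    per semantic domain to each quotation."""
    return all(
        len(codes & domain_codes) <= 1
        for codes in sigma.values()
        for domain_codes in domains.values()
    )
```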

Remark 2

To be precise, as proposed in Krippendorff et al. (2016), when dealing with a continuum of matter, each of the quotations must be weighted by its length in the observed and expected coincidence matrices (see Appendix B). This length is defined as the number of atomic units the quotation contains (say, characters in a text or seconds in a video). In this way, (dis)agreements in long quotations are more significant than (dis)agreements in short quotations. This can be easily incorporated into our setting just by refining the data decomposition to the level of units. In this way, we create new quotations having the length of an atomic unit. Each new atomic quotation is judged with the same evaluations as the old bigger quotation. In the coefficients introduced below, this idea has the mathematical effect that, in the sums of Equation (1) of Appendix B, each old quotation appears as many times as the number of atomic units it contains, which is the length of such a quotation. Therefore, in this manner, the version explained here computes the same coefficient as in Krippendorff et al. (2016).
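
This refinement can be sketched in a few lines of Python (the function name and data layout are illustrative, not part of any tool): a quotation of length ℓ is replaced by ℓ atomic quotations that inherit its evaluation.

```python
def refine_by_length(omega, lengths):
    """Split each quotation into as many atomic quotations as its length,
    each inheriting the evaluation of the original quotation.
    `omega` maps (judge, quotation) -> set of labels;
    `lengths` maps quotation -> number of atomic units."""
    refined = {}
    for (judge, quot), labels in omega.items():
        for unit in range(lengths[quot]):
            # The atomic quotation (quot, unit) carries the same labels.
            refined[(judge, (quot, unit))] = set(labels)
    return refined
```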

In order to quantify the degree of agreement achieved by the judges in the evaluations Σ, several variants of Krippendorff’s α are proposed in the literature (Krippendorff et al. 2016; Krippendorff 2018). For the purposes of this study, we will apply the variants described below.

1.1 A.1 The coefficient α binary

The first variation of Krippendorff’s α coefficient is the so-called αbinary coefficient. This is a coefficient that must be computed on a specific semantic domain. Hence, let us fix a semantic domain Si for some fixed i with 1 ≤ i ≤ s. The set of considered items to be judged is exactly the set of (prescribed) quotations I1,…,Im. However, the set of labels will have only two labels, which semantically represent ‘voted Si’ and ‘did not vote Si’. Hence, we take

$$ {\varLambda} = \left\{1, 0\right\}. $$

For the assignment of labels to items, the rule is as follows. For 1 ≤ α ≤ n and 1 ≤ β ≤ m, we set \(\omega _{\alpha ,\beta } = \left \{1\right \}\) if the judge Jα assigned some code of Si to the quotation Iβ (i.e. if \({C_{j}^{i}} \in \sigma _{\alpha , \beta }\) for some 1 ≤ j ≤ ri) and \(\omega _{\alpha ,\beta } = \left \{0\right \}\) otherwise. Observe that, in particular, \(\omega _{\alpha ,\beta } = \left \{0\right \}\) if Jα considered that Iβ was irrelevant matter. From this set of evaluations, \({\varOmega }_{binary}^{S_{i}} = \left \{\omega _{\alpha , \beta }\right \}\), αbinary is given as

$$ \alpha_{binary}^{S_{i}} = \alpha({\varOmega}_{binary}^{S_{i}}). $$

In this way, the coefficient \(\alpha _{binary}^{S_{i}}\) is a measure of the degree of agreement that the judges achieved when choosing whether or not to apply the semantic domain Si. A high value of \(\alpha _{binary}^{S_{i}}\) is interpreted as evidence that the domain Si is clearly stated, its boundaries are well-defined and, thus, the decision of applying it or not is nearly deterministic. However, observe that it does not measure the degree of agreement in the application of the different codes within the domain Si. Hence, it may occur that the boundaries of the domain Si are clearly defined but the inner codes are not well chosen. This is not a task of the \(\alpha _{binary}^{S_{i}}\) coefficient, but of the \({cu\textrm {-}\alpha }^{S_{i}}\) coefficient explained below.
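
The translation from code evaluations to the two binary labels can be sketched as follows, assuming the evaluations are stored as a mapping from (judge, quotation) pairs to sets of codes (an illustrative layout, not Atlas.ti’s internal one):

```python
def binary_labels(sigma, domain_codes):
    """Translate code evaluations into the labels {1, 0} for one domain:
    1 if the judge applied some code of the domain to the quotation,
    0 otherwise (including irrelevant matter)."""
    return {
        key: {1} if codes & domain_codes else {0}
        for key, codes in sigma.items()
    }
```

Feeding the resulting label assignments into the universal α of Appendix B yields αbinary for the chosen domain.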

Remark 3

Empirically, we discovered that the semantics that the Atlas.ti software (ATLAS.ti 2019) applies for computing αbinary (and for the coefficient cu-α introduced in Section A.2) is the one explained in this section. However, to our understanding, this behavior is not consistent with the description provided in the corresponding user’s guide.

1.2 A.2 The coefficient cu-α

Another variation of Krippendorff’s α coefficient is the so-called cu-α coefficient. Like the previous variation, this is a coefficient that is computed for each semantic domain, say Si for some 1 ≤ i ≤ s. Suppose that this semantic domain contains codes \({C}_{1}^{i}, \ldots , {C}_{r}^{i}\). As always, the set of considered items is the set of quotations. However, the collection of labels is now a set

$$ {\varLambda} = \left\{\mathcal{C}_{1}, \ldots, \mathcal{C}_{r}\right\}. $$

Semantically, they are labels that represent the codes of the chosen domain Si.

For the assignment of labels to items, the rule is as follows. For 1 ≤ α ≤ n and 1 ≤ β ≤ m, we set \(\omega _{\alpha ,\beta } = \mathcal {C}_{k}\) if the judge Jα assigned the code \({C_{k}^{i}}\) of Si to the item (quotation) Iβ. Recall that, from the exclusion principle for codes within a semantic domain, the judge Jα applied at most one code from Si to Iβ. If the judge Jα did not apply any code of Si to Iβ, we set ωα, β = ∅. From this set of judgements \({\varOmega }_{cu}^{S_{i}} = \left \{\omega _{\alpha , \beta }\right \}\), cu-α is given as

$$ {cu\textrm{-}\alpha}^{S_{i}} = \alpha({\varOmega}_{cu}^{S_{i}}). $$
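
The corresponding label translation keeps only the codes of the chosen domain; a sketch under the same illustrative data layout (a mapping from (judge, quotation) pairs to sets of codes):

```python
def cu_labels(sigma, domain_codes):
    """Keep, for each evaluation, the (at most one, by the exclusion
    principle) code of the chosen domain; evaluations with no code of
    the domain become the empty set."""
    return {key: codes & domain_codes for key, codes in sigma.items()}
```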

Remark 4

As explained in Remark 5 of Appendix B, for the computation of the observed and expected coincidence matrices, only items that received at least two evaluations with codes of Si from two different judges count. In particular, if a quotation is not evaluated by any judge (irrelevant matter), received evaluations for other domains but not for Si (matter that does not correspond to the chosen domain), or only one judge assigned to it a code from Si (single-voted), then the quotation plays no role in cu-α. This limitation might seem a bit cumbersome, but it can be explained by arguing that the presence/absence of Si is already measured by \({\alpha _{binary}}^{S_{i}}\), so it would be redundant to take it into account for \({cu\textrm {-}\alpha }^{S_{i}}\) too.

1.3 A.3 The coefficient Cu-α

The last variation of Krippendorff’s α coefficient that we consider in this study is the so-called Cu-α coefficient. In contrast with the previous coefficients, this is a global measure of the goodness of the partition into semantic domains. Suppose that our codebook determines semantic domains S1,…,Ss. As always, the set of considered items is the set of quotations, but the collection of labels is the set

$$ {\varLambda} = \left\{\mathcal{S}_{1}, \ldots, \mathcal{S}_{s}\right\}. $$

Semantically, they are labels representing the semantic domains of our codebook.

We assign labels to items as follows. Let 1 ≤ α ≤ n and 1 ≤ β ≤ m. Then, if \(\sigma _{\alpha , \beta } = \left \{{C}_{j_{1}}^{i_{1}}, \ldots , {C}_{j_{p}}^{i_{p}}\right \}\), we set \(\omega _{\alpha , \beta } = \left \{\mathcal {S}_{i_{1}}, \ldots , \mathcal {S}_{i_{p}}\right \}\). In other words, we label Iβ with the labels corresponding to the semantic domains chosen by judge Jα for this item, independently of the particular code. Observe that this is the first case in which the final evaluation Ω might be multivalued. From this set of judgements, \({\varOmega }_{Cu} = \left \{\omega _{\alpha , \beta }\right \}\), Cu-α is given as

$$ {Cu\textrm{-}\alpha} = \alpha({\varOmega}_{Cu}). $$

In this way, Cu-α measures the degree of reliability in the decision of applying the different semantic domains, independently of the particular chosen code. Therefore, it is a global measure that quantifies the logical independence of the semantic domains and the ability of the judges to look at the big picture of the matter, only from the point of view of semantic domains.
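
The label translation for Cu-α replaces each code by its semantic domain; a sketch, again assuming evaluations stored as a mapping from (judge, quotation) pairs to sets of codes, with a hypothetical code-to-domain table:

```python
def Cu_labels(sigma, code_to_domain):
    """Replace each assigned code by the label of its semantic domain,
    forgetting the particular code; the result may be multivalued."""
    return {
        key: {code_to_domain[c] for c in codes}
        for key, codes in sigma.items()
    }
```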

Appendix B: Universal Krippendorff’s α coefficient

In this appendix, we rephrase Krippendorff’s α for a wide class of judgements. This gives rise to a universal Krippendorff’s α coefficient formulated for assignments of ‘meta-codes’ called ‘labels’. This formulation is very useful for the unified treatment of the several variants of α introduced in Appendix A. For a historical description of this coefficient, see Krippendorff (2018).

In the context of Inter-Coder Agreement analysis, we are dealing with n > 1 different judges (also known as coders), denoted by J1,…,Jn, as well as with a collection of m ≥ 1 items to be judged (also known as quotations), denoted I1,…,Im. We fix a set of N ≥ 1 admissible ‘meta-codes’, called labels, say \({\varLambda } = \left \{ l_{1},\ldots , l_{N}\right \}\). The task of each of the judges Jα is to assign, to each item Iβ, a collection (maybe empty) of labels from Λ. Hence, as a byproduct of the evaluation process, we get a set \({\varOmega } = \left \{\omega _{\alpha , \beta }\right \}\), for 1 ≤ α ≤ n and 1 ≤ β ≤ m, where \(\omega _{\alpha , \beta } \subseteq {\varLambda }\) is the set of labels that the judge Jα assigned to the item Iβ. Recall that ωα, β is not a multiset, so every label appears in ωα, β at most once.

From the collection of responses Ω, we can count the number of observed pairs of responses. For that, fix 1 ≤ i, j ≤ N and set

$$ o_{i,j} = \left|\left\{(\omega_{\alpha, \beta}, \omega_{\alpha^{\prime}, \beta}) \in {\varOmega}^{2} \left| \alpha^{\prime}\neq \alpha, \begin{array}{c} (l_{i} \in \omega_{\alpha, \beta} \textrm{ and } l_{j} \in \omega_{\alpha^{\prime}, \beta} ) \\ \text{or} \\ (l_{j} \in \omega_{\alpha, \beta} \text{ and } l_{i} \in\omega_{\alpha^{\prime}, \beta})\end{array} \right.\right\}\right|. $$

In other words, oi, j counts the number of (ordered) pairs of responses of the form \((\omega _{\alpha , \beta }, \omega _{\alpha ^{\prime }, \beta }) \in {\varOmega } \times {\varOmega }\) that two different judges Jα and \(J_{\alpha ^{\prime }}\) gave to the same item Iβ and such that Jα included li in his response and \(J_{\alpha ^{\prime }}\) included lj in his response, or vice versa.
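
This count can be transcribed directly into code. The sketch below follows the set-cardinality definition literally, iterating over ordered pairs of distinct judges per item and testing the symmetric membership condition (names and data layout are illustrative):

```python
from itertools import permutations

def observed_coincidences(omega, judges, items, labels):
    """Direct transcription of the definition of o_{i,j}: for each item,
    examine every ordered pair of responses by distinct judges and test
    whether (l_i, l_j) or (l_j, l_i) occurs across the pair."""
    o = {(li, lj): 0 for li in labels for lj in labels}
    for item in items:
        for ja, jb in permutations(judges, 2):
            wa = omega.get((ja, item), set())  # empty set = not voted
            wb = omega.get((jb, item), set())
            for li in labels:
                for lj in labels:
                    if (li in wa and lj in wb) or (lj in wa and li in wb):
                        o[(li, lj)] += 1
    return o
```

By construction the resulting matrix is symmetric, and single-voted items contribute nothing, as discussed in Remark 5.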

Remark 5

Suppose that there exists an item Iβ that was judged by a single judge, say Jα. The other judges, \(J_{\alpha ^{\prime }}\) for \(\alpha ^{\prime } \neq \alpha \), did not vote it (or, in other words, they voted it as empty), so \(\omega _{\alpha ^{\prime }, \beta } = \emptyset \). Then, this item Iβ makes no contribution to the calculation of oi, j since there is no other judgement to which ωα, β can be paired. Hence, from the point of view of Krippendorff’s α, Iβ is not considered. This causes some strange behaviours in the coefficients below that may seem counterintuitive.

From these counts, we construct the matrix of observed coincidences as \(M_{o} = \left (o_{i,j}\right )_{i,j=1}^{N}\). By its very construction, Mo is a symmetric matrix. From this matrix, we set \(t_{k} = {\sum }_{j=1}^{N} o_{k, j}\), which is (twice) the total number of times that the label lk ∈ Λ was assigned by any judge. Observe that \(t = {\sum }_{k=1}^{N} t_{k}\) is the total number of judgments. In the case that each judge evaluates each item with a single non-empty label, we have t = nm.

On the other hand, we can construct the matrix of expected coincidences, \(M_{e} = \left (e_{i,j}\right )_{i,j = 1}^{N}\), where

$$ e_{i,j} = \left\{\begin{array}{cc} \frac{t_{i}}{t}\frac{t_{j}}{t-1}t = \frac{t_{i}t_{j}}{t-1} & \textrm{if }i\neq j \\ & \\ \frac{t_{i}}{t}\frac{t_{i}-1}{t-1}t = \frac{t_{i}(t_{i}-1)}{t-1} & \textrm{if }i = j \end{array}\right. $$

The value \(e_{i,j}\) may be thought of as the average number of times that we expect to find a pair \((l_{i}, l_{j})\), when the frequency of the label \(l_{i}\) is estimated from the sample as \(t_{i}/t\). Again, Me is a symmetric matrix.
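The expected-coincidence matrix can be sketched in the same style; the marginal totals `t_k` below are hypothetical values standing in for the row sums of the observed matrix:

```python
# Hypothetical marginal totals t_k (row sums of the observed matrix M_o).
t_k = [4, 3, 3]
t = sum(t_k)
N = len(t_k)

# Piecewise formula for e_{i,j}: t_i * t_j / (t - 1) off the diagonal,
# and t_i * (t_i - 1) / (t - 1) on the diagonal.
e = [[(t_k[i] * (t_k[i] - 1) if i == j else t_k[i] * t_k[j]) / (t - 1)
     for j in range(N)] for i in range(N)]
```

A quick consistency check on this formula: the entries of the expected matrix sum to t, exactly as those of the observed matrix do.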

Finally, let us fix a pseudo-metric \(\delta : {\varLambda } \times {\varLambda } \to [0, \infty ) \subseteq \mathbf {R}\), i.e. a symmetric function satisfying the triangle inequality and with \(\delta (l_{i}, l_{i}) = 0\) for any \(l_{i} \in {\varLambda }\) (recall that this is only a pseudo-metric since different labels at distance zero are allowed). This metric is given by the semantics of the analyzed problem and, thus, it is part of the data used for quantifying the agreement. The value \(\delta (l_{i}, l_{j})\) should be seen as a measure of how similar the labels \(l_{i}\) and \(l_{j}\) are. A common choice is the so-called discrete metric, given by \(\delta (l_{i}, l_{j}) = 0\) if i = j and \(\delta (l_{i}, l_{j}) = 1\) otherwise. The discrete metric means that all the labels are equally separated; this is the underlying assumption that we will apply in our study. However, subtler metrics may be used for extracting more semantic information from the data (see Krippendorff et al. (2016)).
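Under the discrete metric, δ reduces to a two-line function; a subtler pseudo-metric would replace the constant 1 with a label-similarity score (this helper name is illustrative):

```python
def discrete_delta(li, lj):
    # Discrete metric: 0 for identical labels, 1 otherwise,
    # i.e. all distinct labels are equally separated.
    return 0 if li == lj else 1
```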

From these computations, we define the observed disagreement, Do, and the expected disagreement, De, as

$$ D_{o} = \sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} o_{i,j} \delta(l_{i}, l_{j}), D_{e} = \sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} e_{i,j}\delta(l_{i}, l_{j}). $$
(1)

These quantities measure the degree of disagreement that is observed from Ω and the degree of disagreement that might be expected by judging randomly, respectively.

Remark 6

In the case of taking δ as the discrete metric, we have another interpretation of the disagreement. Observe that, in this case, since \(\delta (l_{i}, l_{i}) = 0\), we can write the disagreements as

$$ D_{o} = \underset{i\neq j}{\sum} o_{i,j} = t - \sum\limits_{i=1}^{N} o_{i,i}, D_{e} = \underset{i\neq j}{\sum} e_{i,j} = t - \sum\limits_{i=1}^{N} e_{i,i}. $$

The quantity \(A_{o} = {\sum }_{i=1}^{N} o_{i,i}\) (resp. \(A_{e} = {\sum }_{i=1}^{N} e_{i,i}\)) can be understood as the observed (resp. expected) agreement between the judges. In the same vein, \(t = {\sum }_{i,j=1}^{N} o_{i,j} = {\sum }_{i,j=1}^{N} e_{i,j}\) may be seen as the maximum achievable agreement. Hence, in this context, the disagreement Do (resp. De) is indeed the difference between the maximum possible agreement and the observed (resp. expected) agreement.

From these data, Krippendorff’s α coefficient is defined as

$$ \alpha = \alpha({\varOmega}) = 1 - \frac{D_{o}}{D_{e}}. $$

From this formula, we observe the following limiting values:

  • α = 1 is equivalent to Do = 0 or, in other words, it means that there exists perfect agreement in the judgements among the judges.

  • α = 0 is equivalent to Do = De, which means that the agreement observed between the judgements is entirely due to chance.

In this way, Krippendorff’s α can be interpreted as a measure of the degree of agreement that is achieved beyond chance. The larger α is, the greater the observed agreement.
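Putting the pieces together, a minimal end-to-end sketch of the coefficient under the discrete metric (using the simplified forms of Do and De from Remark 6) could look as follows; the function name and data are illustrative only:

```python
def krippendorff_alpha(responses, labels):
    """Krippendorff's alpha with the discrete metric, following the
    pair-counting construction described in the text."""
    N = len(labels)
    idx = {lab: k for k, lab in enumerate(labels)}
    # Observed coincidences: pairs of responses by distinct judges
    # to the same item.
    o = [[0.0] * N for _ in range(N)]
    for beta in range(len(responses[0])):
        for a1 in range(len(responses)):
            for a2 in range(len(responses)):
                if a1 == a2:
                    continue
                for li in responses[a1][beta]:
                    for lj in responses[a2][beta]:
                        o[idx[li]][idx[lj]] += 1.0
    t_k = [sum(row) for row in o]
    t = sum(t_k)
    # Discrete metric: D_o = t - A_o and D_e = t - A_e (Remark 6),
    # where A_o and A_e are the traces of M_o and M_e.
    A_o = sum(o[i][i] for i in range(N))
    A_e = sum(tk * (tk - 1) / (t - 1) for tk in t_k)
    return 1.0 - (t - A_o) / (t - A_e)

# Two judges agreeing on every item: D_o = 0, so alpha = 1.
perfect = [[{"a"}, {"b"}, {"a"}], [{"a"}, {"b"}, {"a"}]]
alpha_perfect = krippendorff_alpha(perfect, ["a", "b"])
```

Swapping one judge's labels on every item instead yields a negative α, the systematic "agreement to disagree" discussed in Remark 7 below.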

Remark 7

Observe that α < 0 may only be achieved if Do > De, which means that there is even more disagreement than could be expected by chance. This implies that the judges are consistently issuing different judgements for the same items, which evidences an agreement between the judges to disagree, that is, to fake the evaluations. On the other hand, as long as the metric δ is non-negative, Do ≥ 0 and, thus, α ≤ 1.


Cite this article

Díaz, J., López-Fernández, D., Pérez, J. et al. Why are many businesses instilling a DevOps culture into their organization?. Empir Software Eng 26, 25 (2021). https://doi.org/10.1007/s10664-020-09919-3
