
Why are many businesses instilling a DevOps culture into their organization?

Empirical Software Engineering

Abstract

Context

DevOps can be defined as a cultural movement to improve and accelerate the delivery of business value by making the collaboration between development and operations effective. Although this movement is relatively recent, there exists intensive research around DevOps. However, the real reasons why companies move to DevOps, and the results they expect to obtain, have received little attention in real contexts.

Objective

This paper aims to help practitioners and researchers better understand the context and the problems that many companies face day to day in their organizations when they try to accelerate software delivery, as well as the main drivers that move these companies to adopt DevOps.

Method

We conducted an exploratory study by leveraging in-depth, semi-structured interviews with relevant stakeholders of 30 multinational software-intensive companies, together with industrial workshops and observations at the organizations’ facilities, which supported triangulation. Additionally, we conducted an inter-coder agreement analysis, which is not usually addressed in qualitative studies in software engineering, to increase the reliability of the drawn findings and reduce the authors’ bias.

Results

The research explores the problems and expected outcomes that moved companies to adopt DevOps and reveals a set of patterns and anti-patterns about the reasons why companies are instilling a DevOps culture.

Conclusions

This study aims to strengthen the evidence and support practitioners in making better-informed decisions about which problems trigger a DevOps transition and which results are most commonly expected.


Notes

  1. https://www.devopsagileskills.org/, last accessed 2020/01/01.

  2. http://bit.ly/2ky00LQ, last accessed 2020/01/01.

  3. https://www.devops-spain.com/ last accessed 2020/01/01.

  4. http://bit.ly/2ky0eCG, last accessed 2020/01/01.

  5. https://devopsdays.org/events/2020-madrid/welcome/ last accessed 2020/01/01.

  6. Spanish Law 5/2015 indicates that a micro enterprise is one that has less than ten workers and an annual turnover of less than two million euros or a total asset of less than two million euros; a small company is one that has a maximum of 49 workers and a turnover or total assets of less than ten million euros; medium-sized companies are those with less than 250 workers and a turnover of less than fifty million euros or an asset of less than 43 million euros; and large companies are those that exceed these parameters.

  7. https://github.com/jdiazfernandez/DevOpsInPractice/blob/master/codebook.md, last accessed 2020/01/01.

  8. http://bit.ly/2ky00LQ, last accessed 2020/01/01.

References

  • Allspaw J., Hammond P. (2009) 10+ deploys per day: Dev and ops cooperation at flickr. In: Velocity: web performance and operations conference

  • ATLAS.ti (2019) Scientific Software Development GmbH. Inter-coder agreement analysis: Atlas.ti 8. https://atlasti.com/. Last accessed: 2020-01-01

  • Beck K, Beedle M, Van Bennekum A, Cockburn A, Cunningham W, Fowler M, Grenning J, Highsmith J, Hunt A, Jeffries R. (2001) Manifesto for agile software development

  • Bosch J (2012) Building products as innovation experiment systems. In: Cusumano MA, Iyer B, Venkatraman N (eds) Software business, Springer, pp 27–39

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1):37–46

  • Corbin J, Strauss A (2008) Basics of qualitative research. (3rd ed): techniques and procedures for developing grounded theory. SAGE Publications, Inc

  • Creswell JW, Creswell JD (2017) Research design: Qualitative, quantitative, and mixed methods approaches. Sage publications

  • Cruzes DS, Dyba T (2011) Recommended steps for thematic synthesis in software engineering. In: 2011 International symposium on empirical software engineering and measurement, IEEE, pp 275–284

  • de França BBN, Jeronimo H, Travassos GH (2016) Characterizing devops by hearing multiple voices. In: Proceedings of the 30th brazilian symposium on software engineering, SBES ’16, pp 53–62, New York. Association for Computing Machinery

  • Debois P (2008) Agile infrastructure and Operations: How infra-gile are you?. In: Agile 2008 Conference, pp 202–207

  • Díaz J, Almaraz R, Pérez J, Garbajosa J (2018) DevOps in practice. In: Proceedings of the 19th international conference on agile software development companion - XP 18 ACM Press

  • Díaz J, Perez JE, Yague A, Villegas A, de Antona A (2019) Devops in practice – a preliminary analysis of two multinational companies. In: Franch X, Männistö T, Martínez-Fernández S (eds) Product-focused software process improvement. Springer International Publishing, Cham, pp 323–330

  • Dingsøyr T, Nerur S, Balijepally V, Moe NB (2012) A decade of agile methodologies: Towards explaining agile software development. J Syst Softw 85(6):1213–1221

  • Dyck A, Penners R, Lichter H (2015) Towards definitions for release engineering and DevOps. In: 2015 IEEE/ACM 3rd international workshop on release engineering. IEEE

  • Easterbrook S, Singer J, Storey M-A, Damian D (2008) Selecting Empirical Methods for Software Engineering Research. Springer, London, pp 285–311

  • Ebert C, Gallardo G, Hernantes J, Serrano N (2016) Devops. IEEE Softw 33(3):94–100

  • Erich F (2019) Devops is simply interaction between development and operations. In: Bruel J-M, Mazzara M, Meyer B (eds) Software engineering aspects of continuous development and new paradigms of software production and deployment. Springer International Publishing, Cham, pp 89–99

  • Erich F (2014). In: Amrit C, Daneva M, Jedlitschka A, Kuvaja P, Kuhrmann M, Männistö T, Münch J, Raatikainen M (eds) Product-focused software process improvement. Springer International Publishing, Cham, pp 277–280

  • Erich F, Amrit C, Daneva M (2017) A qualitative study of devops usage in practice. J Softw Evol Proc 29(6):e1885

  • Feitelson D, Frachtenberg E, Beck K (2013) Development and deployment at facebook. Internet Comput IEEE 17:8–17

  • Fitzgerald B, Stol K-J (2017) Continuous software engineering: a roadmap and agenda. J Syst Softw 123:176–189

  • Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76(5):378

  • Floersch J, Longhofer JL, Kranke D, Townsend L (2010) Integrating thematic, grounded theory and narrative analysis: a case study of adolescent psychotropic treatment. Qual Soc Work 9(3):407–425

  • Fowler M (2013) Continuous delivery. https://martinfowler.com/bliki/ContinuousDelivery.html. Last Accessed: 2020-01-01

  • Friese S (2019) Qualitative data analysis with ATLAS. ti SAGE Publications Limited

  • Gamma E, Helm R, Johnson R, Vlissides J (1995) Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Longman Publishing Co., Inc., Boston

  • Hayes AF, Krippendorff K (2007) Answering the call for a standard reliability measure for coding data. Commun Methods Meas 1(1):77–89

  • Iden J, Tessem B, Päivärinta T (2011) Problems in the interplay of development and it operations in system development projects: a delphi study of norwegian it experts. Inf Softw Technol 53(4):394–406

  • Jabbari R, bin Ali N, Petersen K, Tanveer B (2016) What is devops? a systematic mapping study on definitions and practices. In: Proceedings of the scientific workshop proceedings of XP2016, XP ’16 Workshops, New York, NY, USA, Association for Computing Machinery

  • Jabbari R, bin Ali N, Petersen K, Tanveer B (2018) Towards a benefits dependency network for devops based on a systematic literature review. J Softw Evol Proc 30(11):e1957

  • Kim G, Debois P, Willis J, Humble J (2016) The devops handbook: how to create world-class agility, reliability, and security in technology organizations. IT Revolution Press

  • Krippendorff K (2004) Reliability in content analysis: Some common misconceptions and recommendations. Hum Commun Res 30(3):411–433

  • Krippendorff K (2011) Computing krippendorff’s alpha-reliability. http://repository.upenn.edu/asc_papers/43. Last Accessed: 2020-01-01

  • Krippendorff K (2018) Content analysis: An introduction to its methodology, 4th edn. Sage Publications

  • Krippendorff K, Mathet Y, Bouvry S, Widlöcher A. (2016) On the reliability of unitizing textual continua: Further developments. Quality & Quantity: International Journal of Methodology 50(6):2347–2364

  • Kuusinen K, Balakumar V, Jepsen SC, Larsen SH, Lemqvist TA, Muric A, Nielsen AØ, Vestergaard O (2018) A large agile organization on its journey towards devops. In: 2018 44th Euromicro conference on software engineering and advanced applications (SEAA), IEEE, pp 60–63

  • Leite L, Kon F, Pinto G, Meirelles P (2020) Platform teams: An organizational structure for continuous delivery. In: Proceedings of the IEEE/ACM 42nd international conference on software engineering workshops, ICSEW’20, pp 505–511, New York, NY, USA. Association for Computing Machinery

  • Leite L, Rocha C, Kon F, Milojicic D, Meirelles P (2019) A survey of DevOps concepts and challenges. ACM Comput Surv 52(6):35. https://doi.org/10.1145/3359981

  • Leppänen M, Mäkinen S, Pagels M, Eloranta V, Itkonen J, Mäntylä MV, Männistö T (2015) The highways and country roads to continuous deployment. IEEE Softw 32(2):64–72

  • Lethbridge TC, Sim SE, Singer J (2005) Studying software engineers: Data collection techniques for software field studies. Empir Softw Eng 10:311–341

  • Luz WP, Pinto G, Bonifácio R (2019) Adopting devops in the real world: A theory, a model, and a case study. J Syst Softw 157:110384

  • Lwakatare LE, Kuvaja M, Pasiand O (2016) An exploratory study of devops - extending the dimensions of devops with practices. In: Proceedings the eleventh international conference on software engineering advances. pp 91–99

  • Lwakatare LE, Kuvaja M, Pasiand O (2016) Relationship of DevOps to Agile, Lean and Continuous Deployment. Springer International Publishing, Cham, pp 399–415

  • Mathet Y, Widlöcher A, Métivier J-P (2015) The unified and holistic method gamma (γ) for inter-annotator agreement measure and alignment. Comput Linguist 41(3):437–479

  • Miles MB, Huberman AM, Huberman MA, Huberman M (1994) Qualitative data analysis: An expanded sourcebook. Sage

  • Mäkinen S, Leppänen M, Kilamo T, Mattila A-L, Laukkanen E, Pagels M, Männistö T (2016) Improving the delivery cycle: A multiple-case study of the toolchains in Finnish software-intensive enterprises. Inf Softw Technol 80:175–194

  • Parnin C, Helms E, Atlee C, Boughton H, Ghattas M, Glover A, Holman J, Micco J, Murphy B, Savor T, Stumm M, Whitaker S, Williams L (2017) The top 10 adages in continuous deployment. IEEE Softw 34 (3):86–95

  • Poppendieck M, Poppendieck T (2006) Implementing lean software development: from concept to cash (the addison-wesley signature series). Addison-Wesley professional

  • Rafi S, Yu W, Akbar MA (2020) Towards a hypothetical framework to secure devops adoption: grounded theory approach. In: Proceedings of the Evaluation and Assessment in Software Engineering, EASE ’20, page 457–462, New York, NY, USA. Association for Computing Machinery

  • DevOps Research and Assessment (2018) Accelerate: State of DevOps 2018: strategies for a new economy. https://devops-research.com. Last accessed: 2020-01-01

  • Riungu-Kalliosaari L, Mäkinen S, Lwakatare LE, Tiihonen J, Männistö T, Felderer M (2016) Devops adoption benefits and challenges in practice: A case study. In: Abrahamsson P, Jedlitschka A, Nguyen Duc A, Amasaki S, Mikkonen T (eds) Product-focused software process improvement. Springer International Publishing, Cham, pp 590–597

  • Rodríguez P, Mäntylä M, Oivo M, Lwakatare LE, Seppänen P, Kuvaja P (2019) Advances in using agile and lean processes for software development. In: Adv Comput. Elsevier, Amsterdam, pp 135–224

  • Runeson P, Höst M (2008) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14(2):131–164

  • Salameh A, Bass JM (2019) Spotify tailoring for promoting effectiveness in cross-functional autonomous squads. In: Hoda R (ed) Agile processes in software engineering and extreme programming – workshops. Springer International Publishing, Cham, pp 20–28

  • Sánchez-Gordón M, Colomo-Palacios R (2018) Characterizing devops culture: A systematic literature review. In: Stamelos I, O’Connor RV, Rout T, Dorling A (eds) Software process improvement and capability determination. Springer International Publishing, Cham, pp 3–15

  • Scott WA (1955) Reliability of content analysis: the case of nominal scale coding. Public Opin Q 19(3):321–325. Accessed 2 Jan 2021. http://www.jstor.org/stable/2746450

  • Senapathi M, Buchan J, Osman H (2018) Devops capabilities, practices, and challenges: Insights from a case study. In: Proceedings of the 22nd international conference on evaluation and assessment in software engineering 2018, EASE’18, pp 57–67, New York, NY, USA. Association for Computing Machinery

  • Smeds J, Nybom K, Porres I (2015) Devops: A definition and perceived adoption impediments. In: Lassenius C, Dingsøyr T, Paasivaara M (eds) Agile processes in software engineering and extreme programming. Springer International Publishing, Cham, pp 166–177

  • Spearman C (1987) The proof and measurement of association between two things. Am J Psychol 100(3/4):441–471

  • Stamelos I (2010) Software project management anti-patterns. J Syst Softw 83(1):52–59

  • Thomas J, Harden A (2008) Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Medical Research Methodology 8 (1):45. https://doi.org/10.1186/1471-2288-8-45

  • Puppet (2018) State of DevOps report 2018. https://puppet.com/resources/whitepaper/state-of-devops-report. Last accessed: 2020-01-01

  • Wohlin C, Runeson P, Höst M., Ohlsson MC, Regnell B, Wesslen A (2012) Experimentation in software engineering. Springer Publishing Company, Incorporated

  • Yin RK (2018) Case study research and applications - design and methods, 6ed SAGE Publication

Acknowledgements

This research project is being performed thanks to Vass, Clarive, Autentia, Ebury, Carrefour, Vilt, IBM, AtSistemas, Entelgy, Analyticalways, Mango eBusiness, Adidas, Seur, Zooplus, as well as other participating companies.

Author information

Corresponding author

Correspondence to Jessica Díaz.

Additional information

Communicated by: Emerson Murphy-Hill

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Theoretical framework for α coefficients

In this appendix, we introduce a novel interpretation that unifies the different variants of the α coefficient into a common framework. In the literature, these coefficients are usually presented as unrelated, and a kind of ad hoc formulation is provided for each problem.

The key point of this section is to show that these versions can be translated into a simpler and universal version of Krippendorff’s α. For this purpose, we will formulate the α coefficients in terms of some ‘meta-codes’ that we will call ‘labels’. In each situation, we will provide a well-defined algorithm that translates semantic domains and codes (the units of judgment considered in thematic analysis) into labels. In this way, after this translation, all the coefficients reduce to the same mathematical computation of the universal α coefficient for labels.

In order to lighten this section, the mathematical formulation of this universal α coefficient has been moved to Appendix B. Nevertheless, for convenience, let us recall some notation introduced there that will be used along this section. We fix a finite set Λ of labels, and we deal with a collection J1,…,Jn of judges that will evaluate a set of items I1,…,Im. Each of the judges, Jα, evaluates the item Iβ with a subset \(\omega _{\alpha , \beta } \subseteq {\varLambda }\). The result of the evaluation process is gathered in a set \({\varOmega } = \left \{\omega _{\alpha , \beta }\right \}\) for 1 ≤ α ≤ n and 1 ≤ β ≤ m.

From Ω, we can compute Krippendorff’s coefficient, α = α(Ω), which is a real number with 0 ≤ α ≤ 1. Such a quantity is interpreted as a measure of the degree of agreement that is achieved beyond chance. The larger α is, the better the observed agreement. A common rule of thumb in the literature (Krippendorff 2018) is that α ≥ 0.667 is the minimal threshold required for drawing conclusions from the data. For α ≥ 0.80, we can consider that there exists statistical evidence of reliability in the evaluations.

However, this ideal setting, as described in Appendix B, might be too restrictive for the purposes of content analysis (particularly, as applied by the Atlas.ti software (ATLAS.ti 2019)). The most general setting of content analysis is as follows. We have a collection of s > 1 semantic domains, S1,…,Ss. A semantic domain defines a space of distinct concepts that share common meanings (for a concrete example, check our semantic domains in Section 4.1.1). Subsequently, each semantic domain comprises mutually exclusive concepts, each indicated by a code. Hence, the domain Si, for 1 ≤ i ≤ s, decomposes into ri ≥ 1 codes, which we will denote by \({C_{1}^{i}}, \ldots , C_{r_{i}}^{i}\). As pointed out in the literature, for design consistency these semantic domains must be logically or conceptually independent. This principle translates into the fact that there exist no shared codes between different semantic domains.

Now, the data under analysis (e.g. scientific literature, newspapers, videos, interviews) is divided into items, which in this context are known as quotations, and which represent parts of the data that are meaningful on their own. The decomposition may be decided by each of the judges (so different judges may have different quotations) or it may be pre-established (for instance, by the codebook creator or the designer of the ICA study). In the latter case, all the judges share the same quotations, so they cannot modify the limits of the quotations and they must evaluate each quotation as a block. To lighten the notation, we will suppose that we are dealing with this case of pre-established quotations. This is the setting of our study (see Section 3.4). Indeed, from a mathematical point of view, the former case can be reduced to this version by refining the data division of each judge to get a common decomposition into the same pieces.

Therefore, we will suppose that the data is previously decomposed into m ≥ 1 items or quotations, I1,…,Im. Observe that the union of all the quotations must be the whole matter, so, in particular, irrelevant matter is also included as quotations. Now, each of the judges Jα, 1 ≤ α ≤ n, evaluates the quotation Iβ, 1 ≤ β ≤ m, assigning to Iβ any number of semantic domains and, for each chosen semantic domain, one and only one code. No semantic domain is assigned in the case that the judge considers that Iβ is irrelevant matter, and several domains can be applied to Iβ by the same judge.

Hence, as a byproduct of the evaluation process, we obtain a collection of sets \({\varSigma } = \left \{\sigma _{\alpha , \beta }\right \}\), for 1 ≤ α ≤ n and 1 ≤ β ≤ m. Here, \(\sigma _{\alpha , \beta } = \left \{{C}_{j_{1}}^{i_{1}}, \ldots , {C}_{j_{p}}^{i_{p}}\right \}\) is the collection of codes that the judge Jα assigned to the quotation Iβ. The exclusion principle of the codes within a semantic domain means that the collection of chosen semantic domains i1,…,ip contains no repetitions.
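
As an illustration, the evaluations Σ can be stored as a mapping from (judge, quotation) pairs to sets of codes. The following Python sketch, with hypothetical judge, quotation, and code names (not taken from our codebook), shows such a representation together with a check of the exclusion principle:

```python
# Hypothetical codebook with two semantic domains; all names are illustrative.
DOMAINS = {
    "S1": {"C1_1", "C2_1"},          # codes C_j^1 of domain S1
    "S2": {"C1_2", "C2_2", "C3_2"},  # codes C_j^2 of domain S2
}

# Sigma: for each (judge, quotation) pair, the set of assigned codes.
# The empty set means the judge considered the quotation irrelevant matter.
SIGMA = {
    ("J1", "I1"): {"C1_1", "C2_2"},
    ("J2", "I1"): {"C1_1"},
    ("J1", "I2"): set(),
    ("J2", "I2"): {"C3_2"},
}

def respects_exclusion(sigma, domains):
    """Check the exclusion principle: every judge applies at most one code
    per semantic domain to each quotation."""
    return all(
        len(codes & domain_codes) <= 1
        for codes in sigma.values()
        for domain_codes in domains.values()
    )
```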

Remark 2

To be precise, as proposed in Krippendorff et al. (2016), when dealing with a continuum of matter, each of the quotations must be weighted by its length in the observed and expected coincidence matrices (see Appendix B). This length is defined as the number of atomic units the quotation contains (say, characters in a text or seconds in a video). In this way, (dis)agreements in long quotations are more significant than (dis)agreements in short quotations. This can be easily incorporated into our setting just by refining the data decomposition to the level of units. In this way, we create new quotations having the length of an atomic unit. Each new atomic quotation is judged with the same evaluations as the old bigger quotation. In the coefficients introduced below, this idea has the mathematical effect that, in the sums of Equation (1) of Appendix B, each old quotation appears as many times as the number of atomic units it contains, which is the length of such a quotation. Therefore, in this manner, the version explained here computes the same coefficient as in Krippendorff et al. (2016).
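
This refinement can be sketched in a few lines of Python (the function name and data layout are illustrative, not part of any tool): a quotation of length ℓ is replaced by ℓ atomic quotations that inherit its evaluation.

```python
def refine_by_length(omega, lengths):
    """Split each quotation into as many atomic quotations as its length,
    each inheriting the evaluation of the original quotation.
    `omega` maps (judge, quotation) -> set of labels;
    `lengths` maps quotation -> number of atomic units."""
    refined = {}
    for (judge, quot), labels in omega.items():
        for unit in range(lengths[quot]):
            # The atomic quotation (quot, unit) carries the same labels.
            refined[(judge, (quot, unit))] = set(labels)
    return refined
```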

In order to quantify the degree of agreement achieved by the judges in the evaluations Σ, several variants of Krippendorff’s α are proposed in the literature (Krippendorff et al. 2016; Krippendorff 2018). For the purposes of this study, we will apply the variants described below.

1.1 A.1 The coefficient α binary

The first variation of Krippendorff’s α coefficient is the so-called αbinary coefficient. This is a coefficient that must be computed on a specific semantic domain. Hence, let us fix a semantic domain Si for some fixed i with 1 ≤ i ≤ s. The set of considered items to be judged is exactly the set of (prescribed) quotations I1,…,Im. However, the set of labels will have only two labels, which semantically represent ‘voted Si’ and ‘did not vote Si’. Hence, we take

$$ {\varLambda} = \left\{1, 0\right\}. $$

For the assignment of labels to items, the rule is as follows. For 1 ≤ α ≤ n and 1 ≤ β ≤ m, we set \(\omega _{\alpha ,\beta } = \left \{1\right \}\) if the judge Jα assigned some code of Si to the quotation Iβ (i.e. if \({C_{j}^{i}} \in \sigma _{\alpha , \beta }\) for some 1 ≤ j ≤ ri) and \(\omega _{\alpha ,\beta } = \left \{0\right \}\) otherwise. Observe that, in particular, \(\omega _{\alpha ,\beta } = \left \{0\right \}\) if Jα considered that Iβ was irrelevant matter. From this set of evaluations, \({\varOmega }_{binary}^{S_{i}} = \left \{\omega _{\alpha , \beta }\right \}\), αbinary is given as

$$ \alpha_{binary}^{S_{i}} = \alpha({\varOmega}_{binary}^{S_{i}}). $$

In this way, the coefficient \(\alpha _{binary}^{S_{i}}\) is a measure of the degree of agreement that the judges achieved when choosing whether or not to apply the semantic domain Si. A high value of \(\alpha _{binary}^{S_{i}}\) is interpreted as evidence that the domain Si is clearly stated, its boundaries are well-defined and, thus, the decision of applying it or not is nearly deterministic. However, observe that it does not measure the degree of agreement in the application of the different codes within the domain Si. Hence, it may occur that the boundaries of the domain Si are clearly defined but the inner codes are not well chosen. This is not a task of the \(\alpha _{binary}^{S_{i}}\) coefficient, but of the \({cu\textrm {-}\alpha }^{S_{i}}\) coefficient explained below.
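
The translation from code evaluations to the two binary labels can be sketched as follows, assuming the evaluations are stored as a mapping from (judge, quotation) pairs to sets of codes (an illustrative layout, not Atlas.ti’s internal one):

```python
def binary_labels(sigma, domain_codes):
    """Translate code evaluations into the labels {1, 0} for one domain:
    1 if the judge applied some code of the domain to the quotation,
    0 otherwise (including irrelevant matter)."""
    return {
        key: {1} if codes & domain_codes else {0}
        for key, codes in sigma.items()
    }
```

Feeding the resulting label assignments into the universal α of Appendix B yields αbinary for the chosen domain.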

Remark 3

Empirically, we discovered that the semantics that the Atlas.ti software (ATLAS.ti 2019) applies for computing αbinary (and for the coefficient cu-α introduced in Section A.2) is the one explained in this section. However, to our understanding, this behavior is not consistent with the description provided in the corresponding user’s guide.

1.2 A.2 The coefficient cu-α

Another variation of Krippendorff’s α coefficient is the so-called cu-α coefficient. Like the previous variation, this is a coefficient that is computed for each semantic domain, say Si for some 1 ≤ i ≤ s. Suppose that this semantic domain contains codes \({C}_{1}^{i}, \ldots , {C}_{r}^{i}\). As always, the set of considered items is the set of quotations. However, the collection of labels is now a set

$$ {\varLambda} = \left\{\mathcal{C}_{1}, \ldots, \mathcal{C}_{r}\right\}. $$

Semantically, they are labels that represent the codes of the chosen domain Si.

For the assignment of labels to items, the rule is as follows. For 1 ≤ α ≤ n and 1 ≤ β ≤ m, we set \(\omega _{\alpha ,\beta } = \mathcal {C}_{k}\) if the judge Jα assigned the code \({C_{k}^{i}}\) of Si to the item (quotation) Iβ. Recall that, from the exclusion principle for codes within a semantic domain, the judge Jα applied at most one code from Si to Iβ. If the judge Jα did not apply any code of Si to Iβ, we set ωα, β = ∅. From this set of judgements \({\varOmega }_{cu}^{S_{i}} = \left \{\omega _{\alpha , \beta }\right \}\), cu-α is given as

$$ {cu\textrm{-}\alpha}^{S_{i}} = \alpha({\varOmega}_{cu}^{S_{i}}). $$
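
The corresponding label translation keeps only the codes of the chosen domain; a sketch under the same illustrative data layout (a mapping from (judge, quotation) pairs to sets of codes):

```python
def cu_labels(sigma, domain_codes):
    """Keep, for each evaluation, the (at most one, by the exclusion
    principle) code of the chosen domain; evaluations with no code of
    the domain become the empty set."""
    return {key: codes & domain_codes for key, codes in sigma.items()}
```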

Remark 4

As explained in Remark 5 of Appendix B, for the computation of the observed and expected coincidence matrices, only items that received at least two evaluations with codes of Si from two different judges count. In particular, if a quotation is not evaluated by any judge (irrelevant matter), received evaluations for other domains but not for Si (matter that does not correspond to the chosen domain), or only one judge assigned to it a code from Si (single-voted), then the quotation plays no role in cu-α. This limitation might seem a bit cumbersome, but it can be explained by arguing that the presence/absence of Si is already measured by \({\alpha _{binary}}^{S_{i}}\), so it would be redundant to take it into account for \({cu\textrm {-}\alpha }^{S_{i}}\) too.

1.3 A.3 The coefficient Cu-α

The last variation of Krippendorff’s α coefficient that we consider in this study is the so-called Cu-α coefficient. In contrast with the previous coefficients, this is a global measure of the goodness of the partition into semantic domains. Suppose that our codebook determines semantic domains S1,…,Ss. As always, the set of considered items is the set of quotations, but the collection of labels is the set

$$ {\varLambda} = \left\{\mathcal{S}_{1}, \ldots, \mathcal{S}_{s}\right\}. $$

Semantically, they are labels representing the semantic domains of our codebook.

We assign labels to items as follows. Let 1 ≤ α ≤ n and 1 ≤ β ≤ m. Then, if \(\sigma _{\alpha , \beta } = \left \{{C}_{j_{1}}^{i_{1}}, \ldots , {C}_{j_{p}}^{i_{p}}\right \}\), we set \(\omega _{\alpha , \beta } = \left \{\mathcal {S}_{i_{1}}, \ldots , \mathcal {S}_{i_{p}}\right \}\). In other words, we label Iβ with the labels corresponding to the semantic domains chosen by judge Jα for this item, independently of the particular code. Observe that this is the first case in which the final evaluation Ω might be multivalued. From this set of judgements, \({\varOmega }_{Cu} = \left \{\omega _{\alpha , \beta }\right \}\), Cu-α is given as

$$ {Cu\textrm{-}\alpha} = \alpha({\varOmega}_{Cu}). $$

In this way, Cu-α measures the degree of reliability in the decision of applying the different semantic domains, independently of the particular chosen code. Therefore, it is a global measure that quantifies the logical independence of the semantic domains and the ability of the judges to look at the big picture of the matter, only from the point of view of semantic domains.
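
The label translation for Cu-α replaces each code by its semantic domain; a sketch, again assuming evaluations stored as a mapping from (judge, quotation) pairs to sets of codes, with a hypothetical code-to-domain table:

```python
def Cu_labels(sigma, code_to_domain):
    """Replace each assigned code by the label of its semantic domain,
    forgetting the particular code; the result may be multivalued."""
    return {
        key: {code_to_domain[c] for c in codes}
        for key, codes in sigma.items()
    }
```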

Appendix B: Universal Krippendorff’s α coefficient

In this appendix, we rephrase Krippendorff’s α for a wide class of judgements. This gives rise to a universal Krippendorff’s α coefficient formulated for assignments of ‘meta-codes’ called ‘labels’. This formulation is very useful for the unified treatment of the several variants of α introduced in Appendix A. For a historical description of this coefficient, see Krippendorff (2018).

In the context of Inter-Coder Agreement analysis, we are dealing with n > 1 different judges (also known as coders), denoted by J1,…,Jn, as well as with a collection of m ≥ 1 items to be judged (also known as quotations), denoted I1,…,Im. We fix a set of N ≥ 1 admissible ‘meta-codes’, called labels, say \({\varLambda } = \left \{ l_{1},\ldots , l_{N}\right \}\). The task of each of the judges Jα is to assign, to each item Iβ, a collection (maybe empty) of labels from Λ. Hence, as a byproduct of the evaluation process, we get a set \({\varOmega } = \left \{\omega _{\alpha , \beta }\right \}\), for 1 ≤ α ≤ n and 1 ≤ β ≤ m, where \(\omega _{\alpha , \beta } \subseteq {\varLambda }\) is the set of labels that the judge Jα assigned to the item Iβ. Recall that ωα, β is not a multiset, so every label appears in ωα, β at most once.

From the collection of responses Ω, we can count the number of observed pairs of responses. For that, fix 1 ≤ i, j ≤ N and set

$$ o_{i,j} = \left|\left\{(\omega_{\alpha, \beta}, \omega_{\alpha^{\prime}, \beta}) \in {\varOmega}^{2} \left| \alpha^{\prime}\neq \alpha, \begin{array}{c} (l_{i} \in \omega_{\alpha, \beta} \textrm{ and } l_{j} \in \omega_{\alpha^{\prime}, \beta} ) \\ \text{or} \\ (l_{j} \in \omega_{\alpha, \beta} \text{ and } l_{i} \in\omega_{\alpha^{\prime}, \beta})\end{array} \right.\right\}\right|. $$

In other words, oi, j counts the number of (ordered) pairs of responses of the form \((\omega _{\alpha , \beta }, \omega _{\alpha ^{\prime }, \beta }) \in {\varOmega } \times {\varOmega }\) that two different judges Jα and \(J_{\alpha ^{\prime }}\) gave to the same item Iβ and such that Jα included li in his response and \(J_{\alpha ^{\prime }}\) included lj in his response, or vice versa.
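
This count can be transcribed directly into code. The sketch below follows the set-cardinality definition literally, iterating over ordered pairs of distinct judges per item and testing the symmetric membership condition (names and data layout are illustrative):

```python
from itertools import permutations

def observed_coincidences(omega, judges, items, labels):
    """Direct transcription of the definition of o_{i,j}: for each item,
    examine every ordered pair of responses by distinct judges and test
    whether (l_i, l_j) or (l_j, l_i) occurs across the pair."""
    o = {(li, lj): 0 for li in labels for lj in labels}
    for item in items:
        for ja, jb in permutations(judges, 2):
            wa = omega.get((ja, item), set())  # empty set = not voted
            wb = omega.get((jb, item), set())
            for li in labels:
                for lj in labels:
                    if (li in wa and lj in wb) or (lj in wa and li in wb):
                        o[(li, lj)] += 1
    return o
```

By construction the resulting matrix is symmetric, and single-voted items contribute nothing, as discussed in Remark 5.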

Remark 5

Suppose that there exists an item Iβ that was judged by a single judge, say Jα. The other judges, \(J_{\alpha ^{\prime }}\) for \(\alpha ^{\prime } \neq \alpha \), did not vote it (or, in other words, they voted it as empty), so \(\omega _{\alpha ^{\prime }, \beta } = \emptyset \). Then, this item Iβ makes no contribution to the calculation of oi, j since there is no other judgement to which ωα, β can be paired. Hence, from the point of view of Krippendorff’s α, Iβ is not considered. This causes some strange behaviours in the coefficients below that may seem counterintuitive.

From these counts, we construct the matrix of observed coincidences as \(M_{o} = \left (o_{i,j}\right )_{i,j=1}^{N}\). By its very construction, Mo is a symmetric matrix. From this matrix, we set \(t_{k} = {\sum }_{j=1}^{N} o_{k, j}\), which is (twice) the total number of times that the label lk ∈ Λ was assigned by any judge. Observe that \(t = {\sum }_{k=1}^{N} t_{k}\) is the total number of judgments. In the case that each judge evaluates each item with a single non-empty label, we have t = nm.

On the other hand, we can construct the matrix of expected coincidences, \(M_{e} = \left (e_{i,j}\right )_{i,j = 1}^{N}\), where

$$ e_{i,j} = \left\{\begin{array}{cc} \frac{t_{i}}{t}\frac{t_{j}}{t-1}t = \frac{t_{i}t_{j}}{t-1} & \textrm{if }i\neq j \\ & \\ \frac{t_{i}}{t}\frac{t_{i}-1}{t-1}t = \frac{t_{i}(t_{i}-1)}{t-1} & \textrm{if }i = j \end{array}\right. $$

The value \(e_{i,j}\) may be thought of as the average number of times that we expect to find a pair \((l_{i}, l_{j})\), when the frequency of the label \(l_{i}\) is estimated from the sample as \(t_{i}/t\). Again, Me is a symmetric matrix.
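The expected-coincidence matrix can be sketched in the same style; the marginal totals `t_k` below are hypothetical values standing in for the row sums of the observed matrix:

```python
# Hypothetical marginal totals t_k (row sums of the observed matrix M_o).
t_k = [4, 3, 3]
t = sum(t_k)
N = len(t_k)

# Piecewise formula for e_{i,j}: t_i * t_j / (t - 1) off the diagonal,
# and t_i * (t_i - 1) / (t - 1) on the diagonal.
e = [[(t_k[i] * (t_k[i] - 1) if i == j else t_k[i] * t_k[j]) / (t - 1)
     for j in range(N)] for i in range(N)]
```

A quick consistency check on this formula: the entries of the expected matrix sum to t, exactly as those of the observed matrix do.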

Finally, let us fix a pseudo-metric \(\delta : {\varLambda } \times {\varLambda } \to [0, \infty ) \subseteq \mathbf {R}\), i.e. a symmetric function satisfying the triangle inequality and with \(\delta (l_{i}, l_{i}) = 0\) for any \(l_{i} \in {\varLambda }\) (recall that this is only a pseudo-metric since different labels at distance zero are allowed). This metric is given by the semantics of the analyzed problem and, thus, it is part of the data used for quantifying the agreement. The value \(\delta (l_{i}, l_{j})\) should be seen as a measure of how similar the labels \(l_{i}\) and \(l_{j}\) are. A common choice is the so-called discrete metric, given by \(\delta (l_{i}, l_{j}) = 0\) if i = j and \(\delta (l_{i}, l_{j}) = 1\) otherwise. The discrete metric means that all the labels are equally separated; this is the underlying assumption that we will apply in our study. However, subtler metrics may be used for extracting more semantic information from the data (see Krippendorff et al. (2016)).
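Under the discrete metric, δ reduces to a two-line function; a subtler pseudo-metric would replace the constant 1 with a label-similarity score (this helper name is illustrative):

```python
def discrete_delta(li, lj):
    # Discrete metric: 0 for identical labels, 1 otherwise,
    # i.e. all distinct labels are equally separated.
    return 0 if li == lj else 1
```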

From these computations, we define the observed disagreement, Do, and the expected disagreement, De, as

$$ D_{o} = \sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} o_{i,j} \delta(l_{i}, l_{j}), D_{e} = \sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} e_{i,j}\delta(l_{i}, l_{j}). $$
(1)

These quantities measure the degree of disagreement that is observed from Ω and the degree of disagreement that might be expected by judging randomly, respectively.

Remark 6

In the case of taking δ as the discrete metric, we have another interpretation of the disagreement. Observe that, in this case, since \(\delta (l_{i}, l_{i}) = 0\), we can write the disagreements as

$$ D_{o} = \underset{i\neq j}{\sum} o_{i,j} = t - \sum\limits_{i=1}^{N} o_{i,i}, D_{e} = \underset{i\neq j}{\sum} e_{i,j} = t - \sum\limits_{i=1}^{N} e_{i,i}. $$

The quantity \(A_{o} = {\sum }_{i=1}^{N} o_{i,i}\) (resp. \(A_{e} = {\sum }_{i=1}^{N} e_{i,i}\)) can be understood as the observed (resp. expected) agreement between the judges. In the same vein, \(t = {\sum }_{i,j=1}^{N} o_{i,j} = {\sum }_{i,j=1}^{N} e_{i,j}\) may be seen as the maximum achievable agreement. Hence, in this context, the disagreement Do (resp. De) is indeed the difference between the maximum possible agreement and the observed (resp. expected) agreement.

From these data, Krippendorff’s α coefficient is defined as

$$ \alpha = \alpha({\varOmega}) = 1 - \frac{D_{o}}{D_{e}}. $$

From this formula, we observe the following limiting values:

  • α = 1 is equivalent to Do = 0 or, in other words, it means that there exists perfect agreement in the judgements among the judges.

  • α = 0 is equivalent to Do = De, which means that the agreement observed between the judgements is entirely due to chance.

In this way, Krippendorff’s α can be interpreted as a measure of the degree of agreement that is achieved beyond chance. The larger α is, the greater the observed agreement.
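Putting the pieces together, a minimal end-to-end sketch of the coefficient under the discrete metric (using the simplified forms of Do and De from Remark 6) could look as follows; the function name and data are illustrative only:

```python
def krippendorff_alpha(responses, labels):
    """Krippendorff's alpha with the discrete metric, following the
    pair-counting construction described in the text."""
    N = len(labels)
    idx = {lab: k for k, lab in enumerate(labels)}
    # Observed coincidences: pairs of responses by distinct judges
    # to the same item.
    o = [[0.0] * N for _ in range(N)]
    for beta in range(len(responses[0])):
        for a1 in range(len(responses)):
            for a2 in range(len(responses)):
                if a1 == a2:
                    continue
                for li in responses[a1][beta]:
                    for lj in responses[a2][beta]:
                        o[idx[li]][idx[lj]] += 1.0
    t_k = [sum(row) for row in o]
    t = sum(t_k)
    # Discrete metric: D_o = t - A_o and D_e = t - A_e (Remark 6),
    # where A_o and A_e are the traces of M_o and M_e.
    A_o = sum(o[i][i] for i in range(N))
    A_e = sum(tk * (tk - 1) / (t - 1) for tk in t_k)
    return 1.0 - (t - A_o) / (t - A_e)

# Two judges agreeing on every item: D_o = 0, so alpha = 1.
perfect = [[{"a"}, {"b"}, {"a"}], [{"a"}, {"b"}, {"a"}]]
alpha_perfect = krippendorff_alpha(perfect, ["a", "b"])
```

Swapping one judge's labels on every item instead yields a negative α, the systematic "agreement to disagree" discussed in Remark 7 below.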

Remark 7

Observe that α < 0 may only be achieved if Do > De, which means that there is even more disagreement than could be expected by chance. This implies that the judges are consistently issuing different judgements for the same items, which evidences an agreement between the judges to disagree, that is, to fake the evaluations. On the other hand, as long as the metric δ is non-negative, Do ≥ 0 and, thus, α ≤ 1.


Cite this article

Díaz, J., López-Fernández, D., Pérez, J. et al. Why are many businesses instilling a DevOps culture into their organization?. Empir Software Eng 26, 25 (2021). https://doi.org/10.1007/s10664-020-09919-3
