Cross-status communication and project outcomes in OSS development

A language style matching perspective

Empirical Software Engineering

Abstract

Context

The success of an open source software (OSS) project requires effective communication among its members. Because OSS projects often have established social status systems, such communication frequently happens between individuals of different statuses, particularly between elite developers with project management privileges and ordinary project contributors. They communicate with each other during many essential activities, such as bug fixing and code review, and this communication can profoundly influence project outcomes.

Objectives

We seek to understand cross-status communication from the perspective of language style matching among developers of different statuses, and its relationships with an OSS project’s outcomes in terms of productivity and quality.

Method

We approach these research objectives with the language style matching (LSM) tool, which measures the similarity of cross-status communication across multiple language style features. We first dynamically identify the elite developers who hold project administration privileges in each sampled project. We then capture the cross-status communication between elite and non-elite developers and calculate the LSM features of these two groups. Finally, the LSM variables, together with project outcomes, are used to fit regression models that analyze potential relationships between language matching in cross-status communication and project outcomes.
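To make the LSM computation concrete, the sketch below uses the formula commonly reported in the LSM literature, \(1 - |p_1 - p_2| / (p_1 + p_2 + 0.0001)\), where each \(p\) is the percentage of a group's words that fall in one word category. This abstract does not state the exact formula or categories used in the study, so the function names, the category names, and the sample percentages here are all illustrative assumptions.

```python
def lsm_score(p_elite: float, p_non_elite: float, eps: float = 0.0001) -> float:
    """LSM for one word category.

    Inputs are the percentages of words in that category used by each group.
    The small eps avoids division by zero when neither group uses the category.
    """
    return 1.0 - abs(p_elite - p_non_elite) / (p_elite + p_non_elite + eps)


def lsm_profile(elite: dict, non_elite: dict) -> dict:
    """Per-category LSM scores for the categories present in both inputs."""
    shared = elite.keys() & non_elite.keys()
    return {cat: lsm_score(elite[cat], non_elite[cat]) for cat in shared}


# Hypothetical usage percentages for one conversation (not data from the study).
elite_usage = {"pronoun": 10.0, "article": 6.0, "preposition": 12.0}
non_elite_usage = {"pronoun": 10.0, "article": 4.0, "preposition": 12.0}

profile = lsm_profile(elite_usage, non_elite_usage)
for category in sorted(profile):
    print(f"{category}: {profile[category]:.3f}")
```

Scores close to 1 mean the two groups use a category at nearly the same rate; scores close to 0 mean one group uses it far more than the other.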

Results

Using over 275,000 collected conversations, our analyses yield rich insights into cross-status communication in open source development. First, our results reveal that elite and non-elite developers exhibit quite similar linguistic patterns in using certain categories of words. Second, we explore the relationships between linguistic similarity in cross-status communication and project outcomes. The regression results are generally negative: with a few exceptions, they indicate very few significant relationships between language matching in cross-status communication and project outcomes.

Limitations

The study has several limitations. First, it considers only projects hosted on GitHub. Second, to ensure data availability, our sample is drawn from top projects and thus does not represent all projects. Third, we consider only a limited number of linguistic features and project outcome indicators.

Registered Report

This study is developed from the registered report available at: https://arxiv.org/abs/2104.05538. This registered report was accepted at the MSR 2021 Registered Reports Track.

Data Availability

The datasets that support the findings of this study are available in the GitHub repository: https://github.com/HAAAAAN/cross-status-lsm.

Notes

  1. We originally sampled 200 projects; however, 34 were removed for not having enough cross-status conversations. For validity considerations, we did not substitute the removed projects through another round of random sampling, and we use the data of the remaining 166 projects in the data analyses (see Section 4.1).

  2. https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories/repository-roles-for-an-organization

  3. A manual procedure is performed to distinguish software development projects from other projects, e.g., tutorial collections.

  4. Access via Google BigQuery: https://console.cloud.google.com/bigquery?project=githubarchive&page=project&pli=1.

  5. The current version of the GitHub API only allows access to the most recent 300 events or to events from the past 90 days, whichever limit is reached first.

  6. This category includes the following languages:

  7. Note that the control variables “Programming Language” and “Domain” are represented by a set of dummy variables because they are categorical variables.

  8. The Cohen’s \(f^2\) values are calculated and interpreted according to Steiger (2004). Note that Cohen’s \(f^2\) is relative to the control model (\(Model_0\)), rather than the null model. Thus, we take \(0.1^2 = 0.01\), \(0.25^2 \approx 0.06\), and \(0.4^2 = 0.16\) as the thresholds for small, medium, and large effects, respectively.
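As an illustration of note 8, the sketch below computes Cohen's \(f^2\) for a model relative to a control model using the standard hierarchical-regression form \(f^2 = (R^2_{full} - R^2_{control}) / (1 - R^2_{full})\), then labels it with the thresholds quoted in the note. The page does not spell out this formula, so treating it as the one used is an assumption; the function names and the \(R^2\) values are hypothetical.

```python
def cohens_f2(r2_control: float, r2_full: float) -> float:
    """Effect size of the predictors added on top of the control model."""
    if not 0.0 <= r2_control <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_control <= r2_full < 1")
    return (r2_full - r2_control) / (1.0 - r2_full)


def interpret_f2(f2: float) -> str:
    """Label an f^2 value with the thresholds quoted in note 8."""
    if f2 >= 0.16:
        return "large"
    if f2 >= 0.06:
        return "medium"
    if f2 >= 0.01:
        return "small"
    return "negligible"


# Hypothetical R^2 values: control model only vs. control model plus LSM variables.
f2 = cohens_f2(r2_control=0.30, r2_full=0.32)
print(f"f^2 = {f2:.4f} ({interpret_f2(f2)})")
```

Because the baseline is the control model, a modest gain in \(R^2\) can still register as a small effect even when the full model's overall fit is moderate.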

References

  • Aberdour M (2007) Achieving quality in open-source software. IEEE Softw 24(1):58–64

  • Al Omran FNA, Treude C (2017) Choosing an nlp library for analyzing software documentation: a systematic literature review and a series of experiments. In: Proc. MSR’17, IEEE, pp 187–197

  • Alrashedy K, Dharmaretnam D, German DM, Srinivasan V, Gulliver TA (2020) Scc++: Predicting the programming language of questions and snippets of stack overflow. J Syst Softw 162(110505):1–11

  • Babcock MJ, Ta VP, Ickes W (2014) Latent semantic similarity and language style matching in initial dyadic interactions. J Lang Soc Psychol 33(1):78–88

  • Bacharach SB, Bamberger P, Mundell B (1993) Status inconsistency in organizations: From social hierarchy to stress. J Organ Behav 14(1):21–36

  • Barker RT, Gower K (2010) Strategic application of storytelling in organizations: Toward effective communication in a diverse world. J Bus Commun 47(3):295–312

  • Barua A, Thomas SW, Hassan AE (2014) What are developers talking about? an analysis of topics and trends in stack overflow. Empir Softw Eng 19(3):619–654

  • Bayram AB, Ta VP (2019) Diplomatic chameleons: Language style matching and agreement in international diplomatic negotiations. Negot Conflict Manag Res 12(1):23–40

  • Bettenburg N, Hassan AE (2010) Studying the impact of social structures on software quality. In: Proc. ICPC’10, pp 124–133

  • Bhatt P, Ahmad AJ, Roomi MA (2016) Social innovation with open source software: User engagement and development challenges in india. Technovation 52:28–39

  • Bianchi AJ, Kang SM, Stewart D (2012) The organizational selection of status characteristics: Status evaluations in an open source community. Organ Sci 23(2):341–354

  • Bird C, Pattison D, D’Souza R, Filkov V, Devanbu P (2008) Latent social structure in open source projects. In: Proc. FSE’08, pp 24–35

  • Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009) The promises and perils of mining git. In: Proceedings of the 6th IEEE International Working Conference on Mining Software Repositories, IEEE, MSR’09, pp 1–10

  • boyd d, Crawford K (2012) Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Inf Commun Soc 15:662–679

  • Bunderson JS, Reagans RE (2011) Power, status, and learning in organizations. Organ Sci 22(5):1182–1194

  • Calefato F, Lanubile F, Maiorano F, Novielli N (2018) Sentiment polarity detection for software development. Empir Softw Eng 23(3):1352–1382

  • Calefato F, Lanubile F, Vasilescu B (2019) A large-scale, in-depth analysis of developers’ personalities in the apache ecosystem. Inf Softw Technol 114:1–20

  • Cannava K, Bodie GD (2017) Language use and style matching in supportive conversations between strangers and friends. J Soc Pers Relatsh 34(4)

  • Chan D (2006) Interactive effects of situational judgment effectiveness and proactive personality on work perceptions and work outcomes. J Appl Psychol 91(2):475–481

  • Chartrand TL, Bargh JA (1999) The chameleon effect: The perception-behavior link and social interaction. J Pers Soc Psychol 76(6):893–910

  • Chen CY, Hsu PY, Vu HN (2022) Collaborative process tailoring in evolutionary software development: a teamwork-quality perspective. Softw Qual J 1–31

  • Chung C, Pennebaker JW (2007) The psychological functions of function words. In: Fiedler K (ed) Social Communication. Psychology Press, pp 343–359

  • Conway ME (1968) How do committees invent. Datamation 14(4):28–31

  • Cooper N, Bernal-Cárdenas C, Chaparro O, Moran K, Poshyvanyk D (2021) It takes two to tango: Combining visual and textual information for detecting duplicate video-based bug reports. In: Proc. ICSE’21, IEEE, pp 957–969

  • Cosentino V, Izquierdo JLC, Cabot J (2016) Findings from github: methods, datasets and limitations. In: Proc. MSR’16, IEEE, pp 137–141

  • Cowls J, Schroeder R (2015) Causation, correlation, and big data in social science research. Policy Internet 7:447–472. https://doi.org/10.1002/poi3.100

  • Crowston K, Wei K, Li Q, Howison J (2006) Core and periphery in free/libre and open source software team communications. In: Proc. HICSS ’06, pp 118:1–10

  • Crowston K, Wei K, Howison J, Wiggins A (2008) Free/libre open-source software development: What we know and what we do not know. ACM Comput Surv (CSUR) 44(2):1–35

  • Danescu-Niculescu-Mizil C, Gamon M, Dumais S (2011) Mark my words! linguistic style accommodation in social media. In: Proc. WWW’11, pp 745–754

  • Ducheneaut N (2005) Socialization in an open source software community: A socio-technical analysis. Comput Supported Coop Work 14(4):323–368

  • Einav L, Levin J (2014) Economics in the age of big data. Science 346:1243089. https://doi.org/10.1126/science.1243089

  • El Mezouar M, Zhang F, Zou Y (2019) An empirical study on the teams structures in social coding using github projects. Empir Softw Eng 24(6):3790–3823

  • Fielding RT (1999) Shared leadership in the apache project. Commun ACM 42(4):42–43

  • Foucault M, Palyart M, Blanc X, Murphy GC, Falleri JR (2015) Impact of developer turnover on quality in open-source software. In: Proc. ESEC/FSE’15, pp 829–841

  • Germonprez M, Kendall JE, Kendall KE, Mathiassen L, Young B, Warner B (2017) A theory of responsive design: A field study of corporate engagement with open source communities. Inf Syst Res 28(1):64–83

  • Gonzales AL, Hancock JT, Pennebaker JW (2010) Language style matching as a predictor of social dynamics in small groups. Commun Res 37(1):3–19

  • Han Y (2020) Understanding developers’ linguistic behaviors in hierarchical open source communities. In: Proc. ECSCW’20, European Society for Socially Embedded Technologies (EUSSET), pp 1–5

  • He J, Xu L, Yan M, Xia X, Lei Y (2020) Duplicate bug report detection using dual-channel convolutional neural networks. In: Proc. ICPC’20, pp 117–127

  • Hindle A, Godfrey MW, Holt RC (2009) What’s hot and what’s not: Windowed developer topic analysis. In: Proc. ICSM’09, IEEE, pp 339–348

  • Hou Y, Wang D (2017) Hacking with npos: collaborative analytics and broker roles in civic data hackathons. Proc ACM Hum-Comput Interact 1(CSCW):1–16

  • Idri A, Abran A, Khoshgoftaar TM (2002) Estimating software project effort by analogy based on linguistic values. In: Proc. METRICS’02, IEEE, pp 21–30

  • Imtiaz N, Middleton J, Girouard P, Murphy-Hill E (2018) Sentiment and politeness analysis tools on developer discussions are unreliable, but so are people. In: Proc. SEmotion’18, IEEE, pp 55–61

  • Ireland ME, Henderson MD (2014) Language style matching, engagement, and impasse in negotiations. Negot Conflict Manag Res 7(1):1–16

  • Ireland ME, Slatcher RB, Eastwick PW, Scissors LE, Finkel EJ, Pennebaker JW (2011) Language style matching predicts relationship initiation and stability. Psychol Sci 22(1):39–44

  • Joblin M, Apel S, Hunsen C, Mauerer W (2017) Classifying developers into core and peripheral: An empirical study on count and network metrics. In: Proc. ICSE’17, pp 164–174

  • Jongeling R, Datta S, Serebrenik A (2015) Choosing your weapons: On sentiment analysis tools for software engineering research. In: Koschke R, Krinke J, Robillard MP (eds) 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015, Bremen, Germany, September 29 - October 1, 2015, IEEE Computer Society, pp 531–535. https://doi.org/10.1109/ICSM.2015.7332508

  • Kacewicz E, Pennebaker JW, Davis M, Jeon M, Graesser AC (2014) Pronoun use reflects standings in social hierarchies. J Lang Soc Psychol 33(2):125–143

  • Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining github. In: Proc. MSR’14, ACM, New York, pp 92–101. https://doi.org/10.1145/2597073.2597074

  • Kaur R, Chahal KK, Saini M (2022) Analysis of factors influencing developers’ sentiments in commit logs: Insights from applying sentiment analysis. Inform Softw Eng J 16(1). https://doi.org/10.37190/e-inf220102

  • Kavaler D, Sirovica S, Hellendoorn V, Aranovich R, Filkov V (2017a) Perceived language complexity in github issue discussions and their effect on issue resolution. In: Proc. ASE’17, pp 72–83

  • Kavaler D, Sirovica S, Hellendoorn V, Aranovich R, Filkov V (2017b) Perceived language complexity in github issue discussions and their effect on issue resolution. In: IEEE/ACM International Conference on Automated Software Engineering

  • Kim S, Whitehead EJ (2006) How long did it take to fix bugs? In: Proc. MSR’06, pp 173–174

  • Ko AJ, Myers BA, Chau DH (2006) A linguistic analysis of how people describe software problems. In: Proc. VL/HCC’06, IEEE, pp 127–134

  • Kovacs B, Kleinbaum AM (2020) Language-style similarity and social networks. Psychol Sci 31(2):202–213

  • Levendel Y (1990) Reliability analysis of large software systems: Defect data modeling. IEEE Trans Softw Eng 16(2):141–152

  • Levesque LL, Wilson JM, Wholey DR (2001) Cognitive divergence and shared mental models in software development project teams. J Organ Behav Int J Ind Occup Organ Psychol Behav 22(2):135–144

  • Levina N, Arriaga M (2014) Distinction and status production on user-generated content platforms: Using bourdieu’s theory of cultural production to understand social dynamics in online fields. Inf Syst Res 25(3):468–488

  • Liao J, Yang G, Kavaler D, Filkov V, Devanbu P (2019) Status, identity, and language: A study of issue discussions in github. PLoS ONE 14(6):e0215059

  • Lin B, Robles G, Serebrenik A (2017) Developer turnover in global, industrial open source projects: Insights from applying survival analysis. In: Proc. ICGSE’17, pp 66–75

  • Lin B, Zampetti F, Bavota G, Di Penta M, Lanza M, Oliveto R (2018) Sentiment analysis for software engineering: How far can we go? In: Proc. ICSE’18, pp 94–104

  • Lord SP, Sheng E, Imel ZE, Baer J, Atkins DC (2015) More than reflections: Empathy in motivational interviewing includes language style synchrony between therapist and client. Behav Ther 46(3):296–303

  • Mair P, Hofmann E, Gruber K, Hatzinger R, Zeileis A, Hornik K (2015) Motivation, values, and work design as drivers of participation in the r open source project for statistical computing. Proc Natl Acad Sci 112(48):14788–14792

  • Mangalaraj G, Nerur S, Mahapatra R, Price KH (2014) Distributed cognition in software design: An experimental investigation of the role of design patterns and collaboration. MIS Q 38(1):249–274

  • Markowitz DM (2018) Academy awards speeches reflect social status, cinematic roles, and winning expectations. J Lang Soc Psychol 37(3):376–387

  • Marlow J, Dabbish L, Herbsleb J (2013) Impression formation in online peer production: activity traces and personal profiles in github. In: Proc. CSCW’13, pp 117–128

  • Mockus A, Herbsleb J (2002) Expertise browser: a quantitative approach to identifying expertise. In: Proceedings of the 24th International Conference on Software Engineering (ICSE 2002), pp 503–512

  • Mustansir A, Shahzad K, Malik MK (2022) Towards automatic business process redesign: an nlp based approach to extract redesign suggestions. Autom Softw Eng 29(1):1–24

  • Niederhoffer KG, Pennebaker JW (2002) Linguistic style matching in social interaction. J Lang Soc Psychol 21(4):337–360

  • Nisbett RE, Peng K, Choi I, Norenzayan A (2001) Culture and systems of thought: holistic versus analytic cognition. Psychol Rev 108(2):291–310

  • Pan K, Kim S, Whitehead EJ (2009) Toward an understanding of bug fix patterns. Empir Softw Eng 14(3):286–315

  • Pennebaker JW, Francis ME, Booth RJ (2001) Linguistic inquiry and word count: Liwc 2001. Lawrence Erlbaum Associates, Mahwah

  • Pennebaker JW, Chung CK, Frazee J, Lavergne GM, Beaver DI (2014) When small words foretell academic success: The case of college admissions essays. PLoS ONE 9(12):e115844

  • Piazza A, Castellucci F (2014) Status in organization and management theory. J Manag 40(1):287–315

  • Rains SA (2016) Language style matching as a predictor of perceived social support in computer-mediated interaction among individuals coping with illness. Commun Res 43(5):694–712

  • Richardson BH, Taylor PJ, Snook B, Conchie SM, Bennell C (2014) Language style matching and police interrogation outcomes. Law Hum Behav 38(4):357

  • Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proc. ICSE’07, IEEE, pp 499–510

  • Savage M, Burrows R (2007) The coming crisis of empirical sociology. Sociology 41(5):885–899. https://doi.org/10.1177/0038038507080443

  • Sawyer S, Farber J, Spillers R (1997) Supporting the social processes of software development. Inf Technol People

  • Scacchi W (2004) Free and open source development practices in the game community. IEEE Softw 21(1):59–66

  • Sedgwick P (2014) Unit of observation versus unit of analysis. BMJ 348:g3840

  • Shi W, Zhang Y, Hoskisson RE (2019) Examination of ceo-cfo social interaction through language style matching: Outcomes for the cfo and the organization. Acad Manag J 62(2):383–414

  • Silva CC, Galster M, Gilson F (2021) Topic modeling in software engineering research. Empir Softw Eng 26(6):1–62

  • Steel DG (1996) Making unit-level inferences from aggregated data. Surv Methodol 22

  • Steiger JH (2004) Beyond the f test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychol Methods 9(2):164–182

  • Steinmacher I, Conte T, Gerosa MA, Redmiles D (2015) Social barriers faced by newcomers placing their first contribution in open source software projects. In: Proc. CSCW’15, pp 1379–1392

  • Stewart D (2005) Social status in an open-source community. Am Sociol Rev 70(5):823–842

  • Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: Liwc and computerized text analysis methods. J Lang Soc Psychol 29(1):24–54

  • Trainer EH, Kalyanasundaram A, Chaihirunkarn C, Herbsleb JD (2016) How to hackathon: Socio-technical tradeoffs in brief, intensive collocation. In: Proc. CSCW’16, pp 1118–1130

  • Tsay J, Dabbish L, Herbsleb J (2014) Influence of social and technical factors for evaluating contribution in github. In: Proc. ICSE’14, pp 356–366

  • Vale G, Schmid A, Santos AR, De Almeida ES, Apel S (2020) On the relation between github communication activity and merge conflicts. Empir Softw Eng 25(1):402–433

  • Vasilescu B, Yu Y, Wang H, Devanbu P, Filkov V (2015) Quality and productivity outcomes relating to continuous integration in github. In: Proc. ESEC/FSE’15, pp 805–816

  • Von Krogh G, Spaeth S, Lakhani KR (2003) Community, joining, and specialization in open source software innovation: a case study. Res Policy 32(7):1217–1241

  • Wang Y (2019) Emotions extracted from text vs. true emotions–an empirical evaluation in se context. In: Proc. ASE’19, IEEE, pp 230–242

  • Wang Y (2020) The price of being polite: politeness, social status, and their joint impacts on community Q&A efficiency. J Comput Soc Sci 1–22

  • Wang Z, Feng Y, Wang Y, Jones JA, Redmiles D (2020) Unveiling elite developers’ activities in open source projects. ACM Trans Softw Eng Methodol (TOSEM) 29(3):1–35

  • Wolf T, Schroter A, Damian D, Nguyen T (2009) Predicting build failures using social network analysis on developer communication. In: Proc. ICSE’09, IEEE, pp 1–11

  • Wu Y, Wang S, Bezemer C, Inoue K (2019) How do developers utilize source code from stack overflow? Empir Softw Eng 24(2):637–673. https://doi.org/10.1007/s10664-018-9634-5

  • Xuan Q, Gharehyazie M, Devanbu PT, Filkov V (2012) Measuring the effect of social communications on individual working rhythms: A case study of open source software. In: Proc. Socialinfo’12, IEEE, pp 78–85

  • Xuan Q, Devanbu P, Filkov V (2016) Converging work-talk patterns in online task-oriented communities. PLoS ONE 11(5):e0154324

  • Zhang Y, Wang H, Yin G, Wang T, Yu Y (2015) Exploring the use of @-mention to assist software development in github. In: Proceedings of the 7th Asia-pacific symposium on internetware, pp 83–92

Acknowledgements

The authors would like to thank the anonymous reviewers in both the registered report phase and the paper submission phase. This work is partially supported by the National Natural Science Foundation of China under grant 62172049, and by the Fundamental Research Funds for the Central Universities, Beijing University of Posts and Telecommunications.

Author information

Corresponding author

Correspondence to Yi Wang.

Ethics declarations

Conflicts of interest

All authors listed above declare that they have no conflicts of interest.

Additional information

Communicated by David Lo, Tegawendé F. Bissyandé.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, Y., Wang, Z., Feng, Y. et al. Cross-status communication and project outcomes in OSS development. Empir Software Eng 28, 78 (2023). https://doi.org/10.1007/s10664-023-10298-8

  • DOI: https://doi.org/10.1007/s10664-023-10298-8
