Introduction

Open source developers have become central actors in the political economy of artificial intelligence (AI). The rise of open source AI, specifically the emergent practice of releasing, fine-tuning, and openly developing pre-trained models that are freely available,Footnote 1 has extended open science practices crucial to AI advances [3, 4], including the development of open source software (OSS)Footnote 2 and the provision of open access to research [5] and datasets [6,7,8]. Open source AI has attracted attention as a potential challenger to the dominance of a few well-funded startups and Big Tech companies in AI research and development (R&D) [9, 10]. Grassroots initiatives like EleutherAI [11], BigScience [12], and BigCode [13] have shown the feasibility of open model development [14], while the Hugging Face (HF) Hub has emerged as a popular platform used by millions to host, download, and collaborate on a growing number of models, datasets, and spaces (i.e., web applications to demonstrate and try out models) [15].

While the benefits and risks of open source AI have been widely debated [16,17,18,19,20], the practices and processes involved in open model development have received relatively little attention. To date, only a handful of scholars have explored various aspects of open model development, including user contributions to grassroots initiatives [12, 14], commercial participation in model development [14, 21], model maintenance practices [22], and the processes and tools used by open data engineering communities [23].

We contribute to this nascent research agenda with a three-part quantitative analysis of development activity on the HF Hub. First, we investigate typical patterns of various types of activity on the HF Hub in 348,181 model, 65,761 dataset, and 156,642 space repositories (RQ1). Subsequently, we apply social network analysis (SNA) of code contributions to model repositories to investigate the social network structure of the developer community as well as collaboration practices amongst developers (RQ2). We replicate this analysis for models in the sub-fields of natural language processing (NLP), computer vision (CV), and multimodal (MM) for comparative analysis. Finally, we quantify model adoption through the lens of model usage in spaces on the HF Hub (RQ3), providing insights into the widespread use of a minority of models in the HF Hub developer community and the key actors driving their development.

Overall, our analysis reveals that various aspects of development activity on the HF Hub—e.g., interactions in model, dataset, and space repositories; collaboration in model repositories; and model adoption in spaces—exhibit right-skewed, Pareto distributions, which is a well-documented pattern in OSS development [24,25,26,27,28]. While the open model development life-cycle involves unique practices which differ from OSS development [22], such as model training and fine-tuning, the observed similarities in the overall patterns of activity suggest that future research on open source AI can benefit from drawing on the extensive, multidisciplinary literature on the social dynamics of OSS development. Based on our findings, we propose a number of recommendations for researchers, policymakers, and platform providers to facilitate research and evidence-based discussions on open source AI.

The paper has the following structure. First, the literature review provides an overview of prior work on open source AI, as well as prior work on OSS development in order to draw comparisons between OSS and open model development practices. Second, we present the RQs and research design. Third, we introduce the main findings from the three-part analysis. Fourth, we discuss the contributions of the findings and make recommendations for research and practice. We conclude with a discussion of what further clarification of the practices in open model development can offer for (open source) AI researchers, developers, policymakers, and platform providers.

Related work

“We have no moat”: The emergence of open models

Open science practices, from the development of open source software (OSS) to the provision of open access to research (e.g., via arXiv [5]) and datasets (e.g., via Kaggle [6], ImageNet [8], or Common Crawl [7]), have been integral to advances in AI R&D and adoption [4, 29]. The culture and norms of openness in AI have evolved significantly in the last 15 years [30]. For example, in 2007, a coalition of 16 researchers lamented the lack of OSS that standardised the implementation of ML algorithms, highlighting this as a major obstacle to advances and reproducibility in AI research [31]. Yet today AI R&D is simply unimaginable without OSS [3, 32], drawing on a growing commons of over 300 OSS libraries [33], hundreds of thousands of open models [34], and over a million OSS repositories [35].

Following years of debate about the safety of openly releasing AI models [17, 18, 36, 37], recent years have seen the emergence and proliferation of “open” models, which individuals and organisations have shared on an open access basis on platforms such as the HF Hub [4]. Prior to this, AI models, in particular large language models (LLMs), were principally developed and maintained behind closed doors, albeit with open science practices, such as the sharing of publications on arXiv and code on platforms like GitHub. The start of this trend is attributed to EleutherAI, a grassroots research group, which formed on a Discord server with the intention to develop and release an open source variant of OpenAI’s GPT, resulting in The Pile, a library of datasets for training LLMs, in December 2020 [38], and GPT-Neo in March 2021 [39]. Subsequently, open models gained more visibility with the release of other state-of-the-art AI models [10], including BLOOM by the BigScience workshop in July 2022 [40], Stable Diffusion by Stability AI in August 2022 [41], and LLaMA 2 by Meta in July 2023 [42], amongst others.

The proliferation of open models, especially foundation models, has ignited heated debate about their potential benefits and risks [16,17,18,19,20, 43]. On the one hand, open models are said to promise benefits for research, innovation, and competition by lowering entry barriers and widening access to state-of-the-art AI [44]. Drawing on Linus’ Law from OSS development that “given enough eyeballs, all bugs are shallow” [45], proponents argue that open model development and auditing offer safety advantages [46]. In addition, open access to models lowers the barriers for adaptability and customisation for diverse language contexts [18, 47]. On the other hand, open models can pose risks of harm by both well-intended and malicious actors, including the creation of deepfakes [48,49,50], disinformation [51, 52], and malware [53, 54]. A study by 25 experts concluded that open models have five distinctive properties that present both benefits and risks: broader access, greater customisability, local adaptation and inference ability, the inability to rescind model access, and the inability to monitor or moderate model usage [18].

The development of open models has been described as a potential challenge to the dominance of Big Tech companies in AI R&D [9, 55]. This was underlined by a leaked Google memo that claimed there is “no moat around closed-source AI development” and “open source solutions will out-compete companies like Google or OpenAI” [56]. Venture capitalists have bullishly invested in open source AI startups [57, 58], and world leaders like President Macron of France have pledged public funds to support open source AI [59]. In addition, the Mozilla Foundation has launched mozilla.ai with $30 million in investment to build a trustworthy, independent, and open source AI ecosystem “outside of Big Tech and academia” [60]. While proponents champion open models as good news for innovation and competition, others temper this optimism by pointing to market concentrations at several layers of the AI stack, from chips to cloud compute infrastructure, which remain unchallenged by innovations stemming from open source AI communities [21, 61, 62].

A myriad of meanings are attached to “open models” and “open source AI”. Oftentimes these terms are understood as making pre-trained models, parameters (or “weights”), and documentation available on platforms like the HF Hub. In some cases, they refer to open collaboration on the development of models [14]. The description of open models as “open source” has been fiercely contested for failing to meet OSS standards as defined by the OSI [2, 4, 63, 64]. For example, when Meta imposed limits on use of LLaMA 2, Stefano Maffulli from the OSI commented, “Unfortunately, the tech giant has created the misunderstanding that LLaMA 2 is ‘open source’—it is not. Meta is confusing ’open source’ with ‘resources available to some users under some conditions,’ [which are] two very different things” [63].

Companies have been criticised for “open-washing” by promoting their models as “open source” models, when they are typically “open weight” models at most, as a commercial strategy to present themselves as patrons of the digital commons, whilst disguising their intent to set open standards and benefit from crowdsourced innovation [21, 62, 65, 66]. A review of the openness of LLMs found that, “[W]hile there is a fast-growing list of projects billing themselves as ’open source’, many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human annotation labour is involved), and careful scientific documentation is exceedingly rare” [66].

It remains an open question whether one can or should classify AI models as either open or closed-source. Through a global, multi-stakeholder approach, the OSI is currently developing a definition of open source AI as AI systems that are made available under terms that grant the freedoms to use, study, modify, and share the system [1, 67]. Countering binary approaches, Irene Solaiman [17] makes the case that AI systems are not either fully open or fully closed; rather, the openness of AI systems can be plotted along a gradient with six degrees of openness. Each grade of openness involves trade-offs between concentrating power and mitigating risks [17]. As the field rapidly evolves, developing responsible practices, norms, and regulation around open source AI remains a critical challenge [43, 44].

A nascent research agenda on open source AI

While the benefits and risks of open models have been widely discussed, we still have a limited understanding of the collaborative practices involved. In this section, we review prior work on open model development and motivate our empirical analysis of development activity on the HF Hub to address this research gap.

The HF Hub has emerged as a popular platform used by individuals and organisations to share, download, and collaborate on models, datasets, and spaces [68, 69]. The HF Hub is a “model marketplace,” which is “a new form of user-generated content platform, where users can upload AI systems and AI-related datasets, which in turn can be downloaded, and depending on the business model, queried, tweaked, or built upon by other users” [70]. Much of the activity amongst the emerging developer community on this platform concerns individuals fine-tuning pre-trained models that were released by industry leaders for downstream use in research and applications [21]. In addition to the hosting and fine-tuning of open models, a few grassroots initiatives have embraced open collaboration methods to develop open models. For example, the development of BLOOM, a 176B parameter multilingual LLM, and its training dataset, ROOTS, was the largest “open source” AI collaboration to date, involving over 1,000 volunteers from over 70 countries and over 250 institutions [12]. Such initiatives have demonstrated alternative pathways for AI model development beyond the handful of companies that dominate AI R&D [9, 14]. Prior work has also highlighted the leadership role of companies, such as Hugging Face, in organising “values-driven initiative[s]”, such as the BigScience workshop, and attracting contributors who have diverse motivations, from developing new skills and working on new problems to publishing research and giving back to the ecosystem [12, 14].

Due to the growing popularity of the HF Hub, scholars have examined the suitability of the HF Hub for empirical research on open model development [69, 71].Footnote 3 Castaño et al. [22] provide the most comprehensive empirical insights into maintenance practices in model repositories on the HF Hub.Footnote 4 They find that commit activity follows a right-skewed distribution, with a few models receiving extensive activity while the majority of repositories receive limited activity [22]. While the majority of models are developed by single developers (1.18 mean, 1.0 median), some model repositories, such as bigscience/bloom or bigcode/santacoder, are co-developed and co-maintained by up to 20 developers [22]. They also find that developers tend to prioritise “perfective tasks” to enhance model performance and align with technological advances, unlike OSS maintenance that focuses on bug fixes and feature additions [22]. The authors contend this “reveals the need for methods and tools specifically designed for the unique demands of ML model maintenance. Such tools may include advanced version control systems optimized for data and model tracking, as well as automated monitoring tools capable of detecting model drift or degradation” [22]. Prior work has also examined carbon emission reporting in model repositories, finding stagnation in emissions reporting by developers and highlighting the need for improved reporting practices and carbon-efficient model development on the HF Hub [72].

Our research builds on this prior work. As one of the first studies to investigate open model development practices, we draw on prior work on OSS development in the next section in order to compare our findings with prior research and to lay the groundwork for a more comprehensive understanding of open model development in the future.

Learning from prior work on OSS development

Prior work on OSS development provides an empirical foundation for investigating the social dynamics of open model development. In the early 2000s, a number of metaphors were used to describe the social structure of “the OSS community”. For example, the Linux developer community was described as a “bazaar” that vibrated with the activity of geeks, hackers, and hobbyists, who performed various tasks, from bug-spotting to writing code to “serving the hacker culture itself” [45]. However, prior work illustrates that OSS communities have diverse social structures [73, 74], from “caves” with singular developers [75] to “core-periphery” networks, akin to “layered onions” [76], with uneven activity distributions ranging from core contributors (e.g., project initiators) to users (e.g., bug-spotters) [77,78,79,80].

Numerous studies highlight that various types of activity in OSS development, such as discussions in mailing lists, bug-spotting in issue trackers, and commit activity, exhibit right-skewed, Pareto distributions [24, 26, 28]. Indeed, it is a well-documented observation that OSS development is typically characterised by the Pareto principle, commonly known as the 80/20 rule or the law of the vital few, which states that approximately 80% of effects come from 20% of causes [81]. These findings are congruent with a wide range of Internet phenomena, which similarly exhibit right-skewed distributions that follow power laws [82, 83]. However, there are exceptions to the rule; for example, a study of 2,496 projects on GitHub found that the Pareto principle does not always characterise development activity in OSS repositories, thus highlighting the need to be cautious about generalising the Pareto principle as an incontestable law of OSS development [84]. Furthermore, many activities, such as mentorship and hackathons, take place outside of the repository [32, 85,86,87] and are therefore invisible to quantitative scholars of OSS development practices.

The various social structures of OSS communities are shaped, amongst others, by the diverse incentives of individuals and companies that participate in OSS development [88,89,90]. Individual developers are typically motivated by factors such as personal values, altruism, enjoyment, reputation-building, and career benefits [91,92,93,94]. However, there are also major barriers to participation, including gender disparities [95, 96] and geographic inequalities [86, 87]. Activity tends to be concentrated in the Global North [97] and the English lingua franca is a barrier for many developers [87, 98]. Furthermore, the incentives of OSS developers vary by geography: while developers in the USA show a relatively strong interest in “geek culture”, developers in India and China tend to be motivated primarily by career benefits [99]. Thus, “researchers studying open source should be mindful of geographic variation in what motivates participation and what forms participation may take, particularly outside of the code repository” [86].

Meanwhile companies primarily participate in OSS development for strategic reasons, such as recruiting developers [100,101,102], reducing costs [101, 103, 104], influencing OSS projects [104, 105], promoting open standards [103, 106], and building a reputation as an OSS patron [32, 89, 107]. Commercial participation has mixed effects on the social structure of OSS communities. Typically, one company or a few companies emerge as dominant contributors in projects [28, 108]. The dominance of a company is negatively associated with the participation of volunteers, while it is positively associated with the productivity of contributors and the quality of issue reports [74, 109]. It is also common for companies, which may be market rivals, to collaborate in OSS ecosystems [108, 110,111,112,113], which has turned many OSS communities “from networks of individuals into networks of companies” [100].

Building on this prior work, this study aims to provide novel insights into the collaborative dynamics of the HF Hub. Specifically, we investigate typical patterns of development activity across model, dataset, and space repositories on the HF Hub (RQ1), the social network structure of its developer community (RQ2), as well as model adoption and the key actors driving the development of the most widely adopted models (RQ3). The research extends the literature by shedding light on the practices involved in open model development on this increasingly important platform. The findings contribute to a more comprehensive understanding of open model development and lay the groundwork for future research.

Study design

Research aims and research questions

This study extends the nascent research agenda on open model development with a quantitative analysis of development activity on the HF Hub. We adopted a quantitative approach to explore large-scale patterns and trends in development activity on the HF Hub, which is a suitable approach when one seeks to generate baseline insights on a new phenomenon [114]. In particular, we examine different aspects of development activity on the HF Hub via the following RQs:

  • RQ1: What are typical patterns of development activity across the HF Hub?

  • RQ2: What is the social network structure of the HF Hub developer community?

  • RQ3: What is the distribution of model adoption on the HF Hub, and who are the key actors driving the development of the most widely-adopted models?

These RQs examine different aspects of development activity on the HF Hub. RQ1 focuses on identifying common patterns across various types of activity, such as likes, discussions, commits, and downloads, in the repositories of models, datasets, and spaces. Concretely, this analysis expands prior work that focuses on commit activity in model repositories [22]. RQ2 concerns the social network structure of the developer community on the HF Hub. In particular, we analyse a snapshot of collaboration interactions in model repositories amongst around 100,000 developers, building on prior descriptions of collaboration on open models [12, 14] and maintenance practices [22]. Lastly, RQ3 empirically tests a prior observation of uneven model adoption and the influence of Big Tech companies [21] by examining the distribution of model use in spaces and identifying the developers of the most used models. In addition, we examine model co-usage patterns to provide insights into the interdependencies and ecosystems surrounding popular models.

The HF Hub: a new platform and source of research data

The HF Hub was launched in 2021 by Hugging Face, a startup whose mission is to “democratize AI” [68]. The HF Hub is a Git-based social coding platform, widely used by researchers, developers, and hobbyists to share, discover, discuss, and collaborate on open models [115], datasets [116], and spaces [117]. Spaces are interactive web applications that facilitate the creation of demonstrations and make models hosted on the platform more accessible to end-users. The platform provides a number of tools for open model development, such as version control for collaboration and tracking [115], and evaluation and benchmarking of model performance [118]. The HF Hub API allows programmatic access to platform resources as well as metadata from repositories hosted on the platform [15]. In light of its features and data availability, prior work underlines the platform’s suitability for empirical studies on open model development [22, 69]. Building on this prior work, this paper aims to advance the research community’s understanding of the development practices in open model development as well as methodological considerations regarding the HF Hub.

When using data from the HF Hub, it is important to consider the ethical implications and adhere to the platform’s terms of service. In the study, we only collected publicly available data through the official HF Hub API, respecting the privacy settings of users and repositories. For example, we did not attempt to access or include data from private repositories in the analysis. Additionally, we anonymised the collected data by focusing on aggregate measures and avoiding the disclosure of personally identifiable information in the findings. Ethical clearance for this study was obtained from the CUREC institutional review board at the University of Oxford.

Data collection

We collected data via the HF Hub’s API in October 2023 [15], using Python scripts that are available on GitHub [119]. For RQ1, we collected and processed metadata for a number of activities from the public repositories of 348,181 models, 65,761 datasets, and 156,642 spaces, using the list_models(), list_datasets(), and list_spaces() API endpoints. These included: likes (n_likes), downloads (n_downloads),Footnote 5 discussions (n_discussions), commits (n_commits), unique developers who have contributed commits (n_committers), unique developers who started discussions (n_disc_starters),Footnote 6 and the repository’s community size (n_community), calculated as the cardinality of the set union of n_disc_starters and n_committers. As per prior work [113, 120, 121], we removed bots and merged multiple developer identities before enumerating n_disc_starters, n_committers, and n_community. As a result, n_community is recorded as 0 if no user has made a commit or started a discussion in the repository, which ignores the creator of the repository. We acknowledge that alternatively such repositories could have the value 1.
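To illustrate this step, the following minimal sketch retrieves comparable per-repository metadata with the huggingface_hub Python client; it is an illustration of the collection procedure under stated assumptions (current client attribute names such as likes and downloads), not the exact scripts used in this study.

# Illustrative sketch; not the study's original scripts.
from huggingface_hub import HfApi

api = HfApi()

model_records = []
for model in api.list_models(full=True):           # analogous: api.list_datasets(), api.list_spaces()
    model_records.append({
        "repo_id": model.id,
        "n_likes": model.likes,                     # cumulative likes on the repository
        "n_downloads": model.downloads,             # downloads over the preceding 30 days
    })

# Discussions and commits require one further call per repository, e.g.
# api.get_repo_discussions(repo_id) and api.list_repo_commits(repo_id), from which
# n_discussions, n_commits, n_committers, and n_disc_starters can be derived.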

For RQ2, we operationalised collaboration on models as instances where a pair of developers contributed commits to the same model repository, with directed edges recorded between developers that were weighted by the number of times a developer contributed a commit to the same repository as the other developer [122]. We operationalised commit activity as acts of collaboration because commits are easily measurable, represent “validated” contributions, and provide an accurate audit trail of collaboration [80, 113]. However, we acknowledge that the fact that two developers commit to the same repository does not necessarily imply direct interaction; for example, it would have been more accurate to focus on developers’ contributions to the same file in a repository, as we discuss in “Threats to validity” section. Formally, we modelled collaboration as a network \(N = (D,E,W)\), where D is the set of developers, \(E = \{(i,j,w_{ij}) \mid i,j \in D, w_{ij} \in {\mathbb {N}}\}\) is the set of directed edges denoting the relationships between developers, and \(W = \{w_{ij} \mid (i,j,w_{ij}) \in E\}\) represents the weights associated with each directed edge. For a developer pair i and j, we denote the directed relationship as \((i,j,w_{ij})\), where \(w_{ij}\) signifies the number of times developer i has committed to the same repository as developer j.

To address RQ2, we collected commit data from public model repositories via the HF Hub API. We started by retrieving a list of all available model IDs using the list_models() endpoint. Then, for each model repository, we used the list_repo_commits() endpoint to retrieve the commit data, including the authors associated with each commit. For each commit, we recorded an edge between the developer who made the commit (source_node) and all other developers who had contributed to the repository (target_node). In cases where a repository had only one contributor, we created self-loop edges to capture the isolate contributor’s activity. We did not take temporal dynamics of commit activity into account, which we discuss as a threat to construct validity under “Threats to validity” section. We collected data for collaboration in NLP, CV, and MM model repositories by filtering repositories based on the tags, which developers add to their repositories to aid discoverability on the HF Hub. We used the list of tags per sub-field provided by the HF Hub, including computer-vision and image-classification for CV models; translation and summarization for NLP models; and image-to-text and image-to-video for MM models.
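The construction of the collaboration network from the retrieved commit data can be sketched as follows, assuming networkx and a repo_commit_authors mapping with one author entry per commit (after bot removal and username merging); this is a reconstruction for illustration, not the study's exact code.

# Illustrative sketch of the collaboration network N = (D, E, W); not the study's original scripts.
import networkx as nx

def build_collaboration_network(repo_commit_authors):
    """repo_commit_authors: dict mapping repo_id -> list of commit authors (one entry per commit)."""
    G = nx.DiGraph()
    for repo_id, commit_authors in repo_commit_authors.items():
        contributors = set(commit_authors)
        for author in commit_authors:                          # one iteration per commit
            targets = contributors - {author} or {author}      # self-loop if the author is the sole contributor
            for other in targets:
                weight = G.get_edge_data(author, other, default={"weight": 0})["weight"]
                G.add_edge(author, other, weight=weight + 1)   # w_ij: commits by i alongside contributor j
    return G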

For RQ3, we collected data on model usage in spaces using the list_models() and model_info() API endpoints. We modelled model usage in spaces as a bipartite network, akin to the representation of software dependency networks [123]. The bipartite model usage network is denoted as \(B = (M, S, E)\), where M is the set of models, S is the set of spaces, and \(E = \{(m, s) \mid m \in M, s \in S\}\) is the set of undirected edges signifying that “space” s uses model m. The edges are unweighted, representing the model usage relationship between a “space” and a model. From the bipartite network \(B\), we derived an undirected model co-usage network \(C = (M, E, W)\). In this network, M is the set of models, \(E = \{(m_i, m_j) \mid m_i, m_j \in M\}\) is the set of undirected edges connecting models based on their co-usage in a “space”, and \(W = \{w_{ij} \mid (m_i, m_j) \in E\}\) is the set of weights assigned to the edges, reflecting the frequency of co-usage of models \(m_i\) and \(m_j\) across spaces. This analysis complements the former analysis of model usage with insights into the interdependencies and ecosystems surrounding widely used models on the HF Hub.
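A minimal sketch of the bipartite usage network and its co-usage projection is shown below, assuming a space_models mapping from space IDs to the model IDs they use (assembled from the model_info() responses) and networkx for the graph representation.

# Illustrative sketch; not the study's original scripts.
from itertools import combinations
import networkx as nx

def build_usage_networks(space_models):
    """space_models: dict mapping space_id -> iterable of model IDs used by that space."""
    usage = nx.Graph()        # bipartite network: models on one side, spaces on the other
    co_usage = nx.Graph()     # weighted projection onto the model side
    for space_id, models in space_models.items():
        models = set(models)
        usage.add_node(space_id, bipartite="space")
        for model_id in models:
            usage.add_node(model_id, bipartite="model")
            usage.add_edge(model_id, space_id)                 # unweighted usage edge
        for m_i, m_j in combinations(sorted(models), 2):
            weight = co_usage.get_edge_data(m_i, m_j, default={"weight": 0})["weight"]
            co_usage.add_edge(m_i, m_j, weight=weight + 1)     # co-usage frequency across spaces
    return usage, co_usage

The ranked degrees of nodes in these two networks yield the most used and most co-used models reported in the Results.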

Username merging

Following prior work, before the analysis, we undertook data preprocessing to merge multiple developer identities per unique developer, which can be caused by how Git records usernames based on users’ local repository credentials [28, 77, 121, 124, 125]. We assumed this might be an issue on the HF Hub, too. To ensure the accuracy of the dataset of 101,144 developers, we applied a three-pronged approach. First, we computed username string similarity (threshold = 90%) between pairs of developers who contributed to the same repository, accepting 126 out of 180 (70.00%) candidate pairs based on manual username searches on the HF Hub. Second, in light of the presence of potential real names (i.e., usernames with spaces, such as “Jessica Smith”), we examined string similarity (threshold = 90%) between 1,979 potential real names and the remaining 99,041 usernames, accepting 358 out of 403 (87.75%) username pairs after manual searches on the HF Hub. Finally, we inspected the usernames of 700 developers with a network degree of 10 or higher, who represented 0.7% of developers but accounted for 44.78% of edges, via manual searches on the HF Hub. This resulted in the identification of 212 username pairs. In total, we merged 546 usernames after removing duplicates.
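The string-similarity step of this procedure can be sketched as follows; the fuzzy-matching library (rapidfuzz) is an assumption for illustration, and every candidate pair above the threshold was still verified manually before merging.

# Illustrative sketch; the choice of rapidfuzz is an assumption, not the study's original tooling.
from itertools import combinations
from rapidfuzz import fuzz

def candidate_username_pairs(usernames, threshold=90):
    """Return username pairs whose similarity ratio meets the threshold (candidates for manual review)."""
    candidates = []
    for a, b in combinations(sorted(set(usernames)), 2):
        if fuzz.ratio(a.lower(), b.lower()) >= threshold:
            candidates.append((a, b))
    return candidates

# Example: flags ("Jessica Smith", "jessica-smith") for manual inspection on the HF Hub.
print(candidate_username_pairs(["Jessica Smith", "jessica-smith", "bert-fan-42"]))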

Data analysis

To investigate development activity on the HF Hub (RQ1), we conducted a descriptive analysis of various types of activity in 348,181 model repositories, 65,761 dataset repositories, and 156,642 space repositories. Pearson correlation coefficients were calculated to assess the pairwise relationships between the activity variables. In addition, we employed the Mann–Whitney U test to compare the levels of activity across repositories with different licenses (Permissive, Restrictive, and No license). The Mann–Whitney U test is a non-parametric test that examines whether two independent samples come from the same distribution, which does not require the data to be normally distributed or to meet the assumption of homogeneity of variance [126]. Given the large sample sizes, the U values are expected to be large, and the salient test statistic is the p-value, which indicates the statistical significance of observed differences. Due to capacity constraints in labelling licenses, we limited this analysis to repositories with licenses used in at least 100 repositories (\(n = 339{,}502\), 98% of all repositories). Subsequently, we analysed a snapshot of the social network structure of collaboration on the HF Hub (RQ2), using techniques defined in Table 1 in Appendix 1. This analysis provides insights into collaboration patterns in model repositories at this point in time. Furthermore, we analysed collaboration patterns in the three AI sub-fields (NLP, CV, and MM) to enable comparisons. Lastly, we examined model adoption on the HF Hub (RQ3) by calculating the ranked degree of models in the bipartite model usage networks and the ranked degree of models in the model co-usage networks to identify the most used models in spaces and their respective developers. These two complementary approaches quantified model popularity (i.e., which models are most frequently used in spaces) and model co-popularity (i.e., which models are most commonly used in conjunction with other models). We replicated this analysis for spaces with NLP, CV, and MM tags for comparative analysis of the three AI sub-fields.
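As an illustration of the statistical comparisons, the sketch below computes Pearson correlations and a Mann–Whitney U test with pandas and SciPy; the toy data frame is a stand-in for the per-repository activity table and its values are illustrative only.

# Illustrative sketch with toy values; not the study's original scripts or data.
import pandas as pd
from scipy.stats import mannwhitneyu

# Toy stand-in for the per-repository activity table.
repos = pd.DataFrame({
    "n_likes":   [0, 0, 3, 120, 1, 0],
    "n_commits": [1, 2, 5, 40, 3, 1],
    "license":   ["No license", "Permissive", "Permissive",
                  "Permissive", "Restrictive", "No license"],
})

# Pairwise Pearson correlations between activity variables.
print(repos[["n_likes", "n_commits"]].corr(method="pearson"))

# Mann-Whitney U test: do permissively licensed and unlicensed repositories
# come from the same distribution of likes?
permissive = repos.loc[repos["license"] == "Permissive", "n_likes"]
unlicensed = repos.loc[repos["license"] == "No license", "n_likes"]
u_stat, p_value = mannwhitneyu(permissive, unlicensed, alternative="two-sided")
print(u_stat, p_value)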

Results

We first report results for activity in the 348,181 model, 65,761 dataset, and 156,642 space repositories in “Development activity on the HF Hub” section, relying on the metrics described in “Data collection” section. We then report results on the structure and dynamics of collaboration in “Social network structure and dynamics of collaboration” section, based on the analysis of collaboration interactions between around 100,000 developers in model repositories. Finally, we present the results of our analysis of model adoption in spaces in “Model adoption in spaces on HF Hub” section, where we examine the distribution of model usage in spaces on the HF Hub and identify the developers of the most used models.

Development activity on the HF Hub

In this section, we present the findings of development activity in the repositories of 348,181 models, 65,761 datasets, and 156,642 spaces on the HF Hub. We present three key findings: right-skewed distributions across different types of activity (“Right-skewed distributions in development activity” section), strong correlations between development activities (“Correlation between community size and engagement” section), and a significant lack of licenses in model and dataset repositories (“Impact of licenses on collaboration” section).

Right-skewed distributions in development activity

Activity per repository is extremely imbalanced, with right-skewed distributions of n_likes, n_discussions, n_commits, and n_downloads across model, dataset, and space repositories (see Fig. 1). For example, while the maximum number of likes amongst models is over 9000, the average model only receives 1.14 likes (see Tables 2, 3, 4). The majority of repositories get minimal engagement. For example, 91% of models and 88% of datasets have 0 likes; 84% of models, 91% of datasets, and 96% of spaces have 0 discussions; and 71% of models and 70% of datasets have 0 downloads. Meanwhile, most activity is concentrated in a small number of repositories. For example, \(<1\)% of models account for 80% of likes, 10% for 80% of discussions, 30% for 80% of commits, and \(<1\)% for 80% of downloads. Upon increasing the threshold, 8% of models account for 99% of likes, 15% for 99% of discussions, and 1% for 99% of downloads.

Fig. 1 Distributions of development activity in HF Hub repositories
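For reference, concentration figures of this kind can be computed as in the following sketch; the function and the toy like counts are illustrative assumptions rather than the study's code.

# Illustrative sketch; toy values only.
import numpy as np

def share_of_repos_covering(counts, target_share=0.80):
    """Fraction of repositories (most active first) needed to cover target_share of total activity."""
    counts = np.sort(np.asarray(counts, dtype=float))[::-1]
    cumulative = np.cumsum(counts) / counts.sum()
    n_needed = int(np.searchsorted(cumulative, target_share) + 1)
    return n_needed / len(counts)

# Example: a heavily skewed toy distribution of likes across 1,000 repositories.
likes = [9000, 500, 40, 5, 1] + [0] * 995
print(share_of_repos_covering(likes, 0.80))        # a tiny fraction of repositories covers 80% of likes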

Most repositories have a community size of 1; for example, 87% of model repositories have 1 contributor and the third quartile (75th percentile) of n_committers is 1 across repository types (see Table 2). The respective maximum values of n_committers are 18, 100, and 282 across repository types, and the respective maximum values of n_community are 246, 110, and 4,685. The differences between n_committers and n_community are due to large n_disc_starters values, indicating a division of roles in repositories, where many developers participate in discussions but few are involved in model maintenance. The model repositories with the most n_committers are bigscience/bloom (\(n=18\)), bigcode/santacoder (\(n=16\)), and deepset/roberta-base-squad2 (\(n=15\)).

Correlation between community size and engagement

We correlate frequency counts over the different types of activity described in “Data collection” section (see Fig. 2). In model repositories, we find a strong positive correlation between n_community and n_likes (\(\rho = 0.75\), \(p < 0.001\)). In space repositories, we find strong correlations between various activities, especially between n_likes and n_discussions (\(\rho = 0.74\), \(p < 0.001\)), n_disc_starters (\(\rho = 0.76\), \(p < 0.001\)), and n_community (\(\rho = 0.76\), \(p < 0.001\)). However, in general, we observe weak correlations between most activities in model and dataset repositories. Furthermore, we do not find a strong correlation between commit activity (n_commits) and other types of activity, indicating that commit activity is not strongly linked to community engagement.

Fig. 2 Correlations of activity in model, dataset, and space repositories

Impact of licenses on collaboration

A significant proportion of model and dataset repositories lack licenses, which can create uncertainty and potential legal issues for users and developers. Specifying a license is not the norm: the majority of model repositories (65%) and datasets (72%) do not have a license. Amongst the licensed models, the most commonly used licenses are Apache v2.0 (37%), MIT (17%), OpenRAIL (14%), and CreativeML OpenRAIL-M (10%). The most used licenses for datasets are MIT (28%), Apache v2.0 (15%), OpenRAIL (9%), and licenses from the family of Creative Commons v4.0 (7%).

The choice of license matters: there is a moderate to strong correlation between the use of a license and level of activity in model repositories (see Fig. 3). Furthermore, the Mann-Whitney U tests provide strong evidence of statistically significant differences between collaboration dynamics in model repositories with different types of licenses (all tests have \(p < 0.001\)). Specifically, model repositories with permissive licenses consistently have the highest levels of activity compared to model repositories with no license and those with restrictive licenses (see Table 5). However, repositories with restrictive licenses also exhibit significantly higher activity than those with no license. This pattern holds across all activity metrics measured, suggesting that while permissive licenses foster the highest engagement, restrictive licenses also promote more collaboration compared to model repositories that do not have a license.

Fig. 3 Correlations of activity in model repositories with different licenses

Social network structure and dynamics of collaboration

In this section, we present findings from our analysis of a snapshot of the social network structure of collaboration in model repositories on the HF Hub. We begin with the structure and dynamics of collaboration in all model repositories (see “Collaboration in model repositories on the HF Hub” section), and then we compare collaboration patterns in Natural Language Processing (NLP), Computer Vision (CV), and Multimodal (MM) model repositories (see “Collaboration in model repositories in AI sub-fields” section).

Collaboration in model repositories on the HF Hub

The HF Hub collaboration network exhibits right-skewed degree and PageRank centrality distributions, which indicates that influence in the HF developer community is concentrated amongst a small subset of developers. The majority of developers (89%) have not collaborated with others. Excluding these isolate developers, the remaining 10,524 developers have an average degree of 4.10 (SD: 32.63) and node degrees range from 1 to 3140. The right-skewed distributions of degree and PageRank centrality (see Fig. 4) suggest that a small group of influential developers plays a central role in driving collaboration on open models on the HF Hub. Specifically, the degree centrality distribution has a mean of 4 and a median of 2, with a maximum of 3140 and a standard deviation of 33, while the PageRank centrality distribution has a mean and median of 0.0001, a maximum of 0.04, and a standard deviation of 0.0005.

Fig. 4 PageRank and degree distributions of developers on the HF Hub

The HF Hub developer community exhibits a core-periphery structure, with a tightly interconnected core of prolific developers. The k-core decomposition analysis reveals that as the k-core value increases, the number of distinct communities decreases, ultimately converging into a single densely interconnected core at k = 26 (see Table 6). The high modularity (0.81) at k = 1 suggests that the whole network consists of loosely connected groups of developers. As the k-core value increases, the modularity decreases to 0.00 at k = 26, indicating a transition from a compartmentalised community structure with distinct clusters or modules to an integrated core characterised by high cohesion and a lack of discernible sub-groups. Concurrently, the sub-network density increases, reaching unity at k = 26.

Collaboration is characterised by high reciprocity values, ranging from 0.81 to 1.00 across all k-core levels (see Table 6), indicating the prevalence of mutual relationships amongst developers. The low assortativity values, ranging from −0.49 to 0.08, suggest that developers collaborate regardless of their centrality in the network, implying that other factors, such as shared interests, skills, or project roles, may be more significant in driving collaboration than their network centrality. Furthermore, the relatively low average rich club coefficients, ranging from 0.04 to 0.41, indicate that highly central developers do not primarily collaborate with each other, suggesting a lack of elitism amongst power developers.
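The cohesion measures reported in this section can be reproduced along the following lines, assuming networkx and a collaboration network G built as in the data-collection sketch; self-loops are removed because several of these metrics do not permit them.

# Illustrative sketch; not the study's original analysis code.
import networkx as nx
from networkx.algorithms import community

def core_metrics(G, k):
    """Cohesion metrics for the k-core of the (directed, weighted) collaboration network."""
    H = G.copy()
    H.remove_edges_from(nx.selfloop_edges(H))
    core = nx.k_core(H, k=k)                            # sub-network at core level k
    undirected = core.to_undirected()
    partition = community.greedy_modularity_communities(undirected)
    rich_club = nx.rich_club_coefficient(undirected, normalized=False)
    return {
        "n_communities": len(partition),
        "modularity": community.modularity(undirected, partition),
        "density": nx.density(core),
        "reciprocity": nx.overall_reciprocity(core),
        "assortativity": nx.degree_assortativity_coefficient(core),
        "avg_rich_club": sum(rich_club.values()) / len(rich_club),
    }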

Collaboration in model repositories in AI sub-fields

Collaborations on models in the sub-fields of natural language processing (NLP), computer vision (CV), and multimodal (MM), despite the different sizes of the respective communities, are similarly characterised by core-periphery structures with high modularity and low density (see Tables 7, 8, 9). At k = 1, all networks are highly modular (CV: 0.80, NLP: 0.82, MM: 0.71) and have very low density (CV: 0.01, NLP: 0.00, MM: 0.00), implying that collaborations in the respective AI sub-fields are clustered into distinct communities of collaborators. As the k threshold increases, the networks undergo a similar transformation process, with modularity decreasing to 0.00 and the number of communities reducing to a single cohesive community at the maximal k values (CV: 10, NLP: 25, MM: 26). Concurrently, density increases, reaching 1.00 for CV and MM and 0.97 for NLP at their respective maximal k values.

Collaboration in sub-fields is also similarly characterised by reciprocity and connectivity in the core. At k = 1, reciprocity values range from 0.84 to 0.93 and increase to 1.00 at the maximal k for CV and MM, while NLP maintains a high reciprocity of 0.98 at its maximal k. The average degree increases with k for all networks, reaching the corresponding maximal k value at the highest threshold. This suggests that as we move towards the core of the collaboration networks, developers become more interconnected and collaborate with a larger number of peers. However, the low average clustering coefficients and low average rich club coefficients across all networks indicate that the more prolific developers in the respective sub-fields tend to collaborate with a diverse set of individuals rather than forming tightly-knit groups.

Model adoption in spaces on HF Hub

In this section, we present the results of the analysis of model usage in spaces on the HF Hub, shedding light on model adoption and key developers in this ecosystem. Specifically, we present two key findings: model adoption in spaces is characterised by a right-skewed distribution (“Right-skewed distribution of model adoption” section), and a small cohort of developers (in particular, Big Tech companies) build the most used models across all spaces as well as in the three AI sub-fields (“Dominance of a few models by a few developers” section).

Right-skewed distribution of model adoption

The bipartite model usage network displays a disparity in model adoption in spaces. The degree distribution of the bipartite network is right-skewed, as shown in Fig. 5. Only three models are used in 1000 or more spaces: runwayml/stable-diffusion-v1-5 (\(n=1747\)), skytnt/anime-seg (\(n=1162\)), and gpt2 (\(n=1002\)). The mean degree (6.68) is significantly higher than the median (1.00), and the large standard deviation (34.75) confirms the high variability in model usage. The majority of models have a low degree of usage, with at least 50% being used in only one space, while a small number of highly popular models dominate the usage, with the maximum degree reaching 1747. This suggests that a few key models are widely adopted in AI applications, while many other models have limited use cases. The model co-usage network provides an additional perspective on the uneven interdependencies of models in spaces, complementing insights gained from examining model downloads or individual model usage in spaces. Specifically, the degree distribution of this network exhibits a multi-modal pattern, with five distinct clusters, each exhibiting a right-skewed shape (see Fig. 5). A small cluster at the far-right tail of the distribution represents a few highly interconnected models with significantly higher co-usage degrees compared to the other clusters.

Fig. 5 Degree distribution of model adoption in spaces on the HF Hub

Dominance of a few models by a few developers

When we rank the models by their usage in spaces, we observe that major organisations, rather than individual developers or grassroots initiatives, have developed the most used models. Amongst the 100 most used models in spaces, the following organisations have developed the most models: Meta (\(n=8\)), Google (\(n=7\)), StabilityAI (\(n=5\)), OpenAI (\(n=4\)), Microsoft (\(n=4\)), and Fudan University (\(n=4\)). Together, these organisations account for 33% of the 100 most used models in spaces. We note that the individual user nitrosocke (\(n=5\)), an employee at StabilityAI, ranked highly amongst these organisations. With regard to the model co-usage network, the key developers of the 100 most co-used models in all spaces are: EleutherAI (\(n=15\)), Meta (\(n=12\)), h20ai (\(n=11\)), BigScience (\(n=9\)), and lmsys (\(n=9\)). These five organisations account for 56% of the 100 most co-used models in spaces.

The model usage networks in the sub-fields similarly exhibit right-skewed degree distributions, highlighting the dominance of a minority of models in each sub-field. The most used models in spaces with NLP tags (\(n=3995\)) are gpt2 (\(n=1001\)), bert-base-uncased (\(n=621\)), and gpt2-medium (\(n=445\)). The organisations that developed the most models amongst the 100 most used models are Google (\(n=9\)), Meta (\(n=5\)), and Fudan University (\(n=5\)). For comparison, in the NLP model co-usage network, EleutherAI ranks first (\(n=16\)), followed by h20ai (\(n=12\)) and Meta (\(n=11\)). The most used models in spaces with CV tags (\(n=416\)) are saltacc/anime-ai-detect (\(n=500\)), openai/clip-vit-large-patch14 (\(n=454\)), and openai/clip-vit-base-patch32 (\(n=277\)). The most prolific developer of models in spaces with CV tags is the user lllyasviel (\(n=20\)), followed by Meta (\(n=8\)) and the user DucHaiten (\(n=7\)). For comparison, in the CV model co-usage network, LAION AI ranks as the developer of the most models amongst the top 100 (\(n=17\)). Finally, the most used models in spaces with MM tags (\(n=2394\)) are runwayml/stable-diffusion-v1-5 (\(n=1748\)), CompVis/stable-diffusion-v1-4 (\(n=925\)), and stabilityai/stable-diffusion-2-1 (\(n=854\)). Amongst the developers of the 100 most used models, Stability AI ranks first, with 15 of the 100 most used models and 22 of the top-ranked co-used models in such spaces. These findings highlight the key models and players in the NLP, CV, and MM communities.

Correlations between model likes and model usage

We observe a strong positive correlation between n_likes of models and n_usage_spaces (\(\rho = 0.66\), \(p < 0.001\)), and a weak positive correlation between n_downloads and n_usage_spaces (\(\rho = 0.29\), \(p < 0.001\)). These findings suggest that the number of likes is more strongly associated with the usage of models in spaces compared to the number of downloads, and that likes on a model repository are a good indicator of its adoption in applications on the HF platform. However, as mentioned in “Data collection” section, we note that download counts are limited and therefore may only provide a snapshot of correlations between downloads and likes or usage, which may not generalise to all time periods.

Discussion

In this section, we discuss the key implications of our findings for research and practice. We highlight the study’s contributions to the literature in “Contributions to academic literature” section. We then reflect on the methodological considerations of using the HF Hub as a data source for research on open source AI in “HF Hub: a new source of research data” section. Building on these insights, we make five recommendations for future research to advance the research agenda on open source AI in “Recommendations for future research” section. Finally, we discuss the implications for practice and make recommendations to practitioners in “Implications for practice” section.

Implications for research

Contributions to academic literature

Uneven influence in the HF Hub developer community: We extend prior findings of right-skewed distributions of commit activity in model repositories [22] with observations of right-skewed distributions of various development activities on the HF Hub, including interactions in model, dataset, and space repositories; code collaborations between developers; and model usage in spaces. Activity distributions follow power law patterns, with a small fraction of repositories accounting for most interactions (e.g., \(<1\)% for 80% of likes, 10% for 80% of discussions, 30% for 80% of commits, and \(<1\)% for 80% of downloads). Similarly, the collaboration networks exhibit right-skewed centrality distributions, indicating that influence is concentrated amongst few developers, congruent with prior observations that OSS development patterns generally follow Pareto distributions [24,25,26,27,28]. Influence also flows across the HF Hub, with likes per model having strong correlations with their usage in spaces (\(\rho = 0.66\), \(p < 0.001\)).

Impact of license on collaboration: The Mann–Whitney U tests show that license choice significantly impacts the level of activity and engagement in repositories, with permissive licenses exhibiting the highest activity levels, followed by repositories with restrictive licenses, and finally ones with no license. Furthermore, the Pearson correlations indicate that the use of a license (permissive or restrictive) is associated with stronger correlations between various types of activity compared to repositories without licenses. These findings highlight the important role of licensing decisions in influencing the collaborative and community dynamics in open model development and open source AI projects.

Core-periphery structure of the HF developer community: To the best of our knowledge, only one prior study has investigated model development practices in the HF developer community, showing that most models only have one contributor and that model maintenance chiefly involves “perfective tasks” to enhance model performance [22]. We extend this finding with three insights. First, we corroborate the findings that most developers (89%) are islands, who have not collaborated with other developers in model repositories on the HF Hub. This is not unique to the HF Hub: the majority of OSS projects are developed by individuals [75]. However, what may be specific about the small community sizes in model development is the nature of the model development life-cycle (“code once, train often”). Second, the social network structure of collaboration patterns amongst developers in model repositories is characterised by a core-periphery structure, with a dense core of highly active developers, akin to the “layered onion” structure common in OSS [76]. Third, collaborations have high reciprocity and low assortativity, signifying the prevalence of mutual relationships amongst developers, regardless of their social positions in the community.

Uneven model adoption in spaces: By examining model adoption in spaces, we empirically tested the observation of uneven model adoption and the disproportionate influence of industry-leading companies in the open source AI ecosystem [21]. We identified the popularity of a relatively small number of models used in spaces as well as the influential role of a few organisations, including Meta, Google, Stability AI, OpenAI, Microsoft, and EleutherAI, who have developed the most widely used models. Some critics of the open-source model of AI development fear that too many unknown actors will introduce distributed safety issues, while advocates of the development model tout democratisation of power as a core benefit. Our findings show that a few organisations possess majority influence in this ecosystem, which challenges both of these narratives. In many cases, the most influential actors in the open source AI ecosystem are one and the same as those in closed-source AI [21].

HF Hub: a new source of research data

This paper contributes to the research effort to use the HF Hub as a data source for empirical studies on open model development [22, 69, 71]. We share two reflections on methodological considerations. First, informed by prior work that underlines the importance of merging usernames for unique developers, we anticipated that this might be an issue on the HF Hub [28, 77, 121, 124, 125]. While our three-pronged approach strikes a balance between the impracticality of manually inspecting over 100,000 developers versus the risk of misclassification through a fully automated approach, it is still imperfect. Future research may consider more sophisticated approaches to this problem. Second, the API is not optimised for research purposes, which makes data collection time-consuming (e.g., one must make a unique API call to retrieve commit histories of each model and handle rate limits) and limited (e.g., user metadata is not available). The lack of user metadata hinders the ability to study the characteristics and behaviours of individual developers, such as their expertise and affiliations, as well as automated approaches to username merging that incorporate user metadata. To overcome these limitations, researchers may explore alternative approaches and tools, such as the HFCOMMUNITY database developed by Ait et al. to facilitate empirical studies of activity on the platform [71].

Recommendations for future research

We recommend five research directions that can advance the research agenda on open source AI.

  1.

    Implications of concentrations in the HF Hub developer community: We confirm prior observations that the models of a handful of companies are dominant amongst the HF Hub developer community [21]. We encourage future research to investigate what these concentrations mean in practice, such as the potential benefits that these companies accrue from their open model ecosystems, including increased visibility, crowdsourced contributions (e.g., via commits and discussions), and access to diverse fine-tuned versions shared by other developers on the HF Hub. Furthermore, there is a concern that dominant companies benefit from developers being locked-in to their ecosystems, potentially limiting competition and entrenching their dominance. Future research could investigate the factors contributing to such concentrations, such as the reputation of the companies developing the models, their access to resources and support, or the perceived performance and versatility of their models, as well as the implications of these concentrations for the broader AI community, including the impact on research, innovation, and the distribution of benefits and resources.

  2.

    Incentives and modes of participation: Future research could investigate the incentives of individual developers and companies. A number of companies have released open models on the HF Hub, such as Meta’s LLaMA models [127], Mistral AI’s Mixtral models [128], and OpenAI’s Whisper models [129]. Often these releases are presented as acts of “AI democratisation” [130]. Future research could critically examine the commercial incentives behind these releases. In addition, future research could examine commercial approaches to model governance and maintenance—for example, if and how companies welcome or engage with community contributions—and if and how companies collaborate with each other on open model development, as they do in OSS development [108, 110,111,112,113].

  3.

    Collaboration dynamics in active repository communities: We know that model maintenance focuses on model performance improvements [22]; and in the minority of repositories that have active communities, most developers contribute to discussions rather than commits (see Tables 2, 3, 4). Going further, we encourage researchers to examine collaboration dynamics in repositories with active communities from multiple angles. Given the sizeable differences in n_committers and n_disc_starters, future research could investigate the division of roles between discussion and code contributors, typical topics of discussion (e.g., model performance, new ideas, etc.), how discussions inform model maintenance if at all, and the journeys of developers from discussion contributors to committers, amongst others. In addition, future research could examine the governance approaches (e.g., contribution policies) that repository owners use to encourage collaboration. Future analyses could also take into account temporal dynamics, providing insights into evolving patterns, social structures, and trends of open model developer communities on the HF Hub.

  4.

    Impact of model size on collaboration: Future research should examine the impact of model size (i.e., parameters) on the nature of collaboration in repositories on the HF Hub. For instance, it could examine how resource constraints (e.g., computational power or data availability) influence collaboration for various stakeholders (e.g., individual developers or developers from industry labs) on models of different sizes. By shedding light on facilitators and barriers for collaboration on open models, such research could guide efforts to foster inclusive and diverse communities.

  5.

    Collaboration beyond the HF Hub: While this analysis provides insights into the developer community that shares and fine-tunes models on the HF Hub, we have a limited understanding of how the various components involved in model development are produced [4], as this work largely takes place in proprietary settings or on other platforms like GitHub [14]. We encourage future research to examine how the HF Hub is used within the wider ecosystem of platforms and offline venues for the collaborative development of open models and datasets. This research direction would enable comparisons of the collaboration patterns of model developers and model fine-tuners. In addition, researchers could undertake a multi-sited analysis, examining collaboration on the same project across platforms.
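
Illustrating the third research direction above, the following minimal sketch shows how n_committers and n_disc_starters could be derived for a single repository with the public huggingface_hub Python client. The repository identifier is hypothetical, and the exact operationalisation (e.g., whether pull requests count as discussions) may differ from the one used in our study.

    from huggingface_hub import HfApi

    api = HfApi()
    repo_id = "example-org/example-model"  # hypothetical repository identifier

    # Unique commit authors (n_committers).
    commits = api.list_repo_commits(repo_id, repo_type="model")
    committers = {author for commit in commits for author in commit.authors}

    # Unique discussion starters (n_disc_starters), excluding pull requests
    # so that the two counts remain distinct.
    discussions = api.get_repo_discussions(repo_id=repo_id, repo_type="model")
    disc_starters = {d.author for d in discussions if not d.is_pull_request}

    print(f"n_committers: {len(committers)}")
    print(f"n_disc_starters: {len(disc_starters)}")
    print(f"overlap: {len(committers & disc_starters)}")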

Implications for practice

Recommendations for open source practitioners

Beyond our academic research suggestions, we encourage open source researchers and practitioners to develop standardised metrics for studying open model development. Groups like the Linux Foundation’s Community Health Analytics in Open Source Software (CHAOSS) working group [131], which has created metrics to assess the health and sustainability of OSS developer communities, are well-positioned to lead this effort. The lack of empirical data on open model development hinders evidence-based decision-making in this rapidly evolving field. By working together to establish appropriate metrics, open source practitioners can help to close this data gap.

Recommendations for platform providers

We make two recommendations to HF as a platform for open model development. First, HF could work with researchers to identify features and API improvements that would aid research on open model development on its platform, building on efforts by members of the HF community, such as Weyaxi/huggingface-leaderboard. This collaboration could include collecting and publishing data on open model development patterns and collaboration, which would help fill the current “data gap” in this area. HF may take inspiration from GitHub’s Innovation Graph [132] or its annual Octoverse reports [133], which provide access to data and insights on development activity on that platform. Second, a concerning proportion of models (64.67%) and datasets (72.13%) lack licenses, which may be due to uncertainty about how or whether they should be licensed [67, 134]. For comparison, the proportion of unlicensed repositories on GitHub is lower, at 46%, or 53% if repositories with “other licenses” are counted as unlicensed [135]. In the interest of promoting responsible development, HF should consider developing educational resources on licenses, such as guides or tutorial videos, or features, such as a license drop-down menu, that inform developers of the available options as well as their merits and drawbacks. Such a feature could be considered alongside other recommendations to moderate models on the HF Hub, such as hiring AI safety researchers and proactively red-teaming unsafe models [53, 70].
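
To illustrate how license coverage could be monitored, the following sketch uses the public huggingface_hub Python client to estimate the share of models without license metadata, which models declare via “license:<id>” tags. The sample size is arbitrary, and the approach is illustrative rather than the one used to produce the figures above.

    from huggingface_hub import HfApi

    api = HfApi()
    # A convenience sample of model repositories; the limit is an arbitrary example.
    sample = api.list_models(limit=5000, full=True)

    total = 0
    unlicensed = 0
    for model in sample:
        total += 1
        tags = model.tags or []
        # Models with license metadata carry a tag of the form "license:apache-2.0".
        if not any(tag.startswith("license:") for tag in tags):
            unlicensed += 1

    print(f"{unlicensed}/{total} sampled models ({unlicensed / total:.1%}) lack a license tag")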

Recommendations for policymakers

As open models become increasingly widely available and used, policymakers need empirical data to inform discussions about the benefits, risks, and governance of these models. Our analysis provides one empirical lens on the extent of model proliferation and adoption, which can help ground policy decisions. For example, it is illuminating to observe that most models (70.99%) have never been downloaded and that 1% of models account for 99% of downloads. This is a reminder that the availability of a model does not mean it will be (widely) used. Furthermore, while download counts were limited to the past 30 days, the fact that only 86 models had over one million downloads suggests that the number of widely used models is small enough to be governable. What is more, the analysis revealed the impact of models developed by a number of non-profit, grassroots initiatives like EleutherAI, BigScience, and BigCode. Following the French government’s call to fund the digital commons to support open model development [59], policymakers may use such data to identify non-commercial projects that could be supported. Overall, the data points reported in “Development activity on the HF Hub” section could help policymakers assess the real-world impact of open models and develop appropriate governance frameworks that maximise their benefits while mitigating their potential risks.
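
For readers who wish to reproduce this kind of concentration statistic, the following toy sketch computes the share of total downloads captured by the top 1% of models; the download counts here are synthetic placeholders rather than data from the HF Hub.

    import numpy as np

    # Synthetic, heavy-tailed placeholder data standing in for per-model
    # 30-day download counts collected from the Hub.
    rng = np.random.default_rng(0)
    downloads = rng.pareto(a=1.2, size=100_000) * 10

    sorted_counts = np.sort(downloads)[::-1]
    top_1pct = sorted_counts[: len(sorted_counts) // 100]
    share = top_1pct.sum() / sorted_counts.sum()
    print(f"Top 1% of models account for {share:.1%} of downloads")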

Threats to validity

We evaluate the validity of our findings by following guidance for empirical software engineering research [114, 136].

Construct validity

Construct validity concerns the extent to which a measurement accurately assesses the theoretical construct it intends to measure. Our study aimed to measure typical patterns of development activity on the HF Hub, but we acknowledge several threats to construct validity. First, our analysis is limited to activity in public repositories and does not account for collaboration in private repositories. Second, download counts have a few limitations: they cover only the past 30 days, they may be misreported (e.g., if the repository lacks a configuration file or if the model is used on-device rather than in continuous integration), and dataset downloads only count load_dataset() calls [115, 116]. Third, our operationalisation of collaboration relies on commits to model repositories, assuming that the co-occurrence of commits indicates collaboration. However, this assumption may not always hold, especially in large repositories where developers may work on independent tasks. Future research could operationalise collaboration on specific files and quantify the relative contribution of developers to specific files [80]. Furthermore, this analysis is limited to a snapshot of the HF Hub developer community in October 2023 and therefore does not capture how collaboration and activity evolve over time; as discussed in “Recommendations for future research” section, future work should account for these temporal dynamics.
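
The file-level operationalisation suggested above could, for instance, take the following form: developers are linked only when they have committed to the same file, and edge weights count the number of co-edited files. This is a minimal sketch using the networkx library with a hypothetical mapping from file paths to committer usernames, not the method used in our study.

    import itertools
    import networkx as nx

    # Hypothetical mapping from file path to the set of usernames that committed to it.
    file_commits = {
        "config.json": {"alice", "bob"},
        "model.safetensors": {"alice"},
        "README.md": {"alice", "bob", "carol"},
    }

    G = nx.Graph()
    for path, committers in file_commits.items():
        for u, v in itertools.combinations(sorted(committers), 2):
            # Increment the edge weight for each file the pair co-edited.
            if G.has_edge(u, v):
                G[u][v]["weight"] += 1
            else:
                G.add_edge(u, v, weight=1)

    print(list(G.edges(data=True)))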

Internal validity

Internal validity concerns the extent to which a study can confidently attribute the observed results to the investigated variables, minimising the influence of confounding factors or alternative explanations. As explained in “Username merging” section, there may be a slight inaccuracy in the enumeration of community size per repository and the number of developers included in the collaboration networks due to discrepancies in username data, such as multiple accounts or usernames per developer. This is a common problem in OSS research, and there is no perfect solution to username merging [77, 121, 125]. API limitations prevent the use of methods that incorporate user metadata for username merging [28, 137]. Consequently, we rejected 34 username pairs for which there was insufficient evidence to confirm the match with confidence.
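
For illustration, a string-similarity heuristic of the kind that can be applied without user metadata might look like the sketch below; the normalisation and threshold are arbitrary examples, and candidate pairs flagged this way would still require manual review, as reflected in the rejections described above.

    from difflib import SequenceMatcher

    def normalise(name: str) -> str:
        # Lowercase and strip separators so "Jane-Doe" and "janedoe" compare equal.
        return "".join(ch for ch in name.lower() if ch.isalnum())

    def candidate_match(a: str, b: str, threshold: float = 0.9) -> bool:
        # Flag a pair as a merge candidate if the normalised names are highly similar.
        return SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold

    print(candidate_match("Jane-Doe", "janedoe"))   # True: likely the same developer
    print(candidate_match("janedoe", "johnsmith"))  # False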

External validity

External validity concerns the generalisability of the findings. While the HF Hub has gained significant popularity, it is important to acknowledge that open model development also takes place on other platforms and that our findings may not generalise to those platforms. Future research could explore collaboration practices across different platforms to provide a more comprehensive view of the open source AI ecosystem. That being said, we observe that development activity on the HF Hub is characterised by the Pareto principle, consistent with OSS development patterns on platforms like GitHub [24,25,26,27,28]. Another threat to the external validity of the findings concerns the analysis of model usage. While there were as many as 156,642 spaces at the time of data collection, they do not represent the use of open models beyond the HF Hub platform, which limits the generalisability of our claims; a notable exception is the strong positive correlation we find between likes of model repositories and their usage in spaces (\(\rho = 0.66\), \(p < 0.001\)). Future research could address this limitation by exploring other sources of data on model adoption, such as academic publications, industry reports, or user surveys, to triangulate the findings.
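
The reported correlation (denoted \(\rho\)) is consistent with a Spearman rank correlation; assuming that is the statistic used, it could be computed as in the following sketch, where the like and usage counts are placeholders rather than the study’s data.

    from scipy.stats import spearmanr

    # Placeholder per-model like counts and counts of spaces using each model.
    likes = [12, 0, 431, 5, 88, 2, 1050, 3]
    space_usage = [4, 0, 97, 1, 20, 0, 230, 2]

    rho, p_value = spearmanr(likes, space_usage)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.3g}")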

Reliability

Reliability refers to the consistency and reproducibility of the study’s results. To enhance the reliability of our study, we have uploaded the Python scripts used for data collection and processing to a public GitHub repository [119]. Due to privacy and ethical considerations, we do not share the raw data (see Data Availability statement).

Conclusion

The burgeoning open source AI ecosystem has become a focal point of discussion amongst AI researchers, developers, and policymakers. This study offers empirical insights into practices in this emerging ecosystem via a quantitative analysis of development activity on the HF Hub. Concretely, we make three empirical contributions to the nascent research agenda on open source AI. First, we find that various types of development activity, from likes and downloads to discussions and commits, across 348,181 model, 65,761 dataset, and 156,642 space repositories exhibit right-skewed distributions. In addition, activity and engagement are highly imbalanced between repositories; for example, over 70% of models have zero downloads and 1% of models account for 99% of downloads. Second, we analyse a snapshot of the social network structure of collaboration in model repositories, finding that the community has a core-periphery structure, with a core of highly prolific developers and a majority of isolated developers (89%) who do not collaborate with others. However, collaboration is characterised by high reciprocity and low levels of assortativity regardless of developers’ social positions in the HF developer community. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models are widely used and that these are developed by a handful of industry-leading companies, underscoring the concentrated influence of a few actors in the HF Hub ecosystem. These findings are a timely reminder that open source AI is not immune to the influence of dominant industry leaders [21]. We conclude with a discussion of the implications of our findings and recommendations for AI researchers, practitioners, and policymakers, in the hope that practices in open model development will be investigated more deeply in the future.