The many faces of documentation
Defining multiple forms of software documentation
A major issue around documentation is that it has several definitions. In our interviews, “documentation” was used to refer to a broad set of textual resources, rather than a single kind of text. From those interviews, as well as observations during the Docathon, we identify several major types of documentation. These types are not mutually exclusive categories, but they often have different intended audiences, conventions for presentation, skills needed, and formats for distribution. We list each in Table 1, and discuss them in more detail below.
User documentation, also called narrative documentation, typically gives a broad, high-level overview of what the library is intended to do, how to install it, or how to use it (see Figure 1). It is typically not an exhaustive list of everything the library does, and often targets new users or those who are still deciding if they want to use the package. It may include material that is not kept in raw text files (e.g., Jupyter notebooks, repackaged presentations, or videos), but is generally officially created by a project’s developers and hosted on the project’s webpage/repository.
Galleries and examples generally lack high-level motivations and structure, and instead present a short and specific outcome generated by a single block of code (Figure 2). They are typically created officially by a project’s developers and hosted on the project’s webpage/repository. Because galleries and examples are self-contained code, it is possible to run this code when the documentation is built in order to generate output figures (using a framework like sphinx-gallery; see Footnote 1).
API documentation (sometimes called “docstrings” in the Python community) is text included in code comments at the beginning of functions or methods (see Figure 3). API documentation has a specific structure that can be parsed by libraries (such as Sphinx or Doxygen) which render it into structured output, like HTML pages. It typically includes a brief, high-level description of what the function does, followed by more structured information about the parameters the function uses. Many development environments (such as Jupyter Notebooks or RStudio) can interactively render API documentation to users and developers.
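As a minimal sketch of what such structured API documentation can look like, the hypothetical function below (`rescale` and its parameters are illustrative assumptions, not taken from any project in this study) carries a docstring written with reStructuredText field lists, one of the formats Sphinx can parse. The standard-library `inspect` module then retrieves the text in roughly the way an interactive environment might before displaying it.

```python
import inspect


def rescale(values, factor=1.0):
    """Multiply every value in a sequence by a constant factor.

    :param values: The numbers to rescale.
    :type values: list of float
    :param factor: The multiplier applied to each value.
    :type factor: float
    :returns: A new list containing the rescaled values.
    :rtype: list of float
    """
    return [v * factor for v in values]


# The docstring is stored on the function object itself, which is how
# documentation builders and interactive tools can locate it.
doc = inspect.getdoc(rescale)

# By convention, the first line is the brief, high-level description.
summary = doc.splitlines()[0]

# The call signature can be recovered alongside the text.
signature = str(inspect.signature(rescale))

print(summary)
print(signature)
```

The brief summary line comes first and the structured parameter information follows, which mirrors the layout described above.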
Non-traditional documentation. While the types of documentation mentioned above are most common and well-defined, there is a wealth of unofficial or unstructured material on the internet that several interviewees mentioned. This includes content distributed with the code itself, such as well-written error and warning messages. It also includes distributed content that isn’t created by the project core contributors, such as blogs, community Q&A sites like StackOverflow, or Jupyter notebooks. As these sites often rank highly in search engine queries, they are an important venue for learning and instruction. However, as documentation, they remain ad-hoc, unorganized, and rarely under the editorial control of a project’s developers.
Relationships and tensions in definitions of documentation
In practice, there is an interplay between the above-mentioned types of documentation. One of the Docathon organizers described the difference between the three major kinds of documentation as ranging “from the most zoomed in to the most zoomed out” (Docathon organizer 2): from API documentation, to examples and galleries, to user/narrative documentation. One of the Docathon participants drew a comparison between a textbook and a dictionary:
[there is] a static version that basically says, ‘Hey, this is a project. Here is what the project is meant to accomplish. Here is the project, then type this, the project starts, and then it can do this, this and this’ […] You basically can go through tutorials like you would read a book that tells you how to do statistics or how to do something else.
The second kind of documentation […] is basically, ‘Here is a list, alphabetical order, or another order, of all the things the project can do. If you want to know how to use a function in particular, how a specific piece of code, you go to this subsection.’ And this subsection will often be relatively short and tell you why and how it can be used and what it is related to, more like a dictionary. (Docathon participant 7)
These types of documentation can co-exist with one another, but they can also introduce tensions within the developer team and broader community. At the highest level, tension arises from an imprecise definition of what documentation means to a project. If someone is told to read “the documentation” or says that “the docs” need improvement, it can be unclear which of the above-mentioned types they mean. An additional tension arises when a project does not diversify the types of documentation it provides. As each type of documentation has different goals, scopes, and audiences, conflicts can emerge if documentation is exclusively imagined as one of these types. Many interviewees noted that it was important for a software project to have good documentation across many different levels, bringing up examples where projects needed to work more on one specific type.
While we found that these kinds of documentation were often conceptually clear for interviewees, they were sometimes combined and merged in practice. One Docathon participant discusses the documentation for a smaller project they work on, where documentation takes the form of a single README document, with different subsections that do different kinds of work:
[our software package] doesn’t have a very thorough documentation, just a README, but it’s a mix of everything. It’s like, high level motivation, it has specific examples, and it has, how to install this thing, how to run it. It’s kind of a very technical thing, so it targets more […] hardcore developers. (Docathon participant 5)
Successful projects intentionally adopted a broad definition for what it meant to “contribute to documentation”. For example, the Docathon’s organizers opened the week with talks and tutorials that introduced the different types of documentation and discussed best practices for writing each type. They then encouraged participants to choose whatever definition they liked for the week. In our interviews, Docathon participants frequently made implicit and explicit use of these distinctions when talking about their work. Many chose to specialize in one particular type of documentation for the week, but each of the three major types of documentation had at least one person working on it.
Interviewees discussed moving between these formats. For example, one participant created a new static tutorial using slides they had created for an in-person bootcamp they taught. When asked to talk about examples of good documentation, many participants also praised tutorials that collectively worked as textbooks for a broader conceptual topic (such as machine learning). As one of the Docathon organizers stated:
If you get enough of those tutorials together, then the documentation becomes some sort of […] textbook […]. It’s like a collection of tutorials that will cover the space of ideas that this package cares about, which is, in my mind, something different from just having a collection of random tutorials because it starts to resemble something that’s more similar to a traditional academic textbook or whatever. (Docathon organizer 2)
Roles of documentation
One reason for identifying the many types of documentation described above is that they are linked to the diverse roles that documentation plays in the community, as well as the end-user to which documentation is directed. Most software projects have many different kinds of documentation at once, each with its own relationship to the community.
Interviewees discussed how documentation helped with a variety of tasks, including: facilitating learning and education, giving a project publicity, serving as a signal of health, serving as external memory or a living document, facilitating testing and verification, onboarding newcomers to open source projects, and facilitating collaboration between developers. We found that interviewees highlighted subsets of these roles for documentation in their projects, though they were not usually cleanly tied to a single type of documentation. Because these roles are often hard to identify or define, tensions can emerge between community members who may have different expectations about what documentation is meant to do. The following sections describe common roles that documentation plays in the community.
The most agreed-upon role for documentation is as a pedagogical resource for people to learn how to use a piece of software. In this role, different types of documentation can be targeted at different audiences: an expert may use it to look up the details of a function, while a novice may need to look up whether such a function exists at all. The goal of learning was frequently contextualized and specified by interviewees, who discussed different kinds of learners and stages of the learning process. An often-imagined audience of documentation was someone searching for a piece of software to help them do a particular task:
You can imagine a user, with some sort of need, Googling around trying to find some sort of software to do what they want to do. Then they happen upon software and try it. There’s this patience period that probably is something like five minutes, during which they may try a software. Then it might not work, probably won’t work. Then if there’s no documentation to help, that user is basically lost for that software project and will say, “I tried that but it didn’t work.” You need documentation, like ideally of everything but especially of the very beginning of, to create a minimal user experience and have it in the documentation how to set the thing up and how to do the thing that it’s supposed to do. (Docathon participant 6)
Interviewees frequently discussed forms of documentation like tutorials or galleries as intended for new or potential users, while API documentation and docstrings were for those who were already using a particular piece of software. One Docathon participant discussed these differences in answering a question about how they use documentation in their own day-to-day work:
I use the docstrings all the time, a lot of this through interactive work […] even for simple things like, what is the order of the arguments of this function? […] Examples are pretty useful when I get started with things with the new software that I haven’t used before. […] I was looking around for software to model, do statistical modeling of longitudinal studies. I started looking at […] a Python project, and I was actually bounced off of that because there were very few examples, none of which looked like what I was trying to do, so I couldn’t get that. (Docathon participant 9)
Publicity/signal of health
The above quote shows the overlap between documentation as a resource for learning and a second role: as an advertisement for the software project. In much of the open source software ecosystem, there are overlapping and competing projects, and we frequently heard mention of documentation as a reason for deciding which project to choose. This was true both from end users (who discussed deciding about whether to use a piece of software based on its documentation) as well as project maintainers (who discussed improving documentation in order to recruit new users). One of the Docathon participants discussed making such decisions as a user, which was an unprompted response in a longer answer to a question about how they use documentation:
I love documentation, I use documentation all the time. In fact, it’s certainly the case I decide whether to use a project or not based on the quality of the documentation […] If I’m looking for a library that does something and I have, you know, five libraries, there are different criteria that I use to decide which one I’m going to use but quality of the documentation is certainly one of them […] (Docathon participant 5)
One of the Docathon organizers discussed this issue with a software project they help maintain, where the team had previously worked to overhaul the project’s documentation:
that was the biggest scale project that I worked on in terms of documentation […] it was clear by the end of it that when you looked at the website after that overhaul had happened […] there was a clear value added to the project. Even though none of the code of the actual project itself had really changed, it was just, again, the presentation of the ideas surrounding that code base. It made it much easier for me to discover other parts of the package that I hadn’t learned about already, and also made it much easier for me to pitch it to somebody else if I was like, “Hey, you should try [software] to do this stuff.” When I could show them that website, it was clear that the project was well-constructed and well-managed and had its act together. (Docathon organizer 2)
Institutional memory/living document
Many of our interviewees are long-standing participants in open source software projects, including several who have spent years as core maintainers of projects with dozens or even hundreds of contributors. In such projects, documentation plays an important organizational role as a living memory for the project that records every change made. We often heard from interviewees about how projects are difficult to manage at such large scales without good documentation practices. For example, core developers mentioned having difficulty remembering what changes had been made after a dramatic refactoring of the code. Interviewees also spoke about several cases where an old feature was unused because there was no official documentation written about it, and the only way to discover its existence was to look through the code itself.
One Docathon participant (who is a core developer for several software projects) was asked why they write documentation, responding first by saying: “Well, one, because I forget how things work. That’s the most valuable thing from my point of view.” They then discussed the importance of “making the product usable.” (Docathon participant 5). Another Docathon participant (also a core developer for several software projects), when asked about how much documentation they write, stated:
I’ve been doing it more and more recently … I care more and more because I come across more and more things I’ve written a couple of years ago, and I have no clue what I fricking wrote (Docathon participant 7).
Reference point for collaboration between developers
Aside from serving as institutional memory, we found that documentation also facilitates collaboration between developers of a project. Many OSS data analytics libraries are modular collections of different functions that are developed relatively independently from each other (compared to more traditional software applications). As documentation summarizes the overall design of a feature, module, or function, some interviewees spoke about how good documentation can be a useful reference point for developers to communicate their ideas and intentions to one another. For example, one interviewee who maintains a large, complex project (both in terms of number of contributors and number of features) discussed how existing API/reference documentation is sometimes referred to in discussion threads about proposed new features or refactoring existing features. They noted that because many developers restrict their contributions to a small part of the library, discussions about large-scale changes to the code are facilitated by linking to the API/reference documentation. They also noted that when attempting to describe where changes to features/APIs should be made, “if there’s already relatively complete documentation, that’s very easy to describe in a single email” (OSS contributor 10).
Documentation was also described as an important part of coding itself, particularly in testing and verification. This takes several forms, starting with API documentation as a way for developers to externalize their intentions by describing what they want a function to do. Some interviewees had established practices of writing documentation as they wrote code. One compared this practice to unit testing, which is used to ensure that key functionality of the package had not changed. As one of the Docathon organizers explained:
… it’s trying to give the user an intuition on what the method does. […] it also allows me to make sure that I understand exactly when the method works and when it doesn’t work. […] it also allows us to check that the API is nice, and it’s also a very simple way to check that the method works. So this is also very common in research, you just look that things make sense, and sometimes you don’t really get this information when writing unit tests. (Docathon organizer 1).
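One concrete way in which documentation can double as a lightweight check is Python's standard-library `doctest` module, which runs the examples embedded in a docstring and compares their output to what the docstring claims. The sketch below is illustrative only: interviewees did not name this tool, and the `mean` function is a hypothetical example rather than code from any project in the study.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    The examples below serve both as usage documentation and as checks:

    >>> mean([1, 2, 3])
    2.0
    >>> mean([10])
    10.0
    """
    return sum(values) / len(values)


# Extract the interactive examples from the docstring text.
parser = doctest.DocTestParser()
test = parser.get_doctest(mean.__doc__, {"mean": mean}, "mean", None, 0)

# Run each example and compare its actual output to the documented output.
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(f"attempted={results.attempted}, failed={results.failed}")
```

If the function's behavior later drifts from what the docstring promises, the runner reports a failure, so the documentation and the code are kept in step.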
Onboarding newcomers to open source projects
Finally, a major auxiliary role that documentation plays is as a way for newcomers to contribute to open source projects. In open source software communities (both in and out of the data analytics context), documentation has long been discussed as a kind of low-risk, entry-level task that will help newcomers gain familiarity with the project—a model long discussed by scholars of legitimate peripheral participation (Lave and Wenger 1991). In fact, the Docathon organizers reported that one of the key reasons for organizing the event was to connect open source software projects in need of documentation work with people who wanted to get involved, but were unsure how.
Documentation work is seen as a good task for onboarding, because newcomers can work through the process of submitting changes for review (e.g., a GitHub pull request) without having to also advocate for a change to the codebase. Another reason for newcomers to work on documentation is that they are often in the best position to know what is confusing, unclear, or important to someone new to the project. However, it should be noted that having a fresh perspective is often a trade-off with being able to contribute high-quality documentation in line with the project’s standards and goals. One of the Docathon participants, who also is a core contributor to many open source software projects, summarized some of the major benefits and drawbacks:
Interviewer: One thing that some people have suggested is that documentation is a good place for people who are new to open source to get started. How do you feel about that?
Docathon participant 7: I would agree and disagree. I would agree because it’s relatively easy to start contributing to. You don’t need to understand the code. It’s really nice when you’re new to open source, and you need to understand the process of submitting patches. You don’t have this overhead of thinking about, “Is the code I’m writing correct? I can focus on the workflow.” […] It makes it great also because if you’re new to a project you have the views of newcomer, and so you realize what is not of use from the documentation […]
The problem is, to write good documentation you need to already have, usually, I think, relatively good knowledge of the project, because you need to understand how pieces are intertwined. […] And how they interact with each other and what are the useful and useless information or the thing that may be missing. Which, by definition, someone who is new to a project cannot know. At the same time, once you’re familiar with the project, you don’t see anymore what’s needed for a newcomer. So it’s both the right place and the wrong place to start in my opinion.
One Docathon participant had used open source tools, but had never contributed to an open source project. They came to the Docathon specifically to start contributing, and reported a generally good experience:
Docathon participant 4: I think that the Docathon was a great low-barrier way of getting acquainted with how it all works. […] docs are something that anyone can sort of critique and improve, even if they don’t necessarily have a deep knowledge about the code base.
However, we also found pain points and lessons learned in using documentation for onboarding newcomers. Some Docathon participants who were newcomers to a project were not able to easily know what tasks needed to be done, and did not want to make substantial changes to documentation without specific guidance. While changes to documentation are often easier to get approved than changes to code, interviewees recounted many conflicts over documentation, including those involving newcomers (see Section 2.3.3 on documentation standards). Several interviewees discussed having to go back and forth with what was presumed to be a non-controversial update to documentation, sometimes waiting days to get a change approved. In some cases this was because of a lack of input from the core developers, in others because of direct disagreements over how the contribution should proceed.
In all, we found that documentation can be a productive and low-barrier way for newcomers to contribute to open source software projects, but we emphasize the need for projects to actively support such forms of peripheral participation. While we leave a systematic study of onboarding for future research, we find more cases of success with projects that have a well-developed culture of documentation, where there are clear and agreed-upon standards for documentation, active review procedures, and where most core contributors to code also contribute to documentation (we also discuss tensions around who does documentation work later). Our interviewees mentioned several projects that have such qualities (with relatively consistent answers), as well as many more that did not.
Skills and barriers around documentation work
Developers and users of data analytics OSS libraries generally acknowledged that software documentation is important, yet documentation is routinely either not written or not kept up to date. Like all the issues and tensions discussed above, there is a spectrum: some projects have little to no documentation in any form while other projects highly prioritize documentation and integrate its development into community practices. However, documentation seems to be consistently under-maintained in such projects, by contributors’ own standards. For example, a recent questionnaire asked contributors to open source scientific Python libraries to state what percent of their time they think should be spent on documentation, versus what percent of their time they usually spent on documentation (Holdgraf and Varoquaux 2017). There was a general gap between these two responses, indicating that open source developers felt they should spend more time on their project’s documentation (see Figure 4). These findings are further supported and contextualized by our ethnographic and interview research, in which contributors routinely discussed a wide range of issues around why documentation work was both a personal and collective challenge. In the following section, we discuss skills involved in documentation work, technical barriers that often exist for contributors, and issues around standards and quality for documentation.
The most straightforward barrier to creating good documentation is having the skills to do so. Contributing code to open source software requires a specific set of skills: knowledge of the programming language used, version control, and other practices in software engineering. Writing, contributing, and reviewing documentation often requires not only these skills but also an additional set that are often not taught in traditional software engineering. These include communication skills, creative writing, empathy, and good knowledge of the English language (which for some contributors may not be their native language). Below we identify several skills important to documentation.
As with all skills, individuals must have both the skill itself and self-efficacy, the belief that they have such a competency. Throughout the interviews, many participants expressed that they lacked the correct skills to write good documentation for their own software.
I don’t know many people who enjoy writing documentation. I think one of the reasons being it’s not a skill that we learn very well, so I think a lot of us feel that it’s not something we’re good at. If we have been feeling different, that we’re good at it, probably we would enjoy it more, but it’s sort of a painful process to do. (Docathon organizer 1)
Some interviewees expressed a tension around their status as advanced users and developers, as documentation is often seen as used primarily by novices. One concern expressed by some interviewees is a lack of empathy, as documentation work involves putting oneself into the user’s shoes, and advanced developers may not know what their audience actually needs:
you need to have a very good sense of who your audience is, and what you need to tell them when. […] The biggest problem is that what I need in documentation is not necessarily what someone coming to the library using documentation does. I may be lacking sufficient empathy to write what newcomers need. Whereas a newcomer probably still remembers what they didn’t know yesterday and can write the docs with that in mind. (Docathon participant 3)
Interviewees discussed the importance of various communication skills, which go far beyond the skills required to fix a bug or write a new feature. For example, many interviewees felt that far more English skills were required to write documentation than to write code or even informally interact with others in the project. Several of our respondents were not native English speakers, and many respondents said that they had observed this barrier in projects:
It [writing documentation] actually requires writing much more English than code requires. They don’t necessarily feel as competent to do that, or they ask for help. (Docathon participant 9)
While being competent with a language is a pre-requisite in community interaction, it is not enough to guarantee strong documentation. It is also important to be able to communicate ideas in an easy-to-understand manner. Documentation is intended to be public material to be read by an audience, and many of our respondents emphasized how storytelling and creative writing skills were highly important for documentation:
Creative writing is important, to enable search to boil down whatever are the key features of the software, and also what the science of the software is doing, down to clear explanations (Docathon participant 9)
Knowledge of software to be documented
Finally, interviewees discussed how documentation contributors also need a good working knowledge of the software library being documented (and the concepts behind it) in order for the documentation to be accurate, precise, and concise. This is distinct from the technical barriers to participating in open source software communities (e.g., how to use GitHub), which we discuss in the next subsection. The need for project knowledge can conflict with an increasingly popular trend in some open source software communities in which newcomers are encouraged to write documentation before contributing code. As we previously discussed, it is important to understand how the process of writing documentation is a collaborative effort between experts and newcomers.
In addition to the skills involved in writing documentation discussed above, there are often substantial technical skills required to contribute this work to an open source software project. Projects frequently store documentation in the repository they use to store code, requiring a working knowledge of version control and online code repositories like GitHub. While contributing documentation is an increasingly popular onboarding mechanism, it often challenges new users with skills and workflows with which they are not familiar.
Furthermore, with many forms of API documentation (like docstrings), the documentation text is stored as comments in the code itself. This means that contributing documentation typically follows the same complex process and workflow as contributing code: downloading the code repository, installing it on one’s own computer, adding or editing the documentation text, running tests to ensure the new changes do not introduce bugs, creating a patch in a version control system, submitting that patch via the project’s preferred platform, waiting for someone in the project to review it, responding to any questions, and iteratively improving the patch as necessary so that it matches the project’s contribution guidelines or reviewers’ expectations.
For many potential contributors to documentation, these technical barriers pose a significant problem. We identified two kinds of technical barriers, which our interviewees either personally experienced or witnessed in cases of newcomers to a project:
Using open source software platforms
Projects use many platforms, tools, and practices to manage their workflow of contributing code and documentation, each of which has its own learning curve. For example, newcomers must learn how a project uses a version control platform like GitHub and continuous integration platforms like Travis CI to submit, review, and incorporate changes. Furthermore, projects may also have differing community norms around contributing code (such as whether to rebase code before merging new contributions). As one interviewee noted, “there’s not always consensus within the field about the right way to use those tools” (Docathon organizer 2).
Using documentation-specific tools
There are also challenges in learning the tools that are specific to writing and building documentation. These tools require text to be formatted and structured in specific programmatic ways, which are often idiosyncratic to someone who isn’t familiar with the tool. For example, putting the same information about a function in a Python docstring can require writing different structured text, based on what tools are being used to automatically parse the text. The two code blocks below are docstrings that illustrate the difference between two popular formats: numpydoc and “Google-style” (see Footnote 2):
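As a minimal sketch of that difference, the two hypothetical functions below behave identically and document the same information, first in numpydoc format and then in Google style. The function names, parameters, and wording are illustrative assumptions and are not drawn from any project in the study.

```python
def scale_numpydoc(values, factor=1.0):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float, optional
        Multiplier applied to each value (default 1.0).

    Returns
    -------
    list of float
        The scaled values.
    """
    return [v * factor for v in values]


def scale_google(values, factor=1.0):
    """Multiply every value by a constant factor.

    Args:
        values (list of float): Numbers to scale.
        factor (float, optional): Multiplier applied to each value
            (default 1.0).

    Returns:
        list of float: The scaled values.
    """
    return [v * factor for v in values]
```

The content is the same in both, but the section headers, underlines, and the placement of type information differ, so a contributor must know which convention a given project's tooling expects.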
Some interviewees expressed concerns around technical barriers to newcomers, though noted that documentation is still often a good first contribution for many people. Contributions to documentation generally will not “break” anything crucial in the package, are relatively easy to roll back if an error is made, and provide an immediately apparent contribution. One of our interviewees was a new contributor to open source software projects and the GitHub platform, and discussed their experience:
And I learned a lot more about GitHub. I never had squashed or rebased before. Or I’d never really used branches correctly until that experience. So, I think it definitely made me better at using Git and … a little more understanding of how open source is and like, the faces behind all the GitHub handles (Docathon participant 4)
Standards, quality assessment, and validation
One struggle many interviewees expressed around contributing documentation to open source software is the lack of standards and validation criteria for documentation. For example, in the previous subsection, we identified different documentation formats as a technical barrier. However, the many options for tooling also introduce social challenges, as there are widely differing opinions across communities on which standards should be used.
What constitutes good documentation is often contextual to various uses and goals, subjectively interpreted by different people, and left underspecified in community norms. This can especially be the case with tutorials, user guides, and other user/narrative documentation, rather than the typically well-structured and narrowly-scoped goals of API/reference documentation. Interestingly, some interviewees indicated that it was more difficult to contribute to user/narrative documentation (like tutorials or user guides) and much easier to contribute to examples or API documentation, which is generally highly structured. As one interviewee stated:
Docstrings are supposed to be pretty terse and straightforward and those I’m not worried about doing on volunteer effort. Because again, basically, you take all the voice out and they say this is what it does, these are the parameters, this is what it returns. (Docathon participant 3)
In contrast to API documentation, user/narrative documentation can be complex and written with various narrative voices, points of view, or tenses. Such documents have varying levels of structure, formality, and assumed background knowledge. They may also be inconsistent in the author’s tone, such as whether or not humor is used. Consistent style and structure of documentation within a project was frequently identified as both an important property of good documentation and a major organizational challenge for open source software projects. Contributing user/narrative documentation can lead to long debates on details that have no one correct answer, often referred to as “bikeshedding” in OSS culture, a term inspired by Parkinson’s law of triviality (Parkinson 1957).
Several interviewees discussed difficulties in getting pull requests around documentation accepted in general. One interviewee discussed frustration with getting their documentation contributions blocked because project developers objected to text they felt was “more like an opinion” (Docathon participant 4). Interviewees also mentioned “bikeshedding” around documentation. However, some stated that in some projects they felt it was easier to contribute documentation than code because it is “written rarely enough that people are very grateful that someone actually did that” (Open source contributor 10).
These tensions align with CSCW literature on conflict, particularly Hinds and Bailey’s (2003) framework of task, process, and interpersonal conflict. Task conflict is when people disagree on what tasks ought to be done, process conflict is when people disagree about how the tasks ought to be accomplished, and interpersonal conflict centers around interpersonal relationships and interactional norms. These types are not mutually exclusive and one form of conflict can turn into another, but they help specify and distinguish different kinds of issues. We should also note that these issues are not unique to documentation, as they also frequently arise over code contribution.
Motivations for doing documentation work
Even if technical and social barriers were minimized in contributing documentation, an individual must still be motivated to do so. Another major theme in our interviews centered around incentives and credit (or the lack thereof) for doing documentation work. Our interviewees all believed that documentation was important and valuable for their projects, but there was a range of attitudes toward doing documentation work. In line with previous theoretical literature (Ryan and Deci 2000), we found it more useful to place interviewees’ expressed motivations on a spectrum from fully intrinsic (where the task is seen as its own reward) to fully extrinsic (where the task is done for an external reward), rather than to treat intrinsic/extrinsic as a binary. In Table 2 we outline Ryan and Deci’s six kinds of motivations and give an example of each in the case of documentation work. We find it crucial to discuss motivations for doing documentation work in relation to motivations for other work in the OSS project, especially developing code. Most of our interviewees stated that documentation work in general was substantially less inherently enjoyable for them than developing code, which we discuss in the first subsection. In the second subsection, we discuss structural factors impacting motivation which differ between OSS projects, like project rules requiring documentation work or the level of credit/recognition for such work in the project.
Do contributors enjoy doing documentation work in general?
A large majority of our interviewees stated that documentation work is not as enjoyable for them as coding new features or fixing bugs. This aligns with previous survey work finding that scientific open source software contributors enjoy tasks like writing code and fixing bugs far more than both writing and reviewing documentation (Holdgraf and Varoquaux 2017). Interviewees routinely used phrases like “eating your vegetables” or “bite the bullet”, discussing how they felt it was important to write documentation for the good of the project, but that it was something they had to force themselves to do. Many of these interviewees also stated that this was a shared attitude among their peers, both in their own OSS projects and across OSS in general. “We all hate writing documentation” (Docathon participant 5), one interviewee stated matter-of-factly, adding that they were drawn to the idea of the Docathon because they felt it would facilitate some “team spirit” around a task that many people had neglected.
Several interviewees explicitly linked the issues around writing documentation to the contributor-driven nature of open source software development, stating that contributors contribute primarily to satisfy their own needs, making documentation a secondary goal. Several interviewees expressed what we call the “paradox of documentation:” those who know enough about the project to write documentation are the least in need of it:
[writing documentation] is actually super hard. It’s not super rewarding … most people don’t get the dopamine kick from writing documentation as implementing a new feature, right? Whether [you are adding] a new feature or you have a problem and you have fixed it, right? And the whole ‘scratching my itch’ aspect of open source typically means if you’re working on something, you’re working on something because it’s bothering you. And you made it better and you’re happy. Whereas with docs, the docs don’t help you at all, because you know what they said because you wrote them. (Docathon participant 3)
However, two of our eleven interviewees (one Docathon organizer and one Docathon participant) did discuss the act of writing documentation as a creative process with high intrinsic enjoyment, similar to how they feel when writing code to develop new features or fix bugs. Both of these interviewees also reflected that their attitudes were different from those of most in their communities. Both also stated that they enjoyed and had extensive previous experience in other forms of writing, and that they had high competency in the English language.
Finally, a smaller number of interviewees expressed receiving strong levels of satisfaction from completing documentation tasks, such that they regularly performed such work — even though they did not generally inherently enjoy the task itself. One such interviewee made comparisons between documentation work and other forms of infrastructural and/or community support work that do not typically involve writing or fixing code directly, such as maintaining the build systems which automatically compile code to see if it runs on a variety of systems. They discussed their motivations to contribute to open source in general in terms of what would make the most impact, with work on documentation, build systems, releasing stable versions, and other meta-work having the biggest “return on investment” (Open source contributor 10) of their time.
Structural factors relating to motivation
Despite not inherently enjoying doing documentation work, most of our interviewees freely choose to do it without needing to be paid, forced, or shamed into doing it. We explore these different valences of motivation next, finding that motivation deeply intersects with various projects’ specific organizational structures, cultural norms, as well as the peer production model of OSS projects. We find four structural factors that relate to motivation around documentation work: rules/policies requiring contributors to do documentation work, funding to pay contributors to do documentation work, contributors’ feelings of responsibility toward users of a project, and the extent to which documentation work is valued and respected by other contributors as much as more ‘technical’ work like writing code.
Rewards and rules
One of the most common extrinsic motivations for doing documentation work is being either directly paid to do such work or being required to do it in order to participate in the project. Some interviewees discussed projects that needed substantial overhaul in their documentation and hypothesized that it would only be done if someone was paid specifically to work on documentation. More foundations and grant agencies are awarding grants to specific OSS projects, particularly OSS libraries used for data analytics. Some grants awarded by funding agencies to support open source data analytics software projects specifically include documentation work as part of the tasks that will be done by those hired under the grant (Perez and Ganger 2015), an emerging phenomenon that future research should investigate.
Interviewees also referenced projects which have documentation requirements as part of their rules around “pull requests,” which is the process for submitting new changes to the codebase. Many open source projects in the data analytics ecosystem have increasingly standardized code review processes, especially for new features/functions. Many of the requirements are more code-focused, such as unit tests, conforming to code style guides like PEP8 in Python (Sharma et al. 2017), and passing a continuous integration check. Some of our interviewees discussed OSS libraries that have also added documentation requirements for new features, such that the project’s code review rules do not permit a new feature to be added until it is documented (typically with API/reference documentation).
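To make such a documentation requirement concrete, the following is a minimal sketch of the kind of automated check a project could run alongside its unit tests and style checks. It is an illustration under our own assumptions, not a tool described by any interviewee: the function name `undocumented_functions` and the sample source in `EXAMPLE` are hypothetical, and it only tests whether a docstring exists, not whether it is any good.

```python
import ast


def undocumented_functions(source):
    """Return the names of public functions and classes that lack a docstring.

    `source` is Python source code as a string. Names starting with an
    underscore are treated as private and skipped (an assumed convention).
    """
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name.startswith("_"):
                continue  # private helpers are exempt in this sketch
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)


# Hypothetical module contents used only to exercise the check.
EXAMPLE = '''
def documented(x):
    """Return x unchanged."""
    return x

def undocumented(x):
    return x

def _private_helper(x):
    return x
'''
```

Calling `undocumented_functions(EXAMPLE)` flags only `undocumented`; a review rule of the kind interviewees described would reject the change until that list is empty. Such a check can enforce the presence of API/reference documentation, but not the quality concerns discussed in the previous section.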
Responsibility to users
We also heard many introjective motivations framed around responsibilities to users of the software library. Several of our interviewees discussed how an implicit responsibility of contributing to an open source software project lies in receiving requests from those who use the software. Even though there is no formal, contractual obligation to provide support, previous literature has discussed how open source software contributors feel obligations toward those who use software they have released (Lakhani and Wolf 2005; Kelty 2008). We similarly identified such issues in our study, particularly for more specialized libraries, which are common in the data analytics and scientific computing ecosystems. Some interviewees discussed projects where they were the only regular contributor and point of contact, regularly receiving direct requests from users. One sub-theme in interviewees’ expressed motivations was around the time that documentation was perceived to save in responding to questions from users on a more ad-hoc basis. Several interviewees referenced cases where either they or someone they personally knew (who did not enjoy writing documentation) ended up writing documentation because they were constantly receiving questions from users about how to use the software:
The way the documentation got written there was the following:… they would send me an email [asking] … how do you use it? So I would write a little explanation of how to do things. And after like, the fifth email, I was like “Well, maybe I should just make this a page.” And once I made it a webpage … well, maybe I should write a little bit of API documentation and a little bit of examples and so forth. And so, it was very kind of organic to where I got sick and tired of writing emails and I just put up a page. (Docathon participant 5)
Recognition and credit
In larger and more popular OSS data analytics libraries, there are often dozens of regular contributors, and we found a common theme around community attitudes for documentation work. Many interviewees who regularly contribute documentation to such projects stated that they did not feel like they received the same levels of positive community feedback for documentation work as they did for adding new features or fixing bugs. A common perception was that documentation work was seen as less valued, less important, and less “technical” than coding new features or fixing bugs. Participants discussed how documentation of a new or changed feature, which typically takes place after the coding work is complete, would often be de-prioritized, with developers moving on to other more “critical” tasks. This also varies from project to project, and some interviewees who contribute to multiple projects painted differing pictures of how much they believed these projects valued, respected, or even required documentation work. One interviewee discussed this perception, also raising issues with the gendered aspect of such work, stating that they did not want documentation work to be disproportionately performed by women, a theme long discussed by scholars of infrastructural and invisible work:
One of the reasons open source documentation isn’t great is it’s definitely not viewed as as sexy as writing code. It’s definitely viewed as less technical by some people […]. And it’s definitely viewed as less important by some people […]. But that just kind of ties into the whole, you know, the trends everywhere of shunting women to work that is less valued by the community, type things. (Docathon participant 3)
In contrast, a smaller number of interviewees did feel like people in their projects were quite thankful when they wrote documentation. One explicitly referenced the perception that documentation is not valued as much as code, then took issue with it:
I think there’s this common perception about things that are not code … let’s say, documentation, is less valued than code. And especially people that write exclusively documentation are less valued than people that write also code or exclusively code. … I would say that in general, this value system is not really true. I think on average, I’ve got way more positive responses on documentation contributions rather than code contributions, and I think that’s true for other packages as well, because people do understand the value of documentation and especially because they don’t like doing it, they’re especially appreciative if you do it … (Open source contributor 10)
These sentiments are likely to differ across projects (or as one interviewee noted, not so much between individual software projects, but between groups or ecosystems of projects that share the same developer community).