Understanding massive artistic cooperation: the case of Nico Nico Douga
The relation between popularity (in terms of view) and influence on the cooperation process.
The specialization of creators.
The organization of the network of citation.
Keywords: Social network analysis, Massive cooperation, Artistic cooperation, Nico Nico Douga, Peer production, Online social networks
Online social networks (OSN) have attracted a lot of attention from scientists in many different fields. Through the large quantity of data about human behavior that they make available, they make it possible to study interactions between individuals in a way that was not possible before. Among the tremendous amount of work published, we can cite works on information diffusion (Bakshy et al. 2011; Yang and Counts 2010; Morales et al. 2014), structural properties (Amaral et al. 2000; Clauset et al. 2009; Savage et al. 2014), community structures (Leskovec et al. 2009), and influence (Cha et al. 2010), among others. A wide variety of platforms have been studied, including Facebook, Twitter, Wikipedia and many others.
In this article, we work on a particular process, massive artistic cooperation, which has not been studied before, to the best of our knowledge. More specifically, we study the creation process of complex music videos on a platform called Nico Nico Douga. Our goal is to better understand the emergent organization of this massive artistic cooperation, how it takes place, and what roles different users play in it.
In the first section of this paper, we introduce massive artistic cooperation and how it relates to other forms of collaborative creation. In the second section, we present our dataset, what data we extracted and how. In the third, we study the relation between the popularity of a video (the number of times it has been watched) and its importance in the creation process. The fourth section investigates the specialization of actors, one aspect of the self-organization. Finally, the last part concerns the network structure of the creation process, in particular how different categories of creations are related to each other, and what different roles videos can play in it.
2 Massive artistic cooperation
The processes we want to study constitute a phenomenon of artistic cooperation. In this section, we describe the particularities of such a creation process, and in particular why we speak of cooperation, and not of collaboration. We then discuss the specific case of artistic cooperation, and why it has not been widely studied until now.
2.1 Massive cooperative creation
A distinction is usually made between two forms of collective creation: collaborative and cooperative. In the collaborative one, the actors involved are conscious of being part of the same process, and of a common objective. These actors are able to communicate, and often use a centralized approach to distribute tasks among themselves. A good example of a collaborative process is the design of a car or a plane: many actors are involved, and tasks are distributed among them, but meetings and communication in general are essential to reach a final, collective result.
Cooperation, on the contrary, is when many actors or groups of actors are involved in the same process without being formally in contact with each other, each of them following its own personal objective or interest. They can use the productions of other actors in the way that suits their needs, and there is no central organization or planning, but instead a form of self-organization. Actors do not act against each other, but do not act for others either. An example well known to the reader is scientific production: researchers are often inspired by, or directly use, the work of many different authors whom they often do not know personally, and with whom they are not in direct communication. Despite this, science progresses in many directions and in an efficient manner.
We speak of massive cooperation when a large number of individuals are involved in the creation process. Because of the cost in time and the cost of communication, collaboration processes become less and less efficient as the number of individuals increases. Cooperation, on the contrary, can exist on both small and large scales. As both situations might have different properties, we use the specific name of massive cooperation to describe a cooperation process involving a large number of users. Although there is no well-defined threshold for this definition, we can consider that at least a hundred users are necessary to speak of massive cooperation.
On the Internet, new forms of collective work have emerged. In the social sciences, this kind of production has been studied under the term Peer production (Benkler and Nissenbaum 2006; Wilkinson 2008; Duguid 2006; Haythornthwaite 2009), or Open collaboration (Forte and Lampe 2013; Riehle et al. 2009). Behind these terms, which can cover many different things such as wikis, crowdsourcing, open source development, and many others, is the idea of a collective production in which everyone is free to join and to contribute. Many factors have been identified as important in such creation, for instance the turnover of individuals or the presence of “leaders” who catalyze the creation.
The goals of cooperative creations can be very diverse. They are sometimes obvious, as in Wikipedia or in open source development platforms such as Github, but a phenomenon such as the efficient diffusion of information can also be considered the result of a process of cooperative creation. The exact frontier between what is and is not a cooperative creation is outside the scope of this paper. It is, for instance, obvious that Wikipedia and open source development present some degree of communication and centralization, and are, therefore, a hybrid form of collective creation.
2.2 Artistic cooperation
Although collaboration is at the center of many artistic creations, including commercial music and movie creation, large-scale cooperation is rarer, and there is, to our knowledge, no successful, large-scale platform devoted to this purpose. Content databases exist, containing elements (sound, pictures, 3D models, etc.) used by artists to create new contents, but these contents are usually not fed back into the database, so there is no open cooperation, or at least not in a way that can be tracked.
Some platforms exist for the cooperative creation of drawings, videos, writings and so on, but most of them are chiefly games or experiments, and do not produce contents widely diffused outside of the creators themselves. Furthermore, they seem to have relatively few participants. In contrast, popular music videos published on Nico Nico Douga (NND) often have hundreds of thousands of unique viewers, and reach notoriety even outside of their original platform.
3 NND dataset
3.1 Nico Nico Douga network
Nico Nico Douga (NND) is a video sharing social network, with functionalities comparable to those of YouTube or DailyMotion. NND originated in Japan and is extremely popular there, with over 20 million registered users as of 2014, and ranking in the top 15 of most visited websites. Compared to the platforms cited above, NND offers some additional possibilities that we will use in this paper, such as associating keywords with videos and making references between videos.
While any kind of video can be published on NND, this paper will concentrate on videos involved in artistic cooperation, and more particularly on Music Video creation.
3.2 VOCALOID, Hatsune Miku and music videos
VOCALOID is the name of a singing voice synthesizer, a special voice synthesizer able not only to pronounce words but also to sing them according to a defined tune. Since its introduction in 2004, this software has enjoyed huge popularity, in particular in Japan. NND played an important role in this popularity. Some users first created songs and published them on NND as music videos, usually with very simple visuals such as a static drawing. Other users liked these songs, and started to produce derived music videos, modifying visuals, voice and music in thousands of different ways. Inspired by the character represented on the packaging of the software, they started to associate all songs composed with the synthesizer with a fictional singer, called Hatsune Miku. Although other voices have been created for VOCALOID, corresponding to different characters, Hatsune Miku remains the most famous one. While the authors of the songs and derived videos were initially amateurs, they became so famous that some of them started to release commercial versions of their productions. For instance, the group of producers known as Supercell have published albums that sold hundreds of thousands of copies.
It is this cooperative creation of music videos that we will study in this article. To give an idea of the scale of the phenomenon, more than 200,000 videos have been tagged with the keyword VOCALOID in our dataset, and more than 130,000 with the keyword Hatsune Miku; these videos represent only a fraction of the total number of videos involved in the cooperative creation, as derivative productions such as dancing or singing are not usually tagged with it.
The dataset that we used has been used and described in previous papers (Hamasaki et al. 2013; Cazabet and Takeda 2014), but none of the topics we address here have been studied before. In this section, we describe its characteristics and the preprocessing we performed to study the artistic cooperation phenomenon in more depth. Although we cannot share the full dataset due to privacy and property issues, we share all data necessary for the reproduction of our results on the figshare repository.
The dataset was built by crawling metadata associated with all videos published on the network between January 2007 and December 2012. It is composed of a set of 2.6 million videos with at least one keyword associated with them. For each video, the collected metadata consist of the author, associated keywords, associated description (author comment), number of views, and date of publication.
3.3 Categories of videos
The productions in NND can be of different categories, and we wanted to study the similarities and differences between these categories. Based on our knowledge of the network and on the observation of common tags, we derived a classifier, similar to the one used in Hamasaki et al. (2013), which, according to the keywords associated with a video, attributes a category to this video, using direct matching and regular expressions. Keywords are rich on NND because any user is free to associate a keyword with a video, even if the video is not their own. As the number of keywords for a video is limited to 10, only the most relevant ones remain. The categories are the following:
ORIGINALSONG: an original musical composition
SINGING: a person is singing (example: replace the original voice of a famous song)
VOCALOID: a person is using VOCALOID to create a voice (example: replace the original voice of a famous song)
INSTRUMENT: a person uses musical instruments to create this video (example: add an instrument to a famous music)
PICTURE: a user creates one or several static pictures (example: illustrate a famous song with original drawing(s))
DANCING: a user films himself dancing for a famous song
3DCG: a user uses a 3D Computer Graphic software to create this video (example: animate dancers dancing on a famous music)
ANIMATION: a user creates an animated picture (example : illustrate the lyrics of a famous song)
MASHUP: a mashup is a music video created by combining several original sources, such as different original music, or different versions of a same music.
MAD: MAD videos are an original type of video originally invented in Japan, involving a collage of videos and sounds from multiple sources. Compared to MASHUPs, MAD videos are more free, as they can be composed of people speaking, unrelated sounds or pictures, and do not usually compose a single, coherent music or song.
A small fraction of videos (less than 1 %) can be associated with several categories. In this case, to avoid confusion, we keep only one category, the least common one.
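The classification step can be illustrated with a minimal sketch. The keyword patterns below are hypothetical placeholders (the actual tag lists are not reproduced in this paper), and we assume that "least common" is measured by how many videos match each category in the dataset.

```python
import re
from collections import Counter

# Hypothetical patterns, for illustration only: the real tag lists are not given in the text.
PATTERNS = {
    "SINGING": [re.compile(r"^sang[ _-]?it$", re.I)],
    "DANCING": [re.compile(r"^danced[ _-]?it$", re.I)],
    "ORIGINALSONG": [re.compile(r"original (song|music)", re.I)],
}


def matching_categories(keywords):
    """Return the set of categories with at least one pattern matching a keyword."""
    return {
        category
        for category, patterns in PATTERNS.items()
        for keyword in keywords
        if any(p.search(keyword) for p in patterns)
    }


def classify(videos):
    """Map each video id to a single category.

    videos: dict video_id -> list of keywords.
    A video matching several categories keeps the least common one
    (assumption: commonness = number of videos matching the category).
    Videos matching no category are left out of the result.
    """
    matches = {vid: matching_categories(kws) for vid, kws in videos.items()}
    frequency = Counter(cat for cats in matches.values() for cat in cats)
    return {
        vid: min(cats, key=lambda c: (frequency[c], c))  # name breaks ties
        for vid, cats in matches.items()
        if cats
    }
```

With such a rule, a video tagged both as a song cover and as an original song would be kept in the rarer of the two categories.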
In NND it is a common practice, in particular among music video creators, to reference the videos they used, or by which they have been inspired. This explicit citation is done by inserting the unique identifier of the referenced video in the comments of the video. This method is recognized by the platform, and a hyperlink is automatically created to the referenced videos. We crawled the collected comments to automatically extract the links associated to videos.
A total of 7.9 million links have been identified this way. One problem is that these references have many usages. For instance, they can be used to reference videos in the same series by the same author, such as a long video cut into several parts, or just to link the other creations made by the same author. As we are only interested in cooperation, we filtered out all videos referencing a later one, and all videos referencing another video by the same author. Although this might suppress some interesting links, it allows us to focus on actual cooperation between individuals. The resulting graph, restricted to videos with a recognized category and at least one link to another video of a known category, is composed of 671,428 nodes and 960,854 edges. Its in-degree follows a typical long-tailed distribution, as illustrated in Fig. 1. Note that there are some irregularities in the distribution of the out-degrees around \(d=20\) and \(d=40\), probably due to a platform limit.
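The extraction and filtering of references can be sketched as follows. This is an illustration under assumptions, not the code used for the paper: the identifier format ("sm" followed by digits) and the field names are assumed, and the sketch drops individual references rather than whole videos, which is one possible reading of the filtering rule.

```python
import re
from datetime import date

# Assumption: identifiers look like "sm" followed by digits; the format is not given in the text.
VIDEO_ID = re.compile(r"sm\d+")


def extract_edges(videos):
    """Return the set of (citing_id, cited_id) pairs kept as cooperation links.

    videos: dict id -> {"author": str, "date": date, "comment": str}
    (hypothetical field names).
    """
    edges = set()
    for vid, meta in videos.items():
        for ref in VIDEO_ID.findall(meta["comment"]):
            target = videos.get(ref)
            if target is None or ref == vid:
                continue  # unknown video or self-reference
            if target["author"] == meta["author"]:
                continue  # same author: series or links to own creations
            if target["date"] > meta["date"]:
                continue  # reference to a later video
            edges.add((vid, ref))
    return edges
```

Because every kept reference points to an earlier video, the resulting graph contains no reciprocal links.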
4 Relation between popularity and influence on the creation process
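In this section, we study the relation between two forms of impact that the videos of our dataset can have. On the one hand, videos can be viewed by any user of the platform, whether they themselves create videos or not. By counting the total number of views of a video, we can evaluate its Consumption Impact, or Co-Impact, that is, a measure of the popularity of a video among the general public. On the other hand, videos can be referenced by other videos. By counting the number of references received by a video, we can evaluate its Creation Impact, or Cr-Impact, that is, a measure of its influence on the creation process.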
4.1 Impact of individual videos
The first question we ask is how strong the linear correlation between these two forms of impact is. We intuitively expect them to be correlated, but our observation shows that this relation is not reciprocal. In Fig. 2, we plot the relation between these two variables. As we can see, the relation exists but is rather weak (Pearson’s correlation \(=\) 0.31). More precisely, we observe the following:
If a video has a low Co-Impact, then it also has a low Cr-Impact
If a video has a high Cr-Impact, then it also has a high Co-Impact
If the video has a low Cr-Impact or a high Co-Impact, it does not tell us much about the other variable.
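For reference, the Pearson coefficient used above can be computed as in the following minimal sketch, where the two input lists would hold the Co-Impact and Cr-Impact of each video. The function name is ours, and the sketch does not reproduce any preprocessing that may have been applied to the data.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)
```

A single coefficient summarizes only the linear part of the relation, which is why the asymmetric pattern listed above is not visible in the value alone.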
In Fig. 3, we propose a variation of this graph, in which the color of the dots corresponds to some chosen categories: CG3D, ORIGINALMUSIC and SINGING. We can observe different behaviors for each of these categories. In particular, CG3D videos tend to have a high Cr-Impact relative to their Co-Impact, while SINGING videos show the opposite behavior. The interpretation is that many SINGING videos have a lot of success among the general public, but do not inspire many other creators. On the contrary, CG3D videos are rarely viewed as much as the most famous videos, but many other authors create new videos based on them.
4.2 Impact of authors
We can also define the Impact of an author, by taking the sum of the Impact of the videos she has published. The overall relation is, unsurprisingly, rather similar to the one for the videos. In Fig. 4, we show this relation, including another element: the “specialty” of the author, defined as the most common type of videos she has created. Here again, we can observe different trends according to the specialty of the author. However, compared to the video case, we can see that the authors specialized in CG3D do not present a clear pattern. A possible explanation is that these authors publish other types of videos as well, or that several types of CG3D creators exist.
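The aggregation from videos to authors can be sketched as follows. The record fields (author, category, views, citations) are hypothetical names, and ties between equally frequent categories are broken alphabetically, a choice the text does not specify.

```python
from collections import Counter, defaultdict


def author_profiles(videos):
    """Aggregate per-video impacts into per-author impacts and specialty.

    videos: iterable of dicts with keys "author", "category", "views",
    "citations" (hypothetical field names).
    """
    co_impact = defaultdict(int)
    cr_impact = defaultdict(int)
    categories = defaultdict(Counter)
    for video in videos:
        author = video["author"]
        co_impact[author] += video["views"]      # Consumption Impact
        cr_impact[author] += video["citations"]  # Creation Impact
        categories[author][video["category"]] += 1
    profiles = {}
    for author, counter in categories.items():
        # Most common category; alphabetical order breaks ties (assumption).
        specialty = min(counter, key=lambda c: (-counter[c], c))
        profiles[author] = {
            "co_impact": co_impact[author],
            "cr_impact": cr_impact[author],
            "specialty": specialty,
        }
    return profiles
```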
4.3 Preponderance of top influencers
As is common in social networks, the distribution of the influence of authors follows a long-tailed distribution. This means that a small fraction of users, whom we call top influencers, generate most of the impact, both for Creation and Consumption Impact. However, the strength of these top users is not the same for both types of influence. In Fig. 5, we show that the top users are significantly more important for Cr-Impact. If we look at the influence of the top 10, top 100 and top 1000 users, we observe that they represent, respectively, 2, 14 and 40 % of Co-Impact, and 12, 37 and 68 % of Cr-Impact.
This shows how essential a fraction of the users can be for the cooperative creation. While hundreds of thousands of users are involved in the creation process, only about 1000 of them are the source of inspiration of more than two-thirds of all creations.
We can also show that the population is not split between famous creators who attract all the attention on one side and creators whose work is unknown to the general public on the other. On the contrary, the number of views is distributed more evenly than the number of citations. In Fig. 6, we represent the relation between the cumulative frequency of Cr-Impact and that of Co-Impact for authors sorted by Cr-Impact. We can read this graph in the following manner: the authors that attract 50 % of all references (Cr-Impact) attract only 13 % of all views. The authors that concentrate 85 % of Cr-Impact represent only around 50 % of the global Co-Impact.
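The share of impact held by the top users can be computed as in the following sketch, where the input is the list of per-author impacts (either Co-Impact or Cr-Impact). The function name is ours.

```python
def top_share(impacts, k):
    """Fraction of the total impact held by the k authors with the highest impact."""
    total = sum(impacts)
    if total == 0:
        return 0.0
    return sum(sorted(impacts, reverse=True)[:k]) / total
```

Calling this function with k set to 10, 100 and 1000 on each of the two impact lists yields the kind of percentages reported above.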
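The curve relating the two cumulative frequencies can be built as in this minimal sketch, assuming each author is represented by a (Cr-Impact, Co-Impact) pair. The function name and the input layout are ours.

```python
def cumulative_curve(authors):
    """Cumulative shares of Cr-Impact and Co-Impact, authors sorted by decreasing Cr-Impact.

    authors: list of (cr_impact, co_impact) pairs.
    Returns a list of (cumulative_cr_share, cumulative_co_share) points.
    """
    ranked = sorted(authors, key=lambda pair: pair[0], reverse=True)
    total_cr = sum(cr for cr, _ in ranked)
    total_co = sum(co for _, co in ranked)
    if total_cr == 0 or total_co == 0:
        raise ValueError("both totals must be positive")
    curve = []
    running_cr = running_co = 0
    for cr, co in ranked:
        running_cr += cr
        running_co += co
        curve.append((running_cr / total_cr, running_co / total_co))
    return curve
```

A point such as (0.5, 0.13) on this curve reads as: the authors receiving half of all references receive 13 % of all views.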
5 Users specialization
When several individuals are working together to solve a complex problem, one may argue that they are more likely to succeed if they do not share the same knowledge, that is, if they have different backgrounds and capabilities that complement each other. This is a common observation in economics, with the specialization of labor for instance, or in the animal world, where social animals like ants or bees are highly specialized. Because each individual is better at what she is doing than anyone could have become if they all worked on everything, each individual can take care of one aspect of the global problem, and the resulting product is better than if it were made by a similar number of non-specialists.
5.1 Entropy as a measure of specialization
5.2 Specialization of famous users
6 Network of cooperation
Topological properties of the cooperation: triad census, diameter and maximal in-degree for three networks
6.1 Topological properties of the cooperation
Synthesis triads (a) appear when a production makes references to several other ones with no relations between themselves. These productions are likely to take their inspiration from several sources.
OTM (One-to-Many) triads (b) appear when several productions are based on the same one, and do not reference each other. The typical situation is a famous production inspiring many unrelated new ones.
Cascade triads (c) appear when a production A references a production B, B references a production C, but A does not reference C. The reason might be that A is not aware of C, A is only referencing the “added value” of C, or A just does not acknowledge the transitivity.
Transitive triads (d) can have two interpretations: they can be seen as a cascade triad acknowledging the previous work, or as a synthesis of two works, one of which references the other.
The DBLP (Ley 2002) is a well-known database of scientific papers. We used a version including references between papers, as described in Tang et al. (2008). Nodes represent papers while links represent citations.
TwitterRT is a Twitter dataset described in Toriumi et al. (2013) and Remy et al. (2013), which contains between 80 and 90 % of all tweets published by Japanese users over a period of several days. Nodes represent tweets, and links represent a follower/followee relation between a retweeter and either the original author of the tweet, or a previous retweeter of this same tweet. This dataset is different from DBLP and NND, because it corresponds more to information diffusion than to pure cooperation. However, we propose that information diffusion on social media can be seen as a special case of cooperation in which the productions are (hopefully) not altered. A retweet by a user U can be seen as the publication of a production on the personal space of user U. This production is directly inspired by the production of a previous user, and is, most of the time, identical. In this particular case, the implicit objective of the cooperation is not to be understood as the creation of more complex creations, but as the efficient diffusion of information.
Whereas all three networks have a majority of OTM triads, we can see that the proportions vary greatly. The citation dataset is the least shallow, which is not surprising, as it is common to cite the most recent articles, which cite older articles, and so on. Our artistic cooperation dataset is by far the shallowest, an interesting observation that shows how important famous productions are for the cooperation process. We want to stress, however, that although their proportion is small, we nevertheless counted 418,111 Cascade triads and 217,643 Transitive ones, meaning that they are not rare phenomena, but are rather dwarfed by the extreme importance of OTM triads.
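The four triad types defined above can be counted with the following minimal sketch. It is an illustration, not the implementation used for the paper: it assumes a graph given as (citing, cited) pairs with no reciprocal links and no cycles, which holds when references only point to earlier productions, and it enumerates neighbor pairs directly, which is only practical for small graphs.

```python
from collections import defaultdict
from itertools import combinations


def triad_counts(edges):
    """Count synthesis, OTM, cascade and transitive triads.

    edges: iterable of (citing, cited) pairs, assumed free of reciprocal
    links and cycles.
    """
    succ = defaultdict(set)  # node -> productions it references
    pred = defaultdict(set)  # node -> productions referencing it
    for citing, cited in edges:
        succ[citing].add(cited)
        pred[cited].add(citing)
    nodes = set(succ) | set(pred)
    empty = frozenset()

    def linked(x, y):
        return y in succ.get(x, empty) or x in succ.get(y, empty)

    counts = {"synthesis": 0, "otm": 0, "cascade": 0, "transitive": 0}
    for node in nodes:
        outs = succ.get(node, empty)
        ins = pred.get(node, empty)
        # Synthesis: node references two productions unrelated to each other.
        counts["synthesis"] += sum(1 for x, y in combinations(outs, 2) if not linked(x, y))
        # OTM: two unrelated productions both reference node.
        counts["otm"] += sum(1 for x, y in combinations(ins, 2) if not linked(x, y))
        # node is the middle of a chain a -> node -> c.
        for a in ins:
            for c in outs:
                if a == c:
                    continue
                if c in succ.get(a, empty):
                    counts["transitive"] += 1  # a also references c
                elif a not in succ.get(c, empty):
                    counts["cascade"] += 1  # a and c are not linked
    return counts
```

Each transitive or cascade triad is counted once, at its middle node, and each synthesis or OTM triad once, at the node shared by the two links.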
6.2 References between categories
To study the cooperation behaviors in more detail, we looked at the chaining between categories. We cannot simply look at global metrics, such as the proportion of inter-category links, or the modularity of categories taken as classes, because categories are highly diverse. First, some categories are much more common than others: the SINGING category represents nearly 75 % of all videos, while other categories, such as ORIGINALMUSIC, represent less than 1 % of the total number of videos despite their importance in the cooperation process. Second, as we will see in the following sections, videos from each category have different relation patterns, which we highlight.
6.2.1 Prevalence of pairs of categories
6.2.2 Heat map analysis
In Fig. 11, we present the heat maps of the relations between categories. The categories on the vertical axis correspond to categories making references, while categories on the horizontal axis are receiving references. The heat map on the left corresponds to the relative importance of the referencing category for the referenced category (sum of columns \(=\) 1) while the one on the right corresponds to the relative importance of the referenced category in the referencing one (sum of rows \(=\) 1).
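The two normalizations can be sketched as follows, assuming the raw data is a matrix of reference counts in which rows are referencing categories and columns are referenced categories. The function names are ours.

```python
def row_normalize(counts):
    """Each row sums to 1: share of each referenced category within a referencing one."""
    return [
        [c / sum(row) if sum(row) else 0.0 for c in row]
        for row in counts
    ]


def column_normalize(counts):
    """Each column sums to 1: share of each referencing category for a referenced one."""
    totals = [sum(column) for column in zip(*counts)]
    return [
        [c / t if t else 0.0 for c, t in zip(row, totals)]
        for row in counts
    ]
```

The two views answer different questions: the column-normalized map shows who references a given category, while the row-normalized map shows what a given category references.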
We can first observe that some categories are self-replicating, or assortative, i.e., videos of the same type tend to reference each other. This is particularly true for DANCE and CG3D. This effect is also strong for MAD movies, but mostly in one direction: MAD movies mostly reference other MAD movies, but they are referenced by other categories of videos as well, in particular SINGING videos.
In the left figure, we can observe that SINGING and MASHUPS are the most common referencing categories for most other categories, except CG3D and DANCE.
6.2.3 Typical patterns of cooperation
In the previous sections, we have studied the topological properties of the network and how categories are linked to each other. We argued that the cooperation was shallow, and we observed some common pairs of categories. By limiting ourselves to famous videos, we can generate a visual representation of the network that helps to understand these observations intuitively. We generate the network of references between 664 videos having more than 500,000 views. Figure 12 is a partial view of this network, where the sizes of the nodes are proportional to their in-degree. We can see that it is mostly composed of small connected components (max size: 25), containing different categories. These connected components are typically star shaped, or composed of several star-shaped subnetworks. The center of the star is, commonly, an ORIGINALMUSIC video.
This visualization can help us to understand how the cooperation takes place on NND: an author proposes a new music video, which becomes successful. This video inspires other authors, specialists in their own field, to create videos based on it. As a consequence, several variations of the original music, improved in one direction or another, also become famous among the NND public.
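The extraction of this sub-network and of its connected components can be sketched as follows. This is a minimal illustration with hypothetical inputs (a mapping from video to number of views, and a list of reference pairs); links are treated as undirected when computing components.

```python
def famous_components(views, edges, min_views=500_000):
    """Connected components of the reference network restricted to famous videos.

    views: dict video_id -> number of views.
    edges: iterable of (citing, cited) pairs.
    Returns the components as sets of ids, largest first.
    """
    famous = {video for video, count in views.items() if count > min_views}
    neighbors = {video: set() for video in famous}
    for citing, cited in edges:
        if citing in famous and cited in famous:
            neighbors[citing].add(cited)
            neighbors[cited].add(citing)
    seen = set()
    components = []
    for start in famous:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.add(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)
    return sorted(components, key=len, reverse=True)
```

In a star-shaped component, the center is simply the node with the highest in-degree among the kept links.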
In this article, we investigated several aspects of a massive artistic cooperation phenomenon on an online social network platform, Nico Nico Douga.
The relation between the popularity (in terms of views) and the influence on the cooperation process is complex, but could be better understood by studying the type of videos.
Users are specialized. Whereas many works have shown the different roles of users (Bridges, Hubs, etc.) in dynamic processes such as information diffusion, we were able to show that, for artistic cooperation, users were also specialized in terms of the type of their productions. This can explain the high quality of some productions on NND, despite all contributors being only amateurs. We have shown that this specialization was especially strong among famous authors of videos.
In the last section, we investigated network properties of the cooperation. We found that cooperation was mostly shallow, that is to say, long sequences in which video A references video B, which references video C, and so on, are comparatively uncommon. Although such long chains exist, most of the works inspired by a given video reference it directly, without intermediaries.
The observed properties could have broad implications for the way we understand these cooperation processes. If creators are specialized, then the balance of the number of specialists in each category might, for instance, be an explanation for successful or unsuccessful processes. The complex relation between popularity and influence is also a factor to take into account when designing a creation-sharing platform: individuals using the platform only as consumers and individuals who are also creators could be shown different videos, as the latter might be more interested in influential videos than the former. Finally, our last observation on the network of citations is also very important. It shows, on the one hand, that one cannot really understand the network of cooperation without taking into account the nature of the productions and, on the other hand, how concentrated the cooperation process is. Most previous works have only observed this by looking at the degree distribution of nodes, whereas, in this paper, we also looked at the sequences of nodes using the triad census method, and observed that the One-to-Many pattern was extremely dominant. This is characteristic of a system in which the diffusion of creation occurs mostly directly, and not through long chains of inspiration, which tells us a lot about how the mass cooperation process takes place.
We hope our observations can be used to create new platforms for artistic cooperation and to improve existing ones, by taking into account its specificity. We also think that some of our observations are not specific to artistic cooperation but could be related to other types of cooperative creation, in particular scientific research.
In future work, we plan to study more aspects of self-organized cooperative creation processes, in particular their dynamics. Understanding the dynamics of the system could allow us to explain the observations we highlighted in this paper: how a video becomes popular or important for the cooperation, how users choose to contribute to a growing cooperation process, or how users find the source of their inspiration.
- Bakshy E, Hofman JM, Mason WA, Watts DJ (2011) Everyone’s an influencer: quantifying influence on twitter. In: Proceedings of the fourth ACM international conference on web search and data mining, pp 65–74, ACM
- Cazabet R, Takeda H (2014) Understanding mass cooperation through visualization. In: Proceedings of the 25th ACM conference on hypertext and social media, pp 206–211, ACM
- Cha M, Haddadi H, Benevenuto F, Gummadi PK (2010) Measuring user influence in twitter: the million follower fallacy. ICWSM 10:10–17
- Davis JA, Leinhardt S (1967) The structure of positive interpersonal relations in small groups. Dartmouth College
- Duguid P (2006) Limits of self-organization: peer production and “laws of quality”. First Monday 11(10). http://firstmonday.org/ojs/index.php/fm/article/view/1405/1323
- Hamasaki M, Goto M (2013) Songrium: a music browsing assistance service based on visualization of massive open collaboration within music content creation community. In: Proceedings of the 9th International Symposium on open collaboration, p 4, ACM
- Haythornthwaite C (2009) Crowds and communities: light and heavyweight models of peer production. In: HICSS’09. 42nd Hawaii International Conference on system sciences, 2009, pp 1–10, IEEE
- Ley M (2002) The dblp computer science bibliography: evolution, research issues, perspectives. In: Laender AHF, Oliveira AL (eds) String processing and information retrieval. Springer, Berlin, Heidelberg, pp 1–10
- Remy C, Pervin N, Toriumi F, Takeda H (2013) Information diffusion on twitter: everyone has its chance, but all chances are not equal. In: 2013 International Conference on signal-image technology & internet-based systems (SITIS), pp 483–490, IEEE
- Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z (2008) Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 990–998, ACM
- Toriumi F, Sakaki T, Shinoda K, Kazama K, Kurihara S, Noda I (2013) Information sharing on twitter during the 2011 catastrophic earthquake. In: Proceedings of the 22nd international conference on World Wide Web companion, pp 1025–1028. International World Wide Web Conferences Steering Committee
- Wilkinson DM (2008) Strong regularities in online peer production. In: Proceedings of the 9th ACM conference on electronic commerce, pp 302–309, ACM
- Yang J, Counts S (2010) Predicting the speed, scale, and range of information diffusion in twitter. ICWSM 10:355–358