1 Introduction

Since its acquisition by Google in 2006, YouTube has been a video-sharing social media platform and a search engine with over 2 billion views per month [41]. It allows users to upload videos and share their content. It is a preferred search engine for content such as cooking recipes because it communicates through both audio and video. In addition to watching videos, users can leave comments and feedback on each video. The combination of audio, video, and comments makes YouTube a valuable source of data. Researchers have used this source to analyze various topics across a wide range of research domains. One domain that draws on YouTube is health and healthcare, where educational videos and users’ feedback on them have been a common research topic. For example, Li et al. [41] examined YouTube as a source of information on the COVID-19 pandemic. Khatri et al. [34] also studied YouTube as a source of information on COVID-19 in English and Mandarin content. Hussein et al. [30] evaluated YouTube as a source of information by measuring the information on the platform and by auditing misinformation in videos. Other works contributed indirectly by developing methods that can be used in YouTube video analysis [42,56,57,45]. Analyzing research trends in this body of work calls for a bibliometric approach.

Bibliometrics is the quantitative analysis of papers published in a specific research domain [46]. A bibliometric study analyzes authors’ activities, publication trends, and collaborations among institutions and countries. It evaluates the impact of published papers and reveals potential gaps and future directions in a research area, which increases the interest and attention of researchers and funding bodies. Bibliometric studies have been used in many research areas, such as the COVID-19 pandemic [26], agriculture [47], accounting [50], and economics [8]. The advantages of a bibliometric study are that it: 1) reveals important research works in a research domain; 2) helps to discover the gaps that need to be addressed by researchers; and 3) gives young researchers a holistic view of a research area.

To scrutinize research trends and directions on YouTube, this study performs a bibliometric analysis of research works focused on YouTube and published between 2006 and 2021. We propose the following research questions to fulfill this aim: 1) what are the trends and directions in YouTube research? and 2) what information can be discovered related to YouTube research? The contributions of this study are as follows:

  1) We found only one published bibliometric study on YouTube, which presents the number of papers, citations, and countries that published research works related to YouTube [56]. Our work presents a more comprehensive analysis of YouTube papers by providing network analysis, research structure, and thematic mapping.

  2) We present a comprehensive network analysis covering the co-citation network, co-cited sources network, authors’ collaboration network, institutions’ collaboration network, nations’ network, as well as the keywords and co-occurrence network.

  3) We analyze trending research and provide structured research trends as well as thematic and historiographic mapping.

  4) We adopt the dominance factor, Bradford’s law, and Lotka’s law to analyze the published works using scientific methods.

This article is organized as follows. Section 2 describes the methodology used to carry out the analysis. Section 3 deals with research findings. Section 4 discusses the research findings. The last section deals with research limitations and explores potential avenues for future research.

2 Method

This study is guided by the following four steps:

  1. Selecting the database and defining the search terms.

  2. Conducting the preliminary statistical analysis.

  3. Performing the bibliometric network analysis.

  4. Performing the conceptual structure, thematic and historiographic mapping.

To conduct the analysis, R version 4.1 [58] was used along with several packages such as bibliometrix, wordcloud, and ggplot2. For network visualization, we used the VOSviewer software [61]. The steps outlined above are discussed in more detail below.

2.1 Database and documents’ extraction

Following Sigala et al. (2021), the Scopus database was selected to conduct the analysis. As the largest database for peer-reviewed journals (Norris & Oppenheim, 2007), Scopus is frequently used by researchers to conduct bibliometric analysis (Cunill et al., 2019; Hassan et al., 2021). Having selected the database, we extracted bibliographic records related to the selected documents, including relevant information about documents’ titles, authors, and keywords. Retrieved documents were then transformed to a plain text format for further filtering and analysis. Choosing a particular type of document for bibliometric analysis has long been the subject of debate [51, 52]. For instance, some prior studies selected journal articles only (e.g., [20]), some authors focused on both books and journal articles (e.g., [4]), and others excluded only meeting abstracts, corrections, and editorial material (e.g., [2]). Here, we opted for peer-reviewed articles only because such articles “usually undergo a meticulous peer-review process and are generally of high quality” ([16], p. 206). To avoid false-positive results, only article titles, abstracts and keywords were searched using the term “YouTube.” Figure 1 plots the search procedure followed to extract the articles used in this analysis. We limited the selection to documents written in English and we chose 2006 as the date of reference because YouTube was launched in 2006.

Fig. 1

Schematic flowchart of data acquisition and methodology (Adapted from [15])

Table 1 shows the main information about the YouTube research data.

Table 1 Main information about data

The table reveals that 1781 research articles were extracted. The articles were written by 4699 authors and include 65,677 references. A total of 417 articles were written by single authors, whereas 567 were written by multiple authors, with a collaboration index of 3.26. This index is calculated by dividing the total number of authors of multi-authored articles by the total number of multi-authored articles [23, 36]. Our result indicates that the average YouTube research team comprises between three and four authors.
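
The collaboration index defined above can be illustrated with a minimal Python sketch. It is not the authors' R/bibliometrix workflow; the function name and the toy author counts are hypothetical, and the sketch assumes that authors are counted once per article they appear on, which may differ from a distinct-author count.

```python
def collaboration_index(author_counts):
    """Total authors of multi-authored articles divided by the
    number of multi-authored articles.

    author_counts: list holding the number of authors of each article.
    """
    # Keep only articles with more than one author.
    multi = [n for n in author_counts if n > 1]
    if not multi:
        raise ValueError("no multi-authored articles in the sample")
    return sum(multi) / len(multi)


# Hypothetical toy sample: six articles with 1, 1, 2, 3, 4 and 4 authors.
toy_counts = [1, 1, 2, 3, 4, 4]
print(collaboration_index(toy_counts))  # (2 + 3 + 4 + 4) / 4 = 3.25
```

Single-authored articles do not enter the ratio, so the index describes the typical size of teams that actually collaborate.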

2.2 Bibliometric network analysis

A network can be regarded as “a structure composed of a set of actors, some of whose members are connected by a set of one or more relationships” ([35], p. 8). In social network analysis (SNA), an edge connecting two nodes represents a relationship. Khan and Wood [32] noted that “when used to synthesize the existing literature from a network perspective, the SNA technique can reveal valuable invisible patterns that can certainly facilitate theory development and uncover areas for future research.” There has been extensive prior research using network analysis in areas as diverse as exploring individual scientific collaboration networks [11, 27, 66], collaboration among research institutions [21] and keyword co-occurrence networks [7].

2.3 Thematic and conceptual structure maps

Thematic maps or strategic diagrams were suggested by Law et al. [39]. The map is usually employed to reveal the clusters’ dynamics based on analyzing the keywords or co-word occurrences [29]. The Callon et al. [10] density and centrality metrics are generally used to construct the map. The map also draws heavily on financial portfolio analysis and on concepts based on co-word networks [5]. Due to its usefulness, the map has been used in a plethora of research articles [33, 40, 65]. On the other hand, conceptual structure maps can be employed to investigate the conceptual structure of a research area by breaking down a research domain into clear “knowledge clusters” [63].

3 Results

3.1 Scientific output, core journals and impactful authors

We extracted 1781 Scopus documents related to YouTube. The documents were written by 4699 authors representing 70 nations. Timewise, the documents covered roughly fifteen years (2006–2021). Figure 2 plots the scientific output trends in the field. Although the figure reveals exponential annual growth, the growth is not evenly distributed over time. For instance, in the first two years there was a paucity of YouTube research, with only a handful of papers per year. These two years might be referred to as “the initial stage in the YouTube research.” However, the next decade (2008–2017) witnessed a tremendous increase in research dealing with YouTube. This decade might be called “the rapid growth stage.” Indeed, this period represents the highest growth rate. The final stage (2018–2021) might be called the “consolidation and stabilization stage” because YouTube research reached the “saturation/maturity” stage. This result is in line with bibliometric studies conducted in several other research areas [53, 66].

Fig. 2

YouTube research annual scientific production (2006–2021)

Table 2 shows the most important Scopus-indexed journals publishing YouTube research. The table reveals that the most relevant sources include New Media and Society, Convergence, Journal of Medical Internet Research, Computers in Human Behavior and the Physics Teacher. Another way to examine the journals’ influence is Bradford’s law [37]. This law was first proposed by Bradford [9], who noted that “if scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus.” Fig. 3 plots Bradford’s law in YouTube research. From the graph, we see that the “core zone” is dominated by just a few journals, including New Media and Society, Convergence, and Journal of Medical Internet Research. Such journals are considered the outlets publishing the “core” YouTube research.
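
Bradford's zoning idea can be sketched in a few lines of Python. This is an illustrative reading of the quoted definition, not the procedure used by bibliometrix: sources are ranked by decreasing productivity and assigned to one of three zones holding roughly equal shares of the articles. The function name, the source labels, and the counts are hypothetical.

```python
def bradford_zones(articles_per_source, n_zones=3):
    """Assign each source to a Bradford zone (1 = core).

    Sources are ranked by decreasing article count; a source belongs to the
    zone into which its cumulative article count falls.
    """
    # Rank by productivity (descending); break ties alphabetically.
    ranked = sorted(articles_per_source.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(articles_per_source.values())
    zones = {}
    cumulative = 0
    for source, count in ranked:
        cumulative += count
        # Integer ceiling of cumulative * n_zones / total, capped at n_zones.
        zone = min(n_zones, -((-cumulative * n_zones) // total))
        zones[source] = zone
    return zones


# Hypothetical toy data: 90 articles spread over eight sources.
toy = {"A": 30, "B": 20, "C": 10, "D": 10, "E": 5, "F": 5, "G": 5, "H": 5}
for source, zone in bradford_zones(toy).items():
    print(source, zone)
```

In this toy sample a single source fills the core zone, two sources fill the second zone, and five sources fill the third, which is the widening pattern the law describes.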

Table 2 Most relevant sources

Fig. 3

Bradford’s law in YouTube scholarly research

The growth of YouTube research is also evident from the distribution of corresponding authors’ countries (Fig. 4).

Fig. 4

YouTube research by corresponding author’s country. Note: SCP = Single Country Production; MCP = Multiple Country Production

Table 3 shows the most cited articles in YouTube research. The paper by Smith et al. (2012) in the Journal of Interactive Marketing is the most cited, with 457 citations. In this article, the authors compared brand-related user-generated content across three social media platforms, namely Twitter, Facebook, and YouTube. The results provide a general theoretical framework demonstrating how consumer-generated brand communications are influenced by a particular social media channel. The second most cited paper (443 citations) is by Lang (2007), published in the Journal of Computer-Mediated Communication. In this paper, the author employed an ethnographic methodology to analyze how YouTube participants develop and maintain social networks related to video-sharing activities. With 333 citations, the article by Susarla et al. (2011) is the third most cited paper. In this article, published in Information Systems Research, the authors analyzed the networked structure of interactions on YouTube. Results revealed that “social interactions are influential not only in determining which videos become successful but also on the magnitude of the impact” (p. 23). The paper by Halpern and Gibbs (2013), published in Computers in Human Behavior, was cited 298 times. In this paper, the authors used two social media platforms, namely YouTube and Facebook, to examine how social media can be used to foster democratic deliberation. Results showed that “Facebook expands the flow of information to other networks and enables more symmetrical conversations among users, whereas politeness is lower in the more anonymous and deindividuated YouTube” (p. 1159). Khan’s (2017) paper, published in Computers in Human Behavior, was cited 291 times. In this paper, the author investigated the motives behind YouTube users’ engagement. Results revealed that YouTube participation is driven mainly by the relaxing/entertainment motive. However, passive content viewing was mainly driven by reading comments posted on the platform. Table 4 lists the most relevant affiliations.

Table 3 Most cited papers
Table 4 Most relevant affiliations

The dominance factor is a bibliometric measure that calculates an author’s dominance by dividing the number of multi-authored articles in which the author is the first author by the total number of multi-authored articles [38]. This metric has been used widely in the literature [23, 25]. Figure 5 shows the dominating authors over time. From the figure, we see that the most dominating authors were C Basch from 2015 to 2021, Riendeau from 2009 to 2012 and S Azer from 2012 to 2021. Newcomers to the field have also achieved some dominance. Examples include J Yin (2017–2019) and J Park (2016–2021).
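
As a rough illustration of the dominance factor, the Python sketch below computes the ratio for one author from a list of bylines. It assumes that the denominator is the author's own multi-authored articles, which is one reading of the definition above; the function name and the toy bylines are hypothetical.

```python
def dominance_factor(articles, author):
    """Share of an author's multi-authored articles on which the author is first.

    articles: list of author-name lists; the first element is the first author.
    """
    # Multi-authored articles that include the author (assumed denominator).
    multi = [names for names in articles if len(names) > 1 and author in names]
    if not multi:
        return 0.0
    first_authored = sum(1 for names in multi if names[0] == author)
    return first_authored / len(multi)


# Hypothetical bylines; "A" is first author on two of three multi-authored articles.
toy_articles = [["A", "B"], ["B", "A", "C"], ["A"], ["A", "C"], ["C", "B"]]
print(dominance_factor(toy_articles, "A"))
print(dominance_factor(toy_articles, "B"))
```

A value close to 1 indicates an author who mostly leads collaborative work, while a value close to 0 indicates an author who mostly contributes as a co-author.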

Fig. 5

YouTube authors’ dominance over time

In bibliometric studies, “Evenness/concentration of authors’ contribution” is a widely used metric [49]. This metric can be quantified using Lotka’s law (Lotka, 1926). Closely related to the well-known Zipf’s law, Lotka’s law implies that “the number of authors producing a certain number of articles is a fixed ratio, 2, to single-article authors.” Results suggest that Lotka’s law seems to hold in YouTube research (K-S two-sample test, p > 0.05).
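
To make the comparison concrete, the following Python sketch contrasts an observed author-productivity distribution with the shares expected under Lotka's law in its common inverse-power form, y(n) = C / n^beta with beta = 2. The function names and the toy productivity counts are hypothetical, and the sketch stops short of the goodness-of-fit test (such as the K-S test mentioned above) that a full analysis would apply.

```python
from collections import Counter


def lotka_expected(max_n, beta=2.0):
    """Expected share of authors with n = 1..max_n articles under
    y(n) = C / n**beta, normalised over the truncated range."""
    weights = [1.0 / n ** beta for n in range(1, max_n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def observed_shares(articles_per_author, max_n):
    """Observed share of authors with n = 1..max_n articles."""
    counts = Counter(articles_per_author)
    n_authors = len(articles_per_author)
    return [counts.get(n, 0) / n_authors for n in range(1, max_n + 1)]


# Hypothetical sample of 100 authors and their article counts.
toy = [1] * 72 + [2] * 18 + [3] * 8 + [4] * 2
expected = lotka_expected(4)
observed = observed_shares(toy, 4)
for n, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"n={n}: observed={obs:.3f} expected={exp:.3f}")
```

With beta = 2, authors with two articles are expected to be about one quarter as numerous as single-article authors, which is the inverse-square pattern the law is known for.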

3.2 Network analysis

3.2.1 Co-citation networks

A co-citation network is formed when two authors are cited together in a third reference. Figure 6 displays the YouTube research co-cited authors’ network. Based on the color used, the graph reveals four distinct clusters. The red cluster includes authors such as J Burgess, M Thelwall and J Green. The size of the node indicates which author occupies a central position in the cluster. Such authors might be regarded as influential as they have a disproportionate impact on information diffusion in the network [6]. From the graph, we also see that some nodes are quite close to each other, whereas others drift further away. McPherson et al. [48] argued that closeness signifies a strong “homophily effect,” which occurs when authors in a virtual-room-like environment discuss common topics [24]. In bibliometrics, homophily is an indicator of “disciplinary or thematic similarity” [31]. For example, the nodes representing R Schatz and A Finamore are very close to each other, indicating a possible “homophily effect.”
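
The co-citation relation described above can be sketched in Python as a pair count over reference lists. This is a simplified illustration rather than the bibliometrix or VOSviewer implementation; the function name and the cited-author labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often two cited authors appear in the same reference list.

    reference_lists: one iterable of cited-author names per citing document.
    """
    pairs = Counter()
    for refs in reference_lists:
        # Each unordered pair is counted once per citing document.
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical citing documents and the authors they cite.
toy_refs = [["A", "B", "C"], ["B", "A"], ["C", "D"]]
for pair, weight in cocitation_counts(toy_refs).most_common():
    print(pair, weight)
```

The resulting counts serve as edge weights, so two authors who are repeatedly cited together end up strongly linked and are drawn close to each other in the network.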

Fig. 6

YouTube authors co-citation network (≥ 30 articles)

The green cluster includes authors such as C Basch, J Keelan, A Pandy and S Sarangi. The blue cluster includes sixty-two authors such as J Baker, D Charnock, A Rapp and J Lee. The yellow cluster is the smallest and includes ten authors such as A Finamore, R Schatz and J Wang. The centrally located authors in each cluster might be regarded as influential authors as they “tend to anchor each community and they have a large impact on other communities as they control and stimulate information diffusion [in the network] through research activities” ([53], p. 664).

Figure 7 displays the YouTube research co-cited sources’ network. The graph reveals five distinct clusters. For example, the Journal of Clinical Rheumatology, Epilepsy Behavior and the Journal of Cancer Education are co-cited together as they belong to the same cluster. The American Sociological Review is co-cited with Discourse and Society, and Feminist Media Studies. The Journal of Advertising is co-cited with the Journal of Business Research and the Journal of Consumer Research, whereas Body Image is co-cited with the Journal of Pragmatics and Sex Roles. Interestingly, “core” journals occupy a central position in the network with minimal interaction among the distinct clusters, confirming what Glotzl and Aigner [28] term “the orthodox core-heterodox periphery” phenomenon within the field of YouTube research. Dobusch and Kapeller [22] found that “orthodox journals” tend to be heavily cited, whereas “heterodox journals” tend to drift towards the periphery.

Fig. 7

YouTube source co-citation network (≥ 30 articles)

3.2.2 Collaboration networks

The collaboration network among authors is depicted in Fig. 8. The thickness of each link in this graph is proportional to the number of coauthored articles, whereas the node size reflects the author’s number of publications. A glance at the graph reveals that the sparse network is formed by seven distinct communities, signifying limited cooperation among authors. The sparse network implies that impactful researchers in the field work in isolated “silos” [62].
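
The quantities behind such a graph can be sketched in Python: node sizes from publication counts, link weights from coauthored articles, and a simple density figure as one possible way to express sparsity. Network density is an assumption introduced here for illustration and is not a measure reported in this study; the function names and the toy bylines are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collaboration_network(articles):
    """Return (node_size, edge_weight) for a co-authorship network.

    articles: list of author-name lists, one per article.
    """
    node_size = Counter()    # publications per author
    edge_weight = Counter()  # coauthored articles per author pair
    for authors in articles:
        unique = sorted(set(authors))
        node_size.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_size, edge_weight


def density(node_size, edge_weight):
    """Observed links divided by all possible links between the nodes."""
    n = len(node_size)
    possible = n * (n - 1) // 2
    return len(edge_weight) / possible if possible else 0.0


# Hypothetical bylines forming two small groups and one isolated author.
toy_articles = [["A", "B"], ["A", "B", "C"], ["D", "E"], ["F"]]
nodes, edges = collaboration_network(toy_articles)
print(dict(nodes))
print(dict(edges))
print(round(density(nodes, edges), 3))
```

A low density with several disconnected groups corresponds to the pattern of isolated communities described above.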

Fig. 8

YouTube authors’ collaboration network (documents ≥ 2 articles)

Figure 9 depicts the collaboration network at the institutional level. The thickness of each link is proportional to the strength of collaboration between institutions, whereas the node size reflects each institution’s number of publications. From the graph, we see that there are seven distinct clusters. For example, there is strong collaboration between Columbia University, New York University and William Paterson University in the US. Zou et al. [66] argued that this type of sparse collaboration reflects a “locally-centralized-globally-discrete” cooperation. It also reflects a “North-South” divide, with a clear lack of cooperation between institutions in the developed and developing worlds.

Fig. 9

Collaboration network among institutions producing YouTube research (documents ≥ 1 article)

Figure 10 shows the collaboration at the national level, with a total of 62 nations collaborating in the scientific production of YouTube research. The figure shows that the US tops the world in terms of total collaboration links, followed by the UK and Australia. A closer look at the graph reveals that some clusters are formed based on geographic proximity or linguistic similarity. For example, Spain cooperates with Colombia, Ecuador and Mexico. The cluster that includes Egypt also includes Kuwait and Saudi Arabia. Figure 11 plots the “geographic atlas” of the countries producing YouTube research.

Fig. 10

Collaboration network among nations producing YouTube scholarly research (documents ≥ 2 articles)

Fig. 11

Geographic atlas of collaboration among nations producing YouTube scholarly research

3.2.3 Keywords and co-occurrence network analysis

Due to their abstract nature [12], keywords can be used to reveal the content of a paper. Figure 12 shows a simple wordcloud constructed from the author-provided keywords. A wordcloud plot is an appealing visual tool that can be used to summarize textual data. The size of each word and its closeness to the cloud center indicate its significance [42, 43]. From the figure we see that the most frequent keywords are “YouTube”, “social media” and “Internet.”

Fig. 12

Keyword-based wordcloud of the most frequent YouTube terms

To further scrutinize how frequently keywords co-occur in the same document, we also used the author-provided keywords to construct the YouTube keyword co-occurrence network because “authors of a paper should be the ones that have the best feel as to what areas are spoken to by the paper” [19]. Figure 13 displays the resulting co-occurrence network. The graph reveals eight main clusters. For example, the first cluster (in blue) deals with the medical/health use of YouTube and includes words such as “health communication”, “health education” and “health information”. The second cluster (in green) deals with consumer comments and includes words such as “user-generated content”, “social network” and “Web 2.0”. The third cluster (in yellow) deals mainly with the educational use of YouTube and includes words such as “e-learning”, “medical education” and “online videos”.

Fig. 13

Co-occurrence network for author-provided YouTube keywords

A three-field plot, also known as a Sankey diagram, was also used to contextualize the flow trend linking keywords (left), authors (middle) and sources (right). In this diagram the size of the boxes is proportional to the related quantity (keyword, author, or source). Figure 14 displays the YouTube research Sankey diagram. Not surprisingly, the edge widths flowing from keywords such as “YouTube”, “social media”, and “Internet” are the largest, signifying that such keywords were used by several authors in their publications. We also see that while some authors have used an extensive list of keywords reflecting the diversity of their research (C Basch), others used a single keyword (J Kim).

Fig. 14

Sankey diagram for YouTube research flow (keyword-author-reference)

3.2.4 Trending topics and thematic evolution

Figure 15 plots the major YouTube research trending topics. From the graph we see that there is a move from established YouTube topics such as “video sharing” (2010–2018) and “web-based learning” (2012–2014) to new topics such as “COVID-19” (2020 onwards) and “misinformation” (2020 onwards). Such topics might be regarded as “trending topics/hotspots” in the scholarly publications dealing with YouTube because it has been argued that trending topics usually represent hotspots or evolving themes in a specific research domain [13, 14, 54, 60]. An abrupt burst or surge in keywords might also be an indicator of “potential fronts” [57] as “the body of knowledge in a certain discipline can be seen as a sequence of topics that appear, grow in importance for a particular period and then disappear” [18].

Fig. 15

YouTube research trending topics

3.3 Conceptual structure and thematic maps

We applied the Multiple Correspondence Analysis (MCA) method to the author-provided keywords. MCA is an extension of correspondence analysis, akin to Principal Component Analysis (PCA), that helps to analyze the pattern of relationships in categorical data [1]. It was selected because it has been shown to perform better on categorical data than other methods [1]. Figure 16 depicts the resulting YouTube research conceptual structure over the study period. From the graph, we see that the first two dimensions of the MCA account for roughly 72% of the total variability. In this graph, the closer the dots, the more similar the profiles they represent, whereas each cluster of dots represents a discriminating profile [64].

Fig. 16

Conceptual structure map for YouTube scholarly research (MCA method)

An inspection of the graph reveals the depth and breadth of the domain. For instance, the largest cluster (in red) comprises keywords emphasizing consumer-generated content such as “user-generated content”, “web 2.0” and “online video.” The second cluster (in green) appears to deal with health and medical issues and includes keywords such as “health communication”, “health information” and “misinformation.” The third cluster (in blue) appears to deal with YouTube research within the context of information quality and includes keywords such as “internet”, “information” and “quality.”

A thematic/strategic map is also shown in Fig. 17. In this graph, the average values of both axes are represented by dotted lines dividing the map into four quadrants. Each quadrant represents a different type of theme, whereas the bubble size is drawn in proportion to the number of documents in which the keywords are used. The first quadrant represents “motor themes” that are well-developed both internally and externally, as they are characterized by high density and high centrality [17]. Within YouTube research, such themes include “user-generated content”, “new media”, “influencers”, and “gender.” The second quadrant is usually labeled the “highly-developed-and-isolated themes” quadrant as it deals with niche themes. With a high-density, low-centrality structure, this quadrant highlights the fact that while the themes it comprises are well-developed internally, they are marginally important externally. Within YouTube research, such themes include “education,” “medical education”, and “technology.” The low-density, low-centrality third quadrant is termed the “emerging-or-declining themes” quadrant. The themes in this quadrant are characterized by weak ties at both the internal and external levels. Such themes might indicate potential hotspots in YouTube research. Examples include “COVID-19”, “health communication” and “Twitter.” Finally, the “basic-and-transversal themes” quadrant (low density, high centrality) comprises themes that are weakly developed in terms of internal ties. Nevertheless, they are characterized by important external ties. Within YouTube research, such themes include “Social media” and “internet.”
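
The quadrant logic described above can be sketched in Python: each theme is compared with the average centrality and the average density and labeled accordingly. The function name, the theme labels, and the centrality/density values are hypothetical, and the treatment of themes that fall exactly on a dividing line (counted as "high" here) is an assumption.

```python
from statistics import mean


def classify_themes(themes):
    """Label each theme by its quadrant on a strategic map.

    themes: dict mapping theme name -> (centrality, density).
    """
    c_mean = mean(c for c, _ in themes.values())
    d_mean = mean(d for _, d in themes.values())
    labels = {}
    for name, (c, d) in themes.items():
        high_c = c >= c_mean  # ties count as "high" (assumption)
        high_d = d >= d_mean
        if high_c and high_d:
            labels[name] = "motor"
        elif high_d:
            labels[name] = "niche"
        elif high_c:
            labels[name] = "basic and transversal"
        else:
            labels[name] = "emerging or declining"
    return labels


# Hypothetical (centrality, density) scores for four themes.
toy_themes = {"t1": (9, 8), "t2": (2, 9), "t3": (1, 1), "t4": (8, 2)}
for name, label in classify_themes(toy_themes).items():
    print(name, label)
```

Because the dividing lines are sample averages, a theme's quadrant depends on the other themes in the map, so the labels are relative rather than absolute.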

Fig. 17

YouTube research thematic/strategic map

4 Discussion

This study examined published research works related to YouTube between 2006 and 2021. At this point, we can answer the research questions. To answer the first research question about trends and directions, we found that between 2006 and 2008, publications grew slowly since the YouTube platform was new. Then, from 2008 to 2017, research on YouTube grew rapidly. Afterwards, the trend remained upward but at a slower pace. We also found that the trending topics change over time. While “gaming” and “video sharing” were trending topics in earlier periods, the trend shifted towards topics like “COVID-19” and “misinformation”.

The second research question is related to the information discovered from YouTube research. We identified the most cited papers and authors, and the countries with the highest number of publications. We also identified the networks among the published works, specifically the authors’ collaboration network, the collaboration between institutions, and the collaboration between countries. We also analyzed the collected works with respect to Bradford’s law and Lotka’s law. The results show that a large number of papers were published in a small group of journals, consistent with Bradford’s law. The frequency distribution of author productivity also followed Lotka’s law. Additionally, the MCA algorithm was used to derive the conceptual structure map of YouTube papers. The output shows three clusters: consumer-generated content, health and medical issues, and information quality.

Based on this paper’s results, a large number of works are related to health and medical issues. Among the institutions, departments of public health appeared more often than other affiliations. Additionally, the Journal of Medical Internet Research occupies the third spot among the most relevant sources. The MCA algorithm yielded one cluster dedicated to health and medical issues. Furthermore, based on Fig. 15, the “medical education” topic started trending in 2014 and is still trending, which makes it one of the longest-trending topics. It is clear that researchers are interested in analyzing YouTube with respect to health-related issues. These points coincide with studies on the effectiveness of YouTube videos as a health education platform. Allgaier mentioned that many people use YouTube as a source of information on science, technology, and health [3]. It can also be assumed that, because of the sensitivity of health and medical issues, researchers focused more on the health aspect of YouTube to find information and misinformation in videos. They analyzed videos and comments to understand users’ feedback on health-related videos [59].

5 Limitations and future research

Despite the major contributions of this study, it suffers from some limitations. First, we relied only on the Scopus database to conduct our bibliometric analysis. Thus, our results are unavoidably subject to selection bias. Consequently, we believe that future research should test the robustness of our findings by merging several databases such as WoS and Google Scholar. However, it has been argued that the Google Scholar database is less stringent as it comprises citations from unpublished manuscripts, blogs, etc. (Gavel & Iselid, 2008; [55]). Second, we limited the selection of documents to articles published in English. Thus, our results might be limited in terms of coverage [57]. Future research might add other languages to test the generalizability of our findings. Finally, although we conducted a comprehensive study on the whole domain of YouTube research, future research might focus on specific journals publishing YouTube research such as New Media and Society, Convergence, Journal of Medical Internet Research, Computers in Human Behavior, and the Physics Teacher, among others.

6 Conclusion

This work conducted a bibliometric study on YouTube, as a research topic, in the literature between 2006 and 2021. The search in the Scopus database resulted in 1781 research works, which were collected along with their metadata such as authors’ names, keywords, etc. The collected data were analyzed, and the results were presented in the form of networks of collaboration between authors, institutions, and countries. We also showed the results of the keyword networks. We then created a thematic map based on the keywords to find the trending topics in research related to YouTube. The analysis revealed that 2006–2007 was the initial stage in YouTube research, followed by 2008–2017, the decade of rapid growth in YouTube research. The period 2018–2021 is considered the stage of consolidation and stabilization of this research topic. We also found that the trending topics change over time. While “gaming” and “video sharing” were initially trending, the trend shifted towards topics like “COVID-19” and “misinformation”.