1 Introduction

Biomass is the combined mass of organic compounds generated by all living beings on planet Earth [1, 2]. It is also regarded as “renewable organic material that comes from plants and animals” [3, 4], a definition that extends to microorganisms [5] as well as wastes [6]. In the twentieth century, it was classified mainly as an energy source [7, 8]. In the past few decades, it has been recognized that biomass also represents a mixture of valuable (organic) chemical compounds [9,10,11,12], and wide-ranging research efforts have been started toward the utilization (“valorization”) of biomass from this aspect [13,14,15,16,17]. The 2012 summary of these ambitions [13] induced even more intense research activity, as can be seen from the high number of publications citing this paper. The main goal of the research described in the present article is to examine the features and tendencies of this influence by applying mathematical statistics to the analysis of the scientific literature.

The literature screening during the present research indicated lignin as one of the central topics in the valorization efforts of biomass, so we add here some information about this material. Lignin is an organic polymer serving as supporting material in plants. From a present practical viewpoint, it is a large-quantity, undesired by-product of industries such as papermaking, formed in quantities of hundreds of Mt per year. However, from the viewpoint of research efforts and hoped-for future applications, lignin is a valuable, highly challenging, complicated organic polymer that offers the possibility of producing useful organic compounds by suitable treatments.

The structure of lignin is fairly variable, depending on the plant species. Three main components can be identified in most kinds of lignin: coniferyl alcohol [(4-hydroxy-3-methoxyphenyl)propane], sinapyl alcohol [(3,5-dimethoxy-4-hydroxyphenyl)propane] and paracoumaryl alcohol [(4-hydroxyphenyl)propane], which appear in highly variable bonding lattices with molecular masses of ca. 10,000 units or more.

The intensity of the research and development activity centered on the problems and possibilities offered by lignin is indicated by the fact that, on December 6, 2023, Web of Science gave the following number of “hits” for lignin: 127,967 (21,186 patents). We cite here a selection of very recent review articles dealing with various, but not all, aspects of lignin chemistry [86,87,88,89,90,91,92,93,94].

The increasing availability of an enormous quantity of scientific data (e.g., Web of Science, Scopus, PubMed, SciFinder) has made it possible to put science itself under the microscope and to gain a quantitative understanding of the patterns and dynamics of scientific discovery. This gave rise to a new scientific field called the science of science [18].

Recently, various scholarly networks have been investigated thoroughly, including citation networks and keyword co-occurrence networks [19]. However, co-authorship networks have probably attracted the most research attention, since they shed light on an important aspect of research collaboration: the process that combines distributed knowledge and expertise, illuminating problems from different sides and thereby producing new scientific discoveries [20].

Co-authorship networks have been thoroughly researched from a variety of angles and viewpoints: e.g., the collaboration network of a research community that publishes in a particular journal, that cites a certain important paper, or that belongs to a specific country or institution [21,22,23,24,25,26]. In the present paper, we investigate the co-authorship network of the research communities that were active in the field of the valorization of biomass [13].

In the present work, we considered the citation database of Web of Science as of August 3, 2022, and we analyzed the impact of the paper by C. O. Tuck, E. Pérez, I. T. Horváth, R. A. Sheldon and M. Poliakoff, “Valorization of Biomass: Deriving More Value from Waste” [13] (published in Science on 17 August 2012), on academic and applied research, ten years after the publication of this important, orienting paper. In the rest of this work, the paper of Tuck et al. is referred to as the main article. According to the Web of Science database, the main article has been cited by 1331 scientific documents written by 5110 authors. We refer to these citing documents as citing works (or citing documents, citing publications).

2 Elementary statistics of citing works

The analyses were performed using the NetworkX Python package [27]. The visualizations have been made using matplotlib [28], Gephi [29], and VOSviewer [30].

Figure 1 shows the histogram of the number of citing works per author, that is, the distribution of the number of citing papers that each author has. The majority of the authors have only one paper that cites the work of Tuck et al. [13]. Figure 2 shows the distribution of the number of collaborating authors per citing work, illustrating that most of the citing documents have been written by four to six co-authors, frequently as a product of cooperation between different research groups.

Fig. 1 Distribution of the number of citing works per author

Fig. 2 Distribution of the number of authors per citing work
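The two distributions described above can be obtained by simple counting over the author lists of the citing works. The following is only a minimal sketch under stated assumptions: the list `citing_works` and its author labels are hypothetical placeholders, not data from the Web of Science export.

```python
from collections import Counter

# Hypothetical author lists, one per citing work (illustrative only).
citing_works = [["A", "B", "C"], ["A", "D"], ["E"]]

# Number of citing works per author (each author counted once per work).
works_per_author = Counter(
    author for authors in citing_works for author in set(authors)
)

# Histogram in the sense of Fig. 1: number of citing works -> number of authors.
authors_by_work_count = Counter(works_per_author.values())

# Histogram in the sense of Fig. 2: number of authors -> number of citing works.
works_by_author_count = Counter(len(set(authors)) for authors in citing_works)

print(dict(authors_by_work_count))
print(dict(works_by_author_count))
```

In this toy input, four authors have one citing work and one author has two, mirroring the qualitative shape reported for Fig. 1.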

Figure 3 shows the impact of the main article on different sectors of natural sciences (categories defined by Web of Science). The distribution indicates that the majority of the research activities are still in the field of “pure” chemistry, but “applicative” efforts of technological-engineering character are rapidly emerging.

Fig. 3 Impact of the main article in different research areas (as defined by Web of Science). The figure shows the distribution of the research areas of the citing works

Keywords of the citing works provide a succinct overview of the main ideas and content of these papers. Figure 4 shows the two-dimensional density map of the keyword co-occurrence network. Keywords that have co-occurred more frequently are placed closer to each other on this map. The font size indicates the number and strength of the connections of a keyword. A more intense color implies a larger number of keywords and higher connectivity in the neighborhood of the point. Figure 4 was constructed using VOSviewer [30].

Fig. 4 Density map of the keyword co-occurrence network based on the citing works
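The raw material of a keyword co-occurrence network is a count of how often two keywords appear in the same work. The sketch below shows only this counting step, under the assumption that each citing work comes with a plain list of keywords; the keyword lists are hypothetical, and the layout and density estimation performed by VOSviewer are not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per citing work (illustrative only).
keyword_lists = [
    ["biomass", "lignin", "biorefinery"],
    ["biomass", "lignin", "pyrolysis"],
    ["biomass", "hydrolysis"],
]

cooccurrence = Counter()
for keywords in keyword_lists:
    # Each unordered pair of distinct keywords in one work co-occurs once.
    for pair in combinations(sorted(set(keywords)), 2):
        cooccurrence[pair] += 1

# Pairs with a higher count would be drawn closer together on the map.
for pair, count in cooccurrence.most_common(3):
    print(pair, count)
```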

The previous figures lead us to the following conclusions:

(a) Unsurprisingly, the most frequent keyword is “biomass”, which needs no comment.

(b) The second most frequent keyword is “lignin”, which is one of the most abundant biomass types [31, 32] (up to 25%). Interestingly, the prevalence of lignin research is also reflected in other analyses described in the present paper (see Tables 1, 2).

(c) Lignocellulose, the lignin-containing plant material [33, 34], also appears frequently.

(d) Additional frequent keywords are also connected with lignin treatment methods (biorefinery, hydrolysis, pyrolysis, lignin valorization, etc.).

(e) The above points, together with the majority of the remaining more frequent keywords in the citing works, indicate a tendency to process biomass as a complex chemical raw material by suitable chemical, biochemical or chemical–physical treatments.

Table 1 Centrality measures of top 5 scientists of the co-authorship network
Table 2 Number of citations of the citing works of the researchers listed in Table 1 (citation data from SciFinder 11 December 2022)

3 Basic graph theoretical concepts

First, we summarize the main concepts of social network analysis (SNA) and the graph theoretical definitions (the terms “network” and “graph” are used interchangeably here) and fix the notations used hereafter in the present paper. We follow the notations and definitions given in our previous works [21,22,23,24].

Definition 1

A network or a (finite undirected) graph denoted by \(G=\left(V,E\right)\) is represented by a set of nodes (or vertices) \(V\) and a set of edges \(E\) joining some pairs of nodes. Social network analysis (SNA) is the mapping of relationships between people, groups, computers, and other connected entities (nodes). The edges are the relationships between these nodes.

Definition 2

We say that two nodes \(x\) and \(y\) are adjacent or connected if they are joined by at least one edge \(xy\).

Definition 3

A loop is an edge that joins a node to itself. Two or more edges that are incident to the same two nodes are called multiple edges or parallel edges.

Definition 4

A network or a (finite undirected) graph is called simple if it does not contain loops or multiple edges.

Definition 5

A path between \(x\) and \(y\) is a sequence of successive edges \(x{v}_{1},{v}_{1}{v}_{2}, \dots ,{v}_{i}y\), where nodes \(x,{v}_{1},{v}_{2},\dots ,{v}_{i},y\) are distinct from one another. The length of the path is the number of these successive edges.

Definition 6

Nodes \(x\) and \(y\) are said to be connected (by path) if there exists at least one path between \(x\) and \(y\) in the network. Otherwise, \(x\) and \(y\) are disconnected.

A network is said to be connected if every two nodes \(x\) and \(y\) are connected. Otherwise, we call it disconnected. A disconnected network is formed by connected components that are maximal (i.e., the largest possible subgraphs in which all nodes are connected to each other by path).

Definition 7

The degree of a node \(x\) [denoted by \({\text{deg}}\left(x\right)\)] is the number of edges incident to \(x\), where loops are counted twice.

Definition 8

The average degree of a network with \(n\) nodes and \(e\) edges provides information about the number of edges compared to the number of nodes. Each edge is incident to two nodes and counts in the degree of both nodes, thus the average degree of an undirected network is \(\frac{2e}{n}\).

Definition 9

A path is geodesic if its endpoints cannot be connected by a shorter path. The distance \(d\left(x,y\right)\) between two nodes \(x\) and \(y\) is the length of a geodesic path connecting the nodes (i.e. the length of the shortest path connecting \(x\) and \(y\)).

Definition 10

The betweenness centrality (or overall centrality or simply centrality) of node \(x\) is

$${C}_{B}\left(x\right)={\sum }_{s\ne x\ne t\in V}\frac{{g}_{st}\left(x\right)}{{g}_{st}},$$

where \({g}_{st}\) is the total number of geodesic paths between the nodes \(s\) and \(t\) and \({g}_{st}\left(x\right)\) is the number of geodesics connecting \(s\) and \(t\) that contain \(x\).

This notion was introduced by Freeman [35]. The terms of this sum are the probabilities that node \(x\) falls on a randomly selected geodesic path connecting \(s\) and \(t\) for all different \(s, t \in V\backslash \{x\}\). Nodes with high centrality have a large influence on the transfer of information through the network, under the assumption that information transfer follows the shortest paths.
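To make the role of the ratio \({g}_{st}\left(x\right)/{g}_{st}\) concrete, the following minimal sketch evaluates the betweenness centrality on a four-node cycle, where opposite nodes are joined by two geodesics. The graph is an illustrative assumption, not part of the analyzed data. Note that, for undirected graphs, the NetworkX function `betweenness_centrality` with `normalized=False` counts every unordered pair \(\{s,t\}\) once.

```python
import networkx as nx

# Cycle a-b-c-d-a: nodes a and c are joined by two geodesics (via b and via d).
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])

# Unnormalized betweenness; each unordered pair {s, t} contributes once.
cb = nx.betweenness_centrality(G, normalized=False)

# Node b lies on one of the two geodesics between a and c, so it receives 1/2.
print(cb["b"])
```

By symmetry, every node of this cycle obtains the same value, since each lies on exactly one of the two geodesics between its two neighbors.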

Definition 11

The \(h\)-index of an author is a metric that aims to measure the productivity and impact of a researcher's work. It is calculated by counting the number of papers that an author has published, and then determining the highest number \(h\) such that \(h\) of those papers have been cited at least \(h\) times.
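The calculation in Definition 11 can be sketched in a few lines. The function name `h_index` and the citation counts in the usage lines are illustrative assumptions only.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the first `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts of one author's papers.
print(h_index([10, 8, 5, 4, 3]))
```

Sorting the papers by decreasing citation count makes the condition monotone, so the loop can stop at the first paper that has fewer citations than its rank.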

Definition 12

In a connected graph, the closeness centrality of a node is a measure of the centrality of a node based on its distance to all other nodes in the network. The closer a node is to all other nodes in the network, the higher its closeness centrality score. The closeness centrality of node \(x\) is the reciprocal of the average shortest path distance to \(x\) over all \(n-1\) reachable nodes, where \(n\) is the size (number of nodes) of the network:

$${C}_{C}\left(x\right)=\frac{n-1}{{\sum }_{y\ne x}d\left(x,y\right)}.$$
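As a small illustration of this formula, the sketch below evaluates \({C}_{C}\) directly from shortest path lengths on a four-node path and compares the result with the NetworkX function `closeness_centrality`. The graph and the helper name `closeness` are assumptions made for the example; the comparison holds because the example graph is connected.

```python
import networkx as nx

# Path a-b-c-d (connected, so every other node is reachable).
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d")])


def closeness(graph, x):
    """(n - 1) divided by the sum of distances from x to all other nodes."""
    distances = nx.shortest_path_length(graph, source=x)  # includes x itself at 0
    n = graph.number_of_nodes()
    return (n - 1) / sum(distances.values())


print(closeness(G, "b"), closeness(G, "a"))
print(nx.closeness_centrality(G))
```

The inner node b (distances 1, 1, 2) scores higher than the end node a (distances 1, 2, 3), in line with the verbal description in Definition 12.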

Definition 13

In a network or a (finite undirected) simple graph \(G=(V,E)\) let us consider three nodes \(u, v\) and \(w\) that are connected by only two undirected edges \(uv\) and \(vw.\) Then nodes \(u, v, w\) form an open triplet centered around node \(v\). Moreover, if nodes \(u\) and \(w\) are also connected in \(G\) by an undirected edge, then nodes \(u, v, w\) form a closed triplet centered around node \(v\). In this case \(uw\) is the closing edge of the closed triplet centered around node \(v\).

Definition 14

The global clustering coefficient or transitivity of a graph \(G\) is defined as

$${Cl}_{global}\left(G\right)=\frac{\mathrm{number\,of\,closed\,triplets}}{\mathrm{number\,of\,all\,triplets\,}\left(\mathrm{open\,and\,closed}\right)}.$$

Using the fact that any triangle subgraph contains three closed triplets, one centered around each of the nodes, \({Cl}_{global}(G)\) can be calculated by dividing the number of triangles times 3 by the number of all triplets.

The local analogue of this concept can be defined as follows:

Definition 15

The local clustering coefficient of a node \(v\) with degree greater than 1 can be given by

$$Cl \left(v\right)=\frac{\mathrm{number\,of\,closed\,triplets\,centered\,around\, }v}{\mathrm{number\,of\,all\,triplets\,centered\,around\,}v \,\left(\mathrm{open\,and\, closed}\right)},$$

and \(Cl \left(v\right)=0\) for nodes with degree 0 or 1.

Thus, the local clustering coefficient of a node \(v\) in a graph describes how close its neighbors are to being a clique (complete graph). Since the number of all triplets centered around \(v\) (open and closed) is equal to \(\frac{{\text{deg}}(v)({\text{deg}}\left(v\right)-1)}{2}\), we have

$$Cl \left(v\right)= \frac{2 \times \mathrm{number\,of\,closed\,triplets\,centered\, around\,}v}{{\text{deg}}(v)({\text{deg}}\left(v\right)-1)}$$

for nodes with degree greater than 1.

Definition 16

The average clustering coefficient of a graph \(G=(V,E)\) is

$$C{l}_{average}(G)= \frac{1}{n}{\sum }_{v\in V}Cl \left(v\right),$$

where \(n\) is the number of nodes in \(G\).
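Definitions 13 to 16 can be checked on a small example. The sketch below uses a triangle with one pendant node (an illustrative graph, not the co-authorship network) and counts triplets by hand before comparing with the corresponding NetworkX functions.

```python
import networkx as nx

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

# All triplets centered around v: deg(v) * (deg(v) - 1) / 2.
all_triplets = sum(d * (d - 1) // 2 for _, d in G.degree())

# nx.triangles counts, per node, the triangles through it; each triangle
# is therefore counted three times, i.e. once per closed triplet.
closed_triplets = sum(nx.triangles(G).values())

print(closed_triplets / all_triplets)  # global clustering coefficient
print(nx.transitivity(G))              # same quantity from NetworkX
print(nx.clustering(G))                # local coefficients per node
print(nx.average_clustering(G))        # mean of the local coefficients
```

Here the global coefficient is 3/5, while the average of the local values (1, 1, 1/3 and 0) is 7/12, which illustrates that the two graph-level measures generally differ.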

4 The co-authorship network based on the citing works

Definition 17

The co-authorship network based on the citing works is a social network determined by the simple graph with a node set formed by all scientists who have at least one work that cites the main article [13] C. O. Tuck, E. Pérez, I. T. Horváth, R. A. Sheldon and M. Poliakoff Valorization of Biomass: Deriving More Value from Waste (Science, 2012). In this network two scientists (nodes) are linked by a single edge if they co-authored at least one paper that cites the main article.

Edge \(xy\) does not mean that the corresponding co-authored citing work or citing works have only two authors (\(x\) and \(y\)). For any edge \(xy\), the possibility of other co-authors is not excluded.
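A network in the sense of Definition 17 can be assembled from the author lists of the citing works, for example as follows. This is a minimal sketch under stated assumptions: the author lists are hypothetical, and author name disambiguation, which any real data set would require, is not addressed.

```python
from itertools import combinations

import networkx as nx

# Hypothetical author lists, one per citing work (illustrative only).
citing_works = [
    ["A", "B", "C"],  # a three-author work yields the edges AB, AC and BC
    ["A", "B"],       # a repeated collaboration adds no second edge
    ["D"],            # a single-author work yields an isolated node
]

G = nx.Graph()  # simple graph: no multiple edges
for authors in citing_works:
    G.add_nodes_from(authors)
    G.add_edges_from(combinations(set(authors), 2))

print(G.number_of_nodes(), G.number_of_edges())
print(list(nx.isolates(G)))
```

Because `nx.Graph` stores at most one edge per pair of nodes, authors A and B remain linked by a single edge although they share two citing works.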

The co-authorship network is based on the citing works, and it is a collaboration network of scientists influenced by the main article. There are no multiple edges in this graph, so two co-authors are linked by a single edge regardless of the number of their common citing works. The network has 5110 nodes and 18,756 edges, including 19 isolated nodes (see Fig. 5), and its average degree is 7.34.
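As a quick consistency check against Definition 8, the reported numbers of nodes and edges reproduce the stated average degree:

$$\frac{2e}{n}=\frac{2\times 18756}{5110}=\frac{37512}{5110}\approx 7.34.$$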

Fig. 5 The co-authorship network based on the citing works. The graph is colored according to communities that were detected by the modularity-based algorithm (Color figure online)

The co-authorship network shown in Fig. 5 prompts us to suppose that only a few groups organized their research in a cooperative manner, and the influence of the main article was extended to a great number of individual scientists or smaller groups.

The degree distribution of the co-authorship network illustrated in Fig. 6 shows that most authors collaborated with five other scientists on citing works (not necessarily working on the same article with all five co-authors).

Fig. 6 Degree distribution of the co-authorship network based on the citing works. The figure is cropped at the degree of 30
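A degree distribution of this kind can be read off with the NetworkX function `degree_histogram`. The following sketch uses a small illustrative graph rather than the co-authorship network itself.

```python
import networkx as nx

# Small illustrative graph (not the network of the paper).
G = nx.Graph([(0, 1), (0, 2), (0, 3), (1, 2)])

# histogram[d] is the number of nodes that have degree d.
histogram = nx.degree_histogram(G)

# The most frequent degree corresponds to the peak of the distribution.
most_common_degree = max(range(len(histogram)), key=histogram.__getitem__)

print(histogram, most_common_degree)
```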

Table 1 highlights the betweenness centrality score, the closeness centrality score, degree, and h-index (see Definitions 7 and 10–12) of the top 5 scientists in the co-authorship network. Scientists with the highest betweenness centrality are the most important here since they exert more control over the network.

We also investigated the publications of the scientists listed in Table 1 from some other viewpoints. The results are summarized in Table 2.

The data presented in Tables 1 and 2 prompt some comments:

(1) Three of the five authors (R.-C. Sun, F. Wang, J. Zhang; 60%) are from the P. R. of China (Beijing, Dalian, Chengdu), in accordance with the trend shown in Fig. 12 (later in this paper). The other two authors, however, are from countries showing minor contributions to the set of the citing works: K. Barta (Graz, Austria) and G. T. Beckham (Golden, CO, USA).

(2) The numbers of publications of these 5 authors within the citing works show only an approximate correlation with the positions of these authors in Table 1: K. Barta 23 papers [36,37,38,39,40,41,42,43,44,45,46,47,48], R.-C. Sun 13 papers [49,50,51,52,53,54,55,56,57,58,59,60,61], F. Wang 5 papers [62,63,64,65,66], G. T. Beckham 11 papers [62, 67,68,69,70,71,72,73,74,75,76] and J. Zhang 4 papers [63, 77,78,79].

(3) The citation counts of these papers (SciFinder, 11 December 2022) are even more variable: K. Barta 3262, R.-C. Sun 388, F. Wang 532, G. T. Beckham 4611, J. Zhang 183, with per-paper averages ranging from 29.85 to 419.18. Nevertheless, the averages are fairly high, considering that a large part of these publications appeared in the past 2–4 years and therefore have not yet had “enough time” to become highly cited.

(4) An important conclusion of the more detailed analysis of the data behind Tables 1 and 2 is that the large majority of the publications of all 5 authors in these tables dealt with the chemistry or technology of the utilization of lignin (and its derivatives), which is in full accordance with the trend shown in Fig. 4, obtained from the entire data set of the citing works. The lignin-related publications of the 5 authors in Table 1 are K. Barta 17 (73.9%), R.-C. Sun 12 (92.3%), F. Wang 3 (60.0%), G. T. Beckham 5 (45.5%) and J. Zhang 3 (75.0%); from a total of 53 publications, 37 (69.8%) dealt with lignin and closely related chemistry. This is a much higher percentage than the share of lignin in the total quantity of biomass [31, 32] (up to 25%), indicating that progress in lignin chemistry is a very “hot” topic in biomass-valorization aspirations.

(5) For all 5 scientists in Table 1, the average number of citations to the citing works dealing with lignin is higher than the average calculated from all of their citing works. This fact, too, indicates the elevated topicality of lignin-centered research in the field of biomass valorization.

(6) An interesting “by-product” of this analysis was the finding that K. Barta (Graz, Austria) has an active cooperation [39, 46, 80,81,82] with the Groningen (NL) group of B. L. Feringa (Nobel Prize in Chemistry 2016). This cooperation very recently brought an outstanding result that is highly relevant to the present analysis: scientists of the Barta group and the Feringa team experimentally constructed a kind of molecular motor by sophisticated chemical transformations of the degradation products of lignin and demonstrated its motor-like behavior experimentally and theoretically [82].

(7) Signs of cooperation between the groups mentioned in Table 1 can also be found. Reference [62] reports a cooperation of three of the five groups in question, and Reference [63] a cooperation of two of these groups. It is worth mentioning that the 3-group cooperation [62] dealt with problems of the biorefining of lignin, and it is perhaps no exaggeration to state that this direction seems to be one of the most important present and future directions of biomass-valorization research.
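The averages and percentages quoted in comments (3) and (4) follow directly from the counts given there, as the following short sketch shows. All numbers are taken from the text above; only the variable names are our own.

```python
# Per author: (citing works, citations of these works, lignin-related works),
# as quoted in comments (2) to (4).
data = {
    "K. Barta": (23, 3262, 17),
    "R.-C. Sun": (13, 388, 12),
    "F. Wang": (5, 532, 3),
    "G. T. Beckham": (11, 4611, 5),
    "J. Zhang": (4, 183, 3),
}

averages = {name: cites / papers for name, (papers, cites, _) in data.items()}
lignin_share = {name: 100 * lig / papers for name, (papers, _, lig) in data.items()}

for name in data:
    print(f"{name}: {averages[name]:.2f} citations per paper, "
          f"{lignin_share[name]:.1f}% lignin-related")

# Overall share stated in comment (4): 37 of 53 distinct publications.
print(f"{100 * 37 / 53:.1f}%")
```

The stated total of 53 publications is smaller than the plain sum of the individual counts (56), presumably because jointly authored works such as [62] and [63] are counted only once.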

5 The largest connected component of the co-authorship network

We used the Clauset–Newman–Moore greedy modularity maximization algorithm to derive communities in the co-authorship network [83].
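In NetworkX, this algorithm is available as `greedy_modularity_communities`. The sketch below applies it to a small illustrative graph made of two triangles joined by one edge; the graph is an assumption for demonstration, and the call is shown with default parameters, which need not match the settings used for the figures of this paper.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

# Greedy agglomeration: repeatedly merge the pair of communities that
# increases modularity the most, and stop when no merge improves it.
communities = greedy_modularity_communities(G)

print([sorted(c) for c in communities])
print(modularity(G, communities))
```

Each returned community can then be mapped to a color when drawing the network.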

Figure 7 shows the largest connected component of the co-authorship network. The network is colored by the communities. The most outstanding results are compatible with the results presented in Tables 1 and 2.

Fig. 7 The largest connected component of the co-authorship network

The average local clustering coefficient of the largest connected component is 0.87, and the global clustering coefficient is 0.6. However, the top authors from this component (already listed in Table 1) have much lower clustering coefficients: Katalin Barta (0.14), Run-Cang Sun (0.16), Feng Wang (0.27), Gregg T. Beckham (0.14) and Jian Zhang (0.49). This can be explained as follows: the local clustering coefficient of a node \(v\) measures the local density of edges in the neighborhood of \(v\), quantifying how close the adjacent nodes of \(v\) are to being a complete graph. If a scientist has one multi-authored citing work and has not written any citing work with authors other than these co-authors, then all of their neighbors are connected to each other and their local clustering coefficient is equal to 1. For authors who connect many communities of the network (and therefore have a high betweenness centrality), in contrast, many pairs of co-authors have never worked together, so the clustering coefficient is lower. Thus, there is a negative correlation between betweenness centrality and clustering coefficient, which is clearly shown in Fig. 8.

Fig. 8 The scatterplot of the local clustering coefficient against the betweenness centrality

Figure 8 shows the scatterplot of the local clustering coefficient against the betweenness centrality calculated on the largest connected component of the network. The top 5 scientists listed in Table 1 are annotated.
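The mechanism behind this negative relationship can be reproduced on a toy network in which one author bridges two otherwise separate groups. The sketch below is illustrative only; the node labels and group sizes are hypothetical.

```python
from itertools import combinations

import networkx as nx

group_a = ["a1", "a2", "a3"]
group_b = ["b1", "b2", "b3"]

G = nx.Graph()
G.add_edges_from(combinations(group_a, 2))  # group A: every pair co-authored
G.add_edges_from(combinations(group_b, 2))  # group B: every pair co-authored
# The "hub" author has co-authored with every member of both groups.
G.add_edges_from(("hub", member) for member in group_a + group_b)

clustering = nx.clustering(G)
betweenness = nx.betweenness_centrality(G, normalized=False)

print(clustering["hub"], betweenness["hub"])
print(clustering["a1"], betweenness["a1"])
```

The hub lies on every geodesic between the two groups and thus has the highest betweenness, but only 6 of the 15 pairs of its neighbors are linked, so its clustering coefficient is 0.4. Every other node has clustering coefficient 1 and betweenness 0.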

6 The top two places of the list of journals publishing citing works

Figure 9 gives the broad spectrum of journals publishing at least 12 citing works. There are 20 such journals, with Green Chemistry and ACS Sustainable Chemistry and Engineering in the top two places (110 and 83 citing works, respectively).

Fig. 9 Journals that published at least 12 citing works

The results shown in Fig. 9 led us to construct the subgraphs of citing works published in Green Chemistry and ACS Sustainable Chemistry and Engineering, which are shown in Figs. 10 and 11.

Fig. 10 The subgraph of citing works published in Green Chemistry

Fig. 11 The subgraph of citing works published in ACS Sustainable Chemistry and Engineering (Color figure online)

We have extracted the different countries from the affiliations of the authors of the papers and counted the occurrences/frequencies of the countries. Figure 12 illustrates the top 10 most frequent countries and leads us to the conclusion that China has the highest number of citing works.

Fig. 12 Top 10 countries

Figure 13 provides the co-authorship network of the countries: the size of the nodes (countries) denotes the betweenness centrality of the countries, and the edge width represents the number of papers written in collaboration between the countries.

Fig. 13 Co-authorship network of the countries

7 Citations of the examined citing works

Beyond the statistics presented in Table 2, we checked how many citations each citing work (article) received and how it correlates with the number of authors and the internationality of the articles. An article is international if its affiliations include at least two different countries. Note that even a single-authored article can be international if the author has more than one affiliation in different countries.
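The internationality criterion and the subsequent grouping can be sketched as follows. The records, the field names and the citation counts are hypothetical placeholders chosen for illustration; they are not results of the present study.

```python
from collections import defaultdict
from statistics import median


def is_international(affiliation_countries):
    """An article is international if its affiliations span at least two countries."""
    return len(set(affiliation_countries)) >= 2


# Hypothetical records (illustrative only).
records = [
    {"countries": ["China"], "citations": 10},
    {"countries": ["China", "USA"], "citations": 30},
    {"countries": ["Austria", "Netherlands", "Austria"], "citations": 50},
    {"countries": ["USA", "USA"], "citations": 20},
]

citations_by_group = defaultdict(list)
for record in records:
    citations_by_group[is_international(record["countries"])].append(record["citations"])

for international, values in sorted(citations_by_group.items()):
    print(international, median(values))
```

Counting distinct countries, rather than distinct authors, means that a single author with affiliations in two countries is also classified as international, as noted above.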

Figure 14 shows the box plot of the distribution of the number of citations of the citing works, grouped by the number of countries (which are listed in the affiliations given in these papers). As shown in this figure, papers written in international collaboration receive more citations than non-international papers. This result is in accord with the results of related works [84, 85].

Fig. 14 Box plot of the distribution of the number of citations of the citing works, grouped by the number of different countries listed in the affiliations

In addition, Fig. 14 shows an increasing trend up to papers belonging to five countries. Note that the number of papers belonging to more than five countries is insignificant.

We also studied the relationship between the number of authors and the number of citations. Figure 15 shows the distribution of the number of citations of citing works against the number of authors, grouped by a binary variable that indicates whether the paper was written in an international collaboration. The distribution indicates that while international papers receive more citations in general, papers with several co-authors typically do not receive more citations than sole-author works. We see a downward trend up to 10 authors, and then a slight upward trend from there.

Fig. 15 The box plot of the distribution of the number of citations grouped by the number of authors and colored by whether the paper has been written in international collaboration or not: orange denotes international papers, and blue corresponds to non-international papers

Finally, the last figure (Fig. 16) shows the distribution of the number of citations grouped by research areas. Only research areas with at least 10 articles are shown. Here we do not see significant differences in the number of citations among the research areas. This result indicates the broad spectrum of interdisciplinary approaches applied to solving the problems of biomass valorization.

Fig. 16 The distribution of the number of citations grouped by research areas

8 Concluding remarks

The statistical analysis of publications that cited the important summary of biomass valorization [13], published 11 years ago, revealed some interesting features of research activity in this field. The results of the statistics indicated the most topical research directions, as well as the most successful persons, groups, countries, and journals. The numerical nature of the type of analysis used in the present paper also gives an idea of the “size” of the differences disclosed by the analysis, which can be highly useful when planning new research programs or continuing already existing efforts. As an example, we mention here only the publication of Abu-Omar et al. [62], where the cooperation of three important groups indicated the importance of biorefining technology in lignin valorization, a conclusion that can most probably also be generalized to several other fields of biomass chemistry and technology.