Most of the GP statistics are derived from the GP bibliography (see footnote 1). The bibliography is available on the Internet in a variety of formats and locations, and entries from it have been incorporated into other online bibliographic resources. Unfortunately, no similar effort has been undertaken for the literature on evolvable hardware, so this section deals only with GP.
As of November 2009 there were 5253 GP entries in the GP bibliography (excluding late breaking papers, unpublished work, miscellaneous items, master's theses, undergraduate student reports and some short posters). (5253 is nearly twice the number in 2004.) Figure 1 shows the number of entries of each type according to when they were published. Naturally most papers were published in conference proceedings. Initially there was an exponential rise, with the number of publications doubling every year from 1988 to 1996. This has been followed by a rapid linear increase since 1997. (Figure 2 right shows that this is typical behaviour for Computer Science bibliographies.) In the case of GP (and perhaps others) the effort available to maintain the bibliography has not increased exponentially. Hence the bibliography has lagged behind the growth in the field. A lot of effort was devoted to ensuring the GP bibliography was as complete as possible up to 1996 (corresponding to the publication of Advances in Genetic Programming volume 2, which contains an annotated bibliography of GP [3]). Unfortunately, since then some GP publications have escaped recording in the bibliography. This leads to a bias in favour of those researchers who actively support the maintenance of the bibliography.
Figure 2 plots the number of people active in GP (i.e. according to the GP bibliography they published in a given year). Figure 2 shows that even in recent years (2000 onwards) almost half the authors who published in a given year were new to GP. The total number of people who have published GP-related papers is about 3494. (Almost twice the figure in 2004.) The number of new authors per year, along with the rise and fall of publication types like PhD dissertations, gives a sense of how vibrant the GP community is. However, as GP has matured, more application papers appear in biology, chemistry and other non-computer science journals. Unfortunately this probably means some recent articles are missing from the bibliography.
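The counts of active and new authors per year described above can be sketched as follows. This is only an illustration: the record layout (a year paired with a list of author names) and the function name author_activity are assumptions made for the example, not the format or tooling of the GP bibliography, and the records are made up.

```python
from collections import defaultdict

def author_activity(records):
    """Return {year: (active_authors, new_authors)}.

    records is an iterable of (year, [author, ...]) pairs; this layout is a
    hypothetical stand-in for real bibliography entries.
    """
    by_year = defaultdict(set)
    for year, authors in records:
        by_year[year].update(authors)

    seen = set()   # everyone who published in an earlier year
    stats = {}
    for year in sorted(by_year):
        active = by_year[year]
        new = active - seen          # authors with no earlier publication
        stats[year] = (len(active), len(new))
        seen |= active
    return stats

# Made-up records, not taken from the GP bibliography.
records = [
    (2000, ["A", "B"]),
    (2000, ["B", "C"]),
    (2001, ["A", "D"]),
    (2001, ["E"]),
    (2002, ["D", "E", "F"]),
]
stats = author_activity(records)
for year, (active, new) in stats.items():
    print(year, active, new, f"{new / active:.0%} new")
```

An author counts as new in the first year in which they appear, so the very first year in any data set is always 100% new; only later years are informative.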
Figure 3 shows the distribution of the number of authors per GP publication (excluding MSc theses, unpublished work, etc.) for each year. We can see that as the field took off in the 1990s, single-author and two-author publications dominated. However, as we approach the 2000s, the number of publications with three, four and five authors has been steadily increasing. As GP is largely an empirical research field, it makes sense that as applications and analysis have matured, more collaboration is taking place, resulting in more multi-author papers. Also, as GP is applied to other disciplines, we would expect to see more co-authors appear on publications.
Use of the GP bibliography
The GP bibliography can be searched via the collection of computer science bibliographies. The Artificial Intelligence collection of bibliographies is the 4th largest computer science collection by subject. Within the AI collection, the GP bibliography is the 4th largest. (Up from 7th in 2004.) From the start of logging (April 2003) up to November 2009 there have been approximately 23,251 page views of the GP bibliography home page. Figure 4 shows that use of the bibliography web pages is typically concentrated in Europe, North and South America, the Far East and India.
Online electronic versions of papers, even those on publisher web pages, can be directly linked to the bibliography. There are 4115 GP publications with such links (almost three times that in 2004). In almost all cases, possibly with a small amount of manual intervention, the paper is actually available via its links. This is an improvement on 2004, when about 10% of links were broken. In other words the text of 76% of GP papers is immediately available via hyperlinks in the bibliography. This is 25% more than 5 years ago. Most of the change comes from the increased proportion of journal articles (92 vs. 54%), conference proceedings (73% up from 45%) and papers in collections (46% up from 32%) which are available online. The fractions of PhD theses (69 vs. 63%) and technical reports (87 vs. 82%) are little changed. This is a little disappointing. For one of your papers to have an impact it must be available, and that means available on the web [4]. Yet still about a quarter of GP papers are not (as far as the main index is concerned) online.
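The figures quoted above can be related with a little arithmetic. The sketch below uses only numbers from the text (5253 entries, 4115 linked entries, 76% available). Reading the gap between the share of linked entries and the 76% figure as the share of links that do not work is an assumption made here, not something the text states; the two figures may also have been computed over slightly different sets of entries.

```python
total_entries = 5253        # GP entries as of November 2009 (from the text)
linked_entries = 4115       # entries with links to online copies (from the text)
reported_available = 0.76   # share of papers immediately available (from the text)

# Share of entries that carry at least one link.
link_coverage = linked_entries / total_entries

# ASSUMPTION: the difference between link coverage and the reported 76%
# is due to links that do not lead to the paper.
implied_working = reported_available / link_coverage

print(f"entries with links:   {link_coverage:.1%}")
print(f"implied working rate: {implied_working:.1%}")
```

Under that assumption roughly 97% of links lead to the paper, which is consistent with the statement that the paper is available in almost all cases.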
Use of the GP bibliography papers
Figure 5 shows that most requests to download GP papers are automatically generated by computers interrogating other computers. However, since many robots are written to try to conceal their owners and their purposes, it is impossible to be sure that web traffic is correctly allocated. The two main robots are those (apparently) belonging to Google and Yahoo. Much of the load is due to robots re-reading the papers (to check that they have not changed). Some bursts of activity, cf. Figure 6, appear to be concentrated at weekend nights to avoid inconveniencing other users. However, robot activity varies radically. Figure 7 suggests a vaguely log-normal distribution, that is, a skewed distribution with a long tail of high download rates rather than tight clustering around the average.
The bibliography is used continuously, at every hour of every day (including Christmas). There are no downloads recorded for 17 hours, indicating that the web site was probably down for only 17 hours in the whole year.
It appears that on average more than 20 GP papers are downloaded via the bibliography per day by people.
Over the past 3 years, the most popular papers (i.e. the most downloaded by people) have been tutorials on GP, followed by financial applications (cf. Table 1). (More than half the top 20 downloaded papers are on finance.) These are followed by surveys, user manuals for GP packages and more widely drawn applications. Naturally the most downloaded authors are those with the most online papers linked to the bibliography and the authors of popular tutorials or popular finance papers, cf. Table 2.
Table 1 Most frequently downloaded GP bibliography papers (from Sep 2006 to Oct 2009)
Table 2 Most downloaded GP bibliography authors
When calculating Tables 1 and 2 we have been as scrupulous as possible to ensure we include only real personal downloads and exclude all web robots. Unfortunately this is not easy, and so we have deliberately erred on the cautious side. By excluding cases where we are not sure, we will underestimate the number of downloads by people. Nonetheless the results should still give a fair indication of use of the GP bibliography by people.
Surprisingly, GP papers downloaded by people via the bibliography are mostly accessed using commercial Internet service providers (ISPs). Even the most active university (Essex) is not in the top ten. This could be because universities have subscriptions which encourage academics to search via publishers’ web pages, or simply because most people access the Internet via ISPs. Even university students who work from home or coffee shops, etc., may use an ISP rather than a university connection.
The GP coauthorship community
In 2006 Cotta and Merelo [25] used bibliographic data from DBLP [26] relating to many fields of evolutionary computation, including GP and Evolvable Hardware, to show the global structure of EC. Future reviews of genetic programming and evolvable machines might also use other literature analysis tools such as those recently used on Swarm Intelligence [27].
The GP bibliography has been used in a number of studies. For example, Luthi investigated research collaborations in GP using co-authorship [28] and how friendships change with time [29]. In [30] we showed how the “small world” sparse connections of the central connected component of the GP co-authorship graph can be analysed into eigen-collaborations.
As a result of this research, the bibliography now includes an online tool which displays the current collaboration network and allows users of web browsers to search and navigate through collaborations. It even allows direct access to the papers produced as fruits of such joint work. Figure 8 shows the central connected component of the GP bibliography.
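The idea of a central connected component can be illustrated with a minimal sketch. It builds a co-authorship graph from author lists (two authors are linked if they share at least one paper) and extracts the largest connected component by breadth-first search. The function names and the sample data are hypothetical; this is not the code behind the online tool or the analyses in [28-30].

```python
from collections import defaultdict, deque
from itertools import combinations

def coauthor_graph(papers):
    """Build an undirected graph: authors are nodes, shared papers are edges."""
    graph = defaultdict(set)
    for authors in papers:
        unique = set(authors)
        for a in unique:
            graph[a]                     # touch the key so solo authors become nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def largest_component(graph):
    """Return the set of authors in the largest connected component."""
    seen = set()
    best = set()
    for start in graph:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:                     # breadth-first search from start
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Made-up author lists, not taken from the GP bibliography.
papers = [["A", "B"], ["B", "C"], ["D", "E"], ["F"], ["C", "G"]]
central = largest_component(coauthor_graph(papers))
print(sorted(central))
```

In this toy data set A, B, C and G form the largest component, D and E form a separate pair, and F is an isolated single author.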