Introduction

The citation indexes established by Eugene Garfield are selective databases. In his book, Garfield (1979) explains that there are several reasons for selecting which journals to index: it is impractical or even impossible to list all scientific journals, and it is not economically feasible to index all of them. Thus there has to be a selection process, which is based on the core journals in each discipline (Bradford 1934). However, a multidisciplinary database turns out to cover much more than the core, because according to Garfield’s law of concentration (Garfield 1977), the “tail of the literature of one discipline consists, in a large part, of the cores of the literature of other disciplines” (Garfield 1979, p. 23). Based on this finding, the core literature of all scientific disciplines is estimated to be covered by no more than 1,000 journals. In 1979, the Science Citation Index already covered more than 3,000 journals, which is far more than the core. ISI included additional journals based on quality judgements. As of January 2009, the Science Citation Index Expanded, accessible through the Web of Science (WOS), indexes more than 6,650 journals (Thomson Reuters 2008), and the list is constantly expanding. As a comparison, Elsevier’s Scopus lists almost 12,000 active journal titles in the areas of health, life, multidisciplinary and physical sciences (Elsevier 2008).

Until recently, the Web of Science included only journal papers, with a few exceptions. One of the more notable exceptions was the Lecture Notes in Computer Science (LNCS) series, most volumes of which are proceedings of computer science conferences. As an example, the series published more than 550 volumes in 2008 alone. As of January 2009, WOS indexes 139,302 items from LNCS and an additional 25,999 items from its sub-series Lecture Notes in Artificial Intelligence (LNAI). The earliest LNCS records indexed by the Science Citation Index Expanded are from 1981, and between 1999 and 2005 LNCS was even included in the Journal Citation Reports. It is not quite clear why LNCS was indexed by ISI, because the vast majority of the items published in LNCS are conference papers, and LNCS does not cover the most prestigious computer science proceedings. The proceedings of the two major societies (ACM and the IEEE Computer Society) are published by the societies themselves and are not part of the LNCS series.

However, the above discussion is of little value, since in September 2008, ISI merged into WOS all the records from its two proceedings citation databases (Thomson Reuters 2008), Conference Proceedings Citation Index-Science (CPCI-S) and Conference Proceedings Citation Index-Social Science & Humanities (CPCI-SSH) covering proceedings from 1990 to present. When searching WOS, one can exclude these two databases from the search, but the citation counts are from the whole database and they include citations received also from proceedings publications indexed by ISI.

In this paper we examine the effects of this recent change on computer science publications. It is well known that in computer science, proceedings publications are considered to be at least as important as journal publications (see for example Kling and McKim 1999; Goodrum et al. 2001 or Visser and Moed 2005). Thus, the inclusion of citations to and from proceedings is expected to have a large influence on the number of publications and on the number of citations to publications. Here we considered the most highly cited publications of a sample of highly cited computer scientists (as defined by ISI Highly Cited Researchers, http://hcr3.isiknowledge.com/).

Literature review

One of the issues we consider in the paper is the re-publication of proceedings papers as journal papers later on. Drott (1995) examined the re-publication rate of papers published in the proceedings of the ASIS conferences and found that the re-publication rate was much lower than expected in other disciplines. He created a typology of conference paper functions:

  • Self improvement—as a venue to report initial results, which are later extended and rewritten in the form of a journal paper.

  • Group contribution—as a means of sharing information.

  • Final product—no intention to republish.

Goodrum et al. (2001) suggest an additional category: a substitute for journal publication. This category is based upon discussions with computer scientists who view conference proceedings as sufficient and do not feel the need to republish the results in journals. In our opinion this fourth category is well covered by “final product”.

Glänzel et al. (2005) found, based on data extracted from the 1994–2002 volumes of the ISI Proceedings database’s Science and Technology edition, that about half of the papers indexed there belonged to the field of engineering. In their categorization, computer science is viewed as part of engineering. Their results support the prevailing view that proceedings publications have great importance in computer science.

Moed and Visser (2007) produced an extensive report on the need for and feasibility of extending WOS with proceedings publications in the field of computer science. They explored this possibility for Dutch computer scientists. The WOS source and citation data were expanded with proceedings published in LNCS and with ACM and IEEE computer science conferences. The expanded database increased the coverage of the publications of Dutch computer scientists from 25% to 35%. The results were shown to Dutch computer scientists, who claimed that even with this extended coverage some of the important conferences were missed. The internal coverage of the extended database (i.e., the percentage of citations in the items indexed by the database that referred to items in the extended database) was 51%, which is only moderate compared to about 80% internal coverage in WOS for physics and chemistry. The citation impact of proceedings series was found to be more variable than that of annual volumes of journals, but the citation links between recurring conferences were found to be statistically similar to journal self-citation rates.

Bar-Ilan (2006) studied the citations to works of Michael Rabin, a prominent computer scientist, and found that among the top 12 most highly cited items based on sources indexed by WOS, there are two technical reports and one conference publication. This emphasizes that other means of publication (sometimes called grey literature) are of high importance in computer science not only in terms of quantity, but also in terms of visibility (citation count). She also developed the notion of several manifestations of a work based on the FRBR specifications (IFLA 1998). In computer science, several almost identical versions of the same work are rather often published, first as a technical report or a preprint, then as a proceedings paper and later as a journal paper and/or a chapter in an edited book. A similar notion, called a concept, is also mentioned by Moed and Visser (2007). This line of thought will be further developed in the current paper.

Goodrum et al. (2001) compared highly cited items in ISI’s Science Citation Index with Citeseer (now at http://citeseer.ist.psu.edu/ and no longer updated; for an updated version see Citeseerx beta at http://citeseerx.ist.psu.edu/), a computer science citation database where items are indexed and citations are extracted automatically. With one exception, all of the overlapping items between the 25 top cited items in both databases were books or book chapters, and none of the items in either list were proceedings papers.

Meho and Rogers (2008) compared the citation counts retrieved from WOS and Scopus for 22 top researchers in the area of human–computer interaction (HCI), which can be viewed as a subfield of computer science. The data from WOS were retrieved before the proceedings databases were merged into WOS. The results show that Scopus provided considerably better coverage, mainly due to the indexing of ACM and IEEE proceedings. However, the wider coverage did not significantly alter the rankings of the scientists based on citation counts.

Lisée et al. (2008) studied conference proceedings in general. They showed that conference papers age faster than journal papers, found that conference publications constitute about 8% of the references in engineering papers, and about 20% of the references in computer science papers. The data for the study was derived from the ISI Citation Indexes (without the Proceedings Citation Indexes). Frohlich and Resler (2001) used the Science Citation Index to determine the citation counts for all publications of the University of Texas Institute for Geophysics, and found that papers in what they called “mainstream journals” receive on average considerably more citations than papers published in proceedings.

The rest of the paper is organized as follows: first we describe the research objectives and the data collection and analysis. In the results section the number of publications with and without proceedings papers and the percentage of citations from proceedings papers in the h-core are analyzed. Finally we discuss the problem of “re-publications”, i.e., proceedings papers that are published later on as journal papers.

Methods

All of the above-mentioned studies examined ISI citation patterns before the merger with the proceedings citation databases. This merging created an entirely new situation where citations are extracted from a much larger database, and thus it is of great importance to study and understand the effects of this change. It should be noted that the study conducted by Moed and Visser (2007) also created an expanded database for computer science, with the papers from the most important conference series added to the ISI database. The effects of this change were studied for Dutch computer scientists irrespective of their scientific standing, whereas here we consider highly cited computer scientists, and study the changes in the number of publications, number of citations and the sources of citations.

In the current study we asked the following questions:

  • What is the effect of the inclusion of proceedings papers on publication counts?

  • What percentage of the top-cited items are conference publications?

  • What is the percentage of the citations to the top-cited items that come from conference proceedings?

  • What proportion of the conference publications are later re-published as journal papers?

As a case study we chose computer science, where the importance of conference proceedings is known to be high. More specifically, the current study concentrated on highly cited computer scientists.

Data collection and analysis

The list of highly cited computer scientists was retrieved from the ISI Highly Cited database (http://hcr3.isiknowledge.com/home.cgi). As of mid-December 2008 the list of highly cited computer scientists comprised 339 researchers. Out of this list a random sample of size 47 was created. In a few cases the names were highly ambiguous, and even after refining the publications by subject area to computer science related fields it was difficult to tell whether the remaining list contained only the publications of the specific researcher and whether most of his/her publications were included. In these cases the next researcher in the alphabetic list was selected instead of the researcher chosen for the random sample.

For each selected researcher we downloaded his/her list of publications. Researchers with middle initials sometimes appear without their middle initials, thus we searched both variations, i.e., when searching for the indexed publications of Barbara H. Liskov, the query was Liskov BH or Liskov B in the author field. For each researcher two queries were submitted, once to all databases (SCI Expanded, SSCI, A&HCI, CPCI-S and CPCI-SSH) and once without the two proceedings databases (CPCI-S and CPCI-SSH). In each case the results were filtered to include relevant subject areas only (computer science and its subcategories, electronic and electrical engineering, applied mathematics and telecommunications). The number of journal articles and proceedings papers as categorized by WOS (under document type) was recorded. It must be noted that the document type categorization is not perfect. In our context we noticed special problems with the Lecture Notes in Computer Science, which is sometimes categorized as a journal and sometimes as proceedings. In addition many of the 1990–1992 LNCS records are indexed twice (once as a journal and once as a proceedings paper), inflating both the publication and the citation counts. This double-indexing has been corrected since, but the analyzed dataset includes doubly indexed items.
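The name-variant step described above can be illustrated with a minimal Python sketch. The function name `author_variants` is hypothetical, and the sketch assumes a simple "given names, then single-word surname" layout; it is not the procedure actually used for the data collection, which was carried out through the WOS interface.

```python
def author_variants(full_name):
    """Return the 'Surname Initials' variants to search for one researcher.

    Assumes the last token is the surname and all earlier tokens are given
    names or initials (an illustrative simplification).
    """
    parts = full_name.replace(".", "").split()
    surname, given = parts[-1], parts[:-1]
    initials = "".join(p[0].upper() for p in given)
    variants = [f"{surname} {initials}"]
    # A researcher with a middle initial may also be indexed without it.
    if len(initials) > 1:
        variants.append(f"{surname} {initials[0]}")
    return variants


# Variants for the example given in the text, joined into one author query.
query = " OR ".join(author_variants("Barbara H. Liskov"))
print(query)  # Liskov BH OR Liskov B
```

Multi-word surnames and hyphenated given names would need extra handling that this sketch does not attempt.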

When searching by author, at the time of data collection in December 2008, WOS searched not only for authors but for editors as well, and retrieved all items published in volumes/proceedings edited by the researcher. For example, 508 records were retrieved for Oscar Ibarra, but he authored only 206 items according to WOS. We could not find any systematic way to exclude edited but not authored items through the WOS interface, but this was easily done on the files downloaded from WOS’ Marked List feature in text-delimited form. Sometime during the first part of 2009, WOS added a new field, “editor”, and since then an author search retrieves only items authored by the given person. All items edited but not authored by the selected researcher were excluded from the analysis.
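The exclusion of edited but not authored items can be sketched as a simple filter over the downloaded records. The record layout below (dictionaries with `authors` and `editors` lists) and the toy records are assumptions made for illustration; the actual field names of the WOS export are not reproduced here.

```python
def authored_only(records, name_variants):
    """Keep only records in which one of the name variants is an author.

    `records` is a list of dicts with an 'authors' list (hypothetical layout).
    Items where the researcher appears only as editor are dropped.
    """
    variants = {v.lower() for v in name_variants}
    return [
        r for r in records
        if any(a.lower() in variants for a in r["authors"])
    ]


# Toy records, invented for illustration only.
records = [
    {"title": "Paper A", "authors": ["Ibarra OH", "Smith J"], "editors": []},
    {"title": "Paper B", "authors": ["Jones K"], "editors": ["Ibarra OH"]},
    {"title": "Paper C", "authors": ["Ibarra O"], "editors": ["Ibarra O"]},
]
kept = authored_only(records, ["Ibarra OH", "Ibarra O"])
print([r["title"] for r in kept])  # ['Paper A', 'Paper C']
```

Paper C is kept because the researcher is both author and editor; only items that are edited and not authored are removed.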

Next we determined the researcher’s h-index (Hirsch 2005) using the Citation Report option of WOS and removing items that were edited but not authored by the specific researcher. For each of the h_i items (where h_i is the h-index of researcher i), we recorded the number of journal articles and the number of proceedings papers that cited the item, based on the information provided by the “Analyze Results” option of WOS. We also noted the number of proceedings papers among the h_i items.

Data for the 47 selected researchers were collected during the second half of December 2008. In the next section we provide descriptive statistics derived from the collected data.

Results

Publication counts

Table 1 displays the publication counts of the selected scientists with and without the Conference Proceedings Citation Indexes (CPCIs). We see that on average the publication counts increase by 39% with the addition of the new databases. We also see very large variations between the researchers.

Table 1 Publication counts of the selected scientists with and without the Conference Proceedings databases

In Table 2 we see the number of articles and proceedings papers WOS indexes, with and without the proceedings citation indexes. As stated before, even before September 2008, WOS indexed some proceedings papers, but their number increased considerably after the merge with the proceedings citation indexes.

Table 2 Distribution of articles and proceedings papers of the selected scientists with and without the Conference Proceedings databases

Note that the sum of the journal and proceedings papers is usually somewhat lower than the total number of indexed items. This is because besides articles and proceedings papers there are additional document types (e.g. editorials or review articles). On average 39% of the publications are proceedings papers (SD: 17%), and 52% are journal articles (SD: 16%) when the proceedings citation databases are included. When interpreting these numbers, one has to take into account that the proceedings data are only from 1990 onwards, whereas the journal data are from 1965 onwards, and some (or even most) of these highly cited researchers were active before 1990; thus the actual percentage of proceedings papers is probably higher than what we see from the WOS data. Before the merge, on average for each of these highly cited computer scientists 69% of their indexed publications were articles and 19% proceedings papers. Again, this relatively large percentage of proceedings papers was mainly due to the indexing of the Lecture Notes in Computer Science and the Lecture Notes in Artificial Intelligence series in the citation indexes even before the inclusion of the proceedings citation databases.

We presented Tables 1 and 2 for comparison purposes. From here on the data only relates to Web of Science with data from the conference proceedings citation indexes.

Citation counts

In Table 3 we display the citation data: total citations, the h-index, the number of proceedings papers in the h-core, the number and percentage of journal citations to the h-core, and the number and percentage of proceedings citations to the h-core. The h-core, as defined by Burrell (2007), is the set of papers that are included in the computation of the h-index, where the h-index (Hirsch 2005) of an author is the unique number h such that h of his/her papers have h or more citations, and the rest of the papers have at most h citations. The sum of the percentages of the journal and proceedings papers in the h-core is less than 100%, because of the additional document types that are not recorded in the table.
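The definitions of the h-index and the h-core given above can be made concrete with a short Python sketch. In the study itself the h-index was read from the WOS Citation Report; the functions and citation counts below are illustrative only.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_core(papers):
    """Return the h most cited papers from a list of (title, citations).

    When several papers are tied at exactly h citations, which of them
    belong to the h-core is ambiguous; this sketch keeps the first ones
    in sorted order.
    """
    ranked = sorted(papers, key=lambda p: p[1], reverse=True)
    return ranked[: h_index([c for _, c in ranked])]


# Invented citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Here four papers have at least four citations each, while the fifth has only three, so the h-index is 4 and the h-core consists of the four most cited papers.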

Table 3 Citation data of the selected scientists—journal citations and proceedings citations

The average h-index of these highly cited researchers is 19.4. Out of the papers in the h-core only 12.7% on the average are proceedings papers, i.e., most of the highly cited papers (at least in this sample) are journal articles. There are 14 scientists in the list for whom there are no proceedings papers at all in the h-core. The highest number of proceedings papers in the h-core is 11, which is 45.8% of the size of the h-core for Rajeev Alur.

There are only two cases where the publication with the highest number of citations is a proceedings paper. These are:

  • Broder, A; Kumar, R; Maghoul, F; Raghavan, P; Rajagopalan, S; Stata, R; Tomkins, A; Wiener, J (2000). Graph structure in the Web. Computer Networks-The International Journal of Computer and Telecommunications Networking, www2000, 33, 309–320

and

  • Tarokh, V; Seshadri, N; Calderbank, AR (1998). Space–time codes for high data rate wireless communication: Performance criterion and code construction. IEEE Transactions on Information Theory, 1997 IEEE International Symposium on Information Theory, 44, 744–765.

The first paper was co-authored by Prabhakar Raghavan and the second by Nambu Seshadri. Note that both proceedings papers were published in special issues of journals. Such items were sources of proceedings papers in the Web of Science even before the merge with the proceedings citation databases.

On the other hand, citations from conference papers constitute on average 42.9% of the citations to the papers in the h-core. For 17 researchers (36.2%) more than 50% of the citations to their h-core are from proceedings publications. Thus we see that even though conference papers are considered very important in computer science, journal papers receive more citations on average than proceedings papers, at least in our sample. However, the results indicate that proceedings are a major source of citations, and the extension of WOS with the Proceedings Citation Indexes has a major impact on citation counts.

Re-publication of proceedings papers

Proceedings papers can be seen as a first publication of a result (sometimes a technical report or a preprint precedes it). Results published at a conference can later be re-published (perhaps in an extended form) as a journal paper. To study the extent of this behaviour, we examined the publication lists of half of the sampled researchers, and looked for identical or almost identical titles of proceedings papers and journal articles. The results are displayed in Table 4. Note that we looked for near identical titles only; if the titles of the journal articles were considerably different we were not able to identify them as re-publications of the original. The findings are based on the titles only, and it is quite possible that the re-published items are considerable extensions of the originals. In addition, the journal coverage is from 1965 onwards (our institutional subscription provides access from 1965 onwards), while the proceedings coverage is only from 1990 onwards. However, even if this is the case, the re-publication of works has a considerable effect on publication and citation counts; this point is further discussed in the next section of the paper.
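A title comparison of this kind could be automated along the following lines. This is a sketch under stated assumptions, not the procedure used for Table 4: the normalization rule, the use of `difflib.SequenceMatcher` from the Python standard library and the 0.9 similarity threshold are all illustrative choices.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title, replace punctuation by spaces, squeeze whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def near_identical(a, b, threshold=0.9):
    """True if two titles are nearly identical after normalization."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def find_republications(proceedings_titles, journal_titles, threshold=0.9):
    """Return (proceedings title, journal title) pairs with matching titles."""
    return [
        (p, j)
        for p in proceedings_titles
        for j in journal_titles
        if near_identical(p, j, threshold)
    ]


pairs = find_republications(
    ["Graph structure in the Web"],
    ["Graph Structure in the Web.",
     "Space-time codes for high data rate wireless communication"],
)
print(len(pairs))  # 1
```

As with the title-based identification described above, such matching misses re-publications whose journal titles differ considerably from the proceedings title, so it gives a lower bound on the extent of re-publication.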

Table 4 Exemplifying the extent of re-publication

We see from Table 4 that on average almost a quarter of the proceedings papers are republished as journal articles. In a few cases the item published first (the proceedings paper) received more citations than the journal paper that was published later. In computer science there are considerable publication delays for journal articles, which can explain why the proceedings papers are cited.

Discussion

The results show that proceedings publications have a major effect on the publication and citation counts of highly cited computer scientists. In this section we discuss the implications of re-publication of works, which as we saw above is rather prevalent in computer science. Its extent is probably even greater than what can be seen in Table 4.

The FRBR (IFLA 1998) defines four entities for describing products of intellectual or artistic endeavour:

  • Work—a distinct intellectual or artistic creation, an abstract entity.

  • Expression—realization of a work. According to FRBR, “variant texts incorporating revisions or updates to an earlier text are viewed simply as expressions of the same work” (IFLA 1998, p. 16). On the other hand, rewritings, abstracts and summaries are considered to represent new works.

  • Manifestation—the physical embodiment of an expression of a work. When production involves changes in the physical form, it results in a new manifestation.

  • Item—a single exemplar of a manifestation.

Moed and Visser (2007) used the term concept to describe something similar to the FRBR entity work. Bar-Ilan (2006) used the term manifestation to describe different versions of the concept, i.e., re-publication of a result in different publication venues. However a closer examination of the FRBR taxonomy shows that manifestation is not the appropriate terminology for what we are trying to describe here. Taylor (2007) would almost definitely view each republished item as a different work, because usually the journal publication extends and revises the proceedings paper. However, we prefer to talk about different expressions of the same work. Note that Wilson (1968) differentiates between work, text and example, where text can be a version of a specific work. Here we will use the terminology developed in FRBR.

Publishing several expressions of the same work has far-reaching effects on both publication and citation counts. When ISI indexed journal articles only, the extent of indexing several expressions of the same work was very small. Now the situation has changed considerably. The extension of WOS results in an increase in publication counts and also in citation counts because of the increase in the number of source items. Such an increase also takes place when ISI extends its journal coverage, but extending journal coverage does not necessarily increase the number of expressions of a work indexed by the database. The addition of proceedings papers as source items, however, does not simply increase the quantity of source items; it also increases the number of multiple expressions of the same work in the database. Although journal articles based on previously published proceedings papers are usually not identical to the proceedings paper, there is still a huge overlap in the reference lists of the two publications. Thus, multiple expressions of a work have a positive effect not only on the publication counts of the authors, but also on the citation counts of the items referenced in the publications. It therefore seems that researchers living in the “publish or perish” and “get cited” world should all benefit from multiple expressions of a work. Conversely, if for some reason the author fails to re-publish a proceedings paper as a journal paper (as happens quite often in computer science, where some researchers are not interested in publishing their results as journal articles), then not only does his/her publication count “suffer”, but so do the citation counts of the items referenced in the proceedings paper.

There is another point to be considered. It is not always enough to get cited; sometimes authors want (or need for promotion) to have highly cited papers that count when computing the h-index. In this case multiple expressions can have an adverse effect, because instead of citing a single expression of a work, the referring author can choose from several expressions, resulting in a dispersal of citations among the different expressions. It can happen that none of the expressions becomes part of the h-core, whereas the combined citation counts from all the expressions of the same work would have increased the h-index.
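The dispersal effect can be illustrated with a small worked example in Python. The work identifiers and citation counts are invented, and summing the counts of two expressions is a simplifying assumption: a citing item that references both expressions would be counted twice.

```python
from collections import defaultdict


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list, cites >= rank holds exactly for the first h ranks.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def merge_expressions(papers):
    """Sum citation counts of all expressions that share a work identifier."""
    totals = defaultdict(int)
    for work_id, cites in papers:
        totals[work_id] += cites
    return list(totals.values())


# Invented data: work "w4" has two expressions (proceedings paper and
# journal article) with 3 citations each.
papers = [("w1", 6), ("w2", 5), ("w3", 5), ("w4", 3), ("w4", 3)]

separate = h_index([cites for _, cites in papers])
merged = h_index(merge_expressions(papers))
print(separate, merged)  # 3 4
```

Counted separately, neither expression of w4 reaches the h-core and the h-index is 3; counted as one work with 6 citations, w4 enters the h-core and the h-index rises to 4.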

If we opt for counting citations to a single work instead of a specific expression, we come across interesting new problems: how do we differentiate between works and expressions? Does a thoroughly extended and revised paper, in which errors in the proceedings paper are corrected, constitute a new work? Or, when the set of authors of the proceedings paper is not identical to the set of authors of the journal article (as sometimes happens), are we still speaking of multiple expressions? These questions were of minor relevance when the citation database contained only a single expression for a very high percentage of works. Now that WOS is extended with the Proceedings Citation Indexes and Scopus also covers a considerable number of proceedings, we will have to deal with these questions.

Conclusion

This paper studied the effects of the extension of WOS with the Conference Proceedings Citation Indexes. As a case study we examined the publications and the citations of a set of highly cited computer scientists. The results based on WOS show a considerable increase in both publication and citation counts.

We recommend further studies in this area examining the effects of conference publications in Scopus and comparing the researchers’ publication lists with the items indexed by the Web of Science.

We also raised some theoretical questions regarding works, expressions and manifestations of products of intellectual endeavours.