Introduction

Areas of scientific research that generate intense interest from other scientists tend to be perceived as the most promising (Braam et al. 1988; Hirschman 1970), are particularly well funded (Boyack and Borner 2003), and are more likely to result in commercial discoveries (Narin et al. 1997; Trajtenberg 1990). In this paper we study small clusters of highly cited research, called “research fronts.” We work to provide quantitative and qualitative support for continued, focused study of these areas as important for understanding the development of science and technology more broadly. These areas of intensive work are interesting to R&D laboratories looking for future innovation breakthroughs, venture capitalists looking to allocate investment, governments interested in promoting emerging science, and researchers hoping to work on promising topics.

The long-term goal of this work is to develop a robust and efficient methodology for identifying and tracking highly cited research areas at the micro-specialty level. This includes detecting them as they emerge and understanding the role these fronts play in the development of science and technology. The broad requirement of this methodology is that it does not presuppose the existence of any specific research area, such as would be required in a traditional literature-searching approach, nor any prior knowledge about the scientific area, but instead relies on an objective, comprehensive monitoring of citations. It should be possible to increase or decrease the sensitivity of the detection by adjusting parameters and to make direct comparisons of different time slices. In addition, the method should be multidisciplinary and utilize field normalization to obtain a systematic view across different disciplines. The scope should be scalable from the micro-structure to the macro-structure of science to see the context of the innovation. Finally, the method should capture both social aspects and the topical content of scientific areas.

In order to establish this methodology for the identification and exploration of emergent research fronts, we first describe the distinguishing factors of such fronts. In the second section, we provide an overview of the existing literature and its contribution to the field. The third section of this paper delves into the proposed methodology of co-citation clusters. In the fourth section, we describe our quantitative data, the rationale behind the choice of variables, and the regression model used to analyze the relationship between emergence and absorption of research fronts. The qualitative analysis of the following section builds on this methodology to highlight examples from scientific research being conducted in top U.S. universities. The case studies ground our work with specific examples and explore interesting aspects of these cases, suggesting future extensions for this work. Finally, we discuss the overarching findings of this study, and reiterate the contributions of our proposed methodology to the field of science and technology studies.

Characteristics of research fronts

A research front can be conceptualized as the convergence of scientific findings and social interests. New scientific findings may initiate the process of front formation by attracting the interest of more scientists who form social ties and generate more findings. The relevance and bearing of each new finding is continuously defined and evolved by the group. The foci of interests can be driven, of course, by the sources of funding as well as perceived scientific potential. This combined intellectual and social process is seen most vividly in the publications and citation patterns in science and technology. It is manifest in the emergence of clusters of highly cited papers representing the key scientific findings that are cited jointly. The authors of these cited and citing papers form what Derek Price has called an “invisible college.”

For successful research fronts that generate important findings there are two possible outcomes—they can grow independently as areas of study, or be “absorbed” by others as the result of their impact. In the first case, the front may initially grow in size and then split off as a new field of research or even develop into a new discipline (Small and Greenlee 1990). Alternatively, in the second case, a successful front may have great influence on its field and thus be incorporated by it, effectively being “absorbed” by the appropriation of the insights of the front within a broader field. This process of absorption, as it pertains to specific findings and papers in science, was described by Robert Merton as “obliteration by incorporation,” in which explicit mention of prior knowledge can disappear because of its very success in generating interest and use (Merton 1968, 1972).

In this study we distinguish between these two outcomes by differentiating between fronts that “emerge” by growing in size (growth) and fronts that are “absorbed” as a result of their papers being increasingly cited (impact), resulting in a kind of absorption through diffusion. Which fronts emerge and which are absorbed, we hypothesize, is significant in determining the shape and structure of scientific and technical research as it evolves.

One notable aspect of research fronts is their potential to span traditional scientific disciplines. Potentially, for example, fronts that combine disciplines and challenge existing paradigms will have more difficulty being absorbed and may, in aggregate, presage paradigm shifts (Kuhn 1970). The progress of science is a result of the virtual and actual collaboration of thousands of scientists who, formally and informally, share their findings and build on one another’s work. The research on explicit collaboration between scientists has emphasized the value of cross-company alliances, informal networks, and social capital (Gittelman 2003). Interdisciplinary work, as a process for sharing information and as an inspiration for analogies, is often seen as one of the drivers of innovation (Amir 1985; Birnbaum 1981a, b; Ponzi 2002). In the absence of such interdisciplinary work, knowledge tends to become more compartmentalized and the interference of disruptive paradigms becomes more likely as tensions between distinct theories accumulate (Fleming and Sorenson 2001). We examine the role of interdisciplinary research in research fronts when knowledge combines from different disciplines in an intellectually cohesive manner (Cuhls 2003).

We use the Web of Science database from Thomson Scientific, which covers over 8,500 journals and over one million articles per year in the sciences and social sciences, and analyze it for emerging fronts representing new micro-evolutions in science. Our data consist of highly cited articles clustered through co-citation to obtain closely related sets of articles. Earlier studies have shown that these clusters, or fronts, correspond to small, socially cohesive groups of scientists working on closely related topics (Small 2003). A few studies have attempted to find or predict emerging areas of research, particularly over such a comprehensive database (Small 2005). Single-link clustering gives us a rapid and efficient means to analyze large data sets in successive time frames and to identify new or continuing clusters. Our methodology builds on the contributions made by earlier research aimed at delineating the patterns of science’s progression.

Identifying emerging research

The effort to understand innovation through an examination of co-citations among scientific and technical papers began at the Institute for Scientific Information (now Thomson Scientific) in the 1970s (Small 1976). As Sullivan et al. (1977) put it, “A series of claims for the technique of co-citation analysis…. The first and most important claim is that co-citation clusters ‘reflect the … cognitive structures of research specialties.’” Early research found that citation structure might be used to gain insight into the social structure of science and technology, that is, how knowledge changes and develops over time (Crane 1972; Garfield and Stevens 1965; Small 2003). The idea of schools of thought that could be revealed through citation analysis was crystallized conceptually by Crane (1972) based on older ideas of the social structure of science pioneered by Derek Price, Thomas Kuhn, and Robert Merton (Kuhn 1970; Merton 1972; Price 1961, 1963).

Much of the earlier research in this field has focused on delineating the structure of science using algorithms that find similar papers and organizing them into clusters (Small 1977). Later studies have mapped out specific fields such as scientometrics itself (Chen et al. 2002), management and information science (Culnan 1986, 1987), organizational behavior (Culnan et al. 1990), chemical engineering (Milman and Gavrilova 1993), economics (Oromaner 1981), and space communication (Hassan 2003). Recently, many researchers have focused on the visualization of these fields, developing tools such as crossmapping and DIVA (Morris and Moore 2000), HistCite (Garfield 1988; Garfield et al. 2003), and Pathfinder (White 2003), and methods for graphing large-scale maps of science (Small 1997). For a review of the seminal literature see Osareh (1996).

Morris has developed a method to help expert panels evaluate small research topics. This method organizes papers visually over time and studies the evolution of a topic, such as anthrax research. The focus is on temporal changes and timeline visualizations. Morris first clusters documents based on bibliographic coupling, the sharing of references, and then visualizes clusters using horizontal timeline tracks, plotting documents along them.

Small has developed a comprehensive method for identifying and tracking research fronts (Small 2003) based on the co-citation of highly cited papers. We build on Small’s methodology and suggest additional steps to weed out certain artifactual clusters.

Methodology for delineating research area

Unlike most methods for analyzing research areas, co-citation clustering is an a priori method that makes no assumptions about what research areas exist. Rather it selects whatever papers are highly cited using a global criterion and clusters these papers based on their pattern of co-citation. One limitation of this method is that it will not identify a specialty if none of its papers have become highly cited. Thus, the co-citation method will not detect an area immediately upon its emergence, but rather at some later stage in its development. In addition the method does not identify all papers that might be considered relevant to the area. Rather, it is designed to simply detect that a research area exists and provide a sample of its highly cited and citing papers. The distinguishing feature of the method is that it is designed to do a quick screening of the scientific landscape rather than a definitive delineation of some specific area.

For the purpose of this study, we have defined highly cited papers as the top 1% of papers in each of 22 broad disciplines (for field definitions, see http://www.in-cites.com/field-def.html). Since our goal is an automatic and easily updated surveillance of the scientific literature, the 1% threshold should be viewed as a parameter which can be adjusted up or down depending on the desired resolution of the analysis. The same 1% thresholds by discipline are used in Thomson Scientific’s Essential Science Indicators (ESI) web product.
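The selection step can be illustrated with a minimal sketch. It assumes a hypothetical input of (paper id, field, citation count) records and a fixed fraction per field; the function name, the tie-breaking rule, and the toy data are illustrative assumptions, not the ESI implementation.

```python
from collections import defaultdict
from math import ceil


def select_highly_cited(papers, top_fraction=0.01):
    """Return ids of the top `top_fraction` of papers by citations in each field.

    papers: iterable of (paper_id, field, citations) tuples (hypothetical format).
    Ties at the cutoff are broken arbitrarily by paper id (an assumption).
    """
    by_field = defaultdict(list)
    for paper_id, field, citations in papers:
        by_field[field].append((citations, paper_id))

    selected = set()
    for rows in by_field.values():
        rows.sort(reverse=True)                      # most cited first
        k = max(1, ceil(top_fraction * len(rows)))   # at least one paper per field
        selected.update(paper_id for _, paper_id in rows[:k])
    return selected


# Toy data: 200 physics papers and 100 biology papers, citations equal to the index.
toy = [(f"phys{i}", "Physics", i) for i in range(200)]
toy += [(f"bio{i}", "Biology", i) for i in range(100)]
highly_cited = select_highly_cited(toy)
```

Raising or lowering `top_fraction` corresponds to adjusting the resolution of the analysis, as the text describes.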

For this product roughly 30,000 papers are clustered on a bimonthly basis and grouped into co-citation clusters through a single-link process. All co-citations for the selected highly cited papers in the 22 fields are computed prior to clustering. A co-citation link is defined as a pair of highly cited papers co-cited two or more times. The integer co-citation frequency is normalized by dividing it by the square root of the product of the citation counts of the two papers, the so-called cosine similarity. A force-directed placement mapping method can be used to display the strongest co-citation links within a given cluster, as we show in Figs. 1–4.
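As a rough illustration of the normalization just described, the sketch below counts co-citations from reference lists and divides each count by the square root of the product of the two papers' citation counts. The input format and function name are assumptions, and citation counts are computed only from the citing papers supplied.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cocitation_links(citing_refs, highly_cited, min_cocitations=2):
    """Return {(a, b): cosine-normalized co-citation} for highly cited pairs.

    citing_refs: iterable of reference lists, one per citing paper (hypothetical).
    highly_cited: set of ids of the selected highly cited papers.
    """
    citations = Counter()     # how often each highly cited paper is cited
    cocitations = Counter()   # how often each pair is cited together
    for refs in citing_refs:
        cited = sorted(set(refs) & highly_cited)
        citations.update(cited)
        cocitations.update(combinations(cited, 2))

    links = {}
    for (a, b), n in cocitations.items():
        if n >= min_cocitations:  # a link requires two or more co-citations
            links[(a, b)] = n / sqrt(citations[a] * citations[b])
    return links


# Four citing papers; A and B are co-cited twice, A and C twice, B and C once.
citing = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["B"]]
links = cocitation_links(citing, {"A", "B", "C"})
# ("A", "B") -> 2 / sqrt(3 * 3); ("A", "C") -> 2 / sqrt(3 * 2); ("B", "C") is dropped
```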

Figs. 1–4 (clockwise from upper left) Discipline size over time, frequency count by discipline presented first

Single-link clustering is a simple and rapid way to extract strong patterns of links among very large sets of tens of thousands or even hundreds of thousands of papers, and is suitable for the large-scale and periodic analyses required by ESI. Because only a single co-citation link at a specified threshold is needed to join a paper to an existing cluster, the method has a tendency to create very large clusters by chaining unless the co-citation threshold is carefully controlled. Studies of individual clusters have shown that by varying this linkage threshold, upward for larger clusters or downward for smaller clusters, it is possible to identify the level at which chaining begins and below which the size of the cluster increases exponentially rather than linearly. It is possible to optimize the threshold for each single-link cluster by picking the lowest possible link threshold prior to this onset of sudden expansion. This can be achieved in an approximate way by defining a low starting level, setting a maximum cluster size, and incrementing the threshold until a cluster within the desired size limit is formed, in effect optimizing each cluster. The process is analogous to pruning a tree where none of the pruned branches are larger than a preset size (Small 1985).

The starting threshold used in this experiment was 0.3 of the cosine normalized co-citation, but individual clusters may form at higher thresholds. The clustering parameters along with the initial citation thresholds are adjustable parameters that control the level of resolution desired by the analyst. The cosine similarity of 0.3, a maximum cluster size of 50, and a cosine increment of 0.1 are used in this analysis and are the same as those used in ESI. We have purposefully selected very high co-citation thresholds because speed of analysis rather than high resolution was sought.
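The variable-threshold procedure can be sketched as follows. This is one plausible reading of the description, not the ESI code: clusters are taken as connected components over links at or above the current threshold, and any component larger than the maximum size is re-clustered among its own members at the next threshold. Function names and the toy links are hypothetical.

```python
def components(nodes, links, threshold):
    """Single-link clusters: connected components over links >= threshold."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in links.items():
        if weight >= threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return [g for g in groups.values() if len(g) > 1]  # drop unlinked papers


def variable_level_clusters(nodes, links, start=0.3, step=0.1, max_size=50):
    """Return a list of (cluster, threshold) pairs with every cluster <= max_size."""
    clusters = []
    pending = [(set(nodes), start)]
    while pending:
        members, threshold = pending.pop()
        for group in components(members, links, threshold):
            if len(group) > max_size:
                # Too large: retry this group alone at a stricter threshold.
                pending.append((group, round(threshold + step, 10)))
            else:
                clusters.append((group, threshold))
    return clusters


# A chain a-b-c-d-e held together by one weak link (c-d); max_size is 3 for the demo.
toy_links = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.35, ("d", "e"): 0.7}
result = variable_level_clusters("abcde", toy_links, max_size=3)
```

In the toy chain, all five papers join at 0.3, exceed the size limit, and split into {a, b, c} and {d, e} once the threshold rises to 0.4 and the weak c-d link is cut.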

To track clusters over time we look at successive time slices of data to determine the patterns of continuing highly cited papers from one dataset to the next. Such patterns of continuity are referred to as cluster strings. A new or emerging area is defined as a cluster of highly cited papers in one time period whose papers did not appear in any clusters in the immediately preceding time period. Continuing areas can be distinguished by whether they merge, split or continue from a single previous group.

Two sets of co-citation clusters were used representing two overlapping six-year time periods: 1998–2003 and 1999–2004. A six-year time frame was defined to provide ample time for papers to reach their peak citation year, which usually occurs in year 3 or 4 after publication, and diminishes thereafter. Thus a given cluster can include both older and newer papers but excludes classic or very highly cited method papers. Both cited and citing papers are restricted to these time spans. In our case studies we also include some of our results from clustering earlier time periods. Table 1 gives statistics on the datasets used: the number of clusters, highly cited papers, average citations per paper, and average publication year of papers.

Table 1 Statistics on clusters in two time periods

The field or discipline assignment of a cluster is determined by the journals in which the highly cited papers are published, using the journal classification mentioned above. The discipline weight for a cluster is defined as the number of papers in a particular discipline. We expect that larger clusters will have larger numbers of disciplinary assignments.

In Figs. 1–4 we use the 1999–2004 data to show where papers comprising the clusters are distributed by discipline in our dataset (as a percent of the total). The stability of most disciplines over time is notable, with the exception of an apparent slight trend upward in the biological sciences such as molecular biology, biology, and microbiology (Table 2).

Table 2 Discipline size over time, frequency count by discipline presented first

One type of cluster we find with the methodology might be considered an artifact of the publishing process. It is the result of a set of papers that artificially cite each other—for example, the case of a “single issue cluster” (Rousseau and Small 2006; Small 2005) formed when an editor creates a special issue of a journal and arranges each article to cite some or all of the other articles in the same issue, creating a citation clique. Normally, cited and citing document populations are somewhat distinct, but in the above case potentially every citing item is also a cited item.

Because clusters are defined for a multi-year period, e.g., 1999–2004, it is possible for a highly cited paper to also legitimately function as a citing paper for the group. This would happen, for example, if a paper citing one of the founding papers in the front became itself highly cited before the end of the time period. The degree to which citing papers are also highly cited papers would then, conceptually, measure the extent to which the papers are building on each other. To capture this we create a metric called endogeneity, discussed in detail later, which is the percentage of citing papers that are also cited papers. The average endogeneity for the file is quite low: 2.3%.

Only about 1.5% of clusters have an endogeneity percentage of 20% or higher. Most of these clusters are almost certainly the artifact of an editorial policy and do not reflect the emergence of a new research area. Therefore, for our quantitative analysis, we have excluded any cluster having an endogeneity percentage of 20% or higher. Naturally occurring levels of moderate endogeneity below 20% may indeed be a healthy sign for a research area, indicating that there are highly cited papers in the current citing paper population and that the front is building on its own findings efficiently (Pfeffer and Salancik 1978).
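The endogeneity measure and the 20% exclusion rule can be expressed compactly. The sketch below assumes each front is represented by a hypothetical dict holding its cited (highly cited) and citing paper ids; names and toy data are illustrative.

```python
def endogeneity(cited, citing):
    """Percentage of a front's citing papers that are also its cited papers."""
    citing = set(citing)
    if not citing:
        return 0.0
    return 100.0 * len(citing & set(cited)) / len(citing)


def exclude_artifacts(fronts, cutoff=20.0):
    """Keep only fronts whose endogeneity is below the cutoff (20% in the text)."""
    return {
        front_id: front
        for front_id, front in fronts.items()
        if endogeneity(front["cited"], front["citing"]) < cutoff
    }


toy_fronts = {
    # Every cited paper also cites the others, as in a special-issue clique.
    "special_issue": {"cited": {"a", "b", "c"}, "citing": {"a", "b", "c", "d"}},
    # One of ten citing papers is itself a highly cited paper in the front.
    "ordinary": {"cited": {"x", "y"}, "citing": {"x"} | {f"p{i}" for i in range(9)}},
}
kept = exclude_artifacts(toy_fronts)
```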

To sum up, research fronts consist of clusters of highly cited papers, with some upper bound on cluster size. The papers are linked by strong co-citation relationships at or above a defined normalized clustering threshold unique to each cluster. For each front there is a corresponding set of citing papers. The cited and citing paper sets can overlap and the same authors can appear in both sets.

Quantitative analysis

Variables

Growing, shrinking, stable, emerging, and exiting fronts

To explore the trends in nascent research we construct some variables to assist in our analysis. Following on previous work on emerging fronts (Small 2003), we measure the growth rates of fronts from the 1998–2003 dataset to the 1999–2004 dataset and categorize them as growing, stable, or shrinking. Growing fronts are those that have more papers in our 1999–2004 period than the sum of all of their contributing fronts in the 1998–2003 analysis (a “contributing front” means that at least one paper from an earlier front is in a later front). Similarly, shrinking fronts are those that are smaller than the sum of all their contributing fronts in the previous time period, and stable fronts are those for which the sum of all contributing fronts yields the same number of papers. Emerging fronts are fronts in the 1999–2004 dataset that contain no papers from the 1998–2003 dataset. Exiting fronts are fronts that existed in the 1998–2003 analysis but have no papers in any front in the 1999–2004 analysis. Some basic statistics about fronts are in Table 3.
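The five categories can be derived mechanically from two sets of clusters. The following sketch assumes each period is given as a hypothetical mapping from front id to its set of highly cited paper ids, with each paper belonging to at most one front per period.

```python
def classify_fronts(earlier, later):
    """Label later fronts and list exiting earlier fronts.

    earlier, later: dicts mapping front id -> set of paper ids (hypothetical format).
    Returns (labels for later fronts, sorted ids of exiting earlier fronts).
    """
    paper_to_earlier = {p: fid for fid, papers in earlier.items() for p in papers}
    labels = {}
    continued = set()
    for fid, papers in later.items():
        # A contributing front shares at least one paper with the later front.
        contributors = {paper_to_earlier[p] for p in papers if p in paper_to_earlier}
        continued |= contributors
        if not contributors:
            labels[fid] = "emerging"
            continue
        prior_size = sum(len(earlier[c]) for c in contributors)
        if len(papers) > prior_size:
            labels[fid] = "growing"
        elif len(papers) < prior_size:
            labels[fid] = "shrinking"
        else:
            labels[fid] = "stable"
    exiting = sorted(set(earlier) - continued)
    return labels, exiting


fronts_1998_2003 = {"E1": {1, 2, 3}, "E2": {4, 5, 6}, "E3": {7, 8}}
fronts_1999_2004 = {"L1": {1, 2, 9, 12}, "L2": {4, 5}, "L3": {10, 11}}
labels, exiting = classify_fronts(fronts_1998_2003, fronts_1999_2004)
```

In the toy data, L1 has four papers against three in its only contributor, L2 has two against three, L3 shares no papers with the earlier period, and E3 contributes to nothing.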

Table 3 Summary statistics by front type

Endogeneity

As noted above, the extent to which the cited papers set overlaps with the citing papers set for the front may be an important aspect of its potential growth, provided that the overlap is not the result of an artifact such as a single journal issue. A front that has a high cited-citing overlap is said to have high endogeneity, and reflects the compression of cited and citing generations. Scientists in such a front may have a better chance of building on each other’s work quickly and creating a “cohesive paradigm”.

As seen in the last row of Table 3, the level of endogeneity is generally higher among emerging fronts than among existing fronts. Additionally, growing fronts clearly display larger average levels of endogeneity.

Multidisciplinarity

We construct a variable for cluster multidisciplinarity by creating a Herfindahl index of the distribution of disciplines of the papers comprising the front. We do this by summing, over disciplines, the squared share of the front's papers that fall in each discipline. This marks the extent to which a front is composed of one main discipline or split between many disciplines. The closer a front's concentration score is to 1.0, the closer it is to being composed of one discipline only, and the closer it is to zero, the more it is fragmented between many disciplines.
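A minimal sketch of this concentration index follows, assuming the input is simply a list with one discipline label per paper in the front; the function name is hypothetical.

```python
from collections import Counter


def discipline_concentration(disciplines):
    """Herfindahl index: sum of squared discipline shares of a front's papers."""
    counts = Counter(disciplines)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a front must contain at least one paper")
    return sum((n / total) ** 2 for n in counts.values())


single = discipline_concentration(["Chemistry"] * 4)               # one discipline
split = discipline_concentration(["Chemistry"] * 2 + ["Physics"] * 2)
spread = discipline_concentration(["Chemistry", "Physics", "Biology", "Geosciences"])
```

A single-discipline front scores 1.0, an even two-way split scores 0.5, and four papers in four disciplines score 0.25, so lower values indicate more multidisciplinary fronts.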

Percent non-academic

For all authors we coded whether their affiliation was “academic” or “non-academic”—i.e., academic vs. government or industry. We found that academic institutions almost always had “univ,” “school,” “coll,” “insti,” or “ecol” in their titles. Some academic institutions were exceptions, so we added a number of more institution-specific filters such as “Berkeley,” “MIT,” “Harvard,” “polytechnic,” “politecnico,” and “polytechnique,” among others. We did not differentiate between government and industry in this variable. We found that the percent of academic (64%) and non-academic (36%) affiliations matched those in our four case studies, which we coded manually.
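A keyword filter of this kind might look like the sketch below. The marker list is taken from the text; matching "MIT" as a whole word (so that names such as "Mitsubishi" are not caught) is our assumption, since the text does not say how short markers were handled, and the function names are hypothetical. Substring markers such as "insti" can still produce false positives.

```python
import re

# Substring markers quoted in the text (lowercased).
SUBSTRING_MARKERS = (
    "univ", "school", "coll", "insti", "ecol",
    "berkeley", "harvard", "polytechnic", "politecnico", "polytechnique",
)
# Assumption: very short markers are matched only as whole words.
WHOLE_WORD_MARKERS = ("mit",)


def is_academic(affiliation):
    """Heuristically code an affiliation string as academic (True) or not."""
    text = affiliation.lower()
    if any(marker in text for marker in SUBSTRING_MARKERS):
        return True
    return any(
        re.search(rf"\b{re.escape(marker)}\b", text) for marker in WHOLE_WORD_MARKERS
    )


def percent_non_academic(affiliations):
    """Percentage of affiliations not coded as academic."""
    affiliations = list(affiliations)
    if not affiliations:
        return 0.0
    non_academic = sum(1 for a in affiliations if not is_academic(a))
    return 100.0 * non_academic / len(affiliations)


sample = ["Univ Penn", "IBM Research", "MIT", "Mitsubishi Electric"]
share = percent_non_academic(sample)
```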

Models

In our descriptive regression models, we contrast two dependent variables that capture the fates of successful research fronts: emergence and absorption. The first dependent variable, used to measure emergence, is the percentage increase in the front’s size (in number of papers) from the first period to the second. Since a growing front implies that more papers were drawn into its existing paradigm, this measure is meant to reflect the extent to which the front is emerging as a distinct area of research. The second dependent variable, measuring absorption, is the percentage change in the number of citations received by the papers in a front. This is meant to reflect the extent to which the front’s knowledge is being “absorbed” or incorporated into other research. The extent to which these dependent variables differ will determine whether a front focuses on “absorption” of its findings or “emergence” as a distinct area.

Our independent variables of interest are a continuous variable for multidisciplinarity, the Herfindahl index of discipline concentration, and a variable for endogeneity, which measures the percent of citing papers that are also cited papers. For both regressions we control for the size of the cluster, the number of citations a cluster has received and the average year of publication for papers within the cluster, the percent of authors in a cluster who are affiliated with non-academic institutions, and the discipline code(s) of the cluster.

Additionally, in each regression we control for the other dependent variable to further show the divergence of the two trends for fronts—i.e., for the regression predicting percent change in papers we control for percent change in citations, and for the regression predicting percent change in citations we control for percent change in papers. We include these last two controls to counter the natural collinearity of citations and number of papers. In other words, we expect size and number of citations to grow in tandem; however, what we are interested in quantifying is the incremental variation of citations after accounting for the increasing (decreasing) size of a front, as well as the incremental variation of growth after accounting for increasing (decreasing) citations. All regressions are reported using robust (Huber-White) standard errors.
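One way to write the two specifications described above is given here. The notation is ours and is only a sketch of the verbal description: for front $i$, $\Delta P_i$ is the percent change in papers, $\Delta C_i$ the percent change in citations, $H_i$ the Herfindahl concentration index, $E_i$ the endogeneity percentage, $\mathbf{x}_i$ the vector of controls (cluster size, citations received, average publication year, percent non-academic authors), and $D_{id}$ the 21 discipline dummies. The functional form is assumed to be linear.

```latex
\begin{align}
\Delta P_i &= \alpha_0 + \alpha_1 H_i + \alpha_2 E_i + \alpha_3 \,\Delta C_i
             + \boldsymbol{\gamma}^{\top}\mathbf{x}_i
             + \sum_{d=1}^{21} \delta_d D_{id} + \varepsilon_i
             && \text{(emergence)} \\
\Delta C_i &= \beta_0 + \beta_1 H_i + \beta_2 E_i + \beta_3 \,\Delta P_i
             + \boldsymbol{\theta}^{\top}\mathbf{x}_i
             + \sum_{d=1}^{21} \lambda_d D_{id} + u_i
             && \text{(absorption)}
\end{align}
```

The terms $\alpha_3\,\Delta C_i$ and $\beta_3\,\Delta P_i$ are the cross-controls: each model includes the other model's dependent variable so that the remaining coefficients describe incremental variation.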

Results

Among the successful and growing “hot” research fronts we find that different mechanisms lead to fronts that maximize emergence, measured by how fast a front is growing, and absorption, measured by increase in citations.

In our front absorption model, using increase in total citations as the dependent variable, we find that the coefficient for multidisciplinarity is not significant, while endogeneity is negative and significant. This suggests that fronts which are outward-looking, or have low endogeneity, are more likely to garner attention and use by their discipline.

In the front emergence model, using front growth as the dependent variable, we regress cluster growth on the same independent variables. The coefficients for both endogeneity and multidisciplinarity are positive and statistically significant. Fronts that are more “inward looking” or endogenous are more likely to develop cohesive identities and develop independently. Thus even among successful fronts the characteristics of the citation structure have profound effects on the future direction and development of the fronts.

The size and direction of the coefficients of our control variables are consistent with expectations. In the front emergence model, the control variable for number of papers is positive and significant, indicating that larger fronts tend to grow faster. Conversely, as predicted, in the front absorption model we found that the coefficient for number of papers is negative and significant, indicating that smaller fronts tend to grow citations faster, as a percentage increase. Average age of papers is positive for citations but negative for growth, perhaps because older fronts remain distinct and have not been absorbed, indicating a survival bias over time. Percent of author affiliations from non-academic institutions was not significant in either model. This may be due to the confounding of government and firm involvement in non-academic affiliations, but our best attempts to further divide these two classes of affiliation and add them to the model did not yield more significant results. In both models the controls for discipline, accomplished via 21 dummy variables, are statistically significant as a group. To establish the significance of these we jointly tested the 21 discipline dummy variables using a Wald test. After some exploratory analysis we found that emerging fronts, controlling for size, tend to be more multidisciplinary, on average, than fronts in general, perhaps implying a general trend toward increasing multidisciplinarity in recent years.

For a validity check we exchanged our key independent variables with the dependent variables in our models and re-ran the regressions. These models indicated identical trends to the original models in both direction and significance (Table 4).

Table 4 Coefficients of regression analysis and descriptive statistics

One of our more interesting findings is the sign change for endogeneity between our two models. As seen in the correlation table (Table 5), the correlation between endogeneity and both of our response variables is positive and significant prior to controlling for any cluster characteristics. We explored the change of direction in more detail by building up our front absorption model step by step to fully understand how the predictors interrelate to explain the growth of research fronts. The sign change occurs as soon as we control for the percentage growth in number of papers, and it persists through the addition of all other explanatory variables. Among other findings our results suggest that fronts that were more inward-oriented and drew from more disciplines were more likely to grow into differentiated communities, while those that are less endogenous may be more accessible to outsiders and were more likely to be absorbed into their more general discipline.

Table 5 Descriptive statistics

Qualitative analysis

Past research has typically focused on picking areas or fronts that were ex post judged to be interesting and tracing their history. Our method has identified these clusters, potentially before they are widely recognized as important areas of research, using co-citation data alone, not reputation or word of mouth. Our objective behind conducting the following case studies is to seek responses from experts in the field and to confirm both the potential and scope of a methodology that can identify emerging fronts accurately (Cuhls 2003; Falkingham and Reeves 1998).

To test our success in identifying new fronts relatively early in their development we select a sample of 20 of the largest fronts that emerged in one of the three most recent time periods. As a first step we gathered and studied the full texts of the 360+ articles included in these clusters. The topics of each cluster are listed in Table 6.

Table 6 Summary of 20 emerging clusters examined more closely

Interviews

We contacted researchers in the specific areas at the University of Pennsylvania, Drexel University, Columbia University, and New York University. We selected these universities based on their reputation as leading scientific research institutes in the country and locations in areas with technology and pharmaceutical expertise. In total we interviewed 30 researchers, with interviews lasting from thirty minutes to two hours, providing them with full texts of all papers in each front. Previous studies that made extensive use of interviews in conjunction with citation analysis found interviews to be a crucial sense-making tool (Braam et al. 1991; Castro 2001; Collins 1997; Cuhls 2003; Small 1977, 2004). Our aim was to determine whether the research fronts identified using our methodology were, in fact, seen by experts as an interesting or important area of research.

About three quarters of the experts/researchers believed, based on their expertise, that the program had indeed identified coherent and emerging areas in their sub-specialty. For example, two interviews pointed to the promise of micro-fluidic analytical techniques (experiments in chemistry and biology being carried out on a small scale), which was identified as a front. In this field, ultraviolet light polymerization improves detection accuracy for less cost (Arutyunov and Medvedeva 1992), and two-dimensional separation systems (Gottschlich et al. 2001) and chaotic advection allow for more rapid mixing of small amounts of liquids (Liu et al. 2000). These advances may open the way for more useful chips for microscopic experiments.

About one quarter of the research fronts appeared to be artifacts of a specific event or release of data and therefore not necessarily the emerging fronts we were looking for. An example is the front surrounding NASA’s Wilkinson Microwave Anisotropy Probe, which looks at differences in the Cosmic Microwave Background radiation left over from the Big Bang (Bastero-Gil and Mersini-Houghton 2003; Eriksen et al. 2004; Tegmark et al. 2003). Two space physicists at the University of Pennsylvania indicated that this project releases data periodically and that our identification of a high-impact cluster of papers on this topic was largely a function of the punctuated release of information rather than some important breakthrough or area of new interest among researchers. The strength of our methodology, in such cases, is that it checks for endogeneity and absorption levels so that such artifacts can be identified.

The positive responses and constructive critiques from interviews with experts highlighted the potential of the emerging front procedure. Additionally, the methodology seemed in most cases to correctly select the majority of important papers and authors in the fronts during that time period.

Below we focus on two emerging clusters that our interviews identified as among the most interesting and potentially important. We generated citing papers sets for each front, and analyzed some bibliometric and social characteristics of these sets to better understand why the fronts emerged. To create these statistics, we analyzed the cited and citing paper sets for each of the two fronts. Self-citation involves matching author names on cited and citing papers, and counts the number of citations where one of the citing authors on a paper matched one of the cited authors. All self-citation counts are within the expected range. The percentage of citing papers by cited authors counts the number of citing papers having one or more authors from the highly cited paper set. Hence, this measures the degree to which leading authors are still active in producing current papers, and the degree to which they are citing one another. The percent of cited papers and citing papers in the current year measures the degree to which the research area is “front loaded” with papers in the most current year of the six-year time window used to create the cluster. A high value for this measure is expected for an emerging research area based on a high volume of current papers. Finally the endogeneity, previously discussed, is given for each front.
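The case-study statistics described above can be computed along the following lines. This is a sketch under simplifying assumptions: authors are matched by exact name string, the inputs (citation pairs, an author lookup, and publication years) are hypothetical structures, and all names and data are invented for illustration.

```python
def self_citation_count(citations, authors):
    """Count citations where the citing and cited paper share at least one author.

    citations: iterable of (citing_id, cited_id); authors: dict id -> set of names.
    """
    return sum(1 for citing, cited in citations if authors[citing] & authors[cited])


def pct_citing_by_cited_authors(citing_ids, cited_ids, authors):
    """Percent of citing papers with one or more authors from the cited set."""
    citing_ids = list(citing_ids)
    if not citing_ids:
        return 0.0
    core_authors = set().union(*(authors[p] for p in cited_ids))
    hits = sum(1 for p in citing_ids if authors[p] & core_authors)
    return 100.0 * hits / len(citing_ids)


def pct_in_current_year(paper_ids, years, current_year):
    """Percent of papers published in the last year of the time window."""
    paper_ids = list(paper_ids)
    if not paper_ids:
        return 0.0
    hits = sum(1 for p in paper_ids if years[p] == current_year)
    return 100.0 * hits / len(paper_ids)


# Invented toy front: two cited papers (c1, c2) and four citing papers (p1..p4).
authors = {"c1": {"Lee"}, "c2": {"Kim", "Ng"},
           "p1": {"Lee", "Ng"}, "p2": {"Roy"}, "p3": {"Kim"}, "p4": {"Diaz"}}
citations = [("p1", "c1"), ("p1", "c2"), ("p2", "c1"), ("p3", "c2"), ("p4", "c2")]
years = {"p1": 2004, "p2": 2003, "p3": 2004, "p4": 2004}
citing_ids = ["p1", "p2", "p3", "p4"]

n_self = self_citation_count(citations, authors)
pct_core = pct_citing_by_cited_authors(citing_ids, ["c1", "c2"], authors)
pct_current = pct_in_current_year(citing_ids, years, 2004)
```

Exact string matching will conflate distinct authors who share a name and miss variant spellings, so counts from a sketch like this are only approximate.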

Following the discussion of each of the two cases, we present a map showing the relative positions of the highly cited papers in each front. This two-dimensional representation is generated using a force-directed placement algorithm (Small 2005), which works by setting up attractive forces between co-cited papers and repulsive forces between all papers that vary with distance. The method attempts to arrange the papers in two dimensions such that the residual force or stress in the system is minimized. Each circle on a map is a paper whose size is proportional to its citation count. Circles of papers of the final year of the year range are one color; circles of other papers are in a contrasting color. The papers are connected only by the strongest normalized co-citation links for each paper (solid lines), supplemented by a small number of weak links (dashed lines) to connect the papers in a minimal spanning tree.
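To make the idea of force-directed placement concrete, the following is a generic sketch in the spirit of such algorithms. It is not the specific procedure of Small (2005): the force laws, the cooling schedule, and the names `force_directed_layout` and `strongest_links` are assumptions chosen for illustration. The supplementary weak links that complete the spanning tree are omitted.

```python
import math
import random


def force_directed_layout(nodes, cocitation, iterations=200, seed=0):
    """Place papers in 2D; cocitation maps frozenset({a, b}) to a strength in (0, 1]."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # assumed characteristic distance
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Repulsion acts between all pairs and weakens with distance.
                force = k * k / dist
                # Attraction acts only between co-cited pairs and grows with distance.
                w = cocitation.get(frozenset((a, b)), 0.0)
                force -= w * dist * dist / k
                fx, fy = force * dx / dist, force * dy / dist
                disp[a][0] += fx
                disp[a][1] += fy
                disp[b][0] -= fx
                disp[b][1] -= fy
        # Cap each move by a "temperature" that cools so the layout settles.
        temp = 0.1 * (1 - step / iterations)
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            scale = min(length, temp) / length
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
    return {n: (p[0], p[1]) for n, p in pos.items()}


def strongest_links(nodes, cocitation):
    """For each paper keep only its single strongest co-citation link."""
    links = set()
    for n in nodes:
        candidates = [(w, pair) for pair, w in cocitation.items() if n in pair]
        if candidates:
            links.add(max(candidates, key=lambda c: c[0])[1])
    return links


# Toy front: two tightly co-cited pairs joined by one weak link.
nodes = ["p1", "p2", "p3", "p4"]
cocitation = {
    frozenset(("p1", "p2")): 0.9,
    frozenset(("p3", "p4")): 0.8,
    frozenset(("p2", "p3")): 0.1,
}
layout = force_directed_layout(nodes, cocitation)
solid = strongest_links(nodes, cocitation)
```

Here `solid` holds the links that would be drawn as solid lines. The weak `p2`-`p3` link is the kind that would appear as a dashed line to keep the map connected.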

It is interesting to note the breakdown of authorship in these papers. We coded every author of every paper and categorized their affiliations as academic, corporate, or government in order to get a sense of where this research was being conducted. We considered “academic” any non-profit research institution or university, “corporate” any for-profit company or hospital, and “government” any government body or foundation funded largely by the government to do research. Organizations we coded as government were national and international, including Japan’s Science & Technology Corporation, CERN, the World Health Organization, and various Departments of Health. The extent of corporate involvement in these fronts, including companies such as Xerox and IBM and hospitals such as the Mayo Clinic and Massachusetts General, as seen in Tables 7 and 8, is significant and indicative of their potential for applications as well as basic science.

Table 7 Case study characteristics
Table 8 Affiliation of case study authors

The breakdown of the author affiliations of the papers that cited the highly cited papers in our fronts had roughly similar results, with a trend toward slightly higher academic representation (Table 9).

Table 9 Affiliations of those who cite the case studies

Emerging front case studies

Organic thin-film transistors

Organic semiconductors have been known since the 1940s, and the first transistor based on an organic semiconductor was reported in 1986. Organic-based thin-film transistors have attracted much interest because of their various potential uses in many low-cost, large-area electronic applications such as smart cards, radio-frequency identification tags, and flat panel displays (Sheraw et al. 2003).

Despite progress and potential, several technical issues remain to be resolved before thin-film transistors can be in wide-scale use. Most of these issues are associated with grain boundaries and interfacial disorder in organic thin films, which are the major factors that limit the mobility, cause the dependence of the mobility on the gate voltage, and result in the broadening of the on/off transition (Podzorov et al. 2003). This front is the largest of the new clusters in the 1999–2004 dataset. It appeared in our data in August 2004 in the form of three highly cited papers. By year-end it had grown to 26 papers, a dramatic growth that was noted by several authors (Afzali et al. 2002; Halik et al. 2003; Murphy et al. 2004).

We find that 15 of the 228 citing papers are in the set of 26 highly cited papers, giving an endogeneity of 6.5%. The 228 citing items cited the 26 highly cited papers 460 times, and 17% of these were self-citations. The authors on the highly cited papers were also authors on 40% of the citing papers, accounting for about 50% of the citations. This means that highly cited authors are still involved in writing current papers and cite each other frequently, highlighting the small-world character of this front.
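The figures in this paragraph can be checked with a little arithmetic. The ratio 15/228 is about 0.066, which agrees with the reported 6.5% if that value was truncated rather than rounded. The self-citation count of roughly 78 is inferred here from the 17% figure and is not a number stated in our tables.

```latex
\[
\text{endogeneity} = \frac{15}{228} \approx 0.066, \qquad
\frac{\text{citations}}{\text{citing papers}} = \frac{460}{228} \approx 2.0, \qquad
\text{self-citations} \approx 0.17 \times 460 \approx 78 .
\]
```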

Even though technically the cluster is a static snapshot of a six-year period, we can look at the time distribution of papers. The percent of highly cited papers from the most current year of the period (2004) is 34%, and the percent of citing papers in the most recent year of the period is 79%. There is a sharp rise in number of citing papers starting in 2003 and accelerating in 2004. This coincides with the cluster’s emergence in 2004. Another notable feature is the rapid expansion of the number of review papers in the citing population, which grew from three in 2003 to 24 in 2004.

The drive toward applications is very clear if we examine the cited and citing papers. All but five of the 26 highly cited papers specifically mention possible applications in their first paragraphs. Ten of the 26 papers also point to “significant recent progress” in their first paragraphs. This is echoed by the citing authors. One states: “Many new discoveries were brought to light during the [2003–2004] time frame” (Facchetti et al. 2003a, b; Mushrush et al. 2003). The progress alluded to is often the increased charge carrier mobility. Also sparking interest is the promise of cheap methods of fabrication such as inkjet printing or the stamping of circuits directly on plastic, analogous to printing on paper.

The focus on applications and economic payoff is reflected in the involvement of authors from companies such as Lucent, IBM, Infineon, Philips, and Xerox, often in collaboration with university partners. Overall the institutional mix of author addresses on cited papers is about 61% university, 34% corporate, and 6% government. One industry respondent also pointed to the fact that large companies are entering the field and setting up expensive multidisciplinary research teams similar to those in academia. It was suggested that these new interdisciplinary teams may in part account for the upsurge in number of publications. However, the race for commercialization has also made companies and universities more secretive about their work, for example, making collegial interactions at meetings more difficult. Often patents are applied for prior to the publication of results, which adds to the time lag for dissemination (Murray and Stern 2005).

Thus, the emergence of this front in 2004 has to do with both the increase in activity sparked by new advances and the increased investment of resources by academia and industry to overcome the remaining technical obstacles to commercialization. Ironically, if publication activity declines in the future, it might signal either the privatization of the field as patenting replaces publication or alternatively the failure of the field to overcome the remaining technical barriers (Fig. 5).

Fig. 5 Organic thin-film transistors display

Amyloid precursor protein (APP)

The front researching the amyloid precursor protein may lead scientists one step closer to understanding the pathogenesis of Alzheimer’s disease. APP belongs to a group of proteins called receptors and has several characteristics that are similar to the Notch protein. Several observations indicate that the accumulation of amyloid-β peptide (Aβ) is a common initiating event that ultimately leads to the neurodegeneration in Alzheimer’s disease. However, it is unclear how Aβ induces neurodegeneration (Sisodia et al. 2002). Insight into APP and its cleaved fragment, amyloid peptide, may prove useful in treating Alzheimer’s disease and slowing its onset, helping prevent further deterioration. The authors of the papers in this front were 58% from academia, 34% from industry and hospitals, and 8% from government health centers.

Risk factors for Alzheimer’s disease include age and inheritance; however, there is evidence that APP and β-amyloid (Aβ) peptides have a central role in the early pathogenesis of the disease, regardless of primary cause (Sisodia et al. 2002). The abnormal cleaving of APP generates Aβ peptides that are deposited in senile plaques in the brains of aged individuals and patients with Alzheimer’s disease (Leem et al. 2002), which may lead to the onset of Alzheimer’s symptoms. The cleaving procedure is facilitated by proteins such as presenilin 1 (PSEN1) and presenilin 2 (PSEN2), which produce amyloid peptides from APP. In fact, the majority of early-onset, autosomal-dominant familial cases of Alzheimer’s disease are caused by mutations in the presenilin genes (Leissring et al. 2002).

With the identification of the role of APP in the onset of Alzheimer’s disease, scientists we interviewed believe that they may be able to develop better treatments to tackle the disease. As a result, attention is being focused on developing therapies that include the inhibition of the production of amyloid peptide (such as inhibiting the two proteolytic cleavage events that liberate amyloid peptide from APP) and the modulation of the fate and toxicity of amyloid peptide (Sisodia et al. 2002).

However, there are problems to overcome if this front is going to achieve breakthroughs. First, in contrast with Notch, the function of APP remains basically unknown. Second, the cytoplasmic function of APP is extremely difficult to observe even under conditions that allow perfect detection of Aβ (Cupers et al. 2001). Nevertheless, successful research in the APP front may enable scientists to one day delay or prevent the onset of Alzheimer’s disease (Fig. 6).

Fig. 6 Amyloid precursor protein display

Conclusion

Research fronts provide a way to study areas of science that scientists find interesting and useful, and also reveal important insights into how new scientific knowledge is incorporated into existing research (Birnbaum 1981c; Kuhn 1970; Small 1999, 2003). For qualitative richness and for expert external validation, we interviewed researchers in industry and academia to obtain their feedback regarding our method for identifying “hot” areas of emerging research. Quantitatively, we focused on differentiating between fronts that maximize their “absorption” or impact by being highly cited by others at the cost of being less distinct as a front as their knowledge is incorporated into broader research, and fronts that maximize their “emergence” or size by growing more as a distinct, cohesive unit. We show that there are fundamental differences in the structure of these two sorts of successful research fronts. This may affect the way science as a whole absorbs or splinters off new areas of research and may have important consequences for its evolution and development over time.

We find that multidisciplinary research fronts which span traditional research fields are harder for a single area to absorb and digest and therefore tend to remain more distinct. Examining the dynamic consequences of this possibility would be fruitful for future research. We also studied how the “endogeneity” of a cluster affects its growth and impact. While endogeneity is positively associated with research fronts that are growing, it is negatively associated with research fronts whose knowledge is being absorbed via increased citation. Further, endogeneity is highest among new and growing clusters. This lends support to Kuhn’s speculations, and the work of Crane and Pfeffer, on the importance of intellectual cohesiveness or paradigm strength in creating a distinct research perspective, since high endogeneity can be seen as a crude sign of a more comprehensive, explanatory intellectual framework (Crane 1972; Kuhn 1970).

These observations in aggregate provide insight into how science changes. In our two models the characteristics of research fronts that are associated with front growth are very different from the characteristics that are associated with front impact. Research fronts that grow are associated with multidisciplinarity and tend to be more endogenous. Research fronts that receive increased citations, on the other hand, tend not to be endogenous, and the coefficient for multidisciplinarity is not significant when controlling for other factors.

Research that has a high impact is more likely to become accepted wisdom and “change” or “evolve” the field from within in an incremental way (Collins 1997; McCain 1987). On the other hand, research that grows as a distinct front is more likely to lead to fragmentation—evolving into a distinct subfield or even becoming a new field entirely. Further historical analyses examining the antecedents of paradigm shifts—periods of paradigm disruption and change—would help test this conjecture.