Introduction

Recent work in research policy and scientometrics has begun investigating the theories by which to predict the value of inventions. Various operationalizations of patent value (Lee 2009; Meyer and Tang 2007) or quality (Cheng et al. 2010; Acosta et al. 2009) notwithstanding, studies have found that inventors’ past performance, counts of backward citations, and invention size positively correspond to invention value (Bass and Kurgan 2009; Lee 2009; Lee et al. 2007; also see Gay et al. 2005). Such studies help the R&D community to better understand the relationship between technological innovation and its commercial benefits (Lombardo 2008; Thomas 2001), especially insofar as little direct correlation exists between intensity of research activity and the sheer number of patents produced (e.g., Bhattacharya and Meyer 2003).

Yet we know little about any regularities in the manner by which technological features of useful technological inventions are related, such that search for these inventions can be guided. Most notably, Fleming and Sorenson (2001) argue that the inventions of moderate complexity are likely the most useful, striking a balance between ‘fruitful uncertainty and overwhelming complexity’ (Baldwin and Clark 2000: 32). The authors find support for their theory via basic examination of the patent sub-classes associated with a given patent.

While Fleming and Sorenson’s (2001) research begins to guide our understanding of technological usefulness of inventions, a couple of difficulties still plague attempts to convert the findings into prescription for search. First, while entrepreneurial or innovation-seeking endeavors often fundamentally depend on search for valuable solutions to valuable problems (Hsieh et al. 2007), Fleming and Sorenson’s (2001) theory does not inform search because it only addresses the ease of combining components from categories with some degree of diversity of other such categories, and in their empirics the identities of components are, forgivably, eventually washed out. Fleming and Sorenson’s theory simply does not help much in guiding inventors to search for useful specific combinations of knowledge. What we lack is some way of accounting for the relatedness among knowledge ultimately recombined to create or identify valuable inventions. Second, Fleming and Sorenson’s (2001) empirics do not account for intertemporal changes in the likelihood that classes of components will combine with one another. Changes in the general environment or consumer tastes (e.g., Mackenzie and Wajcman 1985) or advances in science and technology (e.g., Bijker 1987) introduce new sociocultural situations, and the technologies or product features introduced in some industries become better or worse as candidates over time for recombinative endeavors than those in other industries (see Fishman et al. 1993).

I argue that discovery of a valuable invention often requires relating or associating features of a prospective invention to one another via phenomena or principles (Arthur 2007); in other words ‘connections’ exist (see Baron 2006; Baron and Ensley 2006). The more that features can be connected, the higher their relatedness. From this I hypothesize that an inverse U-shaped relationship exists between an invention’s usefulness on one hand and relatedness among its features on the other. I also hypothesize that repeated application or implementation of specific sets of features during the evolution of technological advance tends to result in decreasingly useful inventions. The first hypothesis is broadly supported by the empirical results. Indeed, an inverse U-shaped relationship exists between an invention’s usefulness and relatedness among its features, suggesting a tradeoff between minimizing the high costs of connecting unrelated features versus minimizing the high costs of synthesizing highly related ones. Even given these results, however, tests of the second hypothesis are inconclusive.

Below I present the theory, describe the data and methods, and present the results. A discussion describes contributions to the literature as well as limitations of this study, and a conclusion wraps up the paper.

Connections and the measurement of relatedness among an invention’s technological features

The design of products or services is often preceded by decision-making regarding the features of the inventions from which products draw (Khilji et al. 2006; Livesay et al. 1989). Here, the term ‘feature’ is intended to refer to a performance-related aspect (e.g., a visual display’s high resolution), a physical part (e.g., a video game system’s movement-sensitive controller), or experiential characteristic (e.g., a new age material’s softness to touch) of an invention.Footnote 1 Choices are made regarding which feature is best. When particular features are considered alongside one another, a series of principles or phenomena—based on means-ends relationships (Shane 2004), shared situations (see Baron 2006), or any combination thereof—may be identified relating or associating them (Arthur 2007). Put another way, a ‘connection’ has been found (c.f. Baron 2006).

A given set of features may be connected in various ways. A light integrated into an alarm clock helps eliminate clutter in support of bedside nighttime activities. The same light could also be integrated with the alarm clock to shine slowly to a high brightness to subtly awaken its user. In this way a set of features may remain the same (e.g., a light and an alarm clock), even as different connections may be revealed, identified or selected between them (e.g., bedside nighttime activities, or an awakening mechanism).

Relatedness between features is low when few if any connections can be made between them. For example, consider the feature embodied by a fluorescent light, and the feature embodied by a cloth bag. It is not immediately obvious how integrating these two features would be useful. A light is used to illuminate objects or a path in the dark, or to scare away would-be evildoers. Cloth bags are used to carry or cover things. A connection between the two is not immediately obvious.

Yet what at first glance looks like a situation where connections are unavailable between features could just mean that a connection is based on a long series of principles or phenomena. Such ‘indirect’ connections can usually be reflected by specific demographic market segments, specific occasions, and specific locations. Consider demographic market segments. In contrast to a screen saver program that simply prevents computer screen burn-in, a screen saver program may also relate to a computer screen by providing stimuli that are used to maintain a user’s hand-eye coordination or physical well-being, for a specific demographic market segment: senior executives working overtime who must remain sharp to take overseas conference calls. The screen saver and the computer screen itself are thus related in both a direct manner and an indirect one. Alternatively, indirect connections may reflect specific occasions. While the connection between a portable light and a cloth bag is not immediately obvious, one indirect connection can be clearly made in the situation of Halloween, the Western tradition where kids dress up in costumes and walk from door-to-door seeking candy treats. A portable light integrated into the bag’s bottom is useful insofar as it helps children to illuminate their path in the dark night to protect themselves not only from obstacles but also from unwelcome strangers, while leaving one hand free and thus not impeding the collection of candy. Finally, specific locations may characterize situations indicated by indirect connections. Consider Gamewear, a company that designs and manufactures jewelry combining typical chains, bracelets, and lockets with pieces of sports equipment (Ruth 2006). People like to display sports or team spirit, something that attracts attention and is a ‘conversation starter’ yet does not need to be washed and can easily be put away, which makes it a perfect complement to jewelry. This indirect connection especially applies in the USA and a few European countries where sports leagues are popular, but would not apply in a country such as Afghanistan, where display of sports team spirit is frowned upon due to religious or political factors (Burns 1996).

Hypotheses

As described above, low relatedness between features is indicated when few connections exist between the features, or in other words when possibly only the most indirect connections are available (i.e., latent). With only the most indirect connections available this suggests that the features are only related in the most specific or specialized situations. Familiarity with these specific and likely rare situations requires either enough luck to ‘be in the right place at the right time,’ or knowledge of various dimensional constraints that delimit the situation. In other words, individuals generally must possess, generate or identify a wide scope of knowledge to valuably connect highly unrelated features. Besides requiring broad knowledge, one must also be able to piece together the steps in logic to identify the connection.

If features of an invention are too highly related, then search for useful inventions integrating them is made difficult due to costs of finding the situation that distinguishes those features. Those features are more likely to represent similar purposes, uses, or architectures. Particularly useful or valuable integration would then require understanding a wide scope of different theories, phenomena, or social conventions related to increasingly specific situations. The costs of generating an effectively wide scope of knowledge increase as features become overly related. The difficulty shifts from the cost of connecting unrelated features to the cost of synthesizing overly related ones.

For example, consider an invention that combines today’s rolling shoes—shoes with ‘pop-out’ wheels that allow for rollerskating—with pop-out interlockable planks that can be connected to form a skateboard. In many ways these two are substitutes: they are both used for transportation and exercise in ways that are less jarring than running and more portable than bicycles. To the best of the author’s knowledge, no such invention exists. Why would a skateboarder or rollerblader ever want to pay for a contraption that switches within seconds between the two modes? As illustrated here, it is simply very difficult to synthesize the feature embodied by a pair of rolling shoes, with the feature embodied by a skateboard.

At moderate levels of relatedness among features, valuable connections are more likely discovered. Relatedness between features is not so low that it takes a prohibitively high level of knowledge scope to find situations shared by the features. At the same time, because relatedness is not too high, valuable integration does not require burdensomely costly synthesis. A minimization can be struck between the cost of connecting unrelated features and the cost of synthesizing or integrating overly related ones.

Hypothesis 1

The expected usefulness of inventions is highest when they involve features that tend to be intermediately related to one another.

Past research suggests that the success of a new product or invention is a positive function of the level of the firm’s so-called ‘proximal’ technological experience (Nerkar and Roberts 2004). Experience combining a set of components contributes to a ‘cognitive map’ of the values of solutions to a given problem (Gavetti and Levinthal 2000). Results and lessons learned from experience with specific features can be generalized and combined with other such lessons to help serve as a map in guiding experimentation with technologically related sets of features (c.f. Fleming and Sorenson 2004; see also Schilling et al. 2003).

However, if we address investment in knowledge (e.g., Dorroh et al. 1994) instead of learning-by-doing (e.g., Adler and Clark 1991), and we measure the effects of repeated experimentation with a specific set of features on the usefulness of subsequent inventions using the same sets of features, we are likely to find a more dramatic negative relationship between repeated trials and the usefulness of subsequent inventions. Experience with a set of specific features does little to serve as a map for search over that same set, other than to show which connections have already been tapped. The benefits of generalization described above are no longer relevant. From a more intuitive perspective, the earlier-identified connections among features likely reflect the more commonly occurring situations. Subsequent application or implementation of the particular set of features in future inventions relates to declining usefulness.Footnote 2

Hypothesis 2

Over time, subsequent application of a particular set of features relates to inventions of declining usefulness.

Empirical considerations

At the heart of both hypotheses is the concept of relatedness, which can be operationalized via any of various candidate methods described in the literature. Some scholars have utilized preset classification schemes to measure relatedness (e.g., Brouthers and Brouthers 2000). While such schemes can appear objective and accurate given the pre-determined codes, the construction of each code’s definitional scope and the cumulative set of codes can be highly arbitrary. Also, over time the definition of classes may become distorted if scientific advances render some patent classes or technological areas obsolete, or other definitions simply outdated (e.g., Pavitt 1985 p. 89). Other scholars have developed text-based systems related to “literature-based discovery”Footnote 3 (seminally, Swanson 1986, 1987) where the typically required input from experts becomes potentially prohibitively costly as analysis extends across industrial or scientific areas. Lastly, still other scholars have investigated the use of citation structure for measuring relatedness across fields, striking a balance between the arbitrariness of classification systems and the costliness of expert-guided text-based systems. Stepping chronologically through citation structure inherently addresses the dynamics of relatedness (i.e., when compared to the matching codes), and also can reasonably approximate a natural path of search.

Of course, the use of citations to measure the origins of knowledge content is often met with skepticism. For example, bibliographic citations (e.g., in academic literature) may pay homage to pioneers, correct the work of others, or criticize that work (Garfield 1962; also see Liu 1993; Bornmann and Daniel 2008). Also, bibliometric citations may reference secondary sources such as literature reviews instead of the seminal paper itself or may leave out citations altogether (MacRoberts and MacRoberts 1989, pp. 343–344). Yet the patent system represents a more rigorous context where citations are made. For a patent to be granted, an innovation must satisfy three requirements: (i) it has to be novel; (ii) non-obvious, in that a skilled practitioner of the technology would not have known how to use it; and (iii) useful. In order to prove novelty, non-obviousness, and usefulness of an invention, the inventor and the patent examiner compare it with prior art through the use of patent citations. Patent references—typically measuring technical knowledge serving as the source of novelty (Sternitzke 2009)—are less likely to be redundant or superfluous than references in journal papers (Collins and Wyatt 1988) due to the controlled nature of the patenting process and its legal consequence (von Wartburg et al. 2005). Additional citations are often added by patent examiners and legal counsel. As Schmoch (1993) explains, because of the novelty requirement the examiner has to look for earlier documents that have the same or almost the same features as the patent application. Only if there are no other relevant documents questioning the novelty of the invention, will the patent application be accepted.

Much past research does agree that patent citations generally reflect features of inventions. As Lanjouw and Schankerman (2001), pp. 133–134 attest: “A patent comprises a set of claims that delineate the boundaries of the property rights provided by the patent. The principal claims define the essential novel features of the invention in their broadest form, and the subordinate claims are more restricted and may describe detailed features of the innovation claimed… Like claims, the citations in the patent document help to define the property rights of the patentee.” Von Wartburg et al. (2005) go on to say that “If two patents are cited, the new invention can be assumed to base equally on both prior patents… The rationale is that the new invention is likely to integrate certain aspects of both former ones, and thus can be regarded as a hybrid development.” On the other hand, these authors also suggest that “the measure of bibliographical coupling is a proxy for the amount of ‘shared-ness’ of technological features among technological variants” (von Wartburg et al. 2005, p. 1599).

Even patent citations can be prohibitively costly from which to draw conclusions. However, compared to European patents, “US patents are more likely to encompass all relevant citations… The US was regarded as a much tougher legal environment. There had to be as much background information as reasonably possible to convince the patent examiner that the prior art was studied closely before filing the application, there were very good distinctions between the claims drawn and the prior art disclosures, and in the event of future litigation there should be good, meaningful distinctions that can be relied upon in a legal battle” (Meyer 2000, p. 108; also see Narin 1994). According to Meyer (2000) p. 106, “US law stipulates that the applicant has to cite any prior art relevant to patentability of the invention known to him or her to the USPTO as long as the application is under examination (‘duty of candor’, USPTO). Non-compliance with this requirement is considered as fraud by the USPTO and can be used as grounds for invalidating the patent.”

Based on the abovementioned considerations, I use data on US patents and both their backwards and forwards citations to test the hypotheses.Footnote 4 Specifically, I construct ‘patent citation networks’ (e.g., Small and Upham 2009; also Milman 1994) to help represent the degree to which features of an invention are related, as measured by the degree to which they have been related in the past. A network of patent citations would reflect connections between features via ‘citation chains’ (see Von Wartburg et al. 2005). Various scholars agree. As described by Atallah and Rodriguez (2006) p. 459, “patents can be viewed as elements of a network, with the citations constituting the links between those elements. A patent is linked directly to another patent through a citation, and indirectly through an indirect citation… indirect citations can be of different orders, and hence a patent can be said to be more or less closely related to another patent (and through different channels, i.e., Different citation chains)… a longer chain of citations is indicative of continuity of the impact of an innovation.” von Wartburg et al. 2005 p. 1595 argue that:

…“to map actual developments in a certain technical field and to draw on technological trajectories (Dosi 1982) or avenues (Sahal 1985), citation analysis should rely on everything, bibliographical coupling, co-citations, direct and indirect citations… The technological foundation of citing patents does not only encompass the most recent developments cited directly. It also draws on basic principles provided by earlier patents. Connections to basic patents are revealed by indirect linkages which are captured by citation chains… Given that a patent A cites exclusively patent B which in turn solely cites another patent C, a unique development path can be assumed which stems from C and leads to A.”

Data and methods

I utilize the publicly available patent database assembled by Hall et al. (2001) that includes information on every invention granted a US patent between 1975 and 1999, a list of all citations that each such invention makes, and a list of all patents that eventually cite it. This database lists each patent’s technological category and subcategories. The usefulness of an invention is measured by future citations, and the network of citations is utilized to measure relatedness. While the database is useful since it accounts for these variables, the empirical analysis requires selecting only the patented inventions where relatedness can be measured among all features.

Dependent variable

Following Fleming and Sorenson (2001) and Narin and Hamilton (1996), I measure the usefulness of inventions via future citation counts over the subsequent 6-year time window (c.f. Maurseth 2005; Wang 2007). Empirical studies have repeatedly shown that future citation counts are indeed related to value (notably see Albert et al. 1991).
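As an illustration only, the dependent variable can be sketched as a count of citations received within a fixed window after grant. The data layout (a dictionary of grant years and a list of citing/cited pairs), the function name, and the year-level treatment of the window boundaries are assumptions made for this sketch; they are not taken from the Hall et al. (2001) files or from the analysis reported here.

```python
from collections import defaultdict


def forward_citations_in_window(grant_year, citations, window_years=6):
    """Count citations each patent receives within `window_years` of its grant.

    grant_year: dict mapping patent id -> grant year (hypothetical layout).
    citations:  iterable of (citing_id, cited_id) pairs (hypothetical layout).
    Assumption: a citation counts when the citing patent is granted between
    0 and `window_years` years (inclusive) after the cited patent.
    """
    counts = defaultdict(int)
    for citing, cited in citations:
        # Skip pairs for which a grant year is not observed.
        if citing not in grant_year or cited not in grant_year:
            continue
        lag = grant_year[citing] - grant_year[cited]
        if 0 <= lag <= window_years:
            counts[cited] += 1
    # Report zero for patents that received no citations in the window.
    return {patent: counts[patent] for patent in grant_year}


# Toy data, invented for the example.
years = {"A": 1990, "B": 1992, "C": 1997, "D": 1999}
pairs = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
usefulness = forward_citations_in_window(years, pairs)
```

In this toy example only the citation from B to A falls inside the window, so A has one forward citation and the other patents have none.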

Independent variables

Relatedness

Existing operationalizations of relatedness—e.g., Standard Industrial Classification (SIC) or product codes—are problematic (e.g., Brouthers and Brouthers 2000),Footnote 5 particularly in this setting. Such codes are especially inappropriate in trying to analyze the relatedness of features for which no useful coding scheme can be created. Thus, I measure relatedness among features by examining patent citation histories. One clear advantage to measuring relatedness via patent citation chains instead of via coding schemes is that the actual relatedness of specific features is approximated, instead of relying on coding schemes which only measure the apparent or definitional relatedness at the level of classes of features. Specifically, via a patent citation map (e.g., Huang et al. 2003), I count the number of ways in which two features are connected in the patent citation network. For example two features may be connected in the sense that they both relate to the same backwards citation (Small 1973). As another example they may also be connected as a shared indirect citation; see von Wartburg et al. 2005 pp. 1595–1596. Here the citation chain is longer than the one in the first example. Five of the different types of citation chains that I examine are shown in Fig. 1.Footnote 6
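To make the idea of counting connections concrete, the following is a minimal sketch of one possible count: the number of earlier patents that two features both cite. Reading "relate to the same backwards citation" this way is an assumption, as are the function name and the dictionary layout. The sketch is not claimed to correspond to any particular numbered connection type in Fig. 1.

```python
def shared_backward_citations(cites, feature_a, feature_b):
    """Count earlier patents cited by both features.

    cites: dict mapping patent id -> iterable of patent ids it cites
           (hypothetical layout). Patents absent from the dict are treated
           as citing nothing.
    """
    cited_by_a = set(cites.get(feature_a, ()))
    cited_by_b = set(cites.get(feature_b, ()))
    return len(cited_by_a & cited_by_b)


# Toy data, invented for the example: F1 and F2 both cite X and Y.
toy_cites = {"F1": ["X", "Y", "Z"], "F2": ["X", "Y"], "F3": ["W"]}
n_shared = shared_backward_citations(toy_cites, "F1", "F2")  # 2
```

Longer, indirect chains would require walking more than one citation step back, which this sketch deliberately leaves out.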

Fig. 1 a Shows the relationship between patented inventions and their backward citations. b–f Show the different types of connections between components of an invention. Note that arrows pointing northward indicate going back in time. Connection type #2.5 is indicated by either Connection type #2 or Connection type #3

Utilizing indirect citation chains to help measure relatedness in the empirical analysis imposes additional constraints on the data. Specifically, some inventions are made up of features the relatedness among which cannot be properly measured in the data. Put another way, for some inventions the patent database simply does not go back far enough to catch citation chains that might otherwise have been shown to exist if the citation data had extended farther back in time. Thus, from all 2.14 million inventions granted a US patent between January 1, 1975 and December 31, 1993, I select only those for which all backward citations two generations beforehand were granted after 1975, resulting in a “Patents” dataset corresponding to 18,882 patented inventions. Most patented inventions are based on knowledge fundamentally corresponding to relatively older inventions, and of course the requirement of a 6-year window from 1994 through 1999 limits the number of patented inventions that can be examined.Footnote 7
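The selection rule can be sketched roughly as follows. The data layout, the function name, the inclusive treatment of the 1975 cutoff, and the choice to treat any cited patent without an observed grant year as predating the database are all assumptions made for illustration.

```python
def fully_observed(patent, cites, grant_year, first_year=1975):
    """Return True if all citations up to two generations back are observed.

    cites:      dict mapping patent id -> list of cited patent ids
                (hypothetical layout).
    grant_year: dict mapping patent id -> grant year; a cited patent that is
                missing here is assumed to predate the database.
    """
    first_generation = list(cites.get(patent, []))
    second_generation = [
        cited
        for feature in first_generation
        for cited in cites.get(feature, [])
    ]
    return all(
        grant_year.get(p, first_year - 1) >= first_year
        for p in first_generation + second_generation
    )


# Toy data, invented for the example; Y has no observed grant year.
toy_cites = {"P": ["A", "B"], "A": ["X"], "B": [], "Q": ["A", "Y"]}
toy_years = {"P": 1990, "Q": 1991, "A": 1985, "B": 1986, "X": 1976}
selected = [p for p in ("P", "Q") if fully_observed(p, toy_cites, toy_years)]
```

Here P is kept and Q is dropped, because one of Q's backward citations falls outside the observed period. A patent with no backward citations passes this check trivially and would need to be excluded separately, since relatedness is undefined without at least two features.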

For each of these 18,882 inventions, I take every possible pair of backward citations and treat this as one row. For example, if an invention comprises 10 features, it has 10 × 9/2 pairs of features and demands 45 rows. Accounting for all 18,882 inventions yields a “Connections” dataset with 318,966 rows.Footnote 8 For each row, I include the frequencies of the different types of citation chains in Fig. 1 that relate the respective pair of features, by running a computerized analysis of the entire 1975–1993 citations list. To calculate indices of relatedness, I merely take the average number of citation chains among all the features of a patent. As an example, consider a hypothetical invention made up of three features A, B, and C. Between features A and B are 3 citation chains of type #1, between B and C lie 5 chains of this type, and between A and C lie 10 such chains. Thus, according to one method of measurement, the relatedness of the invention’s features equates to (3 + 5 + 10)/3 = 6. This represents the citation chain type #1 relatedness measure for this invention. I calculate similar statistics for citation chain types #2–#5, and repeat for each of the 18,882 inventions. Finally, because citation chain types #2 and #3 actually represent the same kind of relationship only differing in asymmetry, I create one last measure of relatedness by adding the frequencies of these two citation chain types, and I label this as citation chain type #2.5.Footnote 9, Footnote 10
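The pair enumeration and averaging just described can be sketched as follows. The function name and the dictionary keyed by unordered feature pairs are hypothetical. Dividing by the number of pairs is an assumption: in the three-feature example the number of pairs and the number of features are both three, so the text does not settle which divisor is intended.

```python
from itertools import combinations


def relatedness_index(features, chain_counts):
    """Average number of citation chains per pair of features.

    features:     iterable of feature (backward citation) ids.
    chain_counts: dict mapping frozenset({a, b}) -> number of chains of one
                  type linking a and b (hypothetical layout); missing pairs
                  count as zero.
    Returns None when fewer than two features are present.
    """
    pairs = list(combinations(sorted(features), 2))
    if not pairs:
        return None
    total = sum(chain_counts.get(frozenset(pair), 0) for pair in pairs)
    return total / len(pairs)


# The three-feature example from the text: (3 + 5 + 10) / 3 = 6.
counts = {
    frozenset({"A", "B"}): 3,
    frozenset({"B", "C"}): 5,
    frozenset({"A", "C"}): 10,
}
index = relatedness_index(["A", "B", "C"], counts)  # 6.0

# An invention with 10 features has 10 * 9 / 2 = 45 pairs, hence 45 rows.
n_rows = len(list(combinations(range(10), 2)))  # 45
```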

Measure of prior search along familiar features, number of trials

To estimate the degree that knowledge about a set of features underlying an invention has been accumulated, I determine the number of past technologies involving the exact same set of features, examining the full 1975–1993 portion of the citations database.Footnote 11
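A minimal sketch of this count, assuming that "the exact same set of features" means an identical set of backward citations and that only patents granted strictly earlier than the focal patent are counted. The names and data layout are hypothetical.

```python
def count_prior_trials(patent, cites, grant_date):
    """Count earlier patents whose backward citations match the focal set.

    cites:      dict mapping patent id -> iterable of cited patent ids
                (hypothetical layout).
    grant_date: dict mapping patent id -> any comparable grant date value.
    """
    target = frozenset(cites[patent])
    return sum(
        1
        for other, cited in cites.items()
        if other != patent
        and grant_date[other] < grant_date[patent]
        and frozenset(cited) == target
    )


# Toy data, invented for the example.
toy_cites = {
    "P1": ["A", "B"],
    "P2": ["B", "A"],       # same set as P1, different order
    "P3": ["A", "B", "C"],  # superset, not an exact match
    "P4": ["A", "B"],
}
toy_dates = {"P1": 1980, "P2": 1984, "P3": 1985, "P4": 1990}
trials_before_p4 = count_prior_trials("P4", toy_cites, toy_dates)  # 2
```

The comparison against every other patent is quadratic overall; for a full dataset, grouping patents by their citation set first would be the more practical design.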

Control variables

Grant date control

This variable helps to control for trends in patenting at the system level (i.e., at the USPTO; see Hall et al. 2001 p. 10). Thus, I add a time variable in terms of number of days after January 1, 1960 that the patent was granted.Footnote 12
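For concreteness, the time variable can be computed as below. The function and constant names are hypothetical; only the reference date comes from the text.

```python
from datetime import date

# Reference date stated in the text.
REFERENCE_DATE = date(1960, 1, 1)


def grant_date_control(grant_date):
    """Number of days between January 1, 1960 and the grant date."""
    return (grant_date - REFERENCE_DATE).days


days = grant_date_control(date(1975, 1, 1))  # 5479
```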

Number of components

The number of components (i.e., features) is indicated via backward patent citations. In other words, if features of the invention have already been anticipated by previously existing technologies, any such technologies if patented must be mentioned in the invention’s patent application as prior art. Following Schumpeter (1939), all inventions are largely based on recombinations of prior knowledge, and thus backward citations are considered a reasonable measure for a patented invention’s features.

Number of classes control

Inventions involving more technological classes are more likely to receive more future citations simply because there are more technological classes that may involve these inventions in the future. This is not unlike how academics who tap into research from various fields are cited by various literatures.

Claims control

Researchers have suggested that the claims made by an invention in a patent application—serving to delineate what is protected by the patent, contingent to patent office approval—signal importance of the invention. Tong and Frame (1994) propose the number of claims as a measure of the ‘size’ of an innovation, and show that claims-weighted patent counts are more closely related to R&D spending at the national level than simple patent counts.

Descriptive statistics

Descriptive statistics are shown in Table 1. Most correlations are very low; those that are significant appear to be between variables that derive from one another.

Table 1 Descriptive statistics of variables

Methods

Because patent citation counts are non-negative, linear regression can yield inefficient, inconsistent, and biased coefficient estimates (Long 1997). Poisson models can be utilized to analyze count data, but they assume that the mean and variance of the observed distribution are equal. Like most count data, the data here exhibit over-dispersion (i.e., the variance exceeds the mean), and negative binomial regressions should be used (see Hausman et al. 1984).
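A quick way to see why the Poisson assumption may fail is to compare the sample variance of the counts with their mean. The sketch below uses invented counts, not data from this study, and a hypothetical function name. A ratio well above one points to over-dispersion, though it is only an informal diagnostic and not a substitute for a formal test.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Ratio of sample variance to sample mean for a list of counts.

    Under a Poisson model this ratio should be close to 1.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion ratio is undefined when the mean is zero")
    return variance(counts) / m


# Invented citation counts: many low values and one heavily cited patent.
toy_counts = [0, 0, 1, 2, 12]
ratio = dispersion_ratio(toy_counts)
over_dispersed = ratio > 1
```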

As noted earlier, not all commercialized or commercializable inventions or innovations are patented. Patent statistics underestimate the amount of innovative activity in large firms (Pavitt 1982). Inventors or companies may choose to appropriate value from inventions via secrecy, lead time, learning curve economies, or superior sales and service (Levin et al. 1987). Product inventions and process inventions benefit from these appropriation mechanisms to varying degrees, as do inventions depending on industry type (Levin et al. 1987). Finally, inventors may disclose the minimum necessary depending on the nature of the invention, patent examiners have varying amounts of experience (Cockburn et al. 2003), patent class assignment can be inadequate (as described earlier), and claims can differ in legitimacy depending on industry norms (Merges and Nelson 1990). To begin accounting for this heterogeneity in the propensity to rely on patenting, which can affect conclusions regarding relatedness or usefulness, negative binomial regressions are run separately for each of the six major technological categories as described in the Hall et al. (2001) database: chemical (listed as CAT1), computer and communications (CAT2), drugs and medical (CAT3), electrical and electronic (CAT4), mechanical (CAT5), and others (CAT6).

Results

Table 2 shows the results of the negative binomial regressions. Model 1 shows the effects of the control variables. Most notably, the number of distinct technological classes associated with an invention’s features is shown to be positively related to future citations. Also, the number of claims an invention makes—often associated with an invention’s importance—is positively related to future citations.

Table 2 Negative binomial regressions of future citations

The even-numbered subset of Models 2–13 includes the explanatory variables: first- and second-order terms for features, different measures of relatedness, and trials. Hypothesis 1 maintained that usefulness would be highest when relatedness among features is intermediate. In these models where all patented inventions satisfying the data selection constraints are aggregated together from all technological categories, the empirical analysis generally shows that the first-order effect is indeed positive, the second-order effect is negative, and both coefficients are statistically significant. Furthermore, the negative second-order effect does indeed overwhelm the positive first-order effect across the range of relatedness as reported in Table 1. It is not unreasonable to expect that citation chains with the nature of indirectness of type #5 (i.e., the inventions cited by features are linked via subsequent co-citation) should show up as statistically insignificant. Thus, the results provide broad support for H1.
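The inverse U-shape can be checked by locating the turning point implied by the two coefficients. In a model where the log of expected citations includes the terms b1·x + b2·x², with b2 negative, the peak lies at x = −b1/(2·b2); the curve is an inverse U over the data only if that point falls inside the observed range of relatedness. The sketch below uses made-up coefficients and a made-up range, not values from Tables 1 or 2, and the function name is hypothetical.

```python
def turning_point(b1, b2):
    """Value of x that maximizes b1 * x + b2 * x**2 when b2 is negative."""
    if b2 >= 0:
        raise ValueError("an inverse U-shape requires a negative second-order term")
    return -b1 / (2 * b2)


# Hypothetical coefficients and observed range, for illustration only.
b1_example, b2_example = 0.4, -0.05
observed_min, observed_max = 0.0, 10.0

peak = turning_point(b1_example, b2_example)
peak_in_range = observed_min < peak < observed_max
```

With these illustrative numbers the peak is at a relatedness of about 4, inside the assumed range, so predicted usefulness rises up to that point and declines beyond it.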

Hypothesis 2 maintained that future attempts to combine a previously combined set of specific features ultimately lead to inventions of lower usefulness. The even-numbered models of Table 2 show a negative relationship between the number of past trials with a set of features, and the usefulness of an invention that uses such a set. In this preliminary analysis, the most useful configurations for a given set of features appear to be typically found first.

I include a second-order term for number of trials. As the odd-numbered subset of Models 3–13 shows, there appears to be a nonlinear relationship between number of past trials and usefulness. At first, number of trials is negatively related to usefulness. As experience with a set of features accumulates, the negative relationship begins to disappear.

Since it is well-accepted that the propensity to patent inventions differs across industries (Cohen et al. 2000; Levin et al. 1987), additional negative binomial regressions are run but this time with category fixed effects. Table 3 supports the notion that some industries are more likely to patent than others.Footnote 13 Specifically, the computer and electronics industries apparently have a greater tendency to patent than the chemical and drug industries. Whether these coefficients indicate differences in rates of innovation or differences in rates of patenting vis-a-vis other appropriation mechanisms is unknown from this data alone (c.f. Pavitt 1982), and would have to be combined with other variables indicating innovation-based inputs or innovation output measures. More importantly, even after controlling for industry via these fixed effects, Hypothesis 1 is still supported.

Table 3 Negative binomial regression of future citations, with category fixed effects

Table 4 shows negative binomial regressions for each technological category. Hypothesis 1 linking an invention’s usefulness and the relatedness among its features is supported fully for three of the five definitive technological categories (computer-, electronics-, and mechanically-oriented industries). However, the relationship between number of prior trials and invention usefulness found in Table 2 is generally not evidenced by these industry-specific regressions. Thus, when industry-specific regressions are run, Hypothesis 2 is unsupported.

Table 4 Negative binomial regression of future citations, per technological category

Discussion

This paper is, to my knowledge, the first to explicitly examine the relationship between an invention’s usefulness and the socioculturally oriented relatedness of its features. Generally speaking, a statistically significant inverse U-shaped relationship is found between an invention’s usefulness and the relatedness among its features, evident from models that measure relatedness by citation chain types #2, #2.5, and #4. Connections between features that are too direct do not seem related to the usefulness of a focal invention (i.e., citation chain type #1), and the same appears to hold for connections based on linking inventive features after they have been discovered (i.e., citation chain type #5).

The data used to test the hypotheses have a couple of limitations. First, the data likely do not account for all of any given invention’s features; non-patent citations are not covered. While the focus of this paper is indeed on technological features and development (e.g., Verbeek et al. 2003) and not scientific phenomena underlying patented inventions, the structure underlying the connectedness among the patent citations (i.e., technological features) can help to indicate the scientific phenomena at hand (cf. Faucompré et al. 1997; Lo 2010; Narin and Noma 1985). To reiterate, citations made to patents have also been shown to indicate novelty, in a qualitatively different fashion compared to citations made to publications (also see Meyer 2000). Second, the data may account for features that have little to do with the invention itself. Specifically, some backward citations that become listed as features may be cited extraneously as substitutes for other features, for the sole purpose of documentation.

This paper helps respond to recent work lamenting that the effects of relatedness are not well understood. As described by D’Aveni et al. (2004, pp. 365–366), “…the empirical search for synergistic effects (from resource-sharing among related businesses) on corporate-level performance has produced mixed and inconsistent results… Mixed results suggest that scholars need to understand the impact of diversification at a finer-grained level of detail (Lubatkin et al. 2001).” And while prior work on resource-based synergies has focused on equating synergy to mere relatedness and economies of scale or scope (e.g., Gary 2005; Schilling et al. 2003; St. John and Harrison 1999; for earlier work, see Amit and Livnat 1988; Barney 1988; Davis and Thomas 1993; and their references), the current paper specifically examines relatedness at a finer-grained dynamic level. Instead of examining relatedness according to arbitrarily assigned coding schemes, this paper examines relatedness according to a patent system that requires records of association between features regardless of coding schemes (in other words, patent examiners are generally understood to be familiar enough with the prior art that they will include all relevant patent citations as prior art). Of course, a measure of relatedness based on informetric or citation-based analysis may still suffer from the arbitrariness of coding schemes, insofar as pieces of knowledge (i.e., patents) are categorized by codes, and inventors are expected (i.e., by patent examiners) to search and cite knowledge from some fields more thoroughly than others. However, insofar as inventors are required to (or require themselves to) search for all relevant prior art regardless of those codes, operationalizations of relatedness based on citation-based analysis may be particularly appropriate.

Finally, the findings also reveal a handful of relatively novel future research questions or directions at the more fine-grained level. First, research can be done to refine understanding of what else moderates the effects of prior experience on the search for useful inventions. For example, one might expect that repeated trials and innovations exploring familiar features should lead to greater marginal improvements in usefulness when inventors update their own personal cognitive representations, versus adjusting cognitive representations due to the prior efforts of others. More nuanced data for testing would also be preferred. Second, future research could examine more of the types of connections created between features after those features have been discovered but before the invention has been granted patent rights (e.g., citation chain type #5). Third, successfully relating patent citation networks to the usefulness of inventions (underlying products or opportunities) may clue researchers in to the nature of search processes. While Fleming and Sorenson’s (2001) analysis leaves some room for interpreting how search specifically might take place, this study posits that individuals may systematically search across knowledge spaces by examining specific features or other inventions that have used those features, much like the process that academics use when analyzing bibliographies, the ‘Web of Science,’ or ‘Google Scholar.’

Conclusion

This study is one of the first attempting to identify links between an invention’s usefulness and two variables: the degree of relatedness among its features, and the number of times the invention’s specific set of features has been used for prior inventions. We simply know little about any regularities in the manner by which technological features of useful technological inventions are related, such that search for these inventions can be guided. As argued, when features are barely related, burdensomely broad knowledge is required to identify the situations that they share since the features are only related in relatively specific or specialized situations. When features are overly related, burdensomely broad knowledge is required to identify the specific situations that distinguish them. When features are moderately related, the costs of connecting and costs of synthesizing are cumulatively minimized, and the most useful inventions emerge. I also hypothesize that continued experimentation with a specific set of features is likely to lead to the discovery of decreasingly useful inventions; the earlier-identified connections reflect the more common consumer situations. Covering data from all industries, the empirical analysis provides broad support for only the first hypothesis. Regressions to test the second hypothesis are less conclusive, however, when examining industry types individually.

Besides using patent citation data to investigate the determinants of invention usefulness, this paper also preliminarily explores an operationalization of relatedness at a fine-grained dynamic level. While patent data is nuanced enough to reflect technological connections or linkages, as suggested by previous scholars (e.g., von Wartburg et al. 2005), raw patent citation data does not appear to provide clues to the cross-time changes in usefulness of inventions when they make the exact same set of prior patent citations. As information processing technologies advance, future research may be able to test refined hypotheses across industries with sophisticated data analysis related to literature-based discovery.