Introduction

Patent citations have been widely utilized as empirical tools for studies of patent systems, particularly in relation to economic value and knowledge flow (Trajtenberg 1990; Jaffe et al. 1993; Hall et al. 2005). Although earlier studies did not distinguish between examiner and applicant citations, subsequent studies have examined whether they differ. For example, a study by Alcacer and Gittelman (2006) demonstrated the similarity between examiner and inventor citations with respect to geographical distance. While follow-up work has compared examiner and applicant citations with respect to other dimensions of patent systems, including their relationship with renewal rates (Hegde and Sampat 2009) and the probability of use for rejections (Cotropia et al. 2013), few have analyzed how patent offices are influenced by obstacles to prior art searching. Given that examiners (and searchers working for patent offices) can never be faultless in conducting prior art searches, the types and extent of any obstacles should form part of the policy design parameters.

For example, the Patent Prosecution Highway (PPH) programs allow a patent office to utilize previous search and examination work from an earlier prosecution process at a participating patent office, provided that patent applications are made in the two countries with the same priority date, and with a corresponding (i.e., substantially the same) set of claims. The premise of this program is to expedite prosecution at the later-acting office by way of utilizing information gathered in the earlier examination. Theoretically, a patent office examiner is supposed to search for relevant prior art worldwide to confirm novelty (Footnote 1). If (in an ideal case) the outcome of a search reliably covers all relevant prior art, subsequent patent prosecution processes in different countries can simply utilize the search outcome without a duplicate search. In other words, a single office can act as a search agent for all other offices when there is no impediment to searching.

However, if patent offices have very different local advantages in technological knowledge, then the searches are not duplicates, such that searches at later-acting offices need to be conducted on top of any search outcomes supplied by the other offices. Accordingly, in order to design an international work-sharing plan between offices, the types and extent of any obstacles to prior art searching should be scrutinized to reduce duplicate search costs (or obtain a more complete search through overlapping searches) through collaboration. Unfortunately, we know little about how such obstacles stand in the way of searchers and examiners (Footnote 2). It would then be useful to first define and test several possible searching “distances” for use at patent offices when searching for prior art.

One reason why there has been no large-scale study to date on the obstacles to prior art searching by patent offices is a lack of measurement. To address this, we employ International Search Reports (ISRs) to measure the search difficulties of the trilateral patent offices, and test how “distance” binds officials, including both geographical distance along with similar kinds of obstacles to prior art searching, without relying on comparison with applicant citations. In conducting the analysis, we also consider applicant self-selection, given applicants from both the US and Japan can choose the European Patent Office (EPO) as their search agency, where the EPO has a reputation for quality prior art searching (such that applicants seeking a more stringent search may select the EPO ex ante).

Background and prior literature

Following pioneering work in measuring the effects of knowledge spillover through patenting data (Jaffe 1986) and the value of patents through patent citations (Trajtenberg 1990), Jaffe et al. (1993) and Jaffe and Trajtenberg (1999) considered the measurement of knowledge diffusion by patent citation. They found that knowledge diffusion is geographically localized, assuming that patent citations show traces of knowledge transmission. However, while survey results confirm that patent citations indicate knowledge flow with considerable noise (Jaffe et al. 2000; Duguet and MacGarvie 2005), there is criticism of the method in that patent citations are often unrelated to knowledge transfer between inventors. This is partly because patent citations include examiner citations, and because even attorneys acting on behalf of inventors in preparation for patent prosecution sometimes add applicant citations.

Given that examiners are not inventing and because the perceptions of inventors regarding prior knowledge have been a central concern for innovation research, one area of recent research is how “noisy” examiner citations are in relation to applicant citations. For example, Alcacer and Gittelman (2006) found that examiner and applicant citations have similar distributions in terms of the geographical distance between citing and cited patents. Their results, based on US patent data, partly contradict those from EPO data, although geographical distance also binds examiner citations by the EPO (Criscuolo and Verspagen 2008). There are other advantages of examiner citations for economic research, including that they measure the value of patents better than applicant/inventor citations do (Hegde and Sampat 2009). Other detailed comparisons between applicant citations and examiner citations have revealed that examiners do not rely on applicant-submitted information on prior art (Cotropia et al. 2013). However, these studies did not consider examiner citations independently from applicant/inventor citations.

In their search for prior art, examiners are professionals, but they are not perfect. Recent micro-level studies on examiner experience level and granting behavior (Lemley and Sampat 2012; Frakes and Wasserman 2014) as well as others on examiner citations (Cotropia et al. 2013) acknowledge the limitations of examiners. However, apart from a few studies, the economics literature has not considered the extent to which obstacles to searching bind examiners. A related series of research in Melbourne (Jensen et al. 2005; Webster et al. 2007; Palangkaraya et al. 2011; Webster et al. 2014) compared the results of patent grants from the trilateral offices and concluded that patent offices are biased toward local applicants (and against foreign applicants) in terms of patent grants. While differential grant rates against foreign applicants can be caused by “prejudiced” examinations in each office, examiner bias (i.e., local advantages in technological knowledge) may also contribute to the seemingly differential rates of patent grants. A remaining question is how we can measure examiner bias as caused by obstacles to searching.

Most of these existing studies use patent citation data from either a single country or two regions at most. Each data set of examiner citations in a country shows only the results of a single patent office. However, if we combine multiregional citation data and consolidate citation pairs through international patent families, we should obtain in principle a way to measure the difference between regions with respect to the same criterion of family-to-family citation. That is, patents in an international family cite patents in another international patent family. Put differently, given that examiner citations across different regions show traces of the examination outcomes in each region, we can track back and compare how examiners behave when citing the same prior art. As explained in the following section, we assume that every examiner citation in the national phase could have been added in its earlier ISR phase if there were no obstacles to searching by examiners, or by searchers for an International Search Authority (ISA). Drawing on this assumption and the concept of family-to-family citation, we can statistically evaluate the obstacles to searching.
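The family-to-family consolidation described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name, the toy publication identifiers, and the family numbers are hypothetical, and dropping citations within a single family is our own simplification rather than a rule stated in the text.

```python
from typing import Dict, Iterable, Set, Tuple


def family_citation_pairs(
    pub_citations: Iterable[Tuple[str, str]],
    family_of: Dict[str, int],
) -> Set[Tuple[int, int]]:
    """Collapse publication-level citations into unique (citing, cited) family pairs."""
    pairs: Set[Tuple[int, int]] = set()
    for citing_pub, cited_pub in pub_citations:
        citing_fam = family_of.get(citing_pub)
        cited_fam = family_of.get(cited_pub)
        # Assumption: skip citations that cannot be mapped to a family,
        # and citations that stay inside one family.
        if citing_fam is None or cited_fam is None or citing_fam == cited_fam:
            continue
        pairs.add((citing_fam, cited_fam))
    return pairs


# Hypothetical toy data: three offices examine family 100 and cite
# members of families 200 and 300.
family_of = {"EP1": 100, "US1": 100, "JP1": 100,
             "US9": 200, "JP9": 200, "EP7": 300}
pub_citations = [("EP1", "US9"), ("US1", "JP9"), ("JP1", "EP7"), ("US1", "EP1")]
pairs = family_citation_pairs(pub_citations, family_of)
```

In this toy case the EPO citation to US9 and the USPTO citation to JP9 collapse into one family pair, which is what allows offices to be compared on the same cited prior art.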

The methodology: PCT and ISR as the basis for empirical measurement

The measurement of examiner search obstacles is itself an impediment to research on examiners and searchers at patent offices. We propose a method of measuring search obstacles of the trilateral patent offices by focusing on ISRs issued by different ISAs, specifically the patent offices in Europe, the US, and Japan, according to the Patent Cooperation Treaty (PCT).

Before explaining the details of citation-level methodology, we note that PCT applications are increasingly important for applicants seeking patent protection internationally, and that PCT applications should receive more attention from the field of scientometrics. The number of PCT national phase entries from abroad has already surpassed the number of nonresident applications via the Paris Convention route worldwide (Fig. 1). While the PCT is now the main route for international applications, there have been few empirical studies of the PCT system. Given that the trilateral offices, namely the EPO, the US Patent and Trademark Office (USPTO), and the Japan Patent Office (JPO), received most PCT applications before the mid-2000s, it is reasonable to limit our sample to those PCT applications made to and examined by all three offices, at least up to 2005 (Footnote 3).

Fig. 1 Nonresident PCT and Paris Convention route entries (WIPO 2011, p. 48)

An ISA gives a PCT application received at a patent office an ISR at the time of international publication of the application. Under the PCT, “…an applicant must file an application with a Receiving Office (RO) and choose an international searching authority to provide an International Search Report and a written opinion on the potential patentability of the invention” (WIPO 2011). An ISR contains a list of prior arts, and the set of prior arts becomes part of the citations. ISRs are issued under a common search criterion established by the World Intellectual Property Organization (WIPO) under the PCT system. “The applicant generally has at least 30 months from the filing (priority) date to decide whether to enter the national phase in the countries or regions in which protection is sought” (WIPO 2011). The WIPO guidelines apply to every ISA when issuing an ISR, whereas some countries permit applicants to choose between ISAs. The same criteria for a prior art search apply for different patent offices, while national phase examinations do not have such standardized rules. We can then distinguish between cited patents added in the national phase by designated offices (or DO-cited patents) and those cited patents caught earlier during the ISR (or ISR-cited patents).

As shown in Fig. 2, there are time differences between ISRs and national phase examinations, implying the existence of a lag between ISR-citations and DO-citations on average in the national phase. While ISRs are produced at an early stage, more searches occur later in national offices. Given that knowledge is geographically localized (Jaffe et al. 1993; Jaffe and Trajtenberg 1999), and knowledge diffusion takes time, the additional time between the ISR and the national phase search facilitates a more complete search in the later stage. We limit our sample to PCT applications examined in all three of the trilateral offices, meaning that any localized knowledge captured in any of these areas at the time of the ISR can be caught by the offices in the national phase in a less localized way.

Fig. 2 PCT procedure (WIPO 2011, p. 13)

Following the logic above, we retrospectively define, for every cited patent for a PCT application (the union set of ISR-cited and DO-cited patents), consolidated and identified at the INPADOC family (Footnote 4) level, the probability that it was already caught in the ISR of the originating PCT application (Fig. 3). Taking this probability (found_in_ISR) as the dependent variable, we implement PROBIT analyses at the INPADOC family level with explanatory variables representing the various “distances” between citing and cited patents, including the technological complexity of the originating applications and other related indicators. Simply put, we assume that every DO-cited patent for a PCT application would have been cited in its ISR if every citing and cited pair is consolidated at the INPADOC family level, and if examiners (or searchers for an ISA) are unbounded in their searching capability.

Fig. 3 Dependent variable found_in_ISR: a binary variable representing the probability of a DO-citation or ISR-citation already included in the set of ISR-citations (modification to Fig. 2)
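As a rough illustration of how the dependent variable could be coded for one PCT application, the sketch below labels each cited family in the union of ISR-cited and DO-cited families. The function name and the family numbers are hypothetical and serve only to show the logic.

```python
from typing import Dict, Iterable


def found_in_isr(isr_cited: Iterable[int], do_cited: Iterable[int]) -> Dict[int, int]:
    """Return {cited_family: 0 or 1} over the union of ISR- and DO-cited families."""
    isr_set = set(isr_cited)
    universe = isr_set | set(do_cited)
    # 1 if the family already appeared in the ISR, 0 if only a DO cited it later.
    return {fam: int(fam in isr_set) for fam in sorted(universe)}


# Hypothetical example: the ISR cites families 200 and 300; the designated
# offices later cite families 300, 400, and 500.
labels = found_in_isr(isr_cited={200, 300}, do_cited={300, 400, 500})
share_found = sum(labels.values()) / len(labels)
```

Here families 400 and 500 are coded as zero because they surface only in the national phase, so half of the cited families count as found in the ISR.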

We should mention several caveats concerning the methodology. First, we exclude applicant (inventor) citations from the analysis because our primary objective is to evaluate the determinants of search completeness by the ISAs. However, when an applicant is relatively capable in searching for prior art, ex ante disclosure might affect the quality of a search by the patent office. To address this, we conduct additional analysis to consider the self-selection to the EPO of US and Japanese applicants. This is because the EPO has a reputation for a higher examination standard and therefore higher capability applicants from the US and Japan may choose the EPO as their ISA (Footnote 5).

Second, if a relevant prior art was missed at the time of an ISR, we assume that one of the designated offices (DOs) will cite it. In reality, DO-citations vary according to the different standards in different regions. Given that the US patent system does not provide citation category (such as “X” and “Y”) information, we have been unable to apply the same standard of rejection for a cited piece of prior art. In addition, DOs can never be perfect in a prior art search. Citations made by post-grant oppositions are included, but citations by post-grant litigations are not. Thus, the union set of DO-citations is only an approximation of the quasi-complete search made possible ex post. Conversely, DOs may cite prior art in response to an applicant action such as an amendment of claim, divisional application, or continuation. Although ISAs are supposed to cite prior art reasonably expected to be relevant in subsequent changes of claims, we cannot ex ante search all prior arts triggered by the ex post amendment of claims. Then, we may violate our basic assumption that “every DO-cited patent for a PCT application should have been cited in its ISR if ISAs are perfect in their search capabilities” if the amendment of claims is too drastic.

Third, although ISR searches are sometimes outsourced to non-PTO agencies, we consider ISRs as the basis for evaluating PTOs because they are issued under the name of the patent office, not that of any private search agency. We also consider only those citations made by the trilateral offices, such that search completeness made possible by nontrilateral offices is not considered.

Finally, given that PATSTAT, our primary data source, records nonpatent literature in a nonstandardized format, we could not consolidate it across different records. For this reason, we only employ patent citations. Further, US citations are not as complete in PATSTAT. In particular, there is no record of citations for rejected applications in PATSTAT (Footnote 6). Although it is usually possible to retrieve citations data from the Public PAIR database for rejected applications filed after 2001, we have been unable to combine the data from the two sources.

Hypotheses

Given that ISR searchers (including examiners and searchers working for patent offices) are affected by obstacles to searching because of various “distances,” we hypothesize that a prior patent (found in the ISR or national phase) is more likely to be included in the ISR when distances are less problematic, i.e., if:

H1: A relevant prior patent is geographically closer (shorter geographical distance).

H2: A relevant prior patent is older (more knowledge diffusion time).

H3: A relevant prior patent is from the same applicant (less organizational distance).

H4: A relevant prior patent has a greater number of forward citations (more knowledge diffusion beforehand).

H5: An application for which an ISR is issued has a narrower scope, fewer claims, fewer inventors, and a smaller international family (less complexity relative to search).

In addition, we consider the possibility that an applicant’s self-selection of an ISA affects the outcome variable. As shown in Fig. 4, PCT applicants from the US and Japan are permitted to select the EPO as their ISA, unlike applicants from European Patent Convention (EPC) contracting states, who are not permitted to select the USPTO or the JPO as their ISA (Footnote 7). Given the reputation of the EPO for high-quality prior art searches, applicants from the US and Japan may self-select if they seek a more stringent search at the EPO. We therefore include switching behavior on PCT applications for ISRs as one of the factors for ISR completeness, and use instrumental variables for the ISA switch (a binary variable ISA_changed).

Fig. 4 Selection of ISA from the RO

The data source

The empirical domain of analysis is triadic patent applications through PCT with an earliest priority date within its international family between 2002 and 2005. Triadic PCT patent applications are defined here as INPADOC families that contain all EPO, USPTO, and JPO applications recorded on EPO’s PATSTAT database, with only one “WO” (PCT) application in a family. This means that a single PCT application initiates the international phase for all applications in a family. There are 97,828 international families used in the analysis. Although international applications to and from China and Korea have increased dramatically in the last 10 years, the trilateral patent offices of the EPO, the USPTO, and the JPO represent the vast majority of applications before 2005, which is our observation period.

We use EPO PATSTAT (2013 OCT version), and INPADOC family is the unit of analysis. Therefore, the accuracy of international families depends entirely on the INPADOC family table on PATSTAT. The citation data are also from PATSTAT (2013 OCT), and the JPO citation data is augmented using the Seiri–Hyojunka data (the standardized patent prosecution data of the JPO). We consolidate applicant identifiers using the EEE-PPAT database developed by ECOOM (Du Plessis et al. 2009; Magerman et al. 2009; Peeters et al. 2009).

As discussed, US citation data are not complete in PATSTAT because it does not record citations for rejected applications. Even after the publication rule change in the US in 2001, a published application went unrecorded in PATSTAT if the application was abandoned (possibly due to rejection). The lack of US citations for rejected applications may affect the results of our analysis, but we have not yet verified this.

Based on the data set described, applications from the EPO area represent more than a quarter of the entire sample, as shown in Fig. 5. In the figure, “JP-EP” denotes the JPO as the RO and the EPO as the ISA, and “US-EP” denotes the USPTO as the RO and the EPO as the ISA. Applicants from the EPO area are not permitted to choose ISAs; in contrast, US applicants are allowed to choose ISAs from the EPO, IP Australia, the Korean Intellectual Property Office (KIPO), Rospatent (the Russian Patent Office), etc. In fact, more than half of all PCT applications from the US choose the EPO as their ISA, while just 0.7% select the KIPO (Footnote 8). Applicants from Japan are allowed to choose either the JPO or the EPO as their ISA, but only about one-tenth of Japanese PCT applications have chosen the EPO as their ISA.

Fig. 5 Composition of triadic PCT applications, priority years 2002–2005

Underlying the selection of ISAs across the trilateral offices, there are differences in their reputations regarding completeness of search reports, i.e., the EPO has the best reputation. This is consistent with a simple comparison with the average of found_in_ISR for the three ISAs in Fig. 6. Given that the EPO has a good reputation, and given that applicants from the US or Japan can choose the ISA, we expect that self-selection by applicants influences the outcome variable, found_in_ISR. This is partly because applicants with inventions of higher economic value or with higher capability would spend more for a prior art search themselves, so that they would identify more prior art before submitting a formal application. Furthermore, highly capable applicants may desire a more stringent search in this early stage to avoid rejection in a later stage, i.e., the national phase. Indeed, there is evidence that applicants know that the EPO produces higher quality ISRs in general; the fees it charges are also relatively higher than those of the other offices (Footnote 9). In order to account for self-selection, we hypothesize that the more experienced and capable an applicant in the US or Japan is in terms of technological innovation, the more likely the applicant will choose the EPO as its ISA.

Fig. 6 Simple average of the dependent variable found_in_ISR according to ISA

Variables and estimation methodologies

We employ several categories of explanatory variables, representing each of the above hypotheses, in probit analyses specifying the probability of capture of a cited patent in a previous ISR as the binary dependent variable (found_in_ISR). The unit of analysis is the pair of citing and cited international families, both consolidated at the INPADOC family level.

For H1, we define three variables: euro_cited (cited family has its first priority, i.e., the earliest date, in EPC contracting states within a family, derived from tls201 and tls219 tables of PATSTAT), us_cited (cited family has its first priority in the US), and jp_cited (cited family has its first priority in Japan). When a cited family has its origin in the same region where an ISR is issued, we expect the ISA of the region to have a geographical advantage over the relevant technology. The expected sign is positive for each region, e.g., positive jp_cited coefficients for applications originating from Japan.
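A minimal sketch of how the three region dummies might be coded from the country of a cited family's first priority is given below. The function name is hypothetical, the set of EPC contracting states is a deliberately incomplete illustrative subset, and treating the office code "EP" as European is our own assumption.

```python
# Illustrative, incomplete subset of EPC contracting states (assumption:
# a real analysis would use the full list valid for the priority year).
EPC_SUBSET = frozenset({"DE", "FR", "GB", "NL", "IT", "CH", "SE"})


def region_dummies(first_priority_country: str) -> dict:
    """Code euro_cited, us_cited, and jp_cited from a two-letter country code."""
    cc = first_priority_country.strip().upper()
    return {
        # Assumption: a first filing at the EPO itself ("EP") counts as European.
        "euro_cited": int(cc in EPC_SUBSET or cc == "EP"),
        "us_cited": int(cc == "US"),
        "jp_cited": int(cc == "JP"),
    }


example_jp = region_dummies("jp")
example_other = region_dummies("KR")
```

A cited family originating outside the three regions, such as the Korean example, has all three dummies equal to zero and thus serves as the implicit comparison group.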

For H2, we define the citation lag between the first priority of a citing family and that of a cited family as fam_cite_lag (derived from tls201 and tls219 tables of PATSTAT). The longer the lag, the more easily the prior art is found at the time of the ISR. Therefore, its expected sign is positive.

For H3, we define self as a binary variable taking a value of one if one of the patents in a cited family and one of the patents in its citing family belong to the same applicant, based on PATSTAT (tls207) combined with EEE-PPAT, using “L2” identifier. Here we hypothesize the patent office will find it easier to locate prior relevant art within the same applicant. Therefore, the expected sign is positive.

For H4, we define fwd_cite_of_the_cited obtained from PATSTAT (tls212) as the number of forward examiner citations at the publication level (but consolidated at the family level), and made out to the cited patent family. When a prior art has been already cited by many patents, patent offices will find it easier to identify. Therefore, its expected sign is positive.

For H5, we first use a scope indicator, where IPC4_count is the total net count of IPC subclasses (4-digit IPC, derived from tls209) assigned in a citing INPADOC family. Because the patent classification of an application may change during the prosecution process, in both the international and national phases, we include all IPC subclasses to capture the breadth of a family. The number of claims of a patent correlates with the complexity of the technological content. As an indicator of the number of claims, we obtain publn_claims_max_tls211, which is the maximum number of claims registered on PATSTAT (tls211 table) in a citing INPADOC family. We do not simply rely on claims data from a single office, such as the EPO, because an application can be modified internationally during its prosecution. We also employ invt_nr, the maximum number of inventors in an application included in a citing INPADOC family, from PATSTAT (tls207). The size of the international family, family_size, is a count variable of applications in different countries in a citing INPADOC family (tls211/219). Because all of the complexity measures act negatively against prior art searches by patent offices, the expected signs are all negative.
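The complexity measures for a citing family could be computed along the following lines. This is a sketch only: the record layout (keys "country", "ipc", "claims", "inventors") is hypothetical and does not mirror the PATSTAT schema, and counting distinct filing countries for family_size is one possible reading of the definition above.

```python
from typing import Iterable, List


def ipc4_count(ipc_symbols: Iterable[str]) -> int:
    """Count distinct 4-character IPC subclasses, e.g. 'A61K 31/00' -> 'A61K'."""
    return len({s.strip()[:4].upper() for s in ipc_symbols if s and s.strip()})


def citing_family_complexity(applications: List[dict]) -> dict:
    """Aggregate application-level records into family-level complexity measures."""
    return {
        "IPC4_count": ipc4_count(sym for app in applications for sym in app["ipc"]),
        "publn_claims_max_tls211": max(app["claims"] for app in applications),
        "invt_nr": max(app["inventors"] for app in applications),
        # Assumption: family size counts distinct filing countries or offices.
        "family_size": len({app["country"] for app in applications}),
    }


# Hypothetical triadic family with one application per office.
apps = [
    {"country": "EP", "ipc": ["A61K 31/00", "C07D 401/04"], "claims": 12, "inventors": 3},
    {"country": "US", "ipc": ["A61K 9/20"], "claims": 20, "inventors": 4},
    {"country": "JP", "ipc": ["A61P 35/00"], "claims": 8, "inventors": 3},
]
complexity = citing_family_complexity(apps)
```

Taking the maximum over family members reflects the point made above that claims and classifications may change during prosecution in different offices.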

In addition to the above variables used to test our hypotheses directly, we define two variables representing the capabilities of applicants in order to address the self-selection of ISA by the applicants. The first of these is total_count, which is the total number of applications that an applicant has made, taken from EEE-PPAT. The second is applicant_avg_cited, which is the average number of forward citations an applicant has received per application, as calculated from PATSTAT (tls212) and EEE-PPAT. Both are supposed to represent the experience level of the applicant, and thus are used as instrumental variables in the instrumented PROBIT for the variable ISA_changed. This binary variable ISA_changed indicates that a US or Japanese applicant chose the EPO as their ISA (the EPO can be chosen by a US or Japanese applicant, but a European applicant cannot choose the USPTO or the JPO). We obtain this information for PCT applications from PATSTAT, given that the citation table tls212 has a field for citation origin in which ISR is shown for PCT applications. Given that the first application country (RO) in a family is available from tls201, we can code the switch from RO to a different ISA. The correlation coefficient between ISA_changed and the dependent variable found_in_ISR is low, at around 0.03.
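The coding of the switch can be sketched as a simple comparison of the receiving office and the searching authority. The function name and the two-letter office codes used as inputs are assumptions for illustration.

```python
def isa_changed(ro: str, isa: str) -> int:
    """Return 1 if a US or Japanese receiving office is paired with the EPO as ISA."""
    return int(ro.strip().upper() in {"US", "JP"} and isa.strip().upper() == "EP")


# Hypothetical (RO, ISA) pairs.
switches = {
    ("US", "EP"): isa_changed("US", "EP"),
    ("JP", "JP"): isa_changed("JP", "JP"),
    ("EP", "EP"): isa_changed("EP", "EP"),
    ("US", "KR"): isa_changed("US", "KR"),
}
```

Under this coding a US application searched by the KIPO is not counted as a switch, because the variable as defined above captures only the choice of the EPO.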

Lastly, we specify control variables for the originating area, being JP_app and US_app (applications from Japan and the US, respectively). We control for technology class using 35 WIPO technology classification dummies, setting the last classification as the reference class.
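For orientation, one way to write the baseline probit specification implied by the variable definitions above is sketched below. The indices (i for the citing family, j for the cited family), the coefficient symbols, and the vector c collecting the controls are our own notation, and the exact set of regressors differs across the models in Table 1.

```latex
\Pr\left(\mathrm{found\_in\_ISR}_{ij} = 1 \mid \mathbf{x}_{ij}\right)
  = \Phi\left(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta}\right),
\qquad
\begin{aligned}
\mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
  ={}& \beta_0
     + \beta_1\,\mathrm{euro\_cited}_{j}
     + \beta_2\,\mathrm{us\_cited}_{j}
     + \beta_3\,\mathrm{jp\_cited}_{j} \\
   & + \beta_4\,\mathrm{fam\_cite\_lag}_{ij}
     + \beta_5\,\mathrm{self}_{ij}
     + \beta_6\,\mathrm{fwd\_cite\_of\_the\_cited}_{j} \\
   & + \beta_7\,\mathrm{IPC4\_count}_{i}
     + \beta_8\,\mathrm{publn\_claims\_max\_tls211}_{i}
     + \beta_9\,\mathrm{invt\_nr}_{i}
     + \beta_{10}\,\mathrm{family\_size}_{i} \\
   & + \boldsymbol{\gamma}^{\top}\mathbf{c}_{i}
\end{aligned}
```

Here \Phi is the standard normal cumulative distribution function, and c_i stands for the controls (JP_app, US_app, ISA_changed where applicable, and the WIPO technology class dummies).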

Estimation results

The results under Model 1-1 in Table 1 employ the sample only from EPO regions. H1 is supported by the positive sign of euro_cited and the negative signs of us_cited and jp_cited. Likewise, the results for this model support H2, H3, H4, and H5, except that the estimated coefficient for the number of inventors has a positive sign, contrary to our expectation from H5. Model 1-2 further limits the sample to those from the EPO region and non-self-citations as a robustness check. The results are unchanged from Model 1-1. Model 1-3 employs all triadic samples from the EPO, USPTO, and JPO regions, with JP_app and US_app as applicant region controls, implying that the EPO is the reference category. The coefficients for the two region controls are negative and significant, indicating that ISRs prepared by the USPTO and the JPO are disadvantaged on average, compared with ISRs by the EPO. The binary variable ISA_changed indicates when US applicants or Japanese applicants select the EPO as their ISA. The estimated coefficient for ISA_changed is positive and significant, meaning that switching an ISA from the USPTO or the JPO to the EPO has made an ISR more complete. The results for the other variables are mostly unchanged from Model 1-1 and Model 1-2, except that the coefficient for the number of inventors has lost significance. The coefficient for jp_cited has shifted from a negative to a positive sign, but this is because of the pooled sample. This suggests that prior art from the JPO area is easier for the trilateral offices to find on average. The results from Models 1-1 and 1-2 clearly show that the EPO finds it more difficult to detect prior art from the JPO area (see Tables 2, 3, and 4 for summary statistics, definitions, and the correlation matrix).

Table 1 PROBIT analyses on the probability of ISR coverage; dep. var. = found_in_ISR
Table 2 Summary statistics
Table 3 Variables
Table 4 Correlation matrix

Model 2-1 uses applications from the US only, and all of the results are consistent with the hypotheses, except that euro_cited has a positive coefficient (European prior art seems to be easier for searchers in the US to find). Model 2-2 also focuses on the US, and limits the citation data to non-self-citations as a robustness check, while employing two instrumental variables for the variable ISA_changed through instrumented probit (IV Probit). The results are almost unchanged from those discussed earlier. The only exception is that the estimated coefficient for IPC4_count has lost significance.

Model 3-1 uses only the sample of applications from Japan to examine the local bias of prior art searches in Japan. As expected by H1, jp_cited has a positive and significant sign, whereas us_cited has a negative and significant sign. The other variables display similar results as Models 1 and 2 and are consistent with our hypotheses, except that self has a negative sign and there are insignificant coefficients for prior arts from Europe, the number of forward citations to the cited patents, and the number of inventors. For Japanese applications, the coefficient for ISA_changed lost significance in Model 3-2, suggesting that the advantage provided by the ISA change from the JPO to the EPO is from applicant self-selection. However, we do not observe this effect for the US-only applications in Model 2-2.

Some of the results relating to the 35 WIPO technology classes are also noteworthy. The estimated coefficients for class 14 and particularly classes 15 and 16 have consistently positive signs. The WIPO field classification for 14 is “Organic fine chemistry,” 15 is “Biotechnology,” and 16 is “Pharmaceuticals” (Table 5). These technology classes are known as discrete technologies, and patents in these classes generally have a higher economic value than those in more complex technologies. Because applicants conduct relatively complete searches before filing applications in discrete technology classes, the prior art listed in the ISRs is thought to be relatively complete.

Table 5 WIPO technology fields

Discussion and further development

The overall results are consistent with our hypotheses, suggesting that examiners (and searchers working for the PTOs) are bound by various kinds of distances, including the technological complexity of applications. These are not very surprising results, but they are supported here for the first time by a novel methodology. Examiners (unlike inventors) are required by law to find prior art from all over the world, but are naturally bound by obstacles to searching. Most prior studies using examiner citations do not incorporate these informational obstacles in the way of examiners, and the present study has proposed and implemented a methodology to determine the existence of barriers. As stated in the literature review, prior studies on the difference of examination outcomes between patent offices (Jensen et al. 2005; Webster et al. 2007, 2014) have not explicitly considered these issues. Taking the cost of prior art search into a grant rate comparison offers a potential way of extending the research envelope. However, as explained in the methodology and data sections, we must first address several limitations. In particular, the US data require filtering on citation categories, and augmentation with rejected (abandoned) applications. Further, the results including instrumental variables suggest self-selection is evident, but only for the Japanese sample. There is a need for further scrutiny using updated data including additional attributes of both applicants and applications.

These results have important policy implications, especially as PPHs rely on earlier outcomes from other patent offices. Given that knowledge is locally concentrated because of agglomeration economies, a local patent office may have an advantage over other distant patent offices in finding relevant prior knowledge locally. This is also likely because local examiners are educated and employed locally and have access to up-to-date information in the local language. In other words, the physical distance between the location of an invention and the location of its relevant prior art is not independent of the probability of the prior art being found by examiners (and searchers employed or contracted by patent offices). If we attempt to evaluate the merit of combining the work done by more than one patent office, the efficiency question depends on how far distant patent offices duplicate their efforts. Put differently, in order to justify a system of physically dispersed patent offices, rather than a single unitary patent office that searches and examines patent applications worldwide, we need to know how complementary the offices are in terms of their searching capabilities. This paper provides a preliminary step toward responding to this key policy question.