1 Introduction

When preparing a patent application or judging the novelty and inventiveness of an application, an essential task is searching patent databases for related patents that may invalidate it. Patent searching is usually performed by examiners in a patent office and by patent searchers in private companies (Alberts et al. 2011). To search patent databases, patent searchers follow a strict scheme: they compartmentalize the invention into searchable features and expand these features with synonyms, equivalents and co-occurring terms, using the concept of an invention diagram. To narrow the search topic, query terms are specialized into keyword phrases (Hunt and Nguyen 2007). Table 1 shows an example of such a diagram, listing the features of the invention together with the expansion terms used by patent searchers for query generation.

Table 1 Invention diagram

The first column includes the searchable features of the invention selected from the source document, particularly from a patent document or an invention report. The second column provides the corresponding expansion terms. These terms are synonyms or equivalents, such as “screen” for “display”; terms that co-occur in the source document, for example “signal” with “transmitter”; or terms that limit a feature of the invention to a keyword phrase, such as “control module” for “module”. Particularly for finding synonyms for the searchable features of the invention, there is an increasing need to assist patent searchers, as this process is very time-intensive and the probability of missing relevant expansion terms is high (Azzopardi et al. 2010; Fujita 2007; Hunt and Nguyen 2007). Yet, no sources providing such synonyms, for instance patent domain-specific lexica or thesauri, are available.
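
To make the query-generation step concrete, the following minimal Python sketch (our own illustration, not a tool used in this work) shows how the features and expansion terms of an invention diagram can be combined into a Boolean query string; the feature names and expansion terms are the examples from the text above.

```python
# Minimal sketch: turning an invention diagram into a Boolean query string.
# Features and expansion terms are the illustrative examples given above.

invention_diagram = {
    "display": ["screen", "monitor"],     # synonyms / equivalents
    "transmitter": ["signal"],            # co-occurring term
    "module": ["control module"],         # keyword-phrase limitation
}

def build_query(diagram):
    """Combine each feature with its expansion terms via OR, features via AND."""
    groups = []
    for feature, expansions in diagram.items():
        terms = [feature] + expansions
        quoted = ['"%s"' % t if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

print(build_query(invention_diagram))
# (display OR screen OR monitor) AND (transmitter OR signal) AND (module OR "control module")
```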

The goal of this paper is to learn synonyms for the patent domain in order to assist patent searchers in query generation, particularly in query expansion. In particular, patent searchers should be able to formulate queries and to evaluate patent applications in the same way as patent examiners do. To this end, we learn domain-specific term networks from the query logs created by patent examiners as part of the application validation procedure.

The remainder of the paper is organized as follows. We first review related work on automatic query expansion in patent searching and enhancing query generation using query logs. In Sect. 3 we present the structure and characteristics of query logs of USPTO patent examiners. We then present our approach to learn synonyms from the query logs and the term networks each learned for a specific US patent class in Sect. 4. Experiments based on query expansion done by patent examiners are provided in Sect. 5, followed by conclusions and an outlook on future work in Sect. 6.

2 Related work

2.1 Automatic query expansion in patent searching

Currently, automatic query expansion in patent search is mostly limited to computing co-occurring terms for the searchable features of the invention (Konishi 2005; Mahdabi et al. 2011; Magdy and Jones 2011). Additional query terms are extracted automatically from the query documents, the feedback documents or the cited documents based on statistical measures, such as term frequencies (tf) and the combination of term frequencies and inverse document frequencies (tfidf), or from the translations of the claim sections (Jochim et al. 2011; Konishi 2005; Mahdabi and Crestani 2011; Magdy and Jones 2011; Russo 2011; Xue and Croft 2009).

Whole documents or whole sections of the query documents, such as the title, abstract, description or claim section, are also used for query generation and query expansion (Xue and Croft 2009).

For conceptual search (searching by meanings rather than literal strings) in the patent domain, the International Patent Classification (IPC), in particular its categories and short descriptions, standard dictionaries such as WordNet, or lexica like Wikipedia are used for query refinement (Bashar and Myaeng 2011; Herbert et al. 2009). Especially for providing synonyms for conceptual search, the related experiments on automatic query expansion commonly rely on such standard dictionaries and lexica (Bashar and Myaeng 2011). We assume that these standard dictionaries achieve only limited performance in the patent domain. In Sect. 5 we will reassess this assumption.

One approach computing synonyms from the European Patent Office (EPO) patent collection is described in Jochim et al. (2010, 2011). The claim sections in English, German and French are aligned to extract translation relations for each language pair. Based on language pairs sharing the same translation terms, synonyms are learned in English, French and German. We refined this approach to learn synonyms for specific claim terms, identified as the subject features of the inventions (Tannebaum and Rauber 2012b). We used the reference signs appearing next to these specific claim terms in the granted patents of the EPO and assembled bigrams learned from the claim sections to derive translation relations. From these we learned synonyms based on shared translation terms as shown in Tannebaum and Rauber (2012b), but for specific claim terms. Further, our approach avoids mistakes caused by incorrect word alignment. However, the limited collection of documents available for learning patent class-specific term networks proves to be a disadvantage. Since 2003, about forty thousand granted patent documents including the translations of the claim sections have been published by the EPO each year. Hence, in view of the more than 600 subclasses of the cooperative patent classification (CPC) scheme of the EPO, the document collection is too small and ultimately inappropriate for learning patent class-specific term networks.

2.2 Enhancing query generation based on query logs

Query logs are being intensively studied in many information retrieval settings, specifically for web search (Silvestri 2010). The main focus is on analyzing the queries to enhance searches (Amitay and Broder 2008; Clough and Berendt 2009; Kato et al. 2013). In previous research on learning lexical term networks from query logs, terms are extracted directly from the query log collection. Relations, specifically synonym relations, are generally learned using external sources such as lexica, glossaries or databases like WordNet (Sekine and Suzuki 2007; Zhang et al. 2006). Further relations are retrieved by analyzing the retrieved, particularly the clicked, documents: if two queries are related to the same document, these two queries are associated with each other (Hang et al. 2002; Kunpeng et al. 2009). All of these approaches to finding synonyms depend on external sources such as lexica or glossaries and do not exploit the relations between query terms within the query logs themselves. Yet, we need exactly these relations, and we can use them for learning synonyms in the patent domain due to the radically different setting in which patent searches are performed. Contrary to conventional query logs, where searches are usually one-step, single-query events, patent search sessions extend over many queries that are gradually refined, and that rely more heavily on the use of synonyms to ensure coverage, as patent applicants are permitted to be their own lexicographers, i.e. they can define their own terminology.

Finding query logs in the patent domain has been a difficult task due to the lack of publicly available logs (Jürgens et al. 2012). Private companies and searchers are not interested in making their logs available, as these will usually include or hint at features of their inventions. The USPTO is the only source known to us that publishes the query logs of patent examiners. In De Marco (2011) a detailed analysis of individual USPTO patent examiners’ query logs is presented to reveal search strategies and potential shortcomings and limitations. In Tannebaum and Rauber (2012a, b) we analyzed the basic characteristics of USPTO patent examiners’ query logs. We manually downloaded a limited set of query logs (346 log files) of USPTO patent examiners, for one specific patent domain, from the US Patent and Trademark Office Portal PAIR. Initial results indicated that specialized term networks can be extracted directly from query logs to complement resources for standard English, exploiting the domain specificity of patents and the extensive classification scheme they are structured in. This has positive effects on automated query expansion in patent searching.

In this paper we radically extend these initial studies, providing a method to create valuable domain-specific term networks that assist in query expansion in this highly professional search setting. We collected and preprocessed a significantly larger corpus of patent query logs, facilitating more in-depth studies and providing reasonably comprehensive term networks.

3 Query logs of the USPTO

The query logs of USPTO patent examiners, called “Examiner’s search strategy and results”, have been published for most patent applications since 2003 by the US Patent and Trademark Office Portal PAIR (Patent Application Information Retrieval). Downloads are limited by the USPTO: for each patent application a verification code has to be entered. Google has begun crawling the USPTO’s public PAIR sites and provides free download of the patent applications. Since April 2013, Reed Technology, a contractor to the USPTO, undertakes this task and hosts the data. For each patent application a single zip file is available containing several folders with information such as: Address and Attorney/Agent, Application Data, Continuity Data, Foreign Priority, Image File Wrapper, Patent Term Adjustments, Patent Term Extension History and Transaction History. The Image File Wrapper is of concern to us here, as this folder can contain one or multiple query log files.

Each query log of the USPTO is a PDF file consisting of a series of queries. Figure 1 shows an example of such a query log. Each query has several elements. We focus on the search query element showing the query formulated by the patent examiner. Further elements are: reference, hits, database(s), default operator, plurals, and time stamp.

Fig. 1 Example of a USPTO query log

There are several kinds of queries in the search query element, as shown in Fig. 1. Text queries, such as queries S1 and S2, are used for querying whole documents (fulltext search) or only sections of patent documents, such as the title section (title search), using query terms. Non-text queries are used for searching patent document numbers, classifications, or application and publication dates. For example, the non-text query “@ad ≤ 20030604” searches for patent documents filed on or before 4 June 2003, as shown in query S4. A further kind of query is the reference query, such as query S3, which is a combination of earlier queries, in this case of queries S1 and S2, i.e. it re-uses the terms of a previous query and expands them with further elements, thus avoiding having to re-type an earlier query. Text queries include search operators between the query terms. The types of search operators are (1) Boolean operators, such as “AND” or “OR”, and (2) proximity operators, like “SAME”, “ADJ(acent)”, “NEAR” or “WITH”. Furthermore, truncation limiters, such as “$”, are used for query formulation. If the search operators are added manually, they are shown between the query terms in the text query element; otherwise they are indicated by the default operator element. For our purposes we are specifically interested in the queries including the Boolean operator “OR” to learn synonyms.
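
As an illustration of the query kinds and operators described above, the following Python sketch (our own simplification; the exact log format may differ after OCR) classifies individual search-query strings and lists the operators they contain.

```python
import re

# Sketch: classify query strings as they appear in the "search query" element
# of a USPTO query log and detect the operators they use.

BOOLEAN_OPS = {"AND", "OR"}
PROXIMITY_OPS = {"SAME", "ADJ", "NEAR", "WITH"}

def classify_query(query):
    tokens = query.upper().split()
    if re.match(r"^@\w+", query):                    # e.g. "@ad<=20030604"
        return "non-text query (field/date search)"
    if any(re.fullmatch(r"S\d+", t) for t in tokens):
        return "reference query (re-uses earlier queries)"
    ops = [t for t in tokens if t in BOOLEAN_OPS | PROXIMITY_OPS]
    return "text query, operators: %s" % (ops or ["default operator"])

for q in ["display OR screen", "S1 AND S2", "@ad<=20030604", "control ADJ module"]:
    print(q, "->", classify_query(q))
```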

4 Learning term networks

4.1 Experiment set up

Since 2003, the USPTO has published about 2.7 million patent applications. The applications are classified into 473 classes, each including hundreds or thousands of subclasses. Hence, on average, about 6,000 application documents are available per class. Because patent searchers use the classification system to narrow the search, we selected fifteen classes for our experiments. First, we collected all application numbers of the published patent applications for the fifteen classes and generated a list of download links for each class based on the download URL “http://storage.googleapis.com/uspto-pair/applications/APP_NUM.zip”, where we replaced “APP_NUM” with the application numbers. Second, we harvested the zip files via Wget, a free software package for retrieving files from web servers, then unzipped and filtered the files using the file name ending “SRNT.pdf” of the query log files. Next, we carried out OCR conversion using ABCocr, a product for extracting text from images, on a Windows 7 platform and converted the PDF files to TXT files. Subsequently, all terms were fed into the extraction process.
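
The harvesting step can be sketched as follows. This is a simplified Python illustration based on the URL pattern and the “SRNT.pdf” file-name ending mentioned above (in the experiments we used Wget); the subsequent OCR conversion with ABCocr is a separate, commercial step and is not shown.

```python
import io, zipfile, urllib.request

# Sketch of the harvesting step: download the PAIR zip of one application and
# keep only the examiner search-strategy PDFs ("...SRNT.pdf").

URL = "http://storage.googleapis.com/uspto-pair/applications/{app_num}.zip"

def fetch_query_logs(app_num, out_dir="logs"):
    data = urllib.request.urlopen(URL.format(app_num=app_num)).read()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.endswith("SRNT.pdf"):        # query-log files
                zf.extract(name, out_dir)
                yield name

# for app_num in application_numbers_of_class:   # gathered per US class
#     list(fetch_query_logs(app_num))
```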

Overall, we downloaded and preprocessed 103,896 query logs available for the fifteen US classes, making it the largest collection of query logs used for experiments in the patent IR domain. Table 2 shows the number and title of the selected classes and the number of downloaded query logs for each class.

Table 2 Experiment set up

As shown in Table 2, the number of query logs for the classes ranges between 1,820 and 16,864 files. For our experiments, particularly for learning the class-specific term networks and for evaluating them, we conceptually grouped the classes according to their size: small (fewer than 4,000 query logs), medium (up to 8,000 files) and large (more than 8,000 logs). The grouping allows us to assess how far the performance of class-specific term networks depends on the class size (number of query logs) and whether a minimum number of query logs is needed to achieve accurate performance in automatic query expansion. Furthermore, we selected some classes that are topically related (e.g. classes 384 and 148; or classes 128, 433 and 623 from the medical domain) as well as completely disjoint classes to evaluate to what extent term networks can be learned on a more generic level, using the hierarchical relationship between classes.

4.2 Extracting term networks based on synonym detection

In patent searching the Boolean operator “OR” is used to expand a query term with an expansion term that has the same meaning, such as “drill” for “burr” or “tool” for “instrument” in the medical domain concerning dentistry equipment. We exploit this to automatically detect synonyms in the query logs: the Boolean operator “OR” indicates that two query terms are synonyms, or can at least be considered equivalents. The process works as follows. We extract all text queries, including the search operators between the query terms, from the query log collection. We then filter all 3-grams generated from the text queries of the form “X b Y”, where b is the Boolean operator “OR” and X and Y are query terms. In addition, to exclude mismatches and misspellings and to rank the extracted synonyms according to their support in the specific classes, in particular for suggesting initially the synonyms having the highest support, followed by additional terms with lower support later on, we utilize a confidence value CV. We measure the frequency of each synonym in the specific class. Table 3 shows for each class the number of extracted synonyms based on the confidence values CV1 to CV5, i.e. synonyms that have a support greater than or equal to 1 to 5.
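
The extraction step described above can be summarized by the following Python sketch; tokenization and term filtering are simplified, and the example queries are illustrative.

```python
from collections import Counter

# Sketch: scan the tokenised text queries of one class for 3-grams of the form
# "X OR Y" and count how often each pair occurs (its support / confidence value CV).

def extract_synonym_pairs(text_queries):
    support = Counter()
    for query in text_queries:
        tokens = query.split()
        for i in range(1, len(tokens) - 1):
            if tokens[i].upper() == "OR":
                x, y = tokens[i - 1].lower(), tokens[i + 1].lower()
                if x.isalpha() and y.isalpha() and x != y:
                    support[tuple(sorted((x, y)))] += 1
    return support

queries = ["drill OR burr", "tool OR instrument", "drill OR burr AND handpiece"]
support = extract_synonym_pairs(queries)
cv2 = {pair: c for pair, c in support.items() if c >= 2}   # pairs with CV >= 2
print(support, cv2, sep="\n")
```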

Table 3 Number of extracted synonyms based on confidence values CV1 to CV5

As shown in Table 3, in particular for CV1, the number of unique synonyms extracted from the query logs of each class increases with the size of the query log collection. The highest number of synonym relations (36,366 relations) could be extracted from the large class 422, “Chemical apparatus and process disinfecting, preserving, or sterilizing”. Furthermore, using the CLAWS part-of-speech tagger for English (Garside and Smith 1997), we tagged the synonyms extracted at confidence value CV1 and found that the majority of the terms are nouns (69.61 %), followed by adjectives (15.53 %) and verbs (14.87 %).

As expected, for all classes we notice a considerable decrease in the number of synonyms having a support between 2 and 5. Because patent searching is a recall-oriented task, we consider those terms that were encountered at least twice as synonyms (CV2) in the specific class for learning the term networks. This reduces spurious mismatches, while still providing as many synonyms for automatic query term suggestion as possible. Table 4 shows the learned term networks, which resemble thesauri of English concepts.

Table 4 Learned term networks

The thesauri provide English synonyms for each specific patent class. In each term network, terms that have the same meaning are linked to each other. In total, the learned term networks provide 64,750 unique synonym relations based on 36,601 unique query terms. Finally, the learned term networks can be used, per specific US patent class, for (semi-)automated query suggestion, particularly query expansion.

5 Automatic query expansion

In this section we use the term networks for automatic query expansion in patent searching. For the evaluation we expand query terms of queries from real query sessions of patent examiners (gold standard). Based on the suggested terms from the term networks and the expansion terms used by the examiners, we calculate recall and precision scores to evaluate the performance of the learned term networks.

For each class we split the query log collection into a training set and a test set for evaluation. The training set is further divided into several sub-sets to learn multiple term networks for each class, so that we can evaluate size and time dependency characteristics. Specifically, with the query logs ordered by the application date of the patent, we use the first set of query logs of each class for training, while the test set is created from the chronologically last set of query logs in each class. For each class we generate up to five training sets for learning the class-specific term networks (TS1 to TS5). The size of these sub-sets depends on the class size. For the five classes having fewer than 4,000 query logs (grouped as small) we learn 17 term networks based on training sets having between 500 and 2,500 query logs, in increments of 500. For the medium classes we learn 20 term networks based on training sets having between 3,000 and 5,000 query logs. Finally, for the large classes we generate 21 term networks based on training sets having between 6,000 and 10,000 log files. In total, for all classes we learn 59 term networks based on the specific class and training set size. Table 5 shows the generated training sets used for learning the class-specific term networks.
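
The following Python sketch illustrates the chronological split for one small class; the training-set sizes follow the description above, while the test-set size used here is an illustrative assumption.

```python
# Sketch of the chronological split for one "small" class: the earliest query
# logs form nested training sets TS1 to TS5, the most recent logs the test set.

def split_class(logs_sorted_by_date, steps=(500, 1000, 1500, 2000, 2500),
                test_size=500):
    training_sets = {f"TS{i + 1}": logs_sorted_by_date[:n]
                     for i, n in enumerate(steps)}
    test_set = logs_sorted_by_date[-test_size:]
    return training_sets, test_set

logs = [f"log_{i:05d}" for i in range(3000)]     # stand-in for date-ordered logs
training_sets, test_set = split_class(logs)
print({k: len(v) for k, v in training_sets.items()}, len(test_set))
```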

Table 5 Test and training sets, sets in bold were used for LogNet

Furthermore, we learn a class-independent term network, which we call LogNet. For this we use the largest training set of each specific class; Table 5 shows the selected training sets in bold. This network is still domain-specific in the sense that it is based on patent query logs, yet it stretches across class boundaries and is thus less specific.
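
The construction of LogNet can be sketched as follows. Whether the supports of synonym pairs shared between classes are summed is our assumption for this illustration, and the example networks are toy data.

```python
from collections import Counter

# Sketch: merge the synonym relations learned from the largest training set of
# each class into one class-independent network (LogNet).

def merge_networks(class_networks):
    """class_networks: iterable of Counter({(term_a, term_b): support})."""
    lognet = Counter()
    for net in class_networks:
        lognet.update(net)            # supports of shared pairs add up
    return lognet

net_128 = Counter({("burr", "drill"): 4, ("instrument", "tool"): 6})
net_623 = Counter({("instrument", "tool"): 3, ("implant", "prosthesis"): 5})
print(merge_networks([net_128, net_623]))
```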

To evaluate the lexical term networks based on recall and precision of the suggested expansion terms, we expand the query terms from the test sets. To calculate the recall scores, we compare the terms suggested by the term networks with the synonym or equivalent terms from the test sets, which were used by the patent examiners for searching. To compute precision, we compare the synonyms used by the examiners in the test sets with all expansion terms suggested by the term networks.
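
The following minimal sketch illustrates this evaluation for a single query term; the suggested and gold expansion terms are made-up examples.

```python
# Sketch of the per-query evaluation: "gold" are the expansion terms the
# examiner actually used (test set), "suggested" are the terms proposed by a
# term network. The values below are made-up examples.

def recall_precision(suggested, gold):
    suggested, gold = set(suggested), set(gold)
    hits = suggested & gold
    recall = len(hits) / len(gold) if gold else 0.0
    precision = len(hits) / len(suggested) if suggested else 0.0
    return recall, precision

suggested = {"screen", "monitor", "panel", "lcd"}
gold = {"screen", "monitor", "crt"}            # examiner's expansion terms
print(recall_precision(suggested, gold))       # (0.666..., 0.5)
```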

5.1 Query expansion based on training set size

First we use the class-specific term networks TS1 to TS5 of each class and the class-independent term network LogNet to evaluate the performance of the term networks for each class depending on the size of the training sets. For calculating the recall and precision measures we excluded synonyms in the test sets which are out of the vocabulary of the term networks, i.e. terms that did not appear in any earlier query log. Table 6 shows the achieved recall and precision measures.

As shown in Table 6, for almost all classes the recall measures increase with the training set size. We can therefore assume that the recall scores will further increase with even larger training sets. Because we excluded synonyms that are out of the vocabulary, in some cases, particularly for classes 180 and 417, the recall scores go down with larger training sets. The reason is that, with larger training sets, more synonyms and equivalent terms appear in the term networks (the terms are no longer out of vocabulary), but not necessarily as synonyms of the respective query terms. The best recall measures are provided, on average, by the term networks learned from the training sets of the large US classes with a size larger than 6,000 query logs (with one exception, class 128). In particular, the best recall is provided by the term network TS5 learned for class 623. With a recall of 73.33 %, this term network provides, on average, 7 out of 10 of the synonyms used by the patent examiners for query expansion.

Table 6 Query expansion based on training set size and class

The precision values show that with increasing training set size the achieved precision scores decrease, as the number of suggested synonyms increases. The term network TS3 learned for class 128 provides the best precision with a score of 44.17 %: on average, 4 out of 10 terms suggested by the term network as synonyms were actually used by the examiners for query expansion. Considering the term network providing the best recall performance (TS5 for class 623), on average only 2 out of 10 suggested terms are used by the patent examiners for query expansion. Note that the lower precision may not be a serious impediment for deployment of the query term expansion. Patent search is recall-oriented rather than precision-oriented, i.e. a higher number of potentially irrelevant documents in a result set is preferred over a more limited result set missing relevant documents. Especially when the expansion terms are suggested by a system for manual selection rather than applied in a fully autonomous manner, they may thus be assistive rather than harmful. Furthermore, precision is likely to be under-estimated: whether certain suggested expansion terms are actually incorrect, rather than potentially useful but simply not thought of by the searcher, would need to be confirmed by expert searchers. This would help to determine whether the terms suggested but not used in the original patent verification search are actually wrong, or whether the examiner performing the validation search simply did not think of them.

Furthermore, we evaluate the class-independent term network LogNet. For almost all classes the recall measures of the class-specific networks TS1 to TS5 are further improved. In particular, the best recall of 78 % is provided for class 417. As explained above, in two cases the recall decreases because terms appear in the vocabulary of LogNet, but not as synonyms of the respective query terms. Mirroring the trend observed with the class-specific term networks, LogNet achieves only weak precision measures across all classes, peaking at 7.14 % for class 128.

In addition, we measure the coverage of the respective networks by determining the number of out-of-vocabulary words, i.e. expansion terms that were used later in time and that the network could not learn. This provides an indication of the comprehensiveness of the suggested expansion terms. Table 7 shows the coverage of the class-specific term networks and LogNet.
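
A minimal sketch of this coverage measure, using made-up vocabulary and test terms:

```python
# Sketch of the coverage measure: the share of examiner expansion terms from
# the test set that appear in the vocabulary of a term network at all; the
# remaining terms are the out-of-vocabulary words.

def coverage(network_vocabulary, test_expansion_terms):
    vocab = set(network_vocabulary)
    covered = [t for t in test_expansion_terms if t in vocab]
    return len(covered) / len(test_expansion_terms) if test_expansion_terms else 0.0

vocab = {"screen", "monitor", "display", "panel"}
test_terms = ["screen", "crt", "monitor", "lcd"]     # made-up example
print(coverage(vocab, test_terms))                   # 0.5
```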

Table 7 Coverage provided by the class-specific term networks and LogNet

The coverage of the networks naturally increases with class size. The best coverage scores of the class-specific term networks are provided by the term networks learned from the large classes. In particular, the term network learned for class 379 covers 81.48 % of the query terms from the test set. Across all classes, the class-independent term network LogNet provides an average coverage of 87.90 %.

The experiments show that for almost all classes the recall and coverage measures of the class-specific term networks rise with the class size and can be further improved using the class-independent term network LogNet. On the other hand, the class-specific term networks achieve much better precision scores than LogNet, as query terms are expanded within the context of a specific class. To provide term networks for automatic query term suggestion that achieve high recall/coverage and precision scores, either (1) the recall measures of the class-specific term networks or (2) the precision scores of LogNet have to be improved. We address this issue in the following experiments.

5.2 Using a confidence value for query expansion

To improve precision we re-run the experiments for the large classes, which achieve the highest recall and coverage scores but provide the lowest precision measures of the class-specific term networks, this time utilizing the confidence values of the synonym relations. In this setting, we only consider expansion terms that were encountered at least five times as synonyms in the training sets, i.e. that have a higher support for the respective mapping. The resulting scores are provided in Table 8.

Table 8 Considering confidence values in the large classes

While the recall achieved with this more limited term network obviously decreases considerably, we also observe a drastic increase in precision compared to the values provided in Table 6. For example, with a recall of 59.83 %, the term network TS5 of class 422 provides, on average, 6 out of 10 of the synonyms used by the patent examiners for query expansion. The corresponding limited term network achieves a considerably lower recall of 22.00 %. However, its precision of 23.05 % drastically outperforms the 7.22 % of the term network TS5.

The experiments show that we can use the confidence values to iteratively suggest an increasing number of expansion terms as the search evolves. After initially suggesting the most likely, highest-precision expansion terms, the system can strike a reasonable balance between increasingly higher recall/coverage and lower precision by suggesting additional expansion terms that have a lower support in the training set.
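
A possible implementation of this incremental suggestion strategy is sketched below; the toy network and support values are illustrative.

```python
# Sketch: expansion terms for a query term are proposed in order of decreasing
# support, so the most reliable (highest confidence value) synonyms appear
# first and lower-support terms are added only as the search evolves.

network = {                       # query term -> {expansion term: support}
    "instrument": {"tool": 12, "device": 7, "apparatus": 3, "implement": 1},
}

def suggest(term, min_support=1, limit=None):
    candidates = network.get(term, {})
    ranked = sorted((t for t, s in candidates.items() if s >= min_support),
                    key=lambda t: -candidates[t])
    return ranked[:limit]

print(suggest("instrument", min_support=5))   # high-precision first round
print(suggest("instrument", min_support=1))   # broader second round
```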

5.3 Query expansion across different US patent classes

In Sect. 5.2, we addressed the low precision of the term networks learned from the large classes, which achieve the highest recall and coverage scores. In this section we evaluate the performance of the class-specific term networks when used for patents from other classes. We assume that this will help to detect classes where cross-domain application might be useful, in particular to improve the recall and coverage of small classes that already provide accurate precision scores. We test the term networks TS2 of the small and medium classes and the term networks TS3 of the large classes on test sets from other classes, this time without excluding out-of-vocabulary words. We do not exclude out-of-vocabulary words because our goal here is to detect related classes that have the most synonyms in common, and across different patent classes we cannot assume that expansion terms were used later in time. Table 9 shows the achieved recall measures.

Table 9 Recall measures achieved when using class-specific term network for other than the class they were based upon

As shown in Table 9, the learned class-specific term networks achieve respectable recall measures across some class boundaries. In particular, the term network TS3 learned for US class 128, “Surgery”, achieves a recall of 29.73 % for class 623, “Prosthesis”. This hints at the fact that term networks learned from one class may, in fact, be applied across related classes. Yet, there is no guarantee that this will work for every pair of related classes. For example, the term network learned from query logs of class 148, “Metal Treatment”, achieves a better recall of 12.68 % for class 384, “Bearings”, than the corresponding term network learned from class 384 achieves when applied to class 148, with a recall score of only 4.59 %. Bearings are generally made of metal, and metal treatment is a common process in manufacturing bearings and thus frequent in class 384. Yet, bearings may be irrelevant for treating metal and thus for class 148, leading to an asymmetric relationship.

Hence, the experiments show that for the classes marked in bold, cross-domain application seems promising. This can be used to expand smaller sets of query logs of specific classes with logs of related classes in order to learn improved term networks with better recall and coverage scores, i.e. for classes where few query logs are available.

5.4 Query expansion compared to WordNet

As mentioned in Sect. 2, most approaches use standard dictionaries for automatic query expansion, particularly for finding synonyms. In this section we evaluate the performance of our approach compared to the dictionary WordNet (Miller 1995).

In particular, we test the best-performing class-specific term networks, the class-independent network LogNet and the dictionary WordNet on the test sets generated for each specific class and used in Sects. 5.1 and 5.2.

Again, we calculate the recall/coverage and precision scores based on the terms suggested by the various term networks and the expansion terms used by the examiners. For the expansion of the query terms we use all lexical relations included in the patent domain-specific term networks and in WordNet. We do not consider the meaning of the query terms, as our main focus is on the recall score in automatic query term suggestion. Thus, WordNet should benefit from higher recall due to the large number of synonyms added without a potentially harmful limitation to specific word senses. We are aware that the precision measures could be improved by considering the word senses. Despite this setting, which favors WordNet, LogNet achieves better recall measures than the standard dictionary WordNet across all classes, as shown in Table 10.
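
The WordNet baseline can be sketched as follows, using the NLTK interface to WordNet (an implementation choice for this illustration; we do not claim it matches the exact setup of the experiments): all lemmas of all synsets of a query term are taken as candidate expansion terms, without word-sense disambiguation.

```python
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet corpus

# Sketch of the WordNet baseline expansion: collect the lemmas of every synset
# of a query term, i.e. no word-sense disambiguation is performed.

def wordnet_expansions(term):
    lemmas = set()
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ").lower()
            if name != term:
                lemmas.add(name)
    return sorted(lemmas)

print(wordnet_expansions("display")[:10])
```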

Table 10 Recall and precision values for query expansion based on LogNet, WordNet and the best performing class-specific network

For larger training sets the recall measures of the learned term network LogNet increase. For the large classes, LogNet is learned from training sets having more than 6,000 query log files and provides, on average, the best recall scores across all classes. Compared to LogNet and the best-performing class-specific networks, WordNet achieves only low recall for all classes; a comparable performance is only achieved for class 454. Over all classes, WordNet provides an average recall of only 22.06 %. Comparing the precision measures, WordNet, like LogNet, achieves as expected only weak precision across all classes, peaking at 5.49 % for class 398.

In addition, to see whether the differences between the results of WordNet and the best-performing query-log-based expansion method, in particular LogNet, are statistically significant, we run a t test. The test allows us to conclude that the differences are statistically significant (p < 0.05). With respect to the measures achieved by WordNet and LogNet for each specific class, the t test confirms that the differences are significant (p < 0.05) for all classes except class 280 (p = 0.27).
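
The significance test can be sketched as follows; the per-class scores below are placeholders, not the values from Table 10, and the choice of a paired t test reflects the per-class comparison described above.

```python
from scipy import stats

# Sketch: paired t test over per-class recall scores of LogNet vs. WordNet.
# The numbers are hypothetical placeholders.

lognet_recall  = [0.78, 0.73, 0.65, 0.70, 0.62]
wordnet_recall = [0.25, 0.22, 0.20, 0.30, 0.18]

t_stat, p_value = stats.ttest_rel(lognet_recall, wordnet_recall)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")      # significant if p < 0.05
```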

Again, we measure the coverage of the respective networks by determining the number of out-of-vocabulary words. Table 11 shows the vocabulary covered by the term networks. WordNet, being the most comprehensive thesaurus, provides the best coverage, followed by LogNet. The best coverage is provided by WordNet for class 128 at 98.55 %. LogNet has its highest coverage for class 384 at 96.04 %.

Table 11 Coverage provided by the class-specific term networks, LogNet and WordNet

Through the analysis of the synonym relations that WordNet failed to provide, we learn that (1) patent examiners expand class-specific query terms using general terms. For example, in class 379, “Telephonic communications”, they expand the specific query term “cellphone” using the general expansion term “device”; a further example is the expansion of the class-specific term “camper” with the general term “vehicle”. Further, (2) the examiners expand query terms w.r.t. part of speech, such as “burn” for “burning” or “coat” for “coating”, and (3) they relate terms which have the same meaning in a specific class, such as “portable” for “handheld”, in particular in class 379, “Telephonic communications”. Additionally, through analysis of the vocabulary not covered by WordNet, we find that patent examiners (4) use popular trademarks, such as “iphone”, “ipad” or “blackberry”, for query expansion, and (5) patent applicants are allowed to create their own terms, such as “pocketpc” for “notebook”, “watergas” for “steam” or “passcode” for “password”. Because of these highly specific expansions of query terms in the patent domain, standard dictionaries such as WordNet achieve only low performance: such patent domain-specific vocabulary and relations are not included in them. Yet exactly these kinds of synonyms, equivalents and relations are needed for automatic query expansion in the patent domain. Our approach of learning term networks directly from the query logs of patent examiners fulfills the requirements of this highly domain-specific query expansion.

Finally, the experiments show that the term networks learned directly from the patent domain, in particular LogNet and the best-performing class-specific term networks, drastically outperform the general term network WordNet, which achieves only low recall for all US classes. As expected, all term networks, both the patent domain-specific ones and the general term network WordNet, achieve low precision measures, since we do not consider the meaning of the query terms. In Sects. 5.1 and 5.2, we showed how to counter the low precision of term networks used for automatic query term suggestion by (1) suggesting class-specific expansion terms first, followed by additional expansion terms provided by the class-independent term network, or (2) by suggesting expansion terms provided by the class-independent term network in the order of their support in the training set.

Hence, the experiments show that query logs are a valuable resource for learning term networks for query expansion in the patent domain, achieving high recall and precision scores.

6 Conclusions and future work

In this paper we presented an approach to support query expansion in the domain of patent search. We used real query expansion sessions done by patent professionals to learn term networks for automatic query expansion. Several US class-specific term networks and a class-independent term network, referred to as LogNet, were learned from the query logs provided by the USPTO to capture synonym relations. We evaluated these term networks based on query expansion done by patent professionals in real sessions.

The experiments show that for the class-specific term networks the recall measures increase with the availability of a larger set of query logs. The best recall measures are provided, on average, by the term networks learned from training sets with more than 6,000 query logs. The class-specific networks provide up to 7 out of 10 of the synonyms used by the patent examiners for query expansion. We assume that the recall measures will further increase with growing training set sizes. Because the USPTO regularly publishes new query logs for each US class, the size of the collections grows continuously, which will be advantageous for our expansion approach.

Further, the experiments show that for almost all classes the recall measures of the class-specific term networks can be further improved using the class-independent term network LogNet. LogNet suggests, on average, up to 8 out of 10 of the synonyms used by the examiners for query expansion. As expected, the class-independent term network achieves lower precision scores.

In addition, we show how to strike a reasonable balance between increasingly higher recall/coverage and lower precision. We learned that the class-specific term networks provide better precision scores than the class-independent term network LogNet, since query terms are expanded within a certain context (the patent class). On the other hand, the class-independent term network LogNet achieves the best recall and coverage scores. Hence, (1) expansion terms can be suggested incrementally, initially from the class-specific term networks providing high precision scores, followed by more generic terms from the class-independent term network LogNet achieving higher recall measures later on; or (2) expansion terms can be suggested in the order of their support in the training set: after initially suggesting the most likely and highest-precision terms (i.e. expansion terms that were encountered most frequently as synonyms in the training sets), additional terms that have a lower support and lower precision (i.e. encountered at least once) can be suggested. Furthermore, (3) related US classes allow the use of class-specific term networks across class boundaries, providing valuable expansion opportunities, specifically for smaller classes, i.e. for classes where few query logs are available.

More importantly, the specific term networks drastically outperform general-purpose sources such as WordNet. The standard dictionary WordNet achieves only low recall for all US classes. This may be attributed to the fact that patent searchers (1) expand class-specific query terms using general terms, (2) expand query terms w.r.t. part of speech, (3) relate terms which have the same meaning in a specific class, (4) use popular trademarks, and (5) patent applicants are allowed to create their own terms for query expansion. These kinds of synonyms, equivalents and relations between the vocabulary are not included in standard dictionaries such as WordNet, but are needed for automatic query expansion in the patent domain.

We show that our approach of learning term networks from the patent domain, specifically directly from the query logs, helps in meeting the requirements of this highly domain-specific setting.

In future work we aim to improve the performance of the learned class-specific and class-independent term networks. To this end, we want to learn class-related term networks to improve the recall and coverage of the class-specific term networks that achieved the best precision measures. Further, we want to use dynamic thresholding on the support scores of the synonym relationships, in particular for the class-independent term network LogNet, which provides the best recall and coverage scores but the lowest precision measures. This will allow us to sort the expansion terms in such a way as to optimize precision while keeping recall high. In addition, we want to consider the default operator element and the set operator (which can be set to “AND” or “OR”) to learn synonyms from co-occurring query terms that are not linked to each other by the Boolean operator “OR”, and thereby expand our term networks.

Furthermore, we want to use the query log collections to learn further semantic relations that are needed for automatic query expansion in patent searching. In particular, we aim to learn term networks of keyword phrases, which we use for automatic query limitation in patent searching. To learn the keyword phrases we will use the proximity operators appearing in the text queries of the query logs.