A patent search strategy based on machine learning for the emerging field of service robotics

Kreuchauff, Florian; Korzinov, Vladimir

doi:10.1007/s11192-017-2268-3


Published: 10 February 2017

Scientometrics, Volume 111, pages 743–772 (2017)

Abstract

Emerging technologies are often not part of any official industry, patent or trademark classification system. Thus, delineating their boundaries in order to measure them at an early stage of development is a nontrivial task. This paper presents a methodology to automatically classify patents concerning service robots. We combine a traditional technology identification process, namely keyword extraction and verification by an expert community, with a machine learning algorithm. The result is a novel way to allocate patents that (1) reduces expert bias arising from vested interests in lexical query methods, (2) avoids problems with citation approaches, and (3) facilitates evolutionary changes. Based on a small core set of worldwide service robotics patent applications, we derive suitable n-gram frequency vectors and train a support vector machine, relying only on the titles, abstracts, and IPC categorization of each document. By varying the Kernel functions used and their respective parameters, we reach a recall of 83% and a precision of 85%.
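
The n-gram frequency vectors and the two evaluation measures named in the abstract can be illustrated in plain Python. This is a minimal sketch, not the authors' scikit-learn pipeline: the tokenizer (lowercased runs of letters and digits), the toy text and all function names are assumptions for illustration. Title and abstract are concatenated, tokenized, and counted as unigrams, bigrams and trigrams, which mirrors the three n-gram orders listed in the appendix tables.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens (simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, each joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(title, abstract, max_n=3):
    """Sparse frequency vector (a Counter) of all 1- to max_n-grams in title plus abstract."""
    tokens = tokenize(title + " " + abstract)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def precision_recall(predicted, actual):
    """Precision and recall for binary labels, where True means 'service robot patent'."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, `ngram_frequencies("Cleaning robot", "A cleaning robot with a suction unit.")["cleaning robot"]` is 2, and `precision_recall([True, True, False, True], [True, False, False, True])` yields a precision of 2/3 and a recall of 1.0. Recall is the share of expert-confirmed SR patents that the classifier retrieves; precision is the share of retrieved patents that the experts confirm.
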

Notes

Beyond its potential productivity effects, service robotics (SR) is believed to induce visible changes in employment structures (Autor et al. 2003; Frey and Osborne 2017; Graetz and Michaels 2015). Its potential to change organizational processes in firms as well as people's everyday lives is already visible in the diffusion of semi-autonomous physical systems out of industrial manufacturing and into service economies.
Cozzens et al. (2010) argue that bibliometric data, in particular proposals and publications, seem to be most useful for monitoring the technological horizon. Patent analysis, on the other hand, besides being long known to be valuable for competitive and trend analysis (Abraham and Morita 2001; Liu and Shyu 1997), has become sophisticated enough to even predict emerging fields (Erdi et al. 2013).
Cf. http://www.oecd.org/sti/intellectual-property-statistics-and-analysis.htm#method.
See for example the annual reports by the German Patent and Trademark Office at http://www.dpma.de/english/service/publications/annualreports/index.html.
Only in 2011 was a second subclass, B82Y, focusing on specific uses or applications of nanostructures, introduced for the IPC and the Cooperative Patent Classification (CPC). Previously, related nano patent documents could only be identified if they were classified in the European Classification System (ECLA) under subclass Y01. ECLA Y-codes were created as an extension of the original classification system to cover new (emerging) technology areas of special interest.
With respect to scientific publications, another common strategy is to identify core journals, all of whose articles are then considered relevant. For patents, though, this search strategy is obviously not feasible, which is why we do not pursue it further.
Such a search strategy is called evolutionary if subsequent researchers can build upon existing query structures by progressively incorporating terms that better specify the technology and widen its scope (Mogoutov and Kahane 2007).
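
An evolutionary query in this sense can be pictured as a term set that later researchers extend rather than replace. The sketch below is hypothetical: the class name `EvolutionaryQuery`, the seed terms and the case-insensitive substring matching are illustrative assumptions, not the query used in the paper or by Mogoutov and Kahane (2007).

```python
class EvolutionaryQuery:
    """A keyword query that grows by adding terms; earlier terms are never dropped."""

    def __init__(self, seed_terms):
        # dict.fromkeys removes duplicates while preserving insertion order.
        self.terms = list(dict.fromkeys(t.lower() for t in seed_terms))

    def extend(self, new_terms):
        """Return a new query version that keeps all existing terms and appends new ones."""
        return EvolutionaryQuery(self.terms + [t.lower() for t in new_terms])

    def to_boolean(self):
        """Render the query as a Boolean OR expression of quoted phrases."""
        return " OR ".join(f'"{t}"' for t in self.terms)

    def matches(self, text):
        """True if any term occurs in the text (case-insensitive substring match)."""
        lowered = text.lower()
        return any(t in lowered for t in self.terms)
```

Because `extend` returns a new version, each published query stays reproducible while successors widen it, e.g. `EvolutionaryQuery(["service robot"]).extend(["cleaning robot"]).to_boolean()` gives `"service robot" OR "cleaning robot"`.
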
For nanotechnology, the example we refer to throughout, Arora et al. (2014) measure the growth of nano-prefixed terms in scholarly publications and find that the percentage of articles using a nano-prefixed term increased from less than 10% in the early 1990s to almost 80% by 2010.
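
The share reported by Arora et al. (2014) is a document-level proportion. A minimal sketch, assuming that a nano-prefixed term is any word starting with "nano" (optionally hyphenated) followed by at least one more letter; the regular expression and the sample titles are illustrative assumptions, not the authors' definition.

```python
import re

# Assumption: a nano-prefixed term is "nano" plus at least one letter, optionally
# hyphenated, at the start of a word (e.g. "nanotube", "Nano-scale"); bare "nano" does not count.
NANO_TERM = re.compile(r"\bnano-?[a-z]+", re.IGNORECASE)


def share_with_nano_terms(documents):
    """Fraction of documents that contain at least one nano-prefixed term."""
    if not documents:
        return 0.0
    hits = sum(1 for doc in documents if NANO_TERM.search(doc))
    return hits / len(documents)
```

Applied per publication year, this yields a time series of the kind Arora et al. describe.
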
This approach naturally harbors the risk of including generic articles of any scientific field that somehow happen to be cited in a technologically unrelated context. Bassecoulard et al. (2007) therefore incorporate a statistical relevance limit relying on the specificity of citations.
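
A simplified version of such a relevance limit can be sketched as follows. It keeps a cited article only if enough of its citations come from the core set and these make up a sufficient share of all its citations, so that generic, widely cited articles drop out. The share-based indicator, the function name and both thresholds are illustrative assumptions, not the exact specificity measure of Bassecoulard et al. (2007).

```python
from collections import Counter


def specific_cited_articles(citations, core_ids, min_share=0.5, min_core_citations=2):
    """Return cited article ids whose citations come predominantly from the core set.

    citations: iterable of (citing_id, cited_id) pairs.
    core_ids: set of ids already identified as core (e.g. by a lexical query).
    Thresholds are illustrative assumptions.
    """
    total, from_core = Counter(), Counter()
    for citing, cited in citations:
        total[cited] += 1
        if citing in core_ids:
            from_core[cited] += 1
    # Counter returns 0 for unseen keys, so articles never cited by the core are excluded.
    return {
        cited
        for cited, n_total in total.items()
        if from_core[cited] >= min_core_citations and from_core[cited] / n_total >= min_share
    }
```

An article cited twice by core documents and once elsewhere passes (share 2/3), whereas a methods paper cited once by the core and three times elsewhere is filtered out as generic.
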
Consequently, the data sources used for this identification process are the same ones that constitute the targets of subsequent analyses, which may give cause for some criticism.
For the technology under consideration in this paper, it is important to note that SR patents differ considerably from business process and service patents, and there is little if any overlap between them. SR patents are much more 'technological' in the sense that they describe how a robot is constructed and the environment for which its functionalities are intended. Their content is thus close to that of industrial robot (IR) patents, which makes the two hard to disentangle. Business and service patents, in contrast, largely contain organizational innovations.
This database encompasses raw data from about 60 million patent applications and 30 million granted patents, utility models, PCT applications, etc. filed at more than 100 patent authorities worldwide.
Manipulators; Chambers Provided With Manipulation Devices. See http://www.wipo.int/classifications/ipc/en/.
According to the USPTO, most of the manipulators classified in B25J are industrial robots. See http://www.uspto.gov/web/patents/classification/cpc/html/defB25J.html.
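
An IPC subclass filter of this kind is easy to express, but because B25J mostly covers industrial robots, it can only narrow the candidate set and cannot separate service robot patents on its own. A minimal sketch, assuming IPC symbols may contain irregular whitespace (as in "B25J   9/16" or "B25 J"); the function names and the sample applications are hypothetical.

```python
def ipc_subclass(symbol):
    """Return the 4-character IPC subclass, e.g. 'B25J' from 'B25J   9/16' or 'B25 J'."""
    compact = "".join(symbol.split()).upper()  # drop all whitespace, normalize case
    return compact[:4]


def filter_by_subclass(applications, subclass="B25J"):
    """Sorted ids of applications with at least one IPC code in the given subclass.

    applications: dict mapping application id -> list of IPC symbols.
    """
    target = ipc_subclass(subclass)
    return sorted(
        app_id
        for app_id, codes in applications.items()
        if any(ipc_subclass(code) == target for code in codes)
    )
```
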
We have included one example of such a subquery in the appendix. All other queries are available upon request.
We also tried to incorporate another step (6), which added IPC dummy variables to indicate class membership. These additional attributes were later discarded by the subsequent feature selection process, which suggests that IPC class membership is not significant for the categorization at hand.
Several multiclass SVM approaches exist; see Duan and Keerthi (2005) for a review.
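
If patents were to be assigned to several SR application fields rather than to a single yes/no decision, a binary SVM would typically be extended by decomposing the problem into binary subproblems. The sketch below enumerates two common decompositions, one-vs-rest (k tasks) and one-vs-one (k(k-1)/2 tasks); the class labels are hypothetical and the code only builds the task list, it does not train classifiers.

```python
from itertools import combinations


def one_vs_rest_tasks(classes):
    """One binary task per class: that class (positive) against all other classes."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes: k * (k - 1) / 2 tasks."""
    return list(combinations(classes, 2))
```

With four hypothetical labels such as "cleaning", "medical", "logistics" and "non-SR", one-vs-rest needs 4 binary SVMs and one-vs-one needs 6.
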
Since there is no way to determine in advance which Kernel function should be used, the choice of the depicted functions was mostly motivated by their popularity in classifiers and their availability in the software package used.
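
For reference, the four Kernel functions commonly offered by SVM packages such as scikit-learn's `SVC` (linear, polynomial, RBF and sigmoid) can be written out directly. This is an illustrative sketch for plain Python lists such as n-gram count vectors; the parameter defaults are arbitrary and are not the values tuned in the paper.

```python
import math


def dot(x, y):
    """Inner product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """K(x, y) = tanh(gamma * <x, y> + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)
```

Varying the kernel and its parameters (gamma, coef0, degree) changes the shape of the decision boundary, which is why such choices are usually compared empirically rather than fixed in advance.
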
We do not discuss the exact implementation of the support vector machine algorithm in the Python scikit-learn package. All necessary materials can be found in open access libraries via the reference provided above.
We did include IPC classes at an early stage of development, but found that none of these classifications contributed to the support vector solution. They turned out to be irrelevant to the discrimination procedure and were thus removed during the feature selection process.
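
The notes do not specify the feature selection method. As an illustration of how an uninformative attribute such as an IPC dummy can drop out, the sketch below scores binary features with a chi-square test of independence against the class labels, a common text-classification criterion (cf. Manning et al. 2008) substituted here, not necessarily the one the authors applied. The toy data and the feature name `IPC_B25J` are made up.

```python
def chi_square(feature_present, labels):
    """Chi-square statistic of a binary feature against binary labels (2x2 contingency table)."""
    n11 = n10 = n01 = n00 = 0
    for present, positive in zip(feature_present, labels):
        if present and positive:
            n11 += 1
        elif present:
            n10 += 1
        elif positive:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:  # a constant feature or label carries no information
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(matrix, names, labels, threshold=3.84):
    """Names of features whose chi-square score reaches the threshold.

    matrix: one row per document, each a list of 0/1 indicators aligned with names.
    3.84 is the 5% critical value of the chi-square distribution with one degree of freedom.
    """
    kept = []
    for j, name in enumerate(names):
        column = [row[j] for row in matrix]
        if chi_square(column, labels) >= threshold:
            kept.append(name)
    return kept
```

In the toy test data, an n-gram that occurs only in SR documents scores 8.0 and is kept, while an IPC dummy spread evenly across both classes scores 0.0 and is dropped.
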
Within PATSTAT, for instance, more than 90% of the listed patent applications receive fewer than three forward citations, and 74% receive none at all.
SR may be seen as such an umbrella term as well, or as a system of technologies, i.e., one combining many technologies in the way described by Arthur (2009). In that sense, SR is becoming more diverse and complex, with evolving purposes organized to meet human needs.

References

Abraham, B., & Morita, S. (2001). Innovation assessment through patent analysis. Technovation, 21, 245–252.
Ali, S., & Smith-Miles, K. A. (2006). A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1–3), 173–186. Neural networks selected papers from the 7th Brazilian Symposium on Neural Networks (SBRN 04).
Arora, S. K., Porter, A. L., Youtie, J., & Shapira, P. (2013). Capturing new developments in an emerging technology: An updated search strategy for identifying nanotechnology research outputs. Scientometrics, 95, 351–370.
Arora, S. K., Youtie, J., Carley, S., Porter, A. L., & Shapira, P. (2014). Measuring the development of a common scientific lexicon in nanotechnology. Journal of Nanoparticle Research, 16(2194), 1–11.
Arthur, B. (1999). Complexity and the economy. Science, 284(5411), 107–109.
Arthur, B. (2009). The nature of technology: What it is and how it evolves. New York: Free Press. (Reprint ed., January 11, 2011).
Arthur, B., & Polak, W. (2006). The evolution of technology in a simple computer model. Complexity, 11(5), 23–31.
Autor, D. H., Levy, F., & Murnane, R. J. (2003). The skill content of recent technological change: An empirical exploration. The Quarterly Journal of Economics, 118(4), 1279–1333.
Bassecoulard, E., Lelu, A., & Zitt, M. (2007). Mapping nanosciences by citation flows: A preliminary analysis. Scientometrics, 70, 859–880.
Boser, B., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on computational learning theory (COLT’92) (p. 144).
Bresnahan, T. F. (2010). General purpose technologies. In B. Hall & N. Rosenberg (Eds.), Handbook of economics of innovation (2nd ed., pp. 763–791). Amsterdam: Elsevier.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121–167.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Cozzens, S., Gatchair, S., Kang, J., Kim, K.-S., Lee, H. J., Ordónez, G., et al. (2010). Emerging technologies: Quantitative identification and measurement. Technology Analysis & Strategic Management, 22(3), 361–376.
Duan, K.-B., & Keerthi, S. (2005). Which is the best multiclass SVM method? An empirical study. In N. Oza, R. Polikar, J. Kittler, & F. Roli (Eds.), Multiple classifier systems, Lecture notes in computer science (Vol. 3541, pp. 278–285). Berlin: Springer.
Erdi, P., Makovi, K., Somogyvári, Z., Strandburg, K., Tobochnik, J., Volf, P., et al. (2013). Prediction of emerging technologies based on analysis of the US patent citation network. Scientometrics, 95, 225–242.
Fischer, M., Scherngell, T., & Jansenberger, E. (2009). Geographic localisation of knowledge spillovers: Evidence from high-tech patent citations in Europe. Annals of Regional Science, 43, 839–858.
Frey, C. B., & Osborne, M. A. (2017). The future of employment: How susceptible are jobs to computerization? Technological Forecasting and Social Change, 114, 254–280.
Frietsch, R. (2015). Collection and analysis of private R&D investment and patent data in different sectors, thematic areas and societal challenges, JRC/BRU/2014/J.6/0015/OC, Inception report, Deliverable 1.1: Methodological report. Karlsruhe: Fraunhofer Institute for Systems and Innovations Research.
Fu, L. D., & Aliferis, C. F. (2010). Using content-based and bibliometric features for machine learning models to predict citation counts in biomedical literature. Scientometrics, 85, 257.
Garfield, E. (1967). Primordial concepts, citation indexing, and historio-bibliography. Journal of Library History, 2, 235–249.
Graetz, G., & Michaels, G. (2015). Robots at work. Center for Economic Performance Discussion Paper.
Griliches, Z. (1990). Patent statistics as economic indicators: A survey. Journal of Economic Literature, 28, 1661–1707.
Guyon, I., Boser, B., & Vapnik, V. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In Advances in neural information processing systems (pp. 147–155). Burlington: Morgan Kaufmann.
Halaweh, M. (2013). Emerging technology: What is it? Journal of Technology Management and Innovation, 8(3), 108–115.
Hall, B. H., Jaffe, A., & Trajtenberg, M. (2005). Market value and patent citations. RAND Journal of Economics, 36(1), 16–38.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2010). A practical guide to support vector classification. Technical report, Department of Computer Science and Information Engineering, National Taiwan University.
Jaffe, A., Trajtenberg, M., & Henderson, R. (1993). Geographic localization of knowledge spillovers as evidenced by patent citations. The Quarterly Journal of Economics, 108(3), 577–598.
Kenekayoro, P., Buckley, K., & Thelwall, M. (2014). Automatic classification of academic web page types. Scientometrics, 101, 1015.
Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques. Informatica, 31, 249–268.
Lee, W. H. (2008). How to identify emerging research fields using scientometrics: An example in the field of information security. Scientometrics, 76(3), 503–525.
Lee, P., Su, H., & Chan, T. (2010). Assessment of ontology-based knowledge network formation by vector-space model. Scientometrics, 85, 689.
Lee, S., Yoon, B., & Park, Y. (2009). An approach to discovering new technology opportunities: Keyword-based patent map approach. Technovation, 29, 481–497.
Li, Y.-R., Wang, L.-H., & Hong, C.-F. (2009). Extracting the significant-rare keywords for patent analysis. Expert Systems with Applications, 36, 5200–5204.
Liu, S., & Shyu, J. (1997). Strategic planning for technology development with patent analysis. International Journal of Technology Management, 13, 661–680.
Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Online at http://www-nlp.stanford.edu/IR-book/. Accessed 15 October 2014.
McKeown, K., et al. (2016). Predicting the impact of scientific concepts using full-text features. Journal of the Association for Information Science and Technology, 67(11), 2684–2696.
Mogoutov, A., & Kahane, B. (2007). Data search strategy for science and technology emergence: A scalable and evolutionary query for nanotechnology tracking. Research Policy, 36, 893–903.
Noyons, E., Buter, R., Raan, A., Schmoch, U., Heinze, T., S., H., & Rangnow, R. (2003). Mapping excellence in science and technology across Europe. Part 2: Nanoscience and nanotechnology. Draft Report EC-PPN CT2002-0001 to the European Commission. Leiden University Centre for Science and Technology Studies/Karlsruhe Fraunhofer Institute for Systems and Innovations Research.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Porter, A., Youtie, J., & Shapira, P. (2008). Nanotechnology publications and citations by leading countries and blocs. Journal of Nanoparticle Research, 10, 981–986.
Ruffaldi, E., Sani, E., & Bergamasco, M. (2010). Visualizing perspectives and trends in robotics based on patent mining. In IEEE International Conference on Robotics and Automation, Anchorage, Alaska.
Schmoch, U. (2008). Concept of a technology classification for country comparisons. Final report to the World Intellectual Property Organization (WIPO). Karlsruhe: Fraunhofer Institute for Systems and Innovations Research.
Srinivasan, R. (2008). Sources, characteristics and effects of emerging technologies: Research opportunities in innovation. Industrial Marketing Management, 37, 633–640.
Stahl, B. (2011). What does the future hold? A critical view of emerging information and communication technologies and their social consequences. In Researching the future in information systems, IFIP advances in information and communication technology (Vol. 356). Berlin: Springer.
Thompson, P. (2006). Patent citations and the geography of knowledge spillovers: Evidence from inventor- and examiner-added citations. The Review of Economics and Statistics, 88(2), 383–388.
Tseng, Y.-H., Lin, C.-J., & Lin, Y.-I. (2007). Text mining techniques for patent analysis. Information Processing and Management, 43, 1216–1247.
Van de Velde, E., Debergh, P., Verbeek, A., Rammer, C., Cremers, K., Schliessler, P., et al. (2013). Production and trade in KETs-based products: The EU position in global value chains and specialization patterns within the EU. Brussels: European Commission, DG Enterprise.
Wolpert, D., & Macready, W. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.
Yoon, B., & Park, Y. (2004). A text-mining-based patent network: Analytical tool for high-technology trend. Journal of High-Technology Management Research, 15, 37–50.

Acknowledgements

We are thankful to the High Performance Humanoid Technologies (H2T) group at the Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany, in particular to Prof. Dr. Tamim Asfour, and to Prof. Dr. Gabriel Lopes of the Delft Center for Systems and Control/Robotics Institute at TU Delft, the Netherlands, for their support and advice. Moreover, we wish to thank the participants of the 15th EBES Conference in Lisbon, the 6th annual S.NET meeting in Karlsruhe, and the 5th Global TechMining Conference in Atlanta for their valuable comments and suggestions, which have helped to improve this article. This work is supported by the project “Value Creation & Innovation Processes in and beyond Technology” of the Karlsruhe School of Services.

Author information

Authors and Affiliations

Karlsruhe Institute of Technology, Rüppurrer Str. 1a, Haus B, 76137, Karlsruhe, Germany
Vladimir Korzinov
Geschäftsstelle Expertenkommission Forschung und Innovation (EFI) c/o SV Gemeinnützige Gesellschaft für Wissenschaftsstatistik mbH, Pariser Platz 6, D-10117, Berlin, Germany
Florian Kreuchauff

Corresponding author

Correspondence to Vladimir Korzinov.

Appendix

See Tables 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 and 16.

Table 5 Important robot definitions according to ISO 8373:2012

Table 6 SR application examples of personal/domestic use according to IFR

Table 7 SR application examples of professional/commercial use according to (IFR, 2014)

Table 8 Exemplary extract of robot patents under consideration with titles, publication numbers (given by the patent authority issuing the patent), filing dates (on which the application was received), and expert classification decisions

Table 9 A fragment of a modular SQL Boolean term search approach for PATSTAT, defined through specific word construction for IFR application field CLEANING SR, augmented by IPC class codes
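
The caption describes a modular pattern: one module per IFR application field, each combining Boolean word conditions on titles and abstracts with IPC class codes, with all modules OR-combined. The following hypothetical sketch illustrates that pattern with Python's built-in sqlite3 on a toy in-memory database; the table and column names (`appln`, `appln_ipc`, `title`, `abstract`, `ipc_code`), the search terms and the IPC prefixes are assumptions and do not reproduce PATSTAT's schema or the authors' query.

```python
import sqlite3


def term_condition(terms):
    """SQL condition: any term occurs in the title or the abstract (case-insensitive)."""
    parts, params = [], []
    for term in terms:
        parts.append("(LOWER(a.title) LIKE ? OR LOWER(a.abstract) LIKE ?)")
        params += [f"%{term.lower()}%"] * 2
    return "(" + " OR ".join(parts) + ")", params


def ipc_condition(prefixes):
    """SQL condition: the application carries an IPC code starting with one of the prefixes."""
    parts = " OR ".join("i.ipc_code LIKE ?" for _ in prefixes)
    sql = ("EXISTS (SELECT 1 FROM appln_ipc i "
           f"WHERE i.appln_id = a.appln_id AND ({parts}))")
    return sql, [f"{p}%" for p in prefixes]


def field_module(terms, ipc_prefixes):
    """One module per application field: term condition AND IPC condition."""
    t_sql, t_params = term_condition(terms)
    i_sql, i_params = ipc_condition(ipc_prefixes)
    return f"({t_sql} AND {i_sql})", t_params + i_params


def run_search(conn, modules):
    """OR-combine the field modules and return the matching application ids."""
    where = " OR ".join(sql for sql, _ in modules)
    params = [p for _, module_params in modules for p in module_params]
    query = f"SELECT a.appln_id FROM appln a WHERE {where} ORDER BY a.appln_id"
    return [row[0] for row in conn.execute(query, params)]


def build_demo_db():
    """Toy in-memory database with hypothetical tables (not the PATSTAT schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE appln (appln_id INTEGER PRIMARY KEY, title TEXT, abstract TEXT);
        CREATE TABLE appln_ipc (appln_id INTEGER, ipc_code TEXT);
        INSERT INTO appln VALUES
            (1, 'Robot cleaner', 'An autonomous floor cleaning robot.'),
            (2, 'Welding robot', 'Industrial arm for welding car bodies.'),
            (3, 'Vacuum cleaner', 'A handheld vacuum cleaner.'),
            (4, 'Robotic lawn mower', 'A mowing robot for private gardens.');
        INSERT INTO appln_ipc VALUES
            (1, 'A47L 11/40'), (2, 'B25J 11/00'), (3, 'A47L 5/24'), (4, 'A01D 34/00');
    """)
    return conn
```

With a cleaning module built from the terms "cleaning robot" and "robot cleaner" and the prefixes A47L and B25J, only application 1 is returned: application 3 shares its IPC class but fails the word condition, and application 2 has a matching IPC class but no matching words. Using bound parameters rather than string concatenation keeps the search terms from being interpreted as SQL.
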

Table 10 List of the 1206 variables used in the SVM for classification: Part 1/4 of the 726 unigrams

Table 11 List of the 1206 variables used in the SVM for classification: Part 2/4 of the 726 unigrams

Table 12 List of the 1206 variables used in the SVM for classification: Part 3/4 of the 726 unigrams

Table 13 List of the 1206 variables used in the SVM for classification: Part 4/4 of the 726 unigrams

Table 14 List of the 1206 variables used in the SVM for classification: Part 1/2 of the 370 bigrams

Table 15 List of the 1206 variables used in the SVM for classification: Part 2/2 of the 370 bigrams

Table 16 List of the 1206 variables used in the SVM for classification: All 110 trigrams

About this article

Cite this article

Kreuchauff, F., Korzinov, V. A patent search strategy based on machine learning for the emerging field of service robotics. Scientometrics 111, 743–772 (2017). https://doi.org/10.1007/s11192-017-2268-3

Received: 06 March 2016
Published: 10 February 2017
Issue Date: May 2017
DOI: https://doi.org/10.1007/s11192-017-2268-3
