A patent search strategy based on machine learning for the emerging field of service robotics

Scientometrics

Abstract

Emerging technologies are often not part of any official industry, patent or trademark classification system. Thus, delineating their boundaries in order to measure them at an early stage of development is a nontrivial task. This paper presents a methodology to automatically classify patents concerning service robots. We combine a traditional technology identification process, namely keyword extraction and verification by an expert community, with a machine learning algorithm. The result is a novel way to allocate patents that (1) reduces expert bias arising from vested interests in lexical query methods, (2) avoids problems associated with citation approaches, and (3) facilitates evolutionary changes. Based on a small core set of worldwide service robotics patent applications, we derive suitable n-gram frequency vectors and train a support vector machine, relying only on the titles, abstracts, and IPC categorization of each document. By varying the kernel functions used and their respective parameters, we reach a recall of 83% and a precision of 85%.
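The feature construction described in the abstract can be illustrated with a minimal Python sketch (Python being the language of the scikit-learn package named in the notes). It builds unigram, bigram and trigram frequency vectors from a patent's title and abstract and optionally appends 0/1 dummies for IPC subclasses. The tokenizer, the document-frequency cutoff `min_df`, the reduction of IPC symbols to four-character subclasses, and all function names are assumptions for illustration; the authors' own preprocessing and feature selection are not reproduced.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters (simplistic)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, max_n=3):
    """Frequencies of all 1- to max_n-grams in one document."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

def build_vocabulary(docs, max_n=3, min_df=1):
    """Sorted n-grams that occur in at least `min_df` documents (hypothetical cutoff)."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(ngram_counts(doc, max_n)))
    return sorted(term for term, freq in doc_freq.items() if freq >= min_df)

def ipc_subclasses(codes):
    """Reduce IPC symbols such as 'B25J 9/16' to their four-character subclass."""
    return {code.strip()[:4] for code in codes}

def feature_vector(text, ipc_codes, vocabulary, ipc_vocabulary, max_n=3):
    """n-gram frequencies aligned with `vocabulary`, followed by 0/1 IPC dummies."""
    counts = ngram_counts(text, max_n)
    present = ipc_subclasses(ipc_codes)
    return ([counts[term] for term in vocabulary]
            + [1 if sub in present else 0 for sub in ipc_vocabulary])

if __name__ == "__main__":
    # Hypothetical title + abstract strings, not taken from the paper's data.
    docs = ["Cleaning robot with suction unit", "Robot arm for welding"]
    vocab = build_vocabulary(docs)
    print(len(vocab), feature_vector(docs[0], ["A47L 11/40"], vocab, ["A47L", "B25J"]))
```

In the paper, IPC dummies were ultimately discarded by feature selection (notes 16 and 20), so in this sketch the `ipc_vocabulary` argument could simply be left empty.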

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Notes

  1. Beyond its potential productivity effects, SR is believed to induce visible changes in employment structures (Autor et al. 2003; Frey and Osborne 2017; Graetz and Michaels 2015). Its potential to change organizational processes in firms as well as people's everyday lives is already visible in the diffusion of semi-autonomous physical systems out of industrial manufacturing and into service economies.

  2. Cozzens et al. (2010) argue that bibliometric data, in particular proposals and publications, are most useful for monitoring the technological horizon. Patent analysis, on the other hand, besides having long been known to be valuable for competitive and trend analysis (Abraham and Morita 2001; Liu and Shyu 1997), has become sophisticated enough to even predict emerging fields (Erdi et al. 2013).

  3. Cf. http://www.oecd.org/sti/intellectual-property-statistics-and-analysis.htm#method.

  4. See for example the annual reports by the German Patent and Trademark Office at http://www.dpma.de/english/service/publications/annualreports/index.html.

  5. Only in 2011 was a second subclass, B82Y, focusing on specific uses or applications of nanostructures, introduced for the IPC and the Cooperative Patent Classification (CPC). Previously, related nano patent documents could only be identified if they were classified in the European Classification System (ECLA) under subclass Y01. ECLA Y-codes were created as an extension of the original classification system to cover new (emerging) technology areas of special interest.

  6. With respect to scientific publications, another common strategy is to identify core journals, all of whose articles are then considered relevant. For patents, however, this search strategy is obviously not feasible, which is why we do not pursue it further.

  7. A search strategy is called evolutionary if subsequent researchers can build upon existing query structures by progressively incorporating terms that better specify the technology and widen its scope (Mogoutov and Kahane 2007).

  8. For nanotechnology, the example to which we refer throughout, Arora et al. (2014) measure the growth of nano-prefixed terms in scholarly publications and find that the percentage of articles using a nano-prefixed term increased from less than 10% in the early 1990s to almost 80% by 2010.

  9. This approach naturally carries the risk of including generic articles from any scientific field that merely happen to be cited in a technologically unrelated context. Bassecoulard et al. (2007) therefore introduce a statistical relevance threshold based on the specificity of citations.

  10. Consequently, the data sources used for this identification process are the same as those that constitute the targets of subsequent analyses, which may invite some criticism.

  11. For the technology under consideration in this paper, it is important to note that SR patents differ substantially from business process and service patents, with little if any overlap. SR patents are much more 'technological' in the sense that they describe how a robot is constructed and for which environment its functionalities are intended. Their content is thus close to that of IR patents, which makes the two hard to disentangle. In contrast, business and service patents largely contain organizational innovations.

  12. This database encompasses raw data from about 60 million patent applications and 30 million granted patents, utility models, PCT applications, etc. filed at more than 100 patent authorities worldwide.

  13. Manipulators; Chambers Provided With Manipulation Devices. See http://www.wipo.int/classifications/ipc/en/.

  14. According to the USPTO, most of the manipulators classified in B25J are industrial robots. See http://www.uspto.gov/web/patents/classification/cpc/html/defB25J.html.

  15. We have included one example of such a subquery in the appendix. All other queries are available upon request.

  16. We also tried to incorporate an additional step (6), which added IPC dummy variables to indicate class membership. These attributes were later discarded by the subsequent feature selection process, which suggests that IPC class membership is not significant for the categorization at hand.

  17. Several multiclass SVM approaches exist; see Duan and Keerthi (2005) for an empirical comparison.

  18. Since there is no way to determine in advance which kernel function should be used, the choice of the functions shown was motivated mostly by their popularity in classifiers and their availability within the software package used.

  19. We do not discuss the exact implementation of the support vector machine algorithm in the Python scikit-learn library. All necessary materials are available in open-access libraries via the reference provided above.

  20. We did include IPC classes at an early stage of development, but found that none of these classifications became part of the support vectors. They turned out to be irrelevant to the discrimination procedure and were therefore removed during the feature selection process.

  21. Within PATSTAT, for instance, more than 90% of the listed patent applications receive fewer than three forward citations, and 74% receive none at all.

  22. SR may likewise be seen as such an umbrella term, or as a system of technologies, i.e., one combining many technologies in the way described by Arthur (2009). In that sense, SR is becoming more diverse and complex, with evolving purposes organized to meet human needs.
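The specificity-based filter mentioned in note 9 can be pictured with a deliberately simple indicator: for each candidate document cited by the core set, compute the share of the citations it receives that come from core documents, and keep it only if that share reaches a threshold. Both the indicator and the default threshold of 0.5 are hypothetical simplifications, not the exact relevance limit of Bassecoulard et al. (2007).

```python
def citation_specificity(citing_docs, core_set):
    """Share of the citations a candidate receives that come from core documents.

    A simplified, hypothetical indicator; Bassecoulard et al. (2007) define
    their own relevance limit, which is not reproduced here."""
    if not citing_docs:
        return 0.0
    from_core = sum(1 for doc in citing_docs if doc in core_set)
    return from_core / len(citing_docs)

def filter_cited_candidates(candidates, core_set, threshold=0.5):
    """Keep candidates whose specificity reaches `threshold` (arbitrary default).

    `candidates` maps a candidate id to the list of documents citing it."""
    return {cand for cand, citing in candidates.items()
            if citation_specificity(citing, core_set) >= threshold}
```

A generic methods paper that is cited widely outside the field receives a low share and is excluded, which is precisely the risk note 9 describes.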
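Notes 18 and 19 defer the kernel choice and the SVM implementation to scikit-learn. For readers who want to see the moving parts, the following self-contained sketch writes out four kernels commonly offered by SVM software (scikit-learn's `SVC` exposes them as `kernel='linear'`, `'poly'`, `'rbf'` and `'sigmoid'`, with parameters `gamma`, `coef0` and `degree`) and trains a classifier with kernelized Pegasos, a stochastic sub-gradient method for the SVM objective. This solver is swapped in for illustration only: it is not the LIBSVM-based solver behind `SVC`, the default parameter values are arbitrary, and examples are visited in a fixed order rather than sampled at random. Precision and recall, the measures reported in the abstract, are computed at the end.

```python
import math

# Four kernels written out explicitly; parameter names follow the usual
# SVM conventions (gamma, coef0, degree), default values are arbitrary.
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def sigmoid_kernel(x, z, gamma=0.01, coef0=0.0):
    return math.tanh(gamma * linear_kernel(x, z) + coef0)

def decision_value(x, X, y, alpha, kernel):
    """Sum of alpha_j * y_j * K(x_j, x); the positive factor 1/(lam*t) is
    omitted because it does not change the sign."""
    return sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X) if a)

def train_kernel_pegasos(X, y, kernel, lam=0.1, epochs=10):
    """Kernelized Pegasos with labels in {+1, -1}.

    alpha[i] counts how often example i violated the margin. Examples are
    visited cyclically (the original method samples them at random)."""
    alpha = [0] * len(X)
    t = 0
    for _ in range(epochs):
        for i in range(len(X)):
            t += 1
            margin = y[i] * decision_value(X[i], X, y, alpha, kernel) / (lam * t)
            if margin < 1:
                alpha[i] += 1
    return alpha

def predict(x, X, y, alpha, kernel):
    return 1 if decision_value(x, X, y, alpha, kernel) >= 0 else -1

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

if __name__ == "__main__":
    # Toy, linearly separable data (hypothetical): +1 = "SR", -1 = "not SR".
    X = [[2, 1], [1, 2], [3, 3], [-2, -1], [-1, -2], [-3, -3]]
    y = [1, 1, 1, -1, -1, -1]
    alpha = train_kernel_pegasos(X, y, linear_kernel)
    y_pred = [predict(x, X, y, alpha, linear_kernel) for x in X]
    print(precision_recall(y, y_pred))  # prints (1.0, 1.0)
```

Because the decision value is a sum of kernel evaluations against the training examples that violated the margin, swapping the kernel or its parameters changes the classifier without touching the training loop, which mirrors the kernel and parameter variation described in the abstract.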

References

  • Abraham, B., & Morita, S. (2001). Innovation assessment through patent analysis. Technovation, 21, 245–252.

  • Ali, S., & Smith-Miles, K. A. (2006). A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1–3), 173–186. Neural networks: Selected papers from the 7th Brazilian Symposium on Neural Networks (SBRN 04).

  • Arora, S. K., Porter, A. L., Youtie, J., & Shapira, P. (2013). Capturing new developments in an emerging technology: An updated search strategy for identifying nanotechnology research outputs. Scientometrics, 95, 351–370.

  • Arora, S. K., Youtie, J., Carley, S., Porter, A. L., & Shapira, P. (2014). Measuring the development of a common scientific lexicon in nanotechnology. Journal of Nanoparticle Research, 16(2194), 1–11.

  • Arthur, B. (1999). Complexity and the economy. Science, 284(5411), 107–109.

  • Arthur, B. (2009). The nature of technology—what it is and how it evolves. New York: Free Press. (Reprint Ed. January 11, 2011).

  • Arthur, B., & Polak, W. (2006). The evolution of technology in a simple computer model. Complexity, 11(5), 23–31.

  • Autor, D. H., Levy, F., & Murnane, R. J. (2003). The skill content of recent technological change: An empirical exploration. The Quarterly Journal of Economics, 118(4), 1279–1333.

  • Bassecoulard, E., Lelu, A., & Zitt, M. (2007). Mapping nanosciences by citation flows: a preliminary analysis. Scientometrics, 70, 859–880.

  • Boser, B., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on computational learning theory (COLT ’92) (p. 144).

  • Bresnahan, T. F. (2010). General purpose technologies. In B. Hall & N. Rosenberg (Eds.), Handbook of economics of innovation (2nd ed., pp. 763–791). Amsterdam: Elsevier.

  • Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121–167.

  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

  • Cozzens, S., Gatchair, S., Kang, J., Kim, K.-S., Lee, H. J., Ordónez, G., et al. (2010). Emerging technologies: Quantitative identification and measurement. Technology Analysis & Strategic Management, 22(3), 361–376.

  • Duan, K.-B., & Keerthi, S. (2005). Which is the best multiclass SVM method? An empirical study. In N. Oza, R. Polikar, J. Kittler, & F. Roli (Eds.), Multiple classifier systems (Lecture Notes in Computer Science, Vol. 3541, pp. 278–285). Berlin: Springer.

  • Erdi, P., Makovi, K., Somogyvári, Z., Strandburg, K., Tobochnik, J., Volf, P., et al. (2013). Prediction of emerging technologies based on analysis of the US patent citation network. Scientometrics, 95, 225–242.

  • Fischer, M., Scherngell, T., & Jansenberger, E. (2009). Geographic localisation of knowledge spillovers: Evidence from high-tech patent citations in Europe. Annals of Regional Science, 43, 839–858.

  • Frey, C. B., & Osborne, M. A. (2017). The future of employment: How susceptible are jobs to computerization? Technological Forecasting and Social Change, 114, 254–280.

  • Frietsch, R. (2015). Collection and analysis of private R&D investment and patent data in different sectors, thematic areas and societal challenges, JRC/BRU/2014/J.6/0015/OC, Inception report, Deliverable 1.1: Methodological report, Karlsruhe Fraunhofer Institute for Systems and Innovations Research.

  • Fu, L. D., & Aliferis, C. F. (2010). Using content-based and bibliometric features for machine learning models to predict citation counts in biomedical literature. Scientometrics, 85, 257.

  • Garfield, E. (1967). Primordial concepts citation indexing and historio-bibliography. Journal of Library History, 2, 235–249.

  • Graetz, G., & Michaels, G. (2015). Robots at work. Center for Economic Performance Discussion Paper.

  • Griliches, Z. (1990). Patent statistics as economic indicators: A survey. Journal of Economic Literature, 28, 1661–1707.

  • Guyon, I., Boser, B., & Vapnik, V. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In Advances in neural information processing systems (pp. 147–155). Burlington: Morgan Kaufmann.

  • Halaweh, M. (2013). Emerging technology: What is it? Journal of Technology Management and Innovation, 8(3), 108–115.

  • Hall, B. H., Jaffe, A., & Trajtenberg, M. (2005). Market value and patent citations. RAND Journal of Economics, 36(1), 16–38.

  • Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2010). A practical guide to support vector classification. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University.

  • Jaffe, A., Trajtenberg, M., & Henderson, R. (1993). Geographic localization of knowledge spillovers as evidenced by patent citations. The Quarterly Journal of Economics, 108(3), 577–598.

  • Kenekayoro, P., Buckley, K., & Thelwall, M. (2014). Automatic classification of academic web page types. Scientometrics, 101, 1015.

  • Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques. Informatica, 31, 249–268.

  • Lee, W. H. (2008). How to identify emerging research fields using scientometrics: An example in the field of information security. Scientometrics, 76(3), 503–525.

  • Lee, P., Su, H., & Chan, T. (2010). Assessment of ontology-based knowledge network formation by vector-space model. Scientometrics, 85, 689.

  • Lee, S., Yoon, B., & Park, Y. (2009). An approach to discovering new technology opportunities: Keyword-based patent map approach. Technovation, 29, 481–497.

  • Li, Y.-R., Wang, L.-H., & Hong, C.-F. (2009). Extracting the significant-rare keywords for patent analysis. Expert Systems with Applications, 36, 5200–5204.

  • Liu, S., & Shyu, J. (1997). Strategic planning for technology development with patent analysis. International Journal of Technology Management, 13, 661–680.

  • Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Retrieved October 15, 2014, from http://www-nlp.stanford.edu/IR-book/.

  • McKeown, K., et al. (2016). Predicting the impact of scientific concepts using full-text features. Journal of the Association for Information Science and Technology, 67(11), 2684–2696.

  • Mogoutov, A., & Kahane, B. (2007). Data search strategy for science and technology emergence: A scalable and evolutionary query for nanotechnology tracking. Research Policy, 36, 893–903.

  • Noyons, E., Buter, R., Raan, A., Schmoch, U., Heinze, T., S., H. & Rangnow, R. (2003). Mapping excellence in science and technology across Europe. Part 2: nanoscience and nanotechnology. Draft Report EC-PPN CT2002-0001 to the European Commission. Leiden University Centre for Science and Technology Studies/Karlsruhe Fraunhofer Institute for Systems and Innovations Research.

  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

  • Porter, A., Youtie, J., & Shapira, P. (2008). Nanotechnology publications and citations by leading countries and blocs. Journal of Nanoparticle Research, 10, 981–986.

  • Ruffaldi, E., Sani, E., & Bergamasco, M. (2010). Visualizing perspectives and trends in robotics based on patent mining. In IEEE International Conference on Robotics and Automation, Anchorage, Alaska.

  • Schmoch, U. (2008). Concept of a technology classification for country comparisons. In Final Report to the World Intellectual Property Organization (WIPO), Karlsruhe Fraunhofer Institute for Systems and Innovations Research.

  • Srinivasan, R. (2008). Sources, characteristics and effects of emerging technologies: Research opportunities in innovation. Industrial Marketing Management, 37, 633–640.

  • Stahl, B. (2011). What does the future hold? A critical view of emerging information and communication technologies and their social consequences, vol. 356 of Researching the Future in Information Systems, IFIP advances in information and communication technology. Berlin: Springer.

  • Thompson, P. (2006). Patent citations and the geography of knowledge spillovers: Evidence from inventor- and examiner-added citations. The Review of Economics and Statistics, 88(2), 383–388.

  • Tseng, Y.-H., Lin, C.-J., & Lin, Y.-I. (2007). Text mining techniques for patent analysis. Information Processing and Management, 43, 1216–1247.

  • Van de Velde, E., Debergh, P., Verbeek, A., Rammer, C., Cremers, K., Schliessler, P., et al. (2013). Production and trade in KETs-based products: The EU position in global value chains and specialization patterns within the EU. Brussels: European Commission, DG Enterprise.

  • Wolpert, D., & Macready, W. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.

  • Yoon, B., & Park, Y. (2004). A text-mining-based patent network: Analytical tool for high-technology trend. Journal of High-Technology Management Research, 15, 37–50.

Acknowledgements

We are thankful to the High Performance Humanoid Technologies (H2T) group at the Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany, in particular to Prof. Dr. Tamim Asfour, and to Prof. Dr. Gabriel Lopes from the Delft Center for Systems and Control/Robotics Institute at TU Delft, the Netherlands, for their support and advice. Moreover, we wish to thank the participants of the 15th EBES Conference in Lisbon, the 6th annual S.NET meeting in Karlsruhe, and the 5th Global TechMining Conference in Atlanta for their valuable comments and suggestions, which helped improve this article. This work is supported by the project “Value Creation & Innovation Processes in and beyond Technology” of the Karlsruhe School of Services.

Author information

Correspondence to Vladimir Korzinov.

Appendix

See Tables 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 and 16.

Table 5 Important robot definitions according to ISO 8373:2012
Table 6 SR application examples of personal/domestic use according to IFR
Table 7 SR application examples of professional/commercial use according to IFR (2014)
Table 8 Exemplary extract of robot patents under consideration with titles, publication numbers (given by the patent authority issuing the patent), filing dates (on which the application was received), and expert classification decisions
Table 9 A fragment of a modular SQL Boolean term search approach for PATSTAT, defined through specific word construction for IFR application field CLEANING SR, augmented by IPC class codes
Table 10 List of the 1206 variables used in the SVM for classification: Part 1/4 of the 726 unigrams
Table 11 List of the 1206 variables used in the SVM for classification: Part 2/4 of the 726 unigrams
Table 12 List of the 1206 variables used in the SVM for classification: Part 3/4 of the 726 unigrams
Table 13 List of the 1206 variables used in the SVM for classification: Part 4/4 of the 726 unigrams
Table 14 List of the 1206 variables used in the SVM for classification: Part 1/2 of the 370 bigrams
Table 15 List of the 1206 variables used in the SVM for classification: Part 2/2 of the 370 bigrams
Table 16 List of the 1206 variables used in the SVM for classification: All 110 trigrams
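Table 9 itself is not reproduced here. The general idea of a modular Boolean term search, in which each application field is described by OR-combined term modules that are AND-combined with each other and with IPC class codes, can be sketched with Python's built-in `sqlite3` module. The schema, search terms, IPC prefixes and the three applications are invented for illustration and only loosely inspired by PATSTAT; this is not the authors' query.

```python
import sqlite3

# Hypothetical, heavily simplified schema; table and column names are
# illustrative only and do not reproduce the PATSTAT data model.
SCHEMA = """
CREATE TABLE appln (appln_id INTEGER PRIMARY KEY, title TEXT, abstract TEXT);
CREATE TABLE appln_ipc (appln_id INTEGER, ipc TEXT);
"""

def make_demo_db():
    """In-memory database with three made-up applications."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO appln VALUES (?, ?, ?)", [
        (1, "Autonomous floor cleaning robot", "A robot vacuums floors."),
        (2, "Welding robot arm", "Industrial manipulator for welding."),
        (3, "Window cleaning device", "Suction cups hold the device."),
    ])
    con.executemany("INSERT INTO appln_ipc VALUES (?, ?)", [
        (1, "A47L 11/40"), (1, "B25J 11/00"), (2, "B25J 9/16"), (3, "A47L 1/02"),
    ])
    return con

def term_module(terms):
    """One OR block: any term may appear in the title or the abstract.
    SQLite's LIKE is case-insensitive for ASCII by default."""
    clauses, params = [], []
    for term in terms:
        clauses.append("(a.title LIKE ? OR a.abstract LIKE ?)")
        params += ["%" + term + "%"] * 2
    return "(" + " OR ".join(clauses) + ")", params

def ipc_module(prefixes):
    """One OR block over IPC symbol prefixes, e.g. a subclass such as 'A47L'."""
    clauses = ["i.ipc LIKE ?" for _ in prefixes]
    return "(" + " OR ".join(clauses) + ")", [p + "%" for p in prefixes]

def build_query(term_groups, ipc_prefixes):
    """AND-combine the term modules with the IPC module (lists must be non-empty)."""
    modules = [term_module(g) for g in term_groups] + [ipc_module(ipc_prefixes)]
    where = " AND ".join(sql for sql, _ in modules)
    params = [p for _, ps in modules for p in ps]
    query = ("SELECT DISTINCT a.appln_id FROM appln a "
             "JOIN appln_ipc i ON a.appln_id = i.appln_id "
             "WHERE " + where + " ORDER BY a.appln_id")
    return query, params

def search(con, term_groups, ipc_prefixes):
    query, params = build_query(term_groups, ipc_prefixes)
    return [row[0] for row in con.execute(query, params)]
```

Because each module is an independent OR block, a later researcher can widen the search by adding terms to a module; for instance, adding `device` to a `robot` module also retrieves the made-up application 3, which is the kind of evolutionary extension described in note 7.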

About this article

Cite this article

Kreuchauff, F., Korzinov, V. A patent search strategy based on machine learning for the emerging field of service robotics. Scientometrics 111, 743–772 (2017). https://doi.org/10.1007/s11192-017-2268-3

  • DOI: https://doi.org/10.1007/s11192-017-2268-3
