Abstract
Emerging technologies are often not part of any official industry, patent, or trademark classification system. Delineating their boundaries in order to measure their early development stage is therefore a nontrivial task. This paper presents a methodology for automatically classifying patents concerning service robots. We combine a traditional technology identification process, namely keyword extraction and verification by an expert community, with a machine learning algorithm. The result is a novel way to allocate patents that (1) reduces expert bias arising from vested interests in lexical query methods, (2) avoids problems with citation approaches, and (3) facilitates evolutionary changes. Based on a small core set of worldwide service robotics patent applications, we derive apt n-gram frequency vectors and train a support vector machine, relying only on the titles, abstracts, and IPC categorization of each document. By altering the kernel functions and their respective parameters, we reach a recall level of 83% and a precision level of 85%.
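To make the two central quantities of the abstract concrete, the following minimal sketch shows one way to build n-gram frequency vectors from patent text and to compute precision and recall for a binary classifier. It is an illustration under stated assumptions, not the authors' implementation: the function names, the whitespace tokenization, the choice of unigrams plus bigrams, and the two toy documents are all hypothetical.

```python
from collections import Counter


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from lower-cased text.

    Assumption: plain whitespace tokenization; a real pipeline would
    likely also strip punctuation and stop words.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def frequency_vectors(documents, n_max=2):
    """Map each document to a count vector over a shared, sorted vocabulary."""
    counts = [Counter(ngrams(doc, n_max)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[g] for g in vocabulary] for c in counts]
    return vocabulary, vectors


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN), for labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy documents standing in for title + abstract text.
docs = ["service robot for floor cleaning", "welding robot arm"]
vocabulary, vectors = frequency_vectors(docs)

# Hypothetical labels: 1 = service robotics patent, 0 = other.
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Vectors of this kind would then be passed to a classifier such as a support vector machine; precision and recall are evaluated on documents that were held out from training.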
Notes
Beyond its potential productivity effects, service robotics (SR) is believed to induce visible changes in employment structures (Autor et al. 2003; Frey and Osborne 2017; Graetz and Michaels 2015). Its potential to change organizational processes in firms as well as people's everyday lives is already visible in the diffusion of semi-autonomous physical systems out of industrial fabrication and into service economies.
Cozzens et al. (2010) argue that bibliometric data, in particular proposals and publications, seem to be most useful for monitoring the technological horizon. Patent analysis, on the other hand, besides having long been known to be valuable for competitive and trend analysis (Abraham and Morita 2001; Liu and Shyu 1997), has become sophisticated enough to even predict emerging fields (Erdi et al. 2013).
See for example the annual reports by the German Patent and Trademark Office at http://www.dpma.de/english/service/publications/annualreports/index.html.
Only in 2011 was a second subclass, B82Y, focusing on specific uses or applications of nanostructures, introduced for the IPC and the Cooperative Patent Classification (CPC). Previously, related nano patent documents could only be identified if they were classified via the European Classification System (ECLA) with the subclass Y01. ECLA Y-codes were created as an extension of the original classification system, to extend classification capabilities to new (emerging) technology areas of special interest.
With respect to scientific publications, another common strategy is to identify core journals. All articles within those journals are then considered relevant. For patents, though, this search strategy is obviously not feasible, which is why we do not pursue it any further.
Such a search strategy is called evolutionary if subsequent researchers may build upon existing query structures by progressively incorporating terms that better specify the technology and widen its scope (Mogoutov and Kahane 2007).
For the instance of nanotechnology, to which we refer throughout, Arora et al. (2014) measure the growth in nano-prefixed terms in scholarly publications and find that the percentage of articles using a nano-prefixed term has increased from less than 10% in the early 1990s to almost 80% by 2010.
This approach naturally harbors the risk of including generic articles of any scientific field that somehow happen to be cited in a technologically unrelated context. Bassecoulard et al. (2007) therefore incorporate a statistical relevance limit relying on the specificity of citations.
Consequently, the adequate data sources for this identification process are the same as those that constitute the targets of subsequent analyses, which might give cause for some criticism.
For the technology under consideration in this paper, it is important to note that SR patents are very different from business process and service patents, and there is little if any overlap. SR patents are much more 'technological' in the sense that they contain information about how a robot is constructed and for which environment its functionalities are intended. Their content is thus close to that of IR patents, which makes the two hard to disentangle. In contrast, business and service patents largely contain organizational innovations.
This database encompasses raw data from about 60 million patent applications and 30 million granted patents, utility models, PCT applications, etc. filed at more than 100 patent authorities worldwide.
Manipulators; Chambers Provided With Manipulation Devices. See http://www.wipo.int/classifications/ipc/en/.
According to the USPTO, most of the manipulators classified in B25J are industrial robots. See http://www.uspto.gov/web/patents/classification/cpc/html/defB25J.html.
We have included one example of such a subquery in the appendix. All other queries are available upon request.
We also tried to incorporate another step (6), which added IPC dummy variables to indicate class membership. These additional attributes were later discarded by the subsequent feature selection process, which suggests that IPC class membership is not significant for the categorization at hand.
There exist some multiclass SVM approaches. See Duan and Keerthi (2005) for a review.
Since it is not possible to determine in advance which kernel function should be used, the choice of the depicted functions was mostly motivated by their popularity in classifiers and their availability within the software package used.
We do not discuss the exact implementation of the support vector machine algorithm in the Python scikit-learn tool. All necessary materials can be found in open access libraries following the reference provided above.
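As an illustration of what varying the kernel function means in practice, the sketch below writes out four kernels that are commonly offered by SVM software, using the parameterization that scikit-learn documents for them (`gamma`, `coef0`, `degree`). Which kernels and parameter values the authors actually compared is not stated in this note, so the selection and the example vectors here are assumptions.

```python
import math


def dot(x, y):
    """Inner product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """K(x, y) = tanh(gamma * <x, y> + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)


# Hypothetical n-gram count vectors for two documents.
x = [1, 0, 2]
y = [0, 1, 2]

similarities = {
    "linear": linear_kernel(x, y),
    "polynomial": polynomial_kernel(x, y, degree=2),
    "rbf": rbf_kernel(x, y, gamma=0.5),
    "sigmoid": sigmoid_kernel(x, y, gamma=0.1),
}
```

Because no kernel is best across all problems, a common procedure is to cross-validate each candidate kernel over a grid of its parameters and keep the combination that scores best on held-out data.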
We even included IPC classes at an early stage of development, but did not find any of these classifications to become part of the support vectors. They turned out to be irrelevant to the discrimination procedure and were thus removed during the feature selection process.
Within PATSTAT, for instance, more than 90% of the listed patent applications are followed by fewer than three forward citations; 74% do not show any at all.
SR may be seen as such an umbrella term as well—or as a system of technologies, i.e., combining many technologies in the way described by Arthur (2009): In that sense, SR is becoming more diverse and complex with evolving purposes organized to meet human needs.
References
Abraham, B., & Morita, S. (2001). Innovation assessment through patent analysis. Technovation, 21, 245–252.
Ali, S., & Smith-Miles, K. A. (2006). A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1–3), 173–186. Neural networks selected papers from the 7th Brazilian Symposium on Neural Networks (SBRN 04).
Arora, S. K., Porter, A. L., Youtie, J., & Shapira, P. (2013). Capturing new developments in an emerging technology: An updated search strategy for identifying nanotechnology research outputs. Scientometrics, 95, 351–370.
Arora, S. K., Youtie, J., Carley, S., Porter, A. L., & Shapira, P. (2014). Measuring the development of a common scientific lexicon in nanotechnology. Journal of Nanoparticle Research, 16(2194), 1–11.
Arthur, B. (1999). Complexity and the economy. Science, 284(5411), 107–109.
Arthur, B. (2009). The nature of technology—what it is and how it evolves. New York: Free Press. (Reprint Ed. January 11, 2011).
Arthur, B., & Polak, W. (2006). The evolution of technology in a simple computer model. Complexity, 11(5), 23–31.
Autor, D. H., Levy, F., & Murnane, R. J. (2003). The skill content of recent technological change: An empirical exploration. The Quarterly Journal of Economics, 118(4), 1279–1333.
Bassecoulard, E., Lelu, A., & Zitt, M. (2007). Mapping nanosciences by citation flows: a preliminary analysis. Scientometrics, 70, 859–880.
Boser, B., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on computational learning theory—COLT’92 (p. 144).
Bresnahan, T. F. (2010). General purpose technologies. In B. Hall & N. Rosenberg (Eds.), Handbook of economics of innovation (2nd ed., pp. 763–791). Amsterdam: Elsevier.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121–167.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Cozzens, S., Gatchair, S., Kang, J., Kim, K.-S., Lee, H. J., Ordónez, G., et al. (2010). Emerging technologies: Quantitative identification and measurement. Technology Analysis & Strategic Management, 22(3), 361–376.
Duan, K.-B., & Keerthi, S. (2005). Which is the best multiclass SVM method? An empirical study. In N. Oza, R. Polikar, J. Kittler, & F. Roli (Eds.), Multiple classifier systems (Lecture notes in computer science, Vol. 3541, pp. 278–285). Berlin: Springer.
Erdi, P., Makovi, K., Somogyvári, Z., Strandburg, K., Tobochnik, J., Volf, P., et al. (2013). Prediction of emerging technologies based on analysis of the US patent citation network. Scientometrics, 95, 225–242.
Fischer, M., Scherngell, T., & Jansenberger, E. (2009). Geographic localisation of knowledge spillovers: Evidence from high-tech patent citations in Europe. Annals of Regional Science, 43, 839–858.
Frey, C. B., & Osborne, M. A. (2017). The future of employment: How susceptible are jobs to computerization? Technological Forecasting and Social Change, 114, 254–280.
Frietsch, R. (2015). Collection and analysis of private R&D investment and patent data in different sectors, thematic areas and societal challenges, JRC/BRU/2014/J.6/0015/OC, Inception report, Deliverable 1.1: Methodological report, Karlsruhe Fraunhofer Institute for Systems and Innovations Research.
Fu, L. D., & Aliferis, C. F. (2010). Using content-based and bibliometric features for machine learning models to predict citation counts in biomedical literature. Scientometrics, 85, 257.
Garfield, E. (1967). Primordial concepts citation indexing and historio-bibliography. Journal of Library History, 2, 235–249.
Graetz, G., & Michaels, G. (2015). Robots at work. Center for Economic Performance Discussion Paper.
Griliches, Z. (1990). Patent statistics as economic indicators: A survey. Journal of Economic Literature, 28, 1661–1707.
Guyon, I., Boser, B., & Vapnik, V. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In Advances in neural information processing systems (pp. 147–155). Burlington: Morgan Kaufmann.
Halaweh, M. (2013). Emerging technology: What is it? Journal of Technology Management and Innovation, 8(3), 108–115.
Hall, B. H., Jaffe, A., & Trajtenberg, M. (2005). Market value and patent citations. RAND Journal of Economics, 36(1), 16–38.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2010). A practical guide to support vector classification. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University.
Jaffe, A., Trajtenberg, M., & Henderson, R. (1993). Geographic localization of knowledge spillovers as evidenced by patent citations. The Quarterly Journal of Economics, 108(3), 577–598.
Kenekayoro, P., Buckley, K., & Thelwall, M. (2014). Automatic classification of academic web page types. Scientometrics, 101, 1015.
Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques. Informatica, 31, 249–268.
Lee, W. H. (2008). How to identify emerging research fields using scientometrics: An example in the field of information security. Scientometrics, 76(3), 503–525.
Lee, P., Su, H., & Chan, T. (2010). Assessment of ontology-based knowledge network formation by vector-space model. Scientometrics, 85, 689.
Lee, S., Yoon, B., & Park, Y. (2009). An approach to discovering new technology opportunities: Keyword-based patent map approach. Technovation, 29, 481–497.
Li, Y.-R., Wang, L.-H., & Hong, C.-F. (2009). Extracting the significant-rare keywords for patent analysis. Expert Systems with Applications, 36, 5200–5204.
Liu, S., & Shyu, J. (1997). Strategic planning for technology development with patent analysis. International Journal of Technology Management, 13, 661–680.
Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Online, accessed Oct 15, 2014. URL: http://www-nlp.stanford.edu/IR-book/.
McKeown, K. et al. (2016). Predicting the impact of scientific concepts using full-text features. Journal of the Association for Information Science and Technology, 67(11), 2684–2696.
Mogoutov, A., & Kahane, B. (2007). Data search strategy for science and technology emergence: A scalable and evolutionary query for nanotechnology tracking. Research Policy, 36, 893–903.
Noyons, E., Buter, R., Raan, A., Schmoch, U., Heinze, T., S., H. & Rangnow, R. (2003). Mapping excellence in science and technology across Europe. Part 2: nanoscience and nanotechnology. Draft Report EC-PPN CT2002-0001 to the European Commission. Leiden University Centre for Science and Technology Studies/Karlsruhe Fraunhofer Institute for Systems and Innovations Research.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Porter, A., Youtie, J., & Shapira, P. (2008). Nanotechnology publications and citations by leading countries and blocs. Journal of Nanoparticle Research, 10, 981–986.
Ruffaldi, E., Sani, E., & Bergamasco, M. (2010). Visualizing perspectives and trends in robotics based on patent mining. In IEEE International Conference on Robotics and Automation, Anchorage, Alaska.
Schmoch, U. (2008). Concept of a technology classification for country comparisons. In Final Report to the World Intellectual Property Organization (WIPO), Karlsruhe Fraunhofer Institute for Systems and Innovations Research.
Srinivasan, R. (2008). Sources, characteristics and effects of emerging technologies: Research opportunities in innovation. Industrial Marketing Management, 37, 633–640.
Stahl, B. (2011). What does the future hold? A critical view of emerging information and communication technologies and their social consequences, vol. 356 of Researching the Future in Information Systems, IFIP advances in information and communication technology. Berlin: Springer.
Thompson, P. (2006). Patent citations and the geography of knowledge spillovers: Evidence from inventor- and examiner-added citations. The Review of Economics and Statistics, 88(2), 383–388.
Tseng, Y.-H., Lin, C.-J., & Lin, Y.-I. (2007). Text mining techniques for patent analysis. Information Processing and Management, 43, 1216–1247.
Van de Velde, E., Debergh, P., Verbeek, A., Rammer, C., Cremers, K., Schliessler, P., et al. (2013). Production and trade in KETs-based products: The EU position in global value chains and specialization patterns within the EU. Brussels: European Commission, DG Enterprise.
Wolpert, D., & Macready, W. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.
Yoon, B., & Park, Y. (2004). A text-mining-based patent network: Analytical tool for high-technology trend. Journal of High-Technology Management Research, 15, 37–50.
Acknowledgements
We are thankful to the High Performance Humanoid Technologies (H2T) group from the Institute for Anthropomatics and Robotics at Karlsruhe Institute of Technology in Germany, in particular to Prof. Dr. Tamim Asfour, and to Prof. Dr. Gabriel Lopes from the Delft Center for Systems and Control/Robotics Institute at TU Delft in the Netherlands for their support and advice. Moreover, we wish to thank the participants in the 15th EBES Conference in Lisbon, the 6th annual S.NET meeting in Karlsruhe, and the 5th Global TechMining Conference in Atlanta for their valuable comments and suggestions, which have led to the improvement of this article. This work is supported by the project “Value Creation & Innovation Processes in and beyond Technology” of the Karlsruhe School of Services.
Cite this article
Kreuchauff, F., Korzinov, V. A patent search strategy based on machine learning for the emerging field of service robotics. Scientometrics 111, 743–772 (2017). https://doi.org/10.1007/s11192-017-2268-3
DOI: https://doi.org/10.1007/s11192-017-2268-3