Abstract
Obtaining document sets to study emerging technologies is challenging. Researchers address this challenge with lexical queries (e.g., core, expanded, and evolutionary queries). Creating a lexical query requires selecting search-terms, which can be done with manual, automatic, or semi-automatic techniques. The currently reported processes for selecting search-terms can be complemented by addressing two issues: the lack of a systematic process for selecting search-terms from previous literature, and the lack of an evaluation of the document retrieval interdependence of candidate search-terms. We propose two steps that complement the process of selecting search-terms to create lexical queries for studying emerging technologies. The first step is a process to systematically select search-terms from previous literature. The second step is an evaluation of the search-terms’ document retrieval interdependence, for which we propose the Significance of Interception Ratio (SIR). We tested the proposed steps using as a reference the big-data lexical query proposed by Huang et al. (Scientometrics 105:2005–2022, 2015). The test results show that the proposed steps can complement current automatic methods for selecting search-terms. The first step increased the recall of the reference lexical query by around 24%. This increase resulted from adding 37 search-terms to the reference lexical query and eliminating three search-terms from it. In the second step (application of the SIR), five search-terms from the reference lexical query were optimized, which indicates a slight complementary ability when selecting search-terms.
Notes
A detailed description of this issue can be found in Carpineto and Romano (2012), where it is called the vocabulary issue.
References
Arora, S. K., Porter, A. L., Youtie, J., & Shapira, P. (2013). Capturing new developments in an emerging technology: An updated search strategy for identifying nanotechnology research outputs. Scientometrics, 95(1), 351–370. https://doi.org/10.1007/s11192-012-0903-6.
Avila-Robinson, A., & Miyazaki, K. (2011). Conceptualization and operationalization of emerging technologies: A complementing approach. In 2011 Proceedings of PICMET Technology management in the energy smart world (PICMET) (pp. 1681–1692).
Carpineto, C., & Romano, G. (2012). A survey of automatic query expansion in information retrieval. ACM Computing Surveys, 44(1), 1–50. https://doi.org/10.1145/2071389.2071390.
Cattelan Nobre, G., Elaine Tavares, B., & Tavares, E. (2017). Scientific literature analysis on big data and internet of things applications on circular economy: a bibliometric study. Scientometrics, 111(1), 463–492. https://doi.org/10.1007/s11192-017-2281-6.
Cruse, L. A. (1988). D. A. Cruse, Lexical semantics. Cambridge: Cambridge University Press. 1986. Pp. xlv + 310. Journal of Linguistics, 24(01), 203. https://doi.org/10.1017/s0022226700011622.
Gantz, B. J., & Reinsel, D. (2011). Extracting value from Chaos State of the Universe: An executive summary iView “Extracting Value from Chaos,”, 1142, 1–12.
Groen, A. J., & Walsh, S. T. (2013). Introduction to the field of emerging technology management. Creativity and Innovation Management, 22(1), 1–5. https://doi.org/10.1111/caim.12019.
Halevi, G. (2012). Special issue on big data. Research Trends, 30, 1–40. https://doi.org/10.1080/07350015.2016.1197681.
Huang, C., Notten, A., & Rasters, N. (2011). Nanoscience and technology publications and patents: A review of social science studies and search strategies. Journal of Technology Transfer, 36(2), 145–172. https://doi.org/10.1007/s10961-009-9149-8.
Huang, Y., Porter, A., Zhang, Y., Guo, Y., & Zhu, D. (2017). Validating the earlier analyses and forecasting on dye-sensitized solar cells (DSSCs). In 2017 Portland international conference on management of engineering and technology, PICMET 2017 (pp. 1–9). https://doi.org/10.23919/picmet.2017.8125293.
Huang, Y., Schuehle, J., Porter, A. L., & Youtie, J. (2015). A systematic method to create search strategies for emerging technologies based on the Web of Science: illustrated for ‘Big Data’. Scientometrics, 105(3), 2005–2022. https://doi.org/10.1007/s11192-015-1638-y.
Kostoff, R. N., del Río, J. A., Cortés, H. D., Smith, C., Smith, A., Wagner, C., et al. (2007). Clustering methodologies for identifying country core competencies. Journal of Information Science, 33(1), 21–40. https://doi.org/10.1177/0165551506067124.
Laurens, P., Zitt, M., & Bassecoulard, E. (2010). Delineation of the genomics field by hybrid citation-lexical methods: Interaction with experts and validation process. Scientometrics, 82(3), 647–662. https://doi.org/10.1007/s11192-010-0177-9.
Maghrebi, M., Abbasi, A., Amiri, S., Monsefi, R., & Harati, A. (2011). A collective and abridged lexical query for delineation of nanotechnology publications. Scientometrics, 86(1), 15–25. https://doi.org/10.1007/s11192-010-0304-7.
Manning, C. D., Raghavan, P., & Schütze, H. (2009). An introduction to information retrieval. Online edition (c) 2009 Cambridge UP. https://doi.org/10.1109/lpt.2009.2020494.
Moghadasi, S. I., Ravana, S. D., & Raman, S. N. (2013). Low-cost evaluation techniques for information retrieval systems: A review. Journal of Informetrics, 7(2), 301–312. https://doi.org/10.1016/j.joi.2012.12.001.
Mogoutov, A., & Kahane, B. (2007). Data search strategy for science and technology emergence: A scalable and evolutionary query for nanotechnology tracking. Research Policy, 36(6), 893–903. https://doi.org/10.1016/j.respol.2007.02.005.
Navas-Rios, M. E., Londoño, E., Ruiz-Navas, S., & Ruiz-Navas, D. (2012). State of the art of emerging technologies in Colombia. In 2012 Proceedings of PICMET’12: Technology management for emerging technologies (pp. 358–367). IEEE.
OECD. (2013). Exploring data-driven innovation as a new source of growth. OECD Digital Economy Papers, 222, 1–44. https://doi.org/10.1787/5k47zw3fcp43-en.
Park, H. W., & Leydesdorff, L. (2013). Decomposing social and semantic networks in emerging “big data” research. Journal of Informetrics, 7(3), 756–765. https://doi.org/10.1016/j.joi.2013.05.004.
Porter, A. L., Huang, Y., Schuehle, J., & Youtie, J. (2015). Meta data: Big data research evolving across disciplines, players, and topics. In Proceedings—2015 IEEE international congress on big data, BigData congress 2015 (pp. 262–267). https://doi.org/10.1109/bigdatacongress.2015.44.
Porter, A. L., Youtie, J., Shapira, P., & Schoeneck, D. J. (2008). Refining search terms for nanotechnology. Journal of Nanoparticle Research, 10(5), 715–728. https://doi.org/10.1007/s11051-007-9266-y.
Rotolo, D., Hicks, D., & Martin, B. R. (2015). What is an emerging technology? Research Policy, 44(10), 1827–1843. https://doi.org/10.1016/j.respol.2015.06.006.
Rotolo, D., Rafols, I., Hopkins, M. M., & Leydesdorff, L. (2017). Strategic intelligence on emerging technologies: Scientometric overlay mapping. Journal of the Association for Information Science and Technology, 68(1), 214–233. https://doi.org/10.1002/asi.23631.
Rousseau, R. (2012). A view on big data and its relation to Informetrics. CJLIS, 5(3), 12–26. https://doi.org/10.1007/s13398-014-0173-7.2.
Salton, G. (1989). Automatic text processing: The transformation, analysis, and retrieval of information by computer. Boston, MA: Addison-Wesley Publishing Company Inc.
Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval (1st ed.). NY: McGraw-Hill College.
Xie, Z., & Miyazaki, K. (2013). Evaluating the effectiveness of keyword search strategy for patent identification. World Patent Information, 35(1), 20–30. https://doi.org/10.1016/j.wpi.2012.10.005.
Xu, J., & Croft, W. B. (1996). Query expansion using local and global document analysis. In SIGIR’96: Proceedings of ACM SIGIR conference (Vol. 19, p. 4). https://doi.org/10.1145/243199.243202.
Zitt, M., & Bassecoulard, E. (2006). Delineating complex scientific fields by an hybrid lexical-citation method: An application to nanosciences. Information Processing and Management, 42(6), 1513–1531. https://doi.org/10.1016/j.ipm.2006.03.016.
Acknowledgements
Santiago Ruiz-Navas thanks Colciencias for providing the scholarship that supports his Ph.D. studies, and the reviewers for their time, valuable suggestions, and comments.
Cite this article
Ruiz-Navas, S., Miyazaki, K. A complement to lexical query’s search-term selection for emerging technologies: the case of “big data”. Scientometrics 117, 141–162 (2018). https://doi.org/10.1007/s11192-018-2857-9
DOI: https://doi.org/10.1007/s11192-018-2857-9