
A systematic method to create search strategies for emerging technologies based on the Web of Science: illustrated for ‘Big Data’

Scientometrics


Bibliometric and “tech mining” studies depend on a crucial foundation: the search strategy used to retrieve relevant research publication records. Database searches for emerging technologies can be problematic in several respects, for example, the rapid evolution of terminology, the use of common phraseology, or the extent of “legacy technology” terminology. Searching on such legacy terms may or may not pick up R&D pertaining to the emerging technology of interest, so a key challenge is to assess the relevance of legacy terminology in building an effective search model. Common-usage phraseology further confounds certain domains in which broader managerial, public interest, or other considerations are prominent. In contrast, searching for highly technical topics is relatively straightforward. In setting out to analyze “Big Data,” we confront all three challenges: emerging terminology, common-usage phrasing, and intersecting legacy technologies. In response, we have devised a systematic methodology to help identify research relating to Big Data. This methodology uses complementary search approaches, starting with a Boolean search model and subsequently employing contingency term sets to further refine the selection. The four search approaches considered are: (1) core lexical query, (2) expanded lexical query, (3) specialized journal search, and (4) cited reference analysis. Of special note here is the use of a “Hit-Ratio” that helps distinguish Big Data elements from less relevant legacy technology terms. We believe that such systematic search development positions us to conduct meaningful analyses of Big Data research patterns, connections, and trajectories. Moreover, we suggest that this systematic search approach can help formulate more replicable searches with high recall and satisfactory precision in studies of other emerging technologies.
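The abstract does not say how the Hit-Ratio is computed. The sketch below therefore assumes one plausible reading, for illustration only. For each candidate term, it takes the records that contain the term and measures what share of them also contain a core Big Data term. Terms at or above a threshold are kept.

The function names, the 0.5 threshold and the toy records are all hypothetical. They are not the authors' definitions or data.

```python
from typing import Dict, Iterable, List, Set, Tuple


def hit_ratio(records: List[Set[str]], candidate: str, core_terms: Set[str]) -> float:
    """Share of records containing `candidate` that also contain any core term.

    ASSUMPTION: illustrative definition only; the abstract gives no formula.
    """
    with_candidate = [r for r in records if candidate in r]
    if not with_candidate:
        return 0.0  # term never occurs, so it offers no evidence of relevance
    # A non-empty set intersection means the record also mentions a core term.
    hits = sum(1 for r in with_candidate if r & core_terms)
    return hits / len(with_candidate)


def screen_terms(
    records: List[Set[str]],
    candidates: Iterable[str],
    core_terms: Set[str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[Dict[str, float], List[str]]:
    """Compute a ratio per candidate term and keep those at or above `threshold`."""
    ratios = {t: hit_ratio(records, t, core_terms) for t in candidates}
    kept = sorted(t for t, r in ratios.items() if r >= threshold)
    return ratios, kept


if __name__ == "__main__":
    # Toy records: each is the set of index terms of one hypothetical publication.
    toy_records = [
        {"big data", "hadoop"},
        {"big data", "hadoop", "cloud computing"},
        {"hadoop"},
        {"data mining", "big data"},
        {"data mining"},
        {"data mining"},
        {"data mining", "classification"},
    ]
    ratios, kept = screen_terms(toy_records, ["hadoop", "data mining"], {"big data"})
    print(ratios)  # {'hadoop': 0.6666666666666666, 'data mining': 0.25}
    print(kept)    # ['hadoop']
```

In this toy setting, a term that co-occurs mostly with the core vocabulary ("hadoop") passes the screen. A broad legacy term that mostly appears without it ("data mining") is filtered out. This mirrors the role the abstract describes for the Hit-Ratio, though the real criterion may differ.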





  1. WoS Topical Search (in the Advanced Search feature) captures occurrences in the title, abstract, author keywords, and Keywords Plus fields. A narrower option we also considered restricts the search to titles only.
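The footnote contrasts two Web of Science Advanced Search scopes. As an illustration, the helper below assembles query strings with two WoS field tags. `TS=` is Topic, which covers the title, abstract, author keywords and Keywords Plus fields. `TI=` is Title only.

The function name and every term other than "big data" are hypothetical placeholders, not the authors' actual search strings.

```python
from typing import Iterable

# WoS Advanced Search field tags: TS = Topic, TI = Title.
ALLOWED_FIELDS = {"TS", "TI"}


def wos_query(terms: Iterable[str], field: str = "TS") -> str:
    """Build a Boolean OR query over exact phrases for one WoS field tag.

    Hypothetical helper for illustration; it only formats a string.
    """
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported field tag: {field}")
    # Quote each term so multi-word terms are matched as phrases.
    quoted = [f'"{t}"' for t in terms]
    if not quoted:
        raise ValueError("at least one term is required")
    return f"{field}=({' OR '.join(quoted)})"


print(wos_query(["big data"]))                           # TS=("big data")
print(wos_query(["big data", "mapreduce"], field="TI"))  # TI=("big data" OR "mapreduce")
```

Switching `field` from `"TS"` to `"TI"` is the broader-versus-narrower choice the footnote describes. The title-only form trades recall for precision.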





We acknowledge support from the US National Science Foundation (Award #1527370, “Forecasting Innovation Pathways of Big Data & Analytics”). We are also grateful for the scholarship provided by the China Scholarship Council (CSC Student ID 201406030005). The findings and observations contained in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or the China Scholarship Council.

Author information


Corresponding author

Correspondence to Alan L. Porter.

About this article

Cite this article

Huang, Y., Schuehle, J., Porter, A.L. et al. A systematic method to create search strategies for emerging technologies based on the Web of Science: illustrated for ‘Big Data’. Scientometrics 105, 2005–2022 (2015).
