
A complement to lexical query’s search-term selection for emerging technologies: the case of “big data”

Scientometrics


Obtaining document sets to study emerging technologies is challenging. Researchers studying emerging technologies address this challenge with lexical queries, e.g., core, expanded and evolutionary queries. Creating a lexical query requires selecting search-terms, which can be done with manual, automatic or semi-automatic techniques. The currently reported processes for selecting search-terms can be complemented by addressing two issues: the lack of a systematic process for selecting search-terms from previous literature, and the absence of an evaluation of the candidate search-terms’ document retrieval interdependence. We propose two steps to complement the process of selecting search-terms to create lexical queries for studying emerging technologies. The first step is a process to systematically select search-terms from previous literature. The second step is an evaluation of the search-terms’ document retrieval interdependence, for which we propose the Significance of Interception Ratio (SIR). We tested the proposed steps using as a reference the big-data lexical query proposed by Huang et al. (Scientometrics 105:2005–2022, 2015). The test results show that the proposed steps can complement the current automatic methods for selecting search-terms. The first step increased the recall of the reference lexical query by around 24%. This increase resulted from adding 37 search-terms to the reference lexical query and eliminating three search-terms from it. In the second step (application of the SIR), five search-terms from the reference lexical query were optimized, showing a slight complementary ability when selecting search-terms.
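The two quantities the abstract relies on, the recall of a lexical query and the overlap between the documents retrieved by its search-terms, can be illustrated with plain set arithmetic. The sketch below is only an illustration under stated assumptions: the abstract does not define the SIR, so `overlap_share` is a generic overlap measure and not the authors' ratio, and the term names, document IDs and function names are all hypothetical.

```python
def recall(retrieved, relevant):
    """Share of the relevant documents that the query retrieved."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    return len(retrieved & relevant) / len(relevant)


def overlap_share(term, docs_by_term):
    """Share of a term's documents also retrieved by at least one other term.

    Generic overlap measure used for illustration; NOT the SIR,
    whose definition is not given in the abstract.
    """
    own = docs_by_term[term]
    if not own:
        return 0.0
    # Union of the documents retrieved by every other search-term.
    others = set().union(*(d for t, d in docs_by_term.items() if t != term))
    return len(own & others) / len(own)


# Hypothetical data: document IDs retrieved by each search-term.
docs_by_term = {
    "term_a": {1, 2, 3, 4},
    "term_b": {3, 4, 5},
    "term_c": {4, 5},
}
# Hypothetical set of documents known to be relevant.
relevant = {1, 2, 3, 4, 5, 6, 7, 8}

# A lexical query built from all terms retrieves the union of their documents.
query_docs = set().union(*docs_by_term.values())

query_recall = recall(query_docs, relevant)
shares = {t: overlap_share(t, docs_by_term) for t in docs_by_term}
```

In this toy data, every document retrieved by `term_c` is already retrieved by another term, so dropping it would not change the recall of the query. That is the kind of interdependence the second step is meant to surface, although the criterion the authors actually apply may differ.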




  1. A detailed description of this issue can be found in Carpineto and Romano (2012), under the name of the vocabulary issue.




Santiago Ruiz-Navas thanks Colciencias for providing the scholarship that supports his Ph.D. studies, and the kind reviewers for their time, valuable suggestions and comments.

Author information



Corresponding author

Correspondence to Santiago Ruiz-Navas.


About this article


Cite this article

Ruiz-Navas, S., Miyazaki, K. A complement to lexical query’s search-term selection for emerging technologies: the case of “big data”. Scientometrics 117, 141–162 (2018).



