1 Introduction

One of the first steps when conducting a rigorous literature review is the selection of an appropriate literature sample using a rigorous, systematic search approach (Levy and Ellis 2006). Carelessness during the search process can lead to an outdated, scattered, and irrelevant literature sample, a shortcoming that cannot be compensated for in subsequent review steps (Levy and Ellis 2006). However, applying a systematic search approach is often a complex and time-consuming task, especially for students and novice researchers (Fink 2014; vom Brocke et al. 2015). One reason for this issue seems to be a lack of innovation from a systems perspective. While the amount of available information is steadily growing (Hilbert and López 2011), and users’ search behaviors have adapted accordingly (Spezi 2016; Wu and Chen 2014), little has changed over the past decade in regard to technology that specifically assists systematic literature searches. A systematic literature search is a task that still involves a considerable amount of manual labor (Boell and Cecez-Kecmanovic 2015a).

The common starting point for systematic literature searches consists of curated literature databases, such as the IEEEXplore Digital Library or the AIS Electronic Library. However, their limited coverage renders a cross-database search involving several databases mandatory in most cases. Because each database has its own limitations and peculiarities (e.g., available features, search fields, and query syntax), the necessary knowledge and effort to prepare search requests and manage results, as well as the risk of making mistakes, are multiplied (Fink 2014; vom Brocke et al. 2015). This increased complexity is an even larger issue in interdisciplinary research areas, such as the information systems (IS) field, in which scientific contributions are published in a wide variety of outlets (e.g., journals and conference proceedings), which are dispersed over numerous databases (Barnes 2005; vom Brocke et al. 2015). More modern search systems attempt to address this problem by consolidating and simplifying access to research contributions, such as academic web search engines or discovery services (e.g., Google Scholar or EBSCO Discovery). However, these systems have been found to be of little use when following a systematic approach due to their lack of transparency, oversimplified interfaces, and unreliable search results (e.g., Asher et al. 2013; Lewandowski 2010).

From a research perspective, there is little knowledge that could inform the design of innovative systematic search systems. Research on literature search and review approaches has discussed search concepts primarily in light of the specific approaches proposed (e.g., Webster and Watson 2002; Wolfswinkel et al. 2013). A generalizable characterization of what defines a systematic literature search has been missing. It is therefore not surprising that, apart from research evaluating existing solutions (e.g., Boeker et al. 2013; Giustini and Kamel Boulos 2013), research has been scarce on the actual design of systems for the purpose of facilitating systematic searches that enable comprehensive and objective literature overviews (Sturm and Sunyaev 2017b). A better understanding of the design and effects of systematic literature search systems (SLSS) could provide new design knowledge on this class of systems, knowledge on systematic search processes, insights into why existing systems fail to sufficiently aid reviewers (i.e., researchers who conduct literature reviews), and guidance for the construction of innovative information systems that improve the quality and efficiency of systematic literature searches and reviews.

We therefore set out to address this gap by answering the following research question: How can an SLSS be designed that effectively facilitates systematic literature searches? To approach this question, we use the design science research (DSR) paradigm (Gregor and Hevner 2013; Hevner et al. 2004). Our research method consists of multiple design cycles comprising artifact development, evaluation, and refinement. In so doing, we derive SLSS meta-requirements and prescriptive design principles, which provide information on both material properties of SLSS (i.e., form and function) and actions made possible through the use of SLSS (Chandra et al. 2015). The derived design knowledge offers a starting point for future research on SLSS and might spark a discussion on the systematic aspects of literature and information searches that leads to innovative methodological and design contributions. Furthermore, we see high generalizability of the derived design knowledge, expanding its usefulness to other application areas that belong to the more abstract class of problems dealing with systematic and objective information accumulation (e.g., journalistic or forensic searches). In terms of practical contributions, our research demonstrates the implementation of a first usable SLSS that facilitates systematic literature searches in the IS context. Furthermore, the derived design principles could provide guidance for the development of a new generation of innovative systematic search systems and, eventually, improve the quality and efficiency of systematic information accumulation in different contexts.

The article proceeds as follows. The next section provides an overview of related research on systematic literature searches, literature search systems, and information retrieval. Subsequently, we present our research method, followed by a description of the derived meta-requirements and design principles. Section 5 briefly describes the instantiation of the design principles, along with the results of two separate evaluation studies. In the final section, we discuss implications from our research findings and outline future research opportunities.

2 Related Research

2.1 Systematic Literature Reviews and Searches

Most research publications contain a literature review that provides the theoretical foundation for the main study, and some literature reviews constitute research publications in their own right (Okoli and Schabram 2010). Literature reviews identify, evaluate, and synthesize prior research on a topic or domain of interest and thereby enable researchers to identify the existing body of knowledge (i.e., what we know), as well as relevant research gaps (i.e., what we do not know) (Fink 2014; Rowe 2014). However, conducting a good and rigorous review is a difficult and complex task (Wolfswinkel et al. 2013). To support reviewers (i.e., researchers conducting a literature review), numerous approaches and guidelines provide conceptual foundations for developing and constructing literature reviews. The proposed methods range from highly systematic approaches (Kitchenham et al. 2009; Okoli and Schabram 2010) to more traditional or narrative reviews (Bandara et al. 2011; Boell and Cecez-Kecmanovic 2014; Levy and Ellis 2006; Webster and Watson 2002; Wolfswinkel et al. 2013). However, the wide range of different approaches makes it difficult to draw a precise line between systematic and narrative reviews (Boell and Cecez-Kecmanovic 2015b; Okoli and Schabram 2010). Furthermore, any high-quality literature review requires, to some extent, a systematic approach (Fink 2014; Okoli and Schabram 2010; Webster and Watson 2002). Unsystematic reviews tend to be subjective, provide no justification for why certain literature is selected, and are often based on a partial examination of the available literature; as a result, their findings can be inaccurate or even false (Fink 2014; Levy and Ellis 2006). In accordance with Okoli and Schabram (2010), we therefore define the adjective “systematic” as the degree to which a review follows a methodical approach.

Systematic literature review guidelines usually address three review steps: (1) identification of relevant literature (input); (2) analysis of findings (processing); and (3) results presentation (output). In this article, we focus on the input stage. Different guidelines provide instructions on how to conduct a rigorous literature search (step 1). For instance, Webster and Watson (2002) suggested starting by identifying key articles on the topic, followed by a backward search (reviewing the citations for the identified articles) and a forward search (reviewing articles citing the identified key articles). Levy and Ellis (2006) recommended including a forward and backward search as well but based on an initial literature list acquired through an extensive keyword-based search in electronic databases. Wolfswinkel et al. (2013) and Boell and Cecez-Kecmanovic (2014) described iterative approaches that alternate between the input and processing stages. New insights into the reviewed topic are used to refine the search process for the next iteration. Which method is best suited for a specific review depends on different aspects, such as the research question, available resources, and topic under review (Boell and Cecez-Kecmanovic 2015b; Okoli and Schabram 2010). There is no optimal recipe for conducting a high-quality literature review or search (Boell and Cecez-Kecmanovic 2014). A review methodology is essentially a tool that must fit the job (Okoli and Schabram 2010). The design knowledge developed in this article supports literature searches independent of the search strategy applied.

2.2 Literature Search Systems in Practice and Research

In practice, different tools exist that facilitate literature searches, in addition to visiting a local library and communicating with peers. Commonly used systems are online literature databases (e.g., IEEEXplore, MEDLINE, or AISeL), discovery services and meta-search engines (e.g., EBSCO Discovery Service or ProQuest’s Summon), digital library catalogues (e.g., ERIC or DPLA), and academic web search engines (e.g., Google Scholar or Microsoft Academic Search). Despite the variety of available systems, conducting a rigorous, systematic literature search remains a challenging task. As mentioned above, literature databases and digital libraries provide only limited coverage due to a narrow topical focus (Asher et al. 2013; Boell and Cecez-Kecmanovic 2014). Querying multiple literature databases is therefore often unavoidable. Since each database interface has its own set of features and rules (e.g., available features, search fields, and query syntax), creating semantically similar queries for multiple databases and merging their results are considered highly complex tasks involving a steep learning curve (Fink 2014; Rowley and Slack 2004; vom Brocke et al. 2015). Hence, it is not surprising that discovery services (i.e., faceted search systems) and meta-search engines, which increase the reach of individual search requests, are gaining popularity among researchers (Pontis et al. 2015; Spezi 2016). These services cover a large body of literature by combining multiple data sources into a single meta-index, and they provide access through a unified search interface. As a result, searches are more efficient and extensive, compared to multiple database searches (Asher et al. 2013; Olson 2007; Wells 2016). 
However, when considered as the means for more systematic search approaches, discovery services are criticized for their low transparency (e.g., inaccessible title lists), as well as their oversimplified and imprecise search interfaces (e.g., limited advanced search functionality and export restrictions) (Asher et al. 2013; Fagan et al. 2012; Wells 2016). Even more criticized are academic web search engines, i.e., special-purpose search engines crawling the entire Internet for scientific contributions. Despite their high coverage of scientific outlets, which surpasses most indices of individual literature databases (e.g., Bramer et al. 2013; Samadzadeh et al. 2013), academic web search engines are widely criticized for their minimalistic search interfaces, fluctuating and nontransparent search indices, low document quality, and export limitations (e.g., Asher et al. 2013; Boeker et al. 2013; Lewandowski 2010; Wu and Chen 2014). As a result, academic web search engines are ill suited for rigorous literature searches (Boeker et al. 2013; Gehanno et al. 2013; Lewandowski 2010).

Research on systems that specifically help researchers to conduct systematic searches has also been scarce. Most research on the topic evaluates existing systems regarding their fit for different search tasks (Boeker et al. 2013; Bramer et al. 2013; Falagas et al. 2007; Giustini and Kamel Boulos 2013; Samadzadeh et al. 2013) and provides practical guidelines on where to search (Levy and Ellis 2006; Schryen 2015) and how to use existing systems more effectively (Bandara et al. 2011, 2015; Wolfswinkel et al. 2013). Research on the design of literature search systems (i.e., prototype systems and features) comprises retrieval systems with high user interaction (Yuan and Belkin 2010), search systems with faceted or symbiotic interfaces (Atanassova and Bertin 2014), scientific paper recommender systems (Huang et al. 2014; Küçüktunç et al. 2013; Naak et al. 2008), systems to support the synthesis and analysis of research articles (Larsen and Bong 2016), meta-search engines for individual full-text articles (On and Lee 2004; Santos et al. 2010), specialized web crawlers for indexing research papers (He and Hui 2001; Hoff and Mundhenk 2001; McCallum et al. 2000), systems for visualizing citation networks (Chou and Yang 2011), and citation analysis tools for mining academics’ social networks (Chen et al. 2011; Tang et al. 2007). Although these research contributions help us to better understand system designs and retrieval techniques in general, insights into how to design effective systems for the specific purpose of conducting systematic, rigorous searches are not provided.

2.3 Ingwersen’s Cognitive Model of Information Retrieval Interaction

Based on what we have learned thus far, it becomes clear that an effective SLSS design must consider both the technical aspects of the information retrieval (IR) process and reviewers’ search strategies, goals, and behavior. Ingwersen’s (1996) cognitive model of IR interaction might help us to better understand this sociotechnical design perspective. Based on ideas from cognitive psychology, the model identifies interactions between different actors during information search processes, while also integrating system design issues (Ingwersen 1996; Wilson 1999). As depicted in Fig. 1, IR interactions involve communication on a cognitive level among human actors (individual users and social/organizational environment), technology artifacts (IR systems and interfaces), and information objects. In the center of Ingwersen’s model are the users that seek information. The users’ cognitive models comprise, for instance, their current information needs, information behavior, problems, and goals. The cognitive models of technical artifacts (e.g., retrieval techniques and database structures) and information objects (e.g., knowledge representation) are explications of the cognitive models of their creators (i.e., system designers or authors). Similar to the Task-Technology Fit Theory (Goodhue and Thompson 1995) and the Cognitive Fit Theory (Vessey and Galletta 1991), Ingwersen (1996) proposed that fit is an essential condition for effective IR interactions, in this case the fit between the actors’ cognitive models. Inconsistencies between the cognitive models increase the interaction effort, resulting in uncertainty and misunderstandings between actors. For instance, an IR system’s definition of a search task (instantiated as specific search algorithms or interface design elements) might not fit a user’s individual information needs and, thus, either forces the user to adjust his/her information needs or find a sufficient work-around. 
The cognitive models of users are, therefore, valuable input for successful IR system designs (Ingwersen 1996). However, it is difficult to provide a perfect fit for each individual user, if the IR system targets a large user group. Common factors that influence the cognitive models of all users could therefore provide a better starting point for actual design requirements. As depicted in Fig. 1, one of the major influences (directly or indirectly) is the users’ social and organizational environment. All retrieval interactions occur in the context of a social or organizational environment, which changes the cognitive models of system users, creators of information objects, intermediaries, and systems designers and, consequently, influences all interactions within information search and retrieval processes (Ingwersen 1996). Thus, knowledge about the environment’s (collective) cognitive model (e.g., strategies, goals, tasks, and preferences) is highly relevant for effective search system designs. In the literature search context, the environment usually relates to the academic fields from which the users’ information needs arise (Ingwersen 1982). In regard to SLSS, it is reasonable to assume that the cognitive models of the targeted research fields are of particular relevance, as the results of systematic literature searches are normally intended for later publication. Hence, reviewers should have increased incentives to establish conformity with their social environment to improve communication and increase acceptance of their research. Search strategies, goals, preferences, etc. (i.e., the cognitive model), of targeted research fields could therefore provide valuable knowledge about SLSS meta-requirements. The following section describes how we incorporated this implication into our DSR design.

Fig. 1 Cognitive model of information retrieval interaction (Ingwersen 1996)

3 Methodology

Based on the design science research paradigm (Gregor and Hevner 2013; Hevner et al. 2004), our research approach consists of multiple design cycles of artifact design, evaluation, and refinement. The applied methodology is derived from the design-evaluation pattern described by Sonnenberg and vom Brocke (2012). Compared to other design science research methodologies (e.g., Peffers et al. 2007), the design-evaluation pattern more strongly emphasizes a continuous evaluation approach, which helped us to assess the usefulness of our design artifacts and design decisions throughout the extensive research project and thus to mitigate the risk of building an insignificant artifact. The design-evaluation pattern comprises four design activities (Problem Identification, Design, Construction, and Use), which are linked by four corresponding evaluation activities, as depicted in Fig. 2. The objective of each evaluation relates to the corresponding design activity: the meaningfulness of the design research problem (Eval1); the progression of the artifact toward a solution for the stated problem (Eval2); the expected performance of an artifact instance within an artificial setting (Eval3); and the artifact’s usefulness within a naturalistic setting (Eval4).

Fig. 2 Design science research method. Adapted from Sonnenberg and vom Brocke (2012)

Our research project started with a simple observation in our information systems department. Students and novice researchers alike seemed to encounter great difficulties when planning and executing rigorous literature searches. Exemplary issues were, for instance, identifying data sources, developing valid search requests, and exporting, merging, and managing large literature samples. This identification led us to another observation. Like our early ancestors, experienced scholars in our department developed primitive tools to facilitate their more frequent search tasks. We found, for example, handwritten notes that mapped outlets with search locations, simple Excel macros that created search requests, and bash scripts for merging citation export files. The existence of such tools led us to conclude that a mismatch between present technology artifacts and the task of conducting systematic literature searches in the IS context exists (Identify Problem). To evaluate the identified problem, we reviewed extant research on information search and retrieval systems, research on library systems, and existing artifacts in the application domain (Eval1). Although we found a large variety of IT artifacts and design knowledge on building information retrieval systems, we also saw our initial observations confirmed. In both practice and research, we found a strong focus on matters such as ease of use, retrieval accuracy, sorting, and efficiency. Facilitating systematic, rigorous literature searches seemed to be of little concern. Thus, based on our initial problem understanding and the insights gained through Eval1, we developed a first set of requirements and specifications for systems that support systematic literature searches in the IS field (Design). This initial design was then evaluated through a requirements workshop to assess whether the artifact design could present a useful solution for the stated problem (Eval2). 
To this end, we invited seven researchers from our IS department who had conducted or supervised at least five systematic literature searches and reviews. The workshop’s results indicated the usefulness of the initial design for the intended task and contributed additional requirements from the application domain. Guided by our complemented design, we then instantiated a first prototype search system (Construct), which provided a simplistic user interface and access to two literature databases. The prototype was evaluated through an expert review with five IS researchers and developers, who were selected to include expertise on both technical aspects and methodological questions (Eval3). The results demonstrated the technical feasibility of the design, but they also revealed issues that required further refinements. The main issue, in addition to usability, was that the experts had highly divergent opinions about the suitability of the implemented search process. Since we were not aware of any usable model for systematic literature searches that would help us find a more adaptable search process for SLSS, we returned to our design (Design).

The goal in the second design iteration was to deepen our understanding of literature search processes on an abstract level, which would enable us to derive a common understanding of a systematic literature search approach that could also function as meta-requirements for SLSS. Considering the strong influence of the social and organizational environment on all information search and retrieval processes, as described by Ingwersen (1996), meta-requirements for SLSS should reflect acknowledged quality criteria for the search process and its results (i.e., strategies and goals as part of the collective cognitive model). To this end, we conducted a systematic literature review of literature review guidelines. Following Webster and Watson (2002), we searched the eight top IS journals (AIS Senior Scholars’ Basket) and a special issue of the Communications of the AIS (Vol. 37, 2015) on literature reviews. We identified a total of twenty-five literature review guidelines. Next, we analyzed the guidelines by following the coding procedures described by Wolfswinkel et al. (2013). We started by identifying requirements related to either the literature search procedures or their results. The initial set of fifty-eight codes was then iteratively refined and aggregated in two rounds of axial and selective coding. In so doing, we abstracted the literature-based requirements into three high-level criteria without ties to one specific review process or set of tools. These criteria resulted in the three meta-requirements for SLSS, which are further elaborated in Sect. 4. Next, we derived a first set of design principles for SLSS by reflecting on the design knowledge acquired through our previous design activities and insights from our literature review of review guidelines. 
The derived principles could be classified as both materiality-oriented and action-oriented design principles, which provide information on both material properties of SLSS (i.e., form and function) and actions made possible through the use of SLSS. We formulated the design principles following the structure suggested by Chandra et al. (2015). To evaluate the changes to our design based on the derived design principles, we conducted a focus group discussion with seven researchers (one professor, two postdoctoral researchers, and four doctoral students) from our IS department (Eval2). The focus group participants were selected to include a mix of experts on systematic literature reviews, as well as IS researchers familiar with the design science research paradigm. Subsequently, we instantiated the derived design principles by refining the existing prototype system. This step enabled us to investigate different implementation variants based on the design principles and to provide a first proof-of-concept (Nunamaker and Briggs 2011). The next steps comprised multiple iterations of artifact evaluation (Eval3) and refinement (Construct) cycles. The evaluation efforts included usability tests, focus groups, and software tests. During these construction-evaluation cycles, we regularly revisited the design principles to inductively summarize our current understanding of the design of SLSS. After the third construction-evaluation cycle, we had the necessary confidence in the technical feasibility and applicability of our prototype to continue our research with real users and to use search tasks from the IS department at a large German university (Use). The subsequent evaluation (Eval4) comprised two separate studies.

Before evaluating the fulfilment of the three meta-requirements, we were particularly interested in finding out whether the developed prototype was seen as beneficial overall by actual users. According to DSR literature, the usefulness of a previously built artifact is considered one of the most fundamental evaluation criteria in this regard (e.g., Gregor and Hevner 2013; Hevner et al. 2004; Niederman and March 2012). In the first study, we therefore focused on investigating the prototype’s perceived usefulness. To this end, we chose a naturalistic ex-post evaluation of the prototype implementation through nine semistructured expert interviews following Venable et al. (2016). The nine participants were researchers from the IS field who were selected because of their expertise in the literature review process (six research associates, two postdoctoral researchers, and one senior library researcher). The participants had between 3 and 15 years (avg. four) of experience in science, had conducted between two and nine systematic reviews (avg. five), and had supervised between two and fifty-two systematic reviews conducted by students (avg. thirteen). All of the participants were familiar with the prototype, named LitSonar, and had used the system for between 2 and 7 months (avg. four). The interview guide was structured into three sections of open questions on: (1) participants’ prior experience with systematic literature searches and reviews (e.g., general understanding of the concept, conducted reviews, and prevailing issues); (2) participants’ perception of LitSonar (e.g., LitSonar’s usability and usefulness in research and teaching); and (3) current and future use of the system (e.g., limits of the current system, positive and negative outcomes of using LitSonar, whether and how it would be used in the future). The interviews took between 40 and 89 min, with an average length of 59 min. The interviews were recorded and subsequently transcribed. 
We analyzed the transcripts using an iterative descriptive coding process, as outlined by Myers (2013).

In the second study, we performed a comparative analysis of current search systems and LitSonar to evaluate whether the developed artifact presents an improvement over existing solutions, as suggested by Gregor and Hevner (2013). Two search requests were used to query six search systems, along with LitSonar, to assess and compare their level of compliance with the three SLSS meta-requirements (comprehensiveness, precision, reproducibility). The tested search systems represent commonly used system types (Pontis et al. 2015; Spezi 2016), including three online literature databases with different topical focuses (EBSCOhost Business Source Complete, ProQuest ABI/INFORM, and AISeL), a discovery service (EBSCO Discovery), a scientific web search engine (Google Scholar), and a digital library catalog (KIT Katalog Plus). As the study’s benchmark and to allow for an objective assessment of the systems’ result quality in terms of relevancy, we utilized the literature search results reported by Keutel et al. (2014). This extensive literature review on case study research in IS was selected because of its rigorous search approach, transparent search results documentation, and independence from a particular search system’s design. Keutel et al. (2014) identified all case studies published between 2001 and 2010 in one of six IS journals by manually evaluating every article issued in these journals during the investigated time frame. Accordingly, the search requests in our study were intended to identify research articles that describe at least one case study and were published in one of the six IS journals queried by Keutel et al. (2014): European Journal of Information Systems (EJIS), Information Systems Journal (ISJ), Information Systems Research (ISR), Journal of the Association for Information Systems (JAIS), Journal of Management Information Systems (JMIS), and Management Information Systems Quarterly (MISQ). 
We chose a keyword-based search approach with a limited time frame (i.e., 2001–2005) in order to increase the precision of the search requests and at the same time ensure compatibility with the search systems examined. The search terms were selected through an emulated initial search term assessment process. We randomly selected twenty case study articles reported in Keutel et al. (2014) and asked three IS scientists to mark search terms and phrases that identified these articles as case studies within the articles’ titles, keywords and abstracts. The results were then synthesized into the following set of search terms of which at least one had to be found: case*, ‘field stud*’, ‘field survey*’, ‘field observation*’, and longitudinal. Then, we derived two search requests: a high precision request searching for matches only in articles’ titles, keywords, and abstracts; and a broader request that searches in any meta-data field and the articles’ content. Next, we applied the two search requests to each of the seven search systems and exported the retrieved results into an SQL database. The performance of each system was then evaluated against the three SLSS meta-requirements. The results of both studies are presented in Sect. 5.
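The construction of the two search requests can be illustrated with a short sketch. It is not the query format of LitSonar or of any of the examined systems: the `SearchRequest` class, the field labels (`title`, `keywords`, `abstract`, `all`), and the rendered Boolean syntax are hypothetical and serve only to show how one term set yields a narrow and a broad request. The search terms, journal abbreviations, and time frame are taken from the text above.

```python
from dataclasses import dataclass

# Search terms reported in the text; at least one of them has to match.
SEARCH_TERMS = ("case*", "field stud*", "field survey*",
                "field observation*", "longitudinal")

# The six IS journals examined by Keutel et al. (2014), as listed in the text.
JOURNALS = ("EJIS", "ISJ", "ISR", "JAIS", "JMIS", "MISQ")


@dataclass(frozen=True)
class SearchRequest:
    """Hypothetical, system-independent description of a search request."""
    terms: tuple
    fields: tuple      # assumed field labels, not tied to a real database
    journals: tuple
    year_from: int
    year_to: int

    def to_query(self) -> str:
        """Render the request in a generic, made-up Boolean syntax."""
        # Quote multi-word phrases so they are matched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in self.terms]
        term_clause = " OR ".join(quoted)
        field_clause = " OR ".join(
            f"{name}:({term_clause})" for name in self.fields
        )
        journal_clause = " OR ".join(f'"{j}"' for j in self.journals)
        return (f"({field_clause}) AND journal:({journal_clause}) "
                f"AND year:[{self.year_from} TO {self.year_to}]")


# High-precision request: titles, keywords, and abstracts only.
narrow = SearchRequest(SEARCH_TERMS, ("title", "keywords", "abstract"),
                       JOURNALS, 2001, 2005)

# Broader request: any meta-data field and the full text ("all" is assumed).
broad = SearchRequest(SEARCH_TERMS, ("all",), JOURNALS, 2001, 2005)

if __name__ == "__main__":
    print(narrow.to_query())
    print(broad.to_query())
```

In practice, each examined system would need its own translation of such a request into its native query syntax, which is the source of the effort and error risk discussed in Sect. 2.2.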

4 SLSS Design Knowledge

4.1 Meta Requirements for SLSS

Our review of the literature review guidelines unveils the common understanding in the IS community on criteria that constitute a good literature search. The following three meta-requirements (MR) synthesize this understanding.

(MR1) Systematic literature searches require a high level of comprehensiveness. The comprehensiveness of a literature search describes the degree to which all relevant literature on the investigated topic is covered. One of the main goals of systematic literature reviews is to gain an overview of the existing body of knowledge. A comprehensive literature review is based on a comprehensive literature sample (Levy and Ellis 2006). A fragmented literature sample can lead to a partial view of a topic (Fink 2014; Levy and Ellis 2006) and increase the likelihood that individual biased articles influence the integrity of an entire review (Cooper 1982; Fink 2014). Hence, a good literature search produces a comprehensive literature sample that comprises as many relevant documents as possible. However, comprehensiveness usually does not equal completeness. Compiling a complete literature base is, in most cases, either very inefficient or even impossible, due to the sheer amount of available literature (Rowe 2014; Wolfswinkel et al. 2013). Literature review guidelines therefore suggest, for instance, “a relatively complete census of relevant literature” (Webster and Watson 2002, p. xvi) or “a good or reasonable coverage” (Rowe 2014, p. 246).

(MR2) Systematic literature searches require high precision. The rigorousness of a literature review often depends on the reviewer’s resources in terms of time and effort (Boell and Cecez-Kecmanovic 2015b; Rowe 2014). Reviewers must decide if the amount of time, energy and financial costs is justified (Okoli and Schabram 2010). These finite resources are often spent on conducting main studies, instead of rigorous literature reviews (Jennex 2015). One task during the research process that can have a significant effect on the overall resource requirements is the screening of initial search results for irrelevant literature (Rowe 2014). A good literature search therefore produces results that include as little irrelevant literature as possible, especially when applying an iterative approach comprising multiple cycles of searching and processing (Boell and Cecez-Kecmanovic 2014; Wolfswinkel et al. 2013). The proportion of documents in a result set that is relevant to the reviewer describes the precision of a literature search. To this end, guidelines recommend the definition of explicit inclusion and exclusion criteria that prefilter search results. These criteria include selecting appropriate databases (database-centered strategies) or outlets (outlet-centered strategies), as well as parameters such as keywords or authors (Boell and Cecez-Kecmanovic 2015b; Fink 2014; Okoli and Schabram 2010). However, precision is a double-edged sword. A more precise search is also more restrictive and more likely to exclude relevant research contributions (Rowe 2014). A good literature search is therefore both sufficiently precise to exclude as many irrelevant articles as possible and sufficiently comprehensive to include all vital contributions (Levy and Ellis 2006).

(MR3) Systematic literature searches require high reproducibility. A reproducible literature search follows an approach that is both reliable (i.e., the results do not vary over time) and transparent (Levy and Ellis 2006; vom Brocke et al. 2015). Such a search approach enables reviewers to be explicit about how a literature sample was compiled and to justify each process step, including queried data sources (e.g., databases or outlets) and exclusion and inclusion criteria (Fink 2014; Okoli and Schabram 2010; Wolfswinkel et al. 2013). While any literature review benefits from reproducible search results (Okoli and Schabram 2010), reproducibility is more critical for highly systematic literature reviews that, for example, are used for formal evaluation purposes (Rowe 2014; Wolfswinkel et al. 2013). A reproducible literature search is more reliable (Cooper 1982; Okoli and Schabram 2010) and contributes to the credibility of a review (Fink 2014; vom Brocke et al. 2015). Fellow researchers are enabled to assess the exhaustiveness of a literature sample and are encouraged to use and extend a review (Barnes 2005; vom Brocke et al. 2015). Furthermore, a reproducible and well-documented search process allows for refining of previous search steps and increases the likelihood of publication (Webster and Watson 2002; Wolfswinkel et al. 2013).

4.2 SLSS Design Principles

The following six principles provide guidance for how to design real-world SLSS that facilitate comprehensive, precise, and reproducible literature searches. Based on our inductive research approach, these principles capture generalized knowledge gained from our design and building process. The structure of our design principles is based on Chandra et al. (2015); that is, each principle specifies a material system property (in terms of form and function), the activity of users (in terms of action), and the boundary conditions under which the design will work. Figure 3 depicts the SLSS meta-requirements and the corresponding design principles addressing them, as detailed in the remainder of this section. Table 1 shows the compliance of three exemplary literature search systems with the derived SLSS design principles to illustrate their practical applicability.

Fig. 3 Mapping of SLSS meta-requirements on SLSS design principles

Table 1 Compliance of exemplary search systems with SLSS design principles

(DP1 Multi-Sourcing) Provide the system with the ability to query data from multiple sources, so users can retrieve a comprehensive sample, given that, in the specific search context, relevant contributions are scattered over different data sources. To address MR1, a comprehensive search must cover all sources that might contain literature relevant to the topic under review (Levy and Ellis 2006; Wolfswinkel et al. 2013) and should not be limited to one set of journals or geographic region (Webster and Watson 2002). In practice, this task is often challenging, especially for interdisciplinary research topics. In the IS field, for instance, there is no central literature database that covers all relevant sources. IS-related research is published in more than 800 outlets (Lamp 2017), spread over numerous databases (Boell and Cecez-Kecmanovic 2014; Levy and Ellis 2006). Hence, to provide reasonable coverage for a comprehensive literature search, SLSS must be able to access and query bibliographic data from more than a single source. This design principle can be implemented in more than one way. For example, an SLSS can maintain its own local database, which accumulates bibliographic data from different sources relevant for the targeted user group (e.g., via web crawlers or by merging existing databases). This approach provides good search performance and full content control, but it also produces high costs for setup, infrastructure, and maintenance, as well as introducing content responsibilities (e.g., data quality and copyright). Another implementation variant for DP1 is the meta-search approach, which distributes a reviewer’s search requests to multiple heterogeneous search systems and returns a homogenous results list. On the one hand, this approach lowers setup and infrastructure costs, and the content responsibilities reside with the data source owners. On the other hand, maintenance of interfaces to the external sources and postprocessing of results (e.g., merging and deduplication) require more effort, and the SLSS is dependent on its external data providers in terms of performance and data quality. Which approach is best suited depends on the available data sources and the requirements of the targeted research areas.
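The meta-search variant of DP1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular SLSS: each data source is represented by a hypothetical adapter function that accepts a query string and returns records, and duplicates are identified by a hypothetical key built from the normalized title and the publication year.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Record:
    title: str
    year: int
    source: str

def dedup_key(rec: Record) -> tuple:
    """Hypothetical duplicate key: normalized title plus publication year."""
    normalized = re.sub(r"[^a-z0-9]+", " ", rec.title.lower()).strip()
    return (normalized, rec.year)

def meta_search(query: str,
                adapters: dict[str, Callable[[str], Iterable[Record]]]):
    """Dispatch one request to every adapter, then merge and deduplicate."""
    merged: dict[tuple, Record] = {}
    failed: list[str] = []
    for name, search in adapters.items():
        try:
            for rec in search(query):
                # Keep the first occurrence of each duplicate group.
                merged.setdefault(dedup_key(rec), rec)
        except Exception:
            # Record unavailable sources so they can be reported to the user.
            failed.append(name)
    return list(merged.values()), failed
```

Returning the list of failed sources alongside the merged results reflects the dependency on external providers noted above: a source that cannot be queried reduces coverage and should be visible to the reviewer.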

(DP2 Filtering) Provide the system with precise filter mechanisms that enable users to exclude result records that are irrelevant for their individual information needs, given that the queried data structures allow for precise subset selection. A search request is essentially a set of filter criteria defining inclusion and exclusion criteria, such as specific keywords, authors, outlets, document availability, etc. To allow for precise search requests (MR2), an SLSS must provide a wide range of filter criteria (restricted by the granularity and structure of the queried data sources), which should be easily adjustable during the search process to enable efficient iterative search request refinement. Depending on the implemented retrieval technology, result filtering can be performed ex ante (e.g., filter settings in request forms) or ex post (e.g., search within initial results or faceted search filters). For instance, to generate facets for ex-post results filtering, an SLSS must be able to retrieve and process the entire results set in one step (Nui 2014). An SLSS with a successive retrieval approach (e.g., meta-search engines) can only implement ex-ante filter mechanisms.
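The dependency of ex-post filtering on the complete results set can be made concrete with a short sketch. It assumes that results are dictionaries with hypothetical `outlet` and `year` fields; the function names are illustrative only.

```python
from collections import Counter

def build_facets(results: list, fields: tuple = ("outlet", "year")) -> dict:
    """Count the values of each field across the ENTIRE result set.

    The counts are only correct if all results have been retrieved,
    which is why facets require the complete set in one step.
    """
    return {f: Counter(r[f] for r in results if f in r) for f in fields}

def apply_facet(results: list, field: str, value) -> list:
    """Ex-post filter: narrow an already retrieved result set."""
    return [r for r in results if r.get(field) == value]
```

If only the first page of results were available, the facet counts would describe that page rather than the search, which is the limitation of successive retrieval noted above.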

(DP3 Flexibility) Provide the system with a flexible search interface so that users can apply their individual search strategies, given that there is more than one potential search strategy in the specific search context. Reviewers require the ability to formulate search requests that balance the trade-off between comprehensiveness (MR1) and precision (MR2). Since this trade-off is unique for each search instance (Boell and Cecez-Kecmanovic 2014; Rowe 2014), providing a sufficient level of flexibility is a vital SLSS design feature. A fit between the characteristics of an IT system, in this case the SLSS’s functionality, and a user’s tasks will not only lead to higher task performance but will also increase usage acceptance of the system (Goodhue and Thompson 1995). An effective SLSS design therefore considers not only the most common use cases or the largest intersecting set of requirements of all potential users. It should also provide a large degree of freedom and combinability in regard to data sources, request properties, sorting options, and export formats to enable the implementation of individual strategies and constraints (i.e., exclusion and inclusion criteria) appropriate for a review’s individual goals and limitations. However, the number of options should be reasonable to maintain the interface’s usability and to limit the necessary cognitive load. This limitation could require a specialization of the SLSS on a limited number of research areas or user groups to reduce the requirement complexity.

(DP4 Semantic Equivalence) Provide the system with semantic equivalency between the users’ search requests and the system’s queries for users to receive predictable results, given that there is a difference between system queries and search request representation in the user interface. DP4 basically says that the system should do what it was told by its user. A reviewer’s search request can be described as a representation of an individual information need with the help of functionalities provided by a search system’s user interface. These representations are, for usability reasons, often at a higher abstraction level than the subsequently performed search requests. Reviewers’ input must be translated into machine-readable queries (e.g., SQL, RPN, or CCL) to be understood by underlying data sources. In the case of a meta-search system, these requests must be processed even further to match the individual format expected by each queried data source. If the translation process does not work as reviewers expect, the search requests might not represent their information needs, with undesired or unexpected results (Boell and Cecez-Kecmanovic 2014), and might eventually decrease a search’s comprehensiveness, precision, and reproducibility (MR1-3). Hence, an SLSS must ensure that the reviewers’ input and the machine-readable search queries are equivalent in terms of their semantic meaning. How this design principle should be implemented depends on the actual design of the user interface and the queried data sources. In general, the SLSS should clearly explain the functionality of each interface element and their interactions. Furthermore, the translation process must not introduce any request alterations that deviate from the behavior communicated to the reviewers. Potential causes of alterations are the request syntax (e.g., interpretation order of subexpressions and the behavior of wildcard symbols), technical limitations (e.g., length of queries and stop words), and data representation within queried sources (e.g., outlet names, abbreviations, and field names). In the latter case, automated or computer-assisted approaches that match different constructs to the same real-world phenomena, such as demonstrated by Larsen and Bong (2016), could be an effective device to address the issue and follow DP4.
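One way to guard against two of the alteration causes named above (interpretation order of subexpressions and query length limits) is sketched below. The sketch assumes a nested request tree and a generic Boolean target syntax; the classes, the quoting rule, and the `max_length` parameter are hypothetical and do not correspond to the query language of any specific database.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Term:
    text: str

@dataclass
class Group:
    op: str          # "AND" or "OR"
    children: list   # list of Term or Group nodes

Node = Union[Term, Group]

def render(node: Node, quote: str = '"') -> str:
    """Render a request tree with explicit parentheses around every group,
    so the result does not depend on a source's operator precedence."""
    if isinstance(node, Term):
        # Quote multi-word phrases so they are not split into separate terms.
        return f"{quote}{node.text}{quote}" if " " in node.text else node.text
    inner = f" {node.op} ".join(render(child, quote) for child in node.children)
    return f"({inner})"

def translate(node: Node, max_length: int, quote: str = '"') -> str:
    """Translate a request or fail loudly; never silently truncate it."""
    query = render(node, quote)
    if len(query) > max_length:
        raise ValueError(
            f"query has {len(query)} characters, limit is {max_length}")
    return query
```

Raising an error instead of shortening the query keeps the translation consistent with the behavior communicated to the reviewer, at the cost of requiring the reviewer to reformulate the request.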

(DP5 Transparency) Provide the system with transparent process information that enables users to comprehend and document their search methods and results, given that the provided information contributes to users’ understanding of the search process. Knowledge about an SLSS’s search process empowers reviewers to become a more active part of their searches. It provides the necessary insight to evaluate the quality and sufficiency of a literature search to increase comprehensiveness (MR1), as well as documenting the process and making informed decisions about further steps (MR3) (Fink 2014; Rowe 2014; vom Brocke et al. 2015). This design principle can be instantiated by implementing an SLSS that discloses details about the underlying search database, ranging from high-level information on queried sources (e.g., literature databases, open access repositories, or web crawler targets) down to the coverage of single issues or articles of individual outlets. Reviewers should also have access to information related to their search requests, such as applied queries, necessary alterations (e.g., syntax corrections or removed stop words), and irregularities (e.g., errors, malformed requests, unavailable sources, or licensing issues). However, the level of detail and presentation of this information must be well balanced to foster understanding of the process while minimizing the additional mental effort necessary to comprehend the presented information.

(DP6 Reliability) Provide the systematic search system with a stable search platform for users to retrieve similar search results for identical search requests, given a stable publication practice in the specific search context. A reproducible literature search (MR3) requires not only a documented search process but also a stable search platform that can replicate the results of an earlier search when following the same process. No matter how thoroughly the search process is described, unpredictable search algorithms or search catalogues with high content fluctuation, such as Google Scholar (Beckmann et al. 2012; Bramer et al. 2013), will lead to unique search results depending on when or by whom a search is performed (Boeker et al. 2013). Implementing a stable search platform requires an objective search algorithm that does not personalize search results based on difficult-to-reproduce parameters, such as reviewers’ physical location, browser languages, or search histories. Furthermore, there should be as little content fluctuation as possible within queried data sources. While personalization can easily be avoided, limiting content fluctuation is more complex. On the one hand, content fluctuations are unavoidable since data sources require regular content updates to ensure their timeliness. On the other hand, even highly fluctuating sources could be appropriate for an SLSS depending on the research context and availability of alternatives. However, fluctuations should be minimized if possible. For instance, curated databases that mainly contain organized bibliographic records (e.g., monographs, collections, or periodicals) are in general more suitable for reliable search results than databases that contain references to a variety of online documents and that are permanently updated by explorative web crawlers, which are often limited to the surface web. Finally, reviewers should be enabled to access and export the entire results set to prevent any influence of the sorting order on the integrity of the search results.

To conclude this section, we are confident that due to our rigorous DSR approach the set of six design principles is comprehensive regarding the material properties and actions that we have found essential to the construction of SLSS. In other words, the derived design principles address the essence that enables an SLSS to serve its intended purpose and distinguishes it from any other information system. Nonetheless, we do not claim that our set is exhaustive for all possible SLSS instantiations. Numerous general IS design principles exist (e.g., regarding a system’s availability or security) that might be relevant to the functioning of a specific SLSS instantiation. However, since these principles do not directly address the general issues that define SLSS as a problem class, we do not consider them SLSS design principles.

5 Instantiation and Evaluation of SLSS Design Principles

5.1 Prototype Implementation

As described in the methods section, our six design principles were instantiated in the form of a prototype web application, LitSonar (http://litsonar.com). As depicted in Fig. 4, the architecture of LitSonar comprises four main components: a Java EE web service; a web-based user interface; an internal outlet database; and external data sources (i.e., literature databases). The remainder of this section provides a short description of LitSonar and how the implemented features address the derived design principles. For more details on the prototype and its development process, we refer to Sturm et al. (2015) and Sturm and Sunyaev (2017a, b).

Fig. 4 Abstract architecture of LitSonar and implemented design principles

LitSonar provides unified access to multiple literature databases by utilizing the meta-search approach (DP1). Reviewers’ search requests are dispatched to up to seven curated databases containing IS-related literature (e.g., ProQuest, EBSCOhost, and AISeL). Access to multiple established data sources with a single request is not only intended to facilitate more comprehensive searches but also to increase the reliability of search results due to the stability of the curated data sources (DP6). To increase the precision of search requests (DP2), LitSonar’s user interface implements two novel features for entering search requests. First, a flexible keyword editor allows reviewers to define complex nested query structures of any depth using graphical elements, as depicted in Fig. 5. The editor is designed to replace the so-called “expert mode” of literature databases, which is often little more than a single text field that requires reviewers to manually formulate the entire request in a predefined syntax. Second, there is a data source-selection mask that allows reviewers to either select multiple databases directly (database-centered) or compile a list of journals and conferences (outlet-centered). In the latter case, reviewers can choose from individual outlets and predefined lists of outlets based on journal and conference rankings. Using information from an internal outlet database, LitSonar automatically identifies appropriate databases so that all selected outlets in the specified timeframe are covered. Thus, the mask for selecting data sources and the keyword editor showcase two implementation variants to increase the precision of search requests (DP2) and enable a wide variety of different search approaches through a highly flexible interface design (DP3).

Fig. 5 Interactive keyword editor (left) and outlet coverage report (right)

Once the reviewers have entered and submitted their search requests, LitSonar translates them into database-specific search queries, including syntax and parameter values (e.g., outlet names). The translation process follows a strict ruleset so that the semantic meaning of reviewers’ search requests is not altered, either for technical reasons (DP4) or to personalize the request (DP6). If the translation fails for any reason (e.g., exceeds length limitations), the reviewer is notified and provided with information about how to define a more compatible search request, thereby keeping the search process transparent and reliable. After the requests are dispatched to the selected data sources, the different result sets are merged, deduplicated, and finally presented in a homogenous list. Reviewers can browse through the list, download articles, compose individual result lists, and export article citations. Additionally, LitSonar provides extensive reports on the coverage of literature databases and outlets to increase the transparency of the search process (DP5). The database report shows which databases were searched and how many results per database were found. If a selected database could not be searched, an explicit warning is presented. In such cases, database-specific search queries are provided, along with instructions on how to proceed manually (i.e., using databases’ native search interfaces) and thus increase the comprehensiveness of literature samples. LitSonar also provides an outlet coverage report if the reviewer restricted the search to certain outlets (Fig. 5). This report provides detailed information about each selected outlet by listing the searched time periods, highlights gaps in coverage, and provides assistance on closing these gaps (e.g., links to print copies in local libraries). This information enables reviewers to assess and communicate the exhaustiveness of the conducted search and, if necessary, manually complement the results.
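The idea behind an outlet coverage report can be illustrated with a small sketch. It is not LitSonar's implementation; it merely assumes that coverage is known per outlet as a list of inclusive (start year, end year) ranges, and the function name and data layout are hypothetical.

```python
def coverage_gaps(requested_years: range, covered_ranges: list) -> list:
    """Return the requested publication years not covered by any range.

    covered_ranges holds inclusive (start_year, end_year) tuples, e.g. the
    union of what all queried databases index for one outlet.
    """
    covered = {year
               for start, end in covered_ranges
               for year in range(start, end + 1)}
    return [year for year in requested_years if year not in covered]

# Example: a search over 2001-2005 for an outlet indexed 1999-2002 and 2004-2010.
gaps = coverage_gaps(range(2001, 2006), [(1999, 2002), (2004, 2010)])
```

In this example the only uncovered year is 2003, which a reviewer would then need to search manually to close the gap.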

5.2 Results of Study 1

This section summarizes the main findings from the qualitative evaluation of LitSonar, which was based on interviews with literature review experts. The goal of the evaluation was to determine the perceived usefulness of the design artifact. To establish a common understanding, we first asked how the participants characterize the systematic search task based on their own hands-on experience and through supervising student theses. We found that systematic literature searches are seen as an important research tool but also perceived as complex and time consuming. The participants chose on more than one occasion a less systematic search method due to time restrictions. To learn more about the complexity of the process, participants were asked to estimate the effort for an outlet-centered search (top 50 journals based on AIS (2017)). The estimated workload lay between 3 and 10 days for an untrained student and between several hours and 2 days for an experienced researcher. The individual steps mentioned as reasons for this high workload were identifying relevant sources, learning how to use search systems, creating valid search requests, and verifying the comprehensiveness of search results:

It requires a lot of effort to find out which databases cover those outlets. This is not a small matter. You have to build a new search string for each of them. This requires you to become familiar with the syntax of different databases. […] Additionally, you need to check whether all of the outlets really were covered.

In addition to characterizing a systematic literature search as time consuming, participants also describe the task as error prone and unreliable due to the high proportion of manual steps involved:

The manual process is so diverse that you always do something differently or overlook something.

Next, we asked the participants to assess the applicability of LitSonar to their typical systematic search tasks. In so doing, we found an overall match between LitSonar’s functionality and the participants’ core search activities. These activities include, for instance, selecting appropriate databases, executing the query, and consolidating the result lists:

I think the system is very intuitive. I like the phased process because it matches my own workflow when I conduct a literature search. […] I would say that the system covers everything from the point where you identified the right keywords up to where the analysis begins.

However, as indicated by the illustrative quote above, participants raised the concern that one of the first activities during a systematic literature search – the identification of relevant keywords – is not directly supported by the system.

If you start at the beginning [of the search process], the system does not suggest relevant keyword combinations. You need to find them on your own. Once all the keywords are gathered, you just need to enter them, and the rest is handled by the system.

Drawing on task-technology fit theory (Goodhue 1995), a fit between a technology artifact and the task for which it is intended is an important precondition for its utilization and positive influence on individuals’ task performance. Considering the above indicated fit with the core search activities, it is not surprising that we also found that the participants either already used the system or had strong intentions to use the system, as well as to recommend it to students and fellow researchers:

I would recommend using the system every time a systematic review is appropriate. It is freely configurable and thereby offers enough flexibility.

In terms of anticipated performance benefits when participants conduct a systematic search with LitSonar, we found two main effects. First, LitSonar’s results are expected to be more comprehensive than those from searching multiple literature databases (i.e., a manual search):

When searching manually, you will always miss something or use malformed search strings. You might produce different and less comprehensive results. In my opinion, comprehensiveness is a precondition for high-quality reviews. It is a huge quality loss if key contributions are left out.

This outcome appears to be a particularly relevant improvement over current systems, as throughout the interviews, participants emphasized multiple times that comprehensiveness is one of their major concerns when utilizing search systems for systematic literature searches:

The most important criterion is the comprehensiveness of search results. To conduct a literature review, you need to put a lot of work into it. Therefore, it is very likely that you want to publish the results afterwards. I think it would be very dangerous to use a tool that does not ensure the comprehensiveness of results.

Second, it is expected that LitSonar will make literature searches more efficient, although with diminishing returns for more experienced reviewers:

You do not have to deal with the individual databases, etc. […] You will definitely save time, especially if you have little prior knowledge. Even if you are experienced, the effect will still be there, though slightly weaker.

The participants anticipate that the increase in efficiency will allow for allocating more time to other research activities, such as analyzing the discovered literature, conducting more search iterations to refine the search request, or expanding the scope of the literature review. As a result, the participants expect that using LitSonar will have a positive effect on the quality of literature reviews, especially when conducted by students or novice researchers:

When students put the additional 2 days that it would take to conduct the manual literature search into the analysis of the results, I expect a definitive increase in the quality of their research projects.

In conclusion, the interview results suggest a fit between LitSonar and the task of systematically searching the literature. Using the system is expected to have a positive outcome on performance in the form of higher comprehensiveness and efficiency of the search process, contributing to the quality of literature reviews. The expert interviews thereby provide evidence for the technical feasibility and utility of our prototype implementation and thus provide evidence for the usefulness and relevance of the developed design principles.

5.3 Results of Study 2

This section discusses the performance of six search systems along with LitSonar to further investigate improvements of the developed artifact over current search systems, as suggested by Gregor and Hevner (2013). In the following evaluation, we consider a search system to outperform another system if it is closer to satisfying the three SLSS meta-requirements. For this purpose, we use recall and precision, two well-established measures for evaluating information retrieval systems (Jansen and Rieh 2010; Kent et al. 1955). Comprehensiveness (MR1) is measured by the proportion of all relevant articles that appear in the result set (i.e., recall). Articles are considered relevant if they were published between 2001 and 2005 and are listed as case studies in Keutel et al. (2014). Table 2 shows the distribution of these articles over the examined journals. Precision (MR2) is assessed by the same-named measure, that is, the fraction of relevant retrieved articles relative to the number of all retrieved documents. In order to examine different trade-offs between comprehensiveness and precision, seven Fβ-scores for each result set are investigated, with β ranging from 0.2 to 5. Fβ-scores calculate the weighted harmonic mean of recall and precision, with β acting as a weight for recall (Powers 2011). For instance, F0.5 places greater weight on precision, F2 weights recall higher than precision, and F1 balances both values (i.e., the unweighted harmonic mean). The different Fβ-scores enable an evaluation of a search system’s performance in different literature search situations, which might require a stronger emphasis on either comprehensiveness or precision. Reproducibility (MR3) is determined based on the systems’ reliability and transparency. Reliability is assessed by comparing the results of two identical search requests retrieved 2 weeks apart. The systems’ transparency is evaluated through a comparative analysis of user-accessible information and interface features provided by the search systems. Each system is queried with two search requests that aim to identify case studies published between 2001 and 2005 in one of the six IS journals listed in Table 3. Both requests include the same set of search terms. Search request 1 (SR1) searches for matches to these terms only in articles’ titles, keywords, and abstracts. Search request 2 (SR2) is intended to be less precise and more comprehensive by finding matches in any meta-data field or an article’s content. The relevancy of articles is judged based on the list of 120 case studies identified by Keutel et al. (2014). Table 3 lists the seven evaluated search systems, along with their coverage of the targeted outlets.
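The measures used in this evaluation can be written down compactly. The sketch below uses the standard definitions of recall, precision, and the Fβ-score, Fβ = (1 + β²) · P · R / (β² · P + R); the function names are illustrative, and the example values are the recall (.6250) and precision (.4717) reported for LitSonar under SR1 in this section.

```python
def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Share of all relevant articles that appear in the result set."""
    return relevant_retrieved / relevant_total

def precision(relevant_retrieved: int, retrieved_total: int) -> float:
    """Share of retrieved documents that are relevant."""
    return relevant_retrieved / retrieved_total

def f_beta(p: float, r: float, beta: float) -> float:
    """Weighted harmonic mean of precision p and recall r.

    beta > 1 weights recall higher, beta < 1 weights precision higher.
    """
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

# Values reported for LitSonar under SR1; only the betas named in the text.
scores = {beta: f_beta(0.4717, 0.6250, beta) for beta in (0.5, 1, 2)}
```

Because recall exceeds precision in this example, the score rises as β increases, which illustrates how the choice of β shifts the emphasis toward comprehensiveness.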

Table 2 Overview of case studies published in six IS journals between 2001 and 2005
Table 3 Queried search systems

Table 4 provides an overview of the results from SR1. We were not able to retrieve results from either Google Scholar or KIT Katalog Plus due to incompatibility with the request’s structure. KIT Katalog Plus allows only four different filters at a time, while SR1 requires five (title, keywords, abstract, outlet, and publication date). Google Scholar does not provide settings for filtering abstracts or keywords. Among the remaining systems, LitSonar has the highest recall (.6250) and precision (.4717) values. All three literature databases show similarly high precision and lower recall values. The latter can be attributed to their limited coverage of the targeted outlets (see Table 3) and, in the case of EBSCO Discovery Service, incomplete metadata records. EBSCO Discovery Service has the second highest recall and the lowest precision score. The low precision can largely be explained by the high proportion of duplicates (53.5%) within the results set. The Fβ-scores show that in search situations in which a greater emphasis on precision is required, the tested literature databases and LitSonar are the best choices. However, with sufficient weight on recall, EBSCO Discovery can outperform the tested literature databases but not LitSonar, which consistently has the highest Fβ-score.

Table 4 Results of SR1 (restricted to title, keywords, abstract)

Table 5 shows the results of SR2. This broader search request was compatible with all seven search systems due to the dropped search field restrictions. Compared with SR1, all of the recall values increased and all of the precision values decreased. It is also observable that precision becomes more homogenous across all systems. The relative differences in recall remain similar to SR1, mirroring the outlet coverage reported in Table 3. Nonetheless, LitSonar still shows the highest recall (.9917) and precision (.1722) values of all systems. LitSonar was able to retrieve 119 of 120 case studies identified in Keutel et al. (2014). EBSCO Discovery and Google Scholar have comparably high recall values, while presenting the lowest precision scores. The low precision of EBSCO Discovery can again be attributed to a high number of duplicates. Although Google Scholar’s results contain fewer duplicates, one third of the retrieved documents were not published in one of the six specified journals, contrary to the filter settings. The online literature databases show the reverse picture, which can largely be attributed to their narrower journal coverage. While their precision is comparable with that of LitSonar, the number of retrieved case studies is significantly lower than for Google Scholar or LitSonar. Based on the Fβ-scores, a clear improvement of LitSonar over current systems is visible. The Fβ-score differences increase with the strength of the recall weight. Examining the results from both SR1 and SR2, we see a clear indication that LitSonar constitutes an improvement over current systems in terms of comprehensiveness and precision.

Table 5 Results of SR2 (no search field restrictions)
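
How recall and precision follow from a result list, and why duplicates depress precision, can be sketched as follows. This is a minimal illustration and not the evaluation procedure itself. It assumes that every returned record counts toward the precision denominator, so that a duplicate adds a record without adding a new relevant document. The function name and the document identifiers are hypothetical.

```python
from __future__ import annotations


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one result list.

    Assumption: duplicates stay in the precision denominator, but each
    relevant document is counted only once in the numerator.
    """
    found = set(retrieved) & relevant  # unique relevant documents that were retrieved
    precision = len(found) / len(retrieved) if retrieved else 0.0
    recall = len(found) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical gold standard of four relevant documents.
gold = {"a", "b", "c", "d"}

# The same hits with and without a duplicate record of "a".
with_duplicate = ["a", "a", "b", "x"]
deduplicated = ["a", "b", "x"]

print(precision_recall(with_duplicate, gold))
print(precision_recall(deduplicated, gold))

# Recall for 119 of 120 relevant case studies, as reported for LitSonar in SR2.
print(round(119 / 120, 4))
```

Under this assumption, removing the duplicate leaves recall unchanged and raises precision, which matches the explanation given for the low precision of EBSCO Discovery.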

As shown in Table 6, reproducibility was evaluated in terms of reliability, as well as transparency, in the form of information accessibility and process documentability. To assess the reliability of the search systems, we repeated SR2 within 2 weeks after the initial search run. Although the searched time frame remained unchanged, we received different search results from Google Scholar and EBSCO Discovery in the second search run. The results set returned from Google Scholar contained eight new documents, and three old documents were missing. EBSCO Discovery returned two fewer documents. In both cases, we retrieved the same set of relevant articles. The other five search systems provided stable results throughout the test period.

Table 6 Results for the reproducibility evaluation
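
The reliability check described above amounts to comparing the result sets of two runs of the same search request. The following Python sketch shows one way to do this with set operations. The function name and the document identifiers are hypothetical, and the sketch assumes that each document carries a stable identifier (for example, a DOI) in both runs.

```python
def compare_runs(first_run, second_run):
    """Compare two runs of the same search request by document identifier."""
    first, second = set(first_run), set(second_run)
    return {
        "added": sorted(second - first),    # only in the second run
        "removed": sorted(first - second),  # only in the first run
        "stable": sorted(first & second),   # present in both runs
    }


# Hypothetical identifiers for two runs of one request.
run_1 = ["doc1", "doc2", "doc3", "doc4"]
run_2 = ["doc1", "doc2", "doc4", "doc5", "doc6"]

diff = compare_runs(run_1, run_2)
print(diff["added"], diff["removed"])

# The set of relevant hits can stay identical even when the result set drifts.
gold = {"doc1", "doc2"}
same_relevant_hits = (set(run_1) & gold) == (set(run_2) & gold)
print(same_relevant_hits)
```

A system is stable in this sense when both the "added" and the "removed" lists are empty for repeated runs over an unchanged time frame.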

Regarding information accessibility, we found that sufficient information on the syntax of search requests was provided by all search systems, with the exception of Google Scholar. For instance, we could not find any information on the usage of Boolean connectors, although they can be applied in the search mask. Regarding the transparency of the search index, only LitSonar and the three online literature databases provided detailed information on which sources could be queried by the system, down to the outlet level. EBSCO Discovery, Google Scholar, and KIT Katalog Plus provided only high-level lists of (partially) covered databases with no additional information. In terms of coverage information at the search result level, only LitSonar provided a detailed outlet coverage report. The only source of coverage information provided by EBSCO Discovery and the tested literature databases consisted of facets placed next to the results list, which could be used as an indicator of whether at least one document in the set was published in a specific time period or outlet.

In our investigation of the documentability of the search process, we found that the user interfaces of all of the systems allowed for a structured description of the entered search request. However, documenting the search process becomes more difficult when handling the results. For instance, Google Scholar and EBSCO Discovery limited the number of visible search results. If the number of results exceeds this limit, the visible result set can differ depending on the current sorting order. Another process step that can reduce reproducibility is metadata export. While all of the systems allowed us to download metadata for an individual document, the export of subsets or entire result lists was not supported by Google Scholar, KIT Katalog Plus, and EBSCO Discovery. As a result, additional processing steps are necessary when analyzing a large result set, such as the development and application of manual or script-based export routines, result merging, and deduplication. Every decision made during this task risks altering the results list and must be documented to maintain reproducibility.
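
The merging and deduplication steps mentioned above, together with the required documentation, can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the procedure used in our evaluation: records are assumed to be dictionaries with optional "doi" and "title" fields, duplicates are matched by DOI or, if no DOI is present, by a normalized title, and all names and sample records are hypothetical.

```python
from __future__ import annotations

import re


def dedup_key(record: dict) -> str:
    """Build a matching key: the DOI if present, else a normalized title (assumption)."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = (record.get("title") or "").lower()
    # Collapse punctuation and whitespace so trivial formatting differences do not matter.
    return "title:" + re.sub(r"[^a-z0-9]+", " ", title).strip()


def merge_exports(exports: dict[str, list[dict]]) -> tuple[list[dict], list[str]]:
    """Merge per-system exports, keep the first copy of each record, and log every drop."""
    merged: dict[str, dict] = {}
    log: list[str] = []
    for source, records in exports.items():
        for record in records:
            key = dedup_key(record)
            if key in merged:
                kept_from = merged[key]["source"]
                log.append(f"dropped duplicate from {source}: {key} (kept copy from {kept_from})")
            else:
                merged[key] = {**record, "source": source}
    return list(merged.values()), log


# Hypothetical exports from two search systems.
exports = {
    "system_a": [{"doi": "10.1000/x1", "title": "A Case Study"}],
    "system_b": [
        {"doi": "10.1000/X1 ", "title": "A case study"},
        {"title": "Another  Paper!"},
    ],
}

records, decisions = merge_exports(exports)
print(len(records))
for line in decisions:
    print(line)
```

The returned log records each removal decision, so the merged list can be traced back to the original exports. Note that this simple key does not match a record that has a DOI against a copy of the same document without one; such cases would need an additional, documented matching rule.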

In conclusion, our investigation of seven search systems shows that online literature databases and LitSonar are most suitable for reproducible literature searches in terms of reliability, information accessibility, and process documentability. LitSonar further improves reproducibility by providing detailed coverage information at the search result level.

6 Discussion and Outlook

The knowledge contribution of this article can be classified as improvement, based on the framework proposed by Gregor and Hevner (2013). The goal of improvement design science research is the creation of more efficient and effective IT artifacts by drawing on the existing deep understanding of the respective problem domain (Gregor and Hevner 2013). Starting from an initial problem observation, we conducted multiple cycles of artifact design and rigorous evaluation. Each cycle increased our understanding of the problem domain and the solution space, while continuously drawing on the existing knowledge base. In answer to our research question – how to design an SLSS that effectively facilitates systematic literature searches – we were able to derive six principles that summarize our understanding of effective SLSS design. Their thorough evaluation provides conclusive evidence for the improvement over current solutions in terms of comprehensiveness, precision, and reproducibility. This article thereby contributes to the literature review, information search, and information retrieval literature streams. In particular, this research provides four major contributions.

The first contribution arises from the SLSS design principles. It can be argued that a systematic literature search or any objective information search can be performed with the most basic and unrefined tools, as long as the underlying methodology is rigorous and sufficient resources (i.e., time and money) are available. However, over the past three decades, information has become far more accessible, and the amount of available information continues to grow rapidly (Hilbert and López 2011). This growth makes an unassisted search approach not necessarily less rigorous but far more complex and expensive and thus less viable. Modern search system design is reacting to this trend with design choices intended to counteract some symptoms of the underlying issue (e.g., reducing information overload by presenting only the most relevant documents). As shown in the previous section, such design choices render performing a systematic and objective search more difficult or even impossible. We are therefore convinced that the presented SLSS design principles constitute a valuable contribution to the development of future systematic search systems, as well as to research on solutions to the increasing challenges in our information-driven society, such as information overload and information pollution through dis-, mis-, and mal-information (e.g., Wardle and Derakhshan 2017). In this regard, our research also contributes to theory, as it is one of the first studies to directly address the problems concerning systematic information searches. The proposed design knowledge is the product of several cycles of design and evaluation (Gregor and Hevner 2013; Sonnenberg and vom Brocke 2012) and, thus, constitutes the beginning of a prescriptive theory in the form of a design science artifact (Gregor and Jones 2007). Considering the systematic literature search context as an instance of a larger class of problems, which is characterized by the need for objective, comprehensive, and transparent information accumulation, this article represents a first step in developing theoretical knowledge that will help to better understand and address the issues inherent in this class of problems.

The second contribution of this article is the derived set of SLSS meta-requirements. As previously discussed, there is already a sizable body of knowledge on systematic literature searches and review approaches. While many different methods are proposed, and the analysis stage is usually explained in great detail, recommendations for finding a relevant literature sample often remain vague. Thus, it is not surprising that, to the best of our knowledge, there is currently no conceptualization of the systematic, rigorous literature search task. The SLSS meta-requirements derived in this article address this gap by providing a first pragmatic model for systematic literature searches. Through consolidation of the most important quality requirements for systematic literature searches, our research offers a first basis for discussion, on which future methodological and design contributions can draw. For instance, by guiding an objective measurement of systematic literature searches, the SLSS meta-requirements could serve as a point of reference for novel evaluation instruments to measure the suitability of systematic search solutions or the quality of literature search processes and their outcomes.

The third contribution of this article lies in successfully demonstrating the potential of the design science research paradigm in the field of digital innovation, which is concerned with “new combinations of digital and physical components to produce novel products” (Yoo et al. 2010, p. 725). The emergence of digital innovations requires both advancements of digital technologies and the digitization of physical objects and processes (Fielt and Gregor 2016). In the context of literature and information searches, we saw the rapid evolution of storage, networking, and retrieval technologies over the past two decades, along with the nearly complete digitization of the scientific IS literature (Watson 2015). However, systematic literature search tasks do not seem to have fully benefitted from these developments, for they remain laborious, complex, and error-prone. Our research shows that an SLSS can transform these manual steps into a digital IT artifact and thereby successfully demonstrates not only the potential for digital innovation in our research context but also that design science research is a valuable paradigm for generating digital innovation artifacts and design knowledge.

Finally, we believe that this article also contributes to design science research methodology in general. Starting with a simple observation in our IS department, the design science research paradigm enabled us not only to instantiate a usable system artifact but also to derive generalizable design knowledge with potential utility for a whole class of problems. Along this journey, we encountered a multitude of issues and gained unexpected insights into the problem domain and the possible solution space. Adopting only one specific methodology would have been too restrictive to capture and incorporate all of the knowledge gained along the way. Instead, we found that following an increasingly rigorous design and evaluation process, inspired and guided by a multitude of design science literature (e.g., Gregor and Hevner 2013; Gregor and Jones 2007; Peffers et al. 2007; Sonnenberg and vom Brocke 2012; Venable et al. 2016), is a highly rewarding approach. Hence, one important lesson learned from our research journey could be that, for a longitudinal design science research project, an open and adaptive methodological approach, which is not limited to a specific guideline or framework, is best suited.

This article has several limitations, which at the same time provide future research opportunities. First, our research prototype is focused on literature searches in the information systems research field. On the one hand, this narrow perspective might limit the generalizability of our results. On the other hand, it raises the question of whether an effective systematic search system requires a high level of topical specialization. One lesson learned from our research is that an effective SLSS must be highly customizable to address the users’ individual search needs. Increasing a system’s topical coverage will also increase the diversity of potential information needs, search strategies, and data sources for which the system must provide. More sources and interface requirements will inevitably increase the versatility but also the complexity of the user interface. Such a system might be perceived as an overly complex expert system and send a strong dissuasive signal to inexperienced users, who are the very user group that might profit the most from such a system. It is well known that one reason for Google Scholar’s popularity is its simple and intuitive user interface (Spezi 2016; Wu and Chen 2014). Hence, one promising research opportunity lies in investigating mechanisms to increase a search system’s topical coverage and usability while also meeting the derived meta-requirements. A resulting larger (potential) user base will increase the incentives for organizations to develop and maintain such systems and thus make them more widely available and eventually an inherent part of our daily information consumption.

A second limitation of our research is its narrow design perspective on the SLSS concept. In our research, we focused on the core functionalities of a search system, which enables reviewers to enter search requests and retrieve matching results. Our expert interviews strongly indicated that assistance for identifying relevant search keywords during the early stages of a systematic literature search might be a relevant SLSS design feature. Although our design knowledge contributes to facilitating creative processes through more efficient search iterations and elaborate process information, these processes were not the focus of our research. Future research should adopt an extended and more holistic design perspective to investigate preceding and succeeding activities, in addition to fundamental search functionalities. Since these activities involve creative thinking processes and require in-depth understanding of the reviewers’ individual information needs, it will be necessary to assess the feasibility and design of innovative systematic search features that facilitate these activities (e.g., defining and refining search requests and selecting relevant samples) without introducing additional technology biases or creative restraints. Based on our research results, we anticipate that a more holistic research perspective will not only lead to further innovations in the literature search context but will also address growing societal challenges, such as information overload, filter bubbles, and information pollution.