Learning by Doing? Reflections on Conducting a Systematic Review in the Field of Educational Technology

“Occasional pitfalls in the construction of educational systematic reviews include lack of focus in the educational question, lack of specification in the inclusion and exclusion criteria, limitations in the search strategies, limitations in the methods for judging the validity of findings of individual articles, lack of synthesis of the findings, and lack of identification of the review’s limitations”


Introduction
In 1984, Cooper and Hedges stated that "scientific subliteratures are cluttered with repeated studies of the same phenomena.
Repetitive studies arise because investigators are unaware of what others are doing, because they are skeptical about the results of past studies, and/or because they wish to extend…previous findings…[yet even when strict replication is attempted] results across studies are rarely identical at any high level of precision, even in the physical sciences…" (p. 4 as cited in Mullen and Ramírez 2006, p. 82-83).
Presumably due to the reasons cited here, systematic reviews have recently garnered interest in the field of education, including the field of educational technology (for example Joksimovic et al. 2018).
Following the presentation and discussion of systematic reviews as a method in the first part of this book, in this chapter, we outline a number of challenges that we encountered during our review on the use of educational technology in higher education and student engagement. We share and discuss how we either met those challenges, or needed to accept them as an unalterable part of the work. "We" in this context refers to our review team, comprised of three Research Associates with backgrounds in psychology and education, and with combined knowledge in quantitative and qualitative research methods, under the guidance of two professors from the field of educational technology and online learning. In the following sections, we provide contextual information on our systematic review, and then proceed to describe and discuss the challenges that we encountered along the way.

Systematic Review Context
Our systematic review was conducted within the research project Facilitating student engagement with digital media in higher education (ActiveLeaRn), which is funded by the German Federal Ministry of Education and Research as part of the funding line 'Digital Higher Education', running from December 2016 to November 2019. The second-order meta-analysis by Tamim et al. (2011) found only a small effect size for the use of educational technology for successful learning, thereby showing that technology and media do not make learning better or more successful per se. Against this background, we posit that educational technologies and digital media do, however, have the potential to make learning different and more intensive (Kerres 2013), depending on the pedagogical integration of media and technologies for learning (Higgins et al. 2012; Popenici 2013). The use of educational technology has been found to have the potential to increase student engagement (Chen et al. 2010; Rashid and Asghar 2016), improve self-efficacy and self-regulation (Alioon and Delialioglu 2017; Northey et al. 2015; Salaber 2014), and increase participation and involvement in courses and within the wider institutional community (Alioon and Delialioglu 2017; Junco 2012; Northey et al. 2015; Salaber 2014). Given that disengagement negatively impacts on students' learning outcomes and cognitive development (Ma et al. 2015), and is related to early dropout (Finn and Zimmer 2012), it is crucial to investigate how technology has been used to increase engagement.
Departing from the student engagement framework by Kahu (2013), this systematic review seeks to identify the conditions under which student engagement is supported through educational technology in higher education. Given that calls have been made for further investigation into how educational technology affects student engagement (Castañeda and Selwyn 2018; Krause and Coates 2008; Nelson Laird and Kuh 2005), as well as further consideration of the student engagement concept itself (Azevedo 2015; Eccles 2016), a synthesis of this research can provide guidance for practitioners, researchers, instructional designers and policy makers. The results of this systematic review will then be discussed with experts and practitioners in the field of (German) higher education, to validate or critically debate the findings, both to provide an impetus for evidence-based practice in the field of technology-enhanced learning and to gain insights relevant for further research projects.
Theory is one thing, practice another: What happened along the way
Whilst in theory, literature on conducting systematic reviews provides guidance in quite a straightforward manner (e.g. Gough et al. 2017; Boland et al. 2017), potential challenges (even though mentioned in the literature) take shape only in the actual execution of a review. Coverdale et al. (2017) describe some of the challenges that we encountered from a journal editor's point of view. They summarize them as follows: "Occasional pitfalls in the construction of educational systematic reviews include lack of focus in the educational question, lack of specification in the inclusion and exclusion criteria, limitations in the search strategies, limitations in the methods for judging the validity of findings of individual articles, lack of synthesis of the findings, and lack of identification of the review's limitations" (p. 250).
In the remainder of this chapter, we will centre our discussion around three main aspects of conducting our review, namely two broad areas of challenges that we faced, as well as a discussion of the chances that emerged from our specific review experience.
Dickson 2014, p. 20).
The review question that was developed in a three-day workshop at the EPPI-Centre at University College London was: 'Under which conditions does educational technology support student engagement in higher education?'. This is a broad question without very clearly defined components, and it therefore affected all ensuing steps within the review. 'Conditions' could be anything and therefore could not be explicitly searched for, so we chose to focus on students and learning. 'Educational technology' can mean different things to different people, so we chose to search as broadly as possible and included a large number of different technologies explicitly within the search string (see Table 1), as we will also discuss in the further sections. This was a question of sensitivity versus precision (Brunton et al. 2012). However, it resulted in an extraordinary number of initial references, and required more time to undertake screening.
Had we not had as many resources to support this review, and therefore time to conduct it, we could have used the PICO framework (Santos et al. 2007) to define our question. This allows a review to target specific populations (in this case 'higher education'), interventions (in this case 'educational technology'), comparators (e.g. face to face as compared to blended or online learning), and outcomes (in this case 'student engagement'). The more closed those PICO parameters, the more tightly defined and therefore the more achievable a review potentially becomes.
Reflecting on the initial question from our current standpoint, it was the right decision in order to approach this specific topic with its oftentimes implicit understandings and definitions of concepts. The challenge to grasp the student engagement concept is very illustratively captured by Eccles (2016), stating that it is like "3 blind men describing an elephant" (p. 71), or, more neutrally described as an "umbrella concept" (Järvelä et al. 2016, p. 48). As will be detailed below, the lack of a clear-cut concept in the review question that could directly be addressed in a database search demanded a broader search in order to identify relevant studies. Subsequently, to address this broad research question appropriately, we paid the price of tremendously increasing the scope of the review and not being able to narrow it down to have a simple and "elegant" answer to the question.

Table 1 Search string
Student: Learner* OR student*
Higher education: "higher education" OR universit* OR college OR undergrad* OR graduate OR postgrad* NOT ("K-12" OR kindergarten OR "corporate training" OR "professional training" OR "primary school" OR "middle school" OR school OR "vocational education" OR "adult education")
Educational technology: "educational technology" OR "learning technology" OR "digital technology" OR "digital media"
Tools: "social media" OR "social network*" OR "social web" OR vodcast OR podcast* OR "digital broadcasting" OR blog* OR weblog OR "electronic publishing" OR microblog* OR "interactive whiteboard*" OR simulation OR forum* OR "computer-mediated communication" OR "computer communication network*" OR ePortfolio OR e-Portfolio OR e-Assessment OR eAssessment OR "computer-based testing" OR "computer-assisted testing" OR OER OR "open educational resources" OR "open access" OR "open source technology" OR "information and communication technolog*" OR "information technology" OR "social tagging" OR "app" OR tablet* OR "handheld device*" OR "mobile device*" OR "electronic books" OR eBooks
Internet: "Web 2.0" OR "user generated content" OR "cyber space"
Learning environments: "virtual classroom*" OR "personal learning environment*" OR "virtual learning environment" OR "virtual reality" OR "augmented reality" OR "learning management system*"
Computer: "computer-based learning" OR "computer-based instruction" OR "computer-supported learning" OR "computer-supported collaborative learning" OR "computer-supported cooperative learning" OR "computer-supported cooperative work" OR "computer-mediated learning" OR "computer-assisted instruction" OR "computer-assisted language learning"
Web: "web-enhanced learning" OR "web-enhanced instruction" OR "web-based training" OR "web-based instruction" OR MOOC OR "massive open online course*" OR "online instruction" OR "online education"
Technology: "technology-enhanced learning" OR "technology-mediated learning"
(continued)

Student Engagement: Focus on a Multifaceted Concept
To further explain why both our question and especially our search string can be considered sensitive rather than precise in the understanding of Brunton et al. (2012), discussing the concept of student engagement is vital. Student engagement is widely recognised as a complex and multi-faceted construct, and also arguably constitutes an example of "'hard-to-detect' evidence" (O'Mara-Eves et al. 2014, p. 51). Prior reviews of student engagement have chosen to include the phrase 'engagement' in their search string (e.g. Henrie et al. 2015); however, this restricts search results to only those articles including the term 'engagement' in the title or abstract. To us, and to our information specialist who assisted us in the development of the search string, the concept of student engagement is a broad and somewhat fuzzy term, resulting in the following, albeit common, challenge: "the main focus of a review often differs significantly from the questions asked in the primary research it contains; this means that issues of significance to the review may not be referred to in the titles and abstracts of the primary studies, even though the primary studies actually do enable reviewers to answer the question they are addressing" (O'Mara-Eves et al. 2014, p. 50).
Given the contested nature of student engagement (e.g. Appleton et al. 2008; Christenson et al. 2012; Kahu 2013), and the vast array of student engagement facets, the review team therefore felt that this would seriously limit the ability of the search to return adequate literature, and the decision was made to leave any phrase relating to engagement out of the initial string. Instead, the engagement and disengagement facets that had been uncovered, and which are published elsewhere (Bond and Bedenlier, 2019), were used to search within the initial corpus of results.

Table 1 (continued)
Mobile: "mobile learning" OR m-Learning OR "mobile communication system*" OR "mobile-assisted language learning" OR "mobile computing"
E-Learning: eLearning OR e-Learning OR "electronic learning" OR "online learning"
Mode of delivery: "distance education" OR "blended learning" OR "virtual universit*" OR "open education" OR "online course*" OR "distance learning" OR "collaborative learning" OR "cooperative learning" OR "game-based learning"

Developing the Search String: Iterations and Complexity
Developing a search string that is appropriate for the purpose of the review and ensures that relevant research can be identified is an advanced endeavor in itself, as the detailed account by Campbell et al. (2018) shows. As a result of our initially broad review question, we were faced with the task of creating a search string that would reflect the possible breadth of student engagement and its facets (see Bond and Bedenlier, 2019) and would also include a diverse range of educational technology tools. The educational technology tools were, in the end, identified in a brainstorming session of the three researchers and the guiding professors, trying to be comprehensive whilst simultaneously realizing the limitations of this attempt. As displayed in the search string (see Table 1), categories within educational technology were developed, which were then applied in different combinations with the student and higher education search terms, and were run in four different databases, namely ERIC, Web of Science, PsycINFO and SCOPUS. The search string underwent several test runs and modifications before final application, not only because of slight differences in the make-up of the databases (e.g. different usage of truncations or quotation marks), but also because some educational technology terms proved misleading. Initially included terms such as "website" or "media" proved to be dead ends, as they yielded a large number of studies that included these terms but were off topic. Again, reflecting from today's point of view, the term "simulation" was also ambiguous: it was sometimes used in the sense of our review, as an educational technology tool, but oftentimes also for in-class role plays in medical education, without the use of further educational technology.
However, the breadth of the search string made it possible to identify research that, with a more precise search focusing on "engagement", would have been lost to our review. This is demonstrated by the simple fact that within our final corpus of 243 articles, only 63 studies (26%) actually employ the term "student engagement" in their title or abstract.
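To illustrate how term clusters of this kind can be combined into one Boolean query, the following is a minimal sketch in Python. It is not the tooling used in the review: the function names are hypothetical, only a handful of terms from Table 1 are shown, and the way the blocks are joined (student AND higher education AND any technology cluster) is an assumption. The exact syntax for truncation, quotation marks and NOT would also need to be adapted to each database.

```python
def or_block(terms):
    """Join the terms of one topic cluster with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(student, higher_ed, excluded, tech_clusters):
    """Combine the term blocks into a single Boolean search string.

    Assumed structure (not confirmed by the review itself):
    student AND (higher education NOT excluded) AND (cluster 1 OR cluster 2 ...).
    """
    technology = " OR ".join(or_block(cluster) for cluster in tech_clusters)
    return (
        f"{or_block(student)} AND "
        f"({or_block(higher_ed)} NOT {or_block(excluded)}) AND "
        f"({technology})"
    )


# A small subset of the terms listed in Table 1, for illustration only.
query = build_query(
    student=["learner*", "student*"],
    higher_ed=['"higher education"', "universit*"],
    excluded=['"K-12"', "kindergarten"],
    tech_clusters=[
        ['"educational technology"', '"learning technology"'],
        ['"mobile learning"', "m-Learning"],
    ],
)
print(query)
```

Keeping each cluster as a separate list makes it straightforward to test different combinations of clusters, or to drop a term that turns out to be a dead end, without rewriting the whole string by hand.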

Challenge Two: Retrieving, Analyzing and Describing the Research

Accuracy of Title and Abstract
As we began to screen the titles and abstracts of the studies that met our predefined criteria (English language, empirical research, peer-reviewed journal articles, published between 2007 and 2016; focused on students in higher education, educational technology and student engagement), we quickly realized that the abstracts did not necessarily provide the information on the study that we needed, e.g. whether it was an empirical study, or whether the research population was students in higher education. This problem, dating back to the 1980s, was also mentioned by Mullen and Ramírez (2006, p. 84-85), and was addressed in the field of medical science by proposing guidelines for making abstracts more informative. Whilst we were cognizant of the problem of abstracts (and also keywords) being misleading (Curran 2016), there proved to be no way around this issue, and we subsequently included abstracts for further consideration that we thought unlikely to be on topic, but which could not be excluded due to the slight possibility that they might be relevant.

The Sheer Size of It…. Using a Sampling Strategy
As described in Borah et al. (2017), "the scope of some reviews can be unpredictably large, and it may be difficult to plan the person-hours required to complete the research" (p. 2). This applied to our review as well. Having screened 18,068 abstracts, we were faced with the prospect of screening 4152 studies on full text. This corresponds roughly to the maximum number of full texts to be screened (4385) in the study by Borah et al. (2017, p. 5) who analyzed 195 systematic reviews in the medical field to uncover the average amount of time required to complete a review. However, retrieval and screening of 4152 articles was not feasible for a part-time research team of three, within the allotted time and the other research tasks within the project. As a result of this challenge, it was decided that a sample would be drawn from the corpus, using the sample size estimation method (Kupper and Hafner 1989) and the R Package MBESS (Kelley et al. 2018). The sampling led to two groups of 349 articles each that would need to be retrieved, screened and coded. Whilst the sampling strategy was indeed a time saver, and the sample was representative of the literature in terms of geographical representation, methodology and study population, the question remains as to the results we might have uncovered, had we had the resources to review the entire corpus.
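The drawing step itself can be sketched as follows. This is a minimal Python illustration, not the R and MBESS procedure used in the review: it takes the group size of 349 as given rather than re-deriving it, and it assumes simple random sampling, non-overlapping groups, sequential record IDs and a fixed seed, none of which are specified in the account above.

```python
import random


def draw_two_samples(record_ids, sample_size, seed=2018):
    """Draw two non-overlapping simple random samples of equal size."""
    record_ids = list(record_ids)
    if 2 * sample_size > len(record_ids):
        raise ValueError("corpus too small for two disjoint samples")
    rng = random.Random(seed)  # fixed seed (assumption) so the draw is reproducible
    drawn = rng.sample(record_ids, 2 * sample_size)  # sampling without replacement
    return drawn[:sample_size], drawn[sample_size:]


# Hypothetical sequential IDs standing in for the 4152 full-text candidates.
corpus = range(1, 4153)
group_a, group_b = draw_two_samples(corpus, 349)
print(len(group_a), len(group_b))
```

Recording the seed alongside the exported list of IDs allows the same two groups to be regenerated later, which helps when the draw has to be documented in a methods section.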

Study Retrieval
Although authors such as Gough et al. (2017) mention that the retrieval of studies requires time and effort, this step in the review certainly consumed both time and human resources, and also a modest financial investment. We attempted to acquire the studies via our respective institutional libraries, ordered them in hard copy via document delivery services, contacted authors via ResearchGate (with mixed results), and finally also took to purchasing articles when no other way seemed to work. However, we also had to accept that some articles would not be available; in one case, for example, the PDF file in question could not be opened by any of the computers used, as it comprised a 1000-page document, which inevitably failed to load. Trying to locate the studies required time. Some of the retrieval work was allocated to a student assistant whose searching skills were helpful for easy-to-retrieve studies, but this required us to follow up on harder-to-find studies. Thus, whilst the step of study retrieval might sound rather trivial at first sight, this phase actually evolved into a much larger consideration. As a consequence, we would strongly recommend factoring this carefully into the timeline of the review, particularly when applying for funding.

Using Software Within the Review
In order to manage a large corpus of literature, it is highly recommended to use software, in particular to make the screening and coding steps easier. Popular low-cost options include using Excel spreadsheets, Google Sheets, or reference management software, such as Endnote, Citavi or Zotero. Spreadsheets are straightforward to use and are familiar applications; however, they can result in an unwieldy amount of information on one screen at a time, and reference management software has limited filtering and coding functionality. Software that has been specifically designed for undertaking systematic reviews can therefore be a more attractive option, as its design can produce quick and easy reports, speeding up the synthesis and trend identification process. Rayyan (Ouzzani et al. 2016) is a free web-based systematic review platform, which also has a mobile app for coding. However, we decided to use EPPI-Reviewer software, developed by the EPPI-Centre at University College London.
Whilst not free, the software does have an easy-to-use interface, it can produce a number of helpful reports, and the support team is fantastic. However, more training in how to use the software was needed at the beginning to set the review up, and the lack thereof meant that we were not only learning on the job, but occasionally having to learn from mistakes. The way that we designed our coding structure for data extraction, for example, has now meant that we need to combine results in some cases, whereas they should have been combined from the beginning. This is all part of the iterative review experience, however, and we would now recommend spending more time on the coding scheme and thinking through how results would be exported and analyzed, prior to beginning data extraction.
Another area where using software can be extremely helpful is the removal of duplicates across databases. We highly recommend importing the initial search results from the various databases (e.g. Web of Science, ERIC) into a reference management software application (such as Endnote or Zotero), and then using the 'Remove Duplicates' function. You can then import the reduced list into EPPI-Reviewer (or similar software) and run the duplicate search again, in case the original search missed something. This can happen due to the presence of capitals in one record but not in another, or through author or journal names being indexed differently in databases. We found this was the case with a vast number of records and that, despite having run the duplicate search multiple times, there were still some duplicates that needed to be removed manually.
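The kind of mismatch described above can be illustrated with a short Python sketch. It is not the duplicate detection used by Endnote, Zotero or EPPI-Reviewer; the field names, the matching key (normalized title plus year) and the example records are all invented for illustration. Matching on title and year alone is an assumption that will occasionally merge distinct records or miss true duplicates, so a manual check remains necessary.

```python
import re
import unicodedata


def dedup_key(record):
    """Build a comparison key that ignores case, accents and punctuation."""
    title = unicodedata.normalize("NFKD", record["title"])
    title = title.encode("ascii", "ignore").decode("ascii").lower()
    title = re.sub(r"[^a-z0-9]+", " ", title).strip()
    return (title, str(record.get("year", "")))


def remove_duplicates(records):
    """Keep the first record for each key and drop later matches."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented example records: the first two differ only in capitalization,
# trailing punctuation and the way the journal name is indexed.
records = [
    {"title": "Student Engagement and Mobile Learning", "year": 2015,
     "journal": "Computers & Education"},
    {"title": "Student engagement and mobile learning.", "year": 2015,
     "journal": "Comput. Educ."},
    {"title": "Blogs in higher education", "year": 2012,
     "journal": "Internet and Higher Education"},
]
unique = remove_duplicates(records)
print(len(records), "records,", len(unique), "after de-duplication")
```

Author and journal names are deliberately left out of the key here, since these are the fields that databases tend to index differently.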

Describing Studies
Against the backdrop of our review being very large, as well as employing an extensive coding scheme, we engaged in discussion of how to present a descriptive account of this body of research that would both meaningfully display the study characteristics, as well as take into account that even this description constitutes a valuable insight into the research on student engagement and educational technology. Finding guidance in the article by Miake-Lye et al. (2016) on "evidence maps" (p. 2), we decided to dedicate one article publication to a thorough description of our literature corpus, thereby providing a broad overview of the theoretical guidance, methods used and characteristics of the studies (see Bond et al., Manuscript in preparation), and then to write field of study-specific articles with the actual synthesis of results (e.g. Bedenlier, Bond et al., Forthcoming).
To handle the coded articles, all data and information were exported from EPPI-Reviewer into Excel to allow for the necessary cross tabulations and calculations, and also to ensure that we would be able to work with the data after the expiry of the user accounts in EPPI-Reviewer. Most interestingly, the evidence map, structured along four leading questions, proved to be a very insightful and helpful document, whose main asset was to point us towards a potentially well-suited framework for our actual synthesis work. Thus, following the expression 'less is more', the wealth of information, concepts and insights to be gained from the mere description of the identified studies is worth an individual account and presentation, especially if this helps to avoid an overladen article that can provide neither a full picture of the included research nor an extensive synthesis due to space or character constraints.

Chances
Whilst we encountered the challenges described here (and there are more, which we cannot include in this chapter), we were also lucky enough to have a few assets in conducting our review, which emerged from our specific project context and which we would also like to alert others to.

Involvement of the Information Specialist
As suggested in Beverley et al. (2003), information specialists can assume ten roles in a systematic review, comprising "traditional librarian responsibilities, such as literature searching, reference management and document supply, as well as a whole range of progressive activities, such as project leadership and management, critical appraisal, data extraction, data synthesis, report writing and dissemination" (p. 71). Whilst the same authors point out that information specialists are often consulted and involved in the more traditional tasks, this is also how we consulted the librarian in charge of our research field. In our case, we were lucky enough to have an information specialist who not only attended the systematic review workshop jointly with us, but also played an integral part in setting up the search string, including making us cognizant of pitfalls such as potential database biases (e.g. ERIC being predominantly US-American focused), and the need to adapt search strings to different databases (e.g. changing truncations). On a general note, we can add that students and faculty who are seeking assistance in conducting systematic reviews increasingly consult the research librarian for education at our institution. This not only shows the current interest in systematic reviews in education but also emphasizes the role that information specialists and research librarians can play in the course of appropriate information retrieval. It also relates back to Beverley et al.'s (2003) discussion of information specialists engaging in various parts of the review, and strengthening their capacity beyond merely being a resource at the beginning of the review.
Thus, although researchers are familiar with searching databases and information retrieval, an external perspective grounded in the technical and informational aspects of database searches is helpful for carrying out searches and for understanding databases as such.

Multilingualism
Our team comprised five researchers: two project leaders, who joined the team in the crucial initiation and decision-making phase and who provided in-depth content expertise based on their extensive knowledge of the field, as well as three Research Associates, who carried out the actual review. The three Research Associates are located at the two participating universities: the University of Oldenburg and the University of Duisburg-Essen. Whilst Katja and Svenja are native speakers of German, Melissa is a native speaker of (Australian) English, which proved to be of enormous help in phrasing the nuances of the search string and defining the exact tone of individual words. However, Australian English differs from American, British and other varieties of English, which has implications for how certain phrases are understood in context. Additionally, we now know that authors from Germany do not always use terms and phrases that are internationally compatible (e.g. "digital media in education" = digitale Medien in der Bildung); rather, terms have been developed that are specific to the discourse in Germany (Buntins et al. 2018). A colleague also observed the same for the Spanish context. Both of these examples suggest a need for further discussion of how this influences the literature in the field and also how this potentially (mis)leads authors from these countries (and other countries as well) in their indexing of articles via author-given keywords. Thus, our different linguistic backgrounds alerted us to these nuances in meaning, whilst this also raises the question of potential linguistic "blind spots" in monolingual teams. This could be a topic of further investigation.

Teamwork
Beyond the challenges that occurred at specific points in time, we would like to stress one asset that emerged clearly in the course of the (sometimes rather long) months we spent on our review: Working in a research and review team.
We started out as a team who had not worked together before, and therefore only knew about each other's potentially relevant and useful abilities beforehand: quantitative and qualitative method knowledge, a native speaker of English, and plans to pursue a PhD in the field of educational technology in K-12 education. In the course of the work, the function of a (rough) timekeeper and the negotiation between methodological perfection, rigor and practicability proved to be important issues that we solved within the team and that would have been hardly, if at all, solvable if the review had been conducted by a single person. As every person in the team, as in all teams, brings certain abilities, it is the sum of individual competencies and the joint effort that enabled us to carry out a review of this size and scope. Thus, in the end, it was the constant negotiation, weighing the pros and cons of which way to go, and the ongoing discussions, that were the strongest contributor to us meeting the challenges encountered during the work and also successfully completing the work.
Going back to the title of this chapter, "Learning by doing", we can confirm that this holds true for our experience. Although method books do provide help and guidance, they cannot fully account for the challenges and pitfalls that are individual to a certain review. Hence all reviews, and all other research for that matter, are to some extent learning by doing. And what we learnt from this review might not even be fully applicable to other future reviews we might conduct.
Unfortunately we do not have the space here to discuss all of the lessons learned from our review, such as tackling the question of quality appraisal, issues of synthesizing findings, and which parts of the review to include in publications, a discussion of which would complement this chapter. Likewise, the experiences throughout our review and our solutions to them certainly also constitute limitations of our work, as will also be discussed in the publications ensuing from the review. However, it is our hope that by discussing them so openly and thoroughly within this chapter, other researchers who are conducting a systematic review for the first time, or who experience similar issues, may benefit from our experience.