Key factors in digital literacy in learning and education: a systematic literature review using text mining

This research aims to provide an overview of the research field of digital literacy in learning and education. Using text mining, it reviews 1037 research articles published on the topic between 2000 and 2020. This review reveals that a plurality of terms is associated with digital literacy. Moreover, our research identifies six key factors that define the literature: information literacy, developing digital literacy, digital learning, ICT, social media, and twenty-first century digital skills. These factors can be grouped into three main streams: 1) digital literacy, 2) digital learning and 3) twenty-first century digital skills. These three streams are supported by informational and technological foundations. These results open research avenues and offer a framework for digital literacy in education.


Introduction
With the rise of digitalization over the last decades, digital literacy has taken a central role in our society and has become an important concern for institutions and policy makers (European Commission, 2020; U.S. Department of Education, 2014). It is also a topic of particular interest for research, whether in defining digital literacy (Gilster, 1997; van Laar et al., 2017) or in studying its development (List, 2019; Ng, 2012). The impact of the digital society is also studied in relation to education (Di Giacomo et al., 2017; Pinto et al., 2020) and research (Ferreira-Mello et al., 2019; Stopar & Bartol, 2019).
In the same way that the term digital transformation covers a vast and varied set of phenomena (Audrin, 2019; Vial, 2019), the education literature uses a wide range of concepts to address digital skills, with some specificities and overlaps among definitions. Beyond issues of terminology, the very purpose of research on digital literacy varies, as does the context in which it is conducted.
Over the last twenty years, the number of publications on digital literacy has grown almost exponentially. This abundance of scientific production is, of course, beneficial because it increases knowledge on the subject, but it also represents a significant challenge for scholars: given the proliferation of studies, it is very difficult to make sense of the field and to fully understand its specificities and areas of interest. Scholars benefit from using digital and quantitative research methods to make sense of a field of research. For example, Stopar & Bartol (2019) analyze clusters of co-citations and co-citing sources to understand how research is organized. The rationale of this study is to make sense of the abundant body of literature on digital literacy in the context of education and learning. To do so, it uses text mining to structure the field of research.
The purpose of this study is twofold. First, we want to map how digital literacy and its related notions are investigated in learning and educational research. More specifically, we are interested in studying which concepts researchers use and whether these refer to distinct, specific skills. Second, we want to study the key research streams on digital literacy and its related notions in the literature. Thus, our purpose is both to provide an overview of the field and to highlight how research integrates digital literacy into learning and education. Our research questions are the following: 1) What place does digital literacy hold in the literature on education and learning?
2) How is digital literacy conceptualized in the educational context and what are the main research streams on the topic?
To answer these research questions, we conducted a systematic review of the literature using text mining (Ignatow & Mihalcea, 2018; Thomas et al., 2011). This method is particularly suitable for a systematic literature review because it partly automates the content analysis process and thus makes it possible to process very large volumes of data systematically. Text mining works by associating words or sentences and allows patterns to be extracted from a multitude of documents (Fabbri et al., 2013; Thomas et al., 2011). Our study is based on all articles published between 2000 and 2020 in English-language peer-reviewed journals and indexed in the relevant databases (Web of Science, ERIC and PsycINFO).

Methods
As defined by Moher et al. (2015), the purpose of a systematic review is "to collate all relevant evidence that fits pre-specified eligibility criteria to answer a specific research question" (Moher et al., 2015, p. 3). By adopting a systematic search to identify all studies that meet given eligibility criteria, a systematic review allows a clear synthesis of the characteristics and findings of the studies it includes. While traditional systematic literature reviews are mostly performed manually, we propose using (semi-)automatic methods. Given the huge amount of relevant literature available in the early stage of the literature review process (Ananiadou et al., 2009; Fabbri et al., 2013), we believe researchers may benefit from the automatic extraction allowed by textual analyses (Thomas et al., 2011). Text mining is increasingly used in educational research (Ferreira-Mello et al., 2019), as it helps provide new insights by analyzing huge quantities of information. It has been applied to a variety of data in the field of education but, to the best of our knowledge, has not been used to conduct a systematic literature review of scientific articles in education on the topic of digital literacy.
Given the power of such a tool to analyze large volumes of data (such as the body of literature on digital skills in education), we believe this method can be very fruitful for understanding the literature on digital literacy in learning and education. More specifically, text mining is designed to 1) foster information retrieval, 2) extract information and 3) perform data mining by highlighting both direct and indirect associations between pieces of information (Thomas et al., 2011). Such processes are central when conducting a systematic literature review. We thus used textual analysis to 1) filter and categorize journal articles and 2) summarize the central topics emerging from these articles. Text mining allowed us to extract information about the main concepts studied in the selected articles and to organize these concepts based on their co-occurrence.

Inclusion criteria
To select the studies included in this systematic review, we used the following criteria:
1) The article had to be published in a peer-reviewed journal and written in English. These criteria not only guaranteed scientific quality but also allowed us to gather the most important body of research focusing on digital literacy in learning and education. The language criterion was also necessary in order to perform the analysis with WordStat.
2) The article had to be published between January 2000 and November 2020. We chose 2000 as the starting point because this year marks the emergence of the field of digital literacy in education, shortly after Gilster's seminal work on digital literacy.

Literature research strategy
The studies were identified by searching the PsycINFO, Web of Science and ERIC databases. We searched for all available records from January 2000 to November 2020, using the following combination of keywords in the title or abstract of the article: "digital competenc*" OR "digital (NEAR/2) skills" OR ("digital literacy" OR "e-skills") AND ("learn*" OR "education"). We based this query on the systematic literature review by van Laar et al. (2017). This search yielded an initial pool of 1460 articles. After removing duplicates, 1037 articles remained.
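Duplicate removal across the three databases (1460 records reduced to 1037) is commonly done by matching records on a normalized key. The sketch below is a minimal illustration under assumed inputs: each exported record is taken to be a dict with a `title` and an optional `doi` field, and the normalization rules are hypothetical choices, not the procedure reported for this review.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen; later records sharing a DOI or a normalized title are dropped."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        keys = {"title:" + normalize_title(rec["title"])}
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            keys.add("doi:" + doi)
        if seen.isdisjoint(keys):
            unique.append(rec)
        seen.update(keys)  # remember all keys, even those of dropped duplicates
    return unique


# Hypothetical exports: the first two records describe the same article.
records = [
    {"title": "Digital Literacy in Higher Education", "doi": "10.1000/example.1", "source": "Web of Science"},
    {"title": "Digital literacy in higher education.", "doi": None, "source": "ERIC"},
    {"title": "Teachers' Digital Competence", "doi": "10.1000/example.2", "source": "PsycINFO"},
]
print(len(deduplicate(records)))
```

Matching on titles alone can merge distinct articles that happen to share a title, so a more careful pipeline would also compare authors and publication year.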

Data analysis
We performed text mining with the software WordStat 8 on the titles and abstracts of the 1037 articles selected in the previous section. WordStat is software for the quantitative analysis of textual data (Pollach, 2011). Using dictionaries, it allows users to explore a corpus of texts and to identify the key factors underlying it. Recent literature reviews have been conducted using WordStat (Ćurlin et al., 2019; Jede & Teuteberg, 2015). As specified by Durach et al. (2017), researchers performing a literature review are required to summarize the findings of the different studies. Relying on a textometric analysis allows for more objectivity in summarizing studies: because researchers do not intervene in the coding, they are less likely to introduce biases during the analysis, coding and synthesis of the literature review.
The principle of text mining is that "the frequency with which a content word appears, the statistical relationships between content words and their context all witness to thematic patterns specific to a corpus" (Lavissière et al., 2020, p. 136). Our textometric analysis uses both stemming and lemmatization. Stemming removes derivations and inflections of words to reduce them to their roots, or stems. We used Lovins' algorithm (Allahyari et al., 2017; Vijayarani et al., 2015). Lemmatization identifies the basic form of the words used in the corpus and relates them to their dictionary form. It thus returns the lemma of each word, that is, the dictionary form of a given word (Schütze et al., 2008).
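To make the difference between the two normalization steps concrete, the sketch below contrasts a crude suffix-stripping stemmer with a lookup-based lemmatizer. It is deliberately simplified: `SUFFIXES` is a hypothetical handful of endings, not the much longer ending list, context conditions and recoding rules of Lovins' algorithm, and `LEMMAS` is a hypothetical hand-written table rather than a full morphological lexicon.

```python
# Hypothetical, heavily simplified suffix list; Lovins' algorithm uses a far
# longer list plus context conditions and recoding rules.
SUFFIXES = ["ational", "ization", "ations", "ation", "ities", "ness", "ing", "ies", "ed", "es", "s"]

# Hypothetical lemma table mapping inflected forms to their dictionary form.
LEMMAS = {"studies": "study", "studied": "study", "taught": "teach", "skills": "skill"}

MIN_STEM = 3  # never strip a suffix if fewer than 3 characters would remain


def stem(word: str) -> str:
    """Strip the longest matching suffix while keeping at least MIN_STEM letters."""
    w = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def lemmatize(word: str) -> str:
    """Return the dictionary form if the word is in the table, else the lowercased word."""
    w = word.lower()
    return LEMMAS.get(w, w)


for word in ["Studies", "studied", "taught", "digitalization", "skills"]:
    print(f"{word:15} stem={stem(word):12} lemma={lemmatize(word)}")
```

Here "studies" and "studied" receive different stems ("stud" and "studi") but the same lemma ("study"), while the irregular "taught" can only be linked to "teach" through the lemma table: stems are cheap to compute but need not be real words, whereas lemmas require a dictionary.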
Our analysis thus first consisted in assessing the frequency of phrases across titles and abstracts in our corpus of articles. Then, we performed a co-occurrence analysis on the frequencies of phrases in titles and abstracts. Such an analysis is based on the construction of a dictionary specific to the corpus, which we built from the results of the frequency analysis. Items were included in the dictionary if they 1) occurred more than 100 times and 2) were directly related to learning, education and digital technologies. This dictionary accounted for 11.4% of the total words of the whole corpus. When excluded words (such as "a", "the", etc.) are taken into account, our dictionary accounted for 96% of the corpus. This analysis allowed us to find common phrases in our corpus and then to perform factor analysis, multidimensional scaling and link analysis on our data. Note that we considered "phrases" to be sequences of two to eight co-occurring words. We used the Phi coefficient to measure the association between words: it is the Pearson correlation coefficient computed on two binary variables, and it is interpreted in the same way. We selected the second-order clustering method, which rests on the idea that two words are close to each other not necessarily because they occur near each other or in the same document, but because they occur in similar environments.
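The two association measures described above can be sketched in a few lines. The Phi coefficient is computed from a 2×2 presence/absence table of two phrases across cases (documents); the second-order similarity then compares the two phrases' profiles of phi values with every other phrase. The toy cases are invented, and using cosine similarity for the second-order step is an assumption made for illustration; WordStat's own implementation may differ.

```python
import math


def phi(docs: list[set[str]], a: str, b: str) -> float:
    """Phi coefficient between the presence of phrases a and b across documents (cases)."""
    n11 = sum(1 for d in docs if a in d and b in d)
    n10 = sum(1 for d in docs if a in d and b not in d)
    n01 = sum(1 for d in docs if a not in d and b in d)
    n00 = len(docs) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def second_order_similarity(docs: list[set[str]], vocab: list[str], a: str, b: str) -> float:
    """Cosine similarity between the phi profiles of a and b over all other phrases."""
    others = [t for t in vocab if t not in (a, b)]
    pa = [phi(docs, a, t) for t in others]
    pb = [phi(docs, b, t) for t in others]
    dot = sum(x * y for x, y in zip(pa, pb))
    norm = math.sqrt(sum(x * x for x in pa)) * math.sqrt(sum(y * y for y in pb))
    return dot / norm if norm else 0.0


# Invented cases: each set holds the dictionary phrases found in one title + abstract.
docs = [
    {"social media", "online", "internet"},
    {"social network", "online", "internet"},
    {"classroom", "teacher education", "student"},
    {"classroom", "student", "pedagogy"},
    {"social media", "online"},
    {"social network", "internet"},
]
vocab = sorted(set().union(*docs))
print(round(phi(docs, "social media", "social network"), 3))
print(round(second_order_similarity(docs, vocab, "social media", "social network"), 3))
```

In this toy corpus "social media" and "social network" never appear in the same case, so their phi is -0.5, yet their second-order similarity is about 0.58 because both co-occur with "online" and "internet" and are both absent from classroom-related cases. This is the "similar environments" idea behind second-order clustering.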
Finally, we performed a latent semantic analysis by applying a factor analysis with Varimax rotation in order to synthesize the data into a small number of factors. Data were segmented by document, meaning that the topic modeling was based on the co-occurrence of words within a single article (its title and abstract). Phrases that occurred fewer than 30 times were removed, as is advised to ensure the stability of the factorial solution. We selected factors that accounted for at least 20% of the cases. To define our factors, we retained words whose loadings (the standardized link between a word and its factor) were higher than .3. We chose to perform a factor analysis instead of a hierarchical cluster analysis because, in a factor analysis, one word may be associated with more than one factor, which we believe is more realistic. Indeed, this may not only reveal the polysemous nature of words but also highlight that some words appear in multiple contexts.
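The rotation step and the .3 loading cut-off can be sketched with NumPy. The `varimax` function below is a common SVD-based formulation of varimax without Kaiser row normalization; WordStat's internal implementation is not described in the text, so this is an illustrative sketch on a made-up loading matrix, not a reproduction of the analysis. Comparing absolute loadings in `retained_terms` is an assumption, since the sign of a rotated factor is arbitrary.

```python
import numpy as np


def varimax(loadings: np.ndarray, max_iter: int = 1000, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation of a (terms x factors) loading matrix.

    SVD-based iteration, without Kaiser row normalization.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation matrix
        grad = loadings.T @ (rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt  # closest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # negligible relative improvement: stop
        criterion = new_criterion
    return loadings @ rotation


def retained_terms(terms: list[str], rotated: np.ndarray, threshold: float = 0.3) -> dict[int, list[str]]:
    """Terms whose absolute loading on each factor exceeds the threshold."""
    return {
        f: [t for t, value in zip(terms, rotated[:, f]) if abs(value) > threshold]
        for f in range(rotated.shape[1])
    }


# Hypothetical example: a clean two-factor structure hidden by a 30-degree rotation.
terms = ["information", "information literacy", "social media", "online"]
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
theta = np.deg2rad(30)
turn = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = simple @ turn
rotated = varimax(unrotated)
print(retained_terms(terms, unrotated))  # every term cross-loads above .3
print(retained_terms(terms, rotated))
```

Before rotation every term loads above .3 on both factors; after rotation each term loads on a single factor, the "simple structure" varimax aims for. Because the rotation is orthogonal, each term's total variance explained (its communality) is unchanged. Convergence of this iteration can be slow on highly symmetric toy matrices, hence the generous iteration cap.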

An ever-growing proliferation of publications in numerous journals
Results reveal a growing interest in the study of digital literacy and education, as reported in Fig. 1. This figure reveals that very few studies were published on the […] These results may seem logical and obvious, as digitalization as such is quite a recent concept that has really boomed over the last decade. They show, however, how research adopts a topic and makes it a fundamental topic of research in a relatively short span of time, as illustrated by the progression between 2011 (42 articles published) and 2019 (242 articles published). More precisely, a threshold was crossed in 2015, when the number of publications exceeded one hundred in a year, almost doubling the number from the previous year. The numbers for the following years confirm this trend, highlighting the growing importance of digital literacy and its relevance for research. Four journals have published more than 20 papers on the topic, with "Computers & Education" leading with 41 articles and the "International Journal of Digital Literacy and Digital Competence" following with 27 articles. Beyond these major contributors, a very large number of journals have published on the topic. The topic of digital literacy and learning is not limited to journals in the field of education but is also studied in other fields such as medicine and health, technology, organizational behavior, and so on. This highlights how widespread digital literacy is as a topic in the literature, appearing in numerous journals with terminologies, research angles, methods, and concerns that differ strongly.
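As a rough illustration of the growth described above, the two yearly counts given in the text (42 articles in 2011, 242 in 2019) imply a compound annual growth rate of (242/42)^(1/8) - 1. This figure is derived here for illustration only: it is not reported in the study, and it assumes smooth growth between the two years.

```python
def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


rate = compound_annual_growth(42, 242, 2019 - 2011)
print(f"{rate:.1%}")  # about 24.5% per year
```

A steady rate of roughly a quarter per year is what "almost exponential" growth means in practice: at that pace, the yearly number of publications doubles in a little over three years.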

A plurality of terms
We now turn to the content of our analysis itself. In Table 1, we report a frequency table of the phrases that occurred most often across all articles. As specified in the methods section, we considered "phrases" to be sequences of at least two co-occurring words. Phrases were kept if they occurred more than 100 times in the data. This table clearly highlights that "digital literacy" is a central concept, as it occurred 1734 times within 849 of the 1037 articles in our corpus. It is followed by "digital competence" (556 occurrences among 235 articles), "digital skills" (455 occurrences among 308 articles), and "digital competencies" (255 occurrences among 148 articles).
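Table 1 reports two different counts for each phrase: its total number of occurrences and the number of articles in which it appears. The sketch below computes both for word n-grams of two to eight words and applies a "more than N occurrences" cut-off. It is a simplified, hypothetical stand-in for the dictionary-based WordStat pipeline: the texts are invented, and no stop-word exclusion, stemming or lemmatization is applied.

```python
import re
from collections import Counter


def count_phrases(texts: list[str], min_len: int = 2, max_len: int = 8) -> tuple[Counter, Counter]:
    """Count word n-grams of min_len..max_len words.

    Returns (occurrences, documents): the total occurrences of each phrase,
    and the number of texts containing it at least once.
    """
    occurrences: Counter = Counter()
    documents: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
        found = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                phrase = " ".join(tokens[i : i + n])
                occurrences[phrase] += 1
                found.add(phrase)
        documents.update(found)  # each phrase counted at most once per text
    return occurrences, documents


def frequent_phrases(occurrences: Counter, min_count: int) -> dict[str, int]:
    """Phrases occurring more than min_count times (the study used 100)."""
    return {p: c for p, c in occurrences.items() if c > min_count}


abstracts = [  # invented titles/abstracts
    "Digital literacy and digital skills in higher education.",
    "Teachers' digital literacy: a digital literacy framework.",
    "Digital competence of students.",
]
occurrences, documents = count_phrases(abstracts)
print(occurrences["digital literacy"], documents["digital literacy"])  # 3 occurrences in 2 texts
print(frequent_phrases(occurrences, 2))
```

Keeping both counters makes it possible to report figures of the form "1734 occurrences within 849 articles": a phrase repeated within one abstract raises its occurrence count but not its document count.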
A first striking point is the plurality of terms used on this topic. This point has already been highlighted in various literature reviews (Aviram & Eshet-Alkalai, 2006; Spante et al., 2018), in which scholars have tried to decipher the relationships between e-skills, digital literacy, information and communication technology (ICT) literacy, digital skills, digital competence, and so on. The biggest issue when trying to make sense of the relationships between such a variety of terms is to understand to what extent they are similar or distinct. By going back to the definitions of the concepts used in the literature, we can note some differences in the way they are defined and used.
Definitions of digital literacy are numerous in the education literature, but all trace back to the original definition suggested by Gilster (1997), who defines digital literacy as "the ability to understand and use information in multiple formats from a wide range of sources when it is presented via computers" (Gilster, 1997, p. 1). This definition offers a very interesting starting point as to what digital literacy encompasses, by emphasizing that it is not only about technical skills but also includes a cognitive dimension (van Laar et al., 2017; Spante et al., 2018). Avila and Pandya (2013) further emphasize the critical-thinking dimension of digital literacy by coining the term "critical digital literacies" (Avila & Pandya, 2013, p. 3). Other scholars, such as Aviram and Eshet-Alkalai (2006), Ng (2012), and Tuamsuk and Subramaniam (2017), go even further in suggesting that digital literacy has another dimension, namely a socio-emotional one. In this perspective, digital literacy also integrates online behaviors and the sensitivity required to behave appropriately (Eshet, 2004). One of the first striking things when comparing these definitions is that there is no consensus on the actual definition of digital literacy. Scholars agree that digital literacy goes beyond technical aspects to include cognitive aspects. Beyond that, digital literacy appears as a multifaceted notion, with different scholars emphasizing different elements.
The second most recurring term in our corpus is "digital competence" or "digital competencies". It is broadly defined by Picatoste et al. (2018) as "a set of different skills for achieving a good performance on digital society" (Picatoste et al., 2018, p. 1033). This definition is interesting because it emphasizes the […] (Scuotto & Morellato, 2013) and knowledge production (Cazco et al., 2016). These definitions are interesting because they articulate digital competence around practical aspects of using digital tools. In contrast, digital literacy seems to focus more on processing and communicating information.
The third term that appears most frequently in our corpus is "digital skills". Here again, the term has been widely used in the literature and many definitions exist. Van Dijk, for example, defines digital skills as the "set of skills that users need to operate computers and their networks, to search and select information, and the ability to use them for the fulfillment of one's goals" (van Dijk, 2006, p. 73). This definition distinguishes three dimensions of digital skills: technical skills (i.e. the ability to operate a computer or other kinds of digital technologies), information-seeking skills (i.e. the ability to browse and select relevant information), and strategic skills (i.e. using technical and information skills in order to achieve something) (van Dijk, 2006). van Deursen and van Dijk (2009) further distinguish between the technical aspects and the content aspects of digital skills to account for the specificities of online content. The way digital skills are defined in the literature seems to emphasize both technological aspects and aspects related to the medium of communication.
Table 1 further highlights that the term 'twenty-first century' appears 160 times. This term can be associated with twenty-first century skills or twenty-first century digital skills, terms that have been popularized by van Laar et al. (2017, 2019). They provide a framework for defining twenty-first century digital skills that identifies seven core skills: technical, information management, communication, collaboration, creativity, critical thinking and problem solving. Twenty-first century digital skills, as such, consist of a broad array of competencies that are crucial in order to successfully accomplish tasks in the digital twenty-first century. Van Laar and colleagues further study how determinants such as education level, age, and social support influence twenty-first century digital skills; these determinants thus need to be taken into account by educators and policy-makers (van Laar et al., 2019).
Based on Table 1 and on the definitions available in the literature, it appears that the education literature contains a multiplicity of overlapping concepts, each with its own specificities and inclinations. The pervasiveness of terms such as digital literacy, digital competence(s), digital skills, and twenty-first century digital skills casts doubt on whether these terms are used appropriately in the literature. It raises questions about the extent to which scholars use them with a specific intent in mind; Fieldhouse and Nicholas (2008), for example, note that terms such as 'literacy', 'fluency', and 'competency' are often used interchangeably. Table 1 highlights the breadth and heterogeneity of the terminology used in the literature; the results of our co-occurrence analysis emphasize this global lack of precision in the terminology and in its use.

Fig. 2. Dendrogram of the co-occurrence of phrases (generated by WordStat)

Co-occurrence analysis: Classroom versus everyday life
In the following section, we report the results of a co-occurrence analysis based on the frequencies of phrases in titles and abstracts. Figure 2 is a dendrogram describing the similarities between phrases, and Table 2 reports the similarity matrix, whose coefficients can be interpreted as correlation coefficients. Note that the values of the coefficients are not based on the frequencies of the words but rather on the co-occurrences of specific words within a case. Two main groups of phrases can be extracted from the dendrogram and from the similarity table. On the one hand, themes related to information literacy, digital media, literacy and social media form one group of concepts. Indeed, social media and social networks are related to "online" (both phi higher than .5) and "internet" (phi = .486). Surprisingly, the term "internet" is negatively associated with concepts related to learning and education (such as "pedagogy", phi = −.729; "teaching", phi = −.652; "teaching and learning", phi = −.642; "school", phi = −.517; and "student", phi = −.523). This leads us to the second main group of phrases, which are related to the classroom. This group includes phrases such as "learning", "language", "pedagog*", "school", "student", "teach*", "teacher education" and "teaching and learning", which all have an association higher than .5 with the concept of "classroom". These terms are also strongly related to the term "competence" (association higher than .4). Moreover, competence is strongly related to "train*" (phi = .713) and "universit*" (phi = .443).
This dendrogram provides a visual representation of how the terms in our corpus are associated. It is interesting to note that two groups of phrases emerge, with terms associated with the educational and learning environment on the one hand, and those with a more practical focus on the other. We can therefore distinguish in our corpus a literature that deals with the development of digital skills from a literature that focuses on the operationalization of such digital skills. Figure 2 therefore displays two major research streams on digital literacy.

Factor analysis
The factor analysis yielded six factors, all of which had eigenvalues higher than 1. Results are described in Table 3 below, where factors are sorted by decreasing eigenvalue.
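The eigenvalue-greater-than-1 check mentioned here corresponds to the Kaiser criterion applied to the correlation matrix of the terms. A minimal NumPy sketch on a hypothetical correlation matrix (not the study's data) follows; with real data, the matrix could be obtained with `np.corrcoef(doc_term, rowvar=False)` from a binary documents-by-phrases matrix `doc_term`, a name introduced here for illustration.

```python
import numpy as np


def kaiser_criterion(corr: np.ndarray) -> tuple[np.ndarray, int]:
    """Eigenvalues of a correlation matrix (largest first) and how many exceed 1."""
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigenvalues, int(np.sum(eigenvalues > 1.0))


# Hypothetical term correlations: two blocks of two terms, r = .6 within a block, 0 across.
corr = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])
eigenvalues, n_factors = kaiser_criterion(corr)
print(np.round(eigenvalues, 2), n_factors)
```

Each correlated pair contributes one eigenvalue of 1.6 and one of 0.4, so two factors pass the threshold. An eigenvalue above 1 means the factor accounts for more variance than any single standardized term does on its own.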
The first factor highlighted in the factor analysis (eigenvalue = 2.48) refers to the importance of information (loading = .566) and information literacy (loading = .534). This factor highlights how central information and communication are in the literature on digital literacy: they can be considered the 'informational base' of digital literacy, the two fundamental foundations from which it stems. Our analysis also highlights other keywords emphasizing the importance of information channels, such as digital information, information sources, or health information.
A substantial body of research focuses on the technological base underlying information and communication and on the multiplicity of tools available in the educational context (see Pinto et al., 2020 for a review). Actions such as gathering and transmitting information, communicating efficiently, communicating through presentations and video images, and collaborating and working with documents online through mobile devices are the essence of digital literacy (Vázquez-Cano et al., 2020). The fact that this is the most important factor in our corpus confirms the importance of information and communication as the two major pillars of digital literacy.
The second factor (eigenvalue = 1.97) refers to the importance of the terms digital (loading = .466) and literacy (loading = .672). Central themes of this factor are digital literacy, digital skills and competence, as well as digital technologies. The fact that digital literacy, digital skills and competence are grouped within the same factor confirms the tendency, already highlighted in our descriptive analysis, to use the terms digital competence, digital literacy and digital skills interchangeably. While these terms are supposed to be used in different fields of application, they are combined in the same factor, related to the educational context for the acquisition of digital literacy. This factor focuses on the development of digital literacy, as it is related to keywords such as classroom (loading = .389) and pedagogy (loading = .227). We name this factor "developing digital literacy: how to become digitally literate".
Among the articles selected in our review, several highlight the importance of teaching digital skills (Peláez et al., 2020) and propose specific programs for developing (critical) digital literacy pedagogies (Alt & Raichel, 2020; Campbell & Kapp, 2020; Handley, 2018; Knight et al., 2020). Other studies highlight that pupils (Pérez-Escoda et al., 2016) and students (Al Seghayer, 2020), but also adults and teachers (Eynon, 2020; Martín et al., 2020; Sillat et al., 2017), need support to use digital tools, while Bergdahl et al. (2020) suggest that digital skills are related to students' engagement in (technology-enhanced) learning and to learning outcomes (Pagani et al., 2016). This factor not only highlights the importance of digital literacy and the fuzziness surrounding the terminology, but also emphasizes the acquisition of such competence in an educational context. While this is crucial for pupils, the implementation of such digital literacy pedagogies strongly relies on teachers' digital skills (Fernández-Cruz & Fernández-Díaz, 2016), revealing a gap between teachers' actual skills and the skills they would need to offer learning activities using technological tools efficiently. Moreover, teachers' (intention to) use ICT in their courses is strongly related to their motivation (Guillén-Gámez et al., 2019) and attitudes towards ICT (Area-Moreira et al., 2016; Nuzzaci, 2017; Siddiq et al., 2016), their teaching approach (Mirete et al., 2020), ICT school equipment (Lorenz et al., 2019) and ICT school culture (Blau & Shamir-Inbal, 2017). Thus, many scholars (Fernández-Cruz & Fernández-Díaz, 2016; Gómez-Trigueros et al., 2019; Gudmundsdottir et al., 2020; Martín et al., 2020; Mynaříková & Novotný, 2020; Sillat et al., 2017; Tømte, 2015) advocate that teacher training should focus more on developing teachers' professional digital skills, notably by integrating ICT into their curriculum (Pombo et al., 2017).
The third factor (eigenvalue = 1.63) was named "digital learning", as two terms were central to its definition: learning (loading = .627) and university (loading = .532). Notably, results suggest that digital tools are often used in the context of language learning (loading = .287); central themes of this factor are "language learning", "language teaching", "foreign language", "language education" and "language competence". This shows that digital technologies are used and relevant in the context of learning (Alvermann et al., 2012): digital tools are used to foster the learning of languages (Dixon, 2010), reading (Daley et al., 2020) and also mathematics (Gómez-García et al., 2020). This factor also defines another context in which digital technologies are used to foster learning, namely higher education, as "university" has a loading of .532. Numerous digital learning initiatives have been carried out at the higher education level (e.g. Blayone et al., 2018; Hardy & McKenzie, 2020; Spante et al., 2018; Tejedor et al., 2020). Despite such enthusiasm, Liesa-Orús et al. (2020) reveal that university professors may also benefit from training in the acquisition of digital skills.
Thus, factor three highlights how one can learn thanks to digital technologies. In contrast to factor two, which was named "developing digital literacy", factor three focuses on the use of digital technologies as a tool to foster not only learning but also cognition in general (Di Giacomo et al., 2017).
Factor four (eigenvalue = 1.49) highlights the importance of information and communication technologies (loading = .660; ICT, loading = .739) and links them to the educational environment (teaching, loading = .397; school, loading = .464). This factor highlights the technological foundations required for the development of any educational initiative that aims at using or developing information and communication skills (Gómez-García et al., 2020). Factor four has to be considered in relation to factor one, which emphasizes the information and communication dimensions at the basis of digital skills. In the same way, this factor emphasizes the technological infrastructure necessary for the development of digital skills, which can be considered part of the 'technological base' of digital literacy. Several articles report the implementation of ICT in various curricula, such as in primary schools (Borysenko et al., 2020) and secondary schools (Dzhurylo & Шпарик, 2019), as well as with special needs students (Rivera et al., 2017), while others compare ICT and non-ICT settings (Arrosagaray et al., 2019). Whereas information and communication are the basic activities requiring technological skills, information and communication technologies are the tools that enable the development of these skills.
Factor five (eigenvalue = 1.28) highlights the media dimension of digital technologies. Notably, social network (loading = .534), social media (loading = .500), online (loading = .499), internet (loading = .425) and web 2.0 (loading = .249) are important words associated with this factor. Factor five focuses on a specific aspect of digital communication tools, namely social media, which can be considered another part of the "technological base" of digital literacy. Social media represents a cornerstone of digital communication and requires digital literacy (Durak & Seferoğlu, 2020): interacting with people on social networks, gathering information and communicating all require digital literacy. In a way, this factor refers to the context in which digital skills are operationalized in everyday life. Various studies highlight the importance of social media in everyday life (e.g. Akayoglu et al., 2020; Correa, 2016), further revealing that its use may foster digital skills (Cole et al., 2017; de Mesa & Jacinto, 2020) and learning (Rwodzi et al., 2020), and thus that it could be used as a teaching tool (Dennen et al., 2020). Interestingly, these studies suggest that while social media is heavily used, users nevertheless appreciate guidance provided by their learning environment (Akayoglu et al., 2020) or, more generally, by the social structure they are part of (Eynon & Geniets, 2016). Moreover, these results further highlight the importance of education regarding 1) the use of social media (Correa, 2016) and 2) the resources that can be mobilized from the internet (van Ingen & Matzat, 2018).
The last factor (eigenvalue = 1.26) highlights the importance of technology (loading = .549) in education (loading = .405) and training (loading = .455) and links digital technology (loading = .379) and knowledge (loading = .305). Central topics are "content knowledge", "knowledge and skills", "knowledge society", "communication skills" and "problem solving". This conjunction of keywords seems to be quite in line with the central components of twenty-first century digital skills. Thus, we name this factor "twenty-first century digital skills" to connect it with the education literature and to emphasize the importance of the capacity to interact with digital technology. Indeed, this factor groups keywords focusing on how people may use technology not only to foster knowledge (Higgins, 2014), but also to deal with specific issues, to (collectively) solve problems (Sun et al., 2020), to think critically (Cladis, 2020; Higgins, 2014; Kivunja, 2014; Novakovich, 2016) and to communicate. Studies highlight the importance of the educational context in fostering such competences (Petrucco & Ferranti, 2017), notably through the development of a critical pedagogy (Coker, 2020) and teaching interventions that encourage skepticism (Walton et al., 2018). Thus, this factor refers to questions such as "what is the end goal of digital literacy?" and "how, and for what, can we use this competence in everyday life?", falling well in line with van Laar and colleagues' definition of what twenty-first century digital skills are about.
It is important to analyze Tables 3 and 4 jointly in order to keep in mind the continuity between the keywords, the factors and their correlations. Table 4 below reports the correlations between latent factors; values higher than 0.4 are highlighted in bold. These correlations reveal that the latent factors fall into two main groups. The first group combines the factors digital learning, digital literacy, ICT and twenty-first century digital skills. Considering the items these factors gather and the relationships among them, this first group mainly involves factors related to learning and the educational environment. The second group includes twenty-first century digital skills, information literacy and social media. Given the keywords that form these factors, this second group consists of factors with a stronger practical orientation. Notably, twenty-first century digital skills bridge the two groups and, in a sense, connect the classroom to everyday life.
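The grouping step can be illustrated with a small sketch that flags factor pairs whose correlation exceeds the 0.4 cut-off used for bolding in Table 4. The factor labels F1 to F4 and all correlation values are hypothetical placeholders, not the values from Table 4, and the helper names `strong_links` and `neighbours` are our own. The sketch compares raw (signed) correlations, so strong negative correlations would not be flagged.

```python
from itertools import combinations

def strong_links(labels, corr, threshold=0.4):
    """Return factor pairs whose correlation is strictly higher than the threshold."""
    links = []
    for i, j in combinations(range(len(labels)), 2):
        if corr[i][j] > threshold:
            links.append((labels[i], labels[j]))
    return links

def neighbours(labels, corr, threshold=0.4):
    """Map each factor to the set of factors it is strongly correlated with."""
    result = {name: set() for name in labels}
    for a, b in strong_links(labels, corr, threshold):
        result[a].add(b)
        result[b].add(a)
    return result

# Hypothetical symmetric factor correlation matrix (NOT the values of Table 4).
labels = ["F1", "F2", "F3", "F4"]
corr = [
    [1.00, 0.55, 0.10, 0.45],
    [0.55, 1.00, 0.05, 0.20],
    [0.10, 0.05, 1.00, 0.48],
    [0.45, 0.20, 0.48, 1.00],
]

print(strong_links(labels, corr))  # [('F1', 'F2'), ('F1', 'F4'), ('F3', 'F4')]
```

In this toy matrix, F4 is linked to both F1 and F3, which are not linked to each other, so F4 plays the kind of bridging role described above for twenty-first century digital skills.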

Discussion
The purpose of this review was to provide an overview of the research field of digital literacy in learning and education. More precisely, it aimed to answer the following two questions: 1) What place does digital literacy hold in the literature on education and learning? 2) How is digital literacy conceptualized in the educational context and what are the main research streams on the topic?
Using text mining, this review maps educational research related to digital literacy. A total of 1037 studies published between 2000 and 2020 were included, as they explicitly mentioned digital skills (or related terms such as digital literacy or competence), education and learning in their title and/or abstract. This volume of studies reflects a great deal of interest in the topic. The first question this rapid growth raises is that of the sustainability of the research area: is digital literacy a long-lasting research topic, or will it become outdated in a few years as new technologies emerge? On the one hand, the fact that digital literacy is tightly linked to ICT and digital media suggests that it might not be relevant in a post-digital world. On the other hand, the very concept of digital literacy is more than twenty years old and still relevant, even though technology has evolved tremendously over that period. Moreover, the concept of digital literacy covers more than the technical dimension alone, which suggests that research on the subject still has a long way to go.
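The inclusion criterion can be expressed as a simple filter over titles and abstracts. This is a minimal sketch under stated assumptions: the term lists are hypothetical, matching is a case-insensitive substring test, and a record is assumed to qualify when its title and abstract together contain at least one digital-skills term and at least one of "education" or "learning". The review's actual search query and database syntax are not reproduced here.

```python
# Hypothetical term lists (assumptions, not the review's actual query).
DIGITAL_TERMS = ("digital skill", "digital literacy", "digital competence")
CONTEXT_TERMS = ("education", "learning")

def mentions_any(text, terms):
    """True if any term occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)

def is_included(title, abstract):
    """Keep a record if its title and/or abstract mention a digital-skills
    term and an education/learning term (assumed reading of the criterion)."""
    text = f"{title} {abstract}"
    return mentions_any(text, DIGITAL_TERMS) and mentions_any(text, CONTEXT_TERMS)

print(is_included("Digital literacy in primary schools", "We study learning outcomes."))  # True
print(is_included("Social media use among adolescents", "A survey of online habits."))   # False
```

Substring matching lets "digital skill" also match "digital skills", at the cost of occasional false positives; a real query would typically rely on the search syntax of the bibliographic database.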
Our analysis suggests that the terms used (most notably, digital skills, digital literacy, digital competence(s)) need to be clearly defined, as authors tend to use them interchangeably although each term has its own specificities. These results are consistent with other literature reviews (Spante et al., 2018). The relative youth of the research field (about twenty years old) might explain why the terminology is not yet fully established. Nonetheless, this can generate confusion and misunderstandings, as well as dispersion in the research field. Our descriptive results shed light on this plurality, and our analysis of the corpus provides an overview of the research field and highlights how research integrates digital literacy into learning and education. This allows researchers to better position their work on the subject and to use the appropriate terminology.
Our research allows us to go beyond issues of terminology and map research on digital literacy in the education sciences. Text mining allows us not only to give an overview of the field, but also to examine the different elements that compose it.
Our results highlight a fragmentation in this field: on the one hand, there are studies focusing on digital literacy in an educational context (i.e. in classrooms and other learning contexts). On the other hand, there is research highlighting practical aspects of the digital world, such as the use of social media and social networks and, more generally, information literacy. This fragmentation is consistent with Stopar and Bartol (2019), who also identify distinct research clusters. It allows us to identify three main streams of research, namely learning digital literacy, digital learning and twenty-first century digital skills, as well as two fundamental dimensions that support the digital ecosystem: the informational base and the technological base.
Digital literacy and digital learning are at the heart of the research field, as they constitute its core focus. Research is booming, and more and more studies investigate either the development of digital literacy or digital learning setups. There is a high correlation between these two research streams, but more research is needed to combine them. Research  Cazco et al., 2016;Hatlevik, 2017;Yu et al., 2017). Research could also specifically target vocational teachers, as their role in both fostering digital literacy and creating digital learning environments is crucial (e.g. Area-Moreira et al., 2016;Gómez-Trigueros et al., 2019;Gudmundsdottir et al., 2020;Guillén-Gámez et al., 2019;Martín et al., 2020;Mynaříková & Novotný, 2020). Research also suggests that teacher training should place more emphasis on vocational teachers' development of digital skills, notably by integrating ICT into their curriculum (Pombo et al., 2017).
The twenty-first century digital skills factor reveals another research axis in the field, one that bridges the technological base and the informational base. Rather than asking about digital learning or learning digital literacy, the issue here is how these skills are put into practice at the intersection of technology, information and communication. This factor therefore illustrates a third research axis in the literature on digital skills, which is more concerned with understanding what these digital skills are made of and how they can be used in everyday life. Research in this stream can step out of the traditional field of education and learning to tackle other issues related to twenty-first century digital skills, for example in the workplace (van Laar et al., 2019).
The technological aspect is very important and can be found in the ICT and social media factors. They represent the "technological base" without which the whole digital ecosystem would not exist, which explains their central role in research. Because digital communication tools are constantly evolving, studies focusing on these tools remain relevant: they address a changing reality that needs to be understood. Beyond its technological dimension, the social media factor must also be linked to everyday life, in which social media is used for all kinds of activities. This embeddedness in daily life is important to take into account.
Information and communication are also a key element of this research area, constituting what could be called the "informational base" without which the whole digital ecosystem would not exist. In this sense, the information literacy factor puts the emphasis on data, information and communication, which are the core forms of expression in digital literacy. The performance of information and communication tasks is at the heart of digital literacy, and this "informational base" appears in both research streams.
Our study has several limitations. The first is methodological and concerns text mining. Traditional systematic literature reviews focus on the most relevant articles by establishing very precise selection criteria and analyze them in depth; our text-mining approach instead aims to process a very large number of articles (the whole body of articles published on the topic) and analyze them automatically. This does not allow us to reach the depth of analysis of traditional systematic literature reviews, but it is well suited to gaining a global understanding of the research field. The objective of mapping the entire literature is incompatible with more in-depth work on the content of the articles. Future studies, however, could dig deeper into the factors identified in our analysis and examine how they operate in the research field in order to make sense of their specificities.
Another limitation of text mining is that it relies on keywords and can therefore miss important topics that are not signaled by appropriate keywords. For example, private context does not appear in our analysis, even though an important body of literature highlights the importance of personal factors. This literature focuses on children's first use of ICT (Juhaňák et al., 2019), cultural background (Gui & Argentin, 2011), socioeconomic status (Hatlevik et al., 2015;Jara et al., 2015;Zhong, 2011), active parental mediation of ICT use (Livingstone et al., 2017) and parental level of education (Cabello-Hutt et al., 2018). The absence of this dimension from our analysis might be explained by the lack of a single term encompassing all these sub-themes, that is, of a keyword that the automated analysis could retrieve. This dimension nonetheless plays an important role in the literature, as it appears to be a determinant of digital literacy.
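The mechanism behind this limitation can be illustrated with a toy example. Assuming, hypothetically, that a keyword enters the analysis only if it appears in at least three abstracts, a theme spread over several distinct sub-theme terms can be present in every abstract while none of its terms passes the cut-off. The corpus, the cut-off and the function name `retained_keywords` are invented for illustration and do not reflect the preprocessing actually used in this review.

```python
from collections import Counter

# Assumed document-frequency cut-off (hypothetical, not the review's setting).
MIN_DOCS = 3

def retained_keywords(docs, min_docs=MIN_DOCS):
    """Keep keywords that occur in at least `min_docs` documents."""
    counts = Counter(kw for doc in docs for kw in set(doc))
    return {kw for kw, n in counts.items() if n >= min_docs}

# Hypothetical corpus: each abstract reduced to the set of keywords it contains.
docs = [
    {"digital literacy", "socioeconomic status"},
    {"digital literacy", "parental mediation"},
    {"digital literacy", "cultural background"},
    {"digital literacy", "parental education"},
    {"digital literacy", "first use of ICT"},
    {"digital literacy", "socioeconomic status"},
]

# Sub-theme terms that together make up the "private context" theme.
private_context = {"socioeconomic status", "parental mediation",
                   "cultural background", "parental education", "first use of ICT"}

print(retained_keywords(docs))  # {'digital literacy'}
docs_on_theme = sum(1 for doc in docs if doc & private_context)
print(docs_on_theme)  # 6
```

Every abstract touches the private-context theme, yet no individual term reaches the cut-off, so the theme never surfaces as a keyword.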
Finally, our study is limited by its focus on education. This focus on educational research gives us a comprehensive view of the whole field, but limits our understanding of digital skills and their applications in daily life, notably in the world of work. Future studies could aim to bridge the classroom and the workplace in order to gain a more global perspective on digital skills and produce results that benefit both (Ahlquist, 2014;Alvermann et al., 2012;Kivunja, 2014).

Conclusion
This research aimed to provide an overview of the research field of digital literacy in learning and education. Using text mining, we reviewed 1037 research articles published on the topic between 2000 and 2020. This review reveals that there is a plurality of terms associated with digital literacy: researchers tend to use the terms "digital literacy", "digital skills", "digital competence" and "21st century digital skills" interchangeably, although these terms should be used with caution.
Our results further emphasize the fragmentation of research between classroom-based studies and research focusing on everyday digital skills. Future studies should build on the 21st century digital skills stream to bridge the two.
Finally, our research identifies six key factors that define the literature. These factors can be grouped into three main streams, which are 1) digital literacy, 2) digital learning and 3) 21st century digital skills. This review not only highlights the importance of individuals' development of digital literacy, but also sheds light on the critical role of digital technologies in education. The 21st century digital skills factor offers a wider perspective on the use of digital literacy beyond the educational context. These three streams are supported by informational and technological foundations, further emphasizing the role of information and technology in this topic: research on digital literacy cannot be separated from its informational and technological prerequisites. This study has several limitations. On a methodological level, text mining only allows for a surface analysis of the field and may miss topics that are not signaled by distinctive keywords. On a more general level, we focused only on the field of education, even though digital literacy is also relevant in other areas of research such as management.