Classification and evaluation of educational apps for early childhood: Security matters

This study explored the vital characteristics and potential profiles of certain popular educational apps (n1 = 50) for kindergarten kids. The profile analysis involved a categorization derived from an evaluation process conducted by pre-service early childhood teachers (n2 = 295) at the University of Crete, Greece, using a new instrument, validated in the present research, the ETEA-2 scale. The categorization criteria were the five dimensions of the ETEA-2: Learning, Suitability, Usability, Security, and Parental Control. The classification based on Latent Class Analysis led to three app profiles: Cluster/Profile 1 includes apps that have high values in Learning, Usability, Suitability, and medium Parental Control and Security; Cluster/Profile 2 includes apps with medium Learning, Usability, Suitability, but low Parental Control and high Security; Cluster/Profile 3 includes apps with medium Learning, Usability, Suitability, but low Parental Control and low Security. This profile scheme is an indicative categorization summarizing the crucial features that popular apps possess and can help parents and/or educators decide on the desirable application for their kids. Moreover, an independent evaluation of these specific fifty apps, sought on the internet, showed that the members of Cluster/Profile 2 were the most popular and preferable, as suggested by the number of downloads. This profile is distinguished for the security dimension.


Introduction
In the last decade, smart mobile technology has become popular among young kids (Nikolopoulou, 2020), as over 50% of the available educational apps target preschool children (Callaghan & Reich, 2018). Historically, this trend was combined with an expressed need for excellence in teaching, promoting digital technologies addressed to young children, and a new educational orientation supported by politicians (Bourbour, 2020). Early childhood education does not ignore interactive smart screen technologies, and a special interest in researching how these devices can improve the learning procedure in formal and informal settings has emerged (Tavernier, 2016). A growing body of research into preschool children's smart tools states that digital devices have become pervasive, whilst educators respond pedagogically to the new opportunities, constantly incorporating these types of devices into their daily teaching practice (Fleer, 2020).
Preschoolers tend to have few difficulties in navigating touchscreen devices, so they feel motivated to use them in diverse ways, such as engaging in reading digital books, even if they were not interested in reading print books. Many educators seek novel ways to motivate learning by incorporating interactive mobile technology in their classes (Eutsler & Trotter, 2020). Applications (apps) can support children's active engagement by embedding educational concepts into game-like activities. They can also scaffold children's learning through adaptive learning technology, provide feedback and rewards through gameplay, and promote repeated practice of critical foundational skills (Griffith & Arnold, 2019; Hirsh-Pasek et al., 2015b).
However, the expansion in quantity has not been accompanied by an associated increase in quality. Studies on the usability of tablet games for children typically occur in laboratory settings, and they do not capture the context of use. Designing technology for children comes with unique ethical challenges and responsibilities related to the inclusion of children in the research and design processes and the outcomes of that work (Frauenberger et al., 2018). Unfortunately, most apps that parents and teachers find under the "educational" category in app stores have no evidence of efficacy. Instead, they primarily target rote academic skills, use little or no input from developmental specialists or educators, and are rarely based on established curricula (Guernsey & Levine, 2015). Most of these apps treat children as young adults, not considering the unique needs of this age group, such as the fine motor skills needed to handle an app. This is particularly important as children constantly engage with technology, which usually lacks quality in aligning with their developmental level or needs (Callaghan & Reich, 2018; Plowman et al., 2012).
Educators need support in choosing apps with educational value. Choosing an app can be overwhelming; the educational category in app stores often does not help educators find developmentally appropriate apps. When examined for educational value, many of the apps tagged as appropriate for young children were inadequate and often mislabeled the acceptable age range (Fantozzi, 2021; Sari et al., 2017, 2019). Apps are a new pedagogical tool, and, as such, they should be subjected to the same "quality control" as other tools to ensure their effectiveness in teaching kids in a developmentally appropriate way. Those concerns led researchers, during the past years, to investigate further the process of designing apps aiming at children's engagement in learning (Callaghan & Reich, 2018).

Literature review
The recent development of new information and communication technologies has significantly changed the role of these technologies in daily life (Nguyen et al., 2014). Indeed, statistics show that about 75% of the kids in the USA, as young as four years old, own a personal smart mobile device (Kabali et al., 2015). Following this trend, children in the UK favour touchscreen devices over desktop computers or even laptops (Ofcom, 2020). Studies have shown that, by the age of three years old, children can tap and swipe independently on such devices (Marsh et al., 2015; Vatavu et al., 2015), an ability that can support them to learn independently, either in traditional or informal learning environments (Booton et al., 2021).
New mobile technologies have propelled apps and software supporting m-learning (Luna-Nevarez & McGovern, 2018). When Apple's iTunes App Store and Google's Android Market first launched in 2008, smartphone users could choose from about 600 apps (Federal Trade Commission, 2012). More than 470 million educational apps are now in the Apple App store and 466 million in the Android Market. During the first quarter of 2020, the COVID-19 pandemic caused a surge in downloads of educational apps (Leci, 2021). With the advancement of interactive technology and more user-friendly touchable interfaces, these devices in early-year classrooms appropriately prepare young children for the 21st century (Miller, 2018). The new digital mobile technologies bear some advantages for incorporating them in educational settings; they are portable. Their touchscreens are easy to use and require a small workspace while allowing for multiple people viewing (Lawrence, 2017). Compared to mouse-operated computers, a tablet's accessible touch-based operational features (e.g., tap, slide, swipe) make it intuitive. They are convenient due to their portability and size. The attractive multimodal features of apps (e.g., animations, audio, colourful graphics, highlighted texts) stimulate a child's visual, auditory, kinesthetic, and tactile senses and deliver immediate feedback. The interactive nature of tablets gives children the autonomy and agency to select their activities. The multi-functionality of tablets increases the chance of capturing a child's interest, offering a range of activities with one device (Neumann, 2020). Thus, these apps are often called 'edutainment' due to their formats that resemble games, the visual aids they provide, and their didactic approach to teaching educational content (Okan, 2003).
Most children use interactive smart screen technologies to achieve different purposes, some of which may be described as playful, cooperative, and interactive (Marsh et al., 2018). The Joan Ganz Cooney Center named five ways mobile media devices and apps can change children's education. They can encourage the 'anywhere, anytime' type of ubiquitous learning (i.e., quick and ready access to technologies), thus promoting situated learning and breaking the barrier between home, school, and after school; fit with diverse environments; improve 21st-century social interactions; enable a personalized experience; and reach underserved children (Looi et al., 2011; Shuler, 2009).
Among the scopes of preschool education is the development of children's early literacy and numeracy skills. These skills are known predictors of later academic achievement and advance through developmentally appropriate apps (Hoareau et al., 2021). These apps focus on the excellent design of open-ended and child-driven activities to engage young students in learning (Hirsh-Pasek et al., 2015a;Reed & Takeuchi, 2011). The appropriateness of the developmental aspect of the software used significantly impacts children's education because developmentally inappropriate software can negatively affect children's creative skills. For instance, interfaces for young children should be highly visual, avoiding text as much as possible to reduce cognitive load (Soni et al., 2019). Appropriate apps should include prosocial content, non-violent stories, and characters, promote diversity in terms of gender and culture, and have low levels of advertising. Content needs to have the right place for young children to enhance their executive function instead of hindering it (Papadakis, 2021). Given suitable apps, educators can consider how different apps meet children's play interests. Also, they can create new opportunities for children to mix and match modes of communication in digital forms such as video, audio, images, and text, which is vital for sustaining children's in-app digital play (Troseth et al., 2016).
The rapid changes in communication forms have created an environment where parents and educators of young children find themselves in an unknown context, which heavily demands new everyday practices. They must familiarize themselves with the new trends, find easily accessible mobile technologies and support ways to make them attractive and available to young children (Laidlaw et al., 2019). To provide children with high-quality apps that promote play and creativity, caregivers should be able to assess the quality of apps so that they can buy the best in terms of suitability and educational value (Marsh et al., 2018). There are about 80,000 apps promoted as 'educational' (Healthy Children, 2018). However, researchers agree that most apps advertised for children lack educational value and any foundation in relevant research findings (Ólafsson et al., 2013). Indeed, educational benefits are rarely known, as few edutainment games are assessed for their benefits or outcomes (Guernsey & Levine, 2015; Hirsh-Pasek et al., 2015b; Nikolayev et al., 2020).
Furthermore, due to design limitations, e.g., poor interface design practices and lack of guidance and feedback during the interaction, children might have difficulty learning the content as intended (Soni et al., 2019). Another crucial factor is that, even though over 95% of apps in the Android Play Store were classified as 'free', the cost of developing and marketing mobile apps is substantial, so alternative forms of revenue generation (monetization) are often necessary for developers in this context. Most mobile apps can be used free of charge and thus rely on alternative ways of monetization, such as advertising (Fitton & Read, 2019). The implications of in-app advertisements and buying opportunities for adults have been studied. However, despite the well-documented susceptibility of younger users to manipulation, the implications for them have received comparatively little consideration (Fitton & Read, 2019).
The above shortcomings pose severe challenges for educators and parents in finding high-quality apps (Livingstone et al., 2018). Thus, caregivers could be assisted by a tool designed to evaluate educational apps based on early years learning theory. Such an instrument could also aid app developers in ensuring that their products consist of high-quality elements (Kolak et al., 2021). In this context, the present research investigates the features of effective educational apps. It provides a classification and evaluation of several prevalent cases, revealing the crucial criteria that guide the selection made by parents and/or educators.

Aim and research questions
The aim of the present endeavour is three-fold. The first aim is to present an evaluation instrument for apps along with evidence of its validity and reliability. The second aim is to achieve a valid classification of several existing popular apps into profiles, that is, to reveal potential distinct groups with specific characteristics measured by the dimensions of the proposed instrument. Note that such a classification is not a priori valid, and the hypothesized clusters/profiles should be verified by a proper method. Latent Class Analysis (LCA), a statistical model-based method, was used in this study; thus, the findings can be generalized as prevailing tendencies in the population. The potential profiles/trends will reflect the existing apps' aggregated features by design. The third aim is to investigate which of the ensuing profiles is the most popular or preferable among parents. Based on the above aims, the following research questions were posited:
-RQ1: Does the proposed instrument for evaluating apps provide valid and reliable measures?
-RQ2: Based on the dimensions of the proposed instrument, can a valid classification of the selected apps be derived?
-RQ3: What are the characteristics of the ensuing profiles (if any)?
-RQ4: Is any app profile more prevalent in terms of popularity among caregivers?

Procedures and data collection
The present study followed international research ethics guidelines (Petousi & Sifaki, 2020).

Selection of apps
For the selection and analysis of apps, the procedure described previously in the literature was adapted (Soni et al., 2019), leading to a two-phase process. First, 200 free apps were randomly selected between February and May 2021 from the 'Children' category apps that belonged to Education or Games in the Google Play app store. In the next step, the following exclusion criteria were used to find the apps targeted at the intended audience:
-Apps with fewer than 500 user ratings were removed.
-Apps intended to be co-used with an adult were removed.
-Apps targeted at children with disabilities were excluded.
The final dataset included 50 apps spanning a mix of games (56%) and educational (44%) apps (see Tables 6 and 7 in Appendix 1). Educational apps included storytelling or teaching children how to write the alphabet.
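The exclusion step described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `App` record, its field names, and the four sample apps are hypothetical and do not come from the study's dataset; only the three exclusion criteria and the 500-rating threshold are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical record for one candidate app (field names are assumptions)."""
    name: str
    category: str               # "Education" or "Games"
    n_ratings: int              # number of user ratings shown in the store
    co_use_with_adult: bool     # intended to be co-used with an adult
    targets_disabilities: bool  # targeted at children with disabilities


def passes_exclusion_criteria(app: App, min_ratings: int = 500) -> bool:
    """Return True if the app survives all three exclusion criteria."""
    if app.n_ratings < min_ratings:   # fewer than 500 user ratings: removed
        return False
    if app.co_use_with_adult:         # co-use with an adult: removed
        return False
    if app.targets_disabilities:      # targeted at children with disabilities: excluded
        return False
    return True


# Invented candidate pool, for illustration only.
candidates = [
    App("Alphabet Tracer", "Education", 12000, False, False),
    App("Count With Me", "Games", 320, False, False),
    App("Story Time Together", "Education", 4500, True, False),
    App("Shapes Puzzle", "Games", 500, False, False),
]

selected = [app for app in candidates if passes_exclusion_criteria(app)]
selected_names = [app.name for app in selected]
print(selected_names)
```

An app with exactly 500 ratings is kept in this sketch, since the criterion removes only apps with fewer than 500.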

Apps evaluation phase: data collection
Pre-service early childhood teachers evaluated the fifty selected apps for an educational technologies course. The participants were students (n = 295, 99% females; aged 21-22) at the University of Crete, Department of Preschool Education, Greece, who were familiar with educational apps and who downloaded the 50 educational apps for preschool-aged children to their smart portable devices. The research followed the University of Crete's ethical practice guidelines and was approved by the respective Institutional Review Board.
The participants had to fill out a specific evaluation questionnaire (five-point Likert scale), which included selected items and dimensions of two validated instruments developed by the authors: the scale Evaluation Tool for Educational Apps [ETEA] (Papadakis et al., 2020) and the scale Perceptions about Educational Apps Use-parents [PEAU-p] (Vaiopoulou et al., 2021). The instrument implemented in this research endeavour is ETEA-2, and its factorial validity is verified with the empirical data.

Data analysis
The Instrument: ETEA-2 comprises 22 items organized into five dimensions, i.e., Learning, Suitability, Usability, Security, and Parental Control. Principal Components Analysis (PCA) applied to the collected dataset reproduced the fundamental five-dimensional structure. Bartlett's test of sphericity and the KMO criterion suggested adequate variance for applying factor analysis [χ² = 14730.587, d.f. = 201, p < .001; KMO = 0.908]. The Kaiser criterion (eigenvalue > 1) was used for selecting the number of factors, and only items with factor loadings greater than 0.50 were kept. Varimax rotation was used, and the same factor structure was obtained by applying the principal axis factoring method. Table 2 shows the correlation matrix for the five components and their means and standard deviations. Descriptive statistics on the evaluation scores of each app in the five dimensions are presented in Appendix 1. The fifty apps under investigation can be ordered according to their scores and evaluated in each dimension. However, a classification scheme that considers all five dimensions together is more effective. Thus, in the next step, the five dimensions were used as the basis for cluster analysis to classify the apps. The data were processed using Latent Class Analysis, a model-based cluster analysis that allows the resultant groups to be generalized as existing trends in the population (Magidson & Vermunt, 2004; Stamovlasis et al., 2018).
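The two retention rules mentioned above (the Kaiser criterion and the 0.50 loading threshold) can be illustrated with a minimal sketch. The eigenvalues, item names, and loadings below are invented for illustration and are not the study's results; assigning each item to the dimension on which it loads most strongly is an assumption of this sketch, not a rule stated in the text.

```python
# Hypothetical eigenvalues of an item correlation matrix (illustrative only).
eigenvalues = [7.9, 3.1, 2.2, 1.6, 1.2, 0.8, 0.6]

DIMENSIONS = ["Learning", "Suitability", "Usability", "Security", "Parental Control"]

# Hypothetical rotated loadings: item -> loading on each of the five dimensions.
loadings = {
    "item_a": [0.78, 0.12, 0.20, 0.05, 0.03],
    "item_b": [0.10, 0.15, 0.66, 0.08, 0.11],
    "item_c": [0.31, 0.42, 0.38, 0.10, 0.07],  # no loading above 0.50, so dropped
    "item_d": [0.04, 0.09, 0.02, 0.81, 0.22],
}


def kaiser_n_factors(eigs):
    """Kaiser criterion: count the components whose eigenvalue exceeds 1."""
    return sum(1 for e in eigs if e > 1.0)


def assign_items(item_loadings, threshold=0.50):
    """Keep items whose strongest absolute loading exceeds the threshold and
    map each kept item to the dimension on which it loads most strongly."""
    assigned = {}
    for item, row in item_loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        if abs(row[best]) > threshold:
            assigned[item] = DIMENSIONS[best]
    return assigned


n_factors = kaiser_n_factors(eigenvalues)
assigned = assign_items(loadings)
print(n_factors)
print(assigned)
```

In practice the eigenvalues and rotated loadings would come from a statistics package; the sketch only shows how the two thresholds act on that output.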

Latent class analysis
Latent Class Analysis (LCA) was applied to classify the fifty apps into distinct groups/clusters sharing typical profiles based on characteristics measured via the implemented evaluation instrument. Specifically, the averages of the 295 evaluation scores in the five dimensions were used (see Appendix 1). LCA provides several indicators for assessing the model goodness-of-fit, i.e., the number of parameters, likelihood ratio statistic (L2), Bayesian Information Criterion (BIC), Akaike's Information Criterion (AIC), degrees of freedom, and classification error (Magidson & Vermunt, 2004). Table 3 shows the statistical indexes for the one-, two-, three-, four-, five-, and six-cluster solutions. The three-cluster model is the best and most parsimonious solution since it shows the lowest classification error and the lowest BIC value. Figure 1 presents the ensuing profiles based on the dimensions of the instrument. Cluster/Profile 1 (size = 11) includes apps with high Learning, Usability, Suitability, and medium Parental Control and Security values. Cluster/Profile 2 (size = 21) includes apps with medium Learning, Usability, Suitability, low Parental Control, but high Security. Cluster/Profile 3 (size = 18) includes apps with medium Learning, Usability, and Suitability but low Parental Control and Security. Table 4 shows a qualitative description of the resulting clusters/profiles in terms of their values in the five dimensions, where high, medium and low denote relative levels. Table 5 shows the allocation of the specific fifty apps used (ID numbers, 1 to 50) into the three cluster profiles. The theme content of the apps concerned Math, Language, or Both. The distributions of the varying themes within each cluster/profile are analogous, as shown in Fig. 2.
Fig. 1 The three clusters/profiles are based on the five dimensions of ETEA-2
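The model-selection step described above, comparing the one- to six-cluster solutions on information criteria, can be sketched as follows. The log-likelihoods and parameter counts are hypothetical placeholders and are not the values of Table 3; the formulas are the standard log-likelihood-based definitions, BIC = -2 LL + k ln(N) and AIC = -2 LL + 2k, which may differ from the exact variant reported by the software used in the study. The classification error, which the study also considered, is not modelled here.

```python
import math

N_APPS = 50  # number of classified cases (the fifty apps)

# Hypothetical fit results: number of clusters -> (log-likelihood, number of parameters).
fits = {
    1: (-400.0, 10),
    2: (-350.0, 21),
    3: (-310.0, 32),
    4: (-295.0, 43),
    5: (-285.0, 54),
    6: (-278.0, 65),
}


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion (lower is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def aic(log_lik, n_params):
    """Akaike's Information Criterion (lower is better)."""
    return -2.0 * log_lik + 2.0 * n_params


table = {
    k: {"BIC": bic(ll, p, N_APPS), "AIC": aic(ll, p)}
    for k, (ll, p) in fits.items()
}

# Select the solution with the lowest BIC.
best_k = min(table, key=lambda k: table[k]["BIC"])

for k in sorted(table):
    print(f"{k}-cluster: BIC={table[k]['BIC']:.2f}  AIC={table[k]['AIC']:.2f}")
print("Lowest BIC:", best_k, "clusters")
```

BIC penalizes each additional parameter by ln(N), so it tends to favour more parsimonious solutions than the raw likelihood would.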
Moreover, an independent evaluation of these specific fifty apps on the internet was sought, based on scores indicating the most popular (in terms of ratings) and the most downloaded (in terms of total downloads as provided by the platforms) apps. Interestingly, the members of Cluster 2, which belong to the profile that is distinguished for scoring higher in the security dimension, are by far the most popular and the most downloaded apps (Fig. 3). This finding is of paramount importance for anyone involved in the application of mobile technology for young children.
Table 5 Allocation of the apps (ID numbers) into the three clusters/profiles
Cluster/Profile 1: 2, 9, 10, 12, 13, 17, 18, 19, 33, 46, 48
Cluster/Profile 2: 1, 3, 4, 5, 6, 7, 8, 14, 20, 25, 26, 28, 34, 35, 36, 37, 41, 42, 43, 47, 49
Cluster/Profile 3: 11, 15, 16, 21, 22, 23, 24, 27, 29, 30, 31, 32, 38, 39, 40, 44, 45
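The popularity comparison across profiles amounts to averaging download counts within each cluster. A minimal sketch follows, assuming invented app labels and download figures chosen only to make the arithmetic easy to follow; they are not the study's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: app label, assigned profile, total downloads (all invented).
apps = [
    {"id": "app_a", "profile": 1, "downloads": 1_000_000},
    {"id": "app_b", "profile": 1, "downloads": 500_000},
    {"id": "app_c", "profile": 2, "downloads": 10_000_000},
    {"id": "app_d", "profile": 2, "downloads": 5_000_000},
    {"id": "app_e", "profile": 3, "downloads": 1_000_000},
    {"id": "app_f", "profile": 3, "downloads": 100_000},
]


def mean_downloads_by_profile(records):
    """Group download counts by profile and return the mean for each profile."""
    groups = defaultdict(list)
    for record in records:
        groups[record["profile"]].append(record["downloads"])
    return {profile: mean(values) for profile, values in sorted(groups.items())}


means = mean_downloads_by_profile(apps)
most_downloaded = max(means, key=means.get)

for profile, value in means.items():
    print(f"Profile {profile}: mean downloads = {value:,.0f}")
print("Most downloaded profile:", most_downloaded)
```

Because app stores typically report downloads in broad brackets, a median or a log-scale comparison might be a more robust summary than the mean; that choice is outside what the text specifies.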

Discussion
Concluding on the research questions (RQ1-RQ4), it is clear from factor analysis and internal consistency measures that the proposed instrument conforms to validity and reliability requirements (RQ1). By implementing Latent Class Analysis based on the five dimensions of the proposed questionnaire, a valid classification of the selected apps was derived (RQ2). This led to three emergent profiles, which are understood within the dimensionality of the evaluation system (RQ3). Auxiliary analysis showed that Profile 2, which scores higher in security issues, appears as the most popular or desirable among parents or teachers (RQ4). The findings of the present research contribute to the relevant literature and enrich the ongoing discussion on app use and its incorporation into the formal or informal educational process. Smart interactive technologies are already recognized for improving the quality of the learning process mediated by devices and wireless networked technologies (Pondee et al., 2021). The mobile apps marketplace constantly evolves with new media forms claiming to equip educators and parents with tools for their children's and students' entertainment and education. The two major app stores provide app information and controls, but this is not enough. All parts of the app ecosystem (the app stores, the developers, and the third parties providing services within the apps) must do more to ensure that educators and parents have access to clear, concise, and timely information about the available apps (Federal Trade Commission, 2012). Given that children can learn from well-designed educational apps, this signals a new challenge for developers to create apps with quality content.
Fig. 3 The mean number of downloads for the three clusters/profiles shows the superiority of Cluster/Profile 2, which emphasizes the Security dimension
In this sense, the quality of tablet app experiences, when it facilitates developmentally appropriate learning practices, may be more important than the time spent using tablets. Thus, rather than focusing merely on screen time statistics, a deeper and more differentiated analysis is required. Researchers should focus on analyzing child device usage and how this usage is associated with literacy and language learning outcomes (Neumann, 2020).
The results of the present study align with the findings presented earlier in the relevant literature, despite the differences in the games, targeted ages of the players, media platforms, sample sizes, and countries of origin. Specifically, simple feedback types, such as verification, seemed to be the most popular feedback in other digital media for children, as Blair (2013) and Callaghan and Reich (2018) found. The same studies also noted the small number of games that offered some facilitating or scaffolding. The discrepancies between negative and positive feedback were similar to the findings of Benton and colleagues (Benton et al., 2018). The lack of positive feedback for motivation and encouragement was consistent with Fong et al.'s (2019) study. These findings highlight content designers' potential reliance on standard conventions or design strategies that might not be the most beneficial for children's learning (Nikolayev et al., 2020).

Concluding remarks: Security matters
Teachers play a decisive role in the success of technology in schools, and their experiences in teacher education programs influence how they subsequently use technology (McGarr & Gallchóir, 2020). Nevertheless, using mobile games to support teaching and learning pedagogies still challenges teachers. Thus, pre-service teachers require specific knowledge to design meaningful learning experiences with mobile games and pedagogically implement them in their teaching (Pondee et al., 2021). Furthermore, updated guidelines or checklists with which teachers can evaluate apps are essential. Such approaches should call for apps to be age-appropriate, have clear instructions, and hold well-designed multimedia and interactive features as well as intuitive operational features for young children. They must also warn against violent characters or actions, negative social values, and gender stereotyping (Gromik & Litz, 2021). Teachers should check literacy and language apps used in the classroom against interactivity, usability, cultural awareness, collaboration, language and literacy content, and learning outcomes (Neumann, 2020). The present work contributes to evaluation issues by presenting ETEA-2, a valid five-dimensional instrument that forms the proposed criteria for assessing educational apps. These criteria, the proposed dimensions, originate from scrutinizing and elaborating on the perceived challenges and advantages of apps in the literature, i.e., the factors influencing the attitudes and preferences of the users. The evaluation process was carried out by teacher-students, who, even though they are not experts, will most likely be the final recipients in educational settings.
In addition, by applying LCA, a model-based cluster analysis, the fifty popular apps were classified into three distinct groups/profiles, which represent trends that share specific characteristics by design. Profile 1, having at least medium security and parental control, is rated higher in learning, usability, and suitability. Profile 3 has medium learning, usability and suitability, and low security and parental control. Profile 2 differs from Profile 3 in displaying high security characteristics. Profile 1 appears to have the highest overall value. However, it is not the most popular or desirable one. Profile 2 takes this position.
Moreover, early childhood teacher educators should make efforts to promote the use and applicability of technology in a responsible and developmentally proper way (Nikolopoulou, 2020), by introducing the students to evaluation instruments and educating them to use them effectively. In this digital pedagogical enterprise, it is imperative to endorse all three protagonist components involved: the technology itself, evaluated according to the educational or entertainment goals; the child to whom it is addressed; and the omnipresent adults (teachers, parents, or caregivers) who guide learning and demand their share in control and safety issues.
Overall, this paper flags the need for a systematic evaluation of mobile apps addressed to young children. Although the desired learning outcomes, usability and suitability issues are already acknowledged, the findings signify that security matters more to caregivers and convey the message to present and future app creators to adopt a more rigorous approach in designing educational apps, giving special consideration to safety.

Limitations
Although the current research study relied on rigorous statistical methods, it is not without limitations, arising from specific choices and peculiarities of the procedures. First, the pre-service teachers, who served as evaluators, comprised a convenience sample that happened to be already educated and partly informed in educational technology. The vast majority were females; thus, this study could not explore gender differences. In addition, the findings cannot be generalized to other countries, cultural backgrounds, or, of course, both genders. Finally, the apps were selected randomly. This limitation, albeit important, is mitigated by the numerous studies around the globe highlighting the low quality of the self-proclaimed educational apps.