Introduction

Most Higher Education Institutions (HEIs) underwent a fast transition to a completely remote academic, teaching and learning paradigm as a first response to the COVID-19 pandemic. Worldwide, HEIs quickly expanded digital learning opportunities for both students and teachers and encouraged new forms of teacher collaboration. According to a recent survey (Schleicher, 2021), online platforms have been widely used at all levels of education across countries. However, this transition raised severe challenges related to deploying trustworthy and credible ICT user identity management solutions.

Continuous user identity management refers to an iterative process, carried out throughout a user’s session with a remote service, that confirms that the interaction between the user and the system is continuously performed by the same person who initially logged in, and who is therefore eligible to use remote services and computational resources.

From an academic and pedagogical perspective, continuously verifying students’ identity is mandatory in several online learning scenarios, such as remote laboratories, exams and lectures, to prevent fraudulent behavior in which individuals intentionally impersonate others in order to participate unethically in academic activities. Such uncertainty puts at risk the whole mission of HEIs, which is to ensure that individuals have acquired the knowledge and competencies needed to practice a profession. From another perspective, continuous student identification not only improves the credibility and trustworthiness of remote learning systems, but can also serve as a tool for student presence awareness. Accurately perceiving who is attending in remote synchronous and asynchronous scenarios scaffolds and reproduces social situations that occur in the physical classroom, such as group attendance and awareness of classmates’ presence.

The recent transition to online distance learning led to a major adoption of video conferencing tools by HEIs worldwide due to their feasibility, flexibility, and accessibility (Cai & King, 2020). As a result, an exponential growth in the use of video conferencing tools for higher education purposes was observed, inevitably increasing the popularity of software tools such as Zoom and Microsoft Teams (Massner, 2021). While providing interesting features, such solutions are not tailored to classroom use cases: videoconferencing is only viable if teachers and instructors use the tools effectively (Martin, 2005), which requires training, careful planning and commitment to maintain student engagement in online courses. Realizing this, key players in the video conferencing market sought to retain their large number of recent users by developing exclusive features for online education (see Omar & Abdul Razak, 2020; Kim & Lee, 2020; Okmawati, 2020).

In response to the COVID-19 pandemic and global shutdowns, institutions have gradually improved their support for delivering distance learning to students through e-learning infrastructures. According to Huang et al. (2020) and Coman et al. (2020), this became an opportunity to optimize the process by pushing education institutions to: (i) improve internet infrastructure to avoid interruptions, e.g., during video-conferences; (ii) use more intuitive tools to help students assimilate information; (iii) provide compelling and interactive electronic resources; (iv) build online communities for students to counter isolation; (v) foster creativity by using techniques such as debates, or learning based on discovery and experience; and (vi) provide services to disseminate trends and policies adopted by universities and governments. In this context, some institutions developed new in-house Learning Management Systems (LMSs), such as the web-based UCTeacher solution at the University of Coimbra.

Besides flexibility and accessibility (Bakia et al., 2012), online learning eliminates barriers of space and time, facilitates collaboration, and allows students to learn at their own pace (Arkorful & Abaidoo, 2014). Overall, students tend to assimilate information in the same way as in traditional classrooms (Navarro & Shoemaker, 2000), and online learning seems to be particularly beneficial for shy and slow-learning students, who are able to express themselves more often in the online classroom (Stern, 2004). However, distance learning also has downsides, such as students missing the social aspects of learning on campus, which can lead to feelings of isolation. Other disadvantages include susceptibility to distractions, dependency on the internet and computers, which may fail unexpectedly, decreased motivation, and the physical health consequences of spending several hours working with computers (Coman et al., 2020).

Statement of contributions

Distance learning has been the object of a substantial amount of research for decades. Nevertheless, continuous user identification within distance learning has proven to be a complex problem that has not been robustly solved, primarily due to challenges in non-intrusive authentication during online tasks, the precision and accuracy of intelligent biometric techniques, the trade-off between computational requirements and usability, security and data privacy aspects, and the integration of such technologies into the existing learning management systems of HEIs.

This article aims to shed light on a contemporary debate in higher education from a technology perspective. To this end, we make the following contributions:

  • Provide a comprehensive review of existing efforts in continuous user identification for distance learning, specifically focusing on intelligent proctoring systems and automatic identification methods. This review offers a valuable resource for researchers and practitioners in the field.

  • Explore various image-based identification, voice-based identification, and biometric trait combination methods. By discussing their applicability in the context of distance learning, we provide insights into the identification technologies and their potential integration.

  • Highlight the relevance of data privacy-preservation issues in distance learning. This emphasis contributes to the ongoing discussion about protecting sensitive student information while implementing continuous user identification systems.

  • Identify research gaps, open issues, and prospects for the advancement of continuous student identification systems. By pinpointing areas that require further investigation, the study guides future research efforts and informs the development of innovative frameworks.

  • Propose a tentative roadmap for the future, aiming to design an innovative framework for student identity management. This roadmap serves as a starting point for researchers and practitioners interested in implementing privacy-preserving techniques for face, voice, and interaction-based continuous user identification.

Review structure

We begin by introducing the concept of distance learning in "Distance learning" section. Next, we present a literature review on intelligent user interfaces for distance learning, with a specific focus on online proctoring systems in "Intelligent online proctoring systems" section. Afterwards, we turn our attention to technologies for continuous user identification in "Technologies for continuous user identification" section, providing a comprehensive review of the most relevant methods of image-based identification, voice-based identification and combination of biometric traits.

Following the literature review, we briefly discuss the relevance of data privacy-preservation issues within the context of distance learning in "Data privacy-preservation issues" section. We then discuss the current scientific and technological landscape, including an analysis of open research questions, in "Research gaps, open issues and opportunities" section. Finally, in "Conclusion" section, the article concludes by summarizing key findings, drawing final conclusions and proposing a tentative roadmap for the future, which we will attempt to pursue in our own ensuing research efforts.

In this study, we employed a mixed methodology for selecting papers for our literature review, consisting of three main strategies. First, we conducted database searches to identify relevant studies on prevailing intelligent proctoring systems and automatic user identification methods. Second, we consulted experts in the different subject areas, who suggested relevant studies and pointed to emerging as well as seminal works. Lastly, we applied snowball sampling, expanding our search by examining the reference lists of the initially identified relevant papers. By combining these strategies, we aimed to ensure a comprehensive and diverse selection of literature for our review.

Distance learning

Recent developments in distance learning have been driven by the increasing availability of digital technologies and the growing demand for flexible and accessible education. One major trend is the rise of Massive Open Online Courses (MOOCs), which offer free or low-cost courses from top universities and institutions around the world. Another trend is the use of virtual and augmented reality technologies to enhance the learning experience, such as simulating real-world scenarios for medical students (Zafar et al., 2020).

Distance learning refers to a pedagogic model that utilizes technologies, such as the Internet, allowing students to learn remotely without the need to physically attend a traditional classroom.

In distance learning, students can access course materials, communicate with instructors, and interact with peers through virtual platforms such as video conferencing, discussion forums, and LMSs. This model of education has recently grown in popularity, mostly due to the COVID-19 pandemic, which accelerated the adoption of distance learning as a necessary alternative to in-person classes. In this model, content is generally presented and delivered online through two different methods:

  • Synchronous learning involves real-time interaction between the teacher, students, and course content. This method is similar to traditional classroom instruction, as the class meets together virtually at the same time. Students can engage in discussions, ask questions, and share their thoughts with each other and the instructor.

  • Asynchronous learning, on the other hand, relies on self-directed study and group collaboration via online platforms. Students can access course materials such as documents, videos, or journals, on their own schedule, and engage in discussions with their peers and instructor at their own pace.

In general, two types of assessment are commonly used to evaluate student progress: formative and summative. Formative assessments are designed to evaluate students’ understanding of the material and their learning during the course. These are typically low-stakes assessments that provide ongoing feedback to students and instructors, identifying areas where additional instruction or practice may be needed and allowing instructors to adjust teaching strategies accordingly. Formative assessments can take many forms, such as quizzes, homework assignments, or group projects (Wang & Tahir, 2020). Summative assessments are often administered at the end of a course or unit to evaluate student learning outcomes. These assessments are typically high-stakes and may determine a student’s final grade or certification. Examples of summative assessments include final exams, term papers, and presentations. Since these assessments have a higher impact on the final grade, students might feel more compelled to engage in academic fraud (Genereux & McLeod, 1995). This problem is particularly aggravated in online learning and calls for preventive measures during these evaluations; for mid-term or final exams, such measures usually take the form of proctoring systems that monitor students during the exam.

Intelligent online proctoring systems

Intelligent user interfaces have been studied within distance learning, with research primarily focusing on: (i) providing personalized features based on students’ knowledge of particular subjects, their emotions, mood, personality, etc.; and (ii) building intelligent proctoring systems for online examinations.

Several works on intelligent user interfaces have examined e-learning platforms in the context of different learning styles paired with users’ expectations, motivation, habits, and needs. These factors inform the design of adaptive learning systems that provide users with a unique learning experience “based on the learner’s personality, interests and performance in order to achieve goals, such as learner academic improvement, learner satisfaction, effective learning process and so forth” (El Bachari et al., 2010; Truong, 2016; Kulaglić et al., 2013; Alexandru et al., 2015; Klašnja-Milićević et al., 2016; Montebello, 2018). In this section, we primarily focus on online proctoring systems enhanced with Artificial Intelligence (AI) technology.

An online proctoring system is software used for examination supervision, running on a student’s computer after his/her identity has been verified. During the examination, the proctor (either a real person or an AI agent) is granted access to the student’s web camera, computer screen, microphone, and in some cases computer mouse and keyboard.

The new reality of the COVID-19 pandemic has demonstrated more than ever the importance of online proctoring systems. Although the application of this technology raises many controversies, such as the potential invasion of students’ privacy and civil rights and the additional stress or anxiety it may cause, to name a few (Helms, 2021; Coghlan et al., 2021), 54% of HEIs still use such systems, and market projections foresee continued growth, reaching a market size of US$ 1,187.57 million by 2027 (Grajek, 2021; Partners, 2021).

Online proctoring systems have been presented as a supporting tool in remote education for over 20 years. Initially, they were implemented as a feature of computer-based examinations to bridge the gap between remote and ‘on-campus’ conditions. Over time, several online proctoring systems have been developed to serve as an ‘off-campus’ examination practice, leveraging the increased ownership of laptops and tablet computers and supporting remote education (Selwyn et al., 2021). The extensive and persistent evolution of proctoring systems has not only inspired numerous research works investigating their application and ethical concerns (Henry & Oliver, 2021; González-González et al., 2020; Coghlan et al., 2021), but is also closely related to user identity verification and access management (Fidas et al., 2021; Gonzalez-Manzano et al., 2019).

With the development of information systems and online access to e-learning, e-banking, e-gambling, or e-government platforms, authenticating users and granting access only to authorized individuals became an integral part of every identity management system. This is especially critical for educational organizations that offer MOOCs and whose online certification and accreditation rely on verifying students online and assuring that all academic achievements were earned honestly. When these processes are executed inadequately, the reliability of credentials and certifications earned online is harmed, as their authenticity is called into question (Fidas et al., 2021; Labayen et al., 2021).

A recent research work (Nigam et al., 2021) provides a thorough review of existing AI-based proctoring systems (AIPS) and online proctoring systems (OPS). The evolution from OPS to AIPS follows technological development, transitioning from human invigilators who manually verify individuals’ identity (e.g., by checking their identification and asking a few proofing questions) and oversee the test-taker, to AI processes that analyze and continuously monitor biometrics (e.g., facial recognition to match the photographed identity with the student’s face, eye tracking, voice recognition or facial detection to detect any signs of malpractice). Nigam et al. (2021) provide a comprehensive overview of OPS, distinguishing between three types: (i) live proctoring, characterized by real-time use of the proctoring system and the involvement of a human proctor who can flag students engaged in malpractice; (ii) recorded proctoring, characterized by recording video for later analysis of face and eye movements by a human proctor; and (iii) automated proctoring, characterized by limited involvement of the human proctor and automated identification of malpractice (Hussein et al., 2020). Because it incorporates AI processes, this last type of OPS corresponds to the group of AIPS. All OPS, however, share two critical requirements: granting access to the web camera (for recording purposes), and preventing access to other web browsers as well as the opening of new browser tabs on the computer during the examination (Alessio et al., 2017).

However, these systems are not foolproof and are vulnerable to various attack vectors (Constantinides et al., 2023). One such vector is a student forging identification proofs, i.e., using fraudulent identification documents (e.g., still photographs of someone else) to bypass the system. Another is a student switching seats after identification, i.e., swapping places with another person after passing the identification step. Non-legitimate access to shared LMS credentials can be used to bypass standard password-based authentication. Additionally, computer-mediated communication through voice or text chat, as well as screen sharing and remote control applications, can enable cheating during exams. Students may also access forbidden online resources or receive assistance from non-legitimate individuals on their computer or through a secondary input device. Furthermore, students may communicate and collaborate with others in the same physical context or receive answers on a whiteboard or computing device. All these attack vectors pose significant challenges to proctoring systems, highlighting the need for continuous improvement and adaptation to new forms of academic dishonesty.

To address these threats, the design of AIPS should consider a variety of parameters, depending on the hardware available to students (Slusky, 2020; Atoum et al., 2017; O’Reilly & Creagh, 2016; Li et al., 2015). These parameters include:

  • (i) video recording of the user and their surroundings using a camera, which is integrated in the majority of today’s laptops (Machuletz et al., 2018) and available as a simple add-on web camera for desktop computers; this provides the proctor with live or recorded monitoring of the user’s identity and activity, preventing impersonation and providing control over background movements (Harish et al., 2021);

  • (ii) audio recording using a microphone, which is also commonly found in modern laptops, allowing the audio to be analyzed for biometrics and background sounds (Sinha & Yadav, 2020; Prathish et al., 2016);

  • (iii) involvement of a human proctor, which is still needed in the AIPS design model due to the inaccuracy of existing solutions; under the supervision of a human proctor, who oversees the accuracy of the system by manually flagging suspicious behavior, intelligent mechanisms learn to recognize and flag suspicious activities more accurately in the future (Li et al., 2015);

  • (iv) video recording of the desktop environment, revealing whether the user has any open tabs in a web browser and ensuring that only the allowed materials are accessed during the examination (Slusky, 2020; Beust et al., 2018);

  • (v) restrictions on applications running in the desktop environment, ensuring that users only access applications and/or websites allowed during the examination (Slusky, 2020) and flagging any forbidden attempts (Metzger & Maudoodi, 2020);

  • (vi) biometric verification, which not only can help detect possible impersonation threats (Chirumamilla et al., 2020) and improve the security of user authentication (Labayen et al., 2021), but could also automate and support attention tracking, mind-wandering detection and facial behavior analysis (Villa et al., 2020; Blanchard et al., 2014; Baltrušaitis et al., 2016);

  • (vii) eye tracking, which can prevent the malpractice of using external sources of information, such as notes or textbooks (Maniar et al., 2021; Atoum et al., 2017; Li et al., 2015; Villa et al., 2020), albeit with a margin of error that allows users to keep their natural movement behavior;

  • (viii) random question bank methods, which provide every individual with a unique paper or set of questions generated only for that user; this can prevent students from sharing answers, as the examination questions should not repeat among them (Chua et al., 2019; Norris, 2019).
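The random question bank idea in item (viii) can be illustrated with a minimal sketch. It assumes a question pool grouped by topic and seeds each student’s draw from hypothetical exam and student identifiers; the function name and seeding scheme are our own illustrative choices, not the method of Chua et al. (2019) or Norris (2019).

```python
import hashlib
import random


def draw_exam(pool, exam_id, student_id, per_topic):
    """Draw a per-student set of questions from a topic-labelled pool.

    pool: dict mapping a topic name to a list of question identifiers.
    per_topic: how many questions to draw from each topic.
    The draw is seeded from the (exam_id, student_id) pair, so the same
    student always receives the same paper (e.g., after a reconnect),
    while different students receive different selections and orderings.
    """
    seed_material = f"{exam_id}:{student_id}".encode("utf-8")
    # A cryptographic hash is used instead of Python's built-in hash(),
    # which is salted per process for strings and thus not reproducible.
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    rng = random.Random(seed)

    paper = []
    for topic in sorted(pool):  # fixed topic order so the draw is deterministic
        questions = pool[topic]
        if per_topic > len(questions):
            raise ValueError(f"topic {topic!r} has only {len(questions)} questions")
        paper.extend(rng.sample(questions, per_topic))
    rng.shuffle(paper)  # also randomize the order across topics
    return paper


if __name__ == "__main__":
    pool = {"sql": ["q1", "q2", "q3", "q4"], "networks": ["q5", "q6", "q7"]}
    print(draw_exam(pool, exam_id="db-final", student_id="student-042", per_topic=2))
```

Seeding from the identifiers makes each paper reproducible without storing it, but papers only differ meaningfully between students when each topic’s pool is much larger than the number of questions drawn from it.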

Table 1 Research-oriented proctoring systems

Research-oriented proctoring systems

The importance of student identification and authentication in online activities has been broadly recognized in the academic field, with numerous studies aiming to design solutions that support user identity verification and authentication. In Table 1, we present an overview of academically driven technologies that aim to improve the credibility of authenticated users and limit the possibilities for the most common acts of misconduct. A detailed comparison of research-oriented work can be found in Labayen et al. (2021) and Guillén-Gámez et al. (2018). The surveyed solutions concentrate mainly on user authentication and therefore address only impersonation threats, leaving room for misconduct related to communication, collaboration and resource access threats. Among the presented solutions, the Proctoring System (PS) of Atoum et al. (2017) addresses several threat scenarios by combining continuous estimation components that use multimedia (audio and visual) input for continuous user verification, gaze estimation, text detection (e.g., on printed papers, the computer screen or the keyboard), speech detection, active window detection, phone presence and usage detection, and cheating behavior detection. Initial results from a study with 24 test takers, evaluated in scenarios reproducing real-world behaviors during online examinations, show that the system achieved a segment-based detection rate of nearly 87% across different types of malicious behavior at a fixed false alarm rate of 2% (Atoum et al., 2017).
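A detection rate “at a fixed false alarm rate”, as reported for PS, is read off a single operating point of the detector. The sketch below shows one way to compute it, assuming every video segment has a suspicion score and a ground-truth label; the threshold-selection rule is our illustrative assumption and not necessarily the exact evaluation protocol of Atoum et al. (2017).

```python
def detection_rate_at_far(scores, labels, target_far=0.02):
    """Detection rate (true positive rate) at a fixed false alarm rate.

    scores: suspicion score per segment (higher = more suspicious).
    labels: 1 if the segment truly contains cheating, else 0.
    A segment is flagged when its score is strictly above the threshold.
    The threshold is the lowest one whose false alarm rate on honest
    segments (label 0) does not exceed target_far.
    Returns (detection_rate, threshold).
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need both honest and cheating segments")

    # At most k honest segments may be flagged.
    k = int(target_far * len(negatives))
    # Thresholding at the k-th highest honest score flags at most the k
    # honest segments above it; any lower threshold would flag more.
    threshold = negatives[k] if k < len(negatives) else float("-inf")
    detection_rate = sum(s > threshold for s in positives) / len(positives)
    return detection_rate, threshold
```

Fixing the false alarm rate first reflects the practical constraint that honest students should rarely be flagged; the detection rate is then whatever the detector achieves at that operating point.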

Commercial proctoring systems

Initial attempts to introduce commercial online proctoring solutions were made in the mid-2000s by Kryterion, which relied on human proctors monitoring online exams via web cameras. The company pioneered the market (Foster & Layman, 2013), paving the way for the development and application of online proctoring systems. Since then, several other organizations have followed, commercializing proctoring technologies that support the authentication process, monitor the whole session, and/or record the data for later analysis. Table 2 presents a comparison of state-of-the-art commercial online proctoring systems, which were selected based on their infrastructure, user verification features, privacy policy for recorded data, integration with LMSs (O’Reilly & Creagh, 2016; Foster & Layman, 2013; Atoum et al., 2017; Labayen et al., 2021), and their features to help eliminate a number of commonly faced security threats during online examinations (Labayen et al., 2021; Ullah et al., 2016), such as impersonation, communication, collaboration and resource access threats. Among them, Smiley Owl (SMOWL) provides comprehensive support for remote learning activities, potentially eliminating several threats. Its design was motivated by a thorough analysis of the current market and an evident lack of proctoring systems that guarantee a comprehensive and reliable solution: its authors attempt to combine “multi-biometric continuous authentication with continuous visual and audio monitoring, with device activity monitoring and lock-down options with human supervision (only when required)” to fill the gaps in online authentication processes foreseen in remote learning (Labayen et al., 2021).

Table 2 Commercial proctoring systems

Technologies for continuous user identification

Biometric technologies have recently been considered for student identity management in HEIs, as they provide several advantages over traditional knowledge-based and token-based authentication methods. While biometric technologies have many benefits from both a security and a usability point of view, there is still a need for innovative continuous user identification to authenticate students during academic and teaching activities. User identity management is a critical aspect of any information system today, aiming to ensure that end-users have appropriate access to sensitive data and services. The core components of user identity management are:

  1. User authentication, aiming to validate that end-users are allowed to access the system by requiring them to provide various authentication factors, or a combination of them (e.g., textual and graphical passwords, push notifications on smartphones, Time-based One-Time Passwords (TOTP), graphical Transaction Authentication Numbers (TAN), biometrics, etc.) (Mare et al., 2016; Ometov et al., 2018; Constantinides et al., 2021);

  2. Continuous user identification, aiming to verify the end-user’s identity in real time (after successful authentication) while they carry out tasks (Gonzalez-Manzano et al., 2019; Buschek et al., 2015);

  3. Access control, aiming to regulate user access to system resources (Rouhani & Deters, 2019).
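Among the authentication factors listed above, TOTP is fully specified in RFC 6238, which builds on the HMAC-based one-time password (HOTP) of RFC 4226. A minimal standard-library sketch follows, assuming the HMAC-SHA-1 variant with 30-second steps; it is an illustration only, and a real deployment should use a vetted library and reject reuse of an already accepted code. Note that TOTP, like a password, is checked once at entry and does not by itself provide continuous identification.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226), HMAC-SHA-1 variant."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, unix_time: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the number of elapsed time steps."""
    if unix_time is None:
        unix_time = time.time()
    return hotp(key, int(unix_time // step), digits)


def verify_totp(key: bytes, submitted: str, unix_time: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept a code from the current time step or up to `window` steps either
    side (tolerating small clock drift), using a constant-time comparison."""
    if unix_time is None:
        unix_time = time.time()
    counter = int(unix_time // step)
    for c in range(max(0, counter - window), counter + window + 1):
        if hmac.compare_digest(hotp(key, c, digits), submitted):
            return True
    return False


if __name__ == "__main__":
    secret = b"12345678901234567890"  # RFC 6238 test secret; never use in practice
    print(totp(secret))
```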

In this context, biometric-based authentication within user identity management represents a significant and evolving field of research and practice (Bhalla, 2020). Specifically, biometric data can provide high-entropy authentication secrets, minimize administration expenses, offer convenience to end-users compared to traditional knowledge-based (e.g., passwords) and token-based (e.g., TOTP) solutions, and provide a sense of technological modernity to end-users (Leaton, 2017; Pagnin & Mitrokotsa, 2017). Common approaches to biometric-based authentication rely on end-users’ physical (e.g., fingerprint, iris, face, voice, etc.) and/or behavioral characteristics (e.g., typing patterns, interaction patterns, engagement patterns, etc.) (Jain et al., 2016; Rui & Yan, 2018). Such technologies have become an important means of enforcing strict security policies in a variety of domains, such as government and education (Bhalla, 2020; Leaton, 2017; Labati et al., 2016).
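Behavioral traits such as typing patterns are usually turned into timing features before matching. The following is a minimal keystroke-dynamics sketch for a fixed phrase, assuming key events arrive as (key, press time, release time) tuples in milliseconds. It uses dwell and flight times and a scaled Manhattan distance to an enrolled template, a simple and commonly used baseline; the function names are hypothetical and any acceptance threshold would have to be tuned on real data.

```python
from statistics import mean


def timing_features(events):
    """Dwell and flight times for one typing of a fixed phrase.

    events: list of (key, press_ms, release_ms) in typing order.
    Dwell = how long each key is held; flight = time from releasing one
    key to pressing the next. Returns one flat feature vector.
    """
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell + flight


def enroll(samples):
    """Template = per-feature mean and mean absolute deviation over enrollment samples."""
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when a feature never varied, to avoid dividing by zero.
    mads = [mean(abs(x - m) for x in col) or 1.0 for col, m in zip(columns, means)]
    return means, mads


def scaled_manhattan(template, sample):
    """Per-feature distance scaled by enrollment deviation, averaged over features."""
    means, mads = template
    return sum(abs(x - m) / d for x, m, d in zip(sample, means, mads)) / len(sample)
```

A session would then compare `scaled_manhattan(template, timing_features(new_events))` against a tuned threshold; lower distances indicate typing closer to the enrolled student’s rhythm.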

HEIs have already started considering the adoption of biometric technology for seamlessly identifying and/or authenticating students within teaching and learning activities, academic services, etc. This is key to developing suitable procedures and contexts that prevent students from engaging in malicious activities, including prohibited communication and collaboration among students, as well as impersonation (i.e., intentionally assuming someone else’s identity in order to participate unethically in academic activities).

In LMSs, students commonly authenticate once through a textual password, after which the system assumes that the same student remains present throughout the whole academic activity. In video conferencing tools, student identification and verification are primarily conducted manually (Fidas et al., 2023); for instance, instructors or invigilators visually compare the student with the photograph on an identity (ID) card or student card, which the student presents to the web camera. These approaches fall short in detecting fraudulent student activities once the single entry-point authentication has been successfully performed (Frank et al., 2012). In addition, identifying students manually and individually throughout the whole academic activity is time-consuming, adds little value and is a difficult endeavor for instructors. This also affects the HEI’s ability to implement its curriculum satisfactorily and to ensure a fair student evaluation process.
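In contrast with single entry-point authentication, continuous identification keeps re-checking identity during the session. The sketch below is a minimal illustration, assuming a hypothetical biometric matcher (e.g., face or voice) that returns a match score in [0, 1] at each check; the exponentially smoothed trust score and its parameter values are our illustrative choices, not the scheme of any system cited here. Escalating to “review” rather than ending the session mirrors the human-in-the-loop proctoring discussed earlier.

```python
from dataclasses import dataclass


@dataclass
class ContinuousVerifier:
    """Session-level trust, updated after every periodic biometric check.

    alpha: weight of the newest match score in the exponential moving average.
    lock_below: trust level under which the session is escalated for review.
    Parameter values are illustrative only, not tuned.
    """
    alpha: float = 0.2
    lock_below: float = 0.5
    trust: float = 1.0  # the initial login is assumed to have succeeded

    def update(self, match_score: float) -> str:
        """Fold one match score into the trust level and return the session state."""
        if not 0.0 <= match_score <= 1.0:
            raise ValueError("match_score must be in [0, 1]")
        # A single poor check (bad lighting, head turned away) lowers trust only
        # partially; a sustained mismatch, such as another person taking over
        # the seat, keeps pushing it down until it crosses the threshold.
        self.trust = (1 - self.alpha) * self.trust + self.alpha * match_score
        return "ok" if self.trust >= self.lock_below else "review"


if __name__ == "__main__":
    verifier = ContinuousVerifier()
    for score in [0.9, 0.95, 0.1, 0.9, 0.05, 0.0, 0.0, 0.0]:
        print(score, verifier.update(score), round(verifier.trust, 3))
```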

The literature is rich in solutions for image-based and voice-based identification. Below, we review relevant techniques for biometric identification, focusing first on single-modality identification through image and voice, and then addressing alternative interaction-based identification and the combination of biometric traits.

Image-based identification

We start by looking into image-based identification and specifically face recognition.

Face recognition is a contactless biometric technology that matches images or videos of human faces against a database of known individuals, either to verify (authenticate) a claimed identity or to identify an unknown face.
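The two matching modes can be made concrete with a small sketch: verification compares a probe face against one claimed identity (1:1), while identification searches all enrolled students (1:N) and may return no match. It assumes that faces have already been converted into fixed-length feature vectors (embeddings) by some model, and uses cosine similarity with an illustrative, untuned threshold; all names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(probe, enrolled, threshold=0.6):
    """1:1 verification: does the probe match the claimed identity's template?"""
    return cosine_similarity(probe, enrolled) >= threshold


def identify(probe, gallery, threshold=0.6):
    """1:N identification: best-matching enrolled ID, or None when no template
    is similar enough (open set: the person may not be enrolled at all)."""
    best_id, best_sim = None, -1.0
    for person_id, template in gallery.items():
        sim = cosine_similarity(probe, template)
        if sim > best_sim:
            best_id, best_sim = person_id, sim
    return best_id if best_sim >= threshold else None
```

Returning `None` below the threshold matters in proctoring: an unknown person in front of the camera should not be silently assigned to the closest enrolled student.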

Such systems are widely used in a broad range of applications, such as access control, financial services (validating transactions), video surveillance, law enforcement, social media, smart advertising, automotive industry, etc.

In general, an image-based face recognition framework follows a pipeline with three main modules: (1) face detection, (2) image registration/normalization and (3) classification. The first stage, as the name implies, localizes the face in the image. Typically, it exhaustively scans the input image and returns the bounding box that covers the face region. The registration step deals with normalization tasks. Most approaches rely on non-rigid image alignment techniques to locate a set of facial landmarks, while others use simple image warping methods. Regardless, the main goal is to establish a common reference frame, sometimes known as the canonical reference frame, ensuring that a normalized face always has the same spatial dimensions. The last stage performs the recognition/classification itself: it takes the normalized (warped) face produced by the previous stages and uses pattern recognition and machine learning techniques to infer the target individual, as illustrated in Fig. 1. In this section, we primarily focus on this last recognition module.
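The registration stage can be illustrated with the simplest rigid warp: a 2D similarity transform (rotation, uniform scale and translation) estimated by least squares from detected landmarks to fixed canonical positions. The sketch treats points as complex numbers, so the transform is z′ = a·z + b with a complex a (scale and rotation) and b (translation). The canonical coordinates are hypothetical, and many systems use more landmarks or non-rigid models instead.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points onto dst points.

    Points are (x, y) tuples. Returns complex (a, b) such that
    dst ~= a * src + b, where abs(a) is the scale and the angle of a the rotation.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    # Closed-form solution after centering both point sets.
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, points):
    """Map (x, y) points through z -> a * z + b."""
    mapped = []
    for x, y in points:
        w = a * complex(x, y) + b
        mapped.append((w.real, w.imag))
    return mapped


# Hypothetical canonical landmark positions (left eye, right eye, nose tip,
# mouth centre) inside a 112 x 112 normalized face crop.
CANONICAL = [(38.0, 46.0), (74.0, 46.0), (56.0, 66.0), (56.0, 86.0)]

if __name__ == "__main__":
    detected = [(120.0, 95.0), (160.0, 99.0), (139.0, 120.0), (137.0, 142.0)]
    a, b = fit_similarity(detected, CANONICAL)
    print(abs(a), apply_similarity(a, b, detected))
```

In a full pipeline, every pixel of the output crop would then be resampled through the inverse of this map, so that all faces share the same canonical frame.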

Fig. 1 Illustration of a face recognition system under ongoing development within the TRUSTID EU research project (https://trustid-project.eu/) (Faria et al., 2023)

Facial recognition has been a widely studied research topic in the computer vision community. A substantial body of work has been reported in the past, with varying degrees of success. In fact, the consistent identification of human faces in unconstrained environments is still an open issue. Note that this is no easy task, as such systems need to account for variations in facial appearances, expression, ageing, motion and orientation (3D pose), as well as image acquisition issues, e.g., lighting, occlusion, resolution, focus and motion blur.

Early approaches rely on low-dimensional representations of the (normalized) face. Linear models such as those of Turk and Pentland (1991a, 1991b) and Belhumeur et al. (1997) use Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), respectively. Later, manifold-based face embedding learning was proposed, and methods such as He et al. (2005) and Dornaika et al. (2015) employ locality preserving projections, i.e., a linear approximation of Laplacian Eigenmaps (Belkin & Niyogi, 2003). Still within the low-dimensional paradigm, sparse coding-based approaches (Wright et al., 2009; Zhang et al., 2011; Yang et al., 2011) have also been proposed.
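The PCA-based “eigenfaces” idea represents each normalized face by its coordinates along the top principal components of the training faces. The following dependency-free sketch computes those components by power iteration with deflation on a small covariance matrix; it illustrates the technique only. Real implementations use a linear-algebra library and, when images have far more pixels than there are images, work with the smaller image-by-image matrix instead.

```python
import math
import random


def _matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def principal_components(data, k, iters=500, seed=0):
    """Top-k principal components of row vectors via power iteration and deflation.

    data: list of equal-length lists (e.g., flattened, normalized face images).
    Returns (mean, components); each component is a unit-length list. Fewer
    than k components are returned if no variance is left in the data.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Explicit d x d covariance matrix: acceptable for toy sizes only.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no remaining variance in any direction
                return mean, components
            v = [x / norm for x in w]
        eigenvalue = sum(vi * wi for vi, wi in zip(v, _matvec(cov, v)))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components


def project(x, mean, components):
    """Low-dimensional code of x: its coordinates along each principal component."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * p for c, p in zip(centered, comp)) for comp in components]
```

Recognition then compares these short codes (e.g., by nearest neighbor) rather than raw pixels.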

Nonlinear learning-based methods, such as kernel PCA (Kim et al., 2002), Support Vector Machines (SVMs) (Heisele et al., 2001), Boosting (Guo & Zhang, 2001) or Random Forests (Kremic & Subasi, 2016; Mady & Hilles, 2018), were also applied to face recognition. Despite their relative success, these approaches generally fail on target images that differ from the data seen during learning (the so-called training set). Feature-based techniques (local texture descriptors) were later introduced to mitigate this issue. Popular descriptors of facial appearance include Local Binary Patterns (LBP) (Ahonen et al., 2006), the Discrete Cosine Transform (DCT) (Hafed & Levine, 2001) and Gabor filters (Liu & Wechsler, 2002). These techniques are usually applied to extract local face characteristics and can be combined with any of the previous linear or nonlinear learning strategies.
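The basic 3x3 LBP operator can be sketched in a few lines of NumPy: each interior pixel is encoded by thresholding its eight neighbours against it, and histograms of the resulting codes over a grid of cells form the face descriptor. This is the plain operator only; Ahonen et al. (2006) additionally use uniform patterns and a weighted comparison of regional histograms, which are omitted here, and the 2x2 grid is an arbitrary choice.

```python
import numpy as np

# Offsets (row, col) of the 8 neighbours, clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP code (0..255) for every interior pixel of a grayscale image."""
    img = img.astype(np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center)
    for bit, (dr, dc) in enumerate(OFFSETS):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= center).astype(np.int32) << bit   # 1 if neighbour >= centre
    return codes


def lbp_histogram(img: np.ndarray, grid=(2, 2)) -> np.ndarray:
    """Face descriptor: concatenated, normalized 256-bin LBP histograms of grid cells."""
    codes = lbp_codes(img)
    hists = []
    for band in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(band, grid[1], axis=1):
            counts, _ = np.histogram(cell, bins=256, range=(0, 256))
            hists.append(counts / max(cell.size, 1))
    return np.concatenate(hists)
```

Because the code depends only on the sign of local intensity differences, it is unaffected by monotonic illumination changes, which is one reason such descriptors generalize better than raw pixels.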

After AlexNet (Krizhevsky et al., 2012) won the 2012 ImageNet (Deng et al., 2009) image classification challenge by a large margin, Deep Convolutional Neural Networks (DCNNs) became a reference approach for many computer vision problems, including face recognition.

In short, DCNNs are a cascade of multiple network layers whose basic building blocks are convolution operations, activation functions and pooling. The configuration of blocks and layers defines the architecture of the CNN. Convolution layers extract features, i.e., they learn a set of weights (filters). They are usually followed by an activation function that applies a nonlinear transformation to the filter responses, often constraining their range. Pooling is a nonlinear down-sampling operation that enlarges the receptive field and shrinks the feature maps, reducing the number of parameters in subsequent layers and hence the overall computational cost. Finally, a loss function must be specified for the learning/training stage; it measures the error of the network’s predictions with respect to the ground truth.
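The building blocks described above can be sketched for a single-channel input in plain NumPy. This is a didactic, loop-based version (real frameworks use batched, multi-channel and GPU-accelerated kernels), and the choice of softmax cross-entropy as the loss is an assumption, being simply the most common option for classification.

```python
import numpy as np


def conv2d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation of one channel (what CNN layers call convolution)."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    """Nonlinear activation: negative filter responses are clipped to zero."""
    return np.maximum(x, 0.0)


def max_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def cross_entropy(logits: np.ndarray, target: int) -> float:
    """Training loss: negative log-probability that softmax assigns to the true class."""
    z = logits - logits.max()                     # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target])
```

A layer is then `max_pool(relu(conv2d(x, kernel)))`; stacking such layers and back-propagating the gradient of the loss with respect to the kernels is what learning the filters means.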

DCNNs are compelling because they learn their own feature representations (Yi et al., 2014) while leveraging large amounts of data, which reinforces discriminability. They are also highly parallelizable and can therefore be accelerated by Graphics Processing Unit (GPU) computation. A notable approach in deep face recognition is the DeepFace network (Taigman et al., 2014), which, for the first time, closely approached human-level performance on the Labeled Faces in the Wild benchmark (Huang et al., 2007). DeepFace is a 9-layer CNN (120 M parameters) that uses several locally connected layers without weight sharing; its input is a normalized, warped image obtained through a 3D non-rigid alignment process.

Currently, most face recognition techniques rely on the ResNet architecture (He et al., 2016) and its variants, for several reasons. Firstly, ResNet demonstrated exceptional performance in the 2015 ImageNet (Deng et al., 2009) and Microsoft Common Objects in Context (COCO) (Lin et al., 2014) challenges. Secondly, the authors of VGGFace2 (Cao et al., 2018) used the ResNet-50 architecture to successfully validate face recognition performance on their proposed dataset. Finally, and perhaps most importantly, ResNet offers a good performance/computational effort ratio: it delivers satisfactory results while being relatively lightweight. Part of this computational advantage comes from the single fully connected layer at the end of the network, as opposed to the popular VGG network (Simonyan & Zisserman, 2015), which includes three large fully connected layers.
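The fully connected parameter gap mentioned above can be checked with simple arithmetic. The sketch assumes the standard ImageNet configurations (1000 output classes, a 7x7x512 feature map entering the first fully connected layer of VGG-16, and a 2048-dimensional globally pooled vector in ResNet-50); face recognition variants replace the last layer, so the exact figures differ.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Number of weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out


# VGG-16 head: flattened 7x7x512 feature map -> 4096 -> 4096 -> 1000 classes
vgg16_fc = (dense_params(7 * 7 * 512, 4096)
            + dense_params(4096, 4096)
            + dense_params(4096, 1000))

# ResNet-50 head: global average pooling -> 2048-d vector -> one 1000-way layer
resnet50_fc = dense_params(2048, 1000)

print(f"VGG-16 fully connected layers: {vgg16_fc:,} parameters")      # 123,642,856
print(f"ResNet-50 fully connected layer: {resnet50_fc:,} parameters")  # 2,049,000
```

Under these assumptions, the three VGG-16 layers hold roughly 60 times more parameters than the single ResNet-50 classification layer.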

Other DCNN research directions were later explored. Several network architectures aimed to improve performance by learning deeper features (Zheng et al., 2016; Simonyan & Zisserman, 2015; Gruber et al., 2017). Alternatively, other solutions learn embeddings directly through metric learning instead of training a multi-class classifier. Schroff et al. (2015) introduced FaceNet, trained with the triplet loss, which forces faces of the same identity to be closer to each other than to faces of different identities by at least a margin. Parkhi et al. (2015) follow a similar approach, fine-tuning their network with a triplet loss in the spirit of FaceNet.
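The triplet objective can be written compactly. The sketch below computes a FaceNet-style loss on L2-normalized embeddings using squared Euclidean distances; the margin of 0.2 is an assumed default, and the selection of informative (hard) triplets, which matters greatly in practice, is not shown.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Project embeddings onto the unit hypersphere."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Mean triplet loss over a batch of (B, D) embeddings.

    Zero when every anchor is closer (in squared L2 distance) to its positive
    (same identity) than to its negative (other identity) by at least `margin`.
    """
    a, p, n = (l2_normalize(np.asarray(e, dtype=float)) for e in (anchor, positive, negative))
    d_ap = np.sum((a - p) ** 2, axis=1)
    d_an = np.sum((a - n) ** 2, axis=1)
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))
```

Once a triplet satisfies the margin, it contributes no gradient, which is why training focuses on triplets that still violate it.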

Other solutions propose enhanced loss functions, such as the center loss (Wen et al., 2016), a cosine-similarity-based loss (Wang et al., 2017), the angular softmax of SphereFace (Liu et al., 2017) or the additive angular margin of ArcFace (Deng et al., 2019).
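To show how margin-based losses differ from a plain softmax, the sketch below computes ArcFace-style logits: features and class weights are L2-normalized, so each logit is a cosine, and a margin m is added to the angle of the true class only. The scale s = 64 and margin m = 0.5 are assumed defaults; SphereFace instead multiplies the angle by an integer margin.

```python
import numpy as np


def arcface_logits(features, weights, labels, s: float = 64.0, m: float = 0.5) -> np.ndarray:
    """Additive angular margin logits in the style of ArcFace.

    features: (B, D) embeddings; weights: (C, D) one learnable centre per identity;
    labels: (B,) integer identities. The returned logits feed a softmax cross-entropy.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)            # cosine of the angle to every class centre
    rows = np.arange(len(labels))
    logits = cos.copy()
    # Penalize only the true class: cos(theta) becomes cos(theta + m).
    # Practical implementations add a fallback for theta + m > pi; omitted here.
    logits[rows, labels] = np.cos(np.arccos(cos[rows, labels]) + m)
    return s * logits
```

Because the true-class logit is reduced during training, the network must pull each embedding angularly closer to its class centre than a plain softmax would require, which tightens identity clusters.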

Voice-based identification

Research on digital audio processing, voice and speech understanding and computational linguistics has focused on two main fronts: speech recognition and speaker recognition (Peacocke & Graf, 1995).

Speech recognition involves techniques and methodologies aimed at achieving close to real-time recognition of speech by a computer (Vicens, 1969). It translates spoken language into text (Dimauro et al., 2017); hence it is also known as automatic speech recognition (ASR) (Yu & Deng, 2016), computer speech recognition (Zhang & Liu, 2018) or speech-to-text (STT) (Dimauro et al., 2017).

In general, the main goal of speech recognition is to develop techniques that enable computers to accept speech input (Reddy, 1976; Waibel & Lee, 1990). Research on the topic dates back to 1952, when a rudimentary recognition system called Audrey, developed at Bell Labs, was able to recognize the ten spoken English digits (Meng et al., 2012). Since then, speech recognition has evolved into a significant research field.

The work of Huang and Lee (1993) and Huang et al. (1993), marked by the release of the Sphinx-II speech recognizer at Carnegie Mellon University, is considered a major milestone in speech recognition research. This system was the first to achieve speaker-independent continuous speech recognition with support for a large vocabulary of 1000+ words. It used dynamic and speaker-normalized features and, to model acoustic-phonetic phenomena, employed semi-continuous Hidden Markov Models (HMMs), senones and a tree-based allophonic model. By supporting speaker adaptation, efficient search and language modelling, it also reduced error rates in vocabulary- and speaker-independent speech recognition.

Although predecessor methods exist, such as dynamic time warping (DTW)-based speech recognition (Wan & Carmichael, 2005; Amin & Mahmood, 2008), the vast majority of general-purpose speech recognition systems rely on HMMs. These statistical models of speech employ a Markov state diagram to capture its temporal properties and a Gaussian mixture model (GMM) to represent its spectral properties (Rabiner, 1989). HMMs are used in a wide range of applications, from isolated word recognition systems to large vocabulary speech understanding systems. The keys to their success include automatic training, simplicity and computational feasibility (Benesty et al., 2008). In recent years, with increasing computational power, HMM recognition has also been combined with neural networks for pre-processing, feature transformation or dimensionality reduction (Hu & Zahorian, 2010). For instance, Hadian et al. (2018) present a simple HMM-based end-to-end method for large vocabulary continuous speech recognition. The authors propose a one-stage training approach using a lattice-free maximum mutual information (LF-MMI) objective function in a flat-start manner, i.e., without running the common HMM-GMM training and tree-building pipeline. This method outperforms other state-of-the-art approaches under similar conditions, reducing the word error rates (WER) to 10% to 25% on well-known speech databases.
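The interplay between the Markov chain and the emission densities can be illustrated with the forward algorithm, which sums the probability of an observation sequence over all hidden state paths in O(T S^2) time for T frames and S states. The sketch is deliberately reduced: each state emits a single 1-D Gaussian (a one-component GMM) and features are scalars, whereas real recognizers use multivariate GMMs (or neural networks) over feature vectors and work in the log domain to avoid numerical underflow. All parameter values used to exercise it are arbitrary.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density, evaluated element-wise."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def forward_likelihood(obs, pi, A, means, variances) -> float:
    """P(obs | model) for an HMM with one Gaussian per state, via the forward algorithm.

    obs: (T,) sequence of 1-D acoustic features; pi: (S,) initial state probabilities;
    A: (S, S) transitions with A[i, j] = P(state j at t+1 | state i at t);
    means, variances: (S,) emission parameters of each state.
    """
    alpha = pi * gaussian_pdf(obs[0], means, variances)   # alpha_1(j)
    for x in obs[1:]:
        # alpha_{t+1}(j) = sum_i alpha_t(i) * A[i, j] * b_j(x_{t+1})
        alpha = (alpha @ A) * gaussian_pdf(x, means, variances)
    return float(alpha.sum())
```

In isolated word recognition, for example, one such model per word can be scored on the same utterance and the word whose model yields the highest likelihood selected.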

More recently, benefiting from large training data and faster hardware, researchers have started to effectively train deep neural networks for speech recognition, using a larger number of context-dependent output units to improve performance (Hinton et al., 2012; Deng et al., 2013). This has led to the proliferation of deep feed-forward neural networks (DFFNNs) for speech recognition (Nassif et al., 2019) and has widened the performance gap between acoustic models based on DFFNNs and those based on GMMs. Nowadays, modern end-to-end automatic speech recognition systems (e.g., from Apple, Google, Amazon and Microsoft) are often deployed in the cloud to overcome the impracticality of deploying them on personal devices.

Despite the importance of speech recognition for modern applications such as vehicles, healthcare, military, telephony and others (Vajpai & Bora, 2016), and its extensive research footprint, continuous user identity management stands to benefit particularly from speaker recognition for voice-based biometrics.

Speaker recognition addresses the process of automatically identifying a speaker from voice samples (Poddar et al., 2018). For this reason, it is also known as voice recognition (Van Lancker et al., 1985).

Speaker recognition allows the identity of a speaker to be verified as part of a security process, using the acoustic features of speech that differ between individuals (Sambur, 1975). Speaker recognition research dates back to the 1960s (Hargreaves & Starkweather, 1963; Atal, 1969) and is traditionally divided into two main axes: speaker verification (SV) and speaker identification (SI). On the one hand, SV authenticates a person’s claimed identity from their own voice samples (Kinnunen & Li, 2010), i.e., it confirms or rejects that identity. On the other hand, SI identifies a speaker among a given set of speakers from the input speech signal (Chakroborty & Saha, 2009), allowing, for instance, the identity of an unknown speaker to be determined.
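The difference between the two tasks is easiest to see at the decision stage. Assuming each utterance has already been mapped to a fixed-length speaker embedding by some model (not shown), SV compares a test embedding with the single enrolled embedding of the claimed identity and applies a threshold, whereas SI scores it against every enrolled speaker and returns the best match. Cosine scoring and the threshold of 0.7 are illustrative assumptions; in practice the threshold is tuned on development data.

```python
import numpy as np


def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(test_emb: np.ndarray, claimed_emb: np.ndarray, threshold: float = 0.7) -> bool:
    """Speaker verification (1:1): accept the claimed identity if the score passes the threshold."""
    return cosine_score(test_emb, claimed_emb) >= threshold


def identify(test_emb: np.ndarray, enrolled: dict) -> str:
    """Closed-set speaker identification (1:N): return the best-scoring enrolled speaker."""
    return max(enrolled, key=lambda name: cosine_score(test_emb, enrolled[name]))
```

In a continuous identification setting, `verify` would run periodically against the identity established at login, while `identify` answers who is speaking when no identity is claimed.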

The relationship between speech recognition and speaker recognition is clear. For instance, recognized words enable the use of text-dependent speaker modelling techniques. The choice of words or pronunciation can also be a useful indicator of speaker identity, as described in Stolcke et al. (2007). In that work, the authors survey speaker recognition techniques that make use of speech recognition, e.g., text-dependent modelling and the extraction of higher-level features such as speech prosody (supra-segmental pitch, duration and energy patterns), showing the potential of combining both types of techniques.

Researchers from Google DeepMind teamed up with the French National Centre for Scientific Research (Seurin et al., 2020) to present a new paradigm: Interactive Speaker Recognition (ISR). Personalized utterances are requested from users to build a representation of the speakers, and the speaker recognition task is then cast as a sequential decision-making problem through a Markov Decision Process, which the authors propose to solve using Reinforcement Learning (RL). The method adapts a standard RL algorithm to build an iterative strategy that maximizes identification accuracy while querying only a few words. Results show that the RL enquirer steadily improves during training and consistently outperforms two non-interactive heuristic baselines, while using minimal speech data.

Current state-of-the-art speaker recognition techniques follow the same trend as speech recognition: they mostly rely on machine learning, and specifically on deep learning, which has shown the most promising SV and SI results in the literature owing to its ability to exploit big data. Boles and Rad (2017) present the design of a text-independent voice identification system. Audio features are extracted using Mel-Frequency Cepstral Coefficients (MFCCs) (Hasan et al., 2004), and the lower 20 coefficients are fed into a Support Vector Machine (SVM) classifier for speaker identification, with an accuracy of around 97% on a full 40-person development set.

Ravanelli and Bengio (2018) propose to drastically reduce the number of parameters in the first convolutional layer of a Convolutional Neural Network (CNN), forcing the network to focus only on the filter parameters with a major impact on SV and SI performance. The proposed architecture, named SincNet (see Fig. 2), converges faster than standard CNNs by discovering more meaningful filters in the input layer, and outperforms other speaker identification systems (sentence error rates \(< 1\%\)) and speaker verification systems (equal error rate \(\simeq 0.5\%\)) on a variety of datasets.

Focusing on very challenging noisy and unconstrained conditions, Nagrani et al. (2020) present VoxCeleb, a large-scale audio-visual dataset built with a fully automated pipeline that obtains videos from YouTube, performs active speaker verification using a two-stream synchronization CNN, and confirms the identity of the speaker using CNN-based facial recognition. The resulting dataset consists of over a million utterances from over 6000 gender-balanced speakers. The authors then compare different CNN architectures and new training strategies to identify voices under various conditions, and conclude that the proposed trained relation module added to a Siamese network outperforms the CNN and non-CNN architectures tested.

Fig. 2
The SincNet architecture (Ravanelli & Bengio, 2018). The speech waveform is convolved with a set of parametrized sinc functions that implement band-pass filters. A standard CNN pipeline (pooling, normalization, activations, dropout) is then employed. Multiple standard convolutional, fully-connected or recurrent layers are stacked on top to finally perform speaker classification with a softmax classifier
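The sinc parametrization of SincNet can be sketched as follows: each first-layer filter is the difference of two ideal low-pass filters, so only its low and high cut-off frequencies (f1 and f2) need to be learned, and the truncated kernel is multiplied by a Hamming window. The 16 kHz sampling rate and the kernel length of 251 samples are assumptions for illustration, and learning f1 and f2 by back-propagation is not shown.

```python
import numpy as np


def sinc_bandpass(f1_hz: float, f2_hz: float, sample_rate: int = 16000, length: int = 251) -> np.ndarray:
    """Band-pass FIR kernel parametrized only by its cut-off frequencies f1 < f2.

    It is the difference of two ideal low-pass filters (cut-offs f2 and f1),
    truncated to `length` taps and smoothed with a Hamming window.
    """
    n = np.arange(length) - (length - 1) / 2            # symmetric time axis (samples)
    f1, f2 = f1_hz / sample_rate, f2_hz / sample_rate    # normalized frequencies (cycles/sample)
    # np.sinc(x) = sin(pi x) / (pi x), so 2 f np.sinc(2 f n) is an ideal low-pass with cut-off f
    g = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return g * np.hamming(length)


def gain_at(kernel: np.ndarray, freq_hz: float, sample_rate: int = 16000) -> float:
    """Magnitude of the kernel's frequency response at a single frequency."""
    n = np.arange(len(kernel))
    return float(abs(np.sum(kernel * np.exp(-2j * np.pi * freq_hz / sample_rate * n))))
```

A standard convolutional kernel of this length would have 251 free weights, whereas this one has two, which is the parameter reduction that the architecture relies on.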

Interaction-based identification and combination of biometric traits

Beyond image- and voice-based identification, several works have focused on identifying users through their behavioral traits and interactions (Lamiche et al., 2019; Bailey et al., 2014; Shen et al., 2015; Draffin et al., 2013). In particular, a large body of research has investigated a behavioral biometric technique known as keystroke analysis (Bergadano et al., 2002), which captures users’ typing characteristics during keyboard interactions and uses them for authentication. A feasibility study (Clarke & Furnell, 2007) investigated two types of common interactions on mobile phones (i.e., entering telephone numbers and typing text messages) and revealed that neural network classifiers can authenticate users based on their typing characteristics on mobile phone keypads. Tse and Hung (2019) presented an authentication scheme for touchscreen mobile devices that combines the user’s password with features extracted from typing and swiping patterns, showing that the combined behavioral biometric features improve user identification on mobile devices compared with a single set of features.
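Keystroke analysis typically starts from the timestamps of key-press and key-release events. The sketch below derives the two classic timing features, dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next), and scores a sample against an enrolled profile with a scaled Manhattan distance. The `KeyEvent` format and the choice of distance are assumptions rather than the method of any of the cited works; a fixed typed phrase is also assumed so that feature vectors are aligned.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str
    down_ms: float   # time the key was pressed
    up_ms: float     # time the key was released


def keystroke_features(events):
    """Dwell (hold) time of every key followed by flight times between consecutive keys."""
    dwell = [e.up_ms - e.down_ms for e in events]
    flight = [nxt.down_ms - cur.up_ms for cur, nxt in zip(events, events[1:])]  # release -> next press
    return dwell + flight


def scaled_manhattan(sample, profile_mean, profile_mad):
    """Distance of a timing vector from a user's enrolled profile (lower = more similar).

    Each feature's deviation is scaled by that feature's mean absolute deviation
    observed during enrolment.
    """
    return sum(abs(x - m) / d for x, m, d in zip(sample, profile_mean, profile_mad))
```

A login attempt would then be accepted when the distance falls below a per-user threshold calibrated on the enrolment samples.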

Keystroke dynamics-based analysis has also been used in graphical authentication systems (Chang et al., 2012), by capturing time and pressure features when users enter their graphical password on a touchscreen mobile device, showing adequate performance and suitability even for low-power mobile devices. Furthermore, the analysis of touch interactions on handheld devices, namely touch dynamics, has been used for user identification. Sandnes and Zhang (2012) proposed an approach for identifying users based on touch dynamics by considering touch features (e.g., left-hand vs. right-hand dominance, one-handed vs. bimanual operation, gesture size, gesture timing), demonstrating the effectiveness of touch dynamics for identifying users. Moreover, touch dynamics have been used for continuous user authentication by considering schematic and motor-skill touch features (Shen et al., 2015).

Other works identify users by analyzing mouse interaction behaviors; these can be broadly categorized by type of authentication, i.e., static authentication and active re-authentication (Shen et al., 2017). Static authentication usually checks the authenticity of the user once, during the login process, and requires users to perform pre-defined mouse actions that are compared with the legitimate user’s profile (Sayed et al., 2013; Shen et al., 2012). Active re-authentication, by contrast, operates continuously, acquiring mouse data during the user’s interactions and implicitly verifying the continued presence of the user (Zheng et al., 2016; Mondal & Bours, 2013; Shen et al., 2012).
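Mouse-dynamics systems usually segment the cursor trajectory into individual movements and describe each one with simple kinematic features, such as traveled distance, elapsed time, speed and direction. The sketch below computes a few of these from timestamped cursor samples; the `(t_ms, x, y)` input format, the feature set and the straightness ratio are illustrative assumptions rather than the features of any specific cited work.

```python
import math


def mouse_features(samples):
    """Kinematic features of one mouse movement.

    samples: list of (t_ms, x, y) cursor positions in screen pixels, ordered in time.
    """
    path = sum(math.dist(a[1:], b[1:]) for a, b in zip(samples, samples[1:]))
    elapsed = samples[-1][0] - samples[0][0]
    dx = samples[-1][1] - samples[0][1]
    dy = samples[-1][2] - samples[0][2]
    return {
        "distance_px": path,                                              # length of the traveled path
        "elapsed_ms": elapsed,
        "mean_speed": path / elapsed if elapsed > 0 else 0.0,            # pixels per millisecond
        "direction_deg": math.degrees(math.atan2(dy, dx)) % 360,          # overall heading (screen y points down)
        "straightness": math.hypot(dx, dy) / path if path > 0 else 1.0,  # 1.0 = straight line
    }
```

For active re-authentication, such feature vectors would be computed over a sliding window of recent movements and compared continuously with the enrolled user's profile.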

Behavior-based methods that rely on mobile application usage have also been used for authentication (Ashibani & Mahmoud, 2019). Examples include identifying users and detecting anomalies from their interaction with mobile applications, their use of text messages and their calling behavior (Li et al., 2011), implicit or continuous authentication based on the user’s habits and activities with respect to text messages, phone calls, browser history and location (Shi et al., 2010), and behavioral profiling that authenticates users based on historical application usage (Li et al., 2014). Furthermore, recent work has shown that the network traffic generated while accessing mobile applications, and the time at which they are accessed, can be used to effectively identify users (Ashibani et al., 2018).

Numerous continuous or implicit authentication methods have been proposed as an additional, non-intrusive security countermeasure (Frank et al., 2012; Shahzad & Singh, 2017; Dzulkifly et al., 2020; Rathgeb et al., 2020). However, existing solutions that simply monitor face and/or body cues are not adequate to prevent fraudulent behavior in online examinations, since they ignore students’ interactions and cannot capture scenarios in which the camera stream is switched to another video source (Fenu et al., 2018). Other works have focused on combining multiple biometric traits. Kaur et al. (2016) discuss strong authentication mechanisms based on biometrics and propose a framework model that combines voice recognition with typing pattern (keystroke) recognition to authenticate users in e-learning systems more securely. Focusing on continuous authentication, Prakash et al. (2020) extract multimodal biometric traits from fingerprint and iris images to enforce higher security and combine them using an optimal feature level fusion (FLF) process. The authors report 92% accuracy for the proposed model and compare it with other techniques. Moini and Madni (2009) examine the problem of remote authentication in online learning environments and analyze the challenges of using biometric technology to defend against user impersonation attacks by certifying the presence of the user in front of the computer at all times. They design a client–server architecture for continuous user authentication that combines continuous facial recognition with periodic fingerprint matching. Combining fingerprints with mouse patterns for authentication is discussed by Asha and Chellappan (2008), who propose a multimodal physiological (user fingerprint) and behavioral (mouse dynamics) biometric approach.
For mouse dynamics, the authors aim to evaluate mouse movement speed, movement direction, action type, traveled distance, and elapsed time. However, no details about an actual implementation of the said system are given.
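The difference between fusing at the feature level, as in Prakash et al. (2020), and fusing at the score level can be sketched briefly. The code below is a generic illustration under stated assumptions: per-modality feature vectors (e.g., fingerprint and iris, or face and mouse dynamics) are already extracted, z-score normalization is used to make them commensurable, and score weights are hand-set. It does not reproduce the "optimal" FLF process of the cited work.

```python
import numpy as np


def fuse_features(modalities):
    """Feature-level fusion: z-normalize each modality with its own statistics
    (mean, std estimated at enrolment), then concatenate into a single vector
    that one downstream classifier or matcher consumes."""
    return np.concatenate([(x - mean) / std for x, mean, std in modalities])


def fuse_scores(scores, weights):
    """Score-level fusion, for contrast: weighted average of per-modality match scores."""
    scores, weights = np.asarray(scores, dtype=float), np.asarray(weights, dtype=float)
    return float(scores @ weights / weights.sum())
```

Feature-level fusion lets a single matcher exploit correlations between modalities, whereas score-level fusion keeps each matcher independent and is easier to extend when a sensor, such as a webcam, is unavailable.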

A common weakness of works that use multiple-biometric solutions is that they often operate in an intrusive fashion that interferes with students’ activities and requires additional devices. However, these methods offer effective means to prevent and protect against impersonation attacks by unauthorized users. In contrast to the existing one-time authentication methods, continuous authentication goes a long way to ensure that the intended user is present in front of the workstation at all times. However, it cannot detect or prevent fraudulent behavior on the part of the authorized user, nor does it guarantee that only the authenticated and authorized user is present in the same room (Moini & Madni, 2009).

Existing proctoring tools (see "Intelligent online proctoring systems" section) are usually used only during examinations and tend to ignore the rest of course participation, as well as other important threats such as collaboration and communication with other people and access to resources. They are not scalable, require technological infrastructure and setup [20], and fall short in adequately addressing privacy concerns related to the recorded videos [20–22]. Moreover, automatic proctoring solutions exhibit variable algorithm accuracy and support only a limited range of scenarios (Fenu et al., 2018).

Recent advances in biometrics for authentication include touchless fingerprint scanners, in-display fingerprint readers, fingerprint on card, 3D facial recognition, long-range iris recognition, spoof and liveness detection software, and improved machine learning systems that surpass the accuracy of traditional pattern recognition biometric systems (Bhalla, 2020). Despite their potential, these methods often require high-end computing devices and/or additional specific hardware, which are not commonly available in the HEIs domain.

Table 3 summarizes the main features of the most relevant user identification implementations reviewed in this section.

Table 3 Main features of the most relevant user identification works surveyed

Data privacy-preservation issues

Student identity management in HEIs presents several challenges in preserving the privacy of end-users’ stored biometric data. The key threats and challenges associated with designing secure and privacy-preserving biometric technologies, as discussed in previous studies (Jain et al., 2016; Rui & Yan, 2018; Pagnin & Mitrokotsa, 2017; Tran et al., 2021), include the following: (i) security of biometric data: in many cases, biometric data is not kept secret (e.g., fingerprints can be lifted from surfaces touched by the user, faces can easily be acquired from public online sources, voices can be recorded, etc.) (Rui & Yan, 2018); (ii) privacy of biometric data: stored biometric data can reveal sensitive information about end-users, such as ethnic origin and health details (Pagnin & Mitrokotsa, 2017); and (iii) revocability of biometric data: unlike a password, a compromised biometric trait cannot simply be replaced, so revoking end-users’ data becomes extremely difficult (Rui & Yan, 2018).

Therefore, there is a pressing need for innovative solutions that process and store biometric data while maintaining high levels of security and privacy. State-of-the-art approaches to privacy preservation for biometric data (Jain et al., 2016; Rui & Yan, 2018; Pagnin & Mitrokotsa, 2017; Sarier, 2018; Tran et al., 2021) include the use of biometric templates. These are digital representations of specific features extracted from a biometric sample, such as the shape of a user’s hand, which avoid storing the raw biometric data and thus limit the privacy impact if the data set is compromised. A widely used approach for protecting biometric templates transforms them into a new domain through a non-invertible combination of the biometric data with externally generated randomness, which provides protection similar to a cryptographic cipher (Teoh et al., 2006). Various non-invertible transformations have been proposed, including Cartesian, polar and surface folding (Ratha et al., 2007). Moreover, biometric cryptosystems, which associate a key with the biometric template (Uludag et al., 2004; Cavoukian et al., 2008), can be combined with neural-network-based transformation approaches (Kumar Jindal et al., 2018; Pandey et al., 2016; Jindal et al., 2019). However, these methods often store the biometric templates in an unprotected manner and are thus vulnerable to attacks.
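Combining biometric data with external randomness (Teoh et al., 2006) can be illustrated with a BioHashing-style sketch: a feature vector is projected onto a random orthonormal basis derived from a user-specific seed and each projection is binarized. Only the resulting bit string is stored, and if it leaks, the template is revoked by issuing a new seed. The 32-bit length, the zero threshold and the seeds are assumptions for illustration; the security of such a scheme also depends on keeping the seed secret, which this sketch does not address.

```python
import numpy as np


def biohash(features: np.ndarray, user_seed: int, n_bits: int = 32) -> np.ndarray:
    """BioHashing-style cancelable template.

    The feature vector is projected onto a random orthonormal basis generated from a
    user-specific seed (the external randomness, e.g. stored on a token) and each
    projection is binarized at zero. Requires features.size >= n_bits.
    """
    rng = np.random.default_rng(user_seed)
    basis, _ = np.linalg.qr(rng.normal(size=(features.size, n_bits)))  # orthonormal columns
    return (features @ basis > 0).astype(np.uint8)


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits; matching compares this against a threshold."""
    return int(np.count_nonzero(a != b))
```

Small variations between captures of the same trait flip few bits, so matching can tolerate them with a Hamming-distance threshold, while a template produced with a different seed is essentially unrelated to the old one.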

Furthermore, biometric encryption techniques have been employed to address privacy concerns in biometrics. Traditional cryptographic hashing may not be suitable for biometric data because of its high variability (Pagnin & Mitrokotsa, 2017): two captures of the same trait rarely match bit for bit, so their hashes differ entirely. Hence, other cryptographic techniques, such as homomorphic encryption, have been applied to protect biometric templates. In this approach, the encrypted biometric template is stored in the database and, during verification, the matching module computes the similarity score between the encrypted stored template and the encrypted query template (Jindal et al., 2020; Boddeti, 2018). Nevertheless, there is a trade-off between matching performance, privacy and computational cost, as feature extraction methods and template protection methods have been developed independently.
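To illustrate matching over encrypted templates, the sketch below uses the Paillier cryptosystem, a simpler, additively homomorphic scheme swapped in for illustration (it is not the scheme of the cited works). Because Paillier supports adding ciphertexts and multiplying them by plaintext integers, only the stored template is encrypted here and the query stays in plaintext; encrypting both sides requires schemes that can multiply ciphertexts. Features are assumed to be quantized to small non-negative integers, and the tiny primes make this a toy rather than a secure implementation.

```python
import math
import random


def paillier_keygen(p: int = 10007, q: int = 10009):
    """Toy Paillier key pair with generator g = n + 1 (primes far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                 # valid because g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Enc(m) = (1 + m n) * r^n mod n^2 for a random r coprime with n; 0 <= m < n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n: int, secret_key, c: int) -> int:
    lam, mu = secret_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def encrypted_dot(n: int, enc_template, query) -> int:
    """Enc(sum_i q_i * t_i) from an encrypted template and a plaintext integer query,
    using Enc(a) * Enc(b) = Enc(a + b) and Enc(a) ** k = Enc(k * a)."""
    n2 = n * n
    acc = 1                                  # 1 is a valid encryption of 0
    for c, q in zip(enc_template, query):
        acc = acc * pow(c, q, n2) % n2
    return acc
```

The matching module thus computes an inner-product score without ever seeing the enrolled template in the clear; only the holder of the secret key can decrypt the final score. Note that `pow(x, -1, n)` requires Python 3.8 or later.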

Protocol-based approaches have also been proposed to safeguard the privacy of biometric data, e.g., secure multiparty computation (SMC) and zero-knowledge proof (ZKP) protocols (Tran et al., 2021). SMC protocols are cryptographic protocols that preserve the privacy of each participant’s inputs and can be used in privacy-preserving biometric systems (Bringer et al., 2013; Chun et al., 2014; Tian et al., 2018). Bringer et al. (2013) present an overview of SMC applications in privacy-preserving biometric systems, focusing on secure face identification and secure distance computation for fingerprints and irises. ZKP protocols, on the other hand, enable a user to prove to a verifier that they possess certain knowledge without revealing any additional information, and can also be employed in privacy-preserving biometric systems (Bhargav-Spantzel et al., 2010; Gunasinghe & Bertino, 2017).

Finally, distributed ledger technologies (e.g., private blockchains) have specific features that can address several challenges in privacy-preserving biometrics. Their distributed nature helps overcome single points of failure, removes the need for trusted third parties and mitigates potential privacy breaches. They also provide trustworthy, unmodifiable history logs that facilitate monitoring (Rouhani & Deters, 2019; Sarier, 2018; Zhang et al., 2019; Tran et al., 2021). Goel et al. (2019) present a biometric recognition architecture that uses a private blockchain for feature extraction and performs decentralized matching. Other recent blockchain-based works on biometric privacy include a protocol for decentralized storage of biometric credentials using decentralized identifiers and W3C Verifiable Claims (Othman & Callahan, 2018), as well as methods for protecting fingerprint templates with blockchain technology, which extract fingerprint features, encrypt them with an AES block cipher and upload them to a symmetric distributed storage system (Acquah et al., 2020).
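The tamper-evidence of such history logs rests on hash chaining, which the following sketch shows in isolation: each log entry stores the SHA-256 hash of its content together with the previous entry's hash, so editing any earlier entry invalidates every later link. A real blockchain adds replication and a consensus protocol among nodes, which are not shown, and the record fields are hypothetical.

```python
import hashlib
import json


def append_entry(log: list, record: dict) -> list:
    """Append a record to a hash-chained log: each entry commits to its predecessor's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    log.append({"prev": prev_hash, "record": record,
                "hash": hashlib.sha256(body.encode()).hexdigest()})
    return log


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        body = json.dumps({"prev": prev_hash, "record": entry["record"]}, sort_keys=True)
        if entry["prev"] != prev_hash or entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True
```

Note that such a log should record events (e.g., enrolment or verification outcomes) or protected templates, never raw biometric data, since entries cannot be erased later.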

Research gaps, open issues and opportunities

Despite significant progress in identity management for distance learning, the current scientific and technological landscape still allows for substantial future work in deploying systems in practical application scenarios for HEIs. In the remainder of this section, we present a number of specific issues where we believe further improvements can be made to enhance the state of the art.

Clear need for adoption of continuous user identification technology for HEIs

Continuous user identification is a critical technology for HEIs as it aims to verify the identity of the end-users in real-time (after successfully authenticating), while they are performing tasks. This technology plays a crucial role in ensuring that HEIs can provide credible, trustworthy, and accurate degrees to their students, thereby sustaining their credibility in society. The COVID-19 pandemic has highlighted the importance of promoting best practices and learning from both positive and negative experiences during this period of intensified distance learning.

HEIs should prioritize the deployment of secure, trustworthy, and credible continuous student identity management solutions that can adapt to various online educational models (synchronous vs. asynchronous) and teaching approaches (structured vs. unstructured), taking into account the diverse needs and preferences of students and teachers. Relevant opportunities include the integration of such technologies in LMSs and addressing the limitations of existing proctoring systems (see "Intelligent online proctoring systems" section).

Lack of solutions that combine multiple inputs under an agile system integration model

Several works in the literature propose solutions based on the analysis of users’ interaction behavior on both desktop computers and smartphones (Buschek et al., 2015; Gascon et al., 2014), physiological data including body signals such as heart rate and skin conductance (Ometov et al., 2018; Rui & Yan, 2018), electrocardiographic data (Silva et al., 2011), face biometrics (Dabbah et al., 2007) and eye gaze analytics (Jain et al., 2004). Various organizations and companies, such as Acceptto and Veridiumid, are adopting continuous user identification. However, these are far from fully practical solutions, as they do not combine multiple sources of input (e.g., face, voice, interaction behavior) within an agile system integration model; instead, existing solutions are mostly dedicated to a particular user feature within particular interaction systems. To the best of our knowledge, there is still a lack of frameworks that address continuous student identity management for HEIs by combining face, voice and interaction behavior for continuous identification (see "Technologies for continuous user identification" section). Such frameworks would build a more credible and comprehensive user model while enabling continuous user identification under a unified, agile system integration model, bootstrapped on synchronous and asynchronous online teaching and learning activities.

Break away from traditional user authentication, which compromises students’ continuous identification

Current online education LMSs in HEIs compromise students’ continuous identification by relying on traditional user authentication methods (e.g., passwords). Innovative and credible identity management methods that continuously identify students during online learning activities are needed, so that HEIs can adopt more secure institutional and instructional strategies for continuous student identity management and develop new competencies in these novel methods. In practical terms, this requires reliable and secure solutions for processing and storing biometric data with high levels of security and privacy (see "Data privacy-preservation issues" section). Opportunities include detecting fraudulent student activities after the single authentication entry point has been passed, and removing the need for instructors to confirm each student’s identity manually, a practice that is unfortunately still common in critical online learning activities.

Trade-off between local-based and online-based identification systems

It is clear from the literature that intelligent biometrics, based on a combined analysis of face, voice and interaction behavior data, will advance the state of the art in seamlessly identifying users. Popular methods, particularly those based on machine learning and specifically deep learning, are steadily improving the accuracy of user detection and recognition. However, these methods are also data-hungry and computationally expensive. In practical terms, continuous user identification systems need to be deployed on a carefully designed system architecture that does not compromise usability. Offline (local) recognition solutions require computational resources on the end-user’s workstation, but improve privacy and keep infrastructure costs low. Online verification solutions, such as cloud-based architectures, instead allow continuous identification of users on virtually any terminal, e.g., smartphones, but raise concerns about the transmission of sensitive data over the network, increased costs and scalability. Research opportunities exist in studying and testing hybrid architectures that exploit the advantages of both online and offline identification systems.

Data protection barriers for adoption

Collecting biometric data from students raises various data protection concerns. For instance, in the European Union, HEIs must comply with the General Data Protection Regulation (GDPR) when implementing continuous user identification systems. A privacy impact assessment must be carried out to safeguard sensitive data, and individual student consent is currently a minimum prerequisite for making such systems a reality. However, regulations become a barrier when consent is not granted. Other sensitive scenarios must also be accounted for. In an online examination, for example, a student should not be denied access because their webcam is missing or malfunctioning. Alternatives must therefore be provided within the existing regulations. Clearer legislation on the use of biometrics in HEIs is an opportunity. It would allow guidelines from national data protection authorities to permit the use of biometrics without the student's consent in specific situations. Encouragingly, promising preliminary groundwork is already under way. One example is exceptions for HEIs in the control of biometric registration, granted in strict adherence to Article 9 of the GDPR, which governs the processing of special categories of personal data, including biometric data.

Conclusion

In this work, we have addressed continuous user identification from a technological perspective, focusing on the unique requirements of distance learning. To this end, we have provided an overview of the prevailing intelligent online proctoring systems and of automated identification methods based on image, voice and interaction analysis. Furthermore, we have highlighted relevant topics, such as the use of biometrics in higher education and data privacy-preservation issues. Our aim was to identify research gaps, open issues and prospects for advancing future continuous student identification systems.

We believe that this study paves the way for the design of an innovative framework for student identity management. The envisioned framework will use privacy-preserving techniques for face-, voice- and interaction-based continuous user identification. The ultimate goal is to deploy this framework in HEIs using a unified, agile system integration model. Future work includes developing this system following a User-Centered Design (UCD) methodology, grounded in case-study validation at three European universities. Further goals include demonstrating the technology through dissemination activities. They also include designing guidelines to support the institutional, personal and technological transition towards more sophisticated online student management solutions, and making the system accessible via an open-source software toolkit.

We anticipate that our work can help foster trust in HEIs that pursue an online academic strategy. One of the most important missions and strategic objectives of HEIs is to verify that every graduate has gone through a credible academic process (e.g., laboratories, examinations, class attendance). This is how HEIs confirm that graduates have acquired the knowledge and competence needed to provide their services to society. Therefore, we argue that a continuous user identification system fills an important gap in the current working dynamics between HEIs, their students and society.

The expected impact of a continuous user identification solution is that HEIs will enhance their digital readiness by offering inclusive, trustworthy and credible online education activities through innovative, open-source solutions for continuous student identification and presence awareness. The open-source policy allows HEIs to customize the solutions to their specific requirements, thereby increasing the sustainability of the results and the tools produced.

As an indirect consequence, adopting such solutions allows HEIs to assess their current institutional strategy for online student identification. It also invites them to reflect on existing practices, identify areas for improvement and adapt those practices to their needs.

Within online learning contexts, nearly every student owns several password-protected accounts. Accordingly, the identity management market represents one of the largest Information Technology (IT) markets today, and future developments in this area present promising opportunities for exploitation.